Recommendations for EDH person-data RDF

At the first meeting of the Open Epigraphic Data Unconference (OEDUc) in London in May 2017, one of the working groups that met in the afternoon (and which claims to have completed its brief, so does not propose to meet again) examined the person-data offered for download on the EDH open data repository, and made some recommendations for making this data more compatible with the SNAP:DRGN guidelines.

Currently, the RDF of a person-record in the EDH data (in TTL format) looks like:

    a lawd:Person ;
    lawd:PersonalName "Nonia Optata"@lat ;
    gndo:gender <> ;
    nmo:hasStartDate "0071" ;
    nmo:hasEndDate "0130" ;
    snap:associatedPlace <> ,
        <> ;
    lawd:hasAttestation <> .

We identified a few problems with this data structure, and made recommendations as follows.

  1. We propose that EDH split the current person references in edh_people.ttl into: (a) one lawd:Person, which has the properties for name, gender, status, membership, and hasAttestation, and (b) one lawd:PersonAttestation, which has the properties dct:source (which points to the URI for the inscription itself) and lawd:hasCitation. Date, location, etc. can then be derived from the inscription (which is where they belong).
  2. A few observations:
    1. lawd:PersonalName is a class, not a property. The recommended property for a personal name as a string is foaf:name.
    2. the language tag for Latin should be @la (not @lat)
    3. there are currently thousands of empty strings tagged as Greek
    4. Nomisma date properties cannot be used on person, because the definition is inappropriate (and unclear)
    5. As documented, Nomisma date properties refer only to numismatic dates, not epigraphic (I would request a modification to their documentation for this)
    6. the D-NB ontology for gender is inadequate (which is partly why SNAP has avoided tagging gender so far); a better ontology may be found, but I would suggest plain text values for now
    7. to the person record, above, we could then add dct:identifier with the PIR number (and compare discussion of plans for disambiguation of PIR persons in another working group)
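
As a minimal sketch of the split proposed in recommendation 1, the fragment below shows one lawd:Person linked to one lawd:PersonAttestation, with the name given via foaf:name and tagged @la. The example.org URIs are placeholders, not real EDH identifiers. Gender, citation and PIR identifier are left out because their exact values and properties are not settled here.

```turtle
@prefix lawd: <http://lawd.info/ontology/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical URIs under example.org stand in for the real EDH identifiers.
<http://example.org/edh/person/1>
    a lawd:Person ;
    foaf:name "Nonia Optata"@la ;
    lawd:hasAttestation <http://example.org/edh/person/1#attestation> .

# The attestation points at the inscription, which carries date and place.
<http://example.org/edh/person/1#attestation>
    a lawd:PersonAttestation ;
    dct:source <http://example.org/edh/inscription/1> .
```

With this shape, a consumer would follow dct:source to the inscription record to find dating and findspot, rather than reading them off the person.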

Are you a prosopography?

At the SNAP:DRGN project meeting in Edinburgh a few weeks ago, we decided on a couple of definitions that will impact on the ways in which partner datasets interact with the project. Our current thinking is that we need to distinguish between two kinds of data:

(1) The first kind, which we'll loosely call a "prosopography", is a curated database of person records, with some ambition to be able to be used as an authority list. Prosopographies such as PIR, Broughton, PBW, etc. would be obvious examples of this category, as would the controlled vocabulary of persons in a library catalog like VIAF, Zenon, British Museum persons, Trismegistos Authors, the Perseus Catalog, etc. Even if the task of co-referencing persons is incomplete (as with Trismegistos, say), the intention to disambiguate qualifies the dataset as a "prosopography".

Some example RDF fragments

In the process of working with a few of our partner projects, we have produced some sample RDF fragments, which we thought might be useful as an illustration of SNAP RDF format for other projects currently planning to expose a version of their data via our graph. We hope to include at least some examples of this kind in a later version of the SNAP:DRGN Cookbook.

Entering the SNAPDRGN garden

Now that the SNAP project has started to ingest finalized data from the initial core datasets, it is time to think about how to bring in material from the other partners. For some, this will be easy, as they already know how to make their data available in RDF form on the open web and simply need to follow the guidelines in the Cookbook. For others, quite a lot of work will be involved in getting SNAP-ready. This post describes some of the stages you may go through, and some of the problems that you may meet.

I have divided the work into six steps:

Fluffy dates and associated places

One of the conversations that it was really useful to hash out in person, with so many experts and interested parties present at the workshop a couple of weeks ago, was the question of how the SNAP:DRGN Cookbook should recommend that contributing person-datasets represent date information.

It has been our working assumption that the minimalist information SNAP is ingesting would optionally include a single, undifferentiated, very crudely recorded date associated with a person. (By the same token, any place information associated with a person would be given only in very blunt form, inasmuch as it serves almost as an extra name, epithet or identifier for a person. Further, more granular place association, à la Pelagios, might be included in the original prosopography, and/or in the exposed RDF serialization of said dataset, but SNAP will only expect and take advantage of associated place in the most abstract form.) The argument may be at its clearest with respect to dating, however, partly because there are so many strong arguments for including more granular and semantic date information in a prosopographic dataset.
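
To make the "single, crude date and blunt place" idea concrete, here is a rough sketch of what such a minimal person record might look like in Turtle. It is an illustration under stated assumptions, not the Cookbook's settled recommendation: the snap: namespace URI and the property name snap:associatedDate are assumptions, the example.org URIs are placeholders, and the name and date range are borrowed from the EDH sample earlier on this page purely for illustration.

```turtle
@prefix lawd: <http://lawd.info/ontology/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# Assumed namespace for the SNAP ontology.
@prefix snap: <http://data.snapdrgn.net/ontology/snap#> .

# Placeholder person and place URIs; snap:associatedDate is an assumed property name.
<http://example.org/prosopography/person/42>
    a lawd:Person ;
    foaf:name "Nonia Optata"@la ;
    snap:associatedDate "71-130 CE" ;                      # one undifferentiated date string
    snap:associatedPlace <http://example.org/place/1> .    # one blunt place reference
```

Anything more granular (birth, death, floruit, offices held) would stay in the contributing dataset's own, richer serialization.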

Workshop slides and recap

At last week's SNAP workshop at King's College London, we had a very successful and enjoyable two-day meeting, introducing the principles of and the preliminary work done by the SNAP:DRGN project in its first three months, and hearing from several potential project partners about their datasets, practices and reactions to our proposals. It was an extremely productive and positive affair, even when discussions sometimes became vigorous! I don't mean to summarize all of the discussions and outcomes here (a series of blog posts by my colleagues over the next couple of weeks will do more of that), but I will share what I can of the presentations and slideshows that were shown at the workshop.