How to find people in the SNAP graph

As you probably know, the pilot SNAP:DRGN project ended in December 2014, and although there are nearly seven hundred thousand person records visible through the public triplestore (SNAP 1 to SNAP 673934), we are currently lacking a user-friendly way to search within and find these records. (We’re working on this, as we’ll report here soon.) Most of the person records in SNAP so far are from LGPN, Trismegistos and PIR, but if you have a reference to PIR² M 436, say, or LGPN V.2 Θουκυδίδης 11, and want to find the SNAP URI with which to annotate your texts, there’s no obvious way to know that these are SNAP 9024 and 33624 respectively.

Of course, if you’re a Linked Data professional or developer and want to query our SPARQL endpoint directly, you can do so by pointing your code at https://snap.dighum.kcl.ac.uk/sparql/ and adding parameters and queries as desired. You don’t have a problem, so probably weren’t desperate for our putative API in the first place (although the example SPARQL queries below may still get you started).

You can also in principle query our triplestore using the SPARQL test page at https://snap.dighum.kcl.ac.uk/query/, with a few simple lines of SPARQL to find the information you need. This is not a user-friendly interface for a historian, and I’m not pretending this is an acceptable substitute for a proper search interface, but if you’re desperate to find SNAP URIs to add to your code, this might be a stop-gap to get you started.

Some example SPARQL queries that might be useful for you have been collected at the SNAP volunteers’ Github Wiki. I’ll walk you briefly through the specific query for finding the canonical SNAP URI for a person when you know the URI from the contributing dataset. (This is complicated only slightly by the fact that if two person records have been merged in SNAP, the canonical person id will be the new, merged URI, not the first one that was minted for the contributed record.) In the SPARQL search box, delete the query that is there already and replace it with the code below (obviously replacing the LGPN URI with whichever original URI you are looking for).

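PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?snap2
WHERE {
  # Case 1: the record minted from the source has been merged; return the URI that replaced it
  { ?snap2 dc:replaces ?snap1 .
    ?snap1 prov:wasDerivedFrom <http://www.lgpn.ox.ac.uk/id/V4-18886> }
  UNION
  # Case 2: the record minted from the source has not been replaced; return it directly
  { ?snap2 prov:wasDerivedFrom <http://www.lgpn.ox.ac.uk/id/V4-18886>
    MINUS { ?s dc:replaces ?snap2 } }
}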

This query is basically saying, in sequence:

  1. find me the new SNAP id that replaced the first id that we minted from the contributed record
  2. *and* find me the id that we minted for the contributed record so long as it has not been replaced.

In either case, this should return one, and only one, canonical identifier for the person you are looking for.
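For anyone scripting this rather than pasting into the test page, the following Python sketch wraps the same query for an arbitrary contributing-dataset URI. It assumes that the endpoint accepts a standard SPARQL 1.1 Protocol GET request with a "query" parameter and returns SPARQL JSON results when asked for them via the Accept header; neither has been confirmed for this endpoint, and the function names are illustrative only.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint given in the post; GET support and JSON results are assumptions.
SNAP_ENDPOINT = "https://snap.dighum.kcl.ac.uk/sparql/"

# The query from the post, with the source URI left as a placeholder.
# Literal braces are doubled so that str.format leaves them alone.
QUERY_TEMPLATE = """PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?snap2
WHERE {{
  {{ ?snap2 dc:replaces ?snap1 .
     ?snap1 prov:wasDerivedFrom <{source}> }}
  UNION
  {{ ?snap2 prov:wasDerivedFrom <{source}>
     MINUS {{ ?s dc:replaces ?snap2 }} }}
}}"""


def canonical_snap_query(source_uri: str) -> str:
    """Return the canonical-URI query for one contributing-dataset URI."""
    # Refuse characters that would break out of the <...> IRI syntax.
    if any(c in source_uri for c in '<>"{}|^`\\ '):
        raise ValueError(f"not a plain IRI: {source_uri!r}")
    return QUERY_TEMPLATE.format(source=source_uri)


def request_for(source_uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for SPARQL JSON results."""
    url = SNAP_ENDPOINT + "?" + urlencode({"query": canonical_snap_query(source_uri)})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def canonical_uris(results: dict) -> list:
    """Extract ?snap2 values from a SPARQL JSON results document."""
    return [b["snap2"]["value"] for b in results["results"]["bindings"]]


def find_canonical(source_uri: str) -> list:
    """Send the request and return the canonical SNAP URI(s); needs network access."""
    with urllib.request.urlopen(request_for(source_uri)) as resp:
        return canonical_uris(json.load(resp))
```

Calling find_canonical("http://www.lgpn.ox.ac.uk/id/V4-18886") would perform the network request; the query and request builders can be checked offline.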

To find the original URI of a person in:

  1. LGPN: go to the LGPN search page and find the person you’re looking for (search by name or place etc.). Copy the full URI of the person: it will be shown as the id, e.g. V3a-31030, with a link to the URI in the form: http://www.lgpn.ox.ac.uk/id/V3a-31030 (copy link address to get this full link).
  2. Trismegistos: go to Trismegistos People and find the person (not just name) you’re looking for. The person record will have a note of a “stable url (with PERS_ID)”, in the form www.trismegistos.org/person/228313. (Note that this is a PERS_ID from the person page, *not* a REF_ID.) Prefix this with “http://” and add to the end the string “#this”, i.e.: http://www.trismegistos.org/person/228313#this
  3. PIR: search the PIR search mask for your person. Make a note of the person’s PIR identifier, e.g. “PIR² I 662”. Take the URI prefix “http://www.paregorios.org/resources/roman-elites/persons/”, and add your PIR id, in lowercase (with “²” written as “2”), with hyphens instead of spaces, and the number padded to 4 digits, i.e. pir2-i-0662. This is not a resolvable URL, so it won’t be found on the web, but it will find the SNAP URI for you using the SPARQL query above.
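The three recipes above are mechanical enough to script. This is a minimal Python sketch; the function names are hypothetical, and the PIR rule assumes identifiers of the form “PIR² <letter> <number>” with a purely numeric final part.

```python
LGPN_PREFIX = "http://www.lgpn.ox.ac.uk/id/"
TM_PREFIX = "http://www.trismegistos.org/person/"
PIR_PREFIX = "http://www.paregorios.org/resources/roman-elites/persons/"


def lgpn_uri(lgpn_id: str) -> str:
    """LGPN: the id shown on the person record, e.g. "V3a-31030"."""
    return LGPN_PREFIX + lgpn_id.strip()


def tm_uri(pers_id) -> str:
    """Trismegistos: a PERS_ID (not a REF_ID) with "#this" appended."""
    pers_id = str(pers_id).strip()
    if not pers_id.isdigit():
        raise ValueError(f"expected a numeric PERS_ID, got {pers_id!r}")
    return f"{TM_PREFIX}{pers_id}#this"


def pir_uri(pir_ref: str) -> str:
    """PIR: lowercase, "²" as "2", hyphens for spaces, number padded to 4 digits."""
    parts = pir_ref.replace("²", "2").lower().split()
    if len(parts) < 3 or not parts[-1].isdigit():
        raise ValueError(f"unexpected PIR reference: {pir_ref!r}")
    parts[-1] = parts[-1].zfill(4)  # "662" -> "0662"
    return PIR_PREFIX + "-".join(parts)
```

For example, pir_uri("PIR² I 662") gives the .../persons/pir2-i-0662 URI described above, which can then be passed to the SPARQL query.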

Different types of SNAP partner projects

Broadly speaking, there are three categories of project that deal with ancient person or name data which we would like to see collaborating with SNAP:DRGN. For the sake of argument I’ll call these “prosopographies”, “person and name authorities” and “digital editions containing named entities.”

1. Prosopographies

This category of database is the full prosopographical resource: each person entry includes bibliographical and at least some biographical information, along with date, titles, relationships, perhaps “factoids” and other information. This is almost certainly a superset of the information that SNAP RDF will capture, and linking back to the prosopographical database from SNAP will be essential to give users access to the full data. (This is standard SNAP practice anyway, but in this case it’s more than just polite, it’s a sine qua non.) The person-records in a “prosopography” are considered to be normalized and disambiguated to a high scholarly standard, with a named and dated authority, and clearly stated principles.

The overlap between any two prosopographies is likely to be very small, by the nature of the division of work in the discipline.

2. Person and name authorities

An authority list of names or persons, on the other hand, might be a controlled vocabulary of ancient authors from a library catalogue, or a list of all attested names in an epigraphic corpus. These are expected to be disambiguated and curated, but may not have detailed scholarly apparatus of any kind. They are also unlikely to contain biographical information, and may contain no more than a unique identifier plus one form of a name. A list of names, by contrast, may not disambiguate persons at all, but merely list all attestations in a corpus, even if the same person is mentioned many times. There may be some normalization of names, and to be useful, at the very least a list of bibliographical attestations would be desirable.

The overlap between any two library catalogues is likely to be very large, but the overlap between a list of ancient authors and a “prosopography” is probably small.

3. Digital editions containing named entities

These sources will probably not in the first instance disambiguate persons or names at all, and rather than contributing new person records to SNAP, it is more likely that they will take advantage of the person authority list by using SNAP identifiers to disambiguate person references in their texts or data. There are a few ways to do this, for example:

a. In a database field: a dataset listing, for example, issuers of coin mints or individual potters, might link from their person fields to the SNAP identifier of that person.

b. In an XML tag: a text mentioning persons might use a TEI element and attribute to point to the SNAP identifier of the persons referred to (either directly or via an internal authority list). For example, in a TEI edition:

<persName ref="http://data.snapdrgn.net/person/10924/">C. Cassius Celer</persName>

c. As RDF annotations, using the Open Annotation format (as specified in the Cookbook under “Scenario 5”). This would serve the same function, albeit via a stand-off method, as b. above, with the added value that the RDF can be contributed to the SNAP:DRGN graph of references to a person. (It is to be hoped that projects using methods a. or b. would also eventually be persuaded to use this RDF format to contribute their annotations to the graph.) An example annotation:

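@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/annotations/01>
     rdf:type oa:Annotation ;
     oa:hasTarget <http://insaph.kcl.ac.uk/iaph2007/iAph120614.html#al1> ;
     oa:hasBody <http://data.snapdrgn.net/person/6293/> ;
     oa:annotatedBy <http://viaf.org/viaf/106795353> ;
     oa:annotatedAt "2014-11-18T12:17:00Z"^^xsd:dateTime .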
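Methods b. and c. can be joined up: an edition that already carries SNAP URIs in @ref attributes can generate stand-off annotations from them. The Python sketch below (standard library only) collects persName elements whose ref points directly at a SNAP person URI and writes one Open Annotation per reference in Turtle, following the pattern of the example annotation above but writing the timestamp as an xsd:dateTime. The function names and base URIs are hypothetical; each annotation targets the whole page rather than a fragment within it, and references made via an internal authority list are skipped.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

TEI_NS = "http://www.tei-c.org/ns/1.0"
SNAP_PERSON = "http://data.snapdrgn.net/person/"

PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix oa: <http://www.w3.org/ns/oa#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def snap_refs(tei_xml: str) -> list:
    """Return (name, SNAP URI) pairs for persName elements with a direct SNAP @ref."""
    root = ET.fromstring(tei_xml)
    pairs = []
    for el in root.iter(f"{{{TEI_NS}}}persName"):
        ref = el.get("ref", "")
        if ref.startswith(SNAP_PERSON):  # skips e.g. "#p2" internal pointers
            pairs.append(("".join(el.itertext()).strip(), ref))
    return pairs


def annotation_ttl(anno_uri, target, person, annotator, when: datetime) -> str:
    """Serialize one Open Annotation linking a text to a SNAP person."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"<{anno_uri}>\n"
        "    rdf:type oa:Annotation ;\n"
        f"    oa:hasTarget <{target}> ;\n"
        f"    oa:hasBody <{person}> ;\n"
        f"    oa:annotatedBy <{annotator}> ;\n"
        f'    oa:annotatedAt "{stamp}"^^xsd:dateTime .\n'
    )


def annotate_edition(tei_xml, page_uri, annotator, base) -> str:
    """Turn each SNAP-linked persName in an edition into a numbered annotation."""
    now = datetime.now(timezone.utc)
    blocks = [
        annotation_ttl(f"{base}{i:02d}", page_uri, ref, annotator, now)
        for i, (_name, ref) in enumerate(snap_refs(tei_xml), start=1)
    ]
    return PREFIXES + "\n" + "\n".join(blocks)
```

Point it at a TEI fragment containing the persName example from b. above and it yields one annotation whose oa:hasBody is that SNAP URI.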

One outcome of this disambiguation process, which might be software-assisted or involve student or community contributions, might be to turn a purely “list of names” type of project into a fully disambiguated prosopography, at which point the new list of persons (including, no doubt, some not yet listed in SNAP:DRGN) could become a new SNAP partner project, of type (1) or (2) above.

Third Advisory Board meeting minutes

SNAP:DRGN Advisory Board (AB)

3rd meeting Skype (voice only) 2015-02-23

Present: Øyvind Eide (ØE, chair), Fabian Koerner (FK), Robert Parker (RP), Laurie Pearce (LP), Charlotte Roueché (CR, until 17:35), Rainer Simon (RS), Gabriel Bodard (GB, principal investigator)

1. Call to order (17.00)

2. Updates from PI (GB) (17.05)

The project ended officially at the end of December 2014, but the granting organisation has approved a budget extension until the end of September 2015 with no additional funding. This means that the remaining approx. £ 4,000 can be used for travel for outreach and work with partners. GB asked for suggestions for how to best put the remaining funds to use, e.g., travel and/or a small workshop.

There are three main outputs from the project:

  • Cookbook: ready for the release of version 2.0, with substantial development from 1.0. Cosmetic changes are still needed to improve its form, but the cookbook is functional.
  • Ontology: this is to a large extent based on reuse of other ontologies, including LAWD, FOAF and SKOS. It contains a number of relationship types between people. This is adequate for the current data, but experience shows that it must be expanded whenever new datasets are added.
  • Data: this includes less data than expected, only five datasets so far. Getting data in from providers turned out to be more difficult than anticipated. However, another 10 or so datasets are waiting in line, held up by various obstacles.

The only way of finding a person is currently through a URI or through a query to the SPARQL endpoint. Better search facilities are needed. As no funding for salaries remains, little can be done about this for the time being, but a group of nine students is ready to start working on the project. It remains to be seen whether this will be mainly a training exercise for the students or whether they will be able to do significant work on some of the roughly 40 remaining micro-tasks that have been identified. Work on improving the API would be especially welcome.

SNAP is keen to identify new prosopographies and partners. These can be of three types:

  • Actual databases with information about persons, from full prosopographies to those of the type found in LGPN.
  • Sources of names with identifiers, including library catalogues, lists of authors, lists of emperors.
  • Users of the SNAP system for disambiguating local resources. This includes projects and texts with many references to personal names that need disambiguation tools, as well as out-of-date works that are still used and cited, some of which have even been supplanted by more recent print versions.

3. Discussion of update (17.20)

Some travel ideas were suggested:

  • LP suggested a possible visit to BPS. She also pointed out that the HBTIN corpus is still 12-18 months from a stage where it will be ready for SNAP.

GB replied that this sounds interesting; however, given travel distance and costs, it may be better if SNAP team member Hugh Cayless make this visit.

  • ØE suggested that some funding could be spent on working with the co-reference development and the establishment of family relationships and roles in CIDOC-CRM, e.g., through a workshop in London in late April or early May, since some of the people working on this are located in London. He also suggested that a presentation of the SNAP ontology at the next CRM SIG meeting in May would be a good idea.

GB welcomed the idea and commented that it would be good to bring team member Faith Lawrence into the conversation about family roles and relationships. Many new non-family relationships are being identified, especially from the Smith Dictionary; examples include ‘appointed’ and ‘killed’.

  • FK suggested a Berlin visit to meet with the Akademie about PMBZ and other topics of common interest.

GB said that this would be good and that he hopes there will be more data to get from the Akademie. He had a meeting with Matthäus Heil in December, who was keen to work with SNAP.

  • CR suggested that the online version of the British Prosopography of the Byzantine Empire, whose first version was published on CD in 2001, should also be presented in Berlin. John Bradley is close to finalising the work of turning it into a web publication. It includes all relevant PMBZ ID numbers.

4. Future work (17.40)

GB presented his ideas for the future. He is currently putting together a grant for a further project, consisting of:

  1. Data creation and ingestion of existing data sets.
  2. Community building. This is needed because the plans for the inclusion of data sets in the original proposal were too optimistic.
  3. Identifying new types of input, e.g., OCR of, and automatic extraction of data from, already scanned out-of-copyright sources.

FK asked about the feasibility of this. GB replied that it would be essential to define how many resources (time) should be put into such experiments, as they could easily swallow large resources without significant results.

  4. Experiments with the establishment of new prosopographies, e.g., for gods.

Gods, mythological persons, and historical persons are not the same but are still interconnected; there was some discussion of the feasibility of such a project, but the idea was received positively. ØE pointed out that in the cultural heritage context of the CIDOC-CRM, gods and mythological persons are concepts.

  5. Scholarly use of the resources created: what will people do with this? Building better search functionality and finding out what functionality researchers want is important. This should be done in the form of workshops with rapid prototyping, so that the researchers can play with possible solutions to understand better what functionality they really want.
  6. For the data set itself, it should develop into a gold standard for future work, with the possible use of name tables for spell checking and for morphological parsing.
  7. As a scholarly contribution, GB also suggested defining how to model opinions and developing systems for structured commentaries. One could have 2–3 groups of people, e.g., scholars, graduate students and the general public, contributing disambiguation and co-references.

5. Any other business (17.50)

GB asked whether anyone would be interested in a discussion about the SNAP ontology with Faith Lawrence and Hugh Cayless. Nobody objected, and GB will take it upon himself to organise it.

GB also signalled that he would invite the group to become members of the AB of the next project. Formal invitations will follow in due time.

6. Summing up (17.55)

  • ØE thanked GB on behalf of the AB, and GB thanked everybody in the AB for their contribution. A general round of thanks.

Who does SNAP:DRGN serve?

As we come to the end of the first year of SNAP:DRGN funding, and start planning applications for follow-up funding, it is worth rehearsing the main academic and other benefits of the SNAP:DRGN projects and the prosopographical-onomastic graph that we hope it feeds into.

A virtual authority list of ancient persons for scholars and cultural heritage institutions to point to.

The key role of the SNAP:DRGN project from the start was to provide a set of prosopographical unique identifiers for use in disambiguating names and other person (and person-like) references in digital texts and other datasets. Having a single URI to specify Alexander the Great as opposed to Alexander of Aphrodisias, Alexander of Abounoteichos, or Alexander of Troy will be very powerful and useful. Scholarly and heritage databases will be able to use SNAP identifiers both in their own code (an @ref attribute in a TEI edition, for example) and produce SNAP/Pelagios-style open annotations associating names in their texts with person identifiers.

Ground truth/gold standard dataset for future data mining/NER work.

The long lists of names, people, titles and other references coming from the many corpora that make up the SNAP:DRGN graph will be a valuable dataset in their own right. Lists of names and persons can be seen as a Gold Standard for seeding named-entity recognition tools, especially those involving machine learning. Name lists will also contribute to the back-end of spellcheckers, morphological parsers such as Morpheus/Alpheios, and the toolkits for correcting OCR of ancient texts.

Annotation graph, visualization and API for research purposes.

When we have not only a graph of persons, co-references and relationships from our contributing prosopographical datasets, but also a large collection of open annotations, as described above, we expect that the network of people, names, places, references and citations will be a research tool in its own right. It is our intention to test this assumption by building an API and search interface, perhaps in association with visualization and social network analysis tools, and put them in front of historians to see what they make of them. The vigorous criticism that will inevitably result could then be followed by cycles of rapid implementation and further testing.

Public engagement, scholarly contribution and citizen science.

Once we have a large amount of data in the SNAP:DRGN graph, the task of identifying co-references between persons from different databases, recording relationships between persons, and disambiguating other kinds of information will become essential. Some of these may be partly automated, others may only be possible to perform by hand. Scholars, students, and even citizen contributors might be recruited to help with the task of identifying or confirming such annotations, or adding structured commentary of other kinds. The most important outcome of such a “citizen science” exercise, in my opinion, is the engagement of scholars and public from outside the project in this kind of material. Students could benefit hugely from engaging directly with prosopographies, epigraphy and other primary sources, making decisions and receiving credit for their work. As I argued recently, “If you’re doing your job properly, there’s no distinction between citizen science and pedagogy.”

Contributions to the tools and ontologies ecosystem.

Finally, SNAP:DRGN is participating in several communities of tooling (Pelagios, Recogito, Perseids, Berkeley Prosopography Services), standards (Pelagios, TEI), and ontologies (LAWD, CIDOC-CRM, Open Annotation Collaboration, Ontology for Historical Prosopography). It is our hope that we will not only benefit from building on these existing community projects and resources, but also contribute experience, code and documentation to many of these projects in turn. If collaborative work of this kind does not result in shared code and outcomes, something has gone horribly wrong. SNAP participants have already been invited onto several special interest groups and advisory boards.

Comments on, objections about, and additions to this account are very welcome.

FAQ: What are the limits of SNAP content?

We have often been asked:

“SNAP” contains the word “Ancient,” which suggests a rather inclusive definition of classical antiquity, but “DRGN” includes “Greco-Roman,” which implies a more traditional restriction. Are you interested in prosopographies from outside the strictly Greek and Roman world?

Yes! (Short answer.)

The longer answer is in two parts:

(1) yes, we’re certainly interested in prosopographies and other person-data lists from outside the classical Greco-Roman world. The second half of the acronym, “DRGN,” was unfortunately massaged to suggest the word “snapdragon,” and I now regret the implication that we might be either linguistically, culturally or geographically limited to Greece and Rome;

(2) we expect the protocols and tools developed by SNAP eventually to be of relevance to all places and periods, but we’re defining our initial scope as “the Ancient Mediterranean and geographically or chronologically intersecting cultures.” So we’ll start with Greece and Rome and Egypt, perhaps Persian, Phoenician, Punic, Tifinag, Iberic, Celtic, etc., include dynastic Egypt and the Byzantine world, and slowly spread outward from there.

So if you’re asking that question because you have a prosopography of Ancient India or China, Mediaeval Arabia, a catalogue of Celtic or Old Norse personal names, or Sumerian/Babylonian person lists, then yes! We do want to hear from you. There will almost certainly eventually be at least a one-person overlap between any two prosopographies in our collection, and even if there weren’t, a single virtual authority of ancient persons from all world cultures would still be a valuable resource.

State of the Snap-Nation

With the end of the pilot project scarily in sight it is time to review where we are and where we hope to be by the end of December.

The big news is that (hopefully) the first set of SNAP identifiers are now frozen!

What this means is that the first 5 datasets have now been ingested and had SNAP identifiers linked to each of their persons, and those identifiers are fixed. There may still be a few tweaks to the RDF descriptive data coming in from the projects, but the identifiers will remain the same.

We had been experimenting with starting the identifiers at 100001, to add a little consistency in number length, but after doing that for a while during the testing stage we decided that it wasn’t worth it and we would just keep things simple and start at 1.

Currently the following datasets have been ingested:

Project                         SNAP identifier range
PIR                             1 – 10924
TM                              10925 – 367917
LGPN                            367918 – 671019
British Museum (Selection)      671020 – 671972
VIAF (Selection)                671973 – 673753
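Because each dataset was given a contiguous block of numbers, the table can be used to tell which contributing project a SNAP identifier originally came from. Below is a Python sketch using a binary search over the range starts. It knows only the five ranges above; merged records and later ingestions are assumed to receive identifiers outside these ranges and are not recognized. The name source_dataset is hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# (first id, last id, project) for the initial frozen ingestion, copied from
# the table above; anything outside these ranges is treated as unknown.
SNAP_RANGES = [
    (1, 10924, "PIR"),
    (10925, 367917, "TM"),
    (367918, 671019, "LGPN"),
    (671020, 671972, "British Museum (Selection)"),
    (671973, 673753, "VIAF (Selection)"),
]
_STARTS = [first for first, _last, _name in SNAP_RANGES]


def source_dataset(snap_id: int) -> Optional[str]:
    """Return the contributing project for a SNAP id, or None if not in the table."""
    # Index of the last range whose first id is <= snap_id (ranges are sorted).
    i = bisect_right(_STARTS, snap_id) - 1
    if i < 0:
        return None
    first, last, name = SNAP_RANGES[i]
    return name if first <= snap_id <= last else None
```

A binary search is more than five ranges strictly need, but it keeps the lookup cheap if further blocks are appended to the list in order.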

 

One big change over the summer was the move from Sesame to 4Store. We started off using Sesame because that was the standard triplestore for projects at DDH in King’s and therefore the easiest to set up and support on our servers. Whether we did something wrong in the setup, or it was the tens of millions more triples than we usually deal with, Sesame didn’t prove robust enough to handle even the founding datasets in a timely fashion. After some wrangling (next time we get the VM upgraded to a version that has a ready-built package for it) we installed 4Store, and so far it has stood up to everything we have thrown at it without any noticeable loss in performance. While I don’t want to go as far as saying it bodes well, it definitely doesn’t fill me with a nameless dread at the thought of more people than me accessing the data.

Moving on from the data itself to displaying it in an easy-to-read form: the person website itself is solidifying slowly (even if it isn’t the fanciest cupcake on the shelf). For this we must thank Davide Bellini, who is interning with us. Although he was supposed to be working on another project, we lured him away with promises (or possibly threats) of learning Python and Django, and the opportunity of looking into the abyss which is SPARQL. Having successfully made his first mark on the Person Profile pages, he is now working on the script to partly automate the record-merging procedure and continuing to upgrade the profile page displays. Between the two of us we hope to have the Person Profile page filled out and the first merged records ingested by Christmas (I will leave the question of which Christmas to the reader’s imagination).

For those who want to see the ‘in-progress’ person pages, they can be accessed at http://data.snapdrgn.net/person/<number> e.g. http://data.snapdrgn.net/person/1

Minutes of second advisory board meeting

SNAP:DRGN Advisory Board (AB)

2nd meeting Skype (voice only) 2014-08-27

Present: Øyvind Eide (ØE, chair), Fabian Koerner (FK), Laurie Pearce (LP), Charlotte Roueché (CR), Rainer Simon (RS), Gabriel Bodard (GB, principal investigator)

Apologies: Sonia Ranade, Robert Parker.

The meeting lasted one hour.

Minutes written by Øyvind Eide based on notes from Laurie Pearce.

1. Call to order (15.00)

ØE welcomes. Call for other business: none.

2. Updates from PI (GB) (15.05)

The main development is the release of Cookbook 1.0. This required testing and getting base data sets into RDF. Most of the work was directed at showing/recommending to others how to put data into SNAP format. There was a major worksprint in Edinburgh, where priorities for future funding of the project were also discussed, as the project is currently only funded for this calendar year.

The project has to make decisions about the next stages and priorities in development, based on an important distinction between two kinds of data set:

  1. Prosopography: this is information about persons, intended to disambiguate them (even if disambiguation is not always successful). This is the kind of data SNAP would import.
  2. List of attestations: SNAP will not import and assign URIs to such data. The data owners are invited to annotate such datasets with SNAP ids. However, SNAP might test a second stage of integration, incorporating such data and annotations, at a later point.

SNAP is in discussions with VIAF about a useful association between the two. There is a subset of about 2000 person references from VIAF with dates before 1000 and who wrote in either Latin or Greek. Those without dates or languages are omitted from this set. This small subset has been imported into SNAP.
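The selection rule for the VIAF subset amounts to a simple filter. In this Python sketch the record layout (one representative year plus a set of language codes) is hypothetical, as are the codes “lat” and “grc” for Latin and Greek; negative years stand for BCE dates.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ViafRecord:
    """Hypothetical, simplified view of a VIAF person record."""
    uri: str
    year: Optional[int]                      # representative year; negative = BCE
    languages: FrozenSet[str] = frozenset()  # languages the person wrote in


TARGET_LANGS = frozenset({"lat", "grc"})  # assumed codes for Latin and Greek


def in_snap_subset(rec: ViafRecord, cutoff: int = 1000) -> bool:
    """True if the record is dated before the cutoff and wrote in Latin or Greek."""
    if rec.year is None or not rec.languages:
        return False  # records without dates or languages are omitted
    return rec.year < cutoff and not rec.languages.isdisjoint(TARGET_LANGS)
```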

How SNAP can help VIAF: VIAF is not interested in all references, just those who are authors according to the library catalogues. If SNAP had a field for role/occupation, contributors who have data about persons being creators/actors/painters/poets/theologians etc. could be asked to provide it in order to flag relevant persons for VIAF. VIAF would then assign identifiers to these persons, even if no real information beyond the fact that the tombstone says “painter” is available.

Additional datasets: SNAP has received data from the British Museum and VIAF, and is in advanced conversations with others, including Hellenistic Babylonia, PBW, the Smith Dictionary, RIB and the Zenon catalogue.

Working on the triples to show functionality: this is the slowest part to get ready, as the RDF requires much work. The triplestore that had been recommended was not capable of handling the data, so the project had to start again with a more robust triplestore. As a result, many elements of the API that were specified haven’t been built yet. SNAP has been in touch with contributors and has made mock-up RDF, which is being tested, but there have been no further production imports yet.

3. Discussion of update (15.15)

CR asked about the VIAF relationship. For example, for Julius Caesar, the VIAF record might record one role only, and that role might not be authorship. The specific relationship of creator to work would have to be considered; there is an RDF relationship between individuals and the things they create.

GB replied that there is nothing to preclude assigning more roles, but SNAP is building a minimal subset in order to work with other projects; it is not building a prosopography.

ØE: On a more general level, each database will have specific things it is interested in. Given the simplicity of the SNAP model, these things will not apply to the top layer. So callbacks to local databases may be necessary, but this is not simple. In order to reach a situation where one can access more detailed information from the local databases, one would have to map into something more complex than SNAP: a more advanced ontology is needed to connect to more complex prosopographies.

GB replied that few reducible fields remain that are not accounted for in SNAP, and none of the providers does that level of reduction anyway, so this is not currently a relevant problem.

ØE asked about the discussion in Edinburgh that might have focused on future funding. He asked GB to share his notes/impressions with SNAP, even if the notes are brief. GB agreed to do so.

  • person-search as a research tool
  • graph-search as a crosswalk channel
  • speccing full annotation, certainty & disagreement
    • *and getting uptake*
  • Pelagios-style harvester for oac annotations pointing in to snap
  • infrastructure and optimization

GB: Items that remain for future work include integrating into the SNAP graph a new kind of scholarly statement identifying a name instance as a specific person, and indicating the authority of who is making that identification and who disagrees. SNAP will not yet have many references pointing in to it by the end of the year. However, it would be useful to have the Pelagios harvester with “here are all the persons” and “these are the datasets that have been annotated with references to these persons.”

Getting the infrastructure and triplestore working: Sesame is not powerful enough, and the project still needs much more memory, say the equivalent of 10 usual projects in a campus institution. (Migrating to FourStore led to much improved performance, but this is still a relatively small dataset.) The project has to consider how to optimize and get more computing power.

ØE: Might look to supercomputing as a possible source of funding – most work in the humanities does not need this, but there is funding available if one can document a need.

CRMinf is an extension to CRM covering argumentation and inference making. Documentation, currently under development: “CRMinf: the Argumentation Model. An Extension of CIDOC-CRM to support argumentation.” http://www.cidoc-crm.org/technical_papers.html

4. Discussion of Cookbook (15.30)

The PDF version does not have much supporting prose. GB asks for pointers that could be added to the paragraphs to clarify what is intended and/or necessary to help users who are not able to read and understand RDF easily. He notes that the cookbook does not include a soup-to-nuts example, one that takes a user from start to finish. Should the cookbook include full markup of persons as examples? The meeting agreed this was a good idea, either in the cookbook or linked to from it.

LP asked about the preferred means for providing comments/feedback.

GB: Email is good for straightforward corrections, but please use the ancient-people email list for discussion. ØE noted that the raising of specific points on the ancient-people list is good way to flag issues.

CR found the cookbook to be very clear and helpful. However, as she went through the list of items she lost the overall picture. It would be good to have examples of minimum structure needed. Some potential contributors may not be certain of whether they have prosopographical data in the SNAP sense.

GB: Can show minimal sets with only date and name. He wants to include a description of what constitutes a prosopography in the SNAP sense.

ØE suggested that illustrations might be useful in order to understand the contents better.

RS found the cookbook clear. He brought up the topic of name properties. What makes a name important enough to get richer encoding? GB would like minimal encoding, such as birth name. Whether one would use additional properties depends on whether the contributing database has controlled vocabularies of names, as Trismegistos and LGPN have. One could and should contribute variant names to SNAP, but SNAP prefers the primary name.

RS brought up the annotation of documents with SNAP URIs, as in the Pelagios use case. What is the boundary between a name and attestations of it? When does a reference become an attestation, encoded with the RDF in the cookbook, as opposed to an annotation, as when annotating images of inscriptions?

GB: This comes down to whether data is truly prosopographical in the SNAP sense, as discussed above. Only some contributors have URNs for names. If you have, contribute them; otherwise, SNAP still wants the names. Attestations are links from SNAP to other data sets. Annotations are links for other data sets to SNAP.

ØE: Two-way links could also allow lists from non-prosopographies to be ingested.

GB: The question is whether or not your data is prosopographical, but this has not yet been fully thought through. The intent of your annotations/attestations is central.

ØE: The issue of date (discussed in the first meeting) is now well-defined and not complex: a date is understood as a time period or point that overlaps with the life of the person. It would be good to have an equally simple and clear definition for place.
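The overlap rule for dates can be made concrete. Below is a minimal Python sketch, assuming dates are reduced to integer year ranges (negative numbers for BCE, ignoring the missing year 0) and that a point date is simply a range whose start and end coincide. The `Period` type and the `point` and `overlaps` functions are hypothetical illustrations, not part of the SNAP cookbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Closed range of years; negative values are BCE.

    Hypothetical representation for illustration only: it ignores the
    absence of a year 0 and any uncertainty in the source dates.
    """
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be later than end")


def point(year: int) -> Period:
    """A point date is a period that starts and ends in the same year."""
    return Period(year, year)


def overlaps(date: Period, life: Period) -> bool:
    """True if the assigned date shares at least one year with the life span."""
    return date.start <= life.end and life.start <= date.end


# Invented figures: a person believed to have lived c. 90-40 BCE.
life = Period(-90, -40)
print(overlaps(point(-52), life))      # True
print(overlaps(Period(10, 30), life))  # False
```

Because the test only asks for overlap, a date need not fall wholly inside the life span: a period that starts before birth or ends after death still qualifies, which matches the deliberately loose definition agreed in the meeting.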

GB: This should be linked to importance. He will think about a formulation.

FK: This should be left to the provider. We must keep in mind that the choice of place will be difficult.

GB: One can include more than one place, if that’s the case.

FK: Would that be a good idea?

GB: It will probably not hurt. More than one place means that all of them are significant.

ØE: We should stop now and continue the discussion at the ancient-people list.

5. Any other business (15:50)

None.

6. Summing up (15:55)

ØE asked about the SNAP ontology: should this be discussed on Skype or in another format?

GB: It would be good to discuss it in more detail, but this is dependent on the participation of Faith Lawrence and Hugh Cayless.

ØE: It could either be the topic of the third AB meeting or an additional ad hoc meeting on the topic. The AB will agree on how to proceed via email.

ØE thanked the participants and closed the meeting.

SNAP Persons

SNAP:DRGN uses the LAWD ontology to define persons and other person-like entities in our contributing datasets. A LAWD Person is a CIDOC-CRM E21_Person.

LAWD defines a top-level class “Agent” (lawd:Agent), and four sub-classes:

  • lawd:Person
  • lawd:Deity
  • lawd:MythologicalCreature
  • lawd:Group

When contributing your prosopographical dataset in the SNAP RDF format, you should use whichever of these classes best defines the people in your database. Most of these will presumably be people; you may also have groups, families and corporate bodies distinguished, or you may have deities so defined. If you are unable reliably to distinguish between these types in your data, or if you have types that do not fit under any of these four headings, you should use the super-class lawd:Agent to encode such persons.

(Or get in touch with me and suggest new terms for the LAWD ontology, if you think they are universal enough to be of general use to the community.)
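For contributors generating RDF from a database, the class choice can be a simple lookup with lawd:Agent as the fallback. Below is a minimal Python sketch that writes Turtle by hand. It assumes the LAWD namespace is http://lawd.info/ontology/; the source category labels, the example URI and the use of rdfs:label for the name are hypothetical, and the cookbook's own name properties may differ.

```python
LAWD_NS = "http://lawd.info/ontology/"  # assumed LAWD namespace
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical mapping from a source database's own categories to LAWD classes.
LAWD_CLASSES = {
    "person": "lawd:Person",
    "deity": "lawd:Deity",
    "mythological creature": "lawd:MythologicalCreature",
    "group": "lawd:Group",
    "family": "lawd:Group",
    "corporate body": "lawd:Group",
}


def lawd_class(source_type):
    """Return the best-fitting LAWD class, falling back to the super-class lawd:Agent."""
    if not source_type:
        return "lawd:Agent"
    return LAWD_CLASSES.get(source_type.strip().lower(), "lawd:Agent")


def turtle_literal(text):
    """Escape a string for use as a Turtle double-quoted literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def agent_turtle(uri, source_type, name):
    """Serialise one agent: its LAWD class plus a label (rdfs:label is an assumption)."""
    return (
        f"@prefix lawd: <{LAWD_NS}> .\n"
        f"@prefix rdfs: <{RDFS_NS}> .\n\n"
        f"<{uri}> a {lawd_class(source_type)} ;\n"
        f"    rdfs:label {turtle_literal(name)} .\n"
    )


print(agent_turtle("http://example.org/person/1", "Person", "Thoukydides"))
print(lawd_class("hero"))  # no matching category, so lawd:Agent
```

Keeping the mapping in one table makes it easy to see which of your categories end up under the generic lawd:Agent, and therefore which might be worth proposing as new LAWD terms.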

(SNA)P

Being a conversation between Gabriel Bodard, Yanne Broux and Silke Vanbeselaere about the SNAP:DRGN project and Social Network Analysis

Cross-posted to Data Ninjas: http://spaghetti-os.blogspot.be/

Gabriel Bodard: So, tell me what is Social Network Analysis, and how is it useful for prosopography projects?

Silke Vanbeselaere: Social Network Analysis (SNA) is basically the study of relationships between people through network theory. First used in sociology, it’s now become popular in many other disciplines, with a budding group of enthusiasts in (ancient) history.
What it does is focus on relations (of whatever kind) instead of on the actors individually. Through visualisation of the network graph and the network statistics, information can be obtained about the structure of the network and the roles of the individuals in it.

The visualization of these network graphs can be especially interesting for prosopography projects as it can help disambiguate people. Individuals are represented by nodes and their relationships are represented by ties or links between those nodes. Instead of dealing with one source at a time, the network allows you to see the relationships as a whole.

GB: Can you perhaps illustrate that with an example and how it could help us?

Yanne Broux: Off the top of my head: one of the things the extremely nifty disambiguation methods we developed could help you out with is the identification of high-ranked Roman officials across the different datasets. Consuls were often mentioned in dating formulas, and procurators, proconsuls, legati and the like were pretty mobile, so chances are they appear in texts across the empire. They’ll light up like Christmas trees once we shape them into a network.

What is the SNAP:DRGN project about, Gabby, and what sort of prosopographical data (especially relationships) does it contain?

GB: I’m glad you asked me that. SNAP:DRGN stands for “Standards for Networking Ancient Prosopographies: Data and Relations in Greco-Roman Names” (not an artificial backronym at all!). Very briefly, the aim of the project is to bring together person-records from as many online prosopographies of the ancient world as possible, using linked data to record only the most basic information (person identifiers, names, citations, date, place and hopefully relationships with other persons). We only plan to store this very summary data, along with links back to the richer records in the contributing data source, and enable annotation on top of that.

SV: What will people be able to do with this limited data, then?

GB: In particular, scholars will be able to (1) join together records originally from different databases that clearly refer to the same person; (2) point out relationships between persons, e.g. person XYZ from this database is the daughter of person ABC from that one; and (3) annotate their own texts (archaeological or library records, etc.) to disambiguate a personal name using SNAP as an authority list.

At the moment there are relatively few co-references between the prosopographical datasets, i.e. people who appear in more than one database, in SNAP (although there will be plenty between the library catalogues), and the only explicitly encoded personal relationships are the ones imported from the Trismegistos database, but we’re working to improve both of these things. How does that sound to you?
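Joining records that refer to the same person, the first of the three uses above, amounts to grouping pairwise co-reference assertions into clusters, each of which becomes one merged person. One standard way to do this is a union-find (disjoint-set) structure. The Python sketch below uses invented record identifiers and is not a description of SNAP's actual merge process.

```python
def cluster_coreferences(pairs):
    """Group (record_a, record_b) co-reference pairs into clusters of the same person.

    Uses a union-find (disjoint-set) structure with path halving.
    Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    return sorted(sorted(c) for c in clusters.values())


# Invented identifiers: a chain of two co-references links three records.
pairs = [("lgpn:A", "tm:1"), ("tm:1", "pir:X"), ("lgpn:B", "tm:2")]
for cluster in cluster_coreferences(pairs):
    print(cluster)
# ['lgpn:A', 'pir:X', 'tm:1']
# ['lgpn:B', 'tm:2']
```

Note that co-reference is transitive here: lgpn:A and pir:X end up in the same cluster although nobody asserted that pair directly, which is exactly why a single mistaken co-reference can merge far more records than intended.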

SV: Basically, what we need for SNA is a link between the people and the texts in which they appear. Now, I have no idea how sophisticated these other datasets are, but to avoid confusion/mistakes/whatever kind of apocalyptic disaster, you need unique numerical identifiers, both for your individuals, and for your texts. Trismegistos Texts is now slowly expanding beyond the Egyptian borders, so perhaps we already have some of the texts incorporated in the other datasets, and then it should be pretty easy to link them. But I suspect that for most of the data, new identifiers will have to be created.

GB: Unique identifiers for all persons we have, of course. SNAP mints URIs for all persons we have data for, whether they had dereferenceable URIs in the source datasets or not. In some cases we have identifiers for texts too (TM uses Papyri.info URIs, as you know); in other cases, we’ve had to hope that parsing text strings will be sufficiently unambiguous to be useful. (We’ve identified a few hundred co-references between LGPN and TM using text strings.) We also have a lot of persons from library catalogues (VIAF, the British Museum, Zenon and Trismegistos Authors) among whom co-references ought to be plentiful.

So this seems to be a little circular at the moment, doesn’t it? One of the things SNA might help with is identifying co-references, which in turn will help us build a graph of relationships. But you’re telling me that SNA isn’t really feasible on our data until we have a much better graph of co-references, relationships, and text co-occurrences. Is there anything useful we could do together in the meantime?

YB: Since we are enriching Trismegistos, by adding new texts from around the Mediterranean, by identifying individuals in the Egyptian texts, and by adding extra information such as titles, ethnics and status designations, and at the same time you are enriching SNAP, we are actually feeding into each other symbiotically, like meat ants and leafhoppers that find each other over sugary sap in the Australian outback.

And hey, Silke, what about that “Structural Equivalence” hoodoo you’ve been learning in London, could that be of any help?

SV: Well, it is a very interesting concept that explores the social environment of a person, but that implies that all your data need to be extremely accurate. That means that you first need to identify all the people mentioned in your network. Because without that information, there’s no way you could rightly use your data to explore the structural equivalence of two or more people. As such, I don’t see how it would yet be suitable to use on the data that we would be presented with in SNAP. In the future however…

GB: Are there any improvements you can suggest for the Trismegistos database?

YB: We’re kind of stuck when it comes to titles. You see, we hardly have any. Asking the computer to retrieve them, like we did for the names, proved to be next to impossible, and it’s a hell of a lot of work if you have to go through some 500,000 attestations manually. I’ve already gone through more than 10,000 of them while working on my double names and municipal officials, so I’ve done my share, methinks. Also, it’s not exactly easy to standardize titles, what with all the different languages in Egypt and all. But I guess that if one of the other datasets has a list or something we could look into, that might help us out a bit…

GB: So, in a hopefully not too distant future, when all these relationships are implemented through SNAP:DRGN, how can the participating projects in turn be of service to you and other researchers who would like to use SNA? When SNAP is ready for SNA to be performed on it, what questions will you ask of it?

SV: Well, Gabby, I’m glad you ask. Prosopographies are the ideal datasets for SNA research as those datasets of people have been formed or selected because of some common features (mentioned in the same source, part of an ethnic/social group, contemporaries…). Once the technical infrastructure is in place, it will be relatively straightforward to convert the virtual two-mode networks linking texts and the people appearing in them into the one-mode networks (person – person) needed for actual SNA.
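The two-mode to one-mode conversion mentioned here is a standard bipartite projection: two persons are linked if they appear in the same text. Below is a minimal Python sketch, assuming attestations arrive as (person, text) identifier pairs; weighting each tie by the number of shared texts is one common choice among several, not something the project has specified, and the identifiers are invented.

```python
from collections import defaultdict
from itertools import combinations


def project_to_persons(attestations):
    """Project a two-mode (person-text) network onto a one-mode person-person network.

    attestations: iterable of (person_id, text_id) pairs.
    Returns {(person_a, person_b): number_of_shared_texts}, with person_a < person_b.
    """
    people_by_text = defaultdict(set)
    for person, text in attestations:
        people_by_text[text].add(person)  # a set ignores repeat mentions in one text

    edges = defaultdict(int)
    for people in people_by_text.values():
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return dict(edges)


attestations = [
    ("A", "t1"), ("B", "t1"), ("C", "t1"),
    ("A", "t2"), ("B", "t2"),
    ("C", "t3"),
]
print(project_to_persons(attestations))
# {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
```

The projection is only as good as the person identifiers fed into it: if one individual carries two identifiers, the resulting network splits their ties between two nodes, which is why disambiguation has to come first.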

GB: This all sounds very promising! Thank you so much for sharing these ideas. I look forward to being in a position to do a bit more with all this some day.

Are you a prosopography?

At the SNAP:DRGN project meeting in Edinburgh a few weeks ago, we decided on a couple of definitions that will impact on the ways in which partner datasets interact with the project. Our current thinking is that we need to distinguish between two kinds of data:

(1) The first kind, which we’ll loosely call a “prosopography”, is a curated database of person records, with some ambition to serve as an authority list. Prosopographies such as PIR, Broughton, PBW, etc. would be obvious examples of this category, as would the controlled vocabulary of persons in a library catalogue like VIAF, Zenon, British Museum persons, Trismegistos Authors, the Perseus Catalog, etc. Even if the task of co-referencing persons is incomplete (as with Trismegistos, say), the intention to disambiguate qualifies the dataset as a “prosopography”.

(2) The second, which we call a “list of attestations” is not comprehensively curated or disambiguated in this way, and has no ambition of being an authority list. Examples of this kind of dataset (as I understand them) would include: the EDH person table; the raw list of name references Mark has extracted from Latin inscriptions; the tagged and indexed “names and titles” in the texts of the Inscriptions of Aphrodisias or Inscriptions of Roman Tripolitania.

In the SNAP:DRGN workflow, we hope that all “prosopographies” of type 1 will be contributed into the SNAP graph. We shall assign SNAP URIs to all persons in the datasets, and in time work to co-reference and merge them with persons sourced from other projects as well as possible. These will form the authority file to which other datasets will refer, and we would recommend that lists of “attestations” of type 2 use Pelagios-style OAC annotations (*) to point to the SNAP identifiers as a way of disambiguating their person-references. The process of disambiguating and/or co-referencing persons in this way might eventually lead some lists of attestations to become disambiguated prosopographies in our schema, at which point we would potentially want to include them in the SNAP graph as first class entities.

(*) We hope to have the SNAP:DRGN guidelines for these Pelagios-like annotations (“Scenario 5” in our Cookbook) available very shortly.
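Until those guidelines appear, the general shape of a Pelagios-style annotation can be sketched: an Open Annotation whose target is the person-reference in your own dataset and whose body is the SNAP person URI. The Python below writes such an annotation in Turtle. The oa: namespace is the W3C Open Annotation one, but all the URIs (including the SNAP URI pattern) are assumptions for illustration, and the eventual Scenario 5 guidelines may well require further properties.

```python
OA_NS = "http://www.w3.org/ns/oa#"

# Characters that may not appear unescaped inside a Turtle IRI reference.
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def iri(value):
    """Wrap a URI in angle brackets, rejecting characters Turtle does not allow."""
    if any(ch in _FORBIDDEN_IRI_CHARS or ord(ch) <= 0x20 for ch in value):
        raise ValueError(f"not a valid Turtle IRI: {value!r}")
    return f"<{value}>"


def person_annotation(annotation_uri, target_uri, snap_person_uri):
    """Annotate a person-reference (target) with the SNAP identifier it denotes (body)."""
    return (
        f"@prefix oa: <{OA_NS}> .\n\n"
        f"{iri(annotation_uri)} a oa:Annotation ;\n"
        f"    oa:hasTarget {iri(target_uri)} ;\n"
        f"    oa:hasBody {iri(snap_person_uri)} .\n"
    )


print(person_annotation(
    "http://example.org/annotations/1",           # hypothetical annotation URI
    "http://example.org/inscriptions/42#name-3",  # hypothetical person-reference in your text
    "http://data.snapdrgn.net/person/9024",       # assumed SNAP URI pattern
))
```

The direction matters: the annotation lives in, and is published by, the list of attestations, and points from it to SNAP, so the SNAP graph itself does not need to change when a new list is annotated.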