SNAP and NER for Latin inscriptions

Prosopographies have in the past often taken decades or even centuries to produce. Even for a period with relatively few sources such as antiquity, hundreds of thousands of texts had to be collected and read, personal names had to be copied onto index cards, people had to be identified across sources, their relations then had to be examined and their lives had to be reconstructed.

You Aren’t Gonna Need It

We've been discussing lately how to merge person records in SNAP, so that when we encounter partner projects that each have a record for the same person, SNAP can provide a useful service by combining those into single, merged records, and we can start to get an idea of the requirements for performing operations like merges on our data. This discussion has proved something of a rabbit hole.

The Old Classes vs Properties Debate (or Relationships Are Hard, Part 2)

One of the decisions that has to be made when creating an ontology is which concepts you encode as classes and which you encode as properties of those classes. One of the difficulties is that there is no overarching 'right answer' (although there are wrong ones) to how you should model your domain; it has to be decided on a case-by-case basis, according to what works best for the world view that you are trying to encapsulate within your model. This post is a request for feedback to help us decide which model works best for both the project and the wider community.


Why SNAP IDs?

One question that came up during the workshop a couple of weeks ago was: if partner projects already assign their own URIs/ids to their person/name/etc. records, then why should SNAP assign its own identifiers? There are two answers to that, one very practical, and the other a bit more philosophical.

  1. SNAP IDs will be URIs themselves, and when dereferenced in a browser, or by an application, will return a result: either a web page listing what SNAP knows about the record in question, or RDF data about it. We can’t do this in a practical way without assigning our own identifiers.
  2. On a more theoretical level, we think that any updates made to data post-ingest shouldn’t be made directly on our partners’ data. We believe, for example, that while SNAP might assert an identity between two person records coming from two partner datasets, it will be up to the partners whether they accept that identification.
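
The two answers to the question of why SNAP assigns its own identifiers can be illustrated with a minimal sketch. It is not SNAP's actual implementation: the `example.org` URIs, the `SnapRegistry` and `PartnerRecord` names, and the in-memory storage are all assumptions made for illustration. It shows SNAP minting its own URI for each ingested record, and recording an identity assertion alongside the partner data rather than inside it.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical base URI; the real SNAP URI scheme is not specified here.
SNAP_BASE = "http://example.org/snap/person/"


@dataclass(frozen=True)
class PartnerRecord:
    """A person record as supplied by a partner; frozen so it is never edited."""
    uri: str
    name: str


@dataclass
class SnapRegistry:
    """Illustrative in-memory store of SNAP identifiers and identity assertions."""
    _next_id: count = field(default_factory=lambda: count(1))
    records: dict = field(default_factory=dict)   # SNAP URI -> PartnerRecord
    same_as: list = field(default_factory=list)   # pairs of SNAP URIs

    def ingest(self, record: PartnerRecord) -> str:
        """Mint a SNAP URI for a partner record and return it."""
        snap_uri = f"{SNAP_BASE}{next(self._next_id)}"
        self.records[snap_uri] = record
        return snap_uri

    def assert_identity(self, a: str, b: str) -> None:
        """Record SNAP's claim that two records describe the same person.

        The claim lives in SNAP only; the partner records are untouched.
        """
        if a not in self.records or b not in self.records:
            raise KeyError("both identifiers must be known SNAP URIs")
        self.same_as.append((a, b))

    def describe(self, snap_uri: str) -> dict:
        """Return what SNAP knows about one identifier (a stand-in for dereferencing)."""
        record = self.records[snap_uri]
        linked = [y if x == snap_uri else x
                  for x, y in self.same_as if snap_uri in (x, y)]
        return {"id": snap_uri, "source": record.uri,
                "name": record.name, "asserted_same_as": linked}


registry = SnapRegistry()
a = registry.ingest(PartnerRecord("http://example.org/partner-a/p/17",
                                  "Marcus Tullius Cicero"))
b = registry.ingest(PartnerRecord("http://example.org/partner-b/person/cicero",
                                  "M. Tullius Cicero"))
registry.assert_identity(a, b)
description = registry.describe(a)
```

Because the assertion is stored against SNAP's own identifiers, a partner that disagrees with the identification can simply ignore it; nothing in its own record has changed.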

SNAP at Digital Humanities 2014

The SNAP Project is proud to announce the Ontologies for Prosopography: Who's Who? or, Who was Who? one-day workshop, developed in conjunction with the People of the Founding Era project based at the University of Virginia. The workshop will give SNAP the opportunity to present our data model to a wider audience and to engage with researchers working on similar problems in other periods and geographic areas.

Fluffy dates and associated places

One of the conversations that it was really useful to hash out in person, with so many experts and interested parties present at the workshop a couple of weeks ago, was the question of how the SNAP:DRGN Cookbook should recommend that contributing person-datasets represent date information.

It has been our working assumption that the minimalist information SNAP is ingesting would optionally include a single, undifferentiated, very crudely recorded date associated with a person. (By the same token, any place information associated with a person would be given only in very blunt form, inasmuch as it serves almost as an extra name, epithet or identifier for a person. Further, more granular place association, à la Pelagios, might be included in the original prosopography, and/or in the exposed RDF serialization of said dataset, but SNAP will only expect and take advantage of associated place in the most abstract form.) The argument may be at its clearest with respect to dating, however, partly because there are so many strong arguments for including more granular and semantic date information in a prosopographic dataset.

Workshop slides and recap

At last week's SNAP workshop at King's College London, we had a very successful and enjoyable two-day meeting, introducing the principles of and the preliminary work done by the SNAP:DRGN project in its first three months, and hearing from several potential project partners about their datasets, practices and reactions to our proposals. It was an extremely productive and positive affair, even when discussions sometimes became vigorous! I don't mean to summarize all of the discussions and outcomes here (a series of blog posts by my colleagues over the next couple of weeks will do more of that), but I will share what I can of the presentations and slideshows that were shown at the workshop.

Procedures to identify co-references in contributing datasets

During the SNAP workshop of March 30 and April 1 we had a breakout session on the problem of identifying overlaps between the contributing datasets, one of SNAP's challenges. Given how diverse the information provided by each will be, this is not a given. What follows is my idiosyncratic survey of what was discussed at the session, with thanks to the participants of course.

Pelagios and SNAP

SNAP is part of an ecology of interlocking Linked Open Data initiatives, sometimes referred to as the Graph of Ancient World Data (GAWD). A forthcoming special issue of ISAW Papers gives a helpful overview of the breadth of this community and its interests. One of the strengths of this approach is that each project contributes to and benefits from activities elsewhere, and thus the whole becomes greater than the sum of its parts. One of these projects, Pelagios, plays a complementary role to SNAP by connecting online documents to places, rather than people. Pelagios has now been running since February 2011 – this blog post provides a little background to it and perhaps even indicates possible future directions of travel for SNAP.