Lately we’ve been discussing how to merge person records in SNAP. When partner projects each have a record for the same person, SNAP can provide a useful service by combining those into a single merged record, and working through a merge lets us start to get an idea of the requirements for performing operations like this on our data. This discussion has proved something of a rabbit hole.
In any digital project there is always a temptation to plan for and build things that you think you may need later, or that might be nice to have, or that might help address questions that you don’t want to answer now, but might in the future. This temptation is almost always to be fought against. This is hard. We love to think about how things might work, and what people might want to do, but in my experience it’s always better to push towards the ruthlessly practical side of things. It is vastly easier to write software and to build data models when you have real requirements rather than speculative ones. Moreover, those speculative requirements frequently turn out to be different when they turn into real requirements. They may disappear on closer examination, or be vastly more complex, or otherwise metamorphose. If you had written code to address these pseudo-requirements, it would have been a waste of time. Avoiding this kind of trap is a principle in software engineering called YAGNI, “You Aren’t Gonna Need It” (see, e.g., http://c2.com/cgi/wiki?YouArentGonnaNeedIt). The pressure is even more acute (and more of a risk) in many DH projects, which are both research-oriented and often constrained in terms of resources.
SNAP has at least one of these speculative requirements. We know that in the future, we’ll want to allow people to make a variety of assertions about SNAP datasets. For example, we’ll want to support asserting that two “people” from different databases are in fact the same person, or that what is represented as a single person in a data source is actually two people, or that what a partner database has interpreted as a person actually isn’t (maybe a subsequent edition of the source document has determined that what was thought to be a name isn’t).
So how should we model these cases? We shouldn’t. Not until we’ve had the time to properly sort out all of the requirements for having SNAP users make these kinds of assertions against our data. We have ideas about how this might work, but we don’t have enough information on the parameters yet, and this functionality isn’t in scope for the current SNAP grant. All we want to be able to do right now is try out merging a few person records where our partner datasets have overlaps. Therefore, we aren’t going to model assertions about SNAP entities at all, just one of their outcomes: what a merged person looks like. The requirements for this are pretty straightforward: we need to know where the new person resource comes from, who is responsible for it, and why the merge was performed.
So let’s start with two partner records (these are real):
<http://www.trismegistos.org/person/14218#this> a lawd:Person ;
    dc:publisher <http://www.trismegistos.org> ;
    lawd:hasName <http://www.trismegistos.org/name/6284#this> ;
    lawd:hasAttestation <http://www.trismegistos.org/ref/30996#person> .

<http://www.lgpn.ox.ac.uk/id/V2-60610> a lawd:Person ;
    dc:publisher <http://www.lgpn.ox.ac.uk> ;
    lawd:hasAttestation <http://www.lgpn.ox.ac.uk/id/V2-60610/personref/1>,
        <http://www.lgpn.ox.ac.uk/id/V2-60610/personref/2>,
        <http://www.lgpn.ox.ac.uk/id/V2-60610/personref/3> ;
    lawd:hasName <http://www.lgpn.ox.ac.uk/nym/nTi1marcos> ;
    foaf:name "Timarcos"@grc-Latn .
We can tell these are the same person because they both cite IG II² 3455 and 3777. When these are ingested into SNAP, they’ll get SNAP IDs (these are imaginary):
<http://data.snapdrgn.net/people/1234> a lawd:Person ; prov:wasDerivedFrom <http://www.trismegistos.org/person/14218#this> .
and
<http://data.snapdrgn.net/people/1235> a lawd:Person ; prov:wasDerivedFrom <http://www.lgpn.ox.ac.uk/id/V2-60610> .
To merge them, we’ll just create a new person:
<http://data.snapdrgn.net/people/1236> a lawd:Person, snap:MergedResource ;
    dc:publisher <http://snapdrgn.net> ;
    dc:replaces <http://data.snapdrgn.net/people/1234>,
        <http://data.snapdrgn.net/people/1235> ;
    snap:reason <http://data.snapdrgn.net/people/1236#reason1> .

<http://data.snapdrgn.net/people/1236#reason1> a cnt:ContentAsText ;
    cnt:chars "Merged because both replaced persons cite the same texts, IG II(2) 3455 and 3777." .
And with that, we have the who, what, and why, but we haven’t had to make any guesses about how SNAP might work in the future. We can merge person records without having had to plan out a whole new infrastructure.
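To show how little machinery the merged record above needs, here is a minimal Python sketch that emits the same shape of Turtle for any set of replaced persons. The function name merged_person_turtle and its parameters are hypothetical, invented for illustration, and not part of any SNAP code. It assumes the lawd:, snap:, dc: and cnt: prefixes are declared elsewhere, as in the examples above, and it follows the #reason1 fragment convention from the example rather than any settled rule.

```python
def merged_person_turtle(new_uri, replaced_uris, reason_text,
                         publisher="http://snapdrgn.net"):
    """Return Turtle for a merged person that replaces the given persons.

    Hypothetical helper: it mirrors the hand-written example in this post
    and assumes the lawd:, snap:, dc: and cnt: prefixes are declared
    elsewhere in the document the output is pasted into.
    """
    if len(replaced_uris) < 2:
        raise ValueError("a merge needs at least two persons to replace")

    # Follow the example's convention of hanging the reason off the new URI.
    reason_uri = new_uri + "#reason1"
    replaced = ", ".join("<%s>" % uri for uri in replaced_uris)

    # Escape backslashes, double quotes and newlines so the reason stays
    # a valid single-line Turtle string literal.
    escaped = (reason_text.replace("\\", "\\\\")
                          .replace('"', '\\"')
                          .replace("\n", "\\n"))

    lines = [
        "<%s> a lawd:Person, snap:MergedResource ;" % new_uri,
        "    dc:publisher <%s> ;" % publisher,
        "    dc:replaces %s ;" % replaced,
        "    snap:reason <%s> ." % reason_uri,
        "",
        "<%s> a cnt:ContentAsText ;" % reason_uri,
        '    cnt:chars "%s" .' % escaped,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(merged_person_turtle(
        "http://data.snapdrgn.net/people/1236",
        ["http://data.snapdrgn.net/people/1234",
         "http://data.snapdrgn.net/people/1235"],
        "Merged because both replaced persons cite the same texts, "
        "IG II(2) 3455 and 3777.",
    ))
```

Note what the sketch leaves out: it says nothing about who may assert a merge, how merges are disputed, or how they are undone. Those are exactly the speculative requirements we are declining to model for now.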