State of the Snap-Nation

With the end of the pilot project scarily in sight, it is time to review where we are and where we hope to be by the end of December.

The big news is that (hopefully) the first set of SNAP identifiers are now frozen!

What this means is that the first 5 datasets have now been ingested, SNAP identifiers have been linked to each of the persons, and those identifiers are fixed. There may still be a few tweaks to the RDF descriptive data coming in from the projects, but the identifiers will remain the same.

We had been experimenting with starting the identifiers at 100001 to add a little bit of consistency in number length, but after doing that for a while during the testing stage we decided that it wasn’t worth doing and that we would just keep things simple and start at 1.

Currently the following datasets have been ingested:

Project                       SNAP Identifier Range
PIR                           1 – 10924
TM                            10925 – 367917
LGPN                          367918 – 671019
British Museum (Selection)    671020 – 671972
VIAF (Selection)              671973 – 673753
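
Because the ranges are contiguous and now frozen, it is possible to work out which founding dataset a SNAP identifier came from just by looking at the number. The following is only a rough sketch of that idea in Python: the ranges are copied from the table above, while the names RANGES and source_dataset are made up for illustration and are not part of the SNAP codebase.

```python
from bisect import bisect_right
from typing import Optional

# (first identifier, last identifier, project), copied from the table above.
RANGES = [
    (1, 10924, "PIR"),
    (10925, 367917, "TM"),
    (367918, 671019, "LGPN"),
    (671020, 671972, "British Museum (Selection)"),
    (671973, 673753, "VIAF (Selection)"),
]

# Sorted list of range starts, used for the binary search below.
_STARTS = [start for start, _, _ in RANGES]


def source_dataset(snap_id: int) -> Optional[str]:
    """Return the project whose range contains snap_id, or None.

    Hypothetical helper: it only knows about the five ranges listed
    above, so anything ingested later would come back as None.
    """
    # Index of the last range whose start is <= snap_id.
    i = bisect_right(_STARTS, snap_id) - 1
    if i < 0:
        return None
    _, end, project = RANGES[i]
    return project if snap_id <= end else None


if __name__ == "__main__":
    for example in (1, 10925, 671020, 673754):
        print(example, source_dataset(example))
```

A plain loop over five ranges would do just as well; the binary search is only there in case the list grows.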

One big change over the summer was the move from Sesame to 4Store. We started off using Sesame because that was the standard triplestore for projects at DDH in King’s and therefore the easiest to set up and support on our servers. Whether we did something wrong in the setup, or whether it was the tens of millions of triples more than we usually deal with, Sesame didn’t prove to be robust enough to deal with even the founding datasets in a timely enough fashion. After some wrangling (next time we will get the VM upgraded to a version that has a ready-built package for it) we installed 4Store, and so far it has stood up to everything that we have thrown at it without any noticeable loss in performance. While not wanting to go as far as saying it bodes well, it definitely doesn’t fill me with a nameless dread at the thought of more people than me accessing the data.

Moving on from the data itself to displaying it in an easy-to-read form: the person website itself is solidifying slowly (even if it isn’t the fanciest cupcake on the shelf). For this we must thank Davide Bellini, who is interning with us. Although he was supposed to be working on another project, we lured him away with promises (or possibly threats) of learning Python and Django, and the opportunity of looking into the abyss which is SPARQL. Having successfully made his first mark on the Person Profile pages, he is now working on the script to somewhat automate the record merging procedure and continuing to upgrade the profile page displays. Between the two of us we hope to have the Person Profile page filled out and the first merged records ingested by Christmas (I will leave the question of which Christmas to the reader’s imagination).

For those who want to see the ‘in-progress’ person pages, they can be accessed at http://data.snapdrgn.net/person/<number> e.g. http://data.snapdrgn.net/person/1
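
For anyone who would rather build these links in a script than type them by hand, a minimal sketch in Python might look like the one below. It assumes nothing beyond the URL pattern given above and the fact that identifiers are plain integers starting at 1 with no padding; the name person_url is purely illustrative.

```python
# Base of the in-progress person pages, as given in the post.
PERSON_BASE = "http://data.snapdrgn.net/person/"


def person_url(snap_id: int) -> str:
    """Build the person page URL for a SNAP identifier.

    Hypothetical helper: identifiers start at 1 and are not
    zero-padded, so anything below 1 is rejected.
    """
    if snap_id < 1:
        raise ValueError("SNAP identifiers start at 1")
    return f"{PERSON_BASE}{snap_id}"


if __name__ == "__main__":
    print(person_url(1))
```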
