Marking up DOI

thammond – 2007 September 11

In XMP

(Update - 2007.09.15: Clean forgot to add in the rdf: namespace to the examples for xmp:Identifier in this post. I’ve now added in that namespace to the markup fragments listed. Also added in a comment here which shows the example in RDF/XML for those who may prefer that over RDF/N3.)

So, as a preliminary to reviewing how a fuller metadata description of a Crossref resource may best be fitted into an XMP packet for embedding into a PDF, let’s just consider how a DOI can be embedded into XMP. And since it’s so much clearer to read let’s just conduct this analysis using RDF/N3. (Life is too short to be spent reading RDF/XML or C++ code. :~)

(And further to Chris Shillum’s comment on my earlier post Metadata in PDF: 2. Use Cases, where he notes that Elsevier are looking to upgrade their markup of DOI in PDF to use XMP, I’m really hoping that Elsevier may have something to bring to the party and share with us. A consensus rendering of DOI within XMP is going to be of benefit to all.)

Within an XMP packet our first idea might be to include the DOI using the Dublin Core (DC) schema element dc:identifier in minimalist fashion:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<>   dc:identifier "10.1038/nrg2158" .


This simply says that the current document (denoted by the empty URI “<>”) has the property dc:identifier, taken from the dc (or Dublin Core) schema identified by the URI http://purl.org/dc/elements/1.1/, and that the value of that property is the string “10.1038/nrg2158”.

Now, since this is just a DOI and the wider public cannot be expected to know about DOIs, it would surely be better to present the DOI in URI form (doi:) as

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<>   dc:identifier "doi:10.1038/nrg2158" .


or, using a registered URI form (info:) as

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<>   dc:identifier "info:doi/10.1038/nrg2158" .


Aside: This shows up a limitation of XMP where the DC schema property value for dc:identifier is fixed as type Text. The natural way to express the above in RDF/N3 would be as:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<>   dc:identifier <info:doi/10.1038/nrg2158> .


which says that the value is a URI (type URI in XMP terms), not a string (type Text in XMP terms). We either have to flout the XMP specification or else live with this restriction. We’ll opt for the latter for now.
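
The three string renderings used so far can be derived mechanically from the bare DOI. Here is a minimal Python sketch of that; the function name doi_forms and the dictionary keys are made up for illustration, and it assumes the DOI contains no characters that would need escaping in a URI.

```python
def doi_forms(doi: str) -> dict[str, str]:
    """Return the three string renderings of a bare DOI discussed above."""
    return {
        "bare": doi,                    # native DOI, no URI garb
        "doi_uri": f"doi:{doi}",        # unregistered doi: form
        "info_uri": f"info:doi/{doi}",  # registered info: form
    }


if __name__ == "__main__":
    # Print each form as an N3 statement with a plain string (Text) value
    for value in doi_forms("10.1038/nrg2158").values():
        print(f'<>   dc:identifier "{value}" .')
```

Each value is written out as a quoted string rather than as a URI in angle brackets, in keeping with the Text restriction noted above.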

But, the XMP Spec deprecates the use of dc:identifier since the context is not specific. (Note that that’s what was just discussed above. The limitation is built into XMP which builds on RDF but does not fully endorse the RDF world view.) Instead the XMP Spec recommends using xmp:Identifier since the context can be set using a qualified property as:

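@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "DOI";
            rdf:value "10.1038/nrg2158" ] ] .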


This says the string “10.1038/nrg2158” belongs to the scheme “DOI”.
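
For those who would rather see how such a qualified bag might be written out as RDF/XML by a program, here is a rough Python sketch using only the standard library. The function name identifier_bag is made up for illustration, and the xmp: and xmpidq: namespace URIs are the ones given in the Adobe XMP specification. It builds only the rdf:RDF element, not a complete XMP packet with its wrapper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP = "http://ns.adobe.com/xap/1.0/"
XMPIDQ = "http://ns.adobe.com/xmp/Identifier/qual/1.0/"

# Ask ElementTree to use the conventional prefixes when serializing
for prefix, uri in (("rdf", RDF), ("xmp", XMP), ("xmpidq", XMPIDQ)):
    ET.register_namespace(prefix, uri)


def identifier_bag(entries):
    """Build an rdf:RDF element holding an xmp:Identifier bag.

    entries is an iterable of (scheme, value) pairs, e.g. ("DOI", "10.1038/nrg2158").
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    ident = ET.SubElement(desc, f"{{{XMP}}}Identifier")
    bag = ET.SubElement(ident, f"{{{RDF}}}Bag")
    for scheme, value in entries:
        # Each bag member is a structure carrying rdf:value plus its qualifier
        li = ET.SubElement(bag, f"{{{RDF}}}li", {f"{{{RDF}}}parseType": "Resource"})
        ET.SubElement(li, f"{{{RDF}}}value").text = value
        ET.SubElement(li, f"{{{XMPIDQ}}}Scheme").text = scheme
    return root


if __name__ == "__main__":
    root = identifier_bag([("DOI", "10.1038/nrg2158")])
    ET.indent(root)  # pretty-printing; needs Python 3.9 or later
    print(ET.tostring(root, encoding="unicode"))
```

Passing further (scheme, value) pairs adds further rdf:li members to the same bag.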

Here we have used the scheme “DOI” and, as noted above, for wider recognition it would be better to employ one of the URI forms, e.g.

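@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "URI";
            rdf:value "doi:10.1038/nrg2158" ] ] .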


This says the string “doi:10.1038/nrg2158” belongs to the scheme “URI”.

But this is the unregistered URI form (doi:), so should we instead be using the registered form (info:)? Well, it turns out that this construct for xmp:Identifier is an rdf:Bag, so we can include more than one term. How about using this construct then:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "URI";
            rdf:value "info:doi/10.1038/nrg2158" ];
        rdf:_2  [
            xmpidq:Scheme "URI";
            rdf:value "doi:10.1038/nrg2158" ] ] .


Now we’ve got both forms, which is fair enough since these are equivalent. In RDF terms we can make the statement that:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
<doi:10.1038/nrg2158> owl:sameAs <info:doi/10.1038/nrg2158> .

which asserts that the two URIs are equivalent and that they reference the same resource.
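
A consumer reading these values back out of a packet might want to check that equivalence for itself. Here is a small Python sketch, with the hypothetical helpers bare_doi and same_doi, which strips either URI prefix and compares what is left. DOI names are case-insensitive, hence the case folding. The sketch does not attempt to undo any percent-escaping.

```python
def bare_doi(identifier: str) -> str:
    """Strip a leading info:doi/ or doi: prefix, returning the bare DOI."""
    for prefix in ("info:doi/", "doi:"):
        if identifier.lower().startswith(prefix):
            return identifier[len(prefix):]
    return identifier


def same_doi(a: str, b: str) -> bool:
    """True if both identifiers name the same DOI, whichever form they use."""
    # DOI names are case-insensitive, so compare case-folded bare forms
    return bare_doi(a).casefold() == bare_doi(b).casefold()


if __name__ == "__main__":
    print(same_doi("doi:10.1038/nrg2158", "info:doi/10.1038/nrg2158"))
```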

So, what if we want to include a native DOI without the URI garb? We can easily do that:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "URI";
            rdf:value "info:doi/10.1038/nrg2158" ];
        rdf:_2  [
            xmpidq:Scheme "URI";
            rdf:value "doi:10.1038/nrg2158" ];
        rdf:_3  [
            xmpidq:Scheme "DOI";
            rdf:value "10.1038/nrg2158" ] ] .


OK, that takes care of the XMP direction to use xmp:Identifier. But although XMP deprecates dc:identifier, back in the real world folks will be looking at the DC elements, which is the schema with the greatest purchase. So why not also add in a dc:identifier element, such as would typically be used for DOI in citations? How about this:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "URI";
            rdf:value "info:doi/10.1038/nrg2158" ];
        rdf:_2  [
            xmpidq:Scheme "URI";
            rdf:value "doi:10.1038/nrg2158" ];
        rdf:_3  [
            xmpidq:Scheme "DOI";
            rdf:value "10.1038/nrg2158" ] ];
     dc:identifier "doi:10.1038/nrg2158" .


Right, so we’ve taken care of the identifiers. But maybe there’s something missing? There’s no link to the DOI proxy. For widest applicability we should not assume prior knowledge of the DOI system. Perhaps we could include this link using the property dc:relation? It seems feasible, though I would really like to get some feedback on this. Any ideas?
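
As a rough illustration of how such a link could be generated from the bare DOI, here is a Python sketch. The function name proxy_url is hypothetical, and the default resolver address assumes the public DOI proxy at http://dx.doi.org/. Characters that are unsafe in a URL path are percent-encoded, while the slash between prefix and suffix is left alone.

```python
from urllib.parse import quote


def proxy_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Return a resolvable link for a bare DOI via the DOI proxy."""
    # Keep "/" unescaped; percent-encode anything unsafe in a URL path
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    print(proxy_url("10.1038/nrg2158"))
```

The resulting string is what would go into the dc:relation value.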

So here, then, is a fairly full and complete expression of DOI within the XMP packet.

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xmp: <http://ns.adobe.com/xap/1.0/> .
@prefix xmpidq: <http://ns.adobe.com/xmp/Identifier/qual/1.0/> .
<>   xmp:Identifier  [
        a rdf:Bag;
        rdf:_1  [
            xmpidq:Scheme "URI";
            rdf:value "info:doi/10.1038/nrg2158" ];
        rdf:_2  [
            xmpidq:Scheme "URI";
            rdf:value "doi:10.1038/nrg2158" ];
        rdf:_3  [
            xmpidq:Scheme "DOI";
            rdf:value "10.1038/nrg2158" ] ];
     dc:identifier "doi:10.1038/nrg2158";
     dc:relation "http://dx.doi.org/10.1038/nrg2158" .


Ta-da!

(Of course, this is all premised on having freedom in writing out the XMP packet. If one is dependent on commercial applications to write out the packet then things may be different. Actually, they will be very different. They may not even be workable.)

Feedback would be very welcome.