Metadata Plus snapshots
Metadata Plus snapshots provide access to our 160,104,382 metadata records in a single file, an easy way to retrieve an up-to-date copy of our records. The files are made available via a /snapshots route in the REST API, which offers a compressed tar file (.tar.gz) containing the full extract of the metadata corpus in either JSON or XML format.
How to access snapshots
New snapshots are created each month, available by the 5th day, providing all records up to and including the previous month.
If you’re looking for the most up-to-date snapshot (all records up to and including the previous month), use the following URLs, which always alias to the current month’s snapshot:
- JSON output: https://0-api-crossref-org.lib.rivier.edu/snapshots/monthly/latest/all.json.tar.gz
- XML output: https://0-api-crossref-org.lib.rivier.edu/snapshots/monthly/latest/all.xml.tar.gz
To check whether a particular snapshot is available, send an HTTPS HEAD request using the following URL patterns:
- JSON output: https://0-api-crossref-org.lib.rivier.edu/snapshots/monthly/{YYYY/MM}/all.json.tar.gz
- XML output: https://0-api-crossref-org.lib.rivier.edu/snapshots/monthly/{YYYY/MM}/all.xml.tar.gz
Please note that XML snapshots are available in UNIXSD format only.
As snapshots are available to Metadata Plus users only, you will need to identify yourself in the request by using a “Crossref-Plus-API-Token” HTTPS header with your access token. The example below shows how this should be formatted, with XXX replaced by your token:
Crossref-Plus-API-Token: Bearer XXX
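The availability check and token header described above can be sketched in Python as follows. This is a minimal sketch rather than an official client: the function names (snapshot_url, head_request, snapshot_available) are hypothetical, the base URL is copied from the patterns above, and the zero-padded month is an assumption based on the {YYYY/MM} placeholder. A 404 response is treated here as "not available"; any other error is re-raised.

```python
import urllib.error
import urllib.request

# Base URL copied from the URL patterns above; adjust it if you reach the API via another host.
BASE_URL = "https://0-api-crossref-org.lib.rivier.edu/snapshots/monthly"


def snapshot_url(year: int, month: int, fmt: str = "json") -> str:
    """Build the URL of one month's snapshot; fmt is "json" or "xml"."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Assumption: {YYYY/MM} means a four-digit year and a two-digit month.
    return f"{BASE_URL}/{year:04d}/{month:02d}/all.{fmt}.tar.gz"


def head_request(url: str, token: str) -> urllib.request.Request:
    """Prepare a HEAD request that carries the Metadata Plus token header."""
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )


def snapshot_available(year: int, month: int, token: str, fmt: str = "json") -> bool:
    """Return True if the snapshot exists, False on a 404 response."""
    request = head_request(snapshot_url(year, month, fmt), token)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # e.g. 401 when the token is not accepted
```

For example, snapshot_available(2018, 1, "XXX") would send a HEAD request for the January 2018 JSON snapshot, with XXX replaced by your token.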
The files are very large (>42 GB), so they may take a while to download depending on the speed of your internet connection.
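Given the file size, it may help to stream the archive to disk in chunks rather than read it into memory. The sketch below shows one way to do that with the Python standard library. The names copy_stream and download_snapshot and the 1 MiB chunk size are illustrative choices, not part of the API, and the sketch does not attempt to resume interrupted downloads.

```python
import urllib.request

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def copy_stream(source, target, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy one binary file-like object to another in chunks; return bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        target.write(chunk)
        total += len(chunk)
    return total


def download_snapshot(url: str, token: str, dest_path: str) -> int:
    """Stream a snapshot to dest_path, sending the Metadata Plus token header."""
    request = urllib.request.Request(
        url,
        headers={"Crossref-Plus-API-Token": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        return copy_stream(response, out)
```

Only one chunk is held in memory at a time, so memory use stays flat regardless of the archive size.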
Please contact us if you’re unable to access snapshots.
Keeping your data current
For applications where you want to keep a copy of our metadata records current, use OAI-PMH Plus (as described above) or the REST API to query for new records at your preferred interval.
Snapshots FAQs
Are snapshots for ‘all time’ available?
Snapshots are available for the current and previous quarters. With each new snapshot, we may remove files older than the current and previous quarters. For example, on 1 April the files from the previous October, November, and December may be removed.
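To make the retention window concrete, the sketch below lists the months that fall within the current and previous quarters for a given date. It assumes calendar quarters (January to March, April to June, and so on) and that a snapshot is identified by the year and month in its URL; both are readings of the policy above, not confirmed behaviour. The function names are hypothetical.

```python
from datetime import date


def quarter_start_month(month: int) -> int:
    """Return the first month (1, 4, 7 or 10) of the calendar quarter containing month."""
    return 3 * ((month - 1) // 3) + 1


def retained_months(today: date) -> list:
    """List (year, month) pairs from the start of the previous quarter up to today's month."""
    start_month = quarter_start_month(today.month)
    if start_month > 3:
        year, month = today.year, start_month - 3  # previous quarter, same year
    else:
        year, month = today.year - 1, 10  # previous quarter is Oct-Dec of last year
    months = []
    while (year, month) <= (today.year, today.month):
        months.append((year, month))
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return months
```

On 1 April this yields January through April of the same year, which matches the example above: the previous October, November, and December fall outside the window.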
I’m seeing a 404 error when I request the URL
If you’re looking for the current month, this may be because the archive hasn’t yet been created for that month. Snapshots are usually available by the 5th of each month.
If you’re looking for a month that’s more than 6 months old, it may be that the snapshot has been deleted. If the archive you’re looking for isn’t particularly new or old and you’re still seeing a 404 error, please contact us.
I’m seeing a 401 error when I request the URL
Snapshots are only available to Metadata Plus users. This 401 message means that the system doesn’t recognise you as a Metadata Plus user. If you’re already a Metadata Plus user, make sure you’re using your correct token in the header of your query. If you’re still having problems, please contact us.
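The 404 and 401 guidance above can be summarised as a small helper that turns a response status into a likely explanation. This is only a sketch of the rules of thumb in this FAQ: the function name diagnose and the message strings are hypothetical, and the month arithmetic ignores the day of the month.

```python
from datetime import date

AVAILABLE = "snapshot available"
NOT_YET_CREATED = "snapshot may not have been created yet"
POSSIBLY_DELETED = "snapshot may have been deleted"
CHECK_TOKEN = "token missing or not recognised as a Metadata Plus token"
CONTACT_SUPPORT = "unexpected response; contact support"


def diagnose(status: int, year: int, month: int, today: date) -> str:
    """Map an HTTP status for the snapshot of (year, month) to a likely cause."""
    if status == 200:
        return AVAILABLE
    if status == 401:
        return CHECK_TOKEN
    if status == 404:
        # Age of the requested snapshot in whole months.
        age = (today.year - year) * 12 + (today.month - month)
        if age <= 0:
            return NOT_YET_CREATED  # current (or future) month
        if age > 6:
            return POSSIBLY_DELETED  # older than the 6 months mentioned above
    return CONTACT_SUPPORT
```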
I need a full snapshot mid-month
Snapshot archives are provided at the start of each month. The archive contains all the registered content received by Crossref up until that time. (Really? Yeah, all of it.) If you need a snapshot mid-month, you should download and ingest the latest archive and then harvest and ingest the registered content that has changed since then.
To get the registered content that has changed since an archive was created, use OAI-PMH Plus or the REST API. For example, if the archive was created on January 31, 2018, then the initial URL for the OAI-PMH Plus harvest is:
https://0-oai-crossref-org.lib.rivier.edu/oai?verb=ListRecords&set=J&from=2018-01-31&metadataPrefix=cr_unixsd
This will harvest journal data. If you are interested in book data, use the “B” set:
https://0-oai-crossref-org.lib.rivier.edu/oai?verb=ListRecords&set=B&from=2018-01-31&metadataPrefix=cr_unixsd
If you are interested in series data, use the “S” set:
https://0-oai-crossref-org.lib.rivier.edu/oai?verb=ListRecords&set=S&from=2018-01-31&metadataPrefix=cr_unixsd
It is important to use the created date and not the completed date. It takes time to build the archive, so changes will occur during the build. Using the created date ensures those changes are harvested too.
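The three harvest URLs above differ only in the set parameter, so they can be generated from the archive's created date. The following is a minimal sketch: the endpoint and query parameters are copied from the example URLs, while the function name harvest_url and the labels "journals", "books", and "series" are hypothetical. It builds only the initial URL for a harvest.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint copied from the example URLs above.
OAI_ENDPOINT = "https://0-oai-crossref-org.lib.rivier.edu/oai"

# Set codes named in the text: J (journals), B (books), S (series).
SETS = {"journals": "J", "books": "B", "series": "S"}


def harvest_url(content_type: str, created: date) -> str:
    """Build the initial ListRecords URL for changes since the archive's created date."""
    params = {
        "verb": "ListRecords",
        "set": SETS[content_type],
        "from": created.isoformat(),  # YYYY-MM-DD, the created date, not the completed date
        "metadataPrefix": "cr_unixsd",
    }
    return f"{OAI_ENDPOINT}?{urlencode(params)}"
```

For example, harvest_url("journals", date(2018, 1, 31)) reproduces the first URL shown above.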