Recommendations on RSS Feeds for Scholarly Publishers

 

 

Version

1.0 2009-10-19

Status

Release

Contents

  1. Recommendations on RSS Feeds for Scholarly Publishers
    1. Version
    2. Status
    3. Contents
    4. 1. Introduction
    5. 2. General Recommendations for Good Practice
    6. 3. Recommendations for Use of RSS Modules
      1. 3.1 Admin Module
      2. 3.2 Content Module
      3. 3.3 Dublin Core Module
      4. 3.4 PRISM Module
    7. 4. Example RSS 1.0 TOC Feed
      1. 4.1 Example <channel> element
      2. 4.2 Example <item> element
    8. 5. How to Make Your RSS Feeds Findable
      1. 5.1 RSS Feed Icons
      2. 5.2 RSS Autodiscovery
        1. 5.2.1 Use of Multiple Auto-Discovery Links
      3. 5.3 Guidelines for Use of OPML
    9. 6. Guidelines on Gathering Statistics on RSS Usage
      1. 6.1 Caveat about statistics
      2. 6.2 Subscribers from Aggregated Services
      3. 6.3 Click-Throughs
      4. 6.4 Impressions
    10. 7. Incorporating Licensing Information into RSS Feeds
    11. 8. Tools for Creating and Validating Your RSS Feeds
      1. 8.1 Creation
      2. 8.2 Validation
    12. 9. Notes on Media Types
    13. A. Example Atom TOC Feed
      1. A.1 Example element
      2. A.2 Example element

 

1. Introduction

 

The JISC funded ticTOCs project has developed a service to enable users to discover, subscribe to, and re-use Table of Contents (TOC) RSS feeds for thousands of journals from a wide range of publishers.

 

The initial phase of the project included an analysis of publishers' current practices with regard to the provision of RSS TOCs. The analysis revealed a range of issues that may limit the ability of end users and feed aggregators to use feeds effectively.

 

The fact that a particular publisher’s feed "looks good" in the major feed readers is not enough to make that feed truly usable. RSS feeds are designed to be aggregated. When aggregated, even minor variations in feeds can be greatly distracting and make for an unpleasant user experience. For instance, there is currently wide variation amongst scholarly publishers in the use of the <channel><title> element. Examples encountered after looking at only ten publishers during the early stages of the ticTOCs project included:

 

<title>Nature</title>

<title>BMJ Current Issue</title>

<title>British Journal of Visual Impairment current issue</title>

<title>Journal of Geophysics and Engineering latest papers</title>

<title>British Journal of Criminology – current issue</title>

<title>Journal of managerial psychology: table of contents</title>

<title>Science Direct Publication: Biochemical and Biophysical research communications</title>

<title>SpringerLink - Journal</title>

<title>Blackwell Synergy: International Journal of Cosmetic Science: Table of Contents</title>

<title>NATURE –LONDON-</title> 

 

Variations in practice amongst publisher feeds can be irritating for end users, but they can be insurmountable for automated processes. RSS feeds are increasingly being consumed by knowledge discovery and data mining services. In these cases, variations in date formats, the practice of lumping all authors together in one <dc:creator> element, or the generation of invalid XML can render the RSS feed useless to the service accessing it.

 

The draft recommendations below are a result of this initial analysis and are intended to facilitate good practice in the production and provision of TOC RSS feeds. The guidelines include general recommendations for good practice, specific recommendations on the use of RSS modules, and an example RSS TOC feed. Ultimately, we expect that industry-wide adoption of these best practices will help drive more traffic to publisher web sites. Note that most of these recommendations can also be applied to non-TOC RSS feeds such as thematic feeds, automated search result feeds, etc.


The Best Practice recommendation group included the following members:


The recommendations were based on early work developed by Malcolm Moffat (ICBL, Heriot-Watt University).

 

Your feedback and comments are much appreciated. Please email rss_best_practice@crossref.org

 

2. General Recommendations for Good Practice

 

As discussed above, RSS feeds are designed to be read in aggregate. When RSS feeds from different sources are intermingled in one view, variations in the content and formatting of the title element can become distracting and/or confusing.  Most of these variations in title formatting amongst publishers occur because the publisher has chosen to "annotate" the title with a description of the feed (e.g. "current issue", "latest papers", "table of contents", etc.). These annotations can vary in their naming conventions, their formatting and their placement (e.g. prepended or appended). In addition to being confusing to the researcher, such variations in the title element make it difficult for automated processes to make sense of the title as they have no consistent way of distinguishing the actual content of the title element from the annotation.


The practice of annotating the channel title should generally be deprecated. Ideally, if explanatory text is needed to describe the feed, it should be placed in the <description> element of the feed.


However, some might object that RSS readers do not consistently make use of the feed description element and that title annotations are still necessary to allow users to distinguish between different feeds. In this case we recommend that such annotations be confined to feeds other than tables of contents (e.g. saved searches, "most cited articles", etc.) and that the <title> element of the RSS feed for the table of contents of the current issue contain only the official name of the publication.


Finally, in order to introduce some consistency where annotations are included, publishers should:



Thus, the title of the TOC feed for the current issue of the Journal of Psychoceramics would be encoded like this:


<title>Journal of Psychoceramics</title>


Whereas the feed of most cited articles from the Journal of Psychoceramics would be encoded like this:


<title>Journal of Psychoceramics [most cited articles]</title>
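A consumer of such feeds could separate the publication name from the annotation along the following lines. This is only a minimal sketch: it assumes the annotation is appended in square brackets, as in the example above, and the function name split_title is hypothetical.

```python
import re

# Matches "Publication Name [annotation]" as in the example title above.
# Assumption: the annotation is the final bracketed group in the title.
_ANNOTATED_TITLE = re.compile(r"^(?P<name>.*?)\s*\[(?P<note>[^\[\]]*)\]\s*$")


def split_title(title):
    """Return (publication_name, annotation); annotation is None if absent."""
    cleaned = title.strip()
    match = _ANNOTATED_TITLE.match(cleaned)
    if match:
        return match.group("name"), match.group("note")
    return cleaned, None
```

Because a TOC feed title carries no annotation, the same function returns the title unchanged with None as the second value.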


 

3. Recommendations for Use of RSS Modules

 

RSS 1.0 Modules are XML-namespace based compartmentalised extensions to RSS 1.0. This namespace based modularization allows RSS 1.0 to be extended without the need for rewrites of the core RSS 1.0 specification and without the need for consensus on each and every element. In a nutshell, RSS 1.0 Modules allow the basic RSS 1.0 format to be extended in standard ways by specifying which modules and namespaces are being used. A range of Standard and Proposed RSS 1.0 modules are available which enable RSS 1.0 to be extended in a multitude of ways (e.g. provision of information on feed status or frequency of feed updates, and extension for use with audio or wiki type content).

From the perspective of TOC feeds four RSS 1.0 modules are particularly relevant:

  1. Admin Module

    mod_admin provides administrative properties that can be used to help improve the robustness and reliability of broad RSS usage between providers, aggregators, clients, and other users.

  2. Content Module

    mod_content is a module to extend RSS to permit the inclusion of actual content rather than just metadata descriptions of content.

  3. Dublin Core Module

    mod_dc is a module to extend RSS via the use of the Dublin Core element set.

  4. PRISM Module

    mod_prism is a module to extend RSS via the use of the PRISM element set.

Full details of these modules can be found from the links above. The remainder of this section provides brief outlines of these four modules along with examples of commonly used elements and short notes on their usage.

 

3.1 Admin Module

 The RSS 1.0 Admin Module 'mod_admin' provides two useful properties that can be added to an RSS feed in order to provide a means for consumers of a feed to provide feedback on errors encountered in the feed. This kind of mechanism is essential in a large syndication network in which feeds may be aggregated and re-syndicated before reaching the eventual consumer: there needs to be clear labelling within the feed itself as the client may not be consuming it from its original location.

The two additional elements that the admin module specifies are as follows:

 

The following example illustrates the usage of each element. Note that because the values of both elements are URIs, and hence pointers to other web resources, the "rdf:resource" attribute is used rather than element content.

 

<channel rdf:about="...">

<admin:generatorAgent rdf:resource="http://www.example.org/software/platform/1.0"/>

<admin:errorReportsTo rdf:resource="mailto:errors@example.org"/>

</channel>
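To illustrate how an aggregator might read these two properties, here is a minimal Python sketch using the standard library. The wrapping rdf:RDF document, the channel URI, and the function name admin_properties are assumptions made for the example; the namespace URIs are those commonly declared for RSS 1.0, RDF and mod_admin.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as commonly declared in RSS 1.0 feeds that use mod_admin.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "admin": "http://webns.net/mvcb/",
}

# Hypothetical feed wrapping the <channel> example above.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://www.example.org/feed">
    <admin:generatorAgent rdf:resource="http://www.example.org/software/platform/1.0"/>
    <admin:errorReportsTo rdf:resource="mailto:errors@example.org"/>
  </channel>
</rdf:RDF>"""


def admin_properties(feed_xml):
    """Return the rdf:resource values of the two mod_admin elements."""
    root = ET.fromstring(feed_xml)
    channel = root.find("rss:channel", NS)
    resource_attr = "{%s}resource" % NS["rdf"]
    result = {}
    for name in ("generatorAgent", "errorReportsTo"):
        element = channel.find("admin:" + name, NS)
        # The value lives in the rdf:resource attribute, not in element text.
        result[name] = element.get(resource_attr) if element is not None else None
    return result
```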

 

3.2 Content Module 

The RSS 1.0 Content Module 'mod_content' was originally intended to allow RSS 1.0 to be extended to permit the inclusion of actual content, rather than just metadata descriptions, within RSS 1.0. It has, however, become common practice to use this module to enable inclusion of HTML marked up versions of <item><description> within feeds. This is accomplished by the use of the <content:encoded> element as illustrated below.

 

Element
Typical Example

<content:encoded>

<content:encoded>

<![CDATA[ <p> <b>Biochemistry: Designer enzymes</b> </p> <p>Nature 448, 757 (2007). <a href="http://0-dx-doi-org.lib.rivier.edu/10.1038/448757a">doi:10.1038/448757a</a> </p> <p>Authors: Michael P. Robertson &amp; William G. Scott</p> <p>Evolution has crafted thousands of enzymes that are efficient catalysts for a plethora of reactions. Human attempts at enzyme design trail far behind, but may benefit from exploiting evolutionary tactics.</p> ]]>

</content:encoded>

 

Using the RSS 1.0 Content module in this way allows both metadata descriptions and HTML marked up content to be simultaneously presented within a feed. Aggregators, feed readers, and other third parties can then choose which form to utilise.

 

3.3 Dublin Core Module 

The RSS 1.0 Dublin Core Module 'mod_dc' allows RSS 1.0 to be extended to utilise the Dublin Core Metadata Element Set.


Typically used elements include:

 

Element
Typical Examples

<dc:publisher>

<dc:publisher>Institute of Physics Publishing</dc:publisher>

<dc:language>

<dc:language>en</dc:language>

<dc:rights>

<dc:rights>Copyright Institute of Physics Publishing 2007</dc:rights>

<dc:date>

<dc:date>2007-03-23</dc:date>

<dc:title>

<dc:title>Rock classification based on resistivity patterns</dc:title>

<dc:creator>

<dc:creator>Linek, Margarete</dc:creator>

<dc:creator>Jungmann, Matthias</dc:creator>

<dc:creator>Berlage, Thomas</dc:creator>

<dc:identifier>

<dc:identifier>doi:10.1088/1742-2132/4/2/006</dc:identifier>

 

Usage Notes:


 

See the Dublin Core Usage Guide for further details on the usage of individual Dublin Core elements.
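The value of giving each author a separate <dc:creator> element and using a consistent date format becomes clear when a feed is processed by software. The following sketch is illustrative only: it assumes dates in the YYYY-MM-DD form shown in the table above, and the function name read_dc is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

DC = "http://purl.org/dc/elements/1.1/"

# An <item> fragment built from the typical examples listed above.
ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Rock classification based on resistivity patterns</dc:title>
  <dc:creator>Linek, Margarete</dc:creator>
  <dc:creator>Jungmann, Matthias</dc:creator>
  <dc:creator>Berlage, Thomas</dc:creator>
  <dc:date>2007-03-23</dc:date>
  <dc:identifier>doi:10.1088/1742-2132/4/2/006</dc:identifier>
</item>"""


def read_dc(item_xml):
    """Return (list of creators, publication date) from an item."""
    item = ET.fromstring(item_xml)
    # One author per element means no name-splitting heuristics are needed.
    creators = [c.text.strip() for c in item.findall("{%s}creator" % DC)]
    # Assumption: the date is a plain YYYY-MM-DD string.
    published = date.fromisoformat(item.findtext("{%s}date" % DC).strip())
    return creators, published
```

Had the three authors been lumped into a single <dc:creator> element, the consumer would have to guess at the separator between names.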

 

3.4 PRISM Module 

The RSS 1.0 PRISM Module 'mod_prism' allows RSS 1.0 to be extended to utilise the PRISM specification (Publisher Requirements for Industry Standard Metadata). The PRISM element set comprises a range of metadata elements specifically selected for use in electronic publishing contexts and is therefore highly appropriate for use within TOC related RSS feeds.

 

Typically used elements include:

 

Element
Typical Example

<prism:doi>*

<prism:doi>10.1093/bjc/azn067</prism:doi>

<prism:url>*

<prism:url>http://0-dx-doi-org.lib.rivier.edu/10.1093/bjc/azn067</prism:url>

<prism:publicationName>

<prism:publicationName>British Journal of Criminology</prism:publicationName>

<prism:publicationDate>

<prism:publicationDate>2009-01</prism:publicationDate>

<prism:issn>

<prism:issn>0007-0955</prism:issn>

<prism:eIssn>

<prism:eIssn>1464-3529</prism:eIssn>

<prism:copyright>

<prism:copyright>Copyright Oxford Journals 2009</prism:copyright>

<prism:rightsAgent>

<prism:rightsAgent>journals.permissions@oxfordjournals.org</prism:rightsAgent>

<prism:volume>

<prism:volume>49</prism:volume>

<prism:number>

<prism:number>1</prism:number>

<prism:startingPage>

<prism:startingPage>68</prism:startingPage>

<prism:endingPage>

<prism:endingPage>87</prism:endingPage>

 

* Elements require use of PRISM 2.0 namespace (or above), otherwise PRISM 1.2 namespace is sufficient.

 

Usage Notes: 

 

This document strongly recommends the use of PRISM 2.0 (or above) as this allows for the direct inclusion of DOI and associated URL into feeds.

 

See the PRISM specifications for further details on individual PRISM elements. 

 

4. Example RSS 1.0 TOC Feed

 

An example RSS 1.0 TOC feed utilising the Admin, Content, Dublin Core and PRISM modules is available here. Both <channel> and <item> elements are shown below.

 

4.1 Example <channel> element

 

<channel rdf:about="http://www.nature.com/nature/current_issue/rss">

<title>Nature</title>

<description>Nature is a weekly international journal publishing the finest peer-reviewed research in all fields of science and technology on the basis of its originality, importance, interdisciplinary interest, timeliness, accessibility, elegance and surprising conclusions. Nature also provides rapid, authoritative, insightful and arresting news and interpretation of topical and coming trends affecting science, scientists and the wider public.</description>

<link>http://www.nature.com/nature/current_issue/</link>

<admin:generatorAgent rdf:resource="http://www.nature.com/"/>

<admin:errorReportsTo rdf:resource="mailto:feedback@nature.com"/>

<dc:publisher>Nature Publishing Group</dc:publisher>

<dc:language>en</dc:language>

<dc:rights>© 2009 Nature Publishing Group</dc:rights>

<prism:publicationName>Nature</prism:publicationName>

<prism:issn>0028-0836</prism:issn>

<prism:eIssn>1476-4679</prism:eIssn>

<prism:copyright>© 2009 Nature Publishing Group</prism:copyright>

<prism:rightsAgent>permissions@nature.com</prism:rightsAgent>

<image rdf:resource="http://www.nature.com/includes/rj_globnavimages/nature_logo.gif"/>

<items>

<rdf:Seq>

...

<rdf:li rdf:resource="http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a"/>

...

</rdf:Seq>

</items>

...

<item rdf:about="http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a">

...

</item>

...

</channel>

 

4.2 Example <item> element

 

<item rdf:about="http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a">

<title>Cosmology: Dark matter and dark energy</title>

<link>http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a</link>

<description>Observations continue to indicate that the Universe is dominated by invisible components — dark matter and dark energy. Shedding light on this cosmic darkness is a priority for astronomers and physicists.</description>

<content:encoded><![CDATA[

<p>

<b>Cosmology: Dark matter and dark energy</b>

</p>

<p>Nature 458, 587 (2009). <a href="http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a">doi:10.1038/458587a</a>

</p>

<p>Authors: Robert Caldwell & Marc Kamionkowski</p>

<p>Observations continue to indicate that the Universe is dominated by invisible components — dark matter and dark energy. Shedding light on this cosmic darkness is a priority for astronomers and physicists.</p>

]]></content:encoded>

<dc:title>Cosmology: Dark matter and dark energy</dc:title>

<dc:creator>Robert Caldwell</dc:creator>

<dc:creator>Marc Kamionkowski</dc:creator>

<dc:identifier>doi:10.1038/458587a</dc:identifier>

<dc:source>Nature 458, 587 (2009)</dc:source>

<dc:date>2009-04-01</dc:date>

<prism:publicationName>Nature</prism:publicationName>

<prism:publicationDate>2009-04-01</prism:publicationDate>

<prism:volume>458</prism:volume>

<prism:number>7238</prism:number>

<prism:section>News and Views Q&amp;A</prism:section>

<prism:startingPage>587</prism:startingPage>

<prism:endingPage>589</prism:endingPage>

<prism:doi>10.1038/458587a</prism:doi>

<prism:url>http://0-dx-doi-org.lib.rivier.edu/10.1038/458587a</prism:url>

</item>
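As a rough illustration of what the PRISM elements in this item make possible, the following sketch assembles a one-line citation from them. It is not a definitive implementation: the namespace URI is assumed to be the PRISM 2.0 basic namespace and should be adjusted to whatever your feed declares, and the function name format_citation and the citation layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed PRISM 2.0 namespace URI; use the one your feed actually declares.
NS = {"prism": "http://prismstandard.org/namespaces/basic/2.0/"}

# The PRISM elements of the example item above, isolated for brevity.
ITEM = """<item xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <prism:publicationName>Nature</prism:publicationName>
  <prism:publicationDate>2009-04-01</prism:publicationDate>
  <prism:volume>458</prism:volume>
  <prism:number>7238</prism:number>
  <prism:startingPage>587</prism:startingPage>
  <prism:endingPage>589</prism:endingPage>
  <prism:doi>10.1038/458587a</prism:doi>
</item>"""


def format_citation(item_xml):
    """Build a one-line citation from the PRISM elements of an item."""
    item = ET.fromstring(item_xml)

    def field(name):
        # Missing elements yield an empty string instead of raising.
        return item.findtext("prism:" + name, default="", namespaces=NS).strip()

    year = field("publicationDate")[:4]
    return "{0} {1}({2}), {3}-{4} ({5}). doi:{6}".format(
        field("publicationName"), field("volume"), field("number"),
        field("startingPage"), field("endingPage"), year, field("doi"))
```

No parsing of free-text descriptions is needed: every part of the citation comes from a dedicated element.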

 

5. How to Make Your RSS Feeds Findable

 

There are several ways in which a website might expose RSS feeds so that a user can find and subscribe to items of interest. One option is to provide a browsable directory of RSS feeds that exists separately from the rest of the publication website. Users would be directed to this list of feeds from help documentation or via additions to typical "alerting" options. The user would then browse the directory and click on feeds of interest in order to subscribe. Assuming that the feed is valid and has been delivered using a suitable MIME type, the subscription process will be relatively straightforward. The main downside to this option, however, is that the user must stop browsing the publications and switch to browsing the RSS feeds, making the user experience less than ideal.

 

A better approach, optimised for the typical use case of a user subscribing to one, or a few, RSS feeds in a single session, is to link the RSS feed directly from the main body of the website, e.g. from a journal homepage or TOC page. This allows a user to quickly subscribe to a feed without having to browse to another section of the website. A standard set of feed icons is used across the web to mark links to RSS feeds on web pages, providing users with a recognisable way to see that RSS feeds are available (see RSS feed icons, below).

 

A further mechanism for associating one or more RSS feeds with a web page is the RSS "auto-discovery" mechanism. This involves placing some metadata in the head of the web page that identifies an associated RSS feed, providing a title and a link to the document. The format of this metadata is standardised and is supported in all modern web browsers and feed readers. If a web browser auto-discovers an RSS feed then it will provide the user with additional options to subscribe to the feed, typically by placing an icon in the location bar or activating additional menu items. Feed readers support auto-discovery by using it as a fall-back mechanism for finding a feed if the user attempts to subscribe using the URL of the TOC page (for example) instead of a direct link to the RSS feed. The reader will auto-discover the feed and will be able to successfully carry out the user's intended action.

 

It is recommended that publishers support both auto-discovery and direct, context-sensitive, links to RSS feeds from their content browsing pages as these provide the best user experience and conform with current best practices for RSS feed discovery and subscription across the web. Directories of RSS feeds are also useful but are typically of more interest to power users who may, for example, wish to quickly find and subscribe to a large number of feeds in a single session. Directories of RSS feeds can also be published in a machine-readable fashion using a format known as OPML. See below for recommendations on its usage.

 

5.1 RSS Feed Icons

 

Many web sites use simple textual labels (e.g. "RSS Feed", "Subscribe", etc.) to present links to RSS feeds to users. These links often fit easily into existing navigation. However, because the labels vary across sites, a user often has to search through a page to find out whether an RSS feed is available. One partial remedy for this is to support RSS autodiscovery, as the user's web browser may include additional cues that a feed is available (e.g. an icon in the location bar), but this does not help if the user is using a desktop RSS reader and needs to find and copy the link in order to subscribe.

 

To improve the user experience of finding an RSS feed on a page, some effort has been made to standardise on a common way to present RSS feed links to users. The Feed Icons website provides a set of downloadable RSS icons that can be used on a publisher's website to link to RSS feeds. While the icons are available in a range of colours, allowing some tailoring to a site's existing design, it is strongly recommended that publishers use the orange and white icon, which is the most commonly used. The icon is available in several sizes, allowing it to be easily incorporated into navigation. For accessibility reasons, publishers should ensure that the icon and RSS feed links have appropriate alternative labels. The icon may also be added next to existing textual links.

 

A number of websites offering RSS feeds also include additional icons or "badges" that provide quick subscription links for specific RSS reader applications, e.g. Google Reader, My Yahoo!, etc. While these may provide some additional convenience, there are some downsides. Firstly, the icons take up additional space on the page, which may be better used elsewhere. Secondly, the list of applications is likely to change over time, requiring some maintenance of the links. Thirdly, and most importantly, a confusion of additional links may hinder users in finding and using the RSS feeds provided. Publishers are therefore recommended to use the standard RSS feed icons rather than application-specific badges.

 

However, if a publisher does feel that it would be useful to provide a user with a wider range of subscription options, it is recommended that a service like Add to Any be used. This, and similar services, provide an easy way to include multiple subscription links on a page in an unobtrusive way.

 

5.2 RSS Autodiscovery

 

As described in the introductory section above, RSS "auto-discovery" defines a standard way for a user agent, such as a web browser or feed reader, to find the RSS feed associated with a page. In short, it is the metadata equivalent of a standard feed icon.

 

Auto-discovery is enabled by adding a <link/> element into the <head/> of an HTML document. An example head element containing an auto-discovery link is shown below: 

 

<head>
<title>Journal of Tree Studies</title>
<link rel="alternate" title="Journal of Tree Studies" href="http://www.example.org/journals/trees.rss" type="application/rss+xml">
</head>

The link element has three required attributes and one optional attribute.

 

Attribute | Required? | Description
rel | Yes | Should always have the fixed value "alternate"
href | Yes | Should contain a link to the RSS feed
type | Yes | Should indicate the MIME type of the RSS feed. Some applications support multiple values for this attribute, e.g. allowing the link to indicate whether a feed is RSS or Atom. It is recommended that publishers standardise on the use of "application/rss+xml".
title | No | A title for the RSS feed

 

Note that while strictly speaking the title attribute is optional it is recommended that publishers always include it in their auto-discovery links, making sure that the labelling is clear to enable users to easily identify and subscribe to the feed of interest.
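The way a feed reader might use these attributes to locate a feed can be sketched as follows, using only the Python standard library. The class and function names are hypothetical, and a real user agent would also resolve relative href values against the page URL, which this sketch omits.

```python
from html.parser import HTMLParser

# Sample page reusing the <head> example above, plus an unrelated link.
PAGE = """<html><head>
<title>Journal of Tree Studies</title>
<link rel="alternate" title="Journal of Tree Studies" href="http://www.example.org/journals/trees.rss" type="application/rss+xml">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>"""


class FeedLinkFinder(HTMLParser):
    """Collects (title, href) pairs from RSS auto-discovery links."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        is_rss = attributes.get("type") == "application/rss+xml"
        if "alternate" in rel_values and is_rss:
            self.feeds.append((attributes.get("title"), attributes.get("href")))


def discover_feeds(html):
    """Return every auto-discovery feed link found in an HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

The title collected here is what a browser or reader would typically show the user when offering the subscription, which is why a clear title attribute matters.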

 

5.2.1 Use of Multiple Auto-Discovery Links

 

It is possible to include multiple auto-discovery links within a single page, allowing a list of RSS feeds to be discovered. A user would typically be presented with this list of feeds, allowing them to choose the specific feed to which they would like to subscribe. It is common to see multiple auto-discovery links in two situations:

 

  1. Linking to the same feed, but in alternate formats (e.g. RSS 1.0 and Atom)
  2. Linking to feeds that carry different information, e.g. the RSS feed for a blog and the RSS feed for the comments associated with a blog post.

 

The first scenario has the potential to confuse users: the formats may be different but the content is essentially the same. The second scenario has value in providing a way for a user to choose from a selection of alternatives. Publishers are therefore recommended not to provide multiple auto-discovery links to expose different formats, and instead to use multiple links as a way to present alternatives.

 

5.3 Guidelines for Use of OPML

 

Typically RSS feeds are subscribed to on an individual basis, i.e. as a user discovers a feed of interest they will subscribe to it using their feed reader. However, there are scenarios in which it is useful to be able to find and subscribe to a collection of RSS feeds, e.g. all feeds produced by a specific publisher, or all feeds from journals in a specific subject. Supporting this option will allow RSS feed aggregators and indexers to more easily find all feeds exposed from a particular site, or allow an institutional librarian to find all feeds from a specific subject category for incorporation into a library web site or OPAC, or for bulk importing into a feed reader application.

 

The standard mechanism for exposing collections of RSS feeds is through a technology known as OPML. OPML is an XML vocabulary that can be used to describe a simple directory of RSS feeds that includes the title of the feed, a link to the home page of the feed (e.g. the journal homepage), and a link to the RSS feed itself. A simple OPML document containing two RSS feeds is illustrated below:

 

<opml version="2.0">
<head>
<title>RSS Feeds for Botany</title>
</head>
<body>
<outline title="Journal of Flowers" type="rss"
xmlUrl="http://www.example.org/journal/flowers/latest.rss"
htmlUrl="http://www.example.org/journal/flowers/latest"/>
<outline title="Journal of Trees" type="rss"
xmlUrl="http://www.example.org/journal/trees/latest.rss"
htmlUrl="http://www.example.org/journal/trees/latest"/>
</body>
</opml>

More information on the OPML format can be found in the specification. OPML is well supported in RSS applications, with the majority of applications allowing a user to import an OPML file and automatically subscribe to all of the listed feeds.
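To show how little work a bulk import requires when the OPML is well formed, here is a minimal sketch that lists the feeds in the sample document above. The function name list_feeds is hypothetical, and nested outline groups are simply flattened.

```python
import xml.etree.ElementTree as ET

# The sample OPML document shown above.
OPML = """<opml version="2.0">
<head>
<title>RSS Feeds for Botany</title>
</head>
<body>
<outline title="Journal of Flowers" type="rss"
xmlUrl="http://www.example.org/journal/flowers/latest.rss"
htmlUrl="http://www.example.org/journal/flowers/latest"/>
<outline title="Journal of Trees" type="rss"
xmlUrl="http://www.example.org/journal/trees/latest.rss"
htmlUrl="http://www.example.org/journal/trees/latest"/>
</body>
</opml>"""


def list_feeds(opml_xml):
    """Return (title, feed URL) for every outline of type "rss"."""
    root = ET.fromstring(opml_xml)
    return [
        (outline.get("title"), outline.get("xmlUrl"))
        for outline in root.iter("outline")
        if outline.get("type") == "rss"
    ]
```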

 

Publishers are therefore recommended to publish OPML documents that list all of the feeds from their website and, ideally, OPML documents for subject categories that list all journal feeds in that category. Unlike RSS feeds there is no standard way to link to OPML feeds from web pages. Various attempts have been made at defining an auto-discovery mechanism for OPML using the HTML link element, but none of these have achieved wide adoption.

 

 

6. Guidelines on Gathering Statistics on RSS Usage

 

6.1 Caveat about statistics

Gathering meaningful statistics on your RSS usage is probably even more fraught than gathering regular web statistics. This is due to a number of idiosyncrasies around RSS usage, including:

 

 

There are generally three major types of statistics that you may want to gather about your RSS usage:

 

 

Each is discussed in more detail below.

 

6.2 Subscribers from Aggregated Services 

 

There is evidence that the vast majority of RSS consumption occurs through one of the major feed aggregators (Google, iGoogle, MyYahoo, Bloglines, etc.).

 

The good news is that web based feed readers are generally well-behaved and, when their bots retrieve your RSS feeds, they will typically use the UserAgent string to inform you of how many subscribers they have to the feed they are retrieving. So, for instance looking in your web server logs you might see an entry like this:

 

GET /journal_of_psychoceramics/toc_rss1.xml - x.x.x.x HTTP/1.1 Bloglines/3.1+(http://www.bloglines.com;+43+subscribers)

 

This indicates that there are 43 Bloglines subscribers to the “Journal of Psychoceramics” RSS feed. Obviously, if you offer your RSS feed in multiple formats (RSS 1, RSS 2, Atom), you would need to sum the subscribers of all the formats to get an overall count.
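Extracting the count from a log entry like the one above can be sketched in a few lines of Python. This assumes the aggregator reports the figure as a number followed by the word "subscribers", with spaces possibly logged as "+" as in the sample entry; other aggregators may use different wording, and the function name is hypothetical.

```python
import re

# A number followed by "subscriber(s)"; "+" stands in for a space in some logs.
_SUBSCRIBERS = re.compile(r"(\d+)[+\s]+subscribers?", re.IGNORECASE)


def subscriber_count(user_agent):
    """Return the subscriber count reported in a User-Agent string, or None."""
    match = _SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None
```

Summing the values returned for each distinct aggregator and feed format gives the overall subscriber figure discussed above.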

 

The bad news is that the term “subscribers” does not mean “active subscribers”. In other words, this statistic gives no evidence that said subscribers are actually reading your feeds (or that they even continue to use the Bloglines service). It is also possible that some “subscribers” have subscribed to your content in multiple systems. For example, a user moving from Bloglines to Google Reader is unlikely to delete their old Bloglines subscriptions, and therefore all of their subscriptions might be double-counted.

 

Still, the subscriber count is not entirely useless. At the very least, the act of subscribing to something usually indicates interest and intent.

 

6.3 Click-Throughs 

 

Measuring “click-throughs” might give a better idea of how many people are actively engaged with your content, though one has to be careful not to sacrifice user experience for the sake of accurate statistics. As discussed above, creating a “partial feed” and forcing a user to click through to your site in order to get an article abstract, just so that you can boost your click-through statistics, may ultimately be counter-productive. Again, at the very least you should familiarize yourself with the “full vs partial feed” debate before deciding how much you want to depend on click-throughs for measuring usage of your RSS feeds.

 

Gathering click-through statistics from RSS feeds is generally accomplished by making sure that links in the RSS feed are encoded in some way that lets the publisher look at the referrer and determine that the link was followed from an RSS feed. This is often done by appending parameters to the link url.

 

Example:

http://www.zyz.com/index.html?source=rss

 

 If you generate your RSS feeds dynamically, you might even wish to customize the links based on the UserAgent of the application retrieving the feed:

 

Example:

 http://www.zyz.com/index.html?source=bloglines_rss
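A feed generator could add such a parameter with a small helper like the one sketched below. The parameter name "source" follows the examples above; the function name tag_link is hypothetical. Existing query parameters on the link are preserved.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url, source="rss"):
    """Append a source=... tracking parameter to a link placed in a feed."""
    parts = urlsplit(url)
    # Keep any parameters already present on the link.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("source", source))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For a dynamically generated feed, the source value could be chosen from the User-Agent of the requesting application, for instance tag_link(url, "bloglines_rss").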

 

We recommend that, where possible, scholarly publishers use the DOI to link to their content. In order to measure click-throughs and still use the DOI for linking to your content, you will want to structure your links using DOI Parameter Passing. In practice, this would mean that a link to an article that might normally be recorded like this:

 

Normal Link Element Example:

<link>http://0-dx-doi-org.lib.rivier.edu/10.5555/5551212</link>

 

 Would instead be encoded like this:

 

Parameter Passing Enabled Link Element Example:

<link>http://0-dx-doi-org.lib.rivier.edu/openurl?url_ver=Z39.88-2003&amp;

      rfr_id=info:sid/psychoceramicsjournal.org&amp;

      rft_id=doi:10.5555/5551212&amp;

      rfr_dat="rss%3dyes%26source%3dbloglines_rss"</link>

 

 

Note that for DOI parameter passing to work, the publisher must have implemented a service capable of using the nested parameters and have agreed to participate in DOI Parameter Passing.

 

6.4 Impressions 

 

Again, one of the problems with gathering statistics on RSS uptake is that the act of reading an RSS feed entry does not normally trigger a page GET on your site. One way in which publishers have tried to get around this is to embed a uniquely named, invisible graphic in the feed item. This way, any rendering of the RSS feed item would trigger a GET of the graphic from your site. In theory, this should mean that you can detect when a particular item is being read, but in practice this technique is not very accurate because:

 

 

7. Incorporating Licensing Information into RSS Feeds

 

As noted in the introduction to these recommendations, RSS feeds are increasingly used for purposes beyond simply syndicating information to users: feeds are also typically processed, aggregated and repurposed as part of a growing range of knowledge discovery and data mining services. The recommendations in this document aim to support and encourage these diverse uses by ensuring that there is rich, well-expressed metadata included in each feed.

However, to properly enable these forms of reuse, RSS feeds should ideally be clearly annotated with some form of rights statement. Publishers are encouraged to consider the license under which the metadata embedded in their feeds may be reused. Such reuse is likely to cover both commercial and non-commercial uses of the data.

The Creative Commons website defines a standard way to include licensing metadata in an RSS 1.0 feed. This approach allows a publisher to use any of the existing range of Creative Commons licensing options and to also provide pointers to additional licensing terms, e.g. to directly enable academic (non-commercial) usage, but require additional agreements for commercial usage.

 

8. Tools for Creating and Validating Your RSS Feeds

 

8.1 Creation

There are as many methods for creating RSS feeds as there are for creating HTML web pages. Here is a brief summary of some of the methods.

 

 

8.2 Validation

It is essential that Feeds and OPML files are valid. Here are some online validation tools.

 

 

Note that the above-mentioned validation tools will only verify that Feeds or OPML files are valid; they will not confirm that they follow the best practices recommended here. CrossRef will develop and host a tool that can be used to confirm that your feeds are both valid and conformant with these recommendations. We will update this document with a link to the tool when it is ready for use.
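Before submitting a feed to a validator, a publisher may find it useful to run a quick local check that the file is at least well-formed XML. The sketch below uses only the Python standard library; the function name is_well_formed is an assumption of this example. Well-formedness is only a precondition: it says nothing about whether the feed is valid RSS or follows these recommendations.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A mismatched tag and an unescaped ampersand are both rejected.
print(is_well_formed("<channel><title>Nature</title></channel>"))
print(is_well_formed("<channel><title>Nature</channel>"))
print(is_well_formed("<title>Shoichet & Raushel</title>"))
```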

 

9. Notes on Media Types

 

This section provides some additional discussion on the correct use of media types for RSS feeds.

 

Media types are an important feature of the web architecture that enables browsers and other user agents (e.g. RSS feed readers) to correctly identify and process content that they retrieve from the web. In the context of RSS this is particularly important because, without the correct media type, a user may be shown XML markup rather than being offered the option to subscribe to an RSS feed using their configured feed reader.

 

Several media types have been commonly used or recommended for delivering RSS feeds. This advice is often contradictory and does not always take into account all relevant use cases. These recommendations take a pragmatic approach that attempts to address the most common issues. The following table outlines some commonly used media types and the issues related to them:

 

Media Type | Issues
text/xml | This media type is often used for delivering RSS feeds or XML documents. However, due to the way that the media type has been specified, user agents may incorrectly process the contents with a character set of US-ASCII. This is highly likely to cause problems with RSS feeds containing bibliographic information, which typically contains characters from the wider Unicode character set.
application/rss+xml | This media type was proposed as a standard for delivering RSS feeds, and is still widely used. It is also used as an identifier in RSS auto-discovery links to indicate that the link refers to an RSS feed. However, while the media type is well supported in RSS readers, it is not formally registered.
application/rdf+xml | This is the correct, standard media type for delivering RDF/XML documents such as RSS 1.0 feeds, as specified in this document. However, this media type is not well supported in browsers and in many cases may cause confusion for an end user who may be prompted to download the feed rather than subscribe to it.
application/atom+xml | This is the correct, standard media type for delivering Atom feeds. It should be used for delivering only Atom documents, and not RSS 1.0 feeds.
application/xml | This is the standard, default media type for delivering XML documents over the web. It does not suffer from the same character encoding issues as text/xml. The media type is well supported in RSS readers, so it offers the same advantages as application/rss+xml.

 

To encourage the use of standardised media types, this document therefore recommends the use of application/xml as the default media type for delivering RSS feeds. It is also recommended that feeds are delivered with UTF-8 character encoding, which means that the feed should be delivered with a content type of: application/xml; charset=UTF-8.
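One way to check what a server actually sends is to inspect the Content-Type header of the feed response. In HTTP the parameter that carries the character encoding is named charset. The following Python sketch parses a header value with the standard library; the function name describe_content_type is an assumption of this example, and fetching the header from a live server is left out.

```python
from email.message import Message


def describe_content_type(header_value):
    """Split a Content-Type header value into (media type, charset).

    The charset is lower-cased, or None when no charset parameter is present.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = describe_content_type("application/xml; charset=UTF-8")
print(media_type, charset)

# Without a charset parameter, the consumer has to guess the encoding.
print(describe_content_type("application/xml"))
```

A check of this kind could be run against each feed URL after deployment, flagging any response whose media type is not application/xml or whose charset is missing.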

 

Note: this might seemingly introduce a slight discrepancy between the recommended media type for delivering an RSS feed and the media type used to link to an RSS feed from an RSS Autodiscovery link. In the latter case, a media type of application/rss+xml is used. However, it is important to recognise that in the case of linking to a feed, the media type is being used simply as a label; it does not define the processing behaviour of the feed reader.
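To illustrate the "label" role of the media type in autodiscovery, the sketch below shows how a client might scan an HTML page for feed links by matching on the type attribute alone, without fetching anything. The page content, the example.org URL and the names FeedLinkFinder and find_feed_links are hypothetical and used only for this example.

```python
from html.parser import HTMLParser

# Media types treated as feed labels in autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and media_type in FEED_TYPES and href:
            self.feeds.append((media_type, href))


def find_feed_links(html_text):
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


# Hypothetical page used only for illustration.
PAGE = """<html><head>
<title>Example Journal</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Example Journal TOC" href="http://example.org/journal/rss">
</head><body></body></html>"""

print(find_feed_links(PAGE))
```

The client decides that the link points at a feed purely from the type label; how the feed is parsed once retrieved depends on the response itself.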

 

A. Example Atom TOC Feed

 

An example Atom TOC feed based on the cut-down (and slightly modified) version of the example RSS feed given in Sect. 4 is shown here for illustrative purposes only. Both <feed> and <entry> elements are displayed.

 

A.1 Example <feed> element

 

<feed xmlns="..." xmlns:dc="..." xmlns:prism="...">

<title type="text">Nature</title>

<author>

<name>Nature Publishing Group</name>

</author>

<updated/>

<id/>

<link rel="self" type="application/atom+xml" href="http://www.nature.com/nature/current_issue/atom"/>

<!--

<image rdf:resource="http://www.nature.com/includes/rj_globnavimages/nature_logo.gif"/>

-->

<icon/>

<rights/>

<dc:publisher>Nature Publishing Group</dc:publisher>

<dc:language>en</dc:language>

<dc:rights>© 2007 Nature Publishing Group</dc:rights>

<prism:publicationName>Nature</prism:publicationName>

<prism:issn>0028-0836</prism:issn>

<prism:eIssn>1476-4679</prism:eIssn>

<prism:copyright>© 2007 Nature Publishing Group</prism:copyright>

<prism:rightsAgent>permissions@nature.com</prism:rightsAgent>

 

<entry>

...

</entry>

...

 

</feed>

 

A.2 Example <entry> element

 

<entry>

<id/>

<title>Structure-based activity prediction for an enzyme of unknown function</title>

<link rel="alternate" type="text/html" href="http://dx.doi.org/10.1038/nature05981"/>

<summary>With many genomes sequenced, a pressing challenge in biology is predicting the function of the proteins that the genes encode. When proteins are unrelated to others of known activity, bioinformatics inference for function becomes problematic. It would thus be useful to interrogate protein structures for </summary>

<updated/>

<author>

<name>...</name>

</author>

<content type="xhtml">

<div xmlns="http://www.w3.org/1999/xhtml">

<p><b>Structure-based activity prediction for an enzyme of unknown function</b></p>

<p>Nature 448, 775 (2007). <a href="http://dx.doi.org/10.1038/nature05981">doi:10.1038/nature05981</a></p>

<p>Authors: Johannes C. Hermann, Ricardo Marti-Arbona, Alexander A. Fedorov, Elena Fedorov, Steven C. Almo, Brian K. Shoichet &amp; Frank M. Raushel</p>

<p>With many genomes sequenced, a pressing challenge in biology is predicting the function of the proteins that the genes encode. When proteins are unrelated to others of known activity, bioinformatics inference for function becomes problematic. It would thus be useful to interrogate protein structures for </p>

</div>

</content>

<dc:title>Structure-based activity prediction for an enzyme of unknown function</dc:title>

<dc:creator>Hermann, Johannes C.</dc:creator>

<dc:creator>Marti-Arbona, Ricardo</dc:creator>

<dc:creator>Fedorov, Alexander A.</dc:creator>

<dc:creator>Fedorov, Elena</dc:creator>

<dc:creator>Almo, Steven C.</dc:creator>

<dc:creator>Shoichet, Brian K.</dc:creator>

<dc:creator>Raushel, Frank M.</dc:creator>

<dc:identifier>doi:10.1038/nature05981</dc:identifier>

<dc:source>Nature 448, 775 (2007)</dc:source>

<dc:date>2007-07-01</dc:date>

<prism:publicationName>Nature</prism:publicationName>

<prism:publicationDate>2007-07-01</prism:publicationDate>

<prism:volume>448</prism:volume>

<prism:number>7155</prism:number>

<prism:section>Article</prism:section>

<prism:startingPage>775</prism:startingPage>

<prism:endingPage>779</prism:endingPage>

</entry>
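
As a rough illustration of why the Dublin Core and PRISM elements are worth including, the following Python sketch shows how an aggregator might pull citation metadata out of an entry like the one above. The sample entry is a shortened copy with explicit namespace declarations; the DC and PRISM namespace URIs and the function name entry_metadata are assumptions of this sketch, not part of the example feed.

```python
import xml.etree.ElementTree as ET

# Atom namespace is standard; the dc and prism URIs are assumed here.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <title>Structure-based activity prediction for an enzyme of unknown function</title>
  <dc:creator>Hermann, Johannes C.</dc:creator>
  <dc:creator>Raushel, Frank M.</dc:creator>
  <dc:identifier>doi:10.1038/nature05981</dc:identifier>
  <prism:publicationName>Nature</prism:publicationName>
  <prism:volume>448</prism:volume>
  <prism:number>7155</prism:number>
  <prism:startingPage>775</prism:startingPage>
  <prism:endingPage>779</prism:endingPage>
</entry>
"""


def entry_metadata(entry_xml):
    """Return a dict of citation fields taken from one Atom entry."""
    entry = ET.fromstring(entry_xml)

    def text(path):
        # Missing elements become empty strings rather than None.
        return entry.findtext(path, default="", namespaces=NS).strip()

    identifier = text("dc:identifier")
    doi = identifier[4:] if identifier.startswith("doi:") else identifier
    return {
        "title": text("atom:title"),
        "creators": [c.text.strip() for c in entry.findall("dc:creator", NS) if c.text],
        "doi": doi,
        "journal": text("prism:publicationName"),
        "volume": text("prism:volume"),
        "issue": text("prism:number"),
        "pages": "%s-%s" % (text("prism:startingPage"), text("prism:endingPage")),
    }


meta = entry_metadata(SAMPLE_ENTRY)
print(meta["journal"], meta["volume"], meta["pages"], meta["doi"])
```

Because each field lives in its own element, the consumer does not have to parse the human-readable citation string in the content block.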