Referrer abuse?

Many news aggregators (including NewsGator) write something interesting (to them!) into the HTTP referrer field when retrieving RSS feeds. For example, NewsGator writes http://www.rassoc.com/newsgator/ into this field. There have been some interesting posts about this recently, mostly saying that this is an inappropriate use of the referrer field, and that User-Agent is where the aggregator information should go (incidentally, NewsGator uses the User-Agent as well).

On the other hand, I’ve had some comments from folks who want to be able to customize the referrer field, presumably to point to their own site. This wouldn’t address the problem, and in fact might make it worse, by adding more “bogus” referrers.

What do you guys think? There is still time to address this for NewsGator v1. I’m leaning toward defaulting to no referrer, and allowing the user to override this with a custom referrer string. Another option is what Aggie does (or used to), where the referrer could be something like http://www.newsgator.com/referrers?usersite=www.rassoc.com/gregr/weblog/. Thoughts?
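To make the options concrete, here is a minimal Python sketch of how an aggregator might build its request headers under each policy. The function names, the policy names, and the `ExampleAggregator/1.0` User-Agent string are hypothetical, chosen only for illustration; this is not NewsGator's or Aggie's actual code. The Aggie-style URL simply follows the pattern quoted above.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical product token; a real aggregator would use its own.
USER_AGENT = "ExampleAggregator/1.0"


def build_feed_headers(policy="none", custom_referrer=None, user_site=None):
    """Return request headers for a feed fetch under one referrer policy.

    policy is one of (names are illustrative):
      "none"   - send no Referer header at all (the proposed default)
      "custom" - send a user-supplied referrer string
      "aggie"  - send an aggregator URL carrying the user's site as a parameter
    """
    # The aggregator always identifies itself in User-Agent.
    headers = {"User-Agent": USER_AGENT}
    if policy == "custom" and custom_referrer:
        headers["Referer"] = custom_referrer
    elif policy == "aggie" and user_site:
        # Pattern taken from the example URL in the post.
        headers["Referer"] = (
            "http://www.newsgator.com/referrers?usersite="
            + quote(user_site, safe="/")
        )
    return headers


def make_feed_request(feed_url, **kwargs):
    """Build (but do not send) a urllib Request for the feed."""
    return Request(feed_url, headers=build_feed_headers(**kwargs))
```

With the default policy the request carries only a User-Agent, so nothing lands in the publisher's referrer log; the other two policies add a Referer header only when the user has supplied a value.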

14 thoughts on “Referrer abuse?”

  1. Gordon Weakliem

    I think that the user-agent is the place for this, but the downside is that many of us don’t host our own pages and don’t get user-agent logs. Still, I think the main value of referrers is to show what pages are linking to your log. While it’s interesting to see how many people are reading you, that data shouldn’t be mixed with referrer data.

  2. Steve Makofsky

    I had always thought that User-Agent was for the “tool” downloading it, and “referer” was for the site that was linking to the reference. Now I'm not too sure. Checking RFC 2616 (http://www.ietf.org/rfc/rfc2616.txt), it looks like it’s the following:

    “The User-Agent request-header field contains information about the user agent originating the request… The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent”

    “The Referer[sic] request-header field allows the client to specify, for the server’s benefit, the address (URI) of the resource from which the Request-URI was obtained (the “referrer”, although the header field is misspelled.)”

  3. Joe Friend

    I would go with the Aggie model or something similar. I think it is interesting to see information in the referer about both the aggregator used and the user.

  4. Joe Madia

    I believe that sending the URI of the RSS feed as the referrer would be the most consistent with the spec. The RSS URI seems to be a perfect match for “the address (URI) of the resource from which the Request-URI was obtained” as stated in Section 14.36 of RFC 2616.

    One benefit of referrers mentioned by the spec explicitly is the ability to trace broken links back to their source. Using the RSS URI as the referrer would preserve this benefit. No other option really would.

    The spec also says, “The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.” I would avoid user-defined referrers since there would be no guarantee that they represent valid resources. More importantly, I would avoid them because they don’t seem to be consistent with the intent of the spec.

    Just my two cents, of course!

  5. Greg Reinacker

    Joe, which RSS feed are you referring to? If I’m reading news, do you mean I should use my own RSS feed as the referrer? That doesn’t seem to make sense.

    And if I were retrieving pages from the target site, I can see how their RSS URI might make sense…but we’re talking about the referrer for when we’re retrieving the RSS itself.

    Or did I just completely misunderstand what you were saying?

  6. Joe Madia

    Whoops… my comment was written in regard to “retrieving pages from the target site” and not “retrieving the RSS itself” even though your original post was very clear. Sorry about that!

    After adjusting my brain back to “retrieving the RSS feed itself”, I now think you should omit the referrer entirely. The spec seems pretty strong on the point that Referrer fields should only be used for sources that have valid URIs. Since that’s not the case with aggregators (at least not yet), I would omit the referrer. Also… I would vote that giving users explicit control over the Referrer field should be avoided for the same reasons.

  7. Dare Obasanjo

    I personally can’t stand the fact that people fill my referrer logs with bogus referrer links. I now have to figure out how to configure Webalizer to calculate referrer stats while ignoring RSS aggregators.

  8. Steve Makofsky

    I agree with Dare. Now when I analyze my January stats, I have 4033 hits from “http://radio.userland.com/newsAggregator” and 2052 from “http://www.rassoc.com/newsgator”. That doesn’t really help me discern any information about who’s linking to me.

  9. Mike Gunderloy

    Well, here’s a vote for letting the user set it. Given the general overall state of RSS, I find getting on a “standards compliance” crusade in connection with RSS absurd. As someone who runs an RSS feed, I’m interested in who’s reading it as well as in who’s linking. That information I can get from customized referrers.

    If you don’t want to see RSS aggregators in your stats, just tell your stats program to ignore hits on your RSS feed.
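As a rough sketch of that kind of filtering, the Python below tallies referrers from access-log lines while skipping requests for the feed itself. It assumes the Apache "combined" log format and a hypothetical feed path of `/rss.xml`; real stats tools such as Webalizer have their own configuration for this, which is not shown here.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of an
# Apache "combined" format log line (an assumed format).
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_referrers(lines, feed_paths=("/rss.xml",)):
    """Count referrers, ignoring hits on the feed and empty referrers."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match.group("path") in feed_paths:
            continue  # aggregator traffic: a hit on the feed itself
        referrer = match.group("referrer")
        if referrer and referrer != "-":
            counts[referrer] += 1
    return counts
```

Filtering on the requested path rather than on the referrer string means the tally does not need a list of known aggregators, though it also discards any genuine links that point straight at the feed.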
