Greg Reinacker’s Weblog

Musings on just about everything.

NewsGator feed retrieval intervals

February 14th, 2008 by gregr

I was just reading an article about Google Reader and their retrieval intervals, and thought this might be a good time to write about what NewsGator Online does. This is relevant not only for online users, but for anyone using one of our clients (FeedDemon, NetNewsWire, Inbox, Go!, etc.) in sync mode, since in that mode the clients retrieve content from our online system.

One of the more common questions/complaints we get is that a feed does not appear to update in a timely manner. 99% of the time, it’s actually a problem with the feed - but I’ll come back to that.

There are about 2.5 million feeds in our system, and these feeds get divided into categories. They have fancy (and sometimes amusing) internal names, but for now I will describe them as follows. Also keep in mind these rules are subject to change, and in fact do change quite often to better optimize the experience for our users and our overall system load.

And before I get into all of this…note that feeds that ping our system will typically be updated and available within 60 seconds. For those feeds, the category the feed is in is largely irrelevant.

Category A: these are feeds that are needed by certain commercial syndication services customers with extremely tight SLAs - some of these SLAs guarantee content available within 2 minutes of publication in a feed. Feeds in this category are retrieved every 60 seconds. Exception - if a feed reliably pings our system with updates, the poll-retrieval interval may be dropped to a lower category; however, if the feed does not appear to ping us with every update, the 60 second interval remains in effect.

Category B: these are feeds with over 20 subscribers, or occasional feeds that for whatever reason are deemed “important” enough to keep in this category. Retrieval interval is 15 minutes.

Category C: these are feeds with 2-19 subscribers, and any feed that requires credentials to access. These feeds are retrieved every 1-2 hours depending on system load.

Category D: these are feeds with only 1 subscriber, which do not require credentials. If that subscriber is an “active user”, interval is 1-2 hours. If that subscriber is not very active, interval is 4-8 hours depending on load. The definition of “active” changes, but think of it as people who use the system daily-ish.

Category E: this is what we affectionately call the “penalty box.” These are feeds which have returned some kind of error, and they are “penalized” for it. For example - if a feed 404’s, it is immediately penalized for 24 hours. A 500 server error? 4 hours. Other kinds of errors (including parsing problems) cause penalties of varying lengths, taking into account how many consecutive errors we see. If a feed continues to have errors for 90 days, it will be blacklisted and no longer retrieved at all…and the only way for a feed to get off the blacklist is for it to a) fix the error(s) and then b) ping us. [I should add that 410 (gone) is not considered an error; feeds that return a 410 are immediately removed and all subscribers are unsubscribed.]

Category F: this is somewhat of a grab bag of other cases. The most visible type of feed in this category is craigslist feeds - we retrieve them on a 48-hour interval. This sucks - for you, for me, for everyone - but the problem is craigslist will throttle and blacklist us, and they seem not to be interested in solving this problem with us (we’re also not the only ones with this problem). So 48 hours is roughly the minimum interval we can get away with and minimize the chances of getting blacklisted (which takes days to undo).
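
To make the rules above concrete, here is a minimal Python sketch of how the categories and intervals could be expressed. It is illustrative only and not NewsGator's implementation: the Feed fields, the function names, and the order in which the rules are checked are all assumptions. The post does not say where a feed with exactly 20 subscribers lands, so the sketch puts it in category B. Category F, the 410 case, and the variable-length penalties for other errors are left out.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass
class Feed:
    # All field names are hypothetical, chosen for this illustration.
    subscribers: int                      # assumed to be at least 1
    needs_credentials: bool = False
    has_tight_sla: bool = False           # commercial syndication customer
    sole_subscriber_active: bool = False  # only meaningful with 1 subscriber
    last_http_status: Optional[int] = None


# Fixed penalties named in the post; other errors vary and are not modeled.
PENALTY = {404: timedelta(hours=24), 500: timedelta(hours=4)}


def category(feed: Feed) -> str:
    """Return the category letter for a feed (rule precedence is assumed)."""
    if feed.last_http_status in PENALTY:
        return "E"
    if feed.has_tight_sla:
        return "A"
    if feed.needs_credentials:
        return "C"  # any feed that requires credentials
    if feed.subscribers >= 20:
        return "B"  # assumption: exactly 20 is treated as B
    if feed.subscribers >= 2:
        return "C"
    return "D"


def poll_interval(feed: Feed) -> Tuple[timedelta, timedelta]:
    """Return (shortest, longest) retrieval interval for a feed."""
    cat = category(feed)
    if cat == "E":
        penalty = PENALTY[feed.last_http_status]
        return (penalty, penalty)
    if cat == "A":
        return (timedelta(seconds=60), timedelta(seconds=60))
    if cat == "B":
        return (timedelta(minutes=15), timedelta(minutes=15))
    if cat == "C" or feed.sole_subscriber_active:
        return (timedelta(hours=1), timedelta(hours=2))
    return (timedelta(hours=4), timedelta(hours=8))
```

For example, poll_interval(Feed(subscribers=1)) gives the 4-8 hour range for a single, not very active subscriber, while the same feed with sole_subscriber_active=True gets the 1-2 hour range.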

By far the best way to help ensure timely updates to content is to encourage publishers to ping our system when they update (I talk about NewsGator’s ping endpoint here). A large number already do this - but there are some folks who do not. If they’re using FeedBurner, we’re already getting pinged; if they’re using another system, they may need to add NewsGator to their ping list manually. But typically, after a ping, updated content is available within 60 seconds. And as mentioned, a ping can even remove a feed from our blacklist.
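
For publishers who want to send a ping by hand, the following Python sketch shows one way to do it. It assumes the endpoint accepts the widely used weblogUpdates.ping XML-RPC call; the post does not name the method, so treat that as an assumption. The function names are hypothetical, and the endpoint URL is deliberately left as a parameter because it is not given here.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, feed_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    # Assumption: the endpoint speaks the common weblogUpdates.ping method.
    return xmlrpc.client.dumps((blog_name, feed_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint: str, blog_name: str, feed_url: str):
    """Send the ping to `endpoint`, which must be the real ping URL."""
    server = xmlrpc.client.ServerProxy(endpoint)
    # ServerProxy resolves dotted names, so this issues weblogUpdates.ping.
    return server.weblogUpdates.ping(blog_name, feed_url)
```

build_ping_request is handy for checking what would be sent without touching the network; send_ping performs the actual call once the correct endpoint is supplied.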

We get a fair number of inquiries in the forums and elsewhere about feeds not updating; in nearly all of those instances, everything is actually working fine - the feed has usually fallen into category E for whatever reason. Something I’ve been thinking about is some kind of status page or something where someone can type in the name of a feed, and we’ll display status for that feed (including why it’s in the penalty box if it is)…we’ve resisted doing this because it’s just one of those things our users shouldn’t have to worry about.

This entry was posted on Thursday, February 14th, 2008 at 12:44 pm and is filed under newsgator.

20 responses about “NewsGator feed retrieval intervals”

  1. Pedro Melo said:

    Hi,

    Thanks for clarifying NewsGator operation regarding this.

    I for one would welcome some sort of notification when my feeds drop into the E category.

    Best regards,

  2. Andrew Bloomgarden said:

    I’d really appreciate it if I received some sort of notification when a feed 410s thus unsubscribing me. I don’t think this happens often, but I know that I wouldn’t likely notice for a month or so if a feed suddenly disappeared unless it’s one of my favorites.

  3. Brad said:

    Thanks for the explanation of this. I in fact just got a reply from NewsGator support about why it took so long to see updates to my (category C, possibly D) blog. Their answer was to disable syncing, which was an unfortunate solution. I’ve since turned on pinging so that I can reënable syncing.

  4. Brian said:

    And this is the reason why I don’t use the sync feature. I like being able to specify the amount of time in between checking, and I don’t have to worry about any “penalty box” scenario (which only hurts us because our feed stops updating at all).

  5. Greg Reinacker tells us how often NewsGator updates feeds said:

    [...] Category F: this is somewhat of a grab bag of other cases. The most visible type of feed in this category is craigslist feeds - we retrieve them on a 48-hour interval. This sucks - for you, for me, for everyone - but the problem is craigslist will throttle and blacklist us, and they seem not to be interested in solving this problem with us (we’re also not the only ones with this problem). So 48 hours is roughly the minimum interval we can get away with and minimize the chances of getting blacklisted (which takes days to undo). Source: NewsGator feed retrieval intervals - Greg Reinacker’s Weblog - Musings on just about everything. [...]

  6. Sam said:

    Could you please ask Facebook to add Newsgator to their “ping list”? My Facebook Notifications feed is super slow.

  7. Patrick said:

    If I’m using NetNewsWire in syncing mode, if I set a feed to “don’t sync” then it doesn’t follow these rules, right? In other words if I want to sub to a CL feed I can do it that way and get it updated once an hour?

  8. gregr said:

    Patrick - that is correct.

  9. Around the web | alexking.org said:

    [...] NewsGator feed retrieval intervals - Scott and I had many long discussions about this very topic back in the day. [...]

  10. The devils in the feed details Life is grand said:

    [...] Greg’s post is interesting it really comes down to that in the end; Users shouldn’t have to worry. They shouldn’t [...]

  11. DDA said:

    “We get a fair number of inquiries in the forums and elsewhere about feeds not updating; in nearly all of those instances, everything is actually working fine - the feed has usually fallen into category E for whatever reason.”

    Then things are *not* working fine; the user is confused or upset about something and the explanation is about how your system has decided their feed doesn’t get refreshed.

    I like the idea of syncing all my readers so I’m not reading the same stuff over and over. But I have important feeds that I want updated and being told, “Well, our system decided your feed had some issue so it won’t be refreshed when you want it to be” doesn’t cut it. So I turn off syncing since I can’t find a way to exclude one feed in NNW; while I can easily set a custom refresh interval, it is in *hours* but I’ve set the default feed refresh to be 30 minutes.

  12. Jo said:

    You’re effectively penalizing your users for something beyond their control, which just seems insanely stupid to me. A single 404 kills any updates for 24 hours? That’s crazy. Four hours for a 500? I could see that being acceptable after MULTIPLE 404s or 500s, but not after just one.

  13. Sebastian Lewis said:

    Jo, that’s the thing though, if the feed just keeps 404ing then Newsgator would just be wasting bandwidth by continuing to let that feed 404 every time they do a refresh. It’s just easier to put it on a 24 hour interval until it returns so that they don’t hammer their servers with needless 404s.

    Sebastian

  14. tbelcher said:

    Mate, this is just bloody stupid! I’ve just wasted the morning trying to figure out why a couple of my feeds refuse to update. Couldn’t FeedDemon at least show some status icon on feeds that are causing it grief?

    Also the evidence is that your system puts some feeds in the too-hard basket for a lot longer than 24 hours.

    I have 2 feeds that have not been updated for the better part of a week. I look in the raw XML and see that the feed data is correct - the XML contains the latest entries. But your program refuses to show them. That’s just plain !@#$%%^-ing mad.

  15. TPN :: The Global Geek Podcast » Blog Archive » FeedDemon now Working; What About Your Feeds that are Not Updating as they Should said:

    [...] reading is an article about what NewsGator does by the CTO of NewsGator Greg Reinacker. The article is well worth the read. For publishers check [...]

  16. Symphonious » More On NewsGator Syncing said:

    [...] NewsGator Syncing that I thought were worth following up on. Firstly, Greg Reinacker points to the article I had in mind about how NewsGator polls the feeds, and Andy pointed me to this forum posting about it which shows how to see why feeds aren’t [...]

  17. Geoff said:

    I’ve just found this post linked from the NewsGator forum, because I had a single feed that refused to update in NetNewsWire. The problem turned out to be that the server had, on roughly June 13th, returned an authorisation error (even though the feed doesn’t require authorisation). No updates had been received since then, suggesting that there’s a class of errors that will cause a feed to not be updated for much longer than 24 hours, or perhaps no longer updated at all.

    It would be enormously helpful if the error status of a synchronised feed on the NewsGator server could be propagated to the NetNewsWire client, so the user knows that they may need to force a refresh to get new entries. I only became aware of the problem at all because someone else subscribed to the same feed (through Google Reader) asked whether I’d seen the latest post… which of course led to the question: “Why aren’t you using Google Reader?” I like NetNewsWire, but I don’t like missing out on news and having no visibility of the reason.

  18. Selva said:

    How about the internal feeds stored in Newsgator Enterprise server? Do they have categorization too? Is there any option in Admin to specify this or the time period to poll?

  19. gregr said:

    @Selva - NGES feeds are all treated equally; there is no algorithm in place there to auto-adjust retrieval intervals. A system admin can specify the global retrieval interval, though, and NGES also has a standard XML-RPC ping endpoint at /ngws/xmlrpcping.aspx.

  20. Selva said:

    Thanks Greg for the clarification
