
Thoughts on RSS and bandwidth

Every couple of months, there’s another uproar over the bandwidth usage of RSS…the most recent one has been going on for the last couple of days, and this post from Robert Scoble, along with its comments, is right in the middle of it. In another post, he even says “RSS is broken.”

As you could probably surmise, NewsGator’s own RSS feeds (such as the News/Updates feed) generate an enormous amount of traffic. This isn’t unexpected, and our network is designed for this…but I understand what people are seeing with their feeds. We use HTTP caching mechanisms to dramatically reduce the total bandwidth requirements, and other internal caching mechanisms to reduce overall server load.

90% of the discussion on the bandwidth issue centers on RSS aggregators, and how they are allegedly abusing servers relentlessly. Robert roughly estimates that adding an RSS feed to a site like MSDN increases hits 20x. He also surmises that this will get worse and worse over time:

This gets worse over time because on most sites HTML traffic will go down as people move away (at least until the site reposts interesting content that’ll bring back more traffic) while RSS just grows and grows even if new content doesn’t get posted because people subscribe and don’t move away.

Let’s look at two cases. Let’s assume that the “average” aggregator defaults to polling once an hour. Let’s also assume that the server implements HTTP caching headers in some way – really, I consider this a minimum entry requirement for RSS on a busy site.
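For anyone who hasn’t implemented this, here is a minimal sketch of what “implementing HTTP caching headers” means for a feed: a conditional GET. It assumes the server knows when the feed last changed (as a UTC datetime) and derives an ETag from the feed bytes. The function name `respond_to_feed_request` and the plain-dict request/response shapes are hypothetical, not any particular web framework’s API.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_feed_request(request_headers, feed_bytes, last_modified):
    """Return (status, headers, body) for a GET on the feed.

    request_headers: dict of incoming header names -> values (hypothetical shape).
    feed_bytes: the current serialized feed.
    last_modified: UTC-aware datetime of the last feed change.
    """
    # Validator derived from the content itself (one possible choice).
    etag = '"' + hashlib.sha1(feed_bytes).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }

    # When If-None-Match is present, it takes precedence and If-Modified-Since is ignored.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""
        return 200, headers, feed_bytes

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds first.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""

    return 200, headers, feed_bytes
```

An aggregator that remembers the ETag and Last-Modified values from its last fetch and sends them back (as If-None-Match and If-Modified-Since) lands on the 304 path, which is what keeps hourly polling cheap when nothing has changed.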

Case 1 – the content on the site doesn’t update often (let’s say once a day). If the feed only updates once a day, 96% of the requests for the feed (23 out of each aggregator’s 24 hourly polls) will return a 304 Not Modified response. The other 4% will return the entire contents of the feed. The 304s use little bandwidth (not negligible on an extremely busy feed, but low enough not to be a huge concern)…the total number of connections is something to watch, but typically not a big issue in most environments.

And that 4% shrinks further if the content is updated less often than once a day.
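The Case 1 arithmetic fits in a tiny function. This is a sketch under simplifying assumptions: each aggregator polls on a fixed schedule and caches correctly, and updates are spread out so that no two land between the same pair of polls. The function name is hypothetical.

```python
from fractions import Fraction


def not_modified_fraction(polls_per_period, updates_per_period):
    """Fraction of one aggregator's polls in a period that should get a 304.

    Each update makes exactly one later poll fetch the full feed; every
    other poll sees an unchanged feed and gets 304 Not Modified.
    """
    full_fetches = min(updates_per_period, polls_per_period)
    return Fraction(polls_per_period - full_fetches, polls_per_period)


print(not_modified_fraction(24, 1))      # 23/24: hourly polls, daily update
print(not_modified_fraction(24 * 7, 1))  # 167/168: hourly polls, weekly update
print(not_modified_fraction(24, 24))     # 0: every poll sees a change
```

The last line is Case 2 below: once the feed changes between nearly every pair of polls, caching headers stop saving anything.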

Case 2 – the content on the site is updated often, such that there are almost always changes from hour to hour. Assuming the feed updates in real time, every request to the RSS feed in our example will return the entire feed. This is the case that’s worth worrying about.

Given case 2, there are a number of things you can do: put fewer items in the feed, publish excerpts instead of full content, and so on, but all of these have their issues. Some folks have suggested serving incremental content changes based on If-Modified-Since headers, which not only violates the HTTP specification, but also breaks in common caching proxy scenarios (a shared proxy could cache one client’s partial response and hand it to another client that never saw the earlier items). So what can you do?

One possible thing you could do is use caching headers to limit the potential “exposure” of a shorter-than-ideal aggregator polling interval. Nick Bradbury describes one such way to do that here.
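Nick’s post isn’t reproduced here, so the following isn’t necessarily his method; it’s one common way to express the idea with standard headers: tell clients and proxies how long the current copy stays fresh, using Cache-Control max-age plus an Expires date for older caches. Whether a given aggregator honors these headers is up to the aggregator. The function name and the 60-minute default are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(now, min_poll_interval=timedelta(minutes=60)):
    """Headers asking clients and caches not to refetch the feed too soon.

    now: UTC-aware datetime of the response.
    min_poll_interval: hypothetical publisher-chosen freshness lifetime.
    """
    seconds = int(min_poll_interval.total_seconds())
    return {
        # HTTP/1.1 clients and caches: treat this copy as fresh for N seconds.
        "Cache-Control": f"public, max-age={seconds}",
        # The same information as an absolute date, for HTTP/1.0 caches.
        "Expires": format_datetime(now + min_poll_interval, usegmt=True),
    }


now = datetime(2004, 6, 20, 12, 0, tzinfo=timezone.utc)
print(freshness_headers(now))
```

A client that respects max-age=3600 would hold off on its next request until the hour is up, and a proxy in between can answer repeat requests from its own copy during that window without touching your server.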

Another similar option would be to batch feed updates to once or twice a day. All of the RSS feed requests would then return a 304, except each aggregator’s first request after a scheduled update. With one update a day, you cut roughly 96% of the full-feed responses in our example. But wait – isn’t the point of RSS to get quick updates to site changes?

Now it gets interesting in a different way.

Back to Robert’s example, he assumes that users without RSS will break down as follows:

20% will visit at least once a day
40% will visit at least once a week
20% will visit at least once a month
20% will not visit in any one month (assuming these folks visited before but just aren’t revisiting)

But look at it this way – 80% of users can be as much as a week behind on new content, and 40% can be a month or more behind.

So do you care about these users? Do you have content that you think they would be interested in, if only they knew about it? Would you benefit in some way if these users were reading your content more often? If yes to any of these, RSS helps.

You’re distributing incremental content to users who might be interested. From a business perspective, you can’t compare the bandwidth required by that process to the bandwidth required if these users only occasionally come to your site.

Further, RSS hits will generally be smaller than the corresponding HTML pages, and will also have less ancillary impact (images on the site, layout, etc). For example, my weblog front page is 58KB right now, and the RSS feed is 19KB. Add images and such to the HTML version and call it 80KB, roughly 4x the size of the RSS feed.

So I’m finally getting to the point. :-) Assume there is a benefit to having users read your content every day. If you had some way to convince your interested users to do this (which of course you’re trying to do), then going back to Robert’s example, the HTML site would see:

1000 users x 30 visits/month = 30,000 visits/mo (assuming once/day)

This is the ideal case for the site – assuming more exposure for your content is better. We’re not counting ancillary hits here, which will certainly add to the server load.

With RSS, let’s say we set it up to update/publish the feed 4x per day – which gives aggregator users an average delay of roughly 3 hours before they learn of new content (vs. roughly 12 hours, and up to 24, for a once-a-day HTML visitor):

1000 users x 120 hits/month = 120,000 hits/mo

Remember, all of the other hits (potentially 20 per day per user) are small in terms of bandwidth, because the caching headers let the server answer them with 304 Not Modified.

So we have 4x as many hits, but each is about 1/4 the size…so it’s a wash in terms of bandwidth. And users are exposed to your content multiple times per day, which is good for both you and them.
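That comparison can be checked in a few lines. The 80KB and 19KB sizes and the visit/hit counts come from the example above; the size of a 304 response (headers only) is a hypothetical 0.25KB, included to show that the cheap hits aren’t quite free.

```python
def monthly_kb(requests, kb_per_request):
    """Total kilobytes served for a number of requests of a given size."""
    return requests * kb_per_request


USERS = 1000
DAYS = 30

# HTML: one visit per user per day, ~80KB per visit including images.
html_kb = monthly_kb(USERS * DAYS, 80)

# RSS: feed republished 4x/day, so 4 full 19KB fetches per user per day...
rss_full_kb = monthly_kb(USERS * DAYS * 4, 19)
# ...while the other 20 hourly polls per day get a 304.
ASSUMED_304_KB = 0.25  # hypothetical header-only response size
rss_304_kb = monthly_kb(USERS * DAYS * 20, ASSUMED_304_KB)

print(f"HTML: {html_kb / 1e6:.2f} GB")
print(f"RSS:  {(rss_full_kb + rss_304_kb) / 1e6:.2f} GB "
      f"({rss_full_kb / 1e6:.2f} GB full + {rss_304_kb / 1e6:.2f} GB of 304s)")
```

With these numbers the HTML case comes to 2.40 GB a month and the RSS case to 2.43 GB (2.28 GB of full fetches plus 0.15 GB of 304s), so even counting the 304 overhead the two are within a couple of percent of each other.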

If quicker updates are important for your users, then there is an incremental bandwidth cost to pay for that…but you as the publisher can control this, based on the information you’re trying to push.
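Here is a sketch of that control knob, assuming the publisher republishes the feed at evenly spaced times each day, starting at midnight UTC (the `slots_per_day` parameter and both function names are hypothetical). `last_publish_slot` gives the timestamp a batched feed would report as Last-Modified, so every poll between slots can get a 304; `average_delay_minutes` averages, over every minute of the day at which new content might arrive, how long that content waits for the next slot. It ignores the aggregator’s own polling interval, which adds up to one more polling period on top.

```python
from datetime import datetime, timedelta, timezone

MINUTES_PER_DAY = 24 * 60


def last_publish_slot(now, slots_per_day):
    """Most recent scheduled publish time at or before `now` (UTC-aware).

    Items created after this slot are held back until the next slot, and
    the slot time is what the feed reports as Last-Modified.
    """
    period = MINUTES_PER_DAY // slots_per_day  # assumes slots_per_day divides 1440
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes_since_midnight = (now - midnight) // timedelta(minutes=1)
    return midnight + timedelta(minutes=(minutes_since_midnight // period) * period)


def average_delay_minutes(slots_per_day):
    """Average wait from content creation to the next publish slot.

    Averages over content arriving at each whole minute of the day; content
    that arrives exactly on a slot is published with zero delay.
    """
    period = MINUTES_PER_DAY // slots_per_day
    waits = [(-minute) % period for minute in range(MINUTES_PER_DAY)]
    return sum(waits) / len(waits)


now = datetime(2004, 6, 20, 14, 37, tzinfo=timezone.utc)
print(last_publish_slot(now, 4))  # 2004-06-20 12:00:00+00:00
print(average_delay_minutes(4))   # 179.5 (about 3 hours)
print(average_delay_minutes(1))   # 719.5 (about 12 hours)
```

Doubling `slots_per_day` roughly halves the average delay, but whenever the content has changed, each extra slot means one more full-feed fetch per subscriber per day; that is the incremental bandwidth cost the publisher is choosing.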

Anyway, many of these numbers are pulled out of the air…but the point is, most mature aggregators (like NewsGator and NewsGator Online) support the HTTP caching mechanisms, so make sure your feeds support them too. And further, there are things you can do on the server side to manage the bandwidth load, depending on the goals you have for your feed.

Comments welcome as always!

Supernova

For anyone who’s going to be at Supernova, be sure to check out the “Spam and the Future of Email” panel Thursday at 4:30pm…I’ll be on that panel, joined by some folks from Cloudmark, Turntide, and Oddpost. My contribution to the panel will obviously focus on the relationship between RSS and email, and how RSS can potentially help with the spam problem for certain categories of email.

And even if you don’t go to that panel, be sure to find me and say hello if you’re at the conference!

The Uncool Blogging Seminar…

…also known as Blogging for Associations, Non-profits & Content-driven Web Sites, is happening on June 30 in Washington, D.C. Debbie Weil, a wizard of online publishing and marketing, has lined up Komra Moriko to speak. Debbie says there are still a few spots left as of today.

NewsGator Technologies is pleased to sponsor the seminar, and is giving away a 6-month subscription to NewsGator Online Services (a $65.70 value) to all attendees.

To quote Debbie – “Blogging for business isn’t about being cool…”  Well said!

RSS Webcast

I’ll be participating in an RSS roundtable webcast on Thursday morning at 10:00am PDT. It’s free – if you’d like to listen in, you can register here. On the call will be Mark Fletcher, Mitch Ratcliffe, and Harry Hayes. [the registration page says differently, but I believe this is the more up-to-date information]

Interview

Continuing his series, Harold Check has published an interview with me on his weblog – The New Net Architects, Part IV – Greg Reinacker. A quote from the interview:

Syndication will change things. For a taste, look at the early adopters of syndication technology, and how much they say it has changed their online habits, and how much it has changed their lives.

I very much enjoyed reading his earlier installments of this series with Luke Hutteman, Mark Fletcher, and Brent Simmons…check them all out!

Reminiscing

I was just flipping through some old posts here on my blog. It’s been fun – a lot of it really takes me back to what was going on at the time. Like an online diary. I know, I know, people have been reflecting on this for ages, but I finally have posts that are old enough it’s fun to go back and read them :-).

I’m going to try to start writing some more here, as soon as things calm down a little. Some of my favorite posts here have been about non-work-related stuff…so hopefully I can squeeze some more of that in too.

Some of my favorites –

Semiconductor physics – to those who know me well, you know this has to be a favorite. :-)

Most eligible bachelorettes – darn it, neither Sandra nor Britney called.

Valvoline runoffs (1, 2, 3, 4, 5) – tidbits from one of the highlights of my racing career thus far, the 2002 SCCA national championship.

First race at Arizona Motorsports Park – I’m glad I wrote about this…since, as it turns out, it was the last race I would get to go to there before the track closed.

Driving (1, 2) – musings on understeer/oversteer, and front/rear wheel drive. Some of the most fun I’ve had writing on my weblog…

Travel by Messenger

It’s not all that often that I post about some new technology that I see, but today’s going to be an exception.

As some of you know, I used to do development consulting in the travel business. Well, today I was fortunate enough to get a glimpse of a new application from TravelMessenger, the likes of which I haven’t seen before. It’s basically an automated travel agent, which you interact with via MSN Messenger:

Very interesting. It’s got some quirks at the moment, but it’s a great demo. I’m not completely convinced I would book a trip this way, but maybe – it’s definitely got potential. With some good AI work, I think this could become something very different from what most of us are used to – in a good way. I’m told this will go into a limited beta next week…

Phillip Torrone’s video feed

Many of you probably know Phil Torrone – the guy behind flashenabled.com, who always seems to have cooler toys than the rest of us. :-)

Phil’s built a video feed, optimized for NewsGator Media Center Edition. In Phil’s own words:

“…basically, with this, i have my own tv station.”

He works for Fallon Worldwide, which is the company behind BMW Films…which is good for us, because Phil’s feed even includes his favorite episode from the BMW Films series! Here’s a screenshot running in NewsGator Media Center Edition:

Nice job, Phil! For any of you already running NewsGator Media Center Edition, Phil’s feed can be added from the “Featured Feeds” section.

Orbz in the Arcade

This is so cool. A while back, I posted something about my friend Justin Mette, who runs an independent game studio. Their claim to fame, so far, is Orbz – it was even listed in PC Magazine as one of the best games of 2003, in very good company alongside Halo and Madden.

Well, pretty soon we’re all going to kiss our quarters goodbye:

Today, 21-6 is thrilled to announce that we have signed a deal with TLC Industries to put Orbz in the Arcade! The arcade version will contain a new mode of play with the focus on cumulative scoring across many levels. If all goes well, Orbz will be in the arcades by summer. [21-6 Productions News]
Congrats to the 21-6 team!