Atom feeds and well-formed XML

Nick and Brent (here and here) have announced that FeedDemon and NetNewsWire will have strict parsing for Atom feeds, meaning non-well-formed feeds will not parse in those products. I totally understand their position, and sympathize with their arguments. However…

NewsGator 2.0, and all of the NewsGator editions being shipped as part of NewsGator Online Services on January 19, will parse Atom feeds using a very similar parser to that used for RSS; that means that most “questionable” feeds (of which there are a LOT) will parse ok.

The vast majority of our customers don’t care about well-formed XML – they care about getting information. Our tools are designed to make that happen.
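
As a rough illustration of the difference between strict and forgiving parsing (this is not NewsGator's actual parser), here is a minimal Python sketch. The helper name `parse_feed` is hypothetical, and the sketch assumes the feed was meant to be UTF-8: it tries a strict parse first and, if that fails, retries once after replacing bytes that are not valid UTF-8.

```python
import xml.etree.ElementTree as ET


def parse_feed(raw: bytes):
    """Return (root_element, was_repaired) for a feed document.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict path: any well-formedness error raises ET.ParseError.
        return ET.fromstring(raw), False
    except ET.ParseError:
        pass
    # Forgiving path. Assumption: the feed was meant to be UTF-8, so
    # undecodable bytes are replaced with U+FFFD before one more attempt.
    text = raw.decode("utf-8", errors="replace")
    # This still raises ET.ParseError if the markup itself is broken.
    return ET.fromstring(text), True
```

A strict-only reader would stop at the first `ParseError`. The forgiving version keeps the `was_repaired` flag, so the application can still tell that the feed was not well-formed.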

25 thoughts on “Atom feeds and well-formed XML”

  1. Dave Winer

    It’s perhaps true that they don’t care about well-formed XML (I think they should, btw) but they do care about being locked into using one tool because new developers can’t get into the market because you’ve had time to adapt to all the bugs in content, and of course the content has had to adapt to your bugs. It’s why we’re in this terrible situation with Web browsers. It’s why *users* are in that terrible situation.

    The problem with your position is that it depends on users being uninformed. In 2004, using the technology you’re selling, if they’re uninformed, it’s just a temporary thing. These people soak up information, and it is in their interest that content be well-formed and valid, and if we’re patient they will see that, and will reward developers who don’t erect unnecessary barriers to user choice.

    I’ve been around this block several times, and believe I know how the story ends, and while I don’t have a crystal ball, I’m pretty sure you’re pursuing a losing strategy. It may take some time to shake out, but it always does.

  2. Mark

    It is left as an exercise for the reader to figure out how users are supposed to stay informed if their aggregator refuses to display the information they paid for it to display.

  3. Sérgio Nunes

    That’s how HTML started… and now we have hacks for each browser :-/

    How will you inform users that what they are seeing is not the original content?

  4. Sérgio Nunes

    The pressure to produce valid feeds is not on the consumers (they don’t care and should not have to care); it’s on the tools.

  5. Stefan Tilkov

    As a user, I totally disagree that I benefit from applications that accept crappy XML input. Product developers will waste their time supporting lots of invalid feeds instead of focusing on new features; not being able to use standard XML tools will raise the barrier for new companies that enter the market.

    Using XML without requiring it to be well-formed is absolutely, totally counterproductive.

  6. Bryant

    What terrible situation with Web browsers? The terrible situation of having a full-featured robust Web browser named Safari available on the Mac, and the terrible situation of having a pretty decent Web browser named Firebird available on all platforms?

    IE’s dominance doesn’t have anything to do with the difficulty of writing a browser. It’s due to the fact that they’ve convinced a lot of people to write content that relies on Windows-only features, and the fact that it’s bundled with the OS.

  7. Greg Reinacker

    Wow, who knew this would generate so much controversy. Here’s the thing. I personally see most of the support questions that come in about NewsGator, and a good portion of those are about feeds that won’t parse. Until we tell them otherwise, customers assume it’s a bug in NewsGator…when they learn it’s a badly-formed feed, they’re happier with NewsGator, but their primary problem is still there – they can’t read the content they want.

    And there are a TON of not-quite-correct feeds out there, mostly with encoding errors. There are a couple of blog publishing tools in particular that are the worst about this – I won’t name them here, but I see a ton of feeds with encoding problems that won’t parse without a little help.

    My Mom doesn’t care if a feed is valid XML, and doesn’t want to have to email 30 different people about fixing it before she can read the content. If it doesn’t parse, she’s mad – at NewsGator, and at the publisher. We do our best to make sure she can read it.

  8. Jeremy Gray

    This is a pretty slippery slope, indeed, and I think it’s best for me to say that if it’s not well-formed, it’s not XML. Similar arguments could also be made at the schema level, but as an RDFer I’m more flexible in that regard.

    re: “The vast majority of our customers don’t care about well-formed XML – they care about getting information.”

    Your customers surely care about getting information, but do not care about having to pay your developers and testers to work on handling of tag soup instead of on performance, stability, and compelling new features.

  9. Patrick

    Much has been made of the “mistakes of the past” with respect to browsers parsing malformed HTML and the browser hacks that resulted. What about how browsers tend to render things differently even in strict mode? Take a look at any “modern” browser and you will find differences in rendering that are not related to parsing malformed HTML. My point is that even if browsers had enforced strict HTML compliance from the start, we would still see the same “problem” of browser hacks, so it is unfair to compare that situation to the current discussion.

    Let’s look at DVD players for a moment though. If you buy a DVD player and then you go buy the latest Criterion Edition of Plan 9 From Outer Space, which just happens to add extra features outside of the DVD spec, and it doesn’t play…do you blame the content producer or the player? Are consumers supposed to be educated about DVD specs and not buy from those that don’t obey the spec? I use DVDs as my example because I know a person who works on making players. They go out and buy every XXX DVD they can find and test their players on them since those types of DVDs tend to live outside the lines of the spec to add “features.”

    Just another mental exercise…

  10. Greg Reinacker

    To pb, yes – your posts are being deleted. And if one post is deleted, then reposting the same thing will certainly get deleted again.

    If you don’t understand why I’m deleting your comments, or you think I should allow them to remain here, feel free to contact me privately about it.

  11. Danny

    If the Atom specification states that you must reject invalid feeds, will you a) not make any claims about supporting Atom, or b) mislead your users?

    Patrick, I suspect the specifications for DVD formats are considerably more demanding than producing well-formed XML.

  12. Greg Reinacker

    Come on folks, let’s not get bogged down in lawyer-speak in specs. The Atom spec might say something like “if it’s not well-formed XML, it’s not an Atom feed”. Ok, fine…we’ll parse EVERY valid Atom feed.

    We’ll also parse a couple of feeds that aren’t Atom, per the spec, but that we can make some sense out of for the benefit of our users.

  13. Atom & XML

    The respective authors of NetNewsWire and FeedDemon announced this week that their Atom support will be strictly XML-compliant. This means that Atom feeds will have to be valid XML in order to be read in these two aggregators, and that …[more]

  14. Mark

    Danny, your home page is currently invalid XHTML because of… wait for it… XML wellformedness errors.

    Luckily for you, you serve it as text/html and my browser is very forgiving, so I was able to read what you had to say. You are free to fix the problem at your own convenience. Imagine how frustrating publishing would be for you if this sort of silly mistake was treated as a critical error.

    Apparently XML is more demanding than you thought.

  15. MB

    Here’s an idea:

    Change your User Agent to indicate that the last time you read the feed, it appeared invalid. (e.g. instead of “NewsGator”, use “NewsGator-FeedCorrectionMode”). That way the feed producers will see the error. The user might never see anything, though you could put the ‘feed corrected’ info in a custom Outlook column so interested users could search on it.

    (Based on a comment from Raymond Chen on Robert Scoble’s blog.)
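
A minimal sketch of this idea in Python, assuming a hypothetical helper `build_feed_request` and a placeholder feed URL. The two User-Agent strings are the ones suggested in the comment above; nothing here reflects how NewsGator actually builds its requests.

```python
import urllib.request

# User-Agent strings taken from the suggestion above.
CLEAN_AGENT = "NewsGator"
CORRECTION_AGENT = "NewsGator-FeedCorrectionMode"


def build_feed_request(url: str, last_parse_was_clean: bool) -> urllib.request.Request:
    """Build a request whose User-Agent tells the publisher whether the
    previous fetch of this feed needed correction (hypothetical helper)."""
    agent = CLEAN_AGENT if last_parse_was_clean else CORRECTION_AGENT
    return urllib.request.Request(url, headers={"User-Agent": agent})


# Placeholder URL; the last fetch of this feed needed correction.
request = build_feed_request("http://example.com/atom.xml", last_parse_was_clean=False)
```

The publisher would then see the correction-mode agent in the server logs without the reader ever being interrupted.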

  16. XML and Information Theory

    Greg Reinacker stirred up a controversy by announcing that he would update the Atom parser in NewsGator to parse and display badly formed XML. Those who criticize Greg’s decision point to the fact that if parsers reject badly formed…[more]

  17. Allyn

    I know this isn’t for newsgator support, but figured the post would be relevant.

    Has anyone had problems with newsgator 2.0 and google groups feeds?

    GG uses atom, but newsgator doesn’t see it as valid. I can’t believe google, of all sites, would have badly formed atom/xml, but who knows…
