Only 4.13% of the Web is Standards Compliant?

Browser maker Opera recently conducted a study to see how much of the web is standards compliant. Using a specialized web crawler, dubbed “MAMA” for “Metadata Analysis and Mining Application,” that searches around 3.5 million pages, the company has determined that a mere 4.13% of the web is standards compliant.

Of course, one wonders about the accuracy of this study. There are certainly more than 3.5 million pages on the internet. Perhaps they were only searching a portion of the web that had fewer valid pages? And does a site with 100 non-compliant pages count as 100 invalid pages? How many of those sites are invalid because they try to comply with Microsoft’s bogus standard (a.k.a. the “does it look alright in IE?” standard) at the same time?

I can understand the small figure, and maybe it is realistic. After all, many a website almost validates, such as Reddit.com, which has one lone (and minor) error stopping it from validating. And heck, Google and Amazon are validity-challenged. Amazon has “1445 Errors, 135 warning(s)” on its front page.

Many monolithic sites that you’d think would validate don’t, though they look fine in most browsers anyway. This brings up an interesting question: Does it matter whether you meet the standard to the letter, or is it okay if it looks fine in all of the standards-compliant browsers? What’s your opinion?

News article: Opera study: only 4.13% of the web is standards-compliant

Interesting Reddit Discussion: http://www.reddit.com/r/programming/comments/77grk/

  • http://stevenclark.com.au Steven Clark

    I think a large part of this issue was that originally nobody had doctypes (say 4 – 5 years or more back in history), so a PhD researcher in Europe discovered at the time that only 1% of sites carried doctypes. Then pressure was put onto the manufacturers of WYSIWYGs in particular, and CMS developers, to insert doctypes by default. The problem with that is that people not even aware of what a doctype was simply produced the same junk – doctypes present, but mostly invalid the moment content went on the page. So doctypes, the trigger for browsers to measure standards compliance, were kind of perverted at that point, and the web was littered with doctypes that really didn’t mean compliance anymore. So it would be interesting to see what was defined as compliance by Opera in this case, as you said. Is that doctype presence, or validity of the document against its doctype? Personally I think Sturgeon’s Law applies to the Web like anything else – 99% of everything is crap. It’s always going to be unrealistic to believe that the web will provide total conformance. 99% of people whacking up pages will not care – but professional authors should care and should create quality products. Just like someone fixing their own car might not care about some detail that I would totally expect a qualified mechanic to do without question. Just my 2 cents.

  • http://www.webmaster-source.com Matt

    @Steven A good argument. I’d like to learn more about this study as well. I did a little bit of digging, and found Opera’s full report, which is available here: http://dev.opera.com/articles/view/mama/ I haven’t looked through it thoroughly myself yet though.

    One part I thought was funny:

    Another related statistic MAMA uncovered was that, of the number of sites proudly displaying “W3C validation badges”, only ~50% of them actually validate.

  • http://stevenclark.com.au Steven Clark

    Yes, I saw that, which is probably more positive than it sounds.

    I’m very big on validation as a development methodology tool: with validating code it is much easier to find and fix cross-browser CSS issues. But I find every new person I work with agrees with the idea but doesn’t appreciate it until they come knocking on the office door looking for answers to why their layout broke. Without the validating I just pull my hair out.

    The second part of that is that nearly everything you pass over to your client in a CMS as valid code ends up invalid the moment they start to edit, upload images and play around. One of the hidden costs of a CMS, I guess. So I was impressed a whopping 50% remained valid. I probably wouldn’t badge a client’s CMS solution though. I more often badge blogs so that, for ease, one can post and then validate. Of course, a simple Firefox plugin is a better solution than the badge for this purpose.

  • http://www.webmaster-source.com Matt

    @Steven, I use the fabulous Web Developer Toolbar for my validating needs. It does the job well for my purposes. Do you use the same extension or something similar?

    As for the CMSes invalidating pages, they don’t *have* to, though out of the box it often happens. If you’re careful with theme building, and about what plugins you install, it is possible to keep the markup valid. It’s a right pain in the neck though. I mainly content myself with having the original template (before CMS-ifying it) be valid, and then I just make sure it still works among all the major browsers (esp. standards-compliant browsers). I’d like to improve upon that methodology though. I’d like to have Webmaster-Source be valid all-around after the next redesign, though I don’t know if it will be feasible or not.

  • http://stevenclark.com.au Steven Clark

    I use the Web Developer Toolbar but also have a bunch of other developer Firefox plugins that come in useful. The HTML Validator plugin sits on the bottom right of my PC screen and shows a tick or red cross on every page I look at. If I ever view source, the invalid code is highlighted. I also use YSlow for performance measurement, and Firebug, but there are a lot out there.

    The problem with a CMS is really the client you pass it to. Often they are ill-equipped or don’t care, so a couple of content pages later the page is invalid. And I’ve found over time that it’s too costly to keep a “free eye” on them. But at least what I provide is valid when they receive it and it’s originally populated with content.

    I find the biggest benefit of validation is in helping me debug code and speeding up my cross-browser compatibility work. Trying to debug CSS with invalid code can be a nightmare. If it comes down to the crunch, though, I guess it’s always better to have something work for everyone than to be strictly valid.