Tag Archives: (x)html

Google Canonical URLs

Finally, our duplicate content worries are over! Google now supports a new method to specify a canonical URL for your page. This “hint” suggests that Google treat this page as the original and ignore duplicates elsewhere on your domain.

You simply add the fully W3C-compliant <link> tag to your page’s <head>, and have it point to the permalink for a given post. Google will most likely rank that page in its results and ignore the others, which should help your ranking overall.

<link rel="canonical" href="http://www.example.org/your/permalink/page/" />

Obviously you’ll want some way to integrate this with your CMS. Some will want to roll their own solution, but if not, there are already prefab options available.
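
If you do roll your own, a minimal sketch in PHP might look like the following. The function name canonical_link_tag is made up for illustration, and it assumes you already have the post’s permalink as a string. Note that the URL goes in the href attribute, and that it is escaped so an ampersand or quote in the URL cannot break the tag.

```php
<?php
// Hypothetical helper: builds a canonical <link> tag for a given permalink.
function canonical_link_tag(string $permalink): string
{
    // Escape the URL so characters like & and " stay inside the attribute.
    $href = htmlspecialchars($permalink, ENT_QUOTES, 'UTF-8');

    return '<link rel="canonical" href="' . $href . '" />';
}

// Print the tag wherever your template outputs the page's <head>.
echo canonical_link_tag('http://www.example.org/your/permalink/page/'), "\n";
```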

Markdown in PHP

HTML isn’t exactly easy for ordinary people to comprehend and use correctly, and allowing its use in web forms means taking measures to prevent malicious code from being inserted.

The infamous John Gruber came up with Markdown, with the help of Aaron Swartz, and whether they intended it or not, they came up with the solution to our problem. Using Markdown formatting, you can make text italic by putting an *asterisk* on either side, or bold by using **two**. Blockquotes are as simple as putting a “>” before a paragraph of text. Links are a little more complicated, but they’re still easier for the average user than straight HTML.

Markdown is a nifty solution for allowing users of a website to format their input, and it’s gained a good measure of popularity. Reddit is one site that makes use of it for its comment forms.

Markdown-enabling a website isn’t too hard for someone with a bit of coding experience. You first need to find an implementation for the language of your choice, unless you want to write your own. Daring Fireball has a Perl implementation right on the Markdown homepage, but what if you’re like me and prefer PHP? Download a copy of PHP Markdown. The script functions like an ordinary PHP library, or as a WordPress plugin, enabling you to use Markdown in comments and the Post Editor.

Using Markdown in your own PHP script is as simple as including markdown.php and passing any Markdown-formatted text through a function to convert it to straight HTML.

// Load the PHP Markdown library.
include_once "markdown.php";
// Convert the Markdown-formatted text to HTML.
$my_html = Markdown($my_text);

I would also recommend first running the text through PHP’s strip_tags function to remove any HTML tags someone may have put in.
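
Put together, the two steps could be sketched roughly like this. It assumes markdown.php from PHP Markdown sits in the same directory as the script. The helper name render_user_markdown is made up for illustration, and the plain-text fallback is only there so the snippet still runs if the library is missing.

```php
<?php
// Assumption: markdown.php (PHP Markdown) lives next to this script.
if (is_file(__DIR__ . '/markdown.php')) {
    include_once __DIR__ . '/markdown.php';
}

// Hypothetical helper: strip raw HTML first, then convert Markdown to HTML.
function render_user_markdown(string $text): string
{
    $clean = strip_tags($text);

    // If the library was not loaded, fall back to escaped plain text.
    if (!function_exists('Markdown')) {
        return htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');
    }

    return Markdown($clean);
}

$my_text = "Some *italic* and **bold** text.<script>alert('hi');</script>";
echo render_user_markdown($my_text), "\n";
```

The order matters here: the tags are stripped from the raw input, so the only HTML left in the result is what the Markdown conversion itself produces.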

Or you could use the WMD Editor, which applies a JavaScript formatting bar to an input form, allowing the contents to be formatted with Markdown. It then spits out full HTML for the form when it is submitted.

H1 or H2? How to Handle Headers?

In (X)HTML, there are several levels of headings: h1, h2, h3, h4, and so on down to h6. The proper way to use them is structurally: a post title gets a lower number (e.g. h1) than the subheadings in the post’s content, which use the next level down (h2), and subheadings below those use h3.
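
As a small illustration of that nesting, here is a sketch in PHP. The heading_tag helper is made up for this example, and it assumes the post title sits at the top of the outline as the h1, which is only one of the possible starting points.

```php
<?php
// Hypothetical helper: wraps text in a heading tag, clamping the level to h1..h6.
function heading_tag(int $level, string $text): string
{
    $level = max(1, min(6, $level));
    $safe  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<h{$level}>{$safe}</h{$level}>";
}

// Assumption: the post title is the top of the outline.
echo heading_tag(1, 'Post title'), "\n";
echo heading_tag(2, 'A subheading in the post'), "\n";
echo heading_tag(3, 'A subheading below that'), "\n";
```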

Search engines are very big on the h-tags, treating content in h1s and h2s as more important than most other text.

The question is, where should that structure start? Information on the semantics of heading tags is very mixed, and can be hard to sort through.


Only 4.13% of the Web is Standards Compliant?

Browser maker Opera recently conducted a study to see how much of the web is standards compliant. Using a specialized web crawler, dubbed “MAMA” for “Metadata Analysis and Mining Application,” that searches around 3.5 million pages, the company has determined that a mere 4.13% of the web is standards compliant.

Of course, one wonders about the accuracy of this study. There are certainly more than 3.5 million pages on the internet. Perhaps they were only searching a portion of the web that had fewer valid pages? And does a site with 100 non-compliant pages count as 100 invalid pages? How many of those sites are invalid because they try to comply with Microsoft’s bogus standard (a.k.a. the “does it look alright in IE?” standard) at the same time?

I can understand the small figure, and maybe it is realistic. After all, many a website almost validates, such as Reddit.com, which has one lone (and minor) error stopping it from validating. And heck, Google and Amazon are validity-challenged. Amazon has “1445 Errors, 135 warning(s)” on its front page.

Many monolithic sites that you’d think would validate don’t, though they look fine in most browsers anyway. This brings up an interesting question: Does it matter whether you meet the standard to the letter, or is it okay if it looks fine in all of the standards-compliant browsers? What’s your opinion?

News article: Opera study: only 4.13% of the web is standards-compliant

Interesting Reddit Discussion: http://www.reddit.com/r/programming/comments/77grk/