Tag Archives: (x)html

Building iTunes Links

Have you ever wanted to link to an iTunes page? It’s easy enough to copy a long, nasty-looking URL like http://itunes.apple.com/us/album/yours-truly/id310905907 right from the application by right-clicking on the album art or title. This works well enough if you’re manually linking to an album or iPhone app from a blog post, but what if you need something a bit more user friendly?

It’s not documented terribly well, but you can create much nicer short “search” links using the itunes.com domain. Here are a few examples:

You can even combine the search links with your iTunes affiliate ID, if you know how to do it right. It looks something like this:

http://itunes.com/artist/album?partnerId=30&siteID=YOUR_AFFILIATE_ID

Now for the fun part… Wouldn’t it be neat to be able to generate the links automatically? I’m doing that over at Folk Music Site (a fun project I put together last summer, as I thought it would be neat to have a directory of noteworthy folk musicians). Here is some simple PHP that will generate iTunes links on-the-fly:

function get_itunes_link($artist, $album) {
 $affiliate_id = ''; //Linkshare affiliate ID for iTunes
 $artist = clean_itunes_string($artist);
 $album = clean_itunes_string($album);
 $link = "http://itunes.com/{$artist}/{$album}?partnerId=30&siteID={$affiliate_id}";
 return $link;
}

function clean_itunes_string($string) {
 $string = strtolower(str_replace(' ', '', $string)); //Remove spaces and make everything lower-case
 $string = str_replace('&', 'and', $string); //Replace ampersands with the word 'and'
 $remove = array('!', '¡', '"', '#', '$', '%', '\'', '(', ')', '*', '+', ',', '\\', '-', '.', '/', ':', ';', '<', '=', '>', '¿', '?', '@', '[', ']', '^', '_', '`', '{', '|', '}', '~', '©', '®', '™');
 $string = str_replace($remove, '', $string); //Remove special characters that iTunes doesn't like
 return $string;
}

echo get_itunes_link('Feist', 'The Reminder');

Note: iTunes links should not have accented characters like á or ü. They should be replaced with their base ASCII counterparts (“a” and “u” respectively). The above code does nothing to sanitize these characters, so you might run into problems unless you write some additional logic to handle that. If you’re using a framework like Kohana, you might have a convenient helper function like utf8::transliterate_to_ascii().
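
As a rough sketch of that additional logic, here is a JavaScript port of the two functions above that also strips accents. The names cleanItunesString and getItunesLink are just illustrative ports of the PHP, and I'm assuming Unicode normalization is good enough for your artist names. Letters with no decomposition (ø or ß, for instance) would simply be dropped, so they still need their own handling.

```javascript
// Hypothetical JavaScript port of clean_itunes_string(), with accent stripping added.
function cleanItunesString(value) {
  return value
    .normalize('NFD')                 // split "á" into "a" plus a combining accent
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accents
    .toLowerCase()
    .replace(/&/g, 'and')             // ampersands become the word "and"
    // Assumption: anything that is not a-z or 0-9 is removed, which is
    // stricter than the explicit character list in the PHP version.
    .replace(/[^a-z0-9]/g, '');
}

// Hypothetical port of get_itunes_link(); affiliateId is optional.
function getItunesLink(artist, album, affiliateId) {
  var path = cleanItunesString(artist) + '/' + cleanItunesString(album);
  return 'http://itunes.com/' + path +
    '?partnerId=30&siteID=' + encodeURIComponent(affiliateId || '');
}
```

Calling getItunesLink('Feist', 'The Reminder', 'YOUR_AFFILIATE_ID') builds the same kind of link as the PHP version, and an artist like "Sigur Rós" comes out as "sigurros".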

MediaElement.js — HTML5 Video Player With Flash Backup

Many modern web browsers have early support for the <video> and <audio> elements in the HTML5 spec. Unfortunately, their implementations vary depending on the ideals of the various browser developers. Safari expects video to be encoded in the high-quality H.264 codec, while other browsers prefer Ogg Theora. Google is trying to push its own freely-licensed VP8 codec, which Mozilla is showing signs of adopting. Then there’s Internet Explorer, which doesn’t support the <video> element at all.

Thankfully, there’s a way to fairly easily support everything. You can offer HTML5 video in one or more formats and fall back on Silverlight or Flash if necessary.

MediaElement.js allows you to do that with a little bit of jQuery voodoo. After including all of the required files, you can serve up an H.264 video for Safari and iPhone/iPad users like so:

<video src="myfile.mp4" type="video/mp4" width="640" height="360"></video>
<script>
jQuery(document).ready(function($) {
$('video').mediaelementplayer();
});
</script>

There is also a way to specify more than one video type in the <video> element, if you have re-encoded it into more than one codec:

<video width="640" height="360">
<source src="myfile.mp4" type="video/mp4" >
<source src="myfile.ogg" type="video/ogg" >
<source src="myfile.webm" type="video/webm" >
</video>
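
To get a feel for the decision that has to be made here, this is a minimal sketch of picking a playable source with the standard canPlayType() method. The pickSource helper and the shape of the sources array are my own illustrative assumptions; this is not how MediaElement.js is implemented internally.

```javascript
// Hypothetical helper: returns the first source the given <video> element
// says it can play, or null if a Flash/Silverlight fallback is needed.
function pickSource(videoElement, sources) {
  if (!videoElement || typeof videoElement.canPlayType !== 'function') {
    return null; // no HTML5 video support at all
  }
  for (var i = 0; i < sources.length; i++) {
    // canPlayType() returns "", "maybe" or "probably"
    if (videoElement.canPlayType(sources[i].type) !== '') {
      return sources[i];
    }
  }
  return null;
}

var sources = [
  { src: 'myfile.mp4', type: 'video/mp4' },
  { src: 'myfile.ogg', type: 'video/ogg' },
  { src: 'myfile.webm', type: 'video/webm' }
];
```

In a browser you would call pickSource(document.createElement('video'), sources) and only reach for a plugin when it returns null.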

You will want to check it out if you’re interested in cross-browser compatible web video.

Remove Default Textarea Scrollbars in Internet Explorer

Have you ever noticed that Internet Explorer, with its great wisdom and intelligence, likes to add a useless scrollbar to the side of every HTML textarea? Most browsers add one when it is needed, but IE adds it in right away.

There’s an easy CSS fix though:

textarea {
   overflow: auto;
}

Loading JavaScript Asynchronously

BuySellAds and Google Analytics, in an attempt to make the internet faster, recently changed the code snippets they use for serving ads and tracking visitors, respectively, to be non-blocking and asynchronous. This means that the scripts won’t hold up the rendering of your pages while they load. If you reload this page, as an example, the ads to the right may actually appear after the rest of the page has finished loading. It gives the appearance of being a lot faster.

How can you load your own scripts asynchronously? Here’s an example I put together after dissecting Google’s and BuySellAds’ scripts:

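<script id="myscript" type="text/javascript">

(function() {
 // Build a new <script> element for the file we actually want to load
 var myscript = document.createElement('script');
 myscript.type = 'text/javascript';
 myscript.async = true; // ask the browser not to hold up rendering for it
 myscript.src = 'http://example.org/myscript.js';
 // Insert it just before this loader script
 var s = document.getElementById('myscript');
 s.parentNode.insertBefore(myscript, s);
})();

</script>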

It dynamically creates a <script type="text/javascript" src="..."> DOM element and inserts it just before the loader script. You could alter the insertBefore part to append the loaded script to the <head> element if you’re so inclined.
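
If you'd rather go the <head> route, a reusable version might look something like this. It's only a sketch: loadScript is a made-up name, and I'm passing the document in as a parameter (an assumption on my part) so the function doesn't depend on a global.

```javascript
// Hypothetical helper; in a page you would call loadScript(src, document).
function loadScript(src, doc, onLoad) {
  var script = doc.createElement('script');
  script.src = src;
  script.async = true; // do not block parsing while the file downloads
  if (typeof onLoad === 'function') {
    script.onload = onLoad; // fires once the script has loaded
  }
  // Append to <head> instead of inserting before the loader script
  var head = doc.head || doc.getElementsByTagName('head')[0];
  head.appendChild(script);
  return script;
}
```

For example, loadScript('http://example.org/myscript.js', document, function () { /* script is ready */ }); would load the same file as the snippet above and let you know when it's available.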

You can read further on this technique from these sources:

PHP Simple HTML DOM Parser

It’s always fun to obtain data from REST APIs and parse the XML or JSON response. Twitter, for sure, wouldn’t be what it is today if not for the thriving community of developers building applications that tie in with the API.

But what do you do when you need to obtain information from a site that doesn’t have an API, or at least an RSS feed that you could dump into SimpleXML? You scrape the page. There are numerous methods of doing that, such as using file_get_contents() and passing the resulting HTML to Tidy (to convert everything to strict XHTML) before invoking SimpleXML.

One of the simplest options is S.C. Chen’s PHP Simple HTML DOM Parser. Once you include the PHP library, you gain access to a set of functions that lets you read and modify HTML content with jQuery-like selectors.

Here is an example of scraping Slashdot headlines:

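// Load the library (simple_html_dom.php from the project download)
include 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

// Find all article blocks and print each headline on its own line
foreach ($html->find('div.article') as $article) {
 echo $article->find('div.title', 0)->plaintext . "\n";
}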

As usual, with great power comes great responsibility. There are certain ethical guidelines to data scraping. Don’t steal articles for republication, use caching so you don’t make too many redundant requests to the target server, credit your source, etc. If you do some Googling, you’ll probably find some relevant articles.

A Standard to Specify a Canonical Short Link

There has been a small push to create a standard way for a web page to specify a preferred short link for use in places like Twitter. Something like the rel="canonical" trick that tells search engines which page on your domain is the one that should be indexed. Basically, a meta tag to put in the page header, which could then be read by Twitter applications. The end goal is to help reduce the issue of “link splintering,” where everyone ends up linking to the same page with a different URL. (For instance, I could shorten a link to this page with Is.gd, then three others could create their own different Bit.ly links…)

One proposal is rev=”canonical”, but I really don’t like that option. This comment sums it up pretty well. Rev is too easily confused with rel, and is deprecated in HTML5 to boot. The “canonical” terminology also isn’t fitting, since it implies that the short URL is the preferred URL for the page (i.e. “the short link is preferred over the full one”) rather than an alternate link.

I found it interesting to learn that WordPress 3.0 is going to start automatically including something along the lines of this on permalink pages:

<link rel='shortlink' href='http://fantasyfolder.com?p=32' />

There will be hooks to override it with your own URL (so a plugin could place a single Bit.ly or YOURLS link there on publication), but the URL is irrelevant for the purpose of this discussion. The rel='shortlink' part is what interests me. I think it’s the perfect term to use for this scenario.

I think, whether you use WordPress or not, rel="shortlink" is what you should go with. (If you’re worried about controlling short links, at least.)
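
On the consuming side, a Twitter client would need to pull that value out of the page. Here's a rough sketch of what that could look like. The findShortlink function is hypothetical, and scanning with regular expressions is a simplification I'm using to keep the example short; a real client would be better off with a proper HTML parser.

```javascript
// Hypothetical helper: returns the rel="shortlink" URL from a page's HTML,
// or null if the page does not declare one.
function findShortlink(html) {
  var linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (var i = 0; i < linkTags.length; i++) {
    var rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(linkTags[i]);
    var href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(linkTags[i]);
    // rel can hold several space-separated tokens, so check each one
    if (rel && href && rel[1].toLowerCase().split(/\s+/).indexOf('shortlink') !== -1) {
      return href[1];
    }
  }
  return null; // nothing declared; fall back to the full URL or a shortener
}
```

If the function returns null, the application can carry on doing whatever it does today.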

HTML Text Over an Image

Have you ever seen a site where HTML text is rendered over an image? One example of this is Pro Blog Design’s article headings.

The effect looks good, and it’s search engine-friendly. CSS-Tricks has a tutorial on how to create a similar implementation with CSS absolute positioning.

Basically you create a relatively-positioned DIV and put the image and H2 elements inside. Then you absolutely position the two elements, and add a solid or semi-transparent background behind the heading text.

Text Blocks Over Image [CSS-Tricks]

Securing PHP Web Forms

Chris Coyier has written an interesting article on securing form scripts. Serious Form Security talks about token matching, hack logging, and a few other useful techniques to apply to a form processing script. Token matching is definitely a trick worth learning, since it will do a lot to stop bots from submitting data through your form.

The first thing that we are going to do is generate a “token”, essentially a secret code. This token is going to be part of our “session”, meaning it is stored server side. This token also is going to be applied as a hidden input on the form itself when it is first generated in the browser. That means this token exists both on the client side and the server side and we can match them when the form gets submitted and make sure they are the same.

One of the best (worst?) ways to spam forms is to create a script that uses cURL to send POST requests to the URL listed in the form’s action attribute, with some spammy data in the POST fields. (Or malicious data intended to break your script…) A pseudo-random token generated the way the article describes makes that a lot harder. A cURL script that blindly fires POST requests at the action URL, whether from a command line or an automated job, has no session and no matching token to send along with the form. A more determined bot could fetch the form first and replay the cookie and token, so this raises the bar rather than stopping everything.
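
The article's code is PHP, but the matching logic is the same anywhere. Here is a minimal sketch of it in JavaScript for Node.js. The issueToken and isValidToken names and the plain session object are assumptions of mine, standing in for whatever session mechanism your server actually uses.

```javascript
var nodeCrypto = require('crypto');

// Generate a token, remember it server-side, and return it so it can be
// written into a hidden <input> when the form is rendered.
function issueToken(session) {
  session.formToken = nodeCrypto.randomBytes(32).toString('hex');
  return session.formToken;
}

// Compare the submitted token with the one stored in the session.
function isValidToken(session, submitted) {
  var expected = session.formToken;
  delete session.formToken; // single use, so a replayed POST fails
  if (typeof expected !== 'string' || typeof submitted !== 'string') {
    return false;
  }
  var a = Buffer.from(expected);
  var b = Buffer.from(submitted);
  // timingSafeEqual throws on different lengths, so check that first
  return a.length === b.length && nodeCrypto.timingSafeEqual(a, b);
}
```

A request that arrives without a session, or with a token that doesn't match, gets rejected before any of the form data is processed.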

Open Links in New Windows or Tabs Without Target=”Blank”

Though opening new windows on people isn’t considered good practice, there are legitimate uses for it beyond forcing all of your external links into new windows, such as popping up a window with a Google Map or a Flash game.

I recently came across an interesting post on WP Cult that discussed opening external links in a new window with jQuery. I adapted the jQuery snippet a little and came up with this:

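jQuery(function() {
 jQuery("a.popup").click(function() {
  // "resizable" is the correct spelling; spaces are left out of the
  // feature list because some browsers are picky about them
  window.open(this.href, "popupwin", "status=1,toolbar=0,location=1,menubar=0,width=600,height=450,resizable=1,scrollbars=1");
  return false; // keep the current window from following the link too
 });
});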

If you add the snippet to your page head between <script> tags (after referencing the jQuery library, of course), you can selectively make links open in new windows by adding the class “popup” to the <a> tag.

In addition to avoiding the target=”_blank” attribute, you gain the control of being able to customize window size and chrome. It’s also very accessible to both users and search robots, since the href attribute still points to the usual URL, unlike with some JavaScript solutions that break browser features such as middle-clicking to open a link in a new tab.

Why Does SimplePie Replace Some Characters With Gibberish?

Sometimes when you use SimplePie to load and output an RSS feed, some characters, like quote marks and apostrophes, are replaced with gibberish like €‡™. You may wonder what’s wrong, and search for a way to prevent the unsightly garbage from appearing.

You have an encoding issue.

Most RSS feeds are encoded as UTF-8, as are many web pages. If you try to put SimplePie output on a page that isn’t UTF-8, you’ll get the weird characters.

“But…my page is UTF-8! I have a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> tag in my header!”

Actually, there’s more to it than that. In addition to specifying the charset in your header, your server also has to send the data in UTF-8. If you use Firefox, choose Tools > Page Info from the menu. In the resulting dialog box, note the two references to the encoding and charset.
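
To make the problem a little more concrete, here's a small sketch in JavaScript (for Node.js, since it relies on Buffer). The charsetFromContentType helper is a hypothetical name for reading the charset out of the Content-Type header your server sends, and the second half shows where the gibberish comes from: UTF-8 bytes being read back as a single-byte encoding.

```javascript
// Hypothetical helper: extracts the charset from a Content-Type header value,
// e.g. "text/html; charset=UTF-8". Returns null if none is declared.
function charsetFromContentType(headerValue) {
  var match = /charset\s*=\s*"?([^\s;"]+)/i.exec(headerValue || '');
  return match ? match[1].toLowerCase() : null;
}

// Where the gibberish comes from: one curly apostrophe is three bytes in
// UTF-8, and a single-byte encoding turns each byte into its own character.
var apostrophe = '\u2019';                        // right single quotation mark
var utf8Bytes = Buffer.from(apostrophe, 'utf8');  // bytes e2 80 99
var garbled = utf8Bytes.toString('latin1');       // three unrelated characters
```

If the header your server sends names a different charset than the meta tag, or names none at all, that mismatch is the first thing to fix.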
