Regular Expression to Get All Image URLs From a Page

Have you ever wanted to, while working on some sort of PHP project, get an array listing of all the images used in a chunk of HTML? I’ve been planning out a web app over the past couple months, which will be doing a bit of RSS parsing, and I thought it would be nice to do just that, when it came time to start coding. Suppose you were going to show a summary of an article from a feed, with a link to the original source. Wouldn’t it look better if you pulled an image from that article, scaled it down if necessary, and displayed it next to it? (Caching it, of course. Hotlinking == bad.)

I was reading an article from Cats Who Code and, lo and behold, there was a code snippet that did just that with a regular expression. (I decided to file it away to save time in the future.)

$images = array();
preg_match_all('/(img|src)=("|')[^"'>]+/i', $data, $media);
unset($data);
$data=preg_replace('/(img|src)("|'|="|=')(.*)/i',"$3",$media[0]);
foreach($data as $url)
{
$info = pathinfo($url);
if (isset($info['extension']))
{
if (($info['extension'] == 'jpg') ||
($info['extension'] == 'jpeg') ||
($info['extension'] == 'gif') ||
($info['extension'] == 'png'))
array_push($images, $url);
}
}

Source: 15 PHP regular expressions for web developers

Feed-parsing is an excellent use for this, as you have just the article, no layout-related imagery, like you would see if you were screen-scraping a web page to obtain the image URLs. Though I imagine Digg takes the latter route when they dig-up (Freudian pun unintended, honest) the thumbnails that go along with their news links.