Tag Archives: XML

Building an iPhone App to Parse the Twitter API with NSXMLParser

iOS has a simple event-based XML parser built in, which makes it fairly easy to do less involved parsing operations without having to load up a third-party framework. This tutorial will show you how to build a simple iPhone application that will download an XML feed from Twitter containing a user’s tweets, and then display them with a pretty UI. (You could easily adapt this to parse other XML documents, such as RSS feeds.)

PHP Simple HTML DOM Parser

It’s always fun to obtain data from REST APIs and parse the XML or JSON response. Twitter, for sure, wouldn’t be what it is today if not for the thriving community of developers building applications that tie in with the API.

But what do you do when you need to obtain information from a site that doesn’t have an API, or even an RSS feed that you could dump into SimpleXML? You scrape the page. There are numerous methods of doing that, such as using file_get_contents() and passing the resulting HTML to Tidy (to convert everything to strict XHTML) before invoking SimpleXML.

One of the simplest options is S.C. Chen’s PHP Simple HTML DOM Parser. Once you include the PHP library, you gain access to a set of functions that lets you read and modify HTML content with jQuery-like selectors.

Here is an example of scraping Slashdot headlines:

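// Load the library (adjust the path to wherever you saved it)
include 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

// Find all article blocks and print each headline on its own line
foreach ($html->find('div.article') as $article) {
    $title = $article->find('div.title', 0);
    if ($title) {
        echo $title->plaintext, "\n";
    }
}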

As usual, with great power comes great responsibility. Data scraping comes with certain ethical guidelines: don’t steal articles for republication, use caching so you don’t make too many redundant requests to the target server, credit your source, etc. If you do some Googling, you’ll probably find some relevant articles.
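
The caching advice applies in any language. Here is a minimal sketch of the idea in JavaScript (the language of the jParse example on this page), assuming a simple in-memory, time-based cache. The names createCachedFetcher, fetchPage and ttlMs are made up for illustration, and the fetcher is a stand-in that makes no real network request:

```javascript
// Minimal time-based cache: reuse a fetched page until it is older than ttlMs.
// `fetchPage` and `now` are injected so the sketch runs without network access.
function createCachedFetcher(fetchPage, ttlMs, now = Date.now) {
  const cache = new Map(); // url -> { body, fetchedAt }

  return function getPage(url) {
    const entry = cache.get(url);
    if (entry && now() - entry.fetchedAt < ttlMs) {
      return entry.body; // still fresh: no request to the target server
    }
    const body = fetchPage(url); // stale or missing: fetch once and remember
    cache.set(url, { body, fetchedAt: now() });
    return body;
  };
}

// Usage with a stand-in fetcher that counts how often the "server" is hit.
let requests = 0;
let clock = 0; // fake clock in milliseconds, advanced by hand below
const getPage = createCachedFetcher(
  (url) => { requests += 1; return '<html>' + url + '</html>'; },
  15 * 60 * 1000, // keep pages for 15 minutes (an arbitrary choice)
  () => clock
);

getPage('http://slashdot.org/');
getPage('http://slashdot.org/'); // served from the cache
clock += 16 * 60 * 1000;         // pretend 16 minutes have passed
getPage('http://slashdot.org/'); // cache expired, fetched again
console.log(requests);
```

A real scraper would more likely persist the cache to disk or a database so it survives between runs, but the freshness check would look much the same.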

jParse: A jQuery XML Parser Plugin

jParse is a jQuery plugin that can asynchronously fetch an XML file (AJAX, in other words) and parse it for display. It works in all modern browsers, plus Internet Explorer 6+, and the file is only 1.8 KB in size. Its basic usage looks something like this, where #item-cont is the element that the XML content will be displayed in:

$('#item-cont').jParse({
    ajaxOpts: {url: 'digg-feed.xml'},
    count: '#item-count'
});

The script’s biggest limitation is that you can’t request an XML file from another domain, because the browser’s same-origin policy blocks cross-domain AJAX requests. You could, if you wanted, get around that with a PHP proxy or similar trick.
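
To make the proxy idea concrete, here is a rough sketch of a helper that decides whether a feed can be requested directly or has to be routed through a proxy on your own domain. The function resolveFeedUrl and the /proxy.php endpoint are hypothetical and not part of jParse; the proxy script itself (which would fetch the remote XML on the server and echo it back) is not shown:

```javascript
// Decide whether a feed URL can be requested directly or must go through a
// same-origin proxy. `/proxy.php` is a hypothetical endpoint, not part of jParse.
function resolveFeedUrl(feedUrl, pageUrl, proxyPath = '/proxy.php') {
  const page = new URL(pageUrl);
  const feed = new URL(feedUrl, page); // resolves relative URLs against the page

  // Same origin means same scheme, host, and port.
  if (feed.origin === page.origin) {
    return feed.pathname + feed.search;
  }
  // Different origin: hand the full remote URL to the proxy as a parameter.
  return proxyPath + '?url=' + encodeURIComponent(feed.href);
}

const pageUrl = 'http://example.com/demo/index.html';
const local = resolveFeedUrl('digg-feed.xml', pageUrl);
const remote = resolveFeedUrl('http://digg.com/rss/index.xml', pageUrl);
console.log(local);
console.log(remote);
```

The returned string could then be passed as the url in ajaxOpts. If you do write such a proxy, have it accept only the feed URLs you expect, so it can’t be used to fetch arbitrary pages through your server.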