I’ve been working on a neat enhancement for my Tweetable WordPress plugin. Already I have a handy “Documentation” link on the plugin’s pages in the WordPress admin. When clicked, it opens a ThickBox dialog pointing to the README.txt file.
Not bad, but it had a few rough edges. Raw markdown doesn’t look stellar, and then there was the problem of the horizontal scrollbars that would appear when loading a plain text file into the ThickBox. So I made a new script that loads the README.txt file and uses regular expressions to parse some of the more basic markdown syntax into good old HTML.
As I write this, the changes haven’t been released to the public quite yet, as I have a few more things to finish up before putting out a new patch to the plugin, but they’re on their way.
How do you pull off something like this? It’s not too hard.
First, dump a basic HTML page wrapper into your new PHP file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Documentation</title>
</head>
<body>
</body>
</html>
Now, between the two body tags, we’ll put the beginnings of our script. We need to reference wp-load.php, so we can access a few WordPress-related functions later.
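The opening of the script might look something like the sketch below. The path from the plugin folder to wp-load.php isn't given here, so this is only one possible approach: the helper name find_wp_load() and the idea of walking up the directory tree are assumptions, and it only works if the script sits somewhere beneath the WordPress root folder.

```php
<?php
// Hypothetical helper (not part of WordPress): walk up from $start_dir
// looking for wp-load.php. Assumes the script lives below the WordPress root.
function find_wp_load($start_dir, $max_levels = 6) {
    $dir = $start_dir;
    for ($i = 0; $i <= $max_levels; $i++) {
        $candidate = $dir . '/wp-load.php';
        if (file_exists($candidate)) {
            return $candidate;
        }
        $parent = dirname($dir);
        if ($parent === $dir) {
            break; // reached the filesystem root without finding it
        }
        $dir = $parent;
    }
    return null;
}

$wp_load = find_wp_load(__DIR__);
if ($wp_load !== null) {
    require_once $wp_load; // WordPress functions are available from here on
}
```

If you know exactly where your plugin sits, a plain require with a fixed relative path does the same job with less ceremony.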
Now it’s time to load the README.txt file. Once we dump the contents into a variable, we run them through a series of functions:
wp_specialchars() to escape HTML special characters, so any markup or PHP code in the file is displayed as text rather than interpreted,
nl2br() to turn each newline character into a <br /> tag (which makes the text nice and readable, instead of a jumbled mess), and finally
make_clickable() to turn any URLs into clickable links.
$readme = file_get_contents('readme.txt');
$readme = make_clickable(nl2br(wp_specialchars($readme)));
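To see what the escaping and line-break steps do without a full WordPress install, here is a small sketch. The wrapper name prepare_readme_text() is made up for illustration. It uses esc_html() (the function newer WordPress versions provide for the same job as wp_specialchars()) and make_clickable() when they exist, and otherwise falls back to plain PHP's htmlspecialchars() and skips the link step.

```php
<?php
// Hypothetical wrapper around the three steps described above.
function prepare_readme_text($text) {
    // Inside WordPress, use its escaping function; otherwise use plain PHP.
    $escaped = function_exists('esc_html')
        ? esc_html($text)
        : htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // nl2br() inserts a <br /> before each newline and keeps the newline.
    $with_breaks = nl2br($escaped);

    // make_clickable() only exists once WordPress has been loaded.
    return function_exists('make_clickable')
        ? make_clickable($with_breaks)
        : $with_breaks;
}

echo prepare_readme_text("Use <b>bold</b>\nsparingly");
```

Escaping has to come first: if nl2br() ran before it, the freshly added <br /> tags would be escaped too and show up as literal text.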
With that out of the way, we move on to actually parsing some of the markdown formatting. Let’s start with turning backticks (`) into HTML <code> and </code> tags.
$readme = preg_replace('/`(.*?)`/', '<code>\\1</code>', $readme);
It may look a bit… strange, but that line does just as advertised. The / characters mark the start and end of the regular expression, and the middle part isn’t too hard to guess at. The backticks are the markdown formatting we see wrapping a section of code (e.g. `echo $this;`). The part between the backticks, enclosed in parentheses, means “zero or more of any character except a newline, matching as few as possible,” so each match stops at the nearest closing backtick. The second argument of preg_replace() is what we replace each match with: code tags wrapped around the content from inside the backticks (represented as \\1).
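The question mark in (.*?) is doing more work than it seems. As a quick illustration (the sample sentence and variable names are invented for this sketch), compare the pattern from the post with a version that drops the question mark:

```php
<?php
// Sample text with two separate inline code spans.
$text = 'Run `echo $this;` and then `exit;` to stop.';

// Lazy version (as used in the post): each match ends at the nearest backtick.
$lazy = preg_replace('/`(.*?)`/', '<code>\\1</code>', $text);

// Greedy version: a single match runs all the way to the last backtick.
$greedy = preg_replace('/`(.*)`/', '<code>\\1</code>', $text);

echo $lazy, "\n";
// Run <code>echo $this;</code> and then <code>exit;</code> to stop.
echo $greedy, "\n";
// Run <code>echo $this;` and then `exit;</code> to stop.
```

Without the question mark, everything between the first and last backtick on a line ends up inside a single pair of code tags.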
Now we do a similar thing for *italics* and **bold text**. It’s important to put the line for the boldface formatting before the one for the italics, otherwise you’ll have some Unexpected Results happening.
$readme = preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $readme);
$readme = preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $readme);
This one looks like more of a mess, doesn’t it? That’s because we have to escape the asterisks with backslashes (i.e. \*), as the asterisk otherwise has a special meaning in a regular expression. The [\040], which represents a space character (040 is the octal code for a space), is added so the expression will only match instances where the first asterisk has a space in front of it. This is mainly a safety feature, so no code snippets break anything…
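To make the ordering issue concrete, here is a small sketch that runs the same two patterns in both orders. The function names and the sample sentence are made up for the demonstration; the patterns themselves are the ones from the post.

```php
<?php
// Correct order: bold first, then italics.
function bold_then_italic($text) {
    $text = preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $text);
    return preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $text);
}

// Wrong order: the italics pattern gets to the double asterisks first.
function italic_then_bold($text) {
    $text = preg_replace('/[\040]\*(.*?)\*/', ' <em>\\1</em>', $text);
    return preg_replace('/[\040]\*\*(.*?)\*\*/', ' <strong>\\1</strong>', $text);
}

$sample = 'This is **bold** and *italic*.';

echo bold_then_italic($sample), "\n";
// This is <strong>bold</strong> and <em>italic</em>.
echo italic_then_bold($sample), "\n";
// This is <em></em>bold** and <em>italic</em>.
```

In the wrong order, the italics pattern treats the opening ** as an empty pair of italics markers, which leaves a stray empty em element and orphaned asterisks behind.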
Next we handle headings, which are marked up with one to three equals signs on either side of a line of text.
$readme = preg_replace('/=== (.*?) ===/', '<h2>\\1</h2>', $readme);
$readme = preg_replace('/== (.*?) ==/', '<h3>\\1</h3>', $readme);
$readme = preg_replace('/= (.*?) =/', '<h4>\\1</h4>', $readme);
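Once again, the order of the lines matters: the three-sign pattern has to run first, because the one-sign pattern would also match inside a === heading ===.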
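Here is a short sketch of the heading patterns at work. The wrapper function headings_to_html() and the sample headings are invented for illustration. The last line shows what happens if the single-sign pattern is allowed to run on a three-sign heading.

```php
<?php
// Hypothetical wrapper: the three heading patterns, longest marker first.
function headings_to_html($text) {
    $text = preg_replace('/=== (.*?) ===/', '<h2>\\1</h2>', $text);
    $text = preg_replace('/== (.*?) ==/', '<h3>\\1</h3>', $text);
    return preg_replace('/= (.*?) =/', '<h4>\\1</h4>', $text);
}

echo headings_to_html('=== Tweetable ==='), "\n";   // <h2>Tweetable</h2>
echo headings_to_html('== Installation =='), "\n";  // <h3>Installation</h3>
echo headings_to_html('= Requirements ='), "\n";    // <h4>Requirements</h4>

// Running the single-sign pattern first leaves stray equals signs behind.
$wrong = preg_replace('/= (.*?) =/', '<h4>\\1</h4>', '=== Tweetable ===');
echo $wrong, "\n";                                  // ==<h4>Tweetable</h4>==
```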
Now all that needs to be done is to echo the text and close our PHP block:
echo $readme; ?>
That wasn’t too hard, was it? It’s only the most basic markdown syntax that’s being parsed, but it’s light-years better than plain text.