Update: There were several serious shortcomings in the code below that have been resolved (Twitter #failwhale, control characters in Tweets, greedy regex).

I’ve been working on a backend for a website and needed a way to show the latest Twitter update. There is plenty of code floating around that does this, but I couldn’t find anything simple that preserved formatting, like hyperlinks to @replies, e-mail addresses, URLs, etc. Anything that did was either very bulky (lots of source files) or required a Twitter API key…so I wrote my own, self-contained in a single function call.

As the code below shows, this just scrapes the HTML from Twitter.com, so it could definitely break without notice. That’s what you get for not using the official API, and you should expect it.

function latest_tweet($username = 'gfiumara', $include_date = true)
{
        /* Grab the latest tweet in XML format */
        $twitter_feed_url = "http://twitter.com/statuses/user_timeline/$username.xml?count=1";
        $feed_buffer = file_get_contents($twitter_feed_url);

        /* Use the tweet ID from XML to retrieve an HTML page with the tweet */
        try {
                $xml = new SimpleXMLElement($feed_buffer);
                $status = $xml->status;
        /* Twitter #failwhale */
        } catch (Exception $e) {
                return '@' . $username . ': No tweet retrieved.';
        }

        $single_tweet_url = "http://twitter.com/$username/status/$status->id";
        $html_buffer = file_get_contents($single_tweet_url);

        /* Strip out control characters */
        $html_buffer = preg_replace('/[\x00-\x1F\x7F]/', ' ', $html_buffer);

        /* Tweets are located in an 'entry-content' class span */
        preg_match('/<span class="entry-content">(?P<latest>.*?)<\/span>/', $html_buffer, $tweet_array);

        /* Add some additional formatting */
        $tweet = '@' . $username . ': ';
        if (isset($tweet_array['latest'])) {
                $tweet .= $tweet_array['latest'];

                /* Twitter uses relative links for @replies, etc. */
                $tweet = preg_replace('/href="\//', 'href="http://twitter.com/', $tweet);

                /* Insert the date */
                if ($include_date) {
                        /* Date is located in a 'published timestamp' span
                           (allowing extra attributes with [^>]* is an assumption) */
                        preg_match('/<span class="published timestamp"[^>]*>(?P<latest>.*?)<\/span>/', $html_buffer, $date_array);
                        if (isset($date_array['latest'])) {
                                /* Don't use HTML5 ... */
                                /* Assumption: append the date as plain text in parentheses */
                                $tweet .= ' (' . $date_array['latest'] . ')';
                        }
                }
        }

        return $tweet;
}
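
The greedy-regex fix mentioned in the update is easiest to see on a fixed string. Here is a minimal sketch that runs the tweet pattern against a hard-coded fragment instead of a live page. The sample markup, the @someone handle, and the variable names are made up for illustration, and the real Twitter markup may differ.

```php
<?php
/* Hypothetical fragment standing in for a downloaded status page */
$html = '<span class="entry-content">Hi <a href="/someone">@someone</a></span>'
      . '<span class="published timestamp">2 hours ago</span>';

/* Lazy .*? stops at the first closing span tag */
preg_match('/<span class="entry-content">(?P<latest>.*?)<\/span>/', $html, $lazy);

/* Greedy .* runs on to the last closing span tag and swallows the timestamp span too */
preg_match('/<span class="entry-content">(?P<latest>.*)<\/span>/', $html, $greedy);

/* Turn relative links into absolute ones, the same way latest_tweet() does */
$tweet = preg_replace('/href="\//', 'href="http://twitter.com/', $lazy['latest']);

echo $lazy['latest'], "\n";
echo $greedy['latest'], "\n";
echo $tweet, "\n";
```

The lazy match holds only the tweet text, while the greedy match also drags in the timestamp span, which is why the pattern in the function uses .*? rather than .*.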

That said, this would be much easier with jQuery or an HTML DOM parser, but I really wanted it to be self-contained and completely PHP. You’ll need PHP 5 to use this code. If you’re stuck with PHP 4, you could rewrite the SimpleXML section with DOM XML, or parse the XML with preg_match() as I do to get the date. Don’t use preg_quote() on the regular expressions, because it will escape characters you don’t want escaped (such as < and >).
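
To show what goes wrong with preg_quote(), here is a small sketch that applies it to the named-group part of the tweet pattern. The sample HTML string and the variable names are assumptions for illustration only.

```php
<?php
/* The pattern fragment used to capture the tweet text */
$fragment = '(?P<latest>.*?)';

/* preg_quote() backslash-escapes regex metacharacters, including < and > */
$quoted = preg_quote($fragment);

/* Hypothetical markup standing in for the real page */
$html = '<span class="entry-content">Hello</span>';

/* Unquoted: the named group captures the tweet text */
$matched = preg_match('/<span class="entry-content">' . $fragment . '<\/span>/', $html, $m);

/* Quoted: the pattern now looks for the literal text "(?P<latest>.*?)" and finds nothing */
$matched_quoted = preg_match('/<span class="entry-content">' . $quoted . '<\/span>/', $html, $q);

echo $quoted, "\n";
var_dump($matched, $matched_quoted);
```

Once quoted, the group syntax is treated as literal characters, so the match fails even though the markup is exactly what the original pattern expects.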

If you see how this can be improved in any way or you spend the time to make it work for PHP 4, please leave a comment!