Building a local news mashup with Twitter, TwitterFeed, Delicious, Yahoo! Pipes, Ruby and RSS

15 March 2009

[Diagram: Sutton local news mashup]

(Click on the image to download the PDF, 19KB, opens in new window/tab.)

I’m a self-confessed and unashamed news junkie and this is how I’m starting to mash up news in my local area. For those that aren’t local, Sutton is a London borough with a population of approximately 180,000. Stonecot Hill is a neighbourhood within Sutton with a population of a few thousand.

Here’s how it all works.

Sources (green boxes)

I write Stonecot Hill News which is a local news blog running as a standalone WordPress installation on its own server. It produces an RSS 2.0 feed which here is treated as an outbound API.

Paul Burstow is the local member of parliament (constituency: Sutton & Cheam). Paul posts news regularly to his website and for many years that site has been serving an RSS 1.0 (RDF) feed. Whether he realises it or not, Paul laid one of the first foundations for news mashability in the borough.

The Sutton Guardian is the local newspaper, published by Newsquest. Together with its sister titles in other areas, it publishes several dozen RSS 2.0 feeds for a wide variety of content.

Sutton Council is the local authority for the borough. Despite a recent £270,000 revamp of their website they haven’t yet managed to step into the twenty-first century and produce any RSS feeds. However, they do publish a variety of content regularly on their website, including their press releases.

APIs (grey boxes)

For the non-technical: API stands for Application Programming Interface, but that doesn’t tell you very much. Think of APIs like connectors or adapters that allow one program to plug into another in the same way that our household appliances can all connect to the electrical network because they share common plugs and sockets.

An API may be inbound (allowing data to be put into an application), outbound (allowing data to be extracted) or both.

As we can see in the diagram, applications which use APIs can be daisy-chained together, with the output of one application being fed into another.

RSS and Atom feeds are also APIs in that they provide a structured way for a program to get data out of an application. These feed formats are simple to implement (many applications produce them automatically) and are the first thing to consider when implementing a simple outbound API for an application.
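To make the "feed as outbound API" idea concrete, here is a minimal sketch of how an application might emit an RSS 2.0 document from its own data. The channel details, URLs and the `build_rss` helper are all hypothetical, invented for illustration; most blogging software generates this for you.

```ruby
require "time"

# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build a minimal RSS 2.0 document from a channel description and item hashes.
# (Hypothetical helper: a sketch, not a complete feed generator.)
def build_rss(title:, link:, description:, items:)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    "    <title>#{xml_escape(title)}</title>",
    "    <link>#{xml_escape(link)}</link>",
    "    <description>#{xml_escape(description)}</description>"
  ]
  items.each do |item|
    lines << "    <item>"
    lines << "      <title>#{xml_escape(item[:title])}</title>"
    lines << "      <link>#{xml_escape(item[:link])}</link>"
    lines << "      <guid>#{xml_escape(item[:link])}</guid>"
    lines << "      <pubDate>#{item[:published].rfc2822}</pubDate>"
    lines << "    </item>"
  end
  lines << "  </channel>" << "</rss>"
  lines.join("\n")
end

# Example data is made up for illustration.
feed = build_rss(
  title: "Example local news",
  link: "http://news.example/",
  description: "Hypothetical feed used for illustration",
  items: [
    { title: "Shops & services update",
      link: "http://news.example/2009/03/shops",
      published: Time.utc(2009, 3, 15, 9, 0, 0) }
  ]
)
puts feed
```

Any program that can fetch a URL and parse XML can now consume these items, which is all "outbound API" means in this context.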

Mashers (pink boxes)

Mashers are small programs that connect otherwise incompatible inbound and outbound APIs together. TwitterFeed is a simple example. Say you want to automatically post the new items from your blog to your Twitter account. Your blog serves an RSS feed but Twitter, while it has an inbound API, cannot accept RSS directly as input. TwitterFeed links the two, allowing the user to define any number of RSS feeds as inputs and any number of Twitter accounts as outputs, via the Twitter API. In this way, TwitterFeed plugs blogs into Twitter.
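The core of a masher like this can be sketched in a few lines. The example below is not TwitterFeed's actual code: it assumes the masher turns each new feed item into a "title plus link" message, shortens the title to fit Twitter's 140-character limit, and remembers which links it has already posted. The function names and sample items are hypothetical, and the real call to the Twitter API is left out.

```ruby
TWEET_LIMIT = 140 # Twitter's message length limit at the time of writing

# Turn a feed item's title and link into a message that fits the limit,
# shortening the title (never the link) when necessary.
def tweet_for(title, link, limit: TWEET_LIMIT)
  room = limit - link.length - 1 # one space separates title and link
  raise ArgumentError, "link leaves no room for a title" if room < 4

  text = title.length > room ? title[0, room - 3].rstrip + "..." : title
  "#{text} #{link}"
end

# Return messages only for items whose links have not been posted before,
# recording each link so that a second run produces nothing new.
def pending_tweets(items, posted_links)
  items.each_with_object([]) do |item, tweets|
    next if posted_links.include?(item[:link])

    posted_links << item[:link]
    tweets << tweet_for(item[:title], item[:link])
  end
end

# Made-up items standing in for a parsed RSS feed.
posted = []
items = [
  { title: "Road closure this weekend", link: "http://news.example/1" },
  { title: "A" * 200, link: "http://news.example/2" }
]
pending_tweets(items, posted).each { |tweet| puts "#{tweet.length}: #{tweet}" }
```

A real masher would persist the list of posted links between runs and hand each message to the Twitter API rather than printing it.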

Yahoo! Pipes is a much more sophisticated and flexible masher. It can take inputs from a variety of sources (RSS, Atom, CSV, Flickr API, Google Base or even raw web pages), sort, filter and combine them in every conceivable way, and output the results as a single stream in various formats (RSS, JSON, and KML, the geo-format used by Google Earth). For my mashup I created this pipe to filter Paul Burstow’s, the Sutton Guardian’s and Sutton Council’s news and only pass through items containing the word “stonecot” to the stream that eventually ends in the @stonecothill Twitter feed, which is just for Stonecot Hill residents. The number of items coming through these sources about Stonecot Hill is very low, but when something appears residents will want to see it. (By way of example, only a single press release from Sutton Council in the last 227 concerns the Stonecot Hill area specifically.)
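For readers who prefer code to a visual editor, the logic of a pipe like this can be approximated as merge, filter, de-duplicate and sort. The sketch below is an assumption about the general shape, not an export of the actual pipe: the `Item` structure, the `filter_pipe` name, the choice to match on both title and summary, and all the sample items are hypothetical.

```ruby
# A feed item reduced to the fields the filter needs (hypothetical structure).
Item = Struct.new(:source, :title, :summary, :link, :published, keyword_init: true)

# Merge any number of feeds, keep items mentioning the keyword (case-insensitive),
# drop duplicates by link and return the newest first.
def filter_pipe(*feeds, keyword:)
  needle = keyword.downcase
  feeds.flatten(1)
       .select { |item| "#{item.title} #{item.summary}".downcase.include?(needle) }
       .uniq(&:link)
       .sort_by(&:published)
       .reverse
end

# Made-up items standing in for the three parsed source feeds.
mp_feed = [
  Item.new(source: "MP", title: "Surgery dates announced",
           summary: "Advice sessions across the constituency",
           link: "http://mp.example/1", published: Time.utc(2009, 3, 10))
]
paper_feed = [
  Item.new(source: "Newspaper", title: "Stonecot Hill shop reopens",
           summary: "Doors open again after refit",
           link: "http://paper.example/1", published: Time.utc(2009, 3, 12)),
  Item.new(source: "Newspaper", title: "Town centre parking changes",
           summary: "New charges from April",
           link: "http://paper.example/2", published: Time.utc(2009, 3, 13))
]
council_feed = [
  Item.new(source: "Council", title: "Road resurfacing programme",
           summary: "Work begins on Stonecot Hill next month",
           link: "http://council.example/1", published: Time.utc(2009, 3, 14))
]

filter_pipe(mp_feed, paper_feed, council_feed, keyword: "stonecot").each do |item|
  puts "#{item.published.strftime('%Y-%m-%d')} [#{item.source}] #{item.title}"
end
```

Matching on the summary as well as the title catches items like the council one above, where the neighbourhood is only mentioned in the body text.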

As mentioned above, Sutton Council doesn’t provide an RSS feed or any other kind of outbound API for its press releases. I wrote a screen scraper in Ruby (using Hpricot) that grabs the press releases directly from the council website, dumps them into a MySQL database and pushes new items into the Delicious API. I’ve used Delicious here for two reasons. Firstly, it generates an RSS feed automatically from all the items posted to it, so I can easily connect this output to other mashers and APIs further downstream without having to generate and host an RSS feed myself. Secondly, Delicious provides a useful search facility on its website, allowing me to easily search just the press releases from Sutton Council. This isn’t possible with the council’s own website, where searches are scoped to the entire site.
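The scrape, store and push cycle can be sketched as follows. This is not the original scraper: the HTML markup, the `council.example` URLs and the function names are hypothetical, a regular expression stands in for a proper HTML parser such as Hpricot, an in-memory Set stands in for the MySQL table, and the Delicious call is replaced by a print statement.

```ruby
require "set"

# Hypothetical markup: the council's real pages will look different.
SAMPLE_HTML = <<~HTML
  <ul class="press-releases">
    <li><a href="/news/2009/recycling.html">New recycling scheme launched</a></li>
    <li><a href="/news/2009/library.html">Library opening hours extended</a></li>
  </ul>
HTML

# Matches one list item containing a link; captures the href and the link text.
LINK_PATTERN = %r{<li><a href="([^"]+)">([^<]+)</a></li>}

# Pull every press release link out of the page as a { url:, title: } hash.
def extract_releases(html, base_url)
  html.scan(LINK_PATTERN).map do |href, title|
    { url: base_url + href, title: title.strip }
  end
end

# Keep only the releases whose URLs have not been stored on an earlier run.
def new_releases(releases, seen_urls)
  releases.reject { |release| seen_urls.include?(release[:url]) }
end

# Stand-in for the database of releases already processed.
seen = Set.new(["http://council.example/news/2009/library.html"])

releases = extract_releases(SAMPLE_HTML, "http://council.example")
new_releases(releases, seen).each do |release|
  seen << release[:url]
  # A real scraper would bookmark the item via the Delicious API here.
  puts "Would bookmark: #{release[:title]} (#{release[:url]})"
end
```

Recording each URL before moving on is what makes the scraper safe to run on a schedule: items already seen are never pushed downstream twice.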

Destinations (orange boxes)

In my diagram, the destinations are sites and services which represent new ways of consuming information coming from the original sources. Don’t want to read Sutton Council’s press releases on their own website? You can follow them in Delicious or on Twitter. Want to keep up with the latest news about Stonecot Hill? Again, the @stonecothill Twitter account can find this for you from various sources. I also add my own items to @stonecothill, making it a unique mashup of original and syndicated content that’s highly targeted and very local.

The information stream doesn’t need to end with these destinations. Any destination that provides an outbound API can simply be another link in the chain to downstream services. In my diagram, the RSS feed from Delicious is used to do just that, pushing all its content on to the @suttonboro Twitter account, and just the Stonecot Hill-related content on to the @stonecothill account via the Yahoo! Pipes filter. Twitter has its own specific outbound API and also serves RSS feeds. There’s nothing to stop anyone else building on these destinations by combining and filtering them with other sources to produce their own unique, relevant information streams that they find useful.
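The fan-out described here, one feed serving two accounts under different rules, can be expressed as a small routing table. This is only a sketch of the idea: the `ROUTES` table, the `route` function and the sample items are hypothetical, and the keyword rule is a simplified stand-in for the Yahoo! Pipes filter.

```ruby
# Each destination account is paired with a rule deciding which items it receives.
ROUTES = {
  "@suttonboro"   => ->(_item) { true }, # everything passes through
  "@stonecothill" => ->(item) { item[:title].downcase.include?("stonecot") }
}.freeze

# Return a hash mapping each destination to the items its rule accepts.
def route(items, routes)
  routes.transform_values { |accepts| items.select(&accepts) }
end

# Made-up items standing in for the Delicious RSS feed.
delicious_feed = [
  { title: "Stonecot Hill play area reopens" },
  { title: "Council tax bands published" }
]

route(delicious_feed, ROUTES).each do |account, items|
  puts "#{account}: #{items.map { |item| item[:title] }.join('; ')}"
end
```

Adding another downstream consumer is then a matter of adding one more entry to the table, which is the point of daisy-chaining.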

What next?

If you run a website, it’s time to start thinking of mashability with the same degree of seriousness as you treat human visitors. Your website needs to serve up feeds and APIs so that other programs can connect to your content and deliver it to people in ways and contexts that they find useful. Some of these may have an audience of thousands or even millions. Others may have an audience of one. Regardless, by providing an API to your content you enable others to build things that you haven’t imagined, don’t have the resources or desire to build yourself, and won’t have to maintain. Businesses like newspapers that survive by selling their content (or selling advertising around their content) are thinking very carefully about the challenges and opportunities for the future of their industries. For government and voluntary organisations, it’s time to start thinking more like evangelists than economists. Spread the word like the free Bibles in hotel bedrooms and take every opportunity to get your message out there.

Sutton Council have been encouraged in various ways to implement feeds on their own website and the song will remain the same until they do. I don’t want to maintain my scraper for ever and I certainly don’t want to build any more of them.

The whole API and mashability agenda is far bigger than simple web feed formats like RSS and Atom. It’s time for technologists to stop flogging the line that “RSS is an easy way for people who follow lots of websites to read all their news in one place”. Direct human consumption of RSS feeds is never going to hit the mainstream in that way. If you’re reading this, you’re far more likely than average to use an RSS reader. (I’ve got 86 feeds in my Google Reader right now.) The average web user has barely heard of the concept and most definitely doesn’t use one. I suspect they never will. But it’s likely they’re already benefiting from syndicated content through sites and applications that they use. If they never have to see or care about the underlying technology, that’s no more of a problem than the average web user not understanding HTTP or DNS. It’s just plumbing that can stay out of sight and out of mind as long as it works.

For the minority that do use personal RSS readers, I’d like to see more of them with built-in filtering features. Setting a simple keyword filter on a feed makes RSS reading considerably more powerful.

For those serving up feeds, I’d like to see Atom more widely used. Without wanting to open a can of Wineresque worms, RSS 2.0 fudges a number of important issues around content semantics and provides only rudimentary support (a single optional source element on each item) for correctly attributing items in feeds mashed from several sources. Atom was designed to solve these problems and it does. Let’s use it.
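Atom handles attribution through its source element, which lets an entry in a combined feed carry the identity of the feed it originally came from. The sketch below shows roughly what a masher might emit for one such entry. The `atom_entry` helper, the IDs, URLs and names are hypothetical, and the output is a fragment rather than a complete, valid feed.

```ruby
require "time"

# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Render one Atom entry that records, in <source>, the feed it was taken from.
# (Hypothetical helper: a sketch of the idea, not a full Atom serialiser.)
def atom_entry(item)
  source = item[:source]
  <<~XML
    <entry>
      <id>#{xml_escape(item[:id])}</id>
      <title>#{xml_escape(item[:title])}</title>
      <updated>#{item[:updated].getutc.iso8601}</updated>
      <link href="#{xml_escape(item[:link])}"/>
      <source>
        <id>#{xml_escape(source[:id])}</id>
        <title>#{xml_escape(source[:title])}</title>
        <updated>#{source[:updated].getutc.iso8601}</updated>
        <author><name>#{xml_escape(source[:author])}</name></author>
      </source>
    </entry>
  XML
end

# Made-up entry and source feed details.
puts atom_entry(
  id: "http://paper.example/2009/03/shop",
  title: "Shop reopens & expands",
  link: "http://paper.example/2009/03/shop",
  updated: Time.utc(2009, 3, 15, 9, 0, 0),
  source: {
    id: "http://paper.example/feed",
    title: "Example Local Paper",
    updated: Time.utc(2009, 3, 15, 10, 0, 0),
    author: "Example Newsdesk"
  }
)
```

A reader or downstream masher can then show where each item came from, however many feeds were combined along the way.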

Lastly, mashability is about every conceivable kind of content and content type. It’s not just about news and text. Every stream of information should have its own machine-readable feed. Every system that can accept data from human input should implement an inbound API to do likewise. To take one example, FixMyStreet is a website for people to report street faults to local authorities and currently takes around 1,000 reports a week. It even has its own iPhone application so people can report faults complete with GPS locations and photos directly from the street. Only a single local authority in over 400 has implemented an inbound API to receive these reports. The rest get them by email, which must be manually copied into their own databases with all the effort, expense, possibility for error and opportunity costs that represents. Third parties building extensions to other people’s systems is no longer unusual, so organisations need to embrace the possibilities rather than fighting against them or standing around looking bemused.
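An inbound API need not be complicated. As a purely hypothetical illustration (this is not FixMyStreet's format or any council's real interface), the receiving end could be little more than a function that accepts a JSON report, validates it and hands back a clean record ready for the database. The field names and the `parse_report` function are invented for this sketch.

```ruby
require "json"

# Validate an incoming fault report and return [record, errors].
# On success errors is empty; on failure record is nil.
def parse_report(body)
  data = JSON.parse(body)
  return [nil, ["payload must be a JSON object"]] unless data.is_a?(Hash)

  errors = []
  description = data["description"].to_s.strip
  errors << "description is required" if description.empty?

  lat = data["latitude"]
  lon = data["longitude"]
  unless lat.is_a?(Numeric) && lat.between?(-90, 90)
    errors << "latitude must be a number between -90 and 90"
  end
  unless lon.is_a?(Numeric) && lon.between?(-180, 180)
    errors << "longitude must be a number between -180 and 180"
  end
  return [nil, errors] unless errors.empty?

  [{ description: description, latitude: lat.to_f, longitude: lon.to_f }, []]
rescue JSON::ParserError
  [nil, ["body is not valid JSON"]]
end

# Made-up report of the kind a phone application might submit.
record, errors = parse_report(
  '{"description": "Broken street light", "latitude": 51.38, "longitude": -0.22}'
)
puts errors.empty? ? "Accepted: #{record}" : "Rejected: #{errors.join(', ')}"
```

Compared with re-keying an email by hand, even a validator this small removes a whole class of copying errors.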

It’s time to open the doors and windows and get the web joined up, mashed up and moving.