1 February 2011
It’s always good when open data makes the headlines, albeit for slightly the wrong reasons today. Nonetheless, too much traffic to our website is a problem we’d all like to have. It shows public interest if nothing else. After all, who wouldn’t want an easy way to find out how much crime there is on their street and in their neighbourhood?
But before we fall over ourselves to be grateful for this latest attempt at transparency we should exercise more than a little caution.
This won’t be news to anyone who thinks seriously about data, but a map is a visualisation, not the data itself. It’s one way of representing the underlying data. In as much as the data is accurate, complete and relevant, the police.uk website is simply giving us a single way to look at it that’s already been decided for us. No matter how often we’re reminded that the map is not the territory (and let’s be honest, most people have never heard that saying, let alone considered the issues in any depth), if you’ve only got the map it might as well be the territory. Psychologically, the two become conflated.
Perhaps apocryphally, Stalin said that it’s not who votes that counts but who counts the votes. Likewise, we should be hugely cautious about giving too much weight to official visualisations of data. As the policing minister Nick Herbert wrote today (my emphasis):
We live in the age of accountability and transparency. The public deserve to know what is happening on their streets, and they want action. By opening up this information, and allowing the public to elect Police and Crime Commissioners, we are giving people real power – and strengthening the fight against crime.
So what we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.
Transparency isn’t wrong. It’s absolutely vital to meaningful public debate, but we need to distinguish pseudo-transparency from the real thing. Spatial visualisation and analysis are enormously difficult to get right, and even thoughtfully designed visualisations require a fair bit of understanding to interpret correctly. “Slap it on a map” works fine when you just want to see where your local recycling centres are, but as soon as you start to classify crimes by type and bound them into streets and neighbourhoods you’re into the realm of professional spatial analysis. You need to know what you’re doing and have access to tools that let you shift category and spatial boundaries to account for anomalous effects. The newspapers that have run lists of the most crime-ridden streets in the country today might want to consider that longer streets will, on average, have more crime than shorter streets, to take just one simple example of a relevant factor that goes unaccounted for when you visualise the data that way.
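The longer-streets point is easy to make concrete. Here’s a toy sketch (the street names and figures are invented, not taken from the published data) showing that ranking streets by raw crime count and ranking them by crime per metre can crown different “worst” streets:

```python
# Toy illustration with invented figures: a long street can top the
# raw-count league table while a short street has a far higher rate.
streets = [
    # (name, length in metres, crime count)
    ("Long High Street", 2000, 40),
    ("Short Lane", 100, 10),
]

worst_by_count = max(streets, key=lambda s: s[2])
worst_by_rate = max(streets, key=lambda s: s[2] / s[1])

print(worst_by_count[0])  # Long High Street: most crimes in total...
print(worst_by_rate[0])   # Short Lane: ...but five times the crime per metre
```

Neither ranking is “the” answer, which is rather the point: the choice of denominator is an analytical decision, and a list built from raw counts has quietly made it for you.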
Whether police.uk is trying to pull a fast one on us or is simply naive about the possibilities for doing something meaningful for a general audience with this data, the result is the same: plenty of heat and very little light. Mark Monmonier’s How to Lie with Maps provides a good starter text on the myriad ways in which maps can deceive, intentionally and otherwise.
On a more positive note, we’re also getting the data itself to use. This is a good thing, in as much as the data itself is, as stated above, accurate, complete and relevant. Unfortunately, it’s not. It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.
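The irreversibility is worth dwelling on. A minimal sketch of location anonymising (the grid size and coordinates are invented; this is not how police.uk actually does it) shows why: once points have been snapped to anonymised locations, distinct incidents collapse together and nothing downstream can recover the originals.

```python
# Sketch of location anonymising by snapping points to a coarse grid.
# GRID and the coordinates below are invented for illustration only.
GRID = 100  # metres

def anonymise(x, y):
    """Snap a point to the centre of its grid cell."""
    return ((x // GRID) * GRID + GRID // 2,
            (y // GRID) * GRID + GRID // 2)

# Two distinct incidents on the same block...
a = anonymise(512, 340)
b = anonymise(578, 301)

# ...become indistinguishable in the published data.
print(a == b)  # True: the original locations are gone for good
```

Every transformation of this kind, however well-intentioned, bakes someone else’s decisions into the dataset before you ever see it.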
£300K for this? There ought to be a law against it. Worse than useless, it’s thoroughly misleading. In future, we need fine-grained datasets for these kinds of applications and a decent head start (six months?) between the publication of official data and the commissioning of expensive official projects around it, so that everyone really understands what can and should be done with it.