Guide to Open Data: Using it, Sharing it, and Creating a Portal
Over the last few years, governments, businesses, research organizations, and others have embraced the open data movement, with enormous benefits. To steal a phrase from FME World Tour presenter Chris Rado: it makes information available to city agencies, to the public, and to Batman.
This post will explore what open data means, who produces and consumes it, best practices for integrating it in your workflows, and recommendations for sharing your own data with the world.
It’s kind of a big deal.
Open data is free data that anyone can use for any purpose—and public interest in it is exploding. Take a look at this graph from Canada’s Open Government website showing the number of visitors to the Open Government Portal over the last year.
Why is it so popular? Open data gives us access to information about the places, businesses, and organizations we care about. It brings a degree of government transparency that would make George Orwell proud. It enables collaboration, innovation, and scientific and technological advancement. (Bill Nye must be proud.)
Open data is useful for:
- Everyday people. For example, the award-winning City of Surrey portal provides pay parking locations, land use plans, service requests, population info and estimates, and a whole lot more.
- Commercial and non-profit ventures. It’s much easier to make data-driven decisions and solve complex problems when a world of information is available.
- Researchers and journalists. Data tells stories!
- Developers who are building apps. Check out the Canadian Open Government Apps Gallery for examples, or the award-winning NYC Citygram that sends people alerts based on their areas of interest.
Where can I find open data?
Governments are the main providers of open data, and many Western governments are in fact mandated to provide it. You can find portals at the city, state, and federal levels, as well as from intergovernmental organizations. For example, Surrey Heath in England publishes all its expenditures in this (FME-hosted!) data download portal. In the City of Vancouver portal, you can even find such extremely urgent data as where the nearest food truck is.
CBC: “This is data that’s ultimately been paid for by taxpayers, one way or another”
A significant number of private companies are also offering open data, having realized that transparency is power. Adopting open data practices has helped some corporations improve net profits. Non-governmental organizations (NGOs) and nonprofits (NPOs) have long supported the democratization of data, and are now producing open data themselves. Crowd-sourced open data and geomapping can ensure that time and money are spent where they’re needed most. For example, within 48 hours of the Nepal earthquake, a crowd-sourced relief effort mapped thousands of miles of roads and tens of thousands of buildings, and these maps helped responders plan rescues.
Academic institutions, including universities and scientific research organizations, are also sharing data. Sharing scientific data has its complexities, but progress is being made toward more transparent biomedical research: the Accelerating Medicines Partnership (AMP), for example, has already collaborated on data in three disease areas.
How to use open data
It’s easy to consume data directly from a portal in FME Workbench. You can read from a URL or an FTP site by pasting the link into the reader. From there you can do anything you want with the data: integrate it with other sources, manipulate its content and structure, and write it out in whatever format you need.
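In FME this is handled by the reader. For readers working in a general-purpose language instead, here is a rough Python analogue, assuming the portal serves a CSV file with a header row; the function names are illustrative and not part of any FME API. Python’s `urllib.request` can open `http://`, `https://`, and `ftp://` URLs with its default handlers.

```python
import csv
import io
import urllib.request


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into one dictionary per data row."""
    # newline="" lets the csv module handle \r\n and quoted line breaks itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def read_csv_from_url(url: str, encoding: str = "utf-8-sig") -> list[dict[str, str]]:
    """Download a CSV file from an HTTP(S) or FTP URL and parse it.

    "utf-8-sig" also strips the byte-order mark some portals prepend.
    """
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_csv_text(response.read().decode(encoding))
```

For example, `read_csv_from_url("https://example.com/open-data/food-trucks.csv")` (a placeholder URL) would return a list of rows keyed by the CSV header names, ready to be joined with other sources or rewritten in another format.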
To monitor for changes, send the data through the ChangeDetector to get only the updates.
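To illustrate the idea behind change detection, here is a minimal Python sketch, assuming each record carries a unique `id` attribute and that you keep the previous snapshot of the dataset. It is not how the ChangeDetector works internally; it simply compares two snapshots and sorts records into inserted, updated, deleted, and unchanged groups.

```python
from typing import Any

Record = dict[str, Any]


def detect_changes(original: list[Record], revised: list[Record],
                   key: str = "id") -> dict[str, list[Record]]:
    """Compare two snapshots of a dataset and group records by type of change.

    Records are matched on the value of `key`; a matched record counts as
    updated when any of its attribute values differ between snapshots.
    """
    old_by_key = {rec[key]: rec for rec in original}
    new_by_key = {rec[key]: rec for rec in revised}

    changes: dict[str, list[Record]] = {
        "inserted": [], "updated": [], "deleted": [], "unchanged": [],
    }
    for k, new_rec in new_by_key.items():
        old_rec = old_by_key.get(k)
        if old_rec is None:
            changes["inserted"].append(new_rec)
        elif old_rec != new_rec:
            changes["updated"].append(new_rec)
        else:
            changes["unchanged"].append(new_rec)

    # Records present in the old snapshot but missing from the new one were deleted.
    changes["deleted"] = [rec for k, rec in old_by_key.items() if k not in new_by_key]
    return changes
```

Only the inserted, updated, and deleted groups need to be passed downstream, which is what keeps an incremental refresh cheaper than reloading everything.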
For total automation, you can set up FME Server to pull the data at scheduled intervals.
Tips for creating your own open data portal
Making your data open is trivial. Just make a GeoCities page and put a link to the PDF file, right? Well, creating a nice, useful open data portal takes planning.
Scientific American: “In the case of open data sites, what we want to make are tools that make data understandable to humans, but also, to the search engines that humans use to explore the web.”
For a top-notch example, check out the City of Surrey’s portal, which has won the Open Data for Democracy and the Canadian Open Data Excellence 2016 awards. It offers a vast range of data on an easy-to-navigate site, and it lets users draw a polygon on a map to pick the exact area they want to download. And yes, it is powered by FME.
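Downloading by a drawn polygon boils down to a spatial filter. As a rough sketch of the idea (not Surrey’s or FME’s implementation), the standard ray-casting test below keeps point features that fall inside a user-drawn polygon, assuming the points and polygon share the same planar coordinate system. The `xy` feature key is hypothetical.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: is `pt` inside the simple polygon `polygon`?

    `polygon` is a list of (x, y) vertices; repeating the first vertex at the
    end is optional. Points lying exactly on an edge may go either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside


def clip_points(features: list[dict], polygon: list[Point]) -> list[dict]:
    """Keep only point features whose hypothetical 'xy' coordinate is inside."""
    return [f for f in features if point_in_polygon(f["xy"], polygon)]
```

Real portals also clip lines and polygons and use spatial indexes for speed, but the inside/outside decision is the heart of the feature.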
If you want to make your data public, here are a few tips.
1. Update the datasets frequently
The problem with many open datasets is that they’re static and rarely updated. Make sure your data is updated regularly. You should also publish your data as a feed (e.g., RSS) or through an API rather than only as static downloadable files. That way people can consume the endpoint directly, and any updates you make are automatically reflected in their apps.
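As an illustration of a published feed, the Python sketch below builds a minimal RSS 2.0 document announcing dataset updates with only the standard library. The channel title, URLs, and the shape of each update record are hypothetical; a real portal would fill them in from its own change log.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title: str, channel_link: str, description: str,
              updates: list[dict]) -> str:
    """Return an RSS 2.0 document listing dataset updates.

    Each update is a dict with 'title', 'link', 'summary', and a
    timezone-aware 'updated' datetime (hypothetical record shape).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description

    # List the newest updates first, the conventional ordering for feeds.
    for upd in sorted(updates, key=lambda u: u["updated"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = upd["title"]
        ET.SubElement(item, "link").text = upd["link"]
        ET.SubElement(item, "description").text = upd["summary"]
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(upd["updated"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical update record for illustration only.
    print(build_rss(
        "Example City Open Data",
        "https://data.example.com/",
        "Recently updated datasets",
        [{"title": "Pay parking locations refreshed",
          "link": "https://data.example.com/datasets/parking",
          "summary": "Weekly refresh of pay parking locations.",
          "updated": datetime(2024, 1, 15, tzinfo=timezone.utc)}],
    ))
```

Regenerating this document whenever a dataset changes lets feed readers and apps pick up updates without anyone re-downloading files by hand.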
You should also connect your portal directly to your master database rather than duplicating the data across two locations.
You can set up FME to do this for you: create an FME workspace that synchronizes your portal with your database, and use the ChangeDetector to apply only the updated fields instead of reloading entire datasets every time. Then use FME Server scheduling to run the synchronization automatically, and FME Server data streaming to provide the feed.
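To show what "apply only the updated fields" means in practice, here is a sketch in plain Python and SQLite rather than FME, under stated assumptions: a hypothetical `parks` table with an integer `id` key and `name`/`status` columns, and master data supplied as a dictionary keyed by `id`. New rows are inserted, rows missing from the master are deleted, and for changed rows only the differing columns are written.

```python
import sqlite3

# Hypothetical portal table columns (besides the integer primary key `id`).
COLUMNS = ("name", "status")


def sync_table(conn: sqlite3.Connection, master_rows: dict[int, dict]) -> dict[str, int]:
    """Bring the portal's `parks` table in line with the master data."""
    cur = conn.execute(f"SELECT id, {', '.join(COLUMNS)} FROM parks")
    current = {row[0]: dict(zip(COLUMNS, row[1:])) for row in cur}
    stats = {"inserted": 0, "updated": 0, "deleted": 0}

    for row_id, master in master_rows.items():
        existing = current.get(row_id)
        if existing is None:
            placeholders = ", ".join("?" * len(COLUMNS))
            conn.execute(
                f"INSERT INTO parks (id, {', '.join(COLUMNS)}) VALUES (?, {placeholders})",
                (row_id, *(master[c] for c in COLUMNS)),
            )
            stats["inserted"] += 1
            continue
        # Field-level diff: write only the columns whose values changed.
        changed = {c: master[c] for c in COLUMNS if master[c] != existing[c]}
        if changed:
            # Column names come from the fixed COLUMNS tuple, never from user input.
            assignments = ", ".join(f"{c} = ?" for c in changed)
            conn.execute(f"UPDATE parks SET {assignments} WHERE id = ?",
                         (*changed.values(), row_id))
            stats["updated"] += 1

    # Rows that no longer exist in the master database are removed.
    for row_id in current.keys() - master_rows.keys():
        conn.execute("DELETE FROM parks WHERE id = ?", (row_id,))
        stats["deleted"] += 1

    conn.commit()
    return stats
```

Unchanged rows are never rewritten, so a nightly run against a large table touches only the handful of records that actually moved.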
2. Offer coordinate system choices
For spatial data, offer more than one option for the coordinate system. Your end users might want Spherical Mercator (EPSG:3857) for a web mapping application, or WGS84 lat/long (EPSG:4326) for GPS navigation systems, or a precise local projection like State Plane. Give them the freedom to pick the one they want. We recommend offering both local and global projections.
In FME, you do this by creating a published parameter in Workbench so the coordinate system can be chosen at runtime. To reproject the data into the chosen coordinate system, use the Reprojector transformer, which uses the CS-Map reprojection engine; other engines are also available (e.g., Blue Marble, Gtrans, Esri).
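For readers curious about what a runtime coordinate system choice looks like outside FME, here is a minimal Python sketch, assuming the source data is WGS84 lon/lat (EPSG:4326) and that only EPSG:4326 and EPSG:3857 are offered. The `--epsg` option plays the role of a published parameter and is hypothetical; the spherical Web Mercator formulas stand in for a full reprojection engine such as CS-Map, which also handles local projections like State Plane.

```python
import argparse
import math
from typing import Optional

SUPPORTED_EPSG = (4326, 3857)
EARTH_RADIUS_M = 6378137.0       # sphere radius used by Web Mercator (WGS84 semi-major axis)
MAX_MERCATOR_LAT = 85.05112878   # Web Mercator is cut off at about 85.05 degrees N/S


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Convert WGS84 longitude/latitude in degrees to EPSG:3857 metres."""
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))  # avoid infinity at the poles
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def reproject(coords: list[tuple[float, float]], target_epsg: int) -> list[tuple[float, float]]:
    """Return lon/lat coordinates expressed in the requested output system."""
    if target_epsg == 4326:
        return list(coords)  # source is already WGS84 lon/lat
    if target_epsg == 3857:
        return [lonlat_to_web_mercator(lon, lat) for lon, lat in coords]
    raise ValueError(f"unsupported output coordinate system: EPSG:{target_epsg}")


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Expose the output coordinate system as a runtime option (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Export data in a chosen coordinate system")
    parser.add_argument("--epsg", type=int, choices=SUPPORTED_EPSG, default=4326,
                        help="EPSG code of the output coordinate system")
    return parser.parse_args(argv)
```

Restricting `choices` to the systems you actually support keeps users from requesting an output the export cannot produce, much like offering a fixed list in a published parameter.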
3. Ensure the data is good quality
Make sure your data is of good quality before making it public. This includes validating geometry, attributes, standards compliance, format-specific issues like XML/JSON structure, and more. Consult our data quality checklist for a thorough guide to geospatial data QA.
In FME, this can be done automatically using validation transformers like the AttributeValidator, GeometryValidator, XMLValidator, Tester, and others.
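To make these kinds of checks concrete, here is a small Python sketch that validates GeoJSON-style features, assuming a hypothetical schema with required `id` (integer) and `name` (string) attributes. It covers attribute presence and type, polygon ring closure as defined by GeoJSON, and longitude/latitude ranges; it is far less thorough than FME’s validation transformers.

```python
from typing import Any

# Hypothetical schema: attribute name -> required Python type.
REQUIRED_ATTRIBUTES = {"id": int, "name": str}


def validate_feature(feature: dict[str, Any]) -> list[str]:
    """Return the problems found in one GeoJSON-style feature (empty list if none)."""
    problems: list[str] = []

    # Attribute checks: presence and type.
    props = feature.get("properties") or {}
    for attr, expected in REQUIRED_ATTRIBUTES.items():
        value = props.get(attr)
        if value is None:
            problems.append(f"missing attribute '{attr}'")
        elif not isinstance(value, expected):
            problems.append(f"attribute '{attr}' should be {expected.__name__}")

    # Geometry checks: supported type, closed rings, plausible coordinates.
    geom = feature.get("geometry") or {}
    gtype, coords = geom.get("type"), geom.get("coordinates")
    positions: list = []
    if gtype == "Point" and coords:
        positions = [coords]
    elif gtype == "Polygon" and coords:
        for ring in coords:
            # GeoJSON linear rings need at least four positions, with first == last.
            if len(ring) < 4 or ring[0] != ring[-1]:
                problems.append("polygon ring is unclosed or has fewer than 4 positions")
            positions.extend(ring)
    else:
        problems.append(f"missing or unsupported geometry type: {gtype!r}")

    for lon, lat, *_ in positions:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"coordinate out of range: ({lon}, {lat})")
    return problems
```

Running a check like this over every feature before publishing, and holding back any feature with a non-empty problem list, is a simple way to keep obviously broken records off the portal.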
4. Offer format choices
By definition, open data should be easy for the public to use, so offer a choice of formats. Here are our recommendations:
- GeoJSON because it’s flexible, machine readable, easy to serve from an API endpoint, and instantly viewable in a web environment.
- XML because it’s machine readable and offers the user a lot of power and flexibility for tabular data.
- JSON for the same reasons, plus it’s easy to serve from an API endpoint.
- CSV because it’s a tabular format that’s easily read by humans. Excel is also a good choice for the same reason.
- Esri Shapefile because it’s such a widely used spatial data format. It’s consistently the most popular GIS format in our usage stats.
- KML because it’s instantly viewable in a web environment and is the format of choice for Google Maps and Earth.
- Other useful spatial formats to consider: GML (because it’s a widely used OGC format), AutoCAD DXF/DWG (for the CAD users), Esri File Geodatabase (because Esri), MapInfo TAB (because it’s among the most popular GIS formats).
- You could also offer PDF, because it looks nice and is easily shareable. Note that PDF should be a supplementary format rather than your central focus, since it’s of little use to people who want to work with the data.
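Offering several formats usually means serializing one dataset more than once. As a minimal sketch, assuming simple point records with hypothetical `lon` and `lat` keys and the same set of attributes on every record, the Python below writes identical features as GeoJSON and as CSV using only the standard library.

```python
import csv
import io
import json


def to_geojson(features: list[dict]) -> str:
    """Serialize point records as a GeoJSON FeatureCollection."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON positions are ordered longitude, then latitude.
                "geometry": {"type": "Point", "coordinates": [f["lon"], f["lat"]]},
                "properties": {k: v for k, v in f.items() if k not in ("lon", "lat")},
            }
            for f in features
        ],
    }
    return json.dumps(collection)


def to_csv(features: list[dict]) -> str:
    """Serialize the same records as CSV, keeping coordinates as plain columns."""
    fieldnames = list(features[0]) if features else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(features)
    return buffer.getvalue()
```

Generating every format from one source of truth keeps the downloads consistent with each other; in FME the equivalent is routing the same features to several writers.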
5. Choose the right delivery solution
As for delivering the data, here are a few solutions you can leverage (alphabetically, not necessarily ordered by awesomeness). We have a webinar and an ebook that explore these in detail.
- Amazon Web Services (AWS)
- ArcGIS Open Data
- CKAN
- DataPress
- DKAN
- FTP
- GitHub
- Junar
- OpenDataSoft
- Socrata
Free the data!
We’re going to see a lot more open data in the world as its popularity keeps growing. Plus, there are no excuses when it comes to the technical side of things. With the rise of the cloud and automation tools like FME, it’s straightforward and cheap (and fun!) to create an open data portal.
Of course, there will be demand for higher-quality data, not just more of it. Open data must be easy to find, use, and collaborate on. We also expect open data to become more standardized, making it easier to compare cities globally.
With free access to data, citizens will be able to use it for amazing things. Canadian Open Data Experience (CODE), for example, was a hackathon focused on using open data to solve problems and increase productivity. NYC BigApps 2015 was another one aimed at building tools to overcome pressing civic challenges.
Further reading / watching:
- Free eBook: A Beginner’s Guide to Open Data – Includes what we talked about here, a comparison of the above solutions, and links to resources & demos.
- Video: How the City of Surrey automates Open Data with FME.
- Webinar recording: Open Police Data – more tips and examples of open data portals.
- Webinar recording: Open Data Portals – 9 Solutions and How They Compare.
- Try it: Create a data delivery service in the FME Server developer playground.