The problem with more data: CliffsNotes edition

By Chip Oglesby

If you would like to read my previous post, you can find it here. It goes into more depth than this post and explains more of the transparency cycle.

Hopefully this will serve as a reference guide for those looking to post their data, such as check registers, online.

All data should be machine readable

  1. The basic reason to post data online is to inform citizens.
  2. PDFs are good for scanned pages; they are bad for data that was generated by a computer program in the first place.
  3. Data is much more usable when it’s in a machine readable format such as a CSV file. It makes it easier for developers and designers to massage data into the format they need.
  4. Extracting information from PDFs can be tedious and labor-intensive. It’s easier to provide an open standard format for people to use.
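
To show why a CSV file is so much friendlier than a PDF, here is a minimal Python sketch that reads a check register and totals it. The column names (date, vendor, description, amount) and the sample rows are made up for illustration; a real register will have its own layout.

```python
import csv
import io
from decimal import Decimal

# Made-up sample data. The column names are an assumption, not a standard.
SAMPLE_CSV = """date,vendor,description,amount
2012-01-05,Acme Office Supply,Printer paper,125.40
2012-01-09,City Hotel,Conference lodging,310.00
"""


def read_register(text):
    """Parse CSV text into a list of dicts, one per check."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Decimal avoids floating point rounding surprises with money.
        row["amount"] = Decimal(row["amount"])
        rows.append(row)
    return rows


checks = read_register(SAMPLE_CSV)
total = sum(check["amount"] for check in checks)
print(f"{len(checks)} checks totaling ${total:.2f}")
```

Getting the same numbers out of a PDF would mean extracting text and guessing at column boundaries before any of this could run.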

All data is dirty

  1. The main question when releasing data is always: What if we release the wrong info?
  2. Data can be incorrect. Names can be misspelled, numbers can be entered incorrectly, descriptions can be off.
  3. When working with data, always go to the source. Double check your sources.
  4. When exporting data, choose the information that will best suit your consumers’ needs. The more information you can include, the better.
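
Since dirty data is a given, a small check before publishing can catch the obvious problems. The sketch below is one possible approach, not a complete validator: the `clean_row` helper and the `vendor` and `amount` field names are hypothetical, and it only tidies whitespace and flags amounts that do not parse as numbers.

```python
from decimal import Decimal, InvalidOperation


def clean_row(row):
    """Return (cleaned_row, problems) for one record.

    Hypothetical helper: assumes each row is a dict with "vendor"
    and "amount" keys holding strings.
    """
    problems = []
    cleaned = dict(row)

    # Trim the vendor name and collapse repeated whitespace.
    cleaned["vendor"] = " ".join((row.get("vendor") or "").split())
    if not cleaned["vendor"]:
        problems.append("missing vendor")

    # Strip currency formatting, then check that the amount is a number.
    raw_amount = (row.get("amount") or "").replace("$", "").replace(",", "").strip()
    try:
        cleaned["amount"] = Decimal(raw_amount)
    except InvalidOperation:
        cleaned["amount"] = None
        problems.append(f"unparseable amount: {row.get('amount')!r}")

    return cleaned, problems


# A typo: the letter O typed instead of a zero.
row, problems = clean_row({"vendor": "  Acme   Office Supply ", "amount": "12O.00"})
print(row["vendor"], problems)
```

Rows that come back with problems are the ones worth checking against the source before release.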

All data needs context

  1. Schools, government and municipalities needn’t waste time giving data context. Allow developers to take your information and do that for you.
  2. Developers need an easy way to access your information. CSV files and APIs are a developer’s best friend.
  3. Designers can work together with developers to best highlight data and make it more meaningful.
  4. If you feel compelled to give your data context, show examples highlighting your dataset. For example, how much did your school district spend per month on lodging and meals? How much has your government spent on cell phones and technology?
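
A question like "how much did we spend per month on lodging?" takes only a few lines once the data is machine readable. This is a rough sketch under assumptions: the `category` field, the ISO-style dates, and the sample rows are all invented for the example.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up rows. A real register may not have a "category" column at all.
ROWS = [
    {"date": "2012-01-09", "category": "lodging", "amount": "310.00"},
    {"date": "2012-01-20", "category": "lodging", "amount": "100.00"},
    {"date": "2012-01-21", "category": "meals", "amount": "42.75"},
    {"date": "2012-02-03", "category": "lodging", "amount": "95.50"},
]


def monthly_totals(rows, category):
    """Sum amounts per "YYYY-MM" month for one spending category."""
    totals = defaultdict(Decimal)  # Decimal() starts each month at 0
    for row in rows:
        if row["category"] == category:
            month = row["date"][:7]  # "2012-01-09" -> "2012-01"
            totals[month] += Decimal(row["amount"])
    return dict(sorted(totals.items()))


for month, amount in monthly_totals(ROWS, "lodging").items():
    print(month, amount)
```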

All data needs a central storage location

  1. Storing PDFs by month on the same page is a good start. It gives people a way to categorize things, but it is bad for computers.
  2. Data should be stored in a publicly accessible database such as Socrata.
  3. Storing your data in Socrata will centralize information, allowing for quicker, easier access to material.
  4. Databases should have methods of exporting data, such as JSON and CSV downloads or a REST API.
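
As a rough illustration of what "methods of exporting data" can mean, here is a sketch that writes the same records out as both JSON and CSV using only Python's standard library. The `to_json` and `to_csv` helpers and the sample records are hypothetical; a hosted service like Socrata handles exports for you, so this only shows the idea.

```python
import csv
import io
import json

# Made-up records for illustration.
RECORDS = [
    {"date": "2012-01-05", "vendor": "Acme Office Supply", "amount": "125.40"},
    {"date": "2012-01-09", "vendor": "City Hotel", "amount": "310.00"},
]
FIELDS = ["date", "vendor", "amount"]


def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records, fieldnames):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(RECORDS))
print(to_csv(RECORDS, FIELDS))
```

Offering both formats costs little and lets developers pick whichever fits their tools.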

All data needs action

  1. While pushing to have more information online is great, if no actions result from publishing, what good is it?
  2. Engaged citizens and advocacy groups need a way to export and share their findings through social media.

Further Reading: For more in-depth reading about data, check out these resources.

  1. Civic Commons/OpenMuni Wiki: A great resource for any municipality looking to make the leap into the digital world. Case studies, best and worst practices and more.
  2. The five stars of open linked data: The inventor of the World Wide Web, Tim Berners-Lee, explains why he wants to build a new web using linked data and what we need to do to get there.
  3. Socrata: A free and paid service for municipalities to store their data online. Their basic service is free and prices increase depending on needs.
  4. The transparency cycle: From the Sunlight Foundation. This graphic and blog post explain why we must all work together.
  5. The eight principles of open government data: Government data shall be considered open if the data are made public in a way that complies with these eight principles.