Chip Oglesby

An online portfolio and notebook about the future of journalism.

Tag: transparency

The problem with more data: cliff notes edition

My previous post goes into more depth than this one and explains more of the transparency cycle. You can find it here.

Hopefully this will serve as a reference guide for those looking to post their data, such as check registers, online.

All data should be machine readable

  1. The basic reason to post data online is to inform citizens.
  2. PDFs are good for scanned pages, but they are a poor choice when the data was generated by a computer program in the first place.
  3. Data is much more usable when it’s in a machine-readable format such as a CSV file. It makes it easier for developers and designers to massage data into the format they need.
  4. Extracting information from PDFs can be tedious and labor-intensive. It’s easier to provide an open standard format for people to use.
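
To show why a CSV file is so much easier to work with, here is a minimal Python sketch that totals a check register by payee. The sample rows and the column names (date, payee, description, amount) are my own assumptions for illustration, not taken from any real district's export.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical check register rows; the column names are assumptions,
# not taken from any real export.
SAMPLE_CSV = """date,payee,description,amount
2010-01-05,Office Supply Co,Paper,120.50
2010-01-12,City Hotel,Lodging,310.00
2010-01-20,Office Supply Co,Toner,89.99
"""


def total_by_payee(csv_text):
    """Sum the amount column for each payee in a CSV check register."""
    totals = defaultdict(Decimal)  # missing payees start at Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Decimal avoids floating-point rounding errors on money values.
        totals[row["payee"]] += Decimal(row["amount"])
    return dict(totals)


totals = total_by_payee(SAMPLE_CSV)
for payee, amount in sorted(totals.items()):
    print(payee, amount)
```

Doing the same thing with a PDF would first require pulling the table out of the page layout, which is exactly the tedious step a CSV export avoids.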

All data is dirty

  1. The main question when releasing data is always: What if we release the wrong info?
  2. Data can be incorrect. Names can be misspelled, numbers can be input wrong, descriptions can be off.
  3. When working with data, always go to the source. Double check your sources.
  4. When exporting data, choose the information that will best suit your consumers’ needs. The more information you can include, the better.

All data needs context

  1. Schools, government and municipalities needn’t waste time giving data context. Allow developers to take your information and do that for you.
  2. Developers need an easy way to access your information. CSV files and APIs are a developer’s best friend.
  3. Designers can work together with developers to best highlight data and make it more meaningful.
  4. If you feel compelled to give your data context, show examples highlighting your dataset. For example, how much did your school district spend per month on lodging and meals? How much has your government spent on cell phones and technology?

All data needs a central storage location

  1. Storing PDFs by month on the same page is a good start. It gives people a way to categorize things, but it is bad for computers.
  2. Data should be stored in a publicly accessible database such as Socrata.
  3. Storing your data in Socrata will centralize information, allowing for quicker, easier access to material.
  4. Databases should have methods of exporting data, such as JSON and CSV files or a REST API.
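
As a rough illustration of the last point, here is a small Python sketch that turns CSV rows into JSON, the kind of conversion a data store could offer as an export option. The sample rows and column names are hypothetical, and this is not how Socrata or any particular service does it.

```python
import csv
import io
import json

# Hypothetical rows; the column names are assumptions for illustration.
SAMPLE_CSV = """date,payee,amount
2010-02-01,City Hotel,310.00
2010-02-03,Phone Company,45.10
"""


def csv_to_json(csv_text):
    """Convert CSV text into a JSON array with one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Values stay as strings because CSV carries no type information.
    return json.dumps(rows, indent=2)


json_text = csv_to_json(SAMPLE_CSV)
print(json_text)
```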

All data needs action

  1. While pushing to have more information online is great, if no actions result from publishing, what good is it?
  2. Engaged citizens and advocacy groups need a way to export and share their findings through social media.

Further Reading: For more in-depth reading about data check out these resources.

  1. Civic Commons/OpenMuni Wiki: A great resource for any municipality looking to make the leap into the digital world. Case studies, best and worst practices and more.
  2. The five stars of open linked data: The inventor of the World Wide Web, Tim Berners-Lee, explains why he wants to build a new web using linked data and what we need to do to get there.
  3. Socrata: A free and paid service for municipalities to store their data online. Their basic service is free and prices increase depending on needs.
  4. The transparency cycle: From the Sunlight Foundation. This graphic and blog post explain why we must all work together.
  5. The eight principles of open government data: Government data shall be considered open if the data are made public in a way that complies with these eight principles.

The problem with ‘more data’

Recently, SCPC wrote about the problem with online check registers in county school districts. As more and more data is placed online, we need a way to standardize it so that it has context and isn’t just sitting there. Data that just sits there is what we would call ‘naked transparency.’

The naked transparency movement marries the power of network technology to the radical decline in the cost of collecting, storing, and distributing data. Its aim is to liberate that data, especially government data, so as to enable the public to process it and understand it better, or at least differently.

Before we rally the troops, we have to realize that getting more data, data that we own, from government officials on all levels doesn’t equal more transparency or accountability.

Data is only part of the Transparency Cycle

A Sunlight Foundation blog post included a very interesting graphic showing how the ‘Transparency Cycle’ works. The cycle has no beginning or end because it’s an ongoing process. Government agencies (a State Ethics Board, for example) are responsible for organizing data and giving web developers APIs. Developers work with graphic designers, who give data context by visualizing it. Designers work with journalists, who build public awareness by adding context and reporting anomalies. Engaged citizens work with advocacy groups, who organize and take action to hold the public and lawmakers accountable for what’s going on in government.

Tim Berners-Lee, the inventor of the World Wide Web, has envisioned a new type of web, one of linked data, where the dots can be connected. Berners-Lee gives five points of open linked data.

  1. make your stuff available on the web (whatever format)
  2. make it available as structured data (e.g. excel instead of image scan of a table)
  3. non-proprietary format (e.g. csv instead of excel)
  4. use URLs to identify things, so that people can point at your stuff
  5. link your data to other people’s data to provide context

State Comptroller Richard Eckstrom’s state government spending transparency site accomplishes four of the five goals, a great accomplishment in my opinion. Our school websites, on the other hand, meet only one of the five requirements. PDFs with no structure give engaged citizens no way to ingest and analyze more than one month’s worth of data.

I was able to scrape a PDF from Berkeley County’s transparency website and run the information through Many Eyes to get the chart featured below. Ideally, there should be a simpler way for a developer or designer to visualize this information through APIs.

Eckstrom’s website faces the same type of problem. It focuses on month-to-month expenditures, so if I want to build a database covering a full year, I have to download 12 separate .csv files and load them into another database before I can visualize anything.
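
Here is a minimal Python sketch of that extra work: loading several monthly CSV exports into a single SQLite table so a whole year can be queried at once. The two sample months, the column names (agency, amount), and the table name are assumptions for illustration. They are not the actual layout of the comptroller's files, and real files would be read from disk rather than from strings.

```python
import csv
import io
import sqlite3

# Hypothetical monthly exports keyed by month; only two months are shown.
MONTHLY_CSVS = {
    "2010-01": "agency,amount\nEducation,1000.00\nTransportation,250.50\n",
    "2010-02": "agency,amount\nEducation,1100.00\nTransportation,300.00\n",
}


def load_months(monthly_csvs):
    """Load every monthly CSV into one in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spending (month TEXT, agency TEXT, amount REAL)")
    for month, text in monthly_csvs.items():
        rows = [
            (month, row["agency"], float(row["amount"]))
            for row in csv.DictReader(io.StringIO(text))
        ]
        conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load_months(MONTHLY_CSVS)
# Once the months live in one table, a yearly total is a single query.
yearly = dict(
    conn.execute("SELECT agency, SUM(amount) FROM spending GROUP BY agency")
)
print(yearly)
```

If the site offered one combined export or an API, citizens would not need this step at all.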

All data is dirty

Once we’re able to actually collect data through publicly accessible APIs, does that necessarily mean the info is clean? Not really.

Since data input still relies on human beings, mistakes are inevitable. Remember the disaster of recovery.gov? There was a huge scandal because of all of the ‘ghost’ districts where money was being spent. The two main views here are simple: “It happened on purpose, Democrats are trying to steal/take our money” or “It was just a simple mistake, a slip of the finger, or some congressional page didn’t know what district they were in.”

Also, if you browse the Sunlight Foundation’s transparency data portal and look at campaign contributions, you’ll find that names can be misspelled, and instead of giving a specific occupation such as “Owner: Fast Bucks,” a donor may simply list “store owner.”

This can lead to a few errors. It makes it hard to track who’s actually giving because a researcher will have to double check which company the donor works for to help connect the dots.
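
One way to ease that double-checking is to let a script group names that look alike and then have a person review each group. Here is a rough Python sketch using the standard library's difflib. The donor names are made up, and the 0.85 similarity cutoff is an arbitrary assumption that would need tuning against real records.

```python
import difflib


def group_similar_names(names, cutoff=0.85):
    """Group names whose spelling is close to the first name in a group."""
    groups = []
    for name in names:
        key = name.lower().strip()
        for group in groups:
            first = group[0].lower().strip()
            # ratio() is 1.0 for identical strings and lower as they differ.
            if difflib.SequenceMatcher(None, key, first).ratio() >= cutoff:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical donor names with a misspelling and stray whitespace.
donors = ["John Smith", "Jon Smith", "Mary Jones", "John Smith "]
groups = group_similar_names(donors)
for group in groups:
    print(group)
```

Fuzzy matching like this only suggests candidates. Two different people can have nearly identical names, so a researcher still has to confirm each match.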

Data needs context

Once the data is published, it still needs context. PDFs are good for looking at a small record, but what if we want to compare values over a given year, or the past six years? How do we know when a company that a lobbyist represents gives a lawmaker money for his PAC so that he may be influenced to vote a certain way?

Spending all day poring over massive amounts of information can be tedious and lead to the wrong conclusions. Instead, there should be automated processes in place that alert people via email, text or tweet when anomalies arise. Like the internet, they would work quietly in the background but always be on.
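
To make the idea concrete, here is a minimal Python sketch of such a process. It flags payments that sit far from the typical amount, measured in median absolute deviations. The payment figures, the threshold of 3.0, and the alert function are all assumptions for illustration. A real system would replace the print call with an email, text or tweet.

```python
import statistics


def find_anomalies(amounts, threshold=3.0):
    """Return amounts far from the median, in median absolute deviations."""
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        # At least half the values equal the median; nothing to scale by.
        return []
    return [a for a in amounts if abs(a - median) / mad > threshold]


def alert(amount):
    """Placeholder notifier; a real one might send an email or a tweet."""
    print(f"Unusual payment found: {amount:.2f}")


# Hypothetical monthly payments to one vendor.
payments = [120.0, 135.0, 110.0, 125.0, 130.0, 4200.0]
flagged = find_anomalies(payments)
for amount in flagged:
    alert(amount)
```

A flagged payment is only a prompt for a closer look. It still takes a reporter or an engaged citizen to find out whether it is an error, a one-time purchase or something worse.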

Designers and reporters also play an important role in this because they can help clarify misunderstandings someone may have.

Data doesn’t equal transparency

Once we have the data, it’s been checked for accuracy and it’s been given context, the transparency cycle still isn’t complete. Government could publish every single bit of data it has, including recorded votes, transit information and GIS maps, but what good will it do if it just sits there?

It’s up to engaged citizens and advocacy groups to take the information from designers, developers, journalists and bloggers and form grassroots movements to hold government responsible. Data without action is all for naught.

Once Citizens and Groups organize and take action, they along with others can work with Lawmakers to actually make a change.

Transparency alone will not lead to more accountability in government. Data.gov and recovery.gov are great examples of the federal government giving citizens monitoring tools.

In South Carolina, we face battles of our own. The South Carolina Senate, made up of only 46 people, says it cannot simply decide to vote on the record because doing so would be unconstitutional. Senators have also argued that verbal roll-call voting takes too long, and I agree, it does. But there are solutions out there. Open-source software can be written so that bills, amendments and earmarks are posted online 72 hours early for the public to inspect. House and Senate members could then vote on the bills so that we can connect the dots and see where change and influence are happening.

The question that South Carolina faces is: Who’s going to be first in the Transparency Cycle?

Two issues facing newspapers

There are a lot of issues that plague newspapers. Aggregation, fair use and copyright infringement are just a few, but when you really home in on the problem, it may surprise you.

The two main issues keeping newspapers from moving forward are control and education.

Why control?

The Washington Post’s new social media policy is a great example of what I consider ‘control.’ Before the immense rise of social media, fueled recently by Twitter, it was easier for journalists to hide their opinions. Services like Twitter keep their thoughts and opinions archived and always on the record. There are plenty of conversations going on about transparency and objectivity in journalism, but why is the Washington Post so concerned about controlling the message of its journalists?

The one-way street that newspapers had is quickly coming to an end. Newspapers need to prepare and educate their staff on how to properly handle these situations. Just because journalists need to remain objective doesn’t mean they can’t engage with readers and sources.

Let’s take a look at the Ombudsman’s blog on the Post’s situation:

In today’s hyper-sensitive political environment, Narisetti’s tweets could be seen as one of The Post’s top editors taking sides on the question of whether a health-care reform plan must be budget neutral. On Byrd, his comments could be construed as favoring term limits or mandatory retirement for aging lawmakers. Many readers already view The Post with suspicion and believe that the personal views of its reporters and editors influence the coverage. The tweets could provide ammunition.

A lot of this can be avoided if newspapers would adopt the idea of transparency in the newsroom, as Dan Gillmor and numerous others have suggested. Journalists, like everyone else, are people, and they have opinions and biases. They may as well be as open as possible about them from the beginning. Some will think that this could lead to “Fox News style journalism,” but there’s a difference between being transparent and injecting your opinion into every story.

One of the things we’ll be talking about at Social Media Club in October is how journalists can use services like Twitter to engage with readers and help build a stronger audience through transparency and engagement. Newspapers also need to have active social media users in their company draft guidelines instead of relying on senior management or editors with little to no experience with these tools.

The second example of control is the continuation of the general interest product. I’ve borrowed this quote before and paraphrased it but it really applies to newspapers as well: “If publishers would focus on how people read and not how they publish, they would be a lot better off.”

The actual quote comes from Clive Thompson:

To which I reply: Sure they can. But only if publishers adopt Wark’s perspective and provide new ways for people to encounter the written word. We need to stop thinking about the future of publishing and think instead about the future of reading.

The Internet has completely unbundled the newspaper and allows news to come to the reader instead of the reader going to the news. Although newspapers continue to see a decline in print readership and an increase in online readership, papers still operate in an old school model where stories are routinely held until midnight deadlines. It’s understandable that the paper is still the golden egg of newspapers, but failure to embrace the wants and needs of your readers will result in the end of your business, plain and simple.

Why education?

What’s wrong with your company when 100% of your employees know what a furlough is but less than 75% can actually write a full hyperlink? If your employees can’t finish this “a href,” you’ve got some serious problems. Some of the main divisions in the newsroom come from the simple fact that employees aren’t being properly educated. I’m not just talking about continuing education through school. I’m also talking about educating them about the future of their jobs and their trade in a real-world environment.

If you ask a typical employee why newspapers are doing so poorly, you’ll likely get an answer like ‘I blame the Internet’ or ‘it’s all Craigslist’s fault’ when in actuality, it goes much deeper than that.

Education also includes teaching coworkers that although the AP is useful for print, it’s unnecessary for the web when you combine tools like Publish2 with ideas like the link economy. Wire editors then become curators of news instead of copy editors.

Education extends past institutional knowledge to include having an open conversation with coworkers. Before you groan about having another meeting or another brown-bag lunch, consider the fact that these conversations will be among the most important you’ll ever have at your paper. Leaving coworkers in the dark about what’s going on is just another form of control and shows a lack of education.

Where do we go from here?

All of this starts with a conversation. It could be in small groups or it could be through email, but it must happen. Newspapers cannot afford to operate with a business-as-usual mentality. Don’t wait for higher-ups to start this conversation. Take it to your coworkers and start it yourselves.

Take the time to see how your paper uses ‘control’ and question if it’s really necessary. Ask questions about everything and don’t be afraid to raise the question of education. The only thing worse than an uneducated audience is an uneducated newsroom.