Chip Oglesby

An online portfolio and notebook about the future of journalism.

Tag: semantics

The problem with more data: CliffsNotes edition

If you would like to read my previous post, you can find it here. It explains the transparency cycle in more depth than this post does.

Hopefully this will serve as a reference guide for those looking to post their data, such as check registers, online.

All data should be machine readable

  1. The basic reason to post data online is to inform citizens.
  2. PDFs are fine for scanned pages, but they are a poor choice for documents generated by computer programs.
  3. Data is much more usable when it’s in a machine-readable format such as a CSV file. That makes it easier for developers and designers to massage the data into the format they need.
  4. Extracting information from PDFs can be tedious and labor intensive. It’s easier to provide an open, standard format for people to use.
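To show why a CSV file is friendlier than a PDF, here is a minimal Python sketch that totals a check register by payee using only the standard library. The column names (date, payee, description, amount) and the rows are hypothetical; a real register’s layout will differ.

```python
import csv
import io
from decimal import Decimal

# Hypothetical check register in CSV form. The column names and rows are
# made up for illustration; they are not a standard schema or real data.
CHECK_REGISTER_CSV = """date,payee,description,amount
2010-01-05,Acme Office Supply,Printer paper,125.50
2010-01-12,Palmetto Hotel,Lodging for conference,389.00
2010-02-03,Acme Office Supply,Toner,89.99
"""


def total_by_payee(csv_text):
    """Sum check amounts per payee. Decimal avoids floating-point rounding on money."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        payee = row["payee"]
        totals[payee] = totals.get(payee, Decimal("0")) + Decimal(row["amount"])
    return totals


for payee, amount in sorted(total_by_payee(CHECK_REGISTER_CSV).items()):
    print(f"{payee}: ${amount:,.2f}")
```

Doing the same with a computer-generated PDF usually means extracting the text first, which is where the tedium in item 4 comes from.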

All data is dirty

  1. The main question when releasing data is always: What if we release the wrong info?
  2. Data can be incorrect. Names can be misspelled, numbers can be input wrong, descriptions can be off.
  3. When working with data, always go to the source. Double check your sources.
  4. When exporting data, choose the information that will best suit your consumers’ needs. The more information you can include, the better.
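Some of this checking can be automated before release. Below is a minimal Python sketch that flags rows for a person to double-check against the source: payee names that are suspiciously close to another payee (a likely misspelling, found with the standard library’s difflib) and amounts that are non-numeric or outside a chosen range. The CSV layout, the 10,000 ceiling and the 0.85 similarity cutoff are assumptions to tune, not rules.

```python
import csv
import difflib
import io
from decimal import Decimal, InvalidOperation

# Hypothetical raw export with the kinds of mistakes described above:
# a misspelled payee, a suspiciously large amount, and a non-numeric amount.
RAW_CSV = """payee,amount
Acme Office Supply,125.50
Acme Ofice Supply,89.99
Palmetto Hotel,38900
Palmetto Hotel,abc
"""


def find_problems(csv_text, max_amount=Decimal("10000"), name_cutoff=0.85):
    """Return (line_number, message) pairs for rows that deserve a second look."""
    problems = []
    seen_payees = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        payee = row["payee"].strip()
        # Flag names that are close to, but not identical to, a payee seen earlier.
        others = [p for p in seen_payees if p != payee]
        close = difflib.get_close_matches(payee, others, n=1, cutoff=name_cutoff)
        if close:
            problems.append((line_no, f"payee {payee!r} looks like {close[0]!r}"))
        if payee not in seen_payees:
            seen_payees.append(payee)

        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            problems.append((line_no, f"amount is not a number: {row['amount']!r}"))
            continue
        if amount < 0 or amount > max_amount:
            problems.append((line_no, f"amount outside expected range: {amount}"))
    return problems


for line_no, message in find_problems(RAW_CSV):
    print(f"line {line_no}: {message}")
```

A flagged row is not necessarily wrong; the point is to narrow down what a person has to verify against the original records.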

All data needs context

  1. Schools, governments and municipalities needn’t waste time giving data context. Allow developers to take your information and do that for you.
  2. Developers need an easy way to access your information. CSV files and APIs are a developer’s best friend.
  3. Designers can work together with developers to highlight the most important data and make it more meaningful.
  4. If you feel compelled to give your data context, show examples that highlight your dataset. For example, how much did your school district spend per month on lodging and meals? How much has your government spent on cell phones and technology?
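As a rough illustration of the school district example, this Python sketch totals lodging and meals spending by month from a CSV export. The column names, categories and rows are made up, and it assumes dates are written in ISO 8601 form (YYYY-MM-DD) so that the month is the first seven characters.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical expense export; columns and values are illustrative only.
EXPENSES_CSV = """date,category,amount
2010-01-05,lodging,389.00
2010-01-06,meals,42.15
2010-01-20,supplies,125.50
2010-02-11,meals,18.40
"""


def monthly_totals(csv_text, categories):
    """Total the given expense categories per month ("YYYY-MM")."""
    totals = defaultdict(Decimal)  # Decimal() starts each month at 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["category"] in categories:
            month = row["date"][:7]  # assumes ISO 8601 dates, e.g. "2010-01-05"
            totals[month] += Decimal(row["amount"])
    return dict(sorted(totals.items()))


for month, total in monthly_totals(EXPENSES_CSV, {"lodging", "meals"}).items():
    print(f"{month}: ${total:,.2f}")
```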

All data needs a central storage location

  1. Storing PDFs by month on the same page is a good start because it gives people a way to categorize things, but it is bad for computers.
  2. Data should be stored in a publicly accessible database such as Socrata.
  3. Storing your data in Socrata will centralize information, allowing quicker and easier access to the material.
  4. Databases should offer ways to get data out, such as JSON and CSV downloads and a REST API.
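Whatever database you choose, the export step itself is small. This sketch uses only Python’s standard library to write the same hypothetical records as JSON and as CSV. It does not use Socrata’s API; a hosted service may generate these files for you.

```python
import csv
import io
import json

# Hypothetical records; amounts stay as strings so no rounding creeps in on export.
records = [
    {"date": "2010-01-05", "payee": "Acme Office Supply", "amount": "125.50"},
    {"date": "2010-01-12", "payee": "Palmetto Hotel", "amount": "389.00"},
]


def export_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def export_csv(rows):
    """Serialize rows as CSV with a header taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(export_json(records))
print(export_csv(records))
```

Both outputs round-trip: reading either file back gives the same records, which is what makes them useful to developers.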

All data needs action

  1. While pushing to have more information online is great, if no actions result from publishing, what good is it?
  2. Engaged citizens and advocacy groups need a way to export and share their findings through social media.

Further Reading: For more in-depth reading about data check out these resources.

  1. Civic Commons/OpenMuni Wiki: A great resource for any municipality looking to make the leap into the digital world. Case studies, best and worst practices and more.
  2. The five stars of open linked data: Tim Berners-Lee, the inventor of the World Wide Web, explains why he wants to build a new web using linked data and what we need to do to get there.
  3. Socrata: A service for municipalities to store their data online. Its basic service is free, and prices increase depending on needs.
  4. The transparency cycle: From the Sunlight Foundation. This graphic and blog post explain why we must all work together.
  5. The eight principles of open government data: Government data shall be considered open if the data are made public in a way that complies with these eight principles.

Putting Americans to work through technology

While driving down I-26 to Asheville, North Carolina, last weekend, I noticed a sign that said “Project funded by the American Recovery and Reinvestment Act.” The act was passed in 2009 under the Obama administration as a way to create jobs and promote investment and consumer spending during the recession. Its measures are worth about $787 billion.

The road sign got me thinking about programs that were created as part of the New Deal: the Works Progress Administration (WPA) and the Civilian Conservation Corps (CCC). The WPA was created by President Roosevelt in 1935. Its expenditures from 1936 to 1939 totaled nearly $7 billion. As a side note, my grandfather used to refer to the WPA as “We piddle around” for its lack of work.

The CCC was a public work relief program that, from 1933 to 1942, put unemployed men ages 18 to 24 to work developing natural resources in rural areas.

In North Carolina, both the WPA and CCC helped build the Blue Ridge Parkway:

On June 30, 1936, Congress formally authorized the project as the “Blue Ridge Parkway” and placed it under the jurisdiction of the National Park Service. Some work was carried out by various New Deal public works agencies. The Works Progress Administration did some roadway construction. Crews from the Emergency Relief Administration carried out landscape work and development of parkway recreation areas. Personnel from four Civilian Conservation Corps camps worked on roadside cleanup, roadside plantings, grading slopes, and improving adjacent fields and forest lands. During World War II, the CCC crews were replaced by conscientious objectors in the Civilian Public Service program.

What does this have to do with technology?

The CCC and WPA were good ideas in the 1930s, but there are new and greater demands to be met during our current recession. Government spending is at an all-time high, while red tape and bloat hinder a system that our founding fathers helped establish.

One of the main problems facing the government is a lack of innovation and creativity with technology.

It’s time that we take the next step and create new agencies that will help bring us into the next era of a public and open government.

Large strides could be made by putting people to work in the technology sector of government. Designers, developers and community activists could all work in step to show how the government spends your money and help hold your elected officials accountable.

What could these jobs produce?

Designers across America, for example, could help redesign websites such as the South Carolina Governor’s Mansion website.

These jobs could help create databases of information that are publicly accessible to all. A good start would be to build a system for roll call voting, as outlined in the South Carolina Policy Council’s 2009 transparency report. The U.S. Senate has an XML feed of its votes available, a key piece of legislation that was championed by S.C. Senator Jim DeMint. Politico goes further into the pros and cons of this.

Sites like scvotes should have APIs that work with county GIS departments to report voting results for precincts statewide in real time on election night.
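To make the election night idea concrete, here is a Python sketch of the rollup such an API might perform: precinct-level counts summed into county totals and returned as JSON. The field names, counties, candidates and vote counts are invented for illustration; this is not the format scvotes or any existing site uses.

```python
import json
from collections import Counter, defaultdict

# Invented precinct-level results. A real feed would also carry precinct IDs
# that match the county GIS department's precinct boundaries.
precinct_results = [
    {"county": "County A", "precinct": "Precinct 1", "candidate": "Candidate X", "votes": 412},
    {"county": "County A", "precinct": "Precinct 1", "candidate": "Candidate Y", "votes": 380},
    {"county": "County A", "precinct": "Precinct 2", "candidate": "Candidate X", "votes": 295},
    {"county": "County B", "precinct": "Precinct 7", "candidate": "Candidate Y", "votes": 510},
]


def county_totals(results):
    """Sum votes per candidate within each county."""
    totals = defaultdict(Counter)
    for r in results:
        totals[r["county"]][r["candidate"]] += r["votes"]
    return {county: dict(counts) for county, counts in sorted(totals.items())}


print(json.dumps(county_totals(precinct_results), indent=2))
```

Because each update is just a recomputation over the latest precinct rows, the same function could be rerun every time another precinct reports.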

Any state-run or state-supported website should also make its analytics available under FOIA laws. This could help determine a site’s actual impact-to-cost ratio.

Copyright laws should also be updated so that any text and photographs are made publicly available. On the South Carolina Film website, for example, the following information appears in the footer:

Photographs and art on this website and any downloadable publications are copyrighted and cannot be reproduced without the written permission of the photographer and/or the South Carolina Film Commission. © 2010 by the South Carolina Film Commission, a division of SC Parks, Recreation and Tourism. All rights reserved.

All information on any taxpayer-supported website should be licensed under a Creative Commons license. Government entities should also use open platforms such as Drupal and Django and share their code on GitHub.

SCPC also supports the idea of an online check register. [PDF] After SCPC submitted FOIA requests to 85 districts, 12 of them quoted an expense of more than $10,000 to complete the request.

A technological workforce reinvestment could help solve these problems by working with those 85 districts to move their financial records online using open linked data.

There are five stars to linked data:

  1. Make your stuff available on the web (in whatever format).
  2. Make it available as structured data (Excel instead of PDF).
  3. Use a non-proprietary format (CSV or tab-delimited instead of Excel).
  4. Use URLs to identify things, so that people can point to them.
  5. Link your data to other people’s data to provide context.
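Stars three through five are easier to picture with a small example. This Python sketch takes a hypothetical list of checks, gives each check its own URL, links each one to a vendor record in another (hypothetical) dataset, and writes the result as CSV. The base URL and the checks/ and vendors/ paths are assumptions. Berners-Lee’s own description ties the top stars to W3C standards such as RDF, which this sketch does not attempt; it only shows the idea of URLs as identifiers and links.

```python
import csv
import io

# Hypothetical base URL; a real publisher would use its own domain and
# keep these URLs stable so that other people can link to them.
BASE_URL = "https://data.example.gov"

checks = [
    {"check_number": "1001", "vendor_id": "42", "amount": "125.50"},
    {"check_number": "1002", "vendor_id": "7", "amount": "389.00"},
]


def add_identifiers(rows):
    """Give every check its own URL (star 4) and point at a vendor record (star 5)."""
    return [
        {
            "uri": f"{BASE_URL}/checks/{row['check_number']}",
            # Link to another dataset: a hypothetical vendor registry.
            "vendor_uri": f"{BASE_URL}/vendors/{row['vendor_id']}",
            "amount": row["amount"],
        }
        for row in rows
    ]


def to_csv(rows):
    """Write rows as CSV, a non-proprietary format (star 3)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(add_identifiers(checks)))
```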

If you think about it, it’s a pretty simple idea. Why can’t citizens log in to one website and see a single view of their accounts with a city or state (water, sewer, real estate, auto excise tax, registration, etc.)?
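Under the hood, a single account view is mostly a join on a shared identifier. This Python sketch assumes every department already tags its records with the same hypothetical resident_id, which is the real obstacle in practice; the departments and balances are made up.

```python
from decimal import Decimal

# Hypothetical per-department records keyed by a shared resident ID.
# In practice, agreeing on that shared ID is the hard part.
departments = {
    "water": [{"resident_id": "R-100", "balance": "45.20"}],
    "sewer": [{"resident_id": "R-100", "balance": "30.00"}],
    "auto_excise_tax": [
        {"resident_id": "R-100", "balance": "112.75"},
        {"resident_id": "R-200", "balance": "88.10"},
    ],
}


def single_view(resident_id, sources):
    """Collect one resident's balances from every department into one summary."""
    view = {}
    for department, rows in sources.items():
        for row in rows:
            if row["resident_id"] == resident_id:
                view[department] = view.get(department, Decimal("0")) + Decimal(row["balance"])
    # Computed before "total" is added, so the total is not counted twice.
    view["total"] = sum(view.values(), Decimal("0"))
    return view


print(single_view("R-100", departments))
```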

All of this requires a massive amount of imagination and innovation, but it has to start from the ground up. People need to learn that they have just as much control over what happens in government as the people they elect. The media does a good job of informing people about what’s happening in government, but it misses a great opportunity when it comes to educating people on how to make changes.

In conclusion, it’s time to take a serious look at the types of projects we’re funding with ARRA and look for ways to promote real change through a digital workforce reinvestment.

Using data and augmented reality to help define local news

There’s no denying that smartphones will only continue to grow in capability as the technology becomes cheaper.

The way we use our phones will also continue to change as more phones use Location Based Services, or LBS, which rely on positioning methods such as A-GPS (Assisted GPS).

This is a pretty new area for newspapers to start exploring and I would like to see more attention paid to local advertising using LBS.

I recently saw an article that described augmented reality apps that show nearby tweets and various other types of information: Wikitude (Android) and TwitAround (iPhone).

The basic idea of TwitAround is that by using the phone’s accelerometer you can see real-time tweets happening around you.

We also know that data needs relationships, and newspapers have historically been good at gathering data. What they are not good at is recording and distributing that information.

My idea is to build an application that harnesses all of this data and makes it available on your phone.

Examples

Example 1: You are a first-time home buyer looking for a home on Maple in the Rosewood area. By simply pointing your phone at a house, you can instantly see MLS listings, tax parcel lookups and average utility charges. You can also see related local stories, photos, tweets, video, crime stats and so forth.

Example 2: You are the same home buyer, and you travel to the intersection of Wheat and Rosewood and come upon Hand Middle School, where your children may attend. By pointing your phone at the school, you can see publicly accessible data such as SAT scores, teachers’ salaries, crime reports, stories about the school, historical context and more.

Example 3: You are at a high school football game where Hammond is playing Heathwood Hall. By pointing your phone at a jersey on the field, you would be able to see the team roster, individual stats, results in various weather conditions, past games, photos, videos and tweets.

Example 4: You are at the museum of art and want to know more about the painting you are looking at. By pointing your phone at it, you can see historical context, the painter’s bio, similar paintings and more.
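The “point your phone at it” step in these examples comes down to geometry: given the phone’s position (from GPS) and the direction it faces (from the compass), which nearby items fall inside the camera’s view? This Python sketch uses the haversine formula for distance and the standard initial-bearing formula. The 60-degree field of view, the 500 m radius and all coordinates are assumptions, and it ignores GPS and compass error entirely.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2 (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def items_in_view(lat, lon, heading, items, fov_deg=60, max_distance_m=500):
    """Return (name, rounded distance) for items inside the field of view, nearest first."""
    visible = []
    for name, item_lat, item_lon in items:
        d = distance_m(lat, lon, item_lat, item_lon)
        # Signed angle between where the phone points and where the item is, in [-180, 180).
        offset = (bearing_deg(lat, lon, item_lat, item_lon) - heading + 180) % 360 - 180
        if d <= max_distance_m and abs(offset) <= fov_deg / 2:
            visible.append((name, round(d)))
    return sorted(visible, key=lambda pair: pair[1])


# Made-up points of interest near a made-up position.
places = [
    ("house for sale", 34.001, -81.0),  # about 111 m north
    ("middle school", 34.0, -80.999),   # about 92 m east
    ("art museum", 34.05, -81.0),       # about 5.6 km north
]
print(items_in_view(34.0, -81.0, heading=0, items=places))
```

Once an item is in view, the app would look up the stories, records and tweets attached to it; that lookup is where the newspaper’s data comes in.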

A business model

In a virtual interview that I did with Dan Conover, I found this quote interesting:

“The issue with augmented reality, then, isn’t the technology. You need a platform that communicates it, a system that structures and creates it, a business model that understands its value and how to communicate it, and user devices and software agents that accurately interpret and negotiate it. The issue is content and how to pay for it.”

The problem is that we need a business model that rewards someone for adding value (i.e., meaningful content that people actually want). Until that happens, then every business that approaches augmented reality is going to treat it as just another way of delivering no-cost crap. It’s going to be mass-media executives trying to figure out how to use Facebook all over again. Business people tend to look at networked media as a way to make free money off of somebody else’s content, but there’s not going to be a sustainable business here until we work out the connections and expectations and exchanges.

While what Dan is saying is correct, I don’t think it will be an entirely ‘crap in, crap out’ model either. Just as Twitter has become popular, so will its ability to filter tweets through geolocation.

What we need is a better way to rate and log information, using algorithms that sort the good from the bad. Part of the connections we need to work out will be taking and filtering raw data, as Berners-Lee suggested, but also pulling content from our own archives and making it available through APIs.

Mindy McAdams also raises an interesting point in her post ‘Augmented Reality: a business model.’

Each view of a node can be tracked. Each visit to the node can be tabulated. I think the opportunities for selling would be fantastic — the whole process could be automated. The advertiser pays a small fee to have the privilege of viewing all visits to a node. This is like micro-metrics for local businesses. The fee is necessary because you want it to be monthly or yearly, and you want it tied to a true identity. The account can be modified to allow advertisers to input and update their own coupons, etc. Then they pay per ad, per length of time, per update, etc. But it’s all hands-free for the entity that owns the app.

Not only would this tie in well with local advertisers, it would also open an entirely new stream of revenue we haven’t previously seen. It’s hard to answer the question of “how are we going to make money off of this?” because we’ve never done it before. The closest thing we’ve ever had to this would be a ‘bar database.’
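The tracking part of Mindy’s model is simple bookkeeping. Here is a Python sketch that tallies views per node per month, the kind of report an advertiser’s account might show. The node IDs, the log format and the monthly granularity are all assumptions, and billing is left out.

```python
from collections import Counter
from datetime import date

# Hypothetical view log: (node_id, date viewed). Node IDs are made up.
view_log = [
    ("node:maple-st-bakery", date(2010, 3, 1)),
    ("node:maple-st-bakery", date(2010, 3, 14)),
    ("node:maple-st-bakery", date(2010, 4, 2)),
    ("node:hand-middle-school", date(2010, 3, 9)),
]


def monthly_views(log):
    """Count views per (node, "YYYY-MM") pair."""
    return Counter((node, viewed.strftime("%Y-%m")) for node, viewed in log)


def report_for(node_id, counts):
    """Monthly view counts for a single node, oldest month first."""
    return {month: n for (node, month), n in sorted(counts.items()) if node == node_id}


counts = monthly_views(view_log)
print(report_for("node:maple-st-bakery", counts))
```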

Drawbacks

There are some drawbacks to LBS:

Results indicate that A-GPS locations obtained using the 3G iPhone are much less accurate than those from regular autonomous GPS units (average median error of 8 m for ten 20-minute field tests) but appear sufficient for most Location Based Services (LBS). WiFi locations using the 3G iPhone are much less accurate (median error of 74 m for 58 observations) and fail to meet the published accuracy specifications.

But that’s something we’ll have to address in another post.

Steps to getting started

1. Your data will have to be available in a raw format. Hopefully, you’ll be able to use the COPE (Create Once, Publish Everywhere) method, or the more controversial hNews microformat, for your information.
2. Your data will have to be given relationships and linked to other data.
3. Your data will have to be given a specific latitude and longitude for future reference.
4. You can build your own publishing platform, or you can use openly available APIs like Layar.
5. All of your photos and stories will require stronger semantic data. No more incomplete information.
6. You’ll need a team that can actually code all of this for you.
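Steps 2, 3 and 5 meet in the record for each story. One open format that can hold a location is GeoJSON (RFC 7946); this Python sketch wraps a story in a GeoJSON Feature with keywords and related links. GeoJSON is just one option, not something COPE, hNews or Layar requires, and the function name, property names and URLs are hypothetical.

```python
import json


def story_feature(headline, url, lat, lon, keywords, related_urls):
    """Describe a story as a GeoJSON Feature so map and AR tools can place it."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"invalid coordinates: lat={lat}, lon={lon}")
    return {
        "type": "Feature",
        # GeoJSON puts longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "headline": headline,
            "url": url,
            "keywords": sorted(set(keywords)),  # stronger semantic data (step 5)
            "related": list(related_urls),      # links to other data (step 2)
        },
    }


feature = story_feature(
    headline="Example headline",
    url="https://news.example.com/stories/123",  # hypothetical URL
    lat=34.0,
    lon=-81.0,
    keywords=["schools", "Rosewood", "schools"],
    related_urls=["https://data.example.gov/schools/hand-middle"],  # hypothetical URL
)
print(json.dumps(feature, indent=2))
```

Validating coordinates at the point of entry is cheap, and it keeps a typo from quietly placing a story in the wrong hemisphere.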

Conclusion

Where we go from here really depends on how much news organizations want to invest in this type of technology. At the very least, we can take small steps by adding value to our stories through our content management system, using keywords and physical locations if the CMS supports them. (Hint: MNI does!)