slightly OT: OpenCalAccess project
- Subject: slightly OT: OpenCalAccess project
- From: Ray Kiddy <email@hidden>
- Date: Sat, 18 Dec 2010 11:16:22 -0800
Hello -
I have created an open-source project on http://code.google.com/p/opencalaccess/ and I would like to hear if anyone is interested in participating in this with me.
There is a web page maintained by the California Secretary of State called CalAccess (http://cal-access.sos.ca.gov/campaign/). It provides access to a large database of campaign financing data from candidates, lobbyists, fundraisers and many others who have their noses in the electoral trough. The data is a mess and, from what I have seen, the SoS puts up a sanitized version of the data with lots of data removed. One can get dumps of the data from the SoS. I have done so and I am interested in resolving ambiguities in the data and in presenting the data, using WebObjects apps of course, in more useful ways.
There are several interesting problems one sees as one works with the data.
1 - There are errors in every place there can be errors. There are illegal characters in the data dump and malformed lines. There are "information" errors as well. For every constraint that is documented, there are violations. There is a document that describes the schema but it is rife with errors. For example, one table in the data includes only half of its columns. In the document, there is a page break in the list of columns and the columns on the second page do not appear in the data dump. Really.
2 - The schema is hugely bloated in the way that government agencies mess up all databases.
3 - There is another layer of obfuscation. For example, names are not related to primary keys but may only be linked in different tables by string comparisons. For example, a committee treasurer may appear as "Bob Smith" in one table, "Robert Smith" in another, "R. E. Smith" in yet another, and then again as "Robert Smyth" at the same address....
4 - The schema itself is obfuscated. One can tease out the relationships, for example, in the tables that contain info on lobbyists, their employers, and who they give money to, but the graph of these is amazingly complicated. One wants to assume the SoS is not deliberately trying to hide the relationships. But then one sees the schema. Something is definitely going on.
5 - There are many ways to display the data and I am not sure which are useful. Help from anyone with political oversight or forensic accounting experience would be appreciated.
6 - It is not clear what the "WO way" to work with the data should be. If one is creating a shopping cart app, for example, WO makes this easy. When one wants to find near-duplicates in a list of 1.5 million addresses, it is not clear what the "WO way" is. Naive implementations cause much suckage.
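To make the near-duplicate problem in item 6 concrete: comparing every pair among 1.5 million addresses means roughly a trillion comparisons, which is where the naive approach falls over. One common alternative, sketched below in plain Java, is to reduce each address to a crude key and group by that key in a single pass. This is only a sketch and not what the project does. The class name, method names and the abbreviation table are hypothetical, and a key like this only catches differences in case, punctuation and a few abbreviations. Misspellings would still need a fuzzier comparison inside each group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the OpenCalAccess code.
final class AddressBlocks {
    // Illustrative abbreviation table (an assumption); a real one would be much longer.
    private static final Map<String, String> ABBREVIATIONS = new HashMap<>();
    static {
        ABBREVIATIONS.put("street", "st");
        ABBREVIATIONS.put("avenue", "ave");
        ABBREVIATIONS.put("boulevard", "blvd");
        ABBREVIATIONS.put("suite", "ste");
    }

    /** Reduce an address to a comparison key: lowercase, no punctuation, abbreviated tokens. */
    public static String normalize(String address) {
        String cleaned = address.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim();
        StringBuilder key = new StringBuilder();
        for (String token : cleaned.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(ABBREVIATIONS.getOrDefault(token, token));
        }
        return key.toString();
    }

    /** Group addresses by key in one pass and keep only keys shared by two or more addresses. */
    public static Map<String, List<String>> candidateDuplicates(List<String> addresses) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String address : addresses) {
            groups.computeIfAbsent(normalize(address), k -> new ArrayList<>()).add(address);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }
}
```

Grouping this way costs one hash lookup per address, so the expensive pairwise comparison only has to run within each small group.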
I have put up a slice of the data (CSV files), some scripts I use to clean up and import the data into MySQL and a basic WO app. I have tried to document what is going on. I am only relying on WebObjects (and so Java), Perl and Bourne shell and I am trying to do this cross-platform-ishly, but am working on Mac OS X.
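For anyone wondering what the clean-up step involves, here is a minimal Java sketch of the two checks mentioned earlier: stripping illegal control characters and setting aside malformed lines. It is not the project's actual scripts. The class and method names are hypothetical, the delimiter and expected field count are assumptions the caller supplies, and the plain split does not handle delimiters inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the OpenCalAccess scripts.
final class DumpCleaner {
    // Every control character except tab, which may be a legitimate delimiter.
    private static final Pattern CONTROL_CHARS = Pattern.compile("[\\p{Cntrl}&&[^\\t]]");

    /** Remove control characters (NUL, stray carriage returns, etc.) from one line. */
    public static String stripControlChars(String line) {
        return CONTROL_CHARS.matcher(line).replaceAll("");
    }

    /** True if the line splits into exactly the expected number of fields. */
    public static boolean hasExpectedFields(String line, String delimiter, int expectedFields) {
        // Limit -1 keeps trailing empty fields, so "a,b," counts as three fields.
        return line.split(Pattern.quote(delimiter), -1).length == expectedFields;
    }

    /** Return the cleaned, well-formed lines; malformed ones are appended to rejected. */
    public static List<String> clean(List<String> lines, String delimiter,
                                     int expectedFields, List<String> rejected) {
        List<String> kept = new ArrayList<>();
        for (String raw : lines) {
            String line = stripControlChars(raw);
            if (hasExpectedFields(line, delimiter, expectedFields)) {
                kept.add(line);
            } else {
                rejected.add(line);
            }
        }
        return kept;
    }
}
```

Keeping the rejected lines, rather than dropping them, makes it possible to count and inspect what the dump got wrong before importing the rest.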
I have built and taken apart parts of this Edsel before and, since I am interested in this for citizen advocacy reasons and not to make money, I thought I would try the open-source route.
The state of California keeps saying that they cannot afford to do anything with the data. I think it would be cool to come up with an open-source solution and tell the SoS that they can either get a $5 million proposal from KPMG for another bloat-ware system or they can use something that is free and which works. When citizens put up data of this sort, the government does tend to get embarrassed into doing their jobs.
Let me know if you are interested.
cheers - ray
email@hidden
_______________________________________________
Webobjects-dev mailing list (email@hidden)