September 2nd, 2008 — Uncategorized - Jeremy Carbaugh
I took a few minutes this morning to look at the technology that powers the DNC and RNC convention web sites. It is always interesting to see what technological decisions different organizations make when they are trying to accomplish similar goals.
Democratic National Convention
http://demconvention.com/
The URL for the web site was registered with Domain Discover. The whois information lists the registrant as:
Democratic National Committee
430 S. Capitol St. S.E.
Washington, DC 20003
For all of the domain squatters out there, the domain expires in November, so keep an eye out. My best guess is that the servers are hosted by Verizon Business. ServInt, based in McLean, VA, provides the hosting for the name servers which are also used by dnc.org and democrats.org, among others.
The DNC convention web site is served from an Apache 2.0.52 web server running on Red Hat Linux. SilverStripe, a PHP-based, open source content management system, is used to power the site. The main page of the site has an XHTML 1.0 Transitional doctype, but is served as text/html and is not valid XHTML. The site is laid out using divs to define logical elements within the page and CSS to position the elements.
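Details like these can mostly be read from the HTTP response headers and the first few lines of markup. Here is a minimal sketch of that kind of check, assuming the headers and body have already been fetched. The function name and the sample values are hypothetical and only illustrate the shape of the data.

```python
import re


def describe_response(headers, body):
    """Summarize server software, media type, and doctype from a response."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=UTF-8" from the media type.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    # Look for a doctype declaration near the top of the markup.
    match = re.search(r"<!DOCTYPE\s+([^>]+)>", body[:1024], re.IGNORECASE)
    doctype = " ".join(match.group(1).split()) if match else None
    return {
        "server": lowered.get("server"),
        "media_type": media_type,
        "doctype": doctype,
    }


# Hypothetical sample data; a live check would take both values from a
# urllib.request.urlopen() response instead.
sample_headers = {
    "Server": "Apache/2.0.52 (Red Hat)",
    "Content-Type": "text/html; charset=UTF-8",
}
sample_body = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html>'
)
report = describe_response(sample_headers, sample_body)
```

A page that declares an XHTML doctype but reports a media type of text/html is the mismatch described above.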
The following are screenshots of the page with JavaScript disabled and with both JavaScript and CSS disabled.


Republican National Convention
http://gopconvention08.com/
The RNC registered their convention’s domain with GoDaddy. The registrant is listed as:
Roman Buhler
4056 41st Street North
McClain [sic], Virginia 22101
Mr. Buhler is the president of Roman Buhler & Associates, a lobbying firm based in McLean, VA. Mr. Buhler also served as counsel of the House Administration Committee from 1989 to 2003. Hosting for the web servers and DNS is provided by Smartech, based in Chattanooga, TN.
The web site is served by Microsoft IIS 6.0 using Microsoft’s ASP.NET framework. The main page declares a generic XHTML namespace but does not declare a doctype, is served as text/xml, and does not validate as XHTML. The site is laid out using HTML tables.
The following are screenshots of the page with JavaScript disabled and with both JavaScript and CSS disabled.


August 21st, 2008 — APIs, data, semantic web - cjohnson
James is finishing up a tweak to the Sunlight Labs API that allows for fairly sophisticated searches for members of Congress. It isn’t “published” yet, but it is active, so if you want to experiment you’re welcome to try it out. For now, consider it “unofficial.”
Here’s the deal: we wanted a better way for people to search for members, as members of Congress are often referred to by different names: think “Ted Kennedy,” “Edward Kennedy,” “Teddy Kennedy,” etc. Whether the cause is nicknames or typos, names that are not standardized make analyzing data difficult.
We’re not saying we’ve come up with a complete solution to name standardization or even congressional name standardization, but we’ve got a simple solution that might make some lives easier. To demonstrate, we’ll use Google Spreadsheets. Follow along at home!
Step 1: Get an API key from Sunlight Labs here
Step 2: Create a Google Spreadsheet
Step 3: Name the columns of your spreadsheet “Member”, “Firstname”, “Lastname” so it looks like this:

Step 4: Let’s add some wacky misspellings and some semi-dirty data like below:

Step 5: Here’s where it gets fun. We’ll use the importXML function in Google Spreadsheets to take the values of our dirty data and send them to the Sunlight Labs API, and get a firstname. Enter this code into your spreadsheet:
=importXML("http://services.sunlightlabs.com/api/legislators.search.xml?apikey=YOURAPIKEYHERE&name="&A2,"//firstname")
See below for an example:

Step 6: Do the same for the last name column, but change your call to parse the lastname, like so:
=importXML("http://services.sunlightlabs.com/api/legislators.search.xml?apikey=YOURAPIKEYHERE&name="&A2,"//lastname")
Step 7: Finish it up! Select the first values of those newly processed columns and fill in the rest of the values like so:

Neat! Clean easy name cleanup in your spreadsheet!
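The same lookup can be sketched outside of a spreadsheet. The Python below builds the search URL from the steps above and pulls out the first firstname and lastname elements, much as the //firstname and //lastname expressions do. The sample response is hypothetical: only those two element names come from the formulas above, and the nesting around them is an assumption.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_URL = "http://services.sunlightlabs.com/api/legislators.search.xml"


def build_search_url(api_key, name):
    # urlencode escapes spaces and punctuation in the messy name.
    return API_URL + "?" + urlencode({"apikey": api_key, "name": name})


def first_match(xml_text):
    """Return (firstname, lastname) from the first matching elements."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, like the "//firstname" XPath expression.
    first = next(root.iter("firstname"), None)
    last = next(root.iter("lastname"), None)
    return (
        first.text if first is not None else None,
        last.text if last is not None else None,
    )


# Hypothetical response body; the real element nesting may differ.
sample = (
    "<response><results><result><legislator>"
    "<firstname>Edward</firstname><lastname>Kennedy</lastname>"
    "</legislator></result></results></response>"
)
url = build_search_url("YOURAPIKEYHERE", "Teddy Kennedy")
names = first_match(sample)
```

In a real script, the text passed to first_match would come from fetching url rather than from the sample string.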
August 7th, 2008 — django, opensource, python - James Turk
Pass223.com
I recently worked on Pass223.com, a simple site that urges the Senate to pass a piece of legislation that requires the Senate to adhere to the same electronic financial disclosure rules in place for representatives and presidential candidates.
Pass223.com is similar to hundreds of related action sites: choose a legislator, call them, report results, repeat. I wrote the code, our esteemed creative director Kerry did the bulk of the design, and various others here at Sunlight helped to refine the concept and wording of the site and call script.
It was a bit surprising seeing how positive the feedback has been for such a simple site. A number of people have been pointing to Pass223 as an example of how this type of thing should be done. Most of that credit goes to the team that worked together to revise the awesomely straightforward script.
The other question that has come up is what content management system (CMS) Pass223 was done on and what legislative database it was built on top of. This made me think about the other reason Pass223 was able to come together the way that it did: the tools used behind the scenes.
When all you have is a hammer…
It seems, especially in the nonprofit world where developers are scarce, that content management systems like Drupal are considered the solution to every problem that arises. Because content management systems cannot possibly do everything that organizations want, they are left with two options: attempt to mangle the CMS to do things it was never intended to do, or alternatively not get what they actually want. Because of the difficulty in dealing with the massive codebases of most CMSs, they often find themselves accepting both results. A great deal of time is therefore sunk into a project and in the end things still don’t work quite how they were planned.
The supposed benefit of a CMS is the speed of deployment and ease of use, but as Jeremy’s recent post about LetOurCongressTweet mentioned, we are able to rapidly create sites without the use of a CMS. And in reality, struggling to fit an innovative project or idea into the rigid structure of a CMS is neither easy nor fast.
A better use of the time and money spent maintaining and modifying complex CMS installations would be to spend that time learning and deploying sites in a framework such as Django or Ruby on Rails.
Perfectionists with Deadlines
Django in particular was created to solve this problem. Working with a bloated CMS forces you to make a decision between getting what you want and getting something fast, and more often than not you wind up with neither. It is because of this that a team originally working at the Lawrence Journal-World newspaper built Django to meet their needs as “perfectionists with deadlines.”
Frameworks like Django provide the pieces of commonly used functionality: user registration, object-relational mappers to avoid dealing with the database directly, caching, and a ton more. All of these pieces are given to you without any mandate that they be used; they are simply building blocks that your particular project can choose to use or not. A large site such as EarmarkWatch may need complex user profiles, whereas a smaller project can eschew all of the unneeded modules and be as quick and simple as Pass223.com.
Ultimately, unless some CMS already provides exactly what you want, it is far easier to build a project from reusable components within a framework than to attempt to teach an old CMS a new trick. One of the reasons that Pass223.com seems to impress people used to looking at the typical contact-your-legislator forms is that we had the flexibility to build what we wanted.
July 11th, 2008 — Uncategorized - Jeremy Carbaugh
There has been some commotion over the past few days regarding Congressional Web use restrictions. The rules are inadequate for the current state of the Web and must be rewritten to reflect changes in technology. Republicans and Democrats have been going back and forth over proposed changes to the existing rules, one side claiming the other is trying to stifle their communication. While they keep on bickering, we wanted to raise awareness of these Web use restrictions and get people involved.
We decided to launch a petition-like site that uses Twitter as the organizing method, using one of the very technologies that are impacted by Congressional Web use restrictions. We knew this had to be timely to have an impact, so the decision was made to have the Web site completed by the end of the day. That gave Kerry Mitchell, our fearless Creative Director, and me about six hours to get the site completed.
So how did it go?
As LOCT is a petition-like site, it is important to get a list of the people that are following the LOCT Twitter account. Twitter has a very nice API that makes it easy to pull information from the service. To get a JSON list of people following your Twitter account, just send an HTTP GET request to http://twitter.com/statuses/followers.json using HTTP Basic Authentication with your username and password. You can also get a list of people following other user accounts from http://twitter.com/statuses/followers/<username>.json; no authentication necessary. We query Twitter for this information and cache it locally in a database.
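As a rough illustration of the authenticated request described above, here is a minimal sketch using only the Python standard library. It is not the code that runs LOCT; the function name and credentials are placeholders, and the request is only built, not sent.

```python
import base64
import urllib.request

FOLLOWERS_URL = "http://twitter.com/statuses/followers.json"


def followers_request(username, password):
    """Build a GET request carrying an HTTP Basic Authentication header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(FOLLOWERS_URL)
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder credentials; nothing is sent until urlopen() is called.
request = followers_request("example_user", "example_password")
# followers = json.load(urllib.request.urlopen(request))  # needs "import json"
```

The decoded JSON would then be written to the local cache, so that page views never have to wait on Twitter.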
Unfortunately, due to Twitter’s recent performance issues, many of the nicest features have either been limited or disabled, making it almost impossible to use Twitter exclusively for LOCT. We needed to get a list of tweets that mentioned LOCT, but couldn’t with the current performance restrictions in place. If only there was another service that provided this functionality.
And there is!
Summize rocks. Based right here in the greater Washington metropolitan area, Summize is a tweet search service that has one of the few direct feeds into every tweet that is twittered. They also have an awesome API that makes it dead simple to search all tweets: http://summize.com/search.json?q=%23loct08. That is all you need to get search results in JSON. Just like the list of followers from Twitter, the results are cached locally for Maximum Performance™.
As with almost all projects created by Sunlight Labs, Let Our Congress Tweet is written using Django, a Python Web development framework. I love Django. It simplifies development by providing object-relational mapping, templating, and other features in an unobtrusive way.
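The search-then-cache pattern might look something like the sketch below. It assumes a small SQLite table as the local cache; the table layout, function names, and five-minute lifetime are illustrative choices rather than a description of the LOCT code, and the stored body is a made-up stand-in for a real response.

```python
import sqlite3
import time
from urllib.parse import urlencode

SEARCH_URL = "http://summize.com/search.json"


def search_url(query):
    # urlencode turns "#loct08" into "%23loct08".
    return SEARCH_URL + "?" + urlencode({"q": query})


def open_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache "
        "(url TEXT PRIMARY KEY, body TEXT, fetched REAL)"
    )
    return conn


def store(conn, url, body):
    # Replace any older copy of the same URL and record when it was fetched.
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (url, body, time.time()),
    )
    conn.commit()


def load(conn, url, max_age=300):
    """Return the cached body, or None if missing or older than max_age seconds."""
    row = conn.execute(
        "SELECT body, fetched FROM cache WHERE url = ?", (url,)
    ).fetchone()
    if row is None or time.time() - row[1] > max_age:
        return None
    return row[0]


conn = open_cache()
url = search_url("#loct08")
store(conn, url, '{"results": []}')  # hypothetical response body
cached = load(conn, url)
```

A periodic job can refresh the cache while the site itself only ever reads from it.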
Developing in Django is already quite rapid, but by reusing existing code we can develop at an unheard-of pace. Writing a reusable Django application is quite easy as it is nothing more than a standard Python module that can be used in any project.
Feedinating the countryside
A few weeks ago we released a new version of the Sunlight Foundation Web site. The old, infuriating Drupal installation was replaced with a slick Django application that was written in-house. One of the main features of the new site is a feed aggregator that pulls in the recent blog posts from across the Sunlight-influenced transparency network. To accomplish this, we wrote Feedinator, a Django feed aggregator application that makes it easy to pull in feeds from multiple blogs and display them in different ways on a Web site.
We use Feedinator on Let Our Congress Tweet to pull in the feed of tweets from the LOCT08 Twitter account and a del.icio.us feed of Web sites that have mentioned LOCT. By writing Feedinator in a way that makes it easily reusable, we were able to start incorporating feeds into LOCT in a matter of minutes.
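To give a feel for what a feed aggregator does, here is a small standard-library sketch that merges entries from several feeds and sorts them newest first. It is not Feedinator's code or API. The function names are invented, the feeds are assumed to be simple RSS 2.0 documents in which every item has a pubDate, and the sample feeds are made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(xml_text, source):
    """Yield one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates use the RFC 822 format; pubDate is assumed present.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


def aggregate(feeds):
    """Merge entries from several feeds, newest first."""
    entries = [
        entry
        for source, xml_text in feeds.items()
        for entry in parse_rss(xml_text, source)
    ]
    return sorted(entries, key=lambda entry: entry["published"], reverse=True)


# Hypothetical feeds standing in for a Twitter feed and a del.icio.us feed.
tweets = (
    "<rss><channel><item><title>Tweet</title>"
    "<link>http://example.com/t</link>"
    "<pubDate>Fri, 11 Jul 2008 14:00:00 +0000</pubDate></item></channel></rss>"
)
bookmarks = (
    "<rss><channel><item><title>Bookmark</title>"
    "<link>http://example.com/b</link>"
    "<pubDate>Fri, 11 Jul 2008 16:30:00 +0000</pubDate></item></channel></rss>"
)
merged = aggregate({"twitter": tweets, "delicious": bookmarks})
```

Each entry keeps its source, so a template can show the two feeds either separately or interleaved.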
If you would like to use Feedinator in your own project, you are in luck. We plan on releasing the code in the near future as well as the code for a few other Django applications and Python modules.
Designers are useful
While I was coding, Kerry took care of the design, CSS, and HTML. A few minutes of converting the HTML into Django templates and the site was up and running.
So that’s it! After throwing in a few cron entries and some Apache configuration files, the site went live. We went from an idea to a production-ready Web site in about six hours. Sure, it isn’t overly complicated, but we’re proud of it nonetheless.
June 30th, 2008 — semantic web - cjohnson
Modern baseball’s origins are something historians don’t have a good read on. If you look at the Origins of Baseball article on Wikipedia, you’ll see that we don’t know very much about where the rules came from, but they were formalized somewhere around 1845 when the Knickerbocker Club of New York City began to play baseball against the New York Nine. In 1857, 16 clubs finally sent delegates to a convention to standardize the rules and standardize America’s Pastime.
Baseball statistics have their own story. A fellow named Henry Chadwick was the first to start using statistics to judge a player’s performance. It was a few years after the sport was invented and formalized that this young journalist would give himself the goal of creating “numerical evidence as that would prove what players helped or hurt a team win.”
It took nearly 100 years for baseball statistics to make it to the common man and woman. It wasn’t until 1951 that a researcher named Hy Turkin published the Encyclopedia of Baseball, which used a computer to compile statistics for the first time. It wasn’t until 1977 that decent, predictive, and objective statistical methods called sabermetrics were invented and distributed. Sabermetrics are the analytical methods that Theo Epstein used to break a curse and build a World Series-winning Boston Red Sox team. He even hired sabermetrics’ inventor to work for the Red Sox.
It took over a century to invent and distribute the objective performance-measuring statistics we use today to evaluate baseball players, to tell whether or not Greg Maddux is a better pitcher than Pedro Martinez.
But though Congress has been around for nearly 234 years, we still don’t have an objective way to tell whether or not my namesake, Henry Clay, was as effective a Speaker as Nancy Pelosi. This isn’t to say we haven’t been making up our own statistics. We’ve been compiling subjective scorecards for years. But we need a process for standardizing our statistics and publishing how they’re calculated, and we need to build a system for authenticating and delivering those results.
It took us over 100 years to get good baseball stats, but this isn’t to say that the data didn’t exist or wasn’t being recorded. The data was still there. Here are the full stats on the 1876 Chicago White Stockings. People were watching, keeping score, logging the games and recording the data and even making their own statistics out of it. But it was the process of standardizing the statistics, publishing how they’re to be calculated and sharing the results that made these subjective metrics effective.
So what metrics out there are effective in evaluating our legislators? Off the top of my head, here are some elementary ones:
- Attendance Percentage: The percentage of time a member attends Congress when it is in session.
- Vote Percentage: The percentage of time a member has voted when they’ve had the opportunity to do so.
- Sponsored Bills Per Term: The average number of bills sponsored and co-sponsored per term.
- Sponsored Bills Passed Per Term: The average number of sponsored and co-sponsored bills passed per term.
- Party Percentage: The percentage of time the member votes with their political party.
- Vote Victory Percentage: The percentage of time the member votes with a bill that passes.
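To make a few of these definitions concrete, here is a rough sketch of how three of them could be computed from a member's voting record. The record layout and function names are hypothetical, and the choice of denominator (all recorded votes for Vote Percentage, votes actually cast for the other two) is an assumption, which is exactly the kind of detail that would need standardizing.

```python
from dataclasses import dataclass


@dataclass
class VoteRecord:
    member_vote: str     # "yea", "nay", or "absent"
    party_majority: str  # how most of the member's party voted: "yea" or "nay"
    passed: bool         # whether the measure passed


def percentage(part, whole):
    return 100.0 * part / whole if whole else 0.0


def votes_cast(records):
    return [r for r in records if r.member_vote != "absent"]


def vote_percentage(records):
    # Share of recorded votes in which the member actually voted.
    return percentage(len(votes_cast(records)), len(records))


def party_percentage(records):
    # Share of cast votes that matched the party majority.
    cast = votes_cast(records)
    with_party = [r for r in cast if r.member_vote == r.party_majority]
    return percentage(len(with_party), len(cast))


def vote_victory_percentage(records):
    # Share of cast votes that were "yea" on a measure that passed.
    cast = votes_cast(records)
    wins = [r for r in cast if r.member_vote == "yea" and r.passed]
    return percentage(len(wins), len(cast))


# Made-up voting record for illustration.
records = [
    VoteRecord("yea", "yea", True),
    VoteRecord("nay", "yea", True),
    VoteRecord("absent", "yea", False),
    VoteRecord("yea", "nay", True),
]
stats = (
    vote_percentage(records),
    party_percentage(records),
    vote_victory_percentage(records),
)
```

Two sites that pick different denominators will publish different numbers for the same member, even with identical data.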
These are just obvious building blocks of a much more sophisticated statistical system. All these statistics exist right now. Sunlight’s partner Open Congress, GovTrack.us, and many others each track congressional statistics in their own way. In order to do it right we need:
- Standardization: We need to be calculating these things and naming these statistics the same way everywhere.
- Comparison: Statistics are not relevant unless they’re in context. We need to be able to create a ranked 1-535 list for every statistic we standardize and create.
- Adoption: They need to be adopted and as pervasive as RBIs and ERAs.
- More: We need more statistics made from the data that Congress generates that provide “numerical evidence” about the effectiveness of Congress. We need our own “sabermetrics” that objectively evaluate whether or not a Member of Congress is effective at representing those that chose to elect them. The ones I’ve listed aren’t even close to being accurate predictors.
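The ranked list mentioned under Comparison is simple to produce once a statistic is standardized. Here is a minimal sketch, assuming a plain mapping of member names to scores. The names and numbers are made up, and the tie handling (tied members share a rank) is just one reasonable choice.

```python
def rank_members(scores):
    """Return {member: rank}, where rank 1 is the highest score.

    Tied members share a rank and the following rank is skipped.
    """
    ordered = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    ranks = {}
    previous_score = None
    previous_rank = 0
    for position, (member, score) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie; otherwise use the list position.
        rank = previous_rank if score == previous_score else position
        ranks[member] = rank
        previous_score, previous_rank = score, rank
    return ranks


# Made-up scores for illustration.
scores = {"Member A": 98.5, "Member B": 91.0, "Member C": 98.5, "Member D": 75.2}
ranks = rank_members(scores)
```

With all 535 members in the mapping, the same function yields the 1-535 list.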
So let this be a post to start a discussion amongst the transparency community about how we can begin standardizing our own objective statistics, making them useful and centralized. Let’s start working together to invent new statistics for how our members can be evaluated.
June 29th, 2008 — Uncategorized - cjohnson
We launched Capitol Words just a couple weeks ago and got a really great reception from the blogs. I’m two weeks into my new duties as Director of Sunlight Labs and while I didn’t have much (really, anything) to do with the project’s success, I am really excited about it. With the CapitolWords API we can start doing some interesting analysis of overall word usage in Congress.
Some of this is obvious and visible on the surface. Check out the screenshots below:



June seems to be predominantly about energy and oil. Septembers of even numbered years tend to be about security and intelligence. March tends to be about budgets and amounts.
Neat! Josh wrote most of the code and handled the architecture of the system. Garrett, who heads our http://www.louisdb.org project, also had a big hand in conceiving and building the application. Of course Kerry, our wonderful Creative Director, helped make the user interface and designed the site. It is written in Django and MySQL. Great work guys!
April 3rd, 2008 — Uncategorized - Greg Elin

Josh Tauberer, one of the champs of opening U.S. Government data, announced today that he has made all of GovTrack.us available as open source.
This week I made www.GovTrack.us officially totally open source.
GovTrack is a website that tracks U.S. federal legislation and also
builds the only comprehensive open database of congressional
information. While the data behind GovTrack has been provided in the
public domain for a number of years now, and has been successfully
powering a bunch of other sites like OpenCongress, I’ve been playing
catch-up in getting the source code of the website opened up.
Run and get the code! (link)
March 14th, 2008 — Uncategorized, data, documentation - Greg Elin


W0ot! Thanks for the excellent diagram WavingSparks.
It’s very helpful to have such a nice graphic explaining the data flows. As you discovered, Sunlight is very interested in financially supporting and being part of making transparency information available.
In the interest of transparency, Sunlight is also supporting Taxpayers for Common Sense to update their website and offer smaller transparency grants, too (http://sunlightfoundation.com/grants). GovTrack is a pretty efficient and essential website and is nobody’s weak link! For more related websites check out:
(Link: http://waving.deadsquid.com/?p=29)
February 19th, 2008 — APIs, Uncategorized, data, semantic web - Greg Elin
Metaweb has announced an open source release of structured data from Wikipedia. Via an email to the get.theinfo list:
“Hello from Metaweb. We’ve just released a GFDL licensed extraction of
Wikipedia in XML + relational form. Anyone is welcome to use it for
any purpose…”
This follows Reuters’ recent announcement of the Open Calais API to extract people, places, things, and simple relationships from unstructured text. (We are experimenting with similar techniques of entity tagging via open protocols at Sunlight.) Metaweb’s WEX is 57GB of downloadable structured data from the largest peer-production encyclopedia project ever. The Semantic Web, so long discussed, is now beginning a virtuous cycle of innovation. We are entering the age of open source semantics. Like compounding interest, Moore’s Law, and exercise, results from the cycle of innovation around open source semantics will multiply quickly. If you thought Google circa 2007 was impressive, buckle your seatbelt and reach for your helmet. Things are about to move even faster.
Addendum: DBPedia is another project extracting data from Wikipedia in the RDF format.
Links: •Metaweb’s WEX •Open Calais •ReadWriteWeb on Open Calais •DBPedia
February 11th, 2008 — APIs, documentation, opensource - Greg Elin
The Social Graph API page on Google Code exemplifies the future of multimedia learning and why open source is being so productive relative to proprietary software development efforts. A two-and-a-half-minute YouTube video introduces the concept. Links are included to documentation and examples. I’ve got code, examples, documentation, and even a human giving me a tutorial. (This one intro video could easily be expanded to a series of step-by-step instructions.) Five developers might jump on the Social Graph API or 5,000. The right five might matter more than having 5,000 developers. What matters is two points. First, beneficial multimedia is no longer expensive to produce or distribute. In fact, it is becoming a basic skill of people everywhere. Second, the distributed nature of the open source mentality encourages people to provide information in economical, pared-down forms, even if the code is not open source but merely an API.