TheUpshot/statement

Statement parses RSS feeds and HTML pages containing press releases and other official statements from members of Congress, and produces hashes with information about those pages. It has been tested under Ruby 1.9.3 and 2.x.

Coverage

Statement currently parses press releases for members of the House and Senate. For members with RSS feeds, you can pass the feed URL into Statement. For members without RSS feeds (or with broken ones), HTML scrapers are provided, as are methods for special groups, such as House Republicans. Suggestions are welcomed.

Installation

Add this line to your application’s Gemfile:

gem 'statement'

And then execute:

$ bundle

Or install it yourself as:

$ gem install statement

Usage

Statement provides access to press releases, Facebook status updates and tweets from members of Congress. Most congressional offices have RSS feeds, but some require HTML scraping.

To configure Statement to pull from the Twitter and Facebook APIs, you can pass in configuration values via a hash or a config.yml file:

require 'rubygems'
require 'statement'
Statement.configure(:oauth_token => token, :oauth_token_secret => secret, ...) # option 1
Statement.configure_with("config.yml") # option 2

If you don’t need the Twitter or Facebook APIs, you can skip configuration.
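For the config.yml option, the following is a minimal sketch of what such a file might contain and how it could be read into the same kind of hash. The key names are taken from examples in this README; the flat file layout and the `load_config` helper are assumptions for illustration, not Statement's documented format or API.

```ruby
require 'yaml'

# Example file contents. The key names come from this README; the flat
# layout is an assumption about how config.yml might be organized.
SAMPLE_CONFIG_YML = [
  'oauth_token: my-token',
  'oauth_token_secret: my-secret',
  'app_id: my-app-id',
  'app_secret: my-app-secret'
].join("\n")

# Hypothetical helper (not part of Statement): parse YAML text and
# convert its string keys to symbols, matching the hash form of option 1.
def load_config(yaml_text)
  raw = YAML.load(yaml_text) || {}
  raw.each_with_object({}) { |(key, value), config| config[key.to_sym] = value }
end

config = load_config(SAMPLE_CONFIG_YML)
puts config[:app_id]
```

Keeping credentials in a separate file like this makes it easier to leave them out of version control.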

Press Releases

To parse an RSS feed, pass its URL to the from_rss method of Statement’s Feed class:

require 'rubygems'
require 'statement'
results = Statement::Feed.from_rss('http://blumenauer.house.gov/index.php?option=com_bca-rss-syndicator&feed_id=1')
puts results.first
# => {:source=>"http://blumenauer.house.gov/index.php?option=com_bca-rss-syndicator&feed_id=1", :url=>"http://blumenauer.house.gov/index.php?option=com_content&amp;view=article&amp;id=2203:blumenauer-qwe-need-a-national-system-that-speaks-to-the-transportation-challenges-of-todayq&amp;catid=66:2013-press-releases", :title=>"Blumenauer: &quot;We need a national system that speaks to the transportation challenges of ...", :date=>#<Date: 2013-04-24 ((2456407j,0s,0n),+0s,2299161j)>, :domain=>"blumenauer.house.gov"}

Statement will try to parse a date if an RSS item contains a pubDate element; if not, the :date value will be nil.
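The shape of a result hash, including the nil date fallback, can be sketched as follows. This is an illustration only: `build_item` is a hypothetical helper rather than Statement's own code, and it assumes the link, title and pubDate text have already been extracted from the feed.

```ruby
require 'date'
require 'uri'

# Hypothetical helper (not Statement's implementation): build a result
# hash with the same keys as the example output above.
def build_item(source_url, link, title, pub_date)
  date = begin
    # A missing pubDate yields nil rather than an error.
    pub_date ? Date.parse(pub_date) : nil
  rescue ArgumentError
    # An unparseable date string is also treated as missing.
    nil
  end

  {
    :source => source_url,
    :url    => link,
    :title  => title,
    :date   => date,
    :domain => URI.parse(source_url).host
  }
end

source = 'http://blumenauer.house.gov/index.php?option=com_bca-rss-syndicator&feed_id=1'
dated   = build_item(source, 'http://blumenauer.house.gov/a', 'Example release', 'Wed, 24 Apr 2013 00:00:00 +0000')
undated = build_item(source, 'http://blumenauer.house.gov/b', 'Another release', nil)
```

Callers that sort or filter by date should therefore check for nil first.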

If you have a batch of RSS URLs, you can pass them to Feed’s batch class method, which uses Typhoeus to fetch them in parallel and returns a two-element array of results and failed URLs:

urls = ['http://aderholt.house.gov/common/rss//index.cfm?rss=20', 'http://andrews.house.gov/rss.xml', "http://alexander.house.gov/common/rss/?rss=24", "http://amash.house.gov/rss.xml"]
results, failures = Statement::Feed.batch(urls)
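The two-element return value can be illustrated with a small sketch. It uses plain Ruby threads instead of Typhoeus, and both `fetch_batch` and the stand-in fetcher are hypothetical names, so treat it as a picture of the results/failures split rather than of how Statement fetches feeds.

```ruby
# Hypothetical helper (not Statement's implementation): run `fetcher`
# for each URL in its own thread, then split successes from failures.
def fetch_batch(urls, fetcher)
  threads = urls.map do |url|
    Thread.new do
      begin
        [url, fetcher.call(url), nil]
      rescue StandardError => error
        [url, nil, error]
      end
    end
  end

  results = []
  failures = []
  threads.each do |thread|
    # Thread#value waits for the thread and returns its block's result.
    url, items, error = thread.value
    if error
      failures << url
    else
      results.concat(items)
    end
  end
  [results, failures]
end

# Stand-in fetcher for illustration; a real one would download and parse the feed.
fake_fetcher = lambda do |url|
  raise IOError, 'feed unavailable' if url.include?('broken')
  [{ :source => url, :title => 'Example release' }]
end

sample_urls = ['http://example.com/rss.xml', 'http://example.com/broken.xml']
sample_results, sample_failures = fetch_batch(sample_urls, fake_fetcher)
```

Returning failed URLs separately lets a caller retry them or fall back to an HTML scraper without losing the successful results.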

The sites that require HTML scraping are detailed in individual methods, and can be called individually or in bulk:

results = Statement::Scraper.billnelson
members = Statement::Scraper.member_scrapers

Facebook Updates

Using the koala gem, Statement can fetch Facebook status feeds, given a Facebook ID. You’ll need to either set environment variables APP_ID and APP_SECRET or create a config.yml file containing app_id and app_secret keys and values.

f = Statement::Facebook.new
results = f.feed('RepFincherTN08')

It can also process IDs in batches, given an array of IDs and a slice argument that sets how many IDs go in each batch:

f = Statement::Facebook.new
results = f.batch(facebook_ids, 10)
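What the slice argument does can be sketched with Ruby's each_slice. The `batch_in_slices` helper and the block that stands in for the Facebook request are hypothetical; this only shows how a list of IDs might be grouped, not how Statement talks to Facebook.

```ruby
# Hypothetical helper (not Statement's implementation): split `ids`
# into groups of `slice_size`, hand each group to the block, and
# flatten the per-group results into one array.
def batch_in_slices(ids, slice_size)
  ids.each_slice(slice_size).flat_map { |group| yield(group) }
end

groups_seen = []
sliced_results = batch_in_slices(%w[a b c d e], 2) do |group|
  groups_seen << group
  # Stand-in for one batched request covering every ID in the group.
  group.map { |id| { :id => id } }
end
```

With five IDs and a slice size of two, the block runs three times, and the last group holds the single leftover ID.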

In all cases, Statement strips out posts not written by the given ID and returns a Hash containing attributes from the feed.
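The filtering step might look something like the sketch below. It assumes each post is a hash with a 'from' entry holding the author's 'id'; the `own_posts` helper and the sample data are hypothetical, not taken from Statement.

```ruby
# Hypothetical helper (not Statement's implementation): keep only the
# posts whose author matches the page's own ID.
# Assumes each post looks like { 'from' => { 'id' => ... }, ... }.
def own_posts(posts, page_id)
  posts.select { |post| post['from'] && post['from']['id'] == page_id }
end

sample_posts = [
  { 'id' => '1', 'from' => { 'id' => '100' }, 'message' => 'Official update' },
  { 'id' => '2', 'from' => { 'id' => '999' }, 'message' => 'Visitor comment' },
  { 'id' => '3', 'message' => 'Post with no author data' }
]

kept = own_posts(sample_posts, '100')
```

Posts with no author information are dropped as well, since they cannot be attributed to the page.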
