Legislative & Economic Data Visualization#

Python Plotly Pandas Selenium NumPy

02/2023

Overview#

Data analysis and visualization of U.S. economic indicators alongside legislative activity from 1953 to 2022. The project combines web-scraped economic data with a comprehensive dataset of over 25,000 laws to explore relationships between legislation and economic trends.

Acquired economic indicator data by developing a web scraper using Python and Selenium
Cleaned, processed, and analyzed data from multiple sources using Pandas
Created interactive Plotly visualizations combining legislative counts with economic indicators
Analyzed unemployment rates, inflation, and GDP growth alongside legislative output

Legislative and economic data visualization — Legislative activity plotted alongside economic indicators.#

Interactive Visualization#

The chart below is an interactive Plotly visualization. Hover over data points to see values, and use the controls to zoom and pan.

Automated Data Collection#

The legislative dataset was gathered using a custom Selenium web scraper that navigates Congress.gov search results, extracting law titles and descriptions across 25,000+ records.

Scraper Source Code

import pandas as pd
from selenium import webdriver
from selenium.common import TimeoutException, NoSuchElementException
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.firefox import GeckoDriverManager
from timeit import default_timer as timer
import time

start_timer = timer()

BASE_URL = 'https://www.congress.gov/u/qfqynCbSYlZHeuZXDIwFu'

def main():
    try:
        driver = webdriver.Firefox(
            service=Service(executable_path=GeckoDriverManager().install())
        )
        df = pd.DataFrame(columns=['Titles', 'Description'])

        driver.get(BASE_URL)
        while True:
            try:
                search_container = driver.find_element(By.ID, 'main')
                search_results_title = search_container.find_elements(
                    By.XPATH,
                    "//li[@class='expanded']//span[@class='result-heading']"
                )
                search_results_description = search_container.find_elements(
                    By.XPATH,
                    "//li[@class='expanded']//span[@class='result-title']"
                )

                titles = {
                    'Titles': [r.text for r in search_results_title]
                }
                description = {
                    'Description': [r.text for r in search_results_description]
                }
                titles.update(description)

                df = pd.concat(
                    [df, pd.DataFrame.from_dict(titles)], ignore_index=True
                )
                df.to_excel('law_names.xlsx', engine='openpyxl', index=False)

                try:
                    next_button = driver.find_element(
                        By.XPATH,
                        "/html/body/div[2]/div/main/div/div/div[2]/div[2]/div[2]/a/i"
                    )
                    next_button.click()
                    time.sleep(5)
                except NoSuchElementException:
                    break

            except:
                break

    except TimeoutException:
        print("Timed out waiting for data")
    except NoSuchElementException as e:
        print(e.msg)
    finally:
        df.to_excel('law_names_final.xlsx', engine='openpyxl', index=False)
        print(f'Number of titles found: {len(df)}')
        driver.quit()

if __name__ == "__main__":
    main()

Technologies#

Category	Tools
Data Processing	Pandas, NumPy
Web Scraping	Selenium
Visualization	Plotly
Data Sources	Excel, CSV

View Visualization on GitHub

View Scraper on GitHub