Legislative & Economic Data Visualization

Python, Plotly, Pandas, Selenium, NumPy

02/2023

Overview

Data analysis and visualization of U.S. economic indicators alongside legislative activity from 1953 to 2022. The project combines web-scraped economic data with a comprehensive dataset of over 25,000 laws to explore relationships between legislation and economic trends.

  • Acquired economic indicator data by developing a web scraper using Python and Selenium

  • Cleaned, processed, and analyzed data from multiple sources using Pandas

  • Created interactive Plotly visualizations combining legislative counts with economic indicators

  • Analyzed unemployment rates, inflation, and GDP growth alongside legislative output

Figure: Legislative activity plotted alongside economic indicators.

Interactive Visualization

The chart below is an interactive Plotly visualization. Hover over data points to see values, and use the controls to zoom and pan.

Methodology

Data Sources:

  • U.S. legislative records (1953–2022): 25,000+ laws with names and dates

  • Historical unemployment rates spanning 93 years

  • Inflation and GDP growth indicators

Processing Pipeline:

  1. Scrape economic data from government sources using Selenium

  2. Parse and clean legislative records from Excel datasets

  3. Merge and align time-series data across sources

  4. Generate interactive multi-axis visualizations with Plotly
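The merge-and-align step can be sketched with Pandas as below. The column names (`title`, `date`, `year`, `indicator`), the helper names, and the sample rows are assumptions made for illustration; the real datasets may be shaped differently. The idea is to reduce the law records to one count per year and then join that series to a yearly indicator on the shared year key.

```python
import pandas as pd

# Hypothetical stand-ins for the real inputs; values are placeholders, not real data.
laws = pd.DataFrame({
    "title": ["Act A", "Act B", "Act C", "Act D"],
    "date": pd.to_datetime(
        ["1953-03-01", "1953-08-15", "1954-01-20", "1955-06-30"]
    ),
})
indicator = pd.DataFrame({
    "year": [1952, 1953, 1954, 1955],
    "indicator": [1.0, 2.0, 3.0, 4.0],
})


def laws_per_year(laws: pd.DataFrame) -> pd.DataFrame:
    """Count the laws dated in each calendar year."""
    counts = laws.groupby(laws["date"].dt.year).size()
    return counts.rename_axis("year").reset_index(name="law_count")


def align(laws: pd.DataFrame, indicator: pd.DataFrame) -> pd.DataFrame:
    """Inner-join yearly law counts with a yearly indicator on 'year'."""
    merged = laws_per_year(laws).merge(indicator, on="year", how="inner")
    return merged.sort_values("year").reset_index(drop=True)


combined = align(laws, indicator)
print(combined)
```

An inner join keeps only years present in both sources, which matters when the series cover different spans; `how="left"` or `how="outer"` would instead keep unmatched years with missing values.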

Automated Data Collection

The legislative dataset was gathered using a custom Selenium web scraper that navigates Congress.gov search results, extracting law titles and descriptions across 25,000+ records.

Scraper Source Code
import time
from timeit import default_timer as timer

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import (
    NoSuchElementException,
    TimeoutException,
    WebDriverException,
)
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager

BASE_URL = 'https://www.congress.gov/u/qfqynCbSYlZHeuZXDIwFu'

# The leading "." makes each XPath relative to the results container;
# a bare "//" would search the whole page regardless of the element used.
TITLE_XPATH = ".//li[@class='expanded']//span[@class='result-heading']"
DESCRIPTION_XPATH = ".//li[@class='expanded']//span[@class='result-title']"
NEXT_BUTTON_XPATH = "/html/body/div[2]/div/main/div/div/div[2]/div[2]/div[2]/a/i"


def main():
    start_timer = timer()
    # Create these before the try block so the finally clause can always use them,
    # even if the browser fails to start.
    driver = None
    pages = []
    df = pd.DataFrame(columns=['Titles', 'Description'])

    try:
        driver = webdriver.Firefox(
            service=Service(executable_path=GeckoDriverManager().install())
        )
        driver.get(BASE_URL)

        while True:
            search_container = driver.find_element(By.ID, 'main')
            titles = search_container.find_elements(By.XPATH, TITLE_XPATH)
            descriptions = search_container.find_elements(
                By.XPATH, DESCRIPTION_XPATH
            )

            # Raises ValueError if the two lists differ in length.
            pages.append(pd.DataFrame({
                'Titles': [r.text for r in titles],
                'Description': [r.text for r in descriptions],
            }))
            df = pd.concat(pages, ignore_index=True)
            # Checkpoint after every page so a crash does not lose progress.
            df.to_excel('law_names.xlsx', engine='openpyxl', index=False)

            try:
                next_button = driver.find_element(By.XPATH, NEXT_BUTTON_XPATH)
            except NoSuchElementException:
                break  # No "next" link: this was the last page.
            next_button.click()
            time.sleep(5)  # Give the next page of results time to load.

    except TimeoutException:
        print("Timed out waiting for data")
    except WebDriverException as e:
        # Covers missing elements, intercepted clicks, and other browser errors.
        print(e.msg)
    except ValueError as e:
        print(f"Could not pair titles with descriptions: {e}")
    finally:
        # Save whatever was collected, however the loop ended.
        df.to_excel('law_names_final.xlsx', engine='openpyxl', index=False)
        print(f'Number of titles found: {len(df)}')
        print(f'Elapsed time: {timer() - start_timer:.1f} s')
        if driver is not None:
            driver.quit()


if __name__ == "__main__":
    main()
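
Once the scraper has run, the strings in the `Titles` column still need to be split into usable fields. The sketch below shows one way that cleaning step might look. It assumes each heading has the form "bill number, em dash, Congress number, year range", for example `H.R.1234 — 117th Congress (2021-2022)`; that format, the function name `parse_heading`, and the example bill number are illustrative assumptions rather than part of the original project.

```python
import re
from typing import Optional

# Assumed heading shape: "<bill> \u2014 <N>th Congress (<start>-<end>)".
HEADING_RE = re.compile(
    r"^(?P<bill>.+?)\s*\u2014\s*(?P<congress>\d+)(?:st|nd|rd|th) Congress "
    r"\((?P<start>\d{4})-(?P<end>\d{4})\)"
)


def parse_heading(heading: str) -> Optional[dict]:
    """Split a result heading into bill, Congress number, and year range.

    Returns None when the heading does not match the assumed format,
    so unexpected rows can be inspected instead of silently mangled.
    """
    match = HEADING_RE.match(heading.strip())
    if match is None:
        return None
    return {
        "bill": match["bill"],
        "congress": int(match["congress"]),
        "start_year": int(match["start"]),
        "end_year": int(match["end"]),
    }


example = "H.R.1234 \u2014 117th Congress (2021-2022)"  # hypothetical heading
print(parse_heading(example))
# {'bill': 'H.R.1234', 'congress': 117, 'start_year': 2021, 'end_year': 2022}
```

Applied to the scraped spreadsheet, this could be used as `df['Titles'].map(parse_heading)`, with the `None` results flagging rows that need manual review.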

Technologies

Category          Tools
Data Processing   Pandas, NumPy
Web Scraping      Selenium
Visualization     Plotly
Data Sources      Excel, CSV
