Tips Dataset Analysis
Python NumPy Pandas Seaborn Matplotlib Scikit-learn
02/2023
Overview
Exploratory data analysis of a tips dataset, comparing the tips received by male and female servers. The project uses bootstrap resampling to build confidence intervals for the mean tip in each group, and kernel density estimation to smooth the resulting bootstrap distributions.
Methodology
Statistical Methods:
Bootstrap Resampling: Generated thousands of resampled datasets to estimate the sampling distribution of the mean tip amount for each server gender
Smoothed Bootstrap: Applied kernel density estimation to the bootstrap distribution for a continuous representation
Confidence Intervals: Computed 95% confidence intervals from bootstrap distributions to assess whether the difference in mean tips is statistically significant
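The three steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the data is synthetic placeholder data drawn from a gamma distribution, the helper names (`bootstrap_means`, `percentile_ci`, `kde_density`) are hypothetical, the interval is assumed to be a percentile interval, and the bandwidth uses a normal-reference rule of thumb that the original analysis may not have used.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def bootstrap_means(values, n_resamples=5000, seed=0):
    """Return the mean of each bootstrap resample (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw all resample indices at once: shape (n_resamples, n_observations).
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    return values[idx].mean(axis=1)


def percentile_ci(boot_stats, level=0.95):
    """Percentile confidence interval taken directly from the bootstrap statistics."""
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(boot_stats, [alpha, 1.0 - alpha])
    return float(low), float(high)


def kde_density(boot_stats, grid, bandwidth=None):
    """Evaluate a Gaussian KDE of the bootstrap statistics on a grid of points."""
    boot_stats = np.asarray(boot_stats, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the project's choice).
        bandwidth = 1.06 * boot_stats.std(ddof=1) * len(boot_stats) ** (-1 / 5)
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(boot_stats.reshape(-1, 1))
    # score_samples returns log-density, so exponentiate to get the density.
    return np.exp(kde.score_samples(np.asarray(grid).reshape(-1, 1)))


# Synthetic placeholder data standing in for one group's tip amounts.
rng = np.random.default_rng(42)
tips = rng.gamma(shape=4.0, scale=0.75, size=150)

boot = bootstrap_means(tips)
ci_low, ci_high = percentile_ci(boot)
grid = np.linspace(boot.min(), boot.max(), 200)
density = kde_density(boot, grid)

print(f"sample mean: {tips.mean():.3f}")
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```

In practice the same functions would be called once per server gender, with `tips` replaced by the corresponding column of the real dataset, and `density` plotted against `grid` with Matplotlib or Seaborn.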
Key Findings: The analysis examines whether there is a statistically significant difference in tipping behavior based on server gender, using non-parametric methods that do not assume a particular parametric form for the underlying distribution of tips.
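One common way to frame this question is to bootstrap the difference in mean tips directly and check whether the 95% interval contains zero. The sketch below assumes that approach; the source does not say exactly how the comparison was made. The two groups are synthetic placeholders, and `bootstrap_mean_difference` is a hypothetical helper name.

```python
import numpy as np


def bootstrap_mean_difference(group_a, group_b, n_resamples=5000, seed=0):
    """Bootstrap distribution of mean(group_a) - mean(group_b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Resample each group independently, with replacement, at its own size.
    idx_a = rng.integers(0, len(a), size=(n_resamples, len(a)))
    idx_b = rng.integers(0, len(b), size=(n_resamples, len(b)))
    return a[idx_a].mean(axis=1) - b[idx_b].mean(axis=1)


# Synthetic placeholder data standing in for the two groups of tips.
rng = np.random.default_rng(7)
tips_group_a = rng.gamma(shape=4.0, scale=0.75, size=90)
tips_group_b = rng.gamma(shape=4.0, scale=0.75, size=60)

diffs = bootstrap_mean_difference(tips_group_a, tips_group_b)
ci_low, ci_high = np.quantile(diffs, [0.025, 0.975])

# If zero lies inside the interval, the data are consistent with no difference
# in mean tips at roughly the 5% level.
excludes_zero = not (ci_low <= 0.0 <= ci_high)

print(f"95% CI for difference in means: ({ci_low:.3f}, {ci_high:.3f})")
print("interval excludes zero:", excludes_zero)
```

Resampling each group separately keeps the group sizes fixed, which matches the situation where the number of observations per gender is given by the dataset rather than being random.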
Technologies

| Category | Tools |
|---|---|
| Data Processing | NumPy, Pandas |
| Visualization | Seaborn, Matplotlib |
| Statistical Analysis | Scikit-learn, Bootstrap methods |