Arijit Guchhait
Data Analyst | MSc Data Science Student @ University of Leicester
Professional Profile
Data Analyst with 5+ years of experience building the backbone of enterprise data systems through automated pipelines and complex SQL workflows. Currently pursuing an MSc in Data Science, I am extending my expertise to include statistical computing in R and Python, Exploratory Data Analysis (EDA), and Machine Learning. I aim to combine my extensive technical background with predictive modeling to turn historical data into future-ready insights.
Technical Expertise
Cloud, SQL & Analytics Engineering
- SQL & Warehousing: Snowflake (Snowpipe, Streams, Tasks), PostgreSQL, SQL Server
- Data Modeling: SCD Type-2, Stored Procedures, Layered Data Architectures (Bronze/Silver/Gold)
Data Science & Statistical Computing
- Python Stack: Pandas, Scikit-learn, Seaborn (Automation, Data Validation, ML)
- R Stack: Tidyverse (dplyr, ggplot2), Quarto, Markdown
- Statistics: Hypothesis Testing (t-tests, ANOVA), Linear Regression, Probability & Distribution Analysis
BI & Visualization
- Tools: Power BI, Tableau, Excel (XLOOKUP, Power Query), GitHub
Featured Projects
UK Road Collision Analysis
A comprehensive statistical analysis of over 50,000 road collision records to identify key environmental factors influencing accident severity.
- Predictive Modeling of Collision Severity: Developed a Random Forest classifier achieving 74% accuracy. The model identifies relationships between Hour, Day of Week, and Speed to distinguish high-volume “Slight” incidents from high-severity crashes.
- Hypothesis Testing & Statistical Validation: Tested situational factors against Collision Severity and rejected the null hypothesis (H₀) at the 5% significance level (p < 0.05), indicating that variables such as Speed are significantly associated with Collision Severity.
- Temporal & Distribution Analysis: Identified a bimodal distribution of collisions by hour of day with a distinct left skew. The analysis shows a morning peak at 08:00 and a primary concentration between 15:00 and 17:00.
- Volume vs. Severity Insights: Found that 30 mph zones contribute to severe collisions mainly through sheer incident volume, while 60 mph zones, despite fewer total incidents, contribute through impact force.
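
The volume-versus-severity comparison above can be illustrated with a minimal Python sketch. The records below are hypothetical values invented for illustration, not figures from the collision dataset, and the helper name `severity_by_speed` is an assumption rather than code from the project. The idea is to report the incident count and the share of non-"Slight" outcomes side by side for each speed limit, so that volume and severity rate can be read separately.

```python
from collections import Counter

# Hypothetical records (speed_limit_mph, severity); invented for illustration,
# not taken from the collision dataset described above.
records = (
    [(30, "Slight")] * 90 + [(30, "Serious")] * 10
    + [(60, "Slight")] * 14 + [(60, "Serious")] * 6
)


def severity_by_speed(rows):
    """Return {speed_limit: (total_incidents, severe_count, severe_share)}."""
    totals = Counter(speed for speed, _ in rows)
    # Anything other than "Slight" is counted as severe in this sketch.
    severe = Counter(speed for speed, sev in rows if sev != "Slight")
    return {
        speed: (totals[speed], severe[speed], severe[speed] / totals[speed])
        for speed in sorted(totals)
    }


summary = severity_by_speed(records)
for speed, (total, n_severe, share) in summary.items():
    print(f"{speed} mph: {total} incidents, {n_severe} severe ({share:.0%})")
```

In this made-up sample the 30 mph group has more severe incidents in absolute terms, while the 60 mph group has the higher severe share, which is the kind of contrast the bullet describes.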
Industry Track Record
- Enterprise ELT: Designed, implemented, and maintained robust data pipelines within highly regulated healthcare environments, ensuring data integrity and compliance.
- Systems Strategy: Proven expertise in structuring raw data into optimized layers (e.g., Bronze/Silver/Gold) to power downstream BI, reporting, and machine learning applications.
Interests
Beyond data, I’m passionate about traveling and skydiving — seeking new perspectives both on the ground and from 15,000 feet.





