Advanced Statistical Analysis to Explore Cardiovascular Data
Exploring Framingham Heart Disease Data: An Advanced Statistical Analysis
As an applied data scientist, I recently had the opportunity to work on a fascinating project involving the Framingham Heart Study dataset. This dataset, well known for its extensive records on cardiovascular health, let me apply advanced statistical analyses to the factors that influence heart disease.
Project Overview
Dataset
The Framingham Heart Study dataset is a goldmine of information, covering a wide array of variables related to cardiovascular health.
Research Questions
I set out to answer several research questions using advanced statistical methods. Some of them are:
- Does Glucose Influence BMI?
- Does Cholesterol Influence the Chance of Developing Coronary Heart Disease (CHD)?
- Does BMI Distribution Change Between Genders?
- How Do Heart Rate, Number of Cigarettes Per Day, and BMI Each Relate to Total Cholesterol?
Methodology
To answer these questions, I employed a variety of advanced statistical methods, including different regression models, handling missing data through imputations, conducting statistical tests, and leveraging Bayesian inference. The analysis was organized into distinct Jupyter notebooks within the GitHub repository for transparency and reproducibility.
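As a rough illustration of two of these techniques, the sketch below shows a simple single imputation (filling missing values with the median) and a conjugate Bayesian update for a proportion. These are stand-ins chosen for brevity, not necessarily the methods used in the notebooks; the function names, the glucose values, and the event counts are all hypothetical, and only the Python standard library is used.

```python
import random
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def beta_binomial_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def credible_interval(a, b, level=0.95, n_draws=20000, seed=0):
    """Equal-tailed credible interval estimated from Monte Carlo draws."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lo = draws[int((1 - level) / 2 * n_draws)]
    hi = draws[int((1 + level) / 2 * n_draws) - 1]
    return lo, hi


# Hypothetical glucose readings with missing entries marked as None.
glucose = [77.0, None, 85.0, 103.0, None, 70.0]
print(impute_median(glucose))

# Hypothetical counts: 15 events among 100 participants, uniform Beta(1, 1) prior.
a, b = beta_binomial_posterior(successes=15, trials=100)
posterior_mean = a / (a + b)
lo, hi = credible_interval(a, b)
print(f"posterior mean={posterior_mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Median imputation is easy to follow but understates uncertainty because every missing value receives the same fill; multiple imputation is a common alternative when that matters.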
Repository Structure
The project repository, Advanced-Statistical-Analysis, is organized into several Jupyter notebooks, each focusing on a specific aspect of the analysis:
- Bayesian Inference & Handling Missing Data.ipynb
- Confidence Intervals & Statistical Tests.ipynb
- EDA & Raising Hypotheses.ipynb
- Effects of Different Variables Over SysBP.ipynb
These notebooks cover the entire spectrum of the analysis, from exploratory data analysis and hypothesis generation to Bayesian inference and handling missing data.
Conclusion
The Framingham Heart Disease project provided valuable insights into the intricate relationships within cardiovascular health. By employing advanced statistical methods, I was able to navigate the complexities of the dataset and draw meaningful conclusions. Organizing the analysis into separate Jupyter notebooks within the GitHub repository supports its transparency and reproducibility.
Feel free to explore the repository to gain a deeper understanding of the analysis and methodologies employed. If you have any questions or suggestions, don’t hesitate to reach out. Happy coding!