Posts

Week 4 Assignment: Graphing Decisions

NESARC Dataset: Questions Selected

As part of this assignment, I will consider the association between the frequency of drinking alcohol during the period of abuse (S2AQ21A: Explanatory Variable) and whether or not the person suffered from one of the following cardiovascular medical conditions in the last 12 months:

Hardening of Arteries (S13Q6A1: Response Variable)
High Blood Pressure (S13Q6A2: Response Variable)
Heart Attack (S13Q6A7: Response Variable)

All four variables are categorical.

Univariate Bar Graphs

Since all variables are categorical, I have used bar graphs for the univariate plots. Below are the bar graphs for the four variables:

Descriptions for the Selected Variables

I have used the Pandas describe() method to describe the variables. Below is the output of the Python program:

Total rows in NESARC dataset: 43093
Total columns in NESARC dataset: 3010
Description for how frequently people drank alcohol during the time of alcohol abuse: S2AQ21A
count    34331.000000
unique...
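The describe() and bar graph steps mentioned above can be sketched roughly as follows. This is a minimal illustration, not the program from the post: the toy DataFrame holds made-up values, and the helper names as_category, bar_counts, and plot_bar are hypothetical. Only the column names S2AQ21A and S13Q6A2 come from the post.

```python
import pandas as pd

# Toy stand-in for a few NESARC rows; the values are made up for illustration.
toy = pd.DataFrame({
    "S2AQ21A": [1, 2, 2, 3, 3, 3, None, 5],
    "S13Q6A2": [1, 2, 2, 2, 1, 2, 2, 9],
})

def as_category(series):
    """Convert a coded column to the pandas categorical dtype (hypothetical helper)."""
    return series.astype("category")

def bar_counts(series):
    """Counts per category in category order; these are the bar heights."""
    return series.value_counts(sort=False, dropna=True).sort_index()

def plot_bar(series, title):
    """Draw a univariate bar graph. Requires matplotlib, imported only when called."""
    import matplotlib.pyplot as plt
    ax = bar_counts(series).plot(kind="bar", title=title)
    ax.set_xlabel(series.name)
    ax.set_ylabel("count")
    plt.show()

freq = as_category(toy["S2AQ21A"])
summary = freq.describe()   # count, unique, top, freq for a categorical column
print(summary)
print(bar_counts(freq))
```

Calling plot_bar(freq, "Drinking frequency (toy data)") would draw the chart. The counting is kept separate from the plotting so that the numbers behind each bar can be checked without a display.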

Week 3 Assignment: Data Management

Variables Selected

As part of this assignment, I have selected the following three variables from the NESARC dataset to perform data manipulation operations:

Had hardening of arteries in last 12 months (S13Q6A1)
Had high blood pressure in last 12 months (S13Q6A2)
Had heart attack in last 12 months (S13Q6A7)

Python Program to Perform Data Management

Below is the Python program that I used to perform data management operations on the selected variables:

import pandas
import numpy

# setting options so that pandas data do not get truncated
data = pandas.read_csv('nesarc_pds.csv', low_memory=False)
print(f"Total rows in NESARC dataset: {len(data)}")  # number of observations (rows)
print(f"Total columns in NESARC dataset: {len(data.columns)}")  # number of variables (columns)

# column names
colHard...
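Since the program above is cut off, here is a hedged sketch of one common data management step for these three variables. It assumes the columns are coded 1 = yes, 2 = no, 9 = unknown, with blanks for missing answers; that coding should be checked against the NESARC codebook. The helper clean_yes_no and the toy rows are hypothetical and are not taken from the post.

```python
import pandas as pd

CARDIO_COLS = ["S13Q6A1", "S13Q6A2", "S13Q6A7"]

def clean_yes_no(df, cols):
    """Return a copy with each column recoded to 1 (yes), 0 (no), or NaN.

    Assumed coding: 1 = yes, 2 = no, anything else (9, blank) = missing.
    """
    out = df.copy()
    for col in cols:
        # Blank strings and other non-numeric text become NaN.
        values = pd.to_numeric(out[col], errors="coerce")
        # Codes not listed in the mapping (for example 9) also become NaN.
        out[col] = values.map({1: 1, 2: 0})
    return out

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "S13Q6A1": ["1", "2", " ", "9"],
    "S13Q6A2": [2, 2, 1, 9],
    "S13Q6A7": ["2", "2", "2", "1"],
})

cleaned = clean_yes_no(toy, CARDIO_COLS)
print(cleaned)
```

Recoding "no" to 0 means the column mean can be read as the proportion of respondents who reported the condition, with missing answers excluded.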

Frequency Distribution for Selected Variables

I have used Python and the Pandas library to perform a frequency distribution analysis of the selected variables from the NESARC dataset. Below is the output of the Python code:

Total rows in NESARC dataset: 43093
Total columns in NESARC dataset: 3010

Counts for duration of alcohol abuse S2AQ20
 1     8960
-1     8266
 2     4467
 3     2780
10     2729
 5     2463
 4     1855
20     1618
99     1450
15      997
 6      833
30      765
 7      621
 8      585
25      493
40      416
12      407
 9      250
50      229
18      202
11      200
14      190
13  ...
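A frequency distribution like the one above can be produced along these lines. This is a minimal sketch with made-up values. The helper frequency_table is hypothetical, and it may differ from the code behind the output shown in the post.

```python
import pandas as pd

def frequency_table(series):
    """Counts and percentages per value, with missing values kept as their own row."""
    counts = series.value_counts(sort=False, dropna=False).sort_index()
    table = counts.to_frame("count")
    table["percent"] = table["count"] / table["count"].sum() * 100
    return table

# Made-up durations for illustration; not real NESARC values.
toy_duration = pd.Series([1, 1, 2, 3, 1, 2, None, 10], name="S2AQ20")

table = frequency_table(toy_duration)
print(table)
```

Passing dropna=False keeps missing answers visible in the table, which helps when deciding how to handle them later.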

Data Management and Visualization - Research Topic

Research Dataset Selection - Association of Alcohol Abuse with Cardiovascular Health

Although I am not a medical student, human health and what impacts it negatively have always raised my curiosity. Thus, after going through multiple codebooks and datasets, I have decided to use the NESARC study, conducted on 43,093 U.S. citizens. I would like to study the association between alcohol abuse and the occurrence of various cardiovascular health conditions.

Research Question

For my research topic, I will study the association between alcohol abuse (its quantity, its duration, and the kind of intoxicant consumed) and the occurrence of the following five cardiovascular medical conditions in the last 12 months:

Hypertension (High Blood Pressure)
Arteriosclerosis (Hardening of Arteries)
Myocardial Infarction (Heart Attack)
Tachycardia (Rapid Heartbeat)
Angina Pectoris (Chest Pain)

My Code Book

After a thorough study of the NESARC codebook, I have selected the following columns from the...