Frequency Distribution for Selected Variables

I used Python and the pandas library to compute frequency distributions for the selected variables from the NESARC dataset. Below is the output of the Python code:

Total rows in NESARC dataset: 43093

Total columns in NESARC dataset: 3010

Counts for duration of alcohol abuse: S2AQ20

 1     8960

-1     8266

 2     4467

 3     2780

 10    2729

 5     2463

 4     1855

 20    1618

 99    1450

 15     997

 6      833

 30     765

 7      621

 8      585

 25     493

 40     416

 12     407

 9      250

 50     229

 18     202

 11     200

 14     190

 13     171

 17     154

 22     145

 16     143

 35     137

 21     132

 19     117

 23     106

 24     103

 28      99

 27      77

 26      72

 60      68

 45      67

 32      58

 37      47

 34      44

 33      43

 36      41

 31      38

 38      36

 29      35

 47      32

 48      29

 42      28

 55      28

 44      25

 43      23

 46      22

 41      20

 39      18

 52      17

 53      16

 58      14

 57      12

 54      12

 49      12

 56      11

 65      11

 70      10

 51       9

 59       6

 68       5

 75       4

 61       4

 63       3

 71       3

 62       2

 69       2

 67       1

 80       1

 64       1

 77       1

 72       1

 66       1

Name: S2AQ20, dtype: int64

Percentages for duration of alcohol abuse: S2AQ20

 1     0.207922

-1     0.191818

 2     0.103660

 3     0.064512

 10    0.063328

 5     0.057155

 4     0.043046

 20    0.037547

 99    0.033648

 15    0.023136

 6     0.019330

 30    0.017752

 7     0.014411

 8     0.013575

 25    0.011440

 40    0.009654

 12    0.009445

 9     0.005801

 50    0.005314

 18    0.004688

 11    0.004641

 14    0.004409

 13    0.003968

 17    0.003574

 22    0.003365

 16    0.003318

 35    0.003179

 21    0.003063

 19    0.002715

 23    0.002460

 24    0.002390

 28    0.002297

 27    0.001787

 26    0.001671

 60    0.001578

 45    0.001555

 32    0.001346

 37    0.001091

 34    0.001021

 33    0.000998

 36    0.000951

 31    0.000882

 38    0.000835

 29    0.000812

 47    0.000743

 48    0.000673

 42    0.000650

 55    0.000650

 44    0.000580

 43    0.000534

 46    0.000511

 41    0.000464

 39    0.000418

 52    0.000394

 53    0.000371

 58    0.000325

 57    0.000278

 54    0.000278

 49    0.000278

 56    0.000255

 65    0.000255

 70    0.000232

 51    0.000209

 59    0.000139

 68    0.000116

 75    0.000093

 61    0.000093

 63    0.000070

 71    0.000070

 62    0.000046

 69    0.000046

 67    0.000023

 80    0.000023

 64    0.000023

 77    0.000023

 72    0.000023

 66    0.000023

Name: S2AQ20, dtype: float64
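
The two tables above have the shape of output from pandas `Series.value_counts`. The following is a minimal sketch of how such counts and proportions could be produced; it is not necessarily the exact code behind this post. The `sample` DataFrame and its values are made up for illustration and stand in for the real NESARC file.

```python
import pandas as pd

# Hypothetical miniature sample standing in for the NESARC column S2AQ20;
# the values are invented for illustration only.
sample = pd.DataFrame({"S2AQ20": [1, 1, 1, -1, -1, 2, 99, 10]})

print("Total rows in sample:", len(sample))
print("Total columns in sample:", len(sample.columns))

# Absolute frequencies, most common value first (the default sort order).
counts = sample["S2AQ20"].value_counts()

# Relative frequencies as proportions of all rows; multiply by 100 for percent.
proportions = sample["S2AQ20"].value_counts(normalize=True)

print("Counts for S2AQ20")
print(counts)
print("Proportions for S2AQ20")
print(proportions)
```

Passing `sort=False` to `value_counts` skips the sort by frequency, which may explain why some of the later tables are not listed in descending order.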

Counts for how often abused alcohol: S2AQ21A

-1     8266

 10    5215

 6     3309

 8     1367

 5     4600

 3     4040

 4     4277

 1     4167

 9     2759

 7     2502

 2     2095

 99     496

Name: S2AQ21A, dtype: int64

Percentages for how often abused alcohol: S2AQ21A

-1     0.191818

 10    0.121017

 6     0.076787

 8     0.031722

 5     0.106746

 3     0.093751

 4     0.099250

 1     0.096698

 9     0.064024

 7     0.058060

 2     0.048616

 99    0.011510

Name: S2AQ21A, dtype: float64

Counts for how often drank 5+ drinks: S2AQ22

-1      8266

 11    20698

 9       968

 8       532

 4      1856

 5      1908

 1      2090

 10     1330

 6      1208

 3      1764

 7      1018

 2       957

 99      498

Name: S2AQ22, dtype: int64

Percentages for how often drank 5+ drinks: S2AQ22

-1     0.191818

 11    0.480310

 9     0.022463

 8     0.012345

 4     0.043070

 5     0.044276

 1     0.048500

 10    0.030863

 6     0.028032

 3     0.040935

 7     0.023623

 2     0.022208

 99    0.011556

Name: S2AQ22, dtype: float64

Counts for type of alcohol abused: S2AQ23

-1     8266

 2    12351

 4     6248

 1     1802

 3     3681

 9    10745

Name: S2AQ23, dtype: int64

Percentages for type of alcohol abused: S2AQ23

-1    0.191818

 2    0.286613

 4    0.144989

 1    0.041817

 3    0.085420

 9    0.249344

Name: S2AQ23, dtype: float64

Counts of people having hardening of arteries in last 12 months: S13Q6A1

2    40917

1      911

9     1265

Name: S13Q6A1, dtype: int64

Percentages of people having hardening of arteries in last 12 months: S13Q6A1

2    0.949505

1    0.021140

9    0.029355

Name: S13Q6A1, dtype: float64

Counts of people having high blood pressure in last 12 months: S13Q6A2

2    32828

1     9136

9     1129

Name: S13Q6A2, dtype: int64

Percentages of people having high blood pressure in last 12 months: S13Q6A2

2    0.761794

1    0.212007

9    0.026199

Name: S13Q6A2, dtype: float64

Counts of people having heart attack in last 12 months: S13Q6A7

2    41557

1      470

9     1066

Name: S13Q6A7, dtype: int64

Percentages of people having heart attack in last 12 months: S13Q6A7

2    0.964356

1    0.010907

9    0.024737

Name: S13Q6A7, dtype: float64

Inferences from the frequency distribution:

The following inferences can be drawn from the frequency distributions of the selected variables:

  • Hardening of arteries: Around 2.11% of respondents had hardening of the arteries in the last 12 months, around 94.95% did not, and around 2.94% were not sure.
  • High blood pressure: Around 21.20% of respondents had high blood pressure in the last 12 months, around 76.18% did not, and around 2.62% were not sure.
  • Heart attack: Around 1.09% of respondents had a heart attack in the last 12 months, around 96.44% did not, and around 2.47% were not sure.
  • Major type of alcohol drunk during the period of alcohol abuse: Around 4.18% of respondents drank coolers, around 28.66% drank beer, around 8.54% drank wine, and around 14.50% drank liquor. The rest were either not sure or did not drink.
  • Duration of alcohol abuse: Around 20.79% of respondents abused alcohol for 1 year, around 10.37% for 2 years, and so on up to 80 years, as shown in the frequency distribution output above. -1 indicates invalid or not applicable entries.
  • How often drank any alcohol during the period of abuse: 9.67% drank every day, 4.86% nearly every day, 9.38% 3 to 4 times a week, 9.93% 2 times a week, 10.67% once a week, 7.68% 2 to 3 times a month, 5.81% once a month, 3.17% 7 to 11 times a year, 6.40% 3 to 6 times a year, and 12.10% once or twice a year. -1 or 99 indicates invalid or unknown entries.
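
Because codes such as -1 and 99 mark invalid or unknown entries, the percentages above are shares of all respondents, not of valid answers only. A minimal sketch of how shares among valid answers could be computed is shown below. It assumes that -1 and 99 are the only non-response codes for the variable, and the `responses` values are made up for illustration; they are not NESARC data.

```python
import numpy as np
import pandas as pd

# Hypothetical responses for S2AQ21A; values are invented for illustration.
responses = pd.Series([1, 1, 3, 10, 10, 10, -1, -1, 99, 5], name="S2AQ21A")

# Assumption: -1 and 99 are the only codes meaning invalid/unknown.
valid = responses.replace([-1, 99], np.nan)

# value_counts drops NaN by default, so the proportions are
# computed over valid answers only.
props = valid.value_counts(normalize=True)

# Express as percentages rounded to two decimals.
percent = (props * 100).round(2)
print(percent)
```

Whether to report shares of all respondents or of valid answers only depends on the question being asked, so it helps to state which denominator is used.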
