------------ PRE-PROCESSED DATA ANALYSIS ------------
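We analyze each feature of the pre-processed PLCO and NLST datasets. Percentages are computed over all participants in the corresponding dataset, including those with a missing value.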
Number of participants:
- PLCO: 55161
- NLST: 48595
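Each table below gives, per dataset, a count and a percentage for every category, plus a "Missing" row. Here is a minimal sketch of how such rows can be computed. It is not the code that produced this report, and the helper name frequency_table is hypothetical. It assumes missing values are encoded as None and, as in the tables, uses all participants as the denominator:

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


def frequency_table(
    values: Iterable[Optional[Hashable]], labels: List[str]
) -> List[Tuple[str, int, float]]:
    """Return (label, count, percent) rows plus a trailing 'Missing' row.

    None is counted as missing. Percentages use every participant (missing
    included) as the denominator and are rounded to one decimal place.
    Values not listed in `labels` still count toward the total but get no
    row of their own.
    """
    counts = Counter("Missing" if v is None else v for v in values)
    total = sum(counts.values())
    rows = []
    for label in list(labels) + ["Missing"]:
        n = counts[label]  # Counter returns 0 for labels never seen
        pct = round(100.0 * n / total, 1) if total else 0.0
        rows.append((label, n, pct))
    return rows


# Toy usage: three known statuses and one missing value.
print(frequency_table(["Active", "Former", "Former", None], ["Active", "Former"]))
```

Applied to the 9965 active and 45196 former PLCO smokers, it returns 18.1 and 81.9 percent, matching the smoking status table.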
--- Feature analysis ---
Age: This feature captures the person’s age.
--------------  -----  ------  -----  ------
Age              PLCO  PLCO %   NLST  NLST %
<= 50               0     0.0      1     0.0
50 < ... <= 60  27337    49.6  24861    51.2
60 < ... <= 70  25120    45.5  20901    43.0
> 70             2704     4.9   2832     5.8
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------
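The numeric features are grouped into right-closed intervals, written "lo < ... <= hi", with open-ended first and last bins. The following sketch shows that binning, assuming the age edges 50, 60 and 70 read from the row labels above. The helpers interval_labels and bin_value are hypothetical, and None or NaN is treated as missing:

```python
import math
from bisect import bisect_left
from typing import List, Optional, Sequence


def interval_labels(edges: Sequence[float]) -> List[str]:
    """Row labels in the report's style for increasing edges e0 < ... < ek."""
    labels = [f"<= {edges[0]:g}"]
    labels += [f"{lo:g} < ... <= {hi:g}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f"> {edges[-1]:g}")
    return labels


def bin_value(value: Optional[float], edges: Sequence[float]) -> str:
    """Map a value to its right-closed bin (lo, hi]; None or NaN -> 'Missing'."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "Missing"
    # bisect_left returns the first index i with value <= edges[i], so
    # edges[i-1] < value <= edges[i]; i == len(edges) means "> last edge".
    return interval_labels(edges)[bisect_left(edges, value)]


age_edges = [50, 60, 70]  # assumption: read from the Age table's row labels
print([bin_value(a, age_edges) for a in (50, 60, 61, 75, None)])
```

With edges [30, 40, 50, 60], [25, 50, 100], [15, 20] and [10, 20, 30, 40], the same helpers would produce the row labels of the cessation age, pack-years, onset age and years smoked tables.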
Smoking cessation age: This feature describes the age at which the person stopped smoking.
---------------------  -----  ------  -----  ------
Smoking cessation age   PLCO  PLCO %   NLST  NLST %
<= 30                  10470    19.0      2     0.0
30 < ... <= 40         11886    21.5    130     0.3
40 < ... <= 50         11447    20.8   7025    14.5
50 < ... <= 60          8649    15.7  14071    29.0
> 60                    1942     3.5   4378     9.0
Missing                10767    19.5  22989    47.3
---------------------  -----  ------  -----  ------
Smoking status: This feature indicates whether the person was a current (Active) or former cigarette smoker at the beginning of the study.
--------------  -----  ------  -----  ------
Smoking status   PLCO  PLCO %   NLST  NLST %
Active           9965    18.1  22842    47.0
Former          45196    81.9  25753    53.0
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------
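A cessation age only exists for someone who has quit, so this feature is presumably missing for every current smoker. This is an assumption, since the report does not say how the feature was encoded. If it holds, the "Missing" row of the cessation age table must be at least the "Active" row of the smoking status table, and the difference counts former smokers with no recorded cessation age. A short check on the reported counts:

```python
# Counts copied from the smoking status and smoking cessation age tables.
COUNTS = {
    "PLCO": {"active": 9965, "cessation_missing": 10767},
    "NLST": {"active": 22842, "cessation_missing": 22989},
}


def missing_beyond_current_smokers(active: int, cessation_missing: int) -> int:
    """Missing cessation ages not explained by current smokers.

    Assumes (unverified) that every current smoker has a missing cessation
    age; the remainder would then be former smokers without a recorded age.
    A negative result would contradict the assumption.
    """
    return cessation_missing - active


for name, c in COUNTS.items():
    extra = missing_beyond_current_smokers(c["active"], c["cessation_missing"])
    print(f"{name}: {extra} missing cessation ages beyond the current smokers")
```

With these counts the differences are 802 for PLCO and 147 for NLST. Under the assumption above, almost all missing NLST cessation ages would therefore belong to current smokers.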
Pack-years: This feature refers to the number of packs smoked per day multiplied by the number of years during which the person smoked.
---------------  -----  ------  -----  ------
Pack-years        PLCO  PLCO %   NLST  NLST %
<= 25            26981    48.9      8     0.0
25 < ... <= 50   16147    29.3  26746    55.0
50 < ... <= 100   9448    17.1  19544    40.2
> 100             1434     2.6   2297     4.7
Missing           1151     2.1      0     0.0
---------------  -----  ------  -----  ------
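Pack-years combine smoking intensity and duration into one number. Here is a minimal sketch of the definition. The cigarettes-per-day variant assumes 20 cigarettes per pack, because the report does not say how intensity was recorded. Both function names are hypothetical:

```python
CIGARETTES_PER_PACK = 20  # assumption: standard 20-cigarette pack


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day x number of years smoked."""
    if packs_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return packs_per_day * years_smoked


def pack_years_from_cigarettes(cigarettes_per_day: float, years_smoked: float) -> float:
    """Same quantity when intensity is recorded as cigarettes per day."""
    return pack_years(cigarettes_per_day / CIGARETTES_PER_PACK, years_smoked)


print(pack_years(1.5, 30))
print(pack_years_from_cigarettes(30, 40))
```

For example, 1.5 packs a day for 30 years gives 45 pack-years, which falls in the "25 < ... <= 50" row.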
Smoking onset age: This feature indicates the age at which the person started smoking.
-----------------  -----  ------  -----  ------
Smoking onset age   PLCO  PLCO %   NLST  NLST %
<= 15              10169    18.4  17927    36.9
15 < ... <= 20     33760    61.2  25411    52.3
> 20               10950    19.9   5256    10.8
Missing              282     0.5      1     0.0
-----------------  -----  ------  -----  ------
Years smoked: This feature describes the total number of years during which the person smoked.
--------------  -----  ------  -----  ------
Years smoked     PLCO  PLCO %   NLST  NLST %
<= 10            8800    16.0      2     0.0
10 < ... <= 20  11761    21.3    292     0.6
20 < ... <= 30  11532    20.9   5134    10.6
30 < ... <= 40  13037    23.6  21620    44.5
> 40             8963    16.2  21547    44.3
Missing          1068     1.9      0     0.0
--------------  -----  ------  -----  ------
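Smoking onset age, cessation age and years smoked are related. For a former smoker who never paused, years smoked should roughly equal cessation age minus onset age. For a current smoker, it should roughly equal current age minus onset age. The report does not say whether "years smoked" was recorded directly or derived, so the sketch below is only a hypothetical consistency check with an invented function name. Because it ignores periods of temporary quitting, it can exceed a directly recorded value:

```python
from typing import Optional


def derived_years_smoked(
    onset_age: Optional[float],
    cessation_age: Optional[float],
    current_age: Optional[float],
    is_current_smoker: bool,
) -> Optional[float]:
    """Approximate smoking duration as end age minus onset age.

    Hypothetical check: the end age is the current age for current smokers
    and the cessation age for former smokers. Returns None when an input
    needed for the computation is missing.
    """
    end_age = current_age if is_current_smoker else cessation_age
    if onset_age is None or end_age is None:
        return None
    if end_age < onset_age:
        # Smoking cannot end before it starts: flag the record.
        raise ValueError(f"end age {end_age} precedes onset age {onset_age}")
    return end_age - onset_age


print(derived_years_smoked(18, 50, 65, is_current_smoker=False))
print(derived_years_smoked(15, None, 60, is_current_smoker=True))
```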
Lung cancer family history: This feature indicates whether the person has a close relative (parent, sibling, or child) who had lung cancer.
--------------------------  -----  ------  -----  ------
Lung cancer family history   PLCO  PLCO %   NLST  NLST %
No                          48415    87.8  37302    76.8
Yes                          6323    11.5  10598    21.8
Missing                       423     0.8    695     1.4
--------------------------  -----  ------  -----  ------
BMI: This feature is the person's body mass index, i.e. weight in kilograms divided by the square of height in meters.
------------------------------------  -----  ------  -----  ------
Body Mass Index                        PLCO  PLCO %   NLST  NLST %
Underweight (... <= 18.4)               295     0.5    347     0.7
Healthy weight (18.5 <= ... <= 24.9)  17556    31.8  13404    27.6
Overweight (25 <= ... <= 29.9)        23920    43.4  20894    43.0
Obesity (... >= 30)                   12631    22.9  13696    28.2
Missing                                 759     1.4    234     0.5
------------------------------------  -----  ------  -----  ------
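The NLST counts in this table sum to 48575, which is 20 fewer than the 48595 NLST participants. The PLCO counts, by contrast, sum to exactly 55161. One possible explanation, not confirmed by the report, is that the printed ranges leave gaps (between 18.4 and 18.5, 24.9 and 25, and 29.9 and 30), so a value such as 24.95 matches no row. The sketch below computes BMI and contrasts the printed closed ranges with half-open ranges that cover every value. The function names are hypothetical:

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def printed_category(value: float) -> Optional[str]:
    """Closed ranges exactly as printed in the table; None inside the gaps."""
    if value <= 18.4:
        return "Underweight"
    if 18.5 <= value <= 24.9:
        return "Healthy weight"
    if 25 <= value <= 29.9:
        return "Overweight"
    if value >= 30:
        return "Obesity"
    return None


def half_open_category(value: float) -> str:
    """Ranges [18.5, 25) and [25, 30) leave no gap between categories."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Healthy weight"
    if value < 30:
        return "Overweight"
    return "Obesity"


for v in (round(bmi(70, 1.75), 2), 24.95, 29.95):
    print(v, printed_category(v), half_open_category(v))
```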
Lung cancer: This feature indicates whether the person was diagnosed with lung cancer.
-----------  -----  ------  -----  ------
Lung cancer   PLCO  PLCO %   NLST  NLST %
Negative     52409    95.0  47084    96.9
Positive      2752     5.0   1511     3.1
Missing          0     0.0      0     0.0
-----------  -----  ------  -----  ------
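Lung cancer cases are a small minority in both datasets. The report does not state whether these data are used to train a classifier. If they are, the ratio of negative to positive participants is worth tracking. Here is a short computation from the counts above, with hypothetical function names:

```python
# Counts copied from the lung cancer table above (no missing values).
LUNG_CANCER = {
    "PLCO": {"negative": 52409, "positive": 2752},
    "NLST": {"negative": 47084, "positive": 1511},
}


def prevalence(positive: int, negative: int) -> float:
    """Fraction of participants diagnosed with lung cancer."""
    return positive / (positive + negative)


def negatives_per_positive(positive: int, negative: int) -> float:
    """Class imbalance expressed as negatives per positive case."""
    return negative / positive


for name, c in LUNG_CANCER.items():
    p = prevalence(c["positive"], c["negative"])
    r = negatives_per_positive(c["positive"], c["negative"])
    print(f"{name}: prevalence {100 * p:.1f}%, {r:.1f} negatives per positive")
```

This gives about 19 negatives per positive in PLCO and 31 in NLST.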