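------------ PRE-PROCESSED DATA ANALYSIS ------------

We analyze each feature of the pre-processed PLCO and NLST datasets. For each feature, the tables below give the number of participants in each category and the corresponding percentage of all participants in that dataset, so each percentage column (Missing row included) sums to about 100%.
Number of participants:
- PLCO: 55161
- NLST: 48595
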
--- Feature analysis ---

Age: This feature captures the person’s age.
--------------  -----  ------  -----  ------
Age              PLCO  PLCO %   NLST  NLST %
<= 50               0     0.0      1     0.0
50 < ... <= 60  27337    49.6  24861    51.2
60 < ... <= 70  25120    45.5  20901    43.0
> 70             2704     4.9   2832     5.8
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------

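Each numeric feature is bucketed into right-closed intervals (the row labels read "50 < ... <= 60"), and each count is divided by the total number of participants in the dataset, missing values included. A minimal stdlib sketch of that computation, assuming values arrive as a list with None for missing entries; binned_table is a hypothetical helper, not the script that produced this file:

```python
from bisect import bisect_left
from typing import Optional, Sequence


def binned_table(
    values: Sequence[Optional[float]], edges: Sequence[float]
) -> list[tuple[str, int, float]]:
    """Count values per right-closed interval and give each count as a
    percentage of all participants, missing (None) values included.

    edges=[50, 60, 70] produces the rows '<= 50', '50 < ... <= 60',
    '60 < ... <= 70', '> 70' and 'Missing'.
    """
    labels = [f"<= {edges[0]}"]
    labels += [f"{lo} < ... <= {hi}" for lo, hi in zip(edges, edges[1:])]
    labels += [f"> {edges[-1]}", "Missing"]

    counts = [0] * len(labels)
    for v in values:
        if v is None:
            counts[-1] += 1
        else:
            # bisect_left gives the index of the first edge >= v, so a value
            # equal to an edge lands in the interval that closes at that edge.
            counts[bisect_left(edges, v)] += 1

    total = len(values)
    return [(lab, n, round(100 * n / total, 1)) for lab, n in zip(labels, counts)]


if __name__ == "__main__":
    # Toy ages, not real participant data.
    ages = [52, 58, 60, 63, 71, None]
    for row in binned_table(ages, [50, 60, 70]):
        print(row)
```

With other edges, the same helper yields the row layout of the other numeric tables, for example [25, 50, 100] for pack-years or [15, 20] for smoking onset age.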
Smoking cessation age: This feature describes the age at which the person stopped smoking.
---------------------  -----  ------  -----  ------
Smoking cessation age   PLCO  PLCO %   NLST  NLST %
<= 30                  10470    19.0      2     0.0
30 < ... <= 40         11886    21.5    130     0.3
40 < ... <= 50         11447    20.8   7025    14.5
50 < ... <= 60          8649    15.7  14071    29.0
> 60                    1942     3.5   4378     9.0
Missing                10767    19.5  22989    47.3
---------------------  -----  ------  -----  ------

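In both datasets the Missing share for cessation age tracks the share of active smokers reported in the smoking status table (NLST: 47.3% vs 47.0%; PLCO: 19.5% vs 18.1%), which is what one would expect if cessation age is only recorded for former smokers. That is an assumption, not something this report states. A minimal sketch of how one could check it, assuming each participant is a dict with the hypothetical keys "status" and "cessation_age" (missing_by_status is likewise a hypothetical helper):

```python
from collections import Counter
from typing import Iterable, Mapping


def missing_by_status(participants: Iterable[Mapping[str, object]]) -> Counter:
    """Count (smoking status, cessation age is missing) pairs.

    'status' and 'cessation_age' are hypothetical field names.
    """
    return Counter((p["status"], p["cessation_age"] is None) for p in participants)


if __name__ == "__main__":
    # Toy records, not real participant data.
    records = [
        {"status": "Active", "cessation_age": None},
        {"status": "Former", "cessation_age": 55},
        {"status": "Former", "cessation_age": None},
    ]
    counts = missing_by_status(records)
    # Former smokers without a cessation age are the missing values
    # that smoking status alone would not explain.
    print(counts[("Former", True)])
```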
Smoking status: This feature describes whether the person was a current (Active) or former cigarette smoker at the start of the study.
--------------  -----  ------  -----  ------
Smoking status   PLCO  PLCO %   NLST  NLST %
Active           9965    18.1  22842    47.0
Former          45196    81.9  25753    53.0
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------

Pack-years: This feature is the average number of packs smoked per day multiplied by the number of years during which the person smoked.
---------------  -----  ------  -----  ------
Pack-years        PLCO  PLCO %   NLST  NLST %
<= 25            26981    48.9      8     0.0
25 < ... <= 50   16147    29.3  26746    55.0
50 < ... <= 100   9448    17.1  19544    40.2
> 100             1434     2.6   2297     4.7
Missing           1151     2.1      0     0.0
---------------  -----  ------  -----  ------

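As a worked example of the definition, a person who smoked 1.5 packs per day for 30 years has 1.5 × 30 = 45 pack-years, which falls in the "25 < ... <= 50" row. A minimal sketch, assuming the raw data record cigarettes per day rather than packs and a 20-cigarette pack (both are assumptions; pack_years is a hypothetical helper):

```python
from typing import Optional

CIGARETTES_PER_PACK = 20  # assumption: standard 20-cigarette pack


def pack_years(
    cigarettes_per_day: Optional[float], years_smoked: Optional[float]
) -> Optional[float]:
    """Packs smoked per day multiplied by the number of years smoked.

    Returns None when either input is missing, so the person would be
    counted in the 'Missing' row.
    """
    if cigarettes_per_day is None or years_smoked is None:
        return None
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


if __name__ == "__main__":
    # 30 cigarettes/day = 1.5 packs/day, smoked for 30 years.
    print(pack_years(30, 30))
```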
Smoking onset age: This feature indicates the age at which the person started smoking.
-----------------  -----  ------  -----  ------
Smoking onset age   PLCO  PLCO %   NLST  NLST %
<= 15              10169    18.4  17927    36.9
15 < ... <= 20     33760    61.2  25411    52.3
> 20               10950    19.9   5256    10.8
Missing              282     0.5      1     0.0
-----------------  -----  ------  -----  ------

Years smoked: This feature describes the total number of years during which the person smoked.
--------------  -----  ------  -----  ------
Years smoked     PLCO  PLCO %   NLST  NLST %
<= 10            8800    16.0      2     0.0
10 < ... <= 20  11761    21.3    292     0.6
20 < ... <= 30  11532    20.9   5134    10.6
30 < ... <= 40  13037    23.6  21620    44.5
> 40             8963    16.2  21547    44.3
Missing          1068     1.9      0     0.0
--------------  -----  ------  -----  ------

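This report does not say whether years smoked is self-reported or derived. If it is derived, one natural definition (an assumption here) is cessation age minus onset age for former smokers, and current age minus onset age for active smokers. A minimal sketch with the hypothetical helper years_smoked, which ignores any temporary quit periods:

```python
from typing import Optional


def years_smoked(
    status: str,
    age: float,
    onset_age: Optional[float],
    cessation_age: Optional[float],
) -> Optional[float]:
    """Hypothetical derivation of smoking duration in years.

    Assumes continuous smoking from onset to cessation, or up to the
    current age for active smokers; temporary quit periods are ignored.
    Returns None when the needed ages are missing.
    """
    if onset_age is None:
        return None
    if status == "Active":
        end = age
    elif status == "Former" and cessation_age is not None:
        end = cessation_age
    else:
        return None
    # Clamp at zero in case of inconsistent ages in the raw data.
    return max(end - onset_age, 0)


if __name__ == "__main__":
    print(years_smoked("Active", 62, 18, None))
    print(years_smoked("Former", 65, 16, 50))
```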
Lung cancer family history: This feature describes whether the person has a first-degree relative (parent, sibling, or child) who had lung cancer.
--------------------------  -----  ------  -----  ------
Lung cancer family history   PLCO  PLCO %   NLST  NLST %
No                          48415    87.8  37302    76.8
Yes                          6323    11.5  10598    21.8
Missing                       423     0.8    695     1.4
--------------------------  -----  ------  -----  ------

BMI: This feature describes the person’s body mass index, i.e. weight in kilograms divided by the square of height in meters.
------------------------------------  -----  ------  -----  ------
Body Mass Index                        PLCO  PLCO %   NLST  NLST %
Underweight (... <= 18.4)               295     0.5    347     0.7
Healthy weight (18.5 <= ... <= 24.9)  17556    31.8  13404    27.6
Overweight (25 <= ... <= 29.9)        23920    43.4  20894    43.0
Obesity (... >= 30)                   12631    22.9  13696    28.2
Missing                                 759     1.4    234     0.5
------------------------------------  -----  ------  -----  ------

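As printed, the BMI category bounds leave small gaps (for example 24.9 < BMI < 25), and the NLST counts add up to 48575 rather than the 48595 NLST participants. This is consistent with, though not proof of, 20 participants falling into such gaps. A minimal sketch that computes BMI and assigns categories with half-open intervals, so every non-missing value lands in exactly one category; bmi and bmi_category are hypothetical helpers, and the script's actual rule is not shown in this report:

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: Optional[float]) -> str:
    """Map a BMI value to the table's categories using half-open intervals
    [18.5, 25) and [25, 30), so values such as 24.95 are not dropped."""
    if value is None:
        return "Missing"
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Healthy weight"
    if value < 30:
        return "Overweight"
    return "Obesity"


if __name__ == "__main__":
    # 70 kg at 1.75 m gives a BMI of about 22.9.
    print(bmi_category(bmi(70, 1.75)))
```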
Lung cancer: This feature indicates whether the person was diagnosed with lung cancer.
-----------  -----  ------  -----  ------
Lung cancer   PLCO  PLCO %   NLST  NLST %
Negative     52409    95.0  47084    96.9
Positive      2752     5.0   1511     3.1
Missing          0     0.0      0     0.0
-----------  -----  ------  -----  ------
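Positive cases are a small minority in both datasets. As a worked check of the table, 2752 / 55161 ≈ 5.0% and 1511 / 48595 ≈ 3.1%, or roughly 19 and 31 negatives per positive case. The short sketch below reproduces that arithmetic from the counts in the table; class_balance is a hypothetical helper name:

```python
def class_balance(negative: int, positive: int) -> tuple[int, float, float]:
    """Return (total participants, % positive, negatives per positive case)."""
    total = negative + positive
    return total, 100 * positive / total, negative / positive


# Counts copied from the lung cancer table above.
COUNTS = {"PLCO": (52409, 2752), "NLST": (47084, 1511)}

if __name__ == "__main__":
    for name, (neg, pos) in COUNTS.items():
        total, pct, ratio = class_balance(neg, pos)
        print(f"{name}: n={total}, positive={pct:.1f}%, "
              f"{ratio:.1f} negatives per positive")
    # PLCO: n=55161, positive=5.0%, 19.0 negatives per positive
    # NLST: n=48595, positive=3.1%, 31.2 negatives per positive
```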