------------ PRE-PROCESSED DATA ANALYSIS ------------

We perform data analysis on each feature of the PLCO and NLST datasets.

Number of participants:
- PLCO: 55161
- NLST: 48595

--- Feature analysis ---

Age: This feature captures the person’s age.

--------------  -----  ------  -----  ------
Age              PLCO  PLCO %   NLST  NLST %
<= 50               0     0.0      1     0.0
50 < ... <= 60  27337    49.6  24861    51.2
60 < ... <= 70  25120    45.5  20901    43.0
> 70             2704     4.9   2832     5.8
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------

Smoking cessation age: This feature describes the age at which the person stopped smoking.

---------------------  -----  ------  -----  ------
Smoking cessation age   PLCO  PLCO %   NLST  NLST %
<= 30                  10470    19.0      2     0.0
30 < ... <= 40         11886    21.5    130     0.3
40 < ... <= 50         11447    20.8   7025    14.5
50 < ... <= 60          8649    15.7  14071    29.0
> 60                    1942     3.5   4378     9.0
Missing                10767    19.5  22989    47.3
---------------------  -----  ------  -----  ------

Smoking status: This feature describes whether the person was a current or a former cigarette smoker at the beginning of the study.

--------------  -----  ------  -----  ------
Smoking status   PLCO  PLCO %   NLST  NLST %
Active           9965    18.1  22842    47.0
Former          45196    81.9  25753    53.0
Missing             0     0.0      0     0.0
--------------  -----  ------  -----  ------

Pack-years: This feature refers to the number of packs smoked per day multiplied by the number of years during which the person smoked.

---------------  -----  ------  -----  ------
Pack-years        PLCO  PLCO %   NLST  NLST %
<= 25            26981    48.9      8     0.0
25 < ... <= 50   16147    29.3  26746    55.0
50 < ... <= 100   9448    17.1  19544    40.2
> 100             1434     2.6   2297     4.7
Missing           1151     2.1      0     0.0
---------------  -----  ------  -----  ------

Smoking onset age: This feature indicates the age at which the person started smoking.

-----------------  -----  ------  -----  ------
Smoking onset age   PLCO  PLCO %   NLST  NLST %
<= 15              10169    18.4  17927    36.9
15 < ... <= 20     33760    61.2  25411    52.3
> 20               10950    19.9   5256    10.8
Missing              282     0.5      1     0.0
-----------------  -----  ------  -----  ------

Years smoked: This feature describes the total number of years during which the person smoked.

--------------  -----  ------  -----  ------
Smoking years    PLCO  PLCO %   NLST  NLST %
<= 10            8800    16.0      2     0.0
10 < ... <= 20  11761    21.3    292     0.6
20 < ... <= 30  11532    20.9   5134    10.6
30 < ... <= 40  13037    23.6  21620    44.5
> 40             8963    16.2  21547    44.3
Missing          1068     1.9      0     0.0
--------------  -----  ------  -----  ------

Lung cancer family history: This feature describes whether the person has a close family member (parent, sibling, or child) who had lung cancer.

--------------------------  -----  ------  -----  ------
Lung cancer family history   PLCO  PLCO %   NLST  NLST %
No                          48415    87.8  37302    76.8
Yes                          6323    11.5  10598    21.8
Missing                       423     0.8    695     1.4
--------------------------  -----  ------  -----  ------

BMI: This feature describes the person’s body mass index.

------------------------------------  -----  ------  -----  ------
Body Mass Index                        PLCO  PLCO %   NLST  NLST %
Underweight (... <= 18.4)               295     0.5    347     0.7
Healthy weight (18.5 <= ... <= 24.9)  17556    31.8  13404    27.6
Overweight (25 <= ... <= 29.9)        23920    43.4  20894    43.0
Obesity (... >= 30)                   12631    22.9  13696    28.2
Missing                                 759     1.4    234     0.5
------------------------------------  -----  ------  -----  ------

Lung cancer: This feature indicates whether the person was diagnosed with lung cancer.

-----------  -----  ------  -----  ------
Lung cancer   PLCO  PLCO %   NLST  NLST %
Negative     52409    95.0  47084    96.9
Positive      2752     5.0   1511     3.1
Missing          0     0.0      0     0.0
-----------  -----  ------  -----  ------
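
The binned counts and percentages reported above could be reproduced along the following lines. This is a minimal sketch, not the code that generated this report: the names `summarize` and `pack_years` are hypothetical, and it assumes right-closed bins (lo < x <= hi) and percentages computed over all participants, missing values included, which is what the tables suggest.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years as defined in the report: packs per day times years smoked."""
    return packs_per_day * years_smoked


def summarize(
    values: Sequence[Optional[float]], edges: Sequence[float]
) -> List[Tuple[str, int, float]]:
    """Return (label, count, percent) rows for right-closed bins plus 'Missing'.

    `edges` must be sorted ascending. Percentages use the total number of
    values (missing included) as the denominator, rounded to one decimal.
    """
    total = len(values)
    if total == 0:
        raise ValueError("values must not be empty")

    # Build labels in the same style as the report's tables.
    labels = [f"<= {edges[0]}"]
    labels += [f"{lo} < ... <= {hi}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f"> {edges[-1]}")

    counts = [0] * len(labels)
    missing = 0
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing += 1
            continue
        # bisect_left returns the index of the first edge >= v,
        # which is exactly the right-closed bin that contains v.
        counts[bisect.bisect_left(edges, v)] += 1

    rows = [(lab, n, round(100 * n / total, 1)) for lab, n in zip(labels, counts)]
    rows.append(("Missing", missing, round(100 * missing / total, 1)))
    return rows


if __name__ == "__main__":
    # Toy ages for illustration only; these are not PLCO or NLST records.
    toy_ages = [50, 55, 60, 61, 70, 71, None, 58]
    for label, count, percent in summarize(toy_ages, [50, 60, 70]):
        print(f"{label:<16}{count:>6}{percent:>8.1f}")
```

Calling `summarize` once per dataset with the same edges, then placing the two results side by side, would give the PLCO and NLST columns of one table.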