Bulk RNA-seq profiles from TCGA-LUAD (i.e., NSCLC adenocarcinoma) and TCGA-LUSC (i.e., NSCLC squamous cell carcinoma) were downloaded from the GDC Data Portal, normalized to transcripts per million (TPM), and log-transformed.
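The normalization step can be illustrated with a minimal sketch. It assumes raw read counts and gene lengths as inputs and a log2(TPM + 1) transform. The pseudocount, the logarithm base, and the function names are assumptions made for illustration and are not specified in the text.

```python
import numpy as np

def counts_to_tpm(counts, gene_lengths_bp):
    """Convert a genes x samples count matrix to TPM."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1_000.0
    # Reads per kilobase for every gene in every sample.
    rate = counts / lengths_kb[:, None]
    # Rescale each sample (column) so that its values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

def log_transform(tpm, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount of 1 is an assumption."""
    return np.log2(np.asarray(tpm, dtype=float) + pseudocount)

# Toy data: 3 genes (rows) x 2 samples (columns).
counts = np.array([[100, 0], [200, 50], [700, 50]])
lengths = np.array([1000, 2000, 7000])

tpm = counts_to_tpm(counts, lengths)
log_tpm = log_transform(tpm)
```

With a pseudocount of 1, genes with zero reads map to a log value of 0, which keeps unexpressed genes finite.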
MCP-counter signatures (1), which score the abundance of 10 cell populations, were computed for each sample. Additionally, the log expression values of 22 oncogenes associated with lung cancer were extracted as features (KRAS, NRAS, EGFR, MET, BRAF, ROS1, ALK, ERBB2, ERBB4, FGFR1, FGFR2, FGFR3, NTRK1, NTRK2, NTRK3, LTK, RET, RIT1, MAP2K1, DDR2, ALK, and CD274).
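As a rough sketch of how such a feature matrix could be assembled, the snippet below averages log expression over marker genes for each signature and selects the oncogene rows from a genes x samples table. The marker set shown is a toy placeholder rather than the published MCP-counter signature, and the helper names are hypothetical. In practice the scores would come from the MCP-counter implementation itself.

```python
import pandas as pd

# Oncogenes as listed in the text. ALK appears twice there, so duplicates
# are dropped here to give one feature per gene (21 unique symbols).
ONCOGENES = list(dict.fromkeys([
    "KRAS", "NRAS", "EGFR", "MET", "BRAF", "ROS1", "ALK", "ERBB2", "ERBB4",
    "FGFR1", "FGFR2", "FGFR3", "NTRK1", "NTRK2", "NTRK3", "LTK", "RET",
    "RIT1", "MAP2K1", "DDR2", "ALK", "CD274",
]))

def signature_scores(log_expr, signatures):
    """Mean log expression of each signature's marker genes, per sample."""
    scores = {}
    for name, markers in signatures.items():
        present = [g for g in markers if g in log_expr.index]
        scores[name] = log_expr.loc[present].mean(axis=0)
    return pd.DataFrame(scores)  # samples x signatures

def oncogene_features(log_expr, genes=ONCOGENES):
    """Samples x genes table; genes absent from the matrix become NaN."""
    return log_expr.reindex(genes).T

# Toy genes x samples matrix of log-transformed expression values.
log_expr = pd.DataFrame(
    {"s1": [1.0, 3.0, 2.0, 4.0], "s2": [2.0, 4.0, 4.0, 6.0]},
    index=["KRAS", "EGFR", "CD3D", "CD8A"],
)
toy_signatures = {"T cells (toy)": ["CD3D", "CD8A"]}  # placeholder markers

scores = signature_scores(log_expr, toy_signatures)
onco = oncogene_features(log_expr)
features = scores.join(onco)
```

Using `reindex` rather than `loc` keeps one column per requested gene even when a symbol is missing from the expression matrix, so the feature layout stays fixed across cohorts.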
Clinical data, including overall survival, were taken from Liu et al. (2). Categorical features were binary encoded:
stageIV_vs_stageIII (stage): 0: stage III, 1: stage IV
Only patients with stage III or IV were considered in this analysis.
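The cohort restriction and stage encoding described above can be sketched as follows. The column names (`patient`, `stage`, `OS`, `OS_time`) and the "Stage IIIA"-style labels are assumptions for illustration. Only the encoded feature name `stageIV_vs_stageIII` comes from the text.

```python
import pandas as pd

def encode_stage(clinical, stage_col="stage"):
    """Keep stage III/IV patients and add the binary stage feature."""
    # Normalize labels such as "Stage IIIA" to "IIIA".
    stage = (
        clinical[stage_col]
        .astype(str)
        .str.upper()
        .str.replace("STAGE", "", regex=False)
        .str.strip()
    )
    is_iii = stage.str.startswith("III")
    is_iv = stage.str.startswith("IV")
    keep = is_iii | is_iv
    kept = clinical.loc[keep].copy()
    # 0: stage III, 1: stage IV
    kept["stageIV_vs_stageIII"] = is_iv[keep].astype(int)
    return kept

# Toy clinical table with hypothetical column names.
clinical = pd.DataFrame({
    "patient": ["p1", "p2", "p3", "p4"],
    "stage": ["Stage IIIA", "Stage IV", "Stage IB", "Stage II"],
    "OS": [1, 0, 1, 0],
    "OS_time": [300, 500, 1200, 900],
})

encoded = encode_stage(clinical)
```

Substages (for example IIIA and IIIB) are collapsed into their parent stage here, which is an assumption about how the binary feature was derived.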