About Dataset

The TCGA Brain Glioma Grading Dataset supports the classification of gliomas, the most prevalent primary tumors of the brain, into Lower-Grade Gliomas (LGG) and Glioblastoma Multiforme (GBM). Gliomas are graded on histological and imaging criteria, and clinical and molecular factors also play a crucial role in the grading process. The dataset is derived from The Cancer Genome Atlas (TCGA) Project, funded by the National Cancer Institute (NCI).

Each instance in this dataset is a patient record from the TCGA-LGG or TCGA-GBM brain glioma project. Each record is described by 20 molecular features, corresponding to the most frequently mutated genes, and 3 clinical features related to patient demographics. Each molecular feature is recorded per TCGA Case_ID as either mutated or not_mutated (wildtype).
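
As a rough illustration of the record layout described above, the sketch below maps the two mutation states to 1 and 0 so the molecular features can be used as numeric inputs. The gene column names (IDH1, TP53), the Case_ID value, and the helper names are assumptions chosen for illustration; this card does not list the actual column names or the exact spelling of the status values in the file.

```python
# Map the two mutation states to integers. Matching is case-insensitive
# because the exact spelling used in the file is an assumption here.
MUTATION_CODES = {"MUTATED": 1, "NOT_MUTATED": 0}


def encode_mutation(status: str) -> int:
    """Return 1 for a mutated gene and 0 for a wildtype (not_mutated) gene."""
    key = status.strip().upper()
    if key not in MUTATION_CODES:
        raise ValueError(f"unexpected mutation status: {status!r}")
    return MUTATION_CODES[key]


def encode_record(record: dict, gene_columns: list) -> dict:
    """Encode only the molecular columns of one patient record."""
    return {gene: encode_mutation(record[gene]) for gene in gene_columns}


# Hypothetical record: the ID and the gene columns are placeholders.
example = {
    "Case_ID": "hypothetical-case-001",
    "IDH1": "MUTATED",
    "TP53": "not_mutated",
}
encoded = encode_record(example, ["IDH1", "TP53"])
print(encoded)
```

Raising an error on an unknown status, rather than silently defaulting to 0, keeps a mislabeled or missing value from being counted as wildtype.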

The primary goal of this dataset is to support the development of predictive models that determine whether a patient has LGG or GBM from clinical and molecular features. A further goal is to identify the optimal subset of mutation genes and clinical features, so that grading accuracy improves while the cost of molecular testing is reduced.
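
One simple way to approach the feature-subset goal is a filter heuristic: score each gene by how differently it is mutated in LGG and GBM records, then keep the top-scoring genes. The sketch below is only an illustration of that idea, not a method prescribed by this dataset. The column name Grade, the gene names GENE_A to GENE_C, and the four rows are made-up toy values, and the genes are assumed to be already encoded as 1 (mutated) or 0 (not_mutated).

```python
def mutation_rate_gap(rows, gene, label_key="Grade"):
    """Absolute difference in mutation frequency between LGG and GBM rows."""
    rates = {}
    for grade in ("LGG", "GBM"):
        group = [row[gene] for row in rows if row[label_key] == grade]
        if not group:
            raise ValueError(f"no rows with grade {grade}")
        rates[grade] = sum(group) / len(group)
    return abs(rates["LGG"] - rates["GBM"])


def top_k_genes(rows, genes, k):
    """Return the k genes whose mutation rates differ most between grades."""
    ranked = sorted(genes, key=lambda g: mutation_rate_gap(rows, g), reverse=True)
    return ranked[:k]


# Toy rows invented for this example; they are not taken from the dataset.
toy_rows = [
    {"Grade": "LGG", "GENE_A": 1, "GENE_B": 0, "GENE_C": 1},
    {"Grade": "LGG", "GENE_A": 1, "GENE_B": 1, "GENE_C": 0},
    {"Grade": "GBM", "GENE_A": 0, "GENE_B": 1, "GENE_C": 1},
    {"Grade": "GBM", "GENE_A": 0, "GENE_B": 0, "GENE_C": 0},
]
selected = top_k_genes(toy_rows, ["GENE_A", "GENE_B", "GENE_C"], k=1)
print(selected)
```

A score like this looks at one gene at a time, so it ignores interactions between genes. In practice a reduced subset would still need to be checked with a classifier on held-out records before concluding that fewer molecular tests are sufficient.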

Funding Source:

The creation of this dataset was funded by The Cancer Genome Atlas (TCGA) Project, supported by the National Cancer Institute (NCI).