a/README.md b/README.md
---
# Rad Genome Chest CT
title: "RadGenome Chest CT Dataset"
license: cc-by-nc-sa-4.0
extra_gated_prompt: |
  ## Terms and Conditions for Using the RadGenome Chest CT

  **1. Acceptance of Terms**
  Accessing and using the RadGenome Chest CT dataset implies your agreement to these terms and conditions copied from CT-RATE. If you disagree with any part, please refrain from using the dataset.

  **2. Permitted Use**
  - The dataset is intended solely for academic, research, and educational purposes.
  - Any commercial exploitation of the dataset without prior permission is strictly forbidden.
  - You must adhere to all relevant laws, regulations, and research ethics, including data privacy and protection standards.

  **3. Data Protection and Privacy**
  - Acknowledge the presence of sensitive information within the dataset and commit to maintaining data confidentiality.
  - Direct attempts to re-identify individuals from the dataset are prohibited.
  - Ensure compliance with data protection laws such as GDPR and HIPAA.

  **4. Attribution**
  - Cite the dataset and acknowledge the providers in any publications resulting from its use.
  - Claims of ownership or exclusive rights over the dataset or derivatives are not permitted.

  **5. Redistribution**
  - Redistribution of the dataset or any portion thereof is not allowed.
  - Sharing derived data must respect the privacy and confidentiality terms set forth.

  **6. Disclaimer**
  The dataset is provided "as is" without warranty of any kind, either expressed or implied, including but not limited to the accuracy or completeness of the data.

  **7. Limitation of Liability**
  Under no circumstances will the dataset providers be liable for any claims or damages resulting from your use of the dataset.

  **8. Access Revocation**
  Violation of these terms may result in the termination of your access to the dataset.

  **9. Amendments**
  The terms and conditions may be updated at any time; continued use of the dataset signifies acceptance of the new terms.

  **10. Governing Law**
  These terms are governed by the laws of the location of the dataset providers, excluding conflict of law rules.

  **Consent:**

extra_gated_fields:
  Name: "text"
  Institution: "text"
  Email: "text"
  I have read and agree with Terms and Conditions for using the RadGenome Chest CT and CT-RATE dataset: "checkbox"

configs:
- config_name: grounded reports
  data_files:
  - split: train
    path: "dataset/radgenome_files/train_region_report.csv"
  - split: validation
    path: "dataset/radgenome_files/validation_region_report.csv"
- config_name: grounded vqa
  data_files:
  - split: train
    path: ["dataset/radgenome_files/train_vqa_abnormality.csv",
           "dataset/radgenome_files/train_vqa_location.csv",
           "dataset/radgenome_files/train_vqa_presence.csv",
           "dataset/radgenome_files/train_vqa_size.csv"]
  - split: validation
    path: ["dataset/radgenome_files/validation_vqa_abnormality.csv",
           "dataset/radgenome_files/validation_vqa_location.csv",
           "dataset/radgenome_files/validation_vqa_presence.csv",
           "dataset/radgenome_files/validation_vqa_size.csv"]
- config_name: case-level vqa
  data_files:
  - split: train
    path: "dataset/radgenome_files/train_case_disorders.csv"
  - split: validation
    path: "dataset/radgenome_files/validation_case_disorders.csv"
---

## [RadGenome Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis](https://arxiv.org/pdf/2404.16754)

Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which underscores the need for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities.

We introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE). Specifically, we leverage recent universal segmentation models and large language models to extend the original dataset (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for reasoning during interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality.

We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling them to be trained to generate text conditioned on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
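
As a rough illustration of how the released CSV files are organized, the sketch below maps two of the config names declared in this README's front matter (`grounded reports` and `grounded vqa`) to their relative CSV paths and reads them with Python's standard `csv` module. The helper names `csv_paths` and `load_rows`, and the idea of a local `root` directory holding a copy of the repository files, are assumptions made for this example and are not part of the dataset. No column names are assumed; they are taken from each CSV header.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Relative locations copied from the `configs` block of the README front matter.
PREFIX = "dataset/radgenome_files"
VQA_KINDS = ("abnormality", "location", "presence", "size")


def csv_paths(config: str, split: str) -> List[str]:
    """Return the relative CSV paths for a config name and split (hypothetical helper)."""
    if split not in ("train", "validation"):
        raise ValueError(f"unknown split: {split!r}")
    if config == "grounded reports":
        return [f"{PREFIX}/{split}_region_report.csv"]
    if config == "grounded vqa":
        # The grounded VQA config combines four CSV files per split.
        return [f"{PREFIX}/{split}_vqa_{kind}.csv" for kind in VQA_KINDS]
    raise ValueError(f"config not covered by this sketch: {config!r}")


def load_rows(root: str, config: str, split: str) -> List[Dict[str, str]]:
    """Read and concatenate all CSV rows for one config and split.

    `root` is assumed to be a local directory containing a copy of the
    repository files; each row is a dict keyed by the CSV header.
    """
    rows: List[Dict[str, str]] = []
    for rel in csv_paths(config, split):
        with open(Path(root) / rel, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

For instance, `load_rows(root, "grounded vqa", "train")` would return the rows of the four training VQA files as one list, where `root` is wherever the files were downloaded after accepting the terms above.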

## Citing Us
If you use RadGenome Chest CT, please cite [CT-CLIP](https://arxiv.org/abs/2403.17834) and [our paper](https://arxiv.org/pdf/2404.16754).