AlyssaS/AttentionMOI

Diff of /README.md [4e24f4] .. [f00db4]


...

The input file format is described below, or you can refer to the reference data we provide (https://github.com/BioAI-kits/AttentionMOI/tree/master/AttentionMOI/example).

f | omic_file

REQUIRED: File path(s) for the omics files (each file should be a matrix)

**NOTE: The file must be in CSV format, such as rna.csv. It can also be gzip-compressed, such as rna.csv.gz.** Example: the first line is the header, containing patient_id followed by the gene (feature) names.

patient_id,A1BG,A1CF,A2BP1,A2LD1,....

TCGA.KL.8323,3.3491,0.0,0.0,5.8939,....

TCGA.KL.8324,2.922,0.5557,0.5557,6.4226,....
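The format above can be checked before running the tool. The following is a minimal sketch, not AttentionMOI's own loader: `open_text` and `read_omic_matrix` are hypothetical helper names, and the strict check that the first column is named `patient_id` is an assumption based on the example header.

```python
import csv
import gzip


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_omic_matrix(path):
    """Return (feature_names, {patient_id: [values]}) from an omics CSV.

    Hypothetical helper: assumes the header starts with 'patient_id'
    and that every other column holds numeric values.
    """
    with open_text(path) as handle:
        reader = csv.reader(handle)
        header = next(reader)
        if header[0] != "patient_id":
            raise ValueError(f"first column must be 'patient_id', got {header[0]!r}")
        features = header[1:]
        matrix = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) != len(header):
                raise ValueError(
                    f"row for {row[0]!r} has {len(row)} fields, expected {len(header)}"
                )
            matrix[row[0]] = [float(value) for value in row[1:]]
    return features, matrix
```

Calling `read_omic_matrix("rna.csv.gz")` returns the feature names and one list of floats per patient, or raises `ValueError` if a row does not match the header.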

n | omic_name

#### REQUIRED: Omic names for the omics files, given in the same order as the omics files

l | label_file

#### REQUIRED: File path for label file

**NOTE: The file must be in CSV format, such as label.csv. It can also be gzip-compressed, such as label.csv.gz.** Example: the first line is the header; patient_id and label hold the sample name and the sample classification label, respectively.

patient_id,label

TCGA.KL.8328,0

TCGA.KL.8339,0

TCGA.KM.8439,1

TCGA.KM.8441,1

TCGA.KM.8442,1
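A label file in this format can be parsed and compared against the sample names of an omics matrix. This is a sketch under stated assumptions: `read_labels` and `shared_patients` are hypothetical names, labels are assumed to be integers as in the example, and whether AttentionMOI itself intersects samples this way is not stated in this README.

```python
import csv
import gzip


def read_labels(path):
    """Return {patient_id: int_label} from a label CSV (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    labels = {}
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = [name.strip() for name in next(reader)]
        if header != ["patient_id", "label"]:
            raise ValueError(f"unexpected header: {header}")
        for row in reader:
            if not row:
                continue  # skip blank lines
            labels[row[0].strip()] = int(row[1])
    return labels


def shared_patients(labels, omic_patient_ids):
    """Sorted patient IDs present in both the label file and an omics matrix."""
    return sorted(set(labels) & set(omic_patient_ids))
```

Samples that appear in only one of the two files are dropped by `shared_patients`, which makes a mismatch between the files easy to spot before training.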


**2. Output**

o | outdir

OPTIONAL: Output directory path, default=./output


**3. Feature selection**

method

OPTIONAL: Feature selection method, chosen from ANOVA, RFE, LASSO, PCA; by default, no feature selection is performed

percentile

OPTIONAL: Percentage of features to keep for ANOVA (integer from 1 to 100), only used with ANOVA, default=30

num_pc

OPTIONAL: Number of principal components (PCs) to keep for PCA (integer), only used with PCA, default=50

FSD

OPTIONAL: Whether to use FSD to mitigate noise in the omics data. FSD is off by default; pass --FSD to enable it

i | iteration

OPTIONAL: The number of FSD iterations (integer), default=10

s | seed

OPTIONAL: Random seed for FSD (integer), default=0

threshold

OPTIONAL: FSD threshold for selecting features (float), default=0.8 (keep features that are selected in 80 percent of the FSD iterations)
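The threshold rule can be illustrated on its own, independent of how FSD picks features in each iteration. The sketch below is not the FSD implementation: `select_stable_features` is a hypothetical name, the per-iteration feature lists are made-up inputs, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from collections import Counter


def select_stable_features(selected_per_iteration, threshold=0.8):
    """Keep features chosen in at least `threshold` of the iterations."""
    n_iterations = len(selected_per_iteration)
    if n_iterations == 0:
        return []
    counts = Counter()
    for selected in selected_per_iteration:
        counts.update(set(selected))  # count each feature once per iteration
    return sorted(
        name for name, count in counts.items() if count / n_iterations >= threshold
    )


# Hypothetical outcome of 10 iterations:
# A1BG is selected 10 times, A1CF 8 times, A2BP1 7 times.
runs = (
    [["A1BG", "A1CF", "A2BP1"]] * 7
    + [["A1BG", "A1CF"]]
    + [["A1BG"]] * 2
)
stable = select_stable_features(runs, threshold=0.8)
```

With the default of 10 iterations and a threshold of 0.8, a feature has to be selected in at least 8 iterations to be kept, so `A2BP1` is dropped here.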


**4. Building Model**

m | model

OPTIONAL: Model names, choosing from DNN, Net (Net for AttentionMOI), RF, XGboost, svm, mogonet, moanna, default=DNN.

t | test_size

OPTIONAL: Proportion of the data held out for testing when splitting into train and test sets (float), default=0.3 (30 percent of the data for testing)

b | batch

OPTIONAL: Mini-batch size for model training (integer), default=32

e | epoch

OPTIONAL: Number of epochs for model training (integer), default=300

r | lr

OPTIONAL: Learning rate for model training (float), default=0.0001

w | weight_decay

OPTIONAL: weight_decay parameter for model training (float), default=0.0001
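The options above can be summarised as a command-line parser. This is a sketch only, not AttentionMOI's actual parser: it assumes that the `f | omic_file` notation means a short flag `-f` and a long flag `--omic_file`, and that several omics files are passed after a single `-f`. Only `--FSD` is spelled out explicitly in this README. The program name and the file names in the usage lines are hypothetical.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options and defaults (flag spellings assumed)."""
    parser = argparse.ArgumentParser(prog="attentionmoi-sketch")
    # 1. Input
    parser.add_argument("-f", "--omic_file", nargs="+", required=True)
    parser.add_argument("-n", "--omic_name", nargs="+", required=True)
    parser.add_argument("-l", "--label_file", required=True)
    # 2. Output
    parser.add_argument("-o", "--outdir", default="./output")
    # 3. Feature selection
    parser.add_argument("--method", choices=["ANOVA", "RFE", "LASSO", "PCA"], default=None)
    parser.add_argument("--percentile", type=int, default=30)
    parser.add_argument("--num_pc", type=int, default=50)
    parser.add_argument("--FSD", action="store_true")
    parser.add_argument("-i", "--iteration", type=int, default=10)
    parser.add_argument("-s", "--seed", type=int, default=0)
    parser.add_argument("--threshold", type=float, default=0.8)
    # 4. Building model
    parser.add_argument(
        "-m", "--model",
        choices=["DNN", "Net", "RF", "XGboost", "svm", "mogonet", "moanna"],
        default="DNN",
    )
    parser.add_argument("-t", "--test_size", type=float, default=0.3)
    parser.add_argument("-b", "--batch", type=int, default=32)
    parser.add_argument("-e", "--epoch", type=int, default=300)
    parser.add_argument("-r", "--lr", type=float, default=0.0001)
    parser.add_argument("-w", "--weight_decay", type=float, default=0.0001)
    return parser


def pair_omics(args):
    """Pair each omic name with its file, relying on the documented ordering."""
    if len(args.omic_file) != len(args.omic_name):
        raise ValueError("omic_file and omic_name must have the same number of entries")
    return list(zip(args.omic_name, args.omic_file))


# Hypothetical invocation with two omics files and FSD enabled.
example_args = build_parser().parse_args(
    ["-f", "rna.csv.gz", "met.csv.gz", "-n", "rna", "met", "-l", "label.csv", "--FSD"]
)
example_pairs = pair_omics(example_args)
```

Because omic names are matched to files purely by position, `pair_omics` rejects inputs where the two lists differ in length instead of silently mislabelling a file.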

---

### Example


	a/README.md		b/README.md
	...		...
57		57
58	The input file format is described below, or you can refer to the reference data we provide (https://github.com/BioAI-kits/AttentionMOI/tree/master/AttentionMOI/example).	58	The input file format is described below, or you can refer to the reference data we provide (https://github.com/BioAI-kits/AttentionMOI/tree/master/AttentionMOI/example).
59		59
60	f \| omic_file	60	f \| omic_file
61		61
62	> REQUIRED: File path for omics files (should be matrix)	62	REQUIRED: File path for omics files (should be matrix)
63		63
64	NOTE:The file must be in csv format, such as rna.csv. Of course, it can be compressed with gz, such as rna.csv.gz.. Example: The first line is the header, patient_id and gene (features) names.	64	NOTE:The file must be in csv format, such as rna.csv. Of course, it can be compressed with gz, such as rna.csv.gz.. Example: The first line is the header, patient_id and gene (features) names.
65		65
66	> patient_id,A1BG,A1CF,A2BP1,A2LD1,....	66	patient_id,A1BG,A1CF,A2BP1,A2LD1,....
67	>	67
68	> TCGA.KL.8323,3.3491,0.0,0.0,5.8939,....	68	TCGA.KL.8323,3.3491,0.0,0.0,5.8939,....
69	>	69
70	> TCGA.KL.8324,2.922,0.5557,0.5557,6.4226,....	70	TCGA.KL.8324,2.922,0.5557,0.5557,6.4226,....
71		71
72	n \| omic_name	72	n \| omic_name
73		73
74	> REQUIRED: Omic names for omics files, should be the same order as the omics file	74	#### REQUIRED: Omic names for omics files, should be the same order as the omics file
75		75
76	l \| label_file	76	l \| label_file
77		77
78	> REQUIRED: File path for label file	78	#### REQUIRED: File path for label file
79		79
80	NOTE:The file must be in csv format, such as label.csv. Of course, it can be compressed with gz, such as label.csv.gz.. Example: The first line is the header, patient_id and label represent the sample name and sample classification label respectively.	80	NOTE:The file must be in csv format, such as label.csv. Of course, it can be compressed with gz, such as label.csv.gz.. Example: The first line is the header, patient_id and label represent the sample name and sample classification label respectively.
81		81
82	> patient_id,label	82	patient_id,label
83	>	83
84	> TCGA.KL.8328,0	84	TCGA.KL.8328,0
85	>	85
86	> TCGA.KL.8339,0	86	TCGA.KL.8339,0
87	>	87
88	> TCGA.KM.8439,1	88	TCGA.KM.8439,1
89	>	89
90	> TCGA.KM.8441,1	90	TCGA.KM.8441,1
91	>	91
92	> TCGA.KM.8442,1	92	TCGA.KM.8442,1
93		93
94		94
95	2. Output	95	2. Output
96		96
97	o \| outdir	97	o \| outdir
98		98
99	> OPTIONAL: Setting output file path, default=./output	99	OPTIONAL: Setting output file path, default=./output
100		100
101		101
102	3. Feature selection	102	3. Feature selection
103		103
104	method	104	method
105		105
106	> OPTIONAL: Method of feature selection, choosing from ANOVA, RFE, LASSO, PCA, default is no feature selection	106	OPTIONAL: Method of feature selection, choosing from ANOVA, RFE, LASSO, PCA, default is no feature selection
107		107
108	percentile	108	percentile
109		109
110	> OPTIONAL: Percent of features to keep for ANOVA (integer between 1-100), only used when using ANOVA, default=30	110	OPTIONAL: Percent of features to keep for ANOVA (integer between 1-100), only used when using ANOVA, default=30
111		111
112	num_pc	112	num_pc
113		113
114	> OPTIONAL: Number of PCs to keep for PCA (integer), only used when using PCA, default=50	114	OPTIONAL: Number of PCs to keep for PCA (integer), only used when using PCA, default=50
115		115
116	FSD	116	FSD
117		117
118	> OPTIONAL: Whether to use FSD to mitigate noise of omics. Default is not using FSD, and set --FSD to use FSD	118	OPTIONAL: Whether to use FSD to mitigate noise of omics. Default is not using FSD, and set --FSD to use FSD
119		119
120	i \| iteration	120	i \| iteration
121		121
122	> OPTIONAL: The number of FSD iterations (integer), default=10	122	OPTIONAL: The number of FSD iterations (integer), default=10
123		123
124	s \| seed	124	s \| seed
125		125
126	> OPTIONAL: Random seed for FSD (integer), default=0	126	OPTIONAL: Random seed for FSD (integer), default=0
127		127
128	threshold	128	threshold
129		129
130	> OPTIONAL: FSD threshold to select features (float), default=0.8 (select features that are selected in 80 percent FSD iterations)	130	OPTIONAL: FSD threshold to select features (float), default=0.8 (select features that are selected in 80 percent FSD iterations)
131		131
132		132
133	4. Building Model	133	4. Building Model
134		134
135	m \| model	135	m \| model
136		136
137	> OPTIONAL: Model names, choosing from DNN, Net (Net for AttentionMOI), RF, XGboost, svm, mogonet, moanna, default=DNN.	137	OPTIONAL: Model names, choosing from DNN, Net (Net for AttentionMOI), RF, XGboost, svm, mogonet, moanna, default=DNN.
138		138
139	t \| test_size	139	t \| test_size
140		140
141	> OPTIONAL: Testing dataset proportion when split train test dataset (float), default=0.3 (30 percent data for testing)	141	OPTIONAL: Testing dataset proportion when split train test dataset (float), default=0.3 (30 percent data for testing)
142		142
143	b \| batch	143	b \| batch
144		144
145	> OPTIONAL: Mini-batch number for model training (integer), default=32	145	OPTIONAL: Mini-batch number for model training (integer), default=32
146		146
147	e \| epoch	147	e \| epoch
148		148
149	> OPTIONAL: Epoch number for model training (integer), default=300	149	OPTIONAL: Epoch number for model training (integer), default=300
150		150
151	r \| lr	151	r \| lr
152		152
153	> OPTIONAL: Learning rate for model training(float), default=0.0001	153	OPTIONAL: Learning rate for model training(float), default=0.0001
154		154
155	w \| weight_decay	155	w \| weight_decay
156		156
157	> OPTIONAL: weight_decay parameter for model training (float), default=0.0001	157	OPTIONAL: weight_decay parameter for model training (float), default=0.0001
158		158
159	---	159	---
160		160
161	### Example	161	### Example
162		162