--- a
+++ b/data/README.md
@@ -0,0 +1,3 @@
+Download the raw data set (~6.4 GB) from https://qiita.ucsd.edu/study/description/10423
+
+The preprocessed data set in .tsv format (~95 KB) is available at https://drive.google.com/file/d/19GYJK87j3SS9AWUnQoqxkcDjyH7LOtvd/view