Qiita is an open source software package, and we very much welcome community contributions. You can find the source code and test code for Qiita under public revision control in the Qiita git repository on GitHub.
This document covers what you should do to get started with contributing to Qiita. You should read this whole document before considering submitting code to Qiita. This will save time for both you and the Qiita developers.
Source code can be added to Qiita in three different modules:
* `qiita_pet`: Contains the graphical user interface layer of the system, mainly written in Python, JavaScript and HTML (see Tornado templates).
* `qiita_db`: Contains the bridge layer between the Python objects and the SQL database. This subpackage is mainly written in Python with a fair amount of inline PostgreSQL statements (see the section below on how to make database changes).
* `qiita_ware`: Contains the logic of the system and functions that can generally be called from a Python script (see the scripts directory); it is mostly written in Python. Several workflows that can be achieved using the GUI can also be reproduced through the command line using this subpackage.

Regardless of the module where you are adding new functionality, you should always consider how these new features affect users and whether adding a new section or document to the documentation (found under the `doc` folder) would be useful.
The Qiita repository contains three branches:

* `master`: This branch reflects the code deployed in our main Qiita server.
* `dev`: This branch is the active development branch. All new Pull Requests should be made against this branch.
* `release-candidate`: This branch is used to freeze the code from the `dev` branch, so we can deploy it on our test servers and exercise the code extensively before deploying it on our main system. Code freezes typically occur one week before the scheduled deployment. Check our milestones page to see the scheduled deployments.

Since Qiita is a package that is continuously growing, we found that development rules needed to be established to reduce both development and review time. These rules are:
`%timeit` magic (or similar).
`# See issue #XXX`). This will help other developers identify the source of the issue, and it will likely be solved faster.

The Qiita configuration file determines how the package interacts with your system’s resources (redis and postgres). You should therefore review the documentation detailed here, and especially bear in mind the following points:
* `qiita_core/support_files/qiita_config.txt`: if you don’t set a `QIITA_CONFIG_FP` environment variable, that’s the file that Qiita will use.
* The `[main]` section sets a `TEST_ENVIRONMENT` variable, which determines whether your system will be running unit tests or whether it is a demo/production system. You will want to set the value to TRUE if you are running the unit tests.

A note on data accumulation: Qiita keeps data in the `BASE_DATA_DIR` as the system gets used. When you drop a Qiita environment and create a fresh testing environment, the “old” data generated by the previous environment should be manually deleted (or, at least, removed from the data directories in the `BASE_DATA_DIR`).
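The lookup described above can be sketched in a few lines of Python. This is only an illustration, not Qiita's own configuration code: the helper names `resolve_config_fp` and `is_test_environment` are hypothetical, and the sketch assumes the config file is INI-style so that the standard `configparser` module can read its `[main]` section.

```python
import configparser
import os

# Default path quoted in the text above; used when QIITA_CONFIG_FP is unset.
DEFAULT_CONFIG_FP = "qiita_core/support_files/qiita_config.txt"


def resolve_config_fp(environ=None):
    """Return QIITA_CONFIG_FP if it is set, otherwise the default path."""
    environ = os.environ if environ is None else environ
    return environ.get("QIITA_CONFIG_FP", DEFAULT_CONFIG_FP)


def is_test_environment(config_text):
    """Read TEST_ENVIRONMENT from the [main] section of INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("main", "TEST_ENVIRONMENT")


# Hypothetical config fragment; a real Qiita config has many more settings.
EXAMPLE_CONFIG = """
[main]
TEST_ENVIRONMENT = TRUE
"""

print(resolve_config_fp({}))
print(resolve_config_fp({"QIITA_CONFIG_FP": "/tmp/my_config.cfg"}))
print(is_test_environment(EXAMPLE_CONFIG))
```

Passing the environment as a dictionary keeps the helper easy to exercise without touching the real process environment.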
Unit tests in Qiita are located inside the tests/test folder of every sub-module, for example `qiita_db/test/test_metadata_template.py`. These can be executed on a per-file basis or using `nosetests` from the base directory.
When writing tests, make sure the test class is decorated with `@qiita_test_checker()` if the tests modify the database. This will automatically drop and rebuild the qiita schema after the entire test class has been executed. Because the rebuild only happens once per class, all the tests in a single class must be independent of each other, so that stochastic failures do not occur due to a different test execution order.
Coverage testing is in effect, so run tests using `nosetests --with-coverage [test_file.py]` to check which lines of new code in your pull request are not tested.
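The independence requirement can be illustrated without Qiita itself. The sketch below uses only the standard `unittest` module; `FakeUserTable` is a hypothetical in-memory stand-in for database state, and the real `@qiita_test_checker()` decorator is deliberately not reproduced. Each test builds its own state in `setUp`, so the class passes in either execution order.

```python
import unittest


class FakeUserTable:
    """Hypothetical in-memory stand-in for a database table."""

    def __init__(self):
        self.emails = {"admin@foo.bar"}


class UserTableTests(unittest.TestCase):
    def setUp(self):
        # Build fresh state for every test so no test depends on another.
        self.table = FakeUserTable()

    def test_add(self):
        self.table.emails.add("new@foo.bar")
        self.assertEqual(len(self.table.emails), 2)

    def test_remove(self):
        self.table.emails.discard("admin@foo.bar")
        self.assertEqual(len(self.table.emails), 0)


def passes_in_order(test_names):
    """Run the named tests in the given order and report overall success."""
    suite = unittest.TestSuite(UserTableTests(name) for name in test_names)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


print(passes_in_order(["test_add", "test_remove"]))
print(passes_in_order(["test_remove", "test_add"]))
```

If `test_remove` instead relied on the row added by `test_add`, the second ordering would fail, which is the kind of order-dependent failure the guideline is meant to prevent.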
The documentation for Qiita is maintained as part of this repository, under the `qiita_pet/support_files/doc` folder. For more information, see the `README.md` file in that folder (`qiita_pet/support_files/doc/README.md`).
Scripts in Qiita are located inside the scripts directory. Their actions rely on the settings described in the Qiita config file; for example, if you are dropping a database, the database that will be dropped is the one described by the `DATABASE` setting. The following is a list of the most commonly used commands during development:

* `qiita-env make` will create a new environment (as specified by the Qiita config file).
* `qiita-env drop` will delete the environment (as specified by the Qiita config file).
* `qiita pet webserver start` will start the Qiita web application running on port 21174; you can change this using the `--port` flag, for example `--port=7532`.

After the initial production release of Qiita, the database can no longer be dropped and recreated using the most recent schema, because all the data would be lost! Changes to the database schema must therefore be applied as patches.
Patches are stored in the `qiita_db/support_files/patches` directory. Note that the patches will be applied in order based on the natural sort order of their filenames (e.g., `2.sql` will be applied before `10.sql`, and `10.sql` will be applied before `a.sql`).

In May 2024 we decided to:

* Merge all patches into the main database schema; this means that there are no patch files before `92.sql`.
* Add a new folder, `patches/test_db_sql/`, where we can store SQL files that will only be applied to the test environment.
* Add a test to the GitHub Actions to check that the production database has the expected number of rows.

Note that these changes mean:

1. `92.sql` is currently the first SQL file used to patch the database.
2. If you need to make changes (like INSERTs) only to the test database, you need to add a patch to `patches/test_db_sql/`.
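The natural sort order of patch filenames mentioned above can be sketched as follows. This is a minimal illustration with a hypothetical `natural_key` helper, not necessarily how Qiita itself orders patches: runs of digits are compared as integers, and numeric chunks sort before text chunks.

```python
import re


def natural_key(filename):
    """Split a filename into numeric and text chunks for natural sorting."""
    chunks = re.split(r"(\d+)", filename)
    # Tag each chunk so integers and strings are never compared directly:
    # numeric chunks (tag 0) sort before text chunks (tag 1).
    return [(0, int(c)) if c.isdigit() else (1, c.lower())
            for c in chunks if c]


patches = ["a.sql", "10.sql", "2.sql"]
print(sorted(patches))                   # plain string sort
print(sorted(patches, key=natural_key))  # natural sort
```

A plain string sort would place `10.sql` before `2.sql`, which is why the numeric chunks are converted to integers before comparison.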
`qiita-db.dbs`) in DBSchema
`foo.dbs`)
`foo.dbs` with `qiita-db.dbs`
`1.sql`)

One drawback is that developers will need to have DBSchema to develop for this project.
If you need to submit a patch that changes only data but does not alter the schema, you should still create a patch file with the next name (e.g., `2.sql`) containing your changes. Note that a patch should not be created if the modifications do not apply to Qiita databases in general; data patches are only necessary in some cases, e.g., if the terms in an ontology change.
Occasionally, SQL alone cannot effect the desired changes, and a corresponding Python script must be run after the SQL patch is applied. In that case, a Python file should be created in the `patches/python_patches` directory, with the same basename as the SQL file. For example, if there is a patch `4.sql` in the `patches` directory, and this patch requires a Python script to be run after the SQL is applied, then the Python file should be placed at `patches/python_patches/4.py`. Note that not every SQL patch will have a corresponding Python patch, but every Python patch will have a corresponding SQL patch.
If in the future we discover a use case where a Python patch must be applied for which there is no corresponding SQL patch, then a blank SQL patch file will still need to be created.
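The basename convention above can be expressed as a small lookup. The helper `python_patch_for` is hypothetical and only mirrors the directory layout described in the text; it is not part of Qiita's patching code.

```python
import tempfile
from pathlib import Path


def python_patch_for(sql_patch_fp):
    """Return the matching Python patch path, or None if there is none.

    Assumes the layout described above: <patches>/N.sql pairs with
    <patches>/python_patches/N.py (same basename).
    """
    sql_patch_fp = Path(sql_patch_fp)
    py_dir = sql_patch_fp.parent / "python_patches"
    candidate = py_dir / (sql_patch_fp.stem + ".py")
    return candidate if candidate.is_file() else None


# Build a throwaway patches directory to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    patches = Path(tmp) / "patches"
    (patches / "python_patches").mkdir(parents=True)
    (patches / "3.sql").write_text("-- SQL-only patch\n")
    (patches / "4.sql").write_text("-- SQL patch with a Python follow-up\n")
    (patches / "python_patches" / "4.py").write_text("# runs after 4.sql\n")

    has_py = python_patch_for(patches / "4.sql")
    no_py = python_patch_for(patches / "3.sql")
    print(has_py.name if has_py else None)
    print(no_py)
```

Because the lookup starts from the SQL file, a Python patch without a corresponding SQL file would never be found, which is consistent with the requirement to create a blank SQL patch in that case.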
Since the `qiita_db` code contains a mixture of Python code and SQL code, here are some coding guidelines for adding SQL code to Qiita:

1. SQL keywords should be written in uppercase:
   * Wrong:
   ```python
   sql = "select * from qiita.qiita_user"
   ```
   * Correct:
   ```python
   sql = "SELECT * FROM qiita.qiita_user"
   ```
2. Multi-line SQL statements should use triple-quoted strings rather than concatenated string literals:
   * Wrong:
   ```python
   sql = ("SELECT processed_data_status FROM qiita.processed_data_status pds JOIN "
          "qiita.processed_data pd USING (processed_data_status_id) JOIN "
          "qiita.study_processed_data spd USING (processed_data_id) "
          "WHERE spd.study_id = %s")
   ```
   * Correct:
   ```python
   sql = """SELECT processed_data_status
            FROM qiita.processed_data_status pds
            JOIN qiita.processed_data pd USING (processed_data_status_id)
            JOIN qiita.study_processed_data spd USING (processed_data_id)
            WHERE spd.study_id = %s"""

   sql = "SELECT * FROM qiita.qiita_user"
   ```
3. Start a new line before `SELECT`, `FROM`, `WHERE` and similar clauses:
   * Wrong:
   ```python
   sql = """SELECT udt_name FROM information_schema.columns WHERE
            column_name = %s AND table_schema = 'qiita' AND (table_name = %s
            OR table_name = %s)"""
   ```
   * Correct:
   ```python
   sql = """SELECT udt_name
            FROM information_schema.columns
            WHERE column_name = %s AND table_schema = 'qiita'
                AND (table_name = %s OR table_name = %s)"""
   ```
4. Values must be passed to the SQL statement through the `sql_args` parameter from the transaction object. This is a strong recommendation from the psycopg2 developers to avoid SQL injection attacks (detailed explanation here):
   * Wrong:
   ```python
   sql = """SELECT processed_data_status
            FROM qiita.processed_data_status pds
            JOIN qiita.processed_data pd USING (processed_data_status_id)
            JOIN qiita.study_processed_data spd USING (processed_data_id)
            WHERE spd.study_id = %s""" % study.id
   with TRN:
       TRN.add(sql)
   ```
   * Correct:
   ```python
   sql = """SELECT processed_data_status
            FROM qiita.processed_data_status pds
            JOIN qiita.processed_data pd USING (processed_data_status_id)
            JOIN qiita.study_processed_data spd USING (processed_data_id)
            WHERE spd.study_id = %s"""
   with TRN:
       TRN.add(sql, [study.id])
   ```
5. If a table or column name must be passed as a parameter, use the `str.format` function. Table or column names as parameters are not supported by psycopg2. Using `str.format` is desirable because, if you also need to pass parameters to the SQL statement, the Python `%` string formatting will fail (see the wrong example below):
   * Wrong:
   ```python
   # This will fail during execution with the following error:
   # TypeError: not enough arguments for format string
   table = "qiita_user"
   sql = "SELECT * FROM qiita.%s WHERE email = %s" % table
   ```
   * Correct:
   ```python
   table = "qiita_user"
   sql = "SELECT * FROM qiita.{0}".format(table)
   ```
6. The SQL command should be stored in a variable that is then passed as a parameter to the `TRN.add` method, rather than defining the SQL statement in the method call itself, unless the statement is short and fits in a single line:
   * Wrong:
   ```python
   TRN.add("""SELECT processed_data_status
              FROM qiita.processed_data_status pds
              JOIN qiita.processed_data pd USING (processed_data_status_id)
              JOIN qiita.study_processed_data spd USING (processed_data_id)
              WHERE spd.study_id = %s""", [study.id])
   ```
   * Correct:
   ```python
   sql = """SELECT processed_data_status
            FROM qiita.processed_data_status pds
            JOIN qiita.processed_data pd USING (processed_data_status_id)
            JOIN qiita.study_processed_data spd USING (processed_data_id)
            WHERE spd.study_id = %s"""
   TRN.add(sql, [study.id])

   TRN.add("SELECT * FROM qiita.qiita_user WHERE email=%s", [user.id])
   ```
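To see why values should be passed separately from the SQL string, the following self-contained sketch may help. It is an illustration under stated assumptions, not Qiita code: it uses the standard library `sqlite3` module as a stand-in for psycopg2 and the `TRN` object, so the placeholder is `?` rather than `%s`, and the two-column `qiita_user` table is hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical, simplified qiita_user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qiita_user (email TEXT, level TEXT)")
conn.executemany(
    "INSERT INTO qiita_user VALUES (?, ?)",
    [("admin@foo.bar", "admin"), ("demo@foo.bar", "user")],
)

# Input crafted to change the meaning of the WHERE clause.
malicious = "nobody' OR '1'='1"

# Wrong: the value is interpolated into the SQL text itself.
unsafe_sql = "SELECT email FROM qiita_user WHERE email = '%s'" % malicious
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Correct: the table name goes in with str.format, the value is passed
# separately so the driver treats it strictly as data.
table = "qiita_user"
safe_sql = "SELECT email FROM {0} WHERE email = ?".format(table)
safe_rows = conn.execute(safe_sql, [malicious]).fetchall()

print(len(unsafe_rows))  # every row matches the injected condition
print(len(safe_rows))    # no user has that literal email
conn.close()
```

The interpolated query returns every user because the injected `OR '1'='1'` is parsed as SQL, while the parameterized query compares the whole string against the `email` column and matches nothing.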