Diff of /CONTRIBUTING.md [000000] .. [879b32]

Switch to unified view

a b/CONTRIBUTING.md
1
# Contributing to Qiita
2
3
Qiita is an open source software package, and we welcome community contributions. You can find the source code and test code for Qiita under public revision control in the Qiita git repository on [GitHub](http://github.com/biocore/qiita). We very much welcome contributions.
4
5
This document covers what you should do to get started with contributing to Qiita. You should read this whole document before considering submitting code to Qiita. This will save time for both you and the Qiita developers.
6
7
# General Notes on Development
8
9
Adding source code to Qiita, can take place in three different modules:
10
11
* `qiita_pet`: Contains the graphical user interface layer of the system, mainly written in Python, JavaScript and HTML (see [Tornado templates](http://tornado.readthedocs.org/en/latest/template.html)).
12
* `qiita_db`: Contains the bridge layer between the Python objects and the SQL database. In general this subpackage is mainly written in Python with a fair amount of inline PostgreSQL statements (see the section below on how to make database changes).
13
* `qiita_ware`: Contains the logic of the system and functions that can generally be called from a Python script (see the scripts directory), and it is mostly written in Python. Several workflows that can be achieved using the GUI, can also be reproduced through the command line using this subpackage.
14
15
Regardless of the module where you are adding new functionality, you should
16
always take into consideration how these new features affect users and whether
17
or not adding a new section or document to the documentation (found under the
18
`doc` folder) would be useful.
19
20
### Repository branch structure
21
22
The Qiita repository contains three branches:
23
24
* `master`: This branch reflects the code deployed in our [main Qiita server](http://qiita.microbio.me).
25
* `dev`: This branch is the active development branch. All new Pull Requests should be performed against this branch.
26
* `release-candidate`: This branch is used to freeze the code from the `dev` branch, so we can deploy in our test servers and exercise the code extensively before deploying in our main system. Code freezes typically occur one week before the scheduled deployment. Check our [milestones page](https://github.com/biocore/qiita/milestones) to see the scheduled deployments.
27
28
### The Qiita development rules
29
30
Since Qiita is a package that is continuously growing, we found ourselves in a position where development rules needed to be established so we can reduce both development and reviewer time. These rules are:
31
32
1. Pull Requests (PR) should be small: maximum 200 lines
33
  1. HTML files and DBS files (from DBSchema) do not count, but JavaScript does
34
  2. Test data do not count toward the line limit
35
  3. PR over this limit will only be allowed if it has been discussed with the developer team and it can potentially be done in a different branch.
36
2. The code in the master branch should always be consistent. If your PR is leaving master in an inconsistent state, it's a red flag that more changes need to be done, and the code has to go to its own branch until all modifications get in (see 1. iii). Ideally, these branches should exist for a short period of time (~24h) before getting merged into master again, in order to avoid merge conflicts. This means that both developers and reviewers agree on working/reviewing these issues fast.
37
3. If you are changing code and the reviewers provide suggestions on how to improve the code, make the change unless you can demonstrate that your current implementation is better than the suggestion. Suggested changes must also explain why they are better when given, and make a reasonable case for the change. Examples:
38
  * For performance improvements, the reviewer should provide code using IPython's `%timeit` magic (or similar).
39
  * Point to the [Contributing.md](https://github.com/biocore/qiita/blob/master/CONTRIBUTING.md) and/or the Code Guidelines (under construction).
40
  * For User Interface (UI) changes, explain how usability will be improved or describe the difficulties you found in your first interaction with the interface.
41
  * Code readability improvements, as code that is difficult to understand is hard to maintain. If the first time a reviewer reads the code does not understand the code at all, it is a red flag that the code is not going to be maintained.
42
4. Avoid competing PR. If you're working on an issue that can conflict with another developer, coordinate with him/her to get the work done. If coordination proves difficult, include the rest of the development team in the discussion to determine the best way to proceed.
43
5. If you find an issue while working on a PR, you must either:
44
  * if it's a small change and completely unrelated to your PR, stage your changes, create a new branch and submit a PR. It will likely be merged fast and will reduce the time that issue is going to be present in the code base.
45
  * if it's a big issue, create an issue on GitHub, make sure someone is assigned to the issue, and add a comment in the code with the issue number (e.g. `# See issue #XXX`). This will help other developers to identify the source of the issue and it will likely be solved faster.
46
6. Group issues in blocks that can be solved together. Using the GitHub's label system will be the best way to do this.
47
7. When you start working in a complex issue, discussing the path that you're going to take to solve it with other developers will help to identify potential problems in your solution and to make a correct definition of the issue scope. Starting the discussion in the GitHub issue tracker is recommended. If no consensus could be reached in some solution, moving the discussion to a meeting will be the path to move forward.
48
8. UI development is tricky and really subjective. In order to smooth the progress, this should be the path to develop the UI:
49
  1. Discuss as a group (in meetings or in the issue tracker) the overall design of the new UI.
50
  2. The developer assigned to the issue, will mock up some view in straight HTML or with a static tornado page, and shares the view with the rest of the developer team.
51
  3. The developer team reaches a consensus in the new UI layout, by modifying the mock up and/or providing constructive feedback to the assigned developer. After all, the developer team will be the first users of the new UI, so if something smells fishy it will become a bigger problem for the end users.
52
  4. After a consensus is reached, the assigned developer implements the new UI.
53
  5. Once the PR is issued, another round of improvements can be done until a consensus is reached. Sometimes, the first consensus is not the best layout; and new ideas/improvements are always welcome!
54
9. Last but not least, you are working as part of a team and you should try to help others when possible.
55
56
57
### Configuration file
58
59
The Qiita configuration file determines how the package interacts with your system’s resources (redis and postgres). Thus you should review the documentation detailed [here](https://docs.google.com/document/d/1u7kwLP31NM513-8xwpwvLbSQxYu0ehI6Jau1APR13e0/edit#), but especially bear in mind the following points:
60
61
* An example version of this file can be found here `qiita_core/support_files/qiita_config.txt` and if you don’t set a `QIITA_CONFIG_FP` environment variable, that’s the file that Qiita will use.
62
* The `[main]` section sets a `TEST_ENVIRONMENT` variable, which determines whether your system will be running unit tests or if it a demo/production system. You will want to set the value to TRUE if you are running the unit tests.
63
64
**A note on data accumulation**: Qiita keeps data in the `BASE_DATA_DIR` as the system gets used. When you drop a Qiita environment and create a fresh testing environment, the “old” data that was generated from the previous environment should be **manually** deleted (or, at least, removed from the data directories in the `BASE_DATA_DIR`).
65
66
### Unit tests
67
68
Unit tests in Qiita are located inside the tests/test folder of every sub-module, for example `qiita_db/test/test_metadata_template.py`. These can be executed on a per-file basis or using `nosetests` from the base directory.
69
70
During test creation make sure the test class is decorated with `@qiita_test_checker()` if database modifications are done during tests. This will automatically drop and rebuild the qiita schema after the entire test class has been executed. This requires to all the tests in a single class be independent of each other, so stochastic failures do not occur due to different test order execution.
71
72
Coverage testing is in effect, so run tests using `nosetests --with-coverage [test_file.py]` to check what lines of new code in your pull request are not tested.
73
74
### Documentation
75
76
The documentation for Qiita is maintained as part of this repository, under the
77
`qiita_pet/support_files/doc` folder, for more information, see the README.md
78
file in `qiita_pet/support_files/doc/README.md`.
79
80
### Scripts
81
82
Scripts in Qiita are located inside the scripts directory, their actions will rely on the settings described in the Qiita config file, for example if you are dropping a database, the database that will be dropped is the one described by the `DATABASE` setting. The following is a list of the most commonly used commands during development:
83
84
* `qiita-env make` will create a new environment (as specified by the Qiita config file).
85
* `qiita-env drop` will delete the environment (as specified by the Qiita config file).
86
* `qiita pet webserver start`, will start the Qiita web-application running on port 21174, you can change this using the `--port` flag, for example `--port=7532`.
87
88
## Making Database Changes
89
After the initial production release of Qiita, changes to the database schema will require patches; the database can no longer be dropped and recreated using the most recent schema because all the data would be lost! Therefore, patches must be applied instead.
90
91
### Approach
92
93
1. We keep "unpatched" versions of the SQL and DBS files in the repository
94
2. We keep fully patched versions of the DBS and HTML files in the repository
95
3. We keep a patch file for each patch as required in the `qiita_db/support_files/patches` directory. Note that **the patches will be applied in order based on the natural sort order of their filename** (e.g., `2.sql` will be applied before `10.sql`, and `10.sql` will be applied before `a.sql`)
96
97
### Patch 91.sql
98
99
In May 2024 we decided to:
100
* Merge all patches into the main database schema, this means that there are no patches younger than 92.sql.
101
* Added a new folder `patches/test_db_sql/` where we can store sql files that will only be applied for the test environment.
102
* Added a test to the GitHub actions to test that the production database has an expected number of rows.
103
104
Note that these changes mean:
105
1. 92.sql is the current first sql file to patch the database.
106
2. If you need to make changes (like INSERTS) _only_ to the tests database you need to add a patch to `patches/test_db_sql/`.
107
108
### Developer Workflow
109
110
1. Load the fully patched DBS file (e.g., `qiita-db.dbs`) in [DBSchema](http://www.dbschema.com/)
111
2. Make desired changes
112
3. Save the DBS file *under a different name* (e.g., `foo.dbs`)
113
4. Select *Compare Schemas with Other Project From File* from the *Synchronization* menu
114
5. Compare `foo.dbs` with `qiita-db.dbs`
115
6. Click the button to prefer all of your new changes
116
7. Save the patch file with the next number (e.g., `1.sql`)
117
8. Edit the patch file and add general comments including a date, and (where necessary) specific comments
118
119
One drawback is that developers will need to have [DBSchema](http://www.dbschema.com/) to develop for this project.
120
121
### Data Patches
122
123
If you need to submit a patch that changes only data but does not alter the schema, you should still create a patch file with the next name (e.g., `2.sql`) with your changes. Note that a patch should *not* be created if the modifications do not apply to Qiita databases in general; data patches are only necessary in some cases, e.g., if the terms in an ontology change.
124
125
### Python Patches
126
127
Occasionally, SQL alone cannot effect the desired changes, and a corresponding python script must be run after the SQL patch is applied. If this is the case, a python file should be created in the `patches/python_patches` directory, and it should have the same basename as the SQL file. For example, if there is a patch `4.sql` in the `patches` directory, and this patch requires a python script be run after the SQL is applied, then the python file should be placed at `patches/python_patches/4.py`. Note that not every SQL patch will have a corresponding python patch, but every python patch will have a corresponding SQL patch.
128
129
If in the future we discover a use-case where a python patch must be applied for which there *is no corresponding SQL patch*, then a blank SQL patch file will still need to be created.
130
131
## SQL coding guidelines
132
Since the `qiita_db` code contains a mixture of python code and SQL code, here are some coding guidelines to add SQL code to Qiita:
133
134
1. Any SQL keyword should be written uppercased:
135
  * Wrong:
136
  ```python
137
  sql = "select * from qiita.qiita_user"
138
  ```
139
  * Correct:
140
  ```python
141
  sql = "SELECT * FROM qiita.qiita_user"
142
  ```
143
2. Triple quotes are preferred for the SQL statements, unless the statement fits in a single line, since most editors and GitHub will detect SQL statement and apply highlighting accordingly. See point 3 for logical grouping of the SQL statements and indentation:
144
  * Wrong:
145
  ```python
146
  sql = ("SELECT processed_data_status FROM qiita.processed_data_status pds JOIN "
147
         "qiita.processed_data pd USING (processed_data_status_id) JOIN "
148
         "qiita.study_processed_data spd USING (processed_data_id) "
149
         "WHERE spd.study_id = %s")
150
  ```
151
  * Correct:
152
  ```python
153
  sql = """SELECT processed_data_status
154
           FROM qiita.processed_data_status pds
155
              JOIN qiita.processed_data pd USING (processed_data_status_id)
156
              JOIN qiita.study_processed_data spd USING (processed_data_id)
157
           WHERE spd.study_id = %s"""
158
  sql = "SELECT * FROM qiita.qiita_user"
159
  ```
160
3. Use PEP8-style indentation for the SQL queries, as it will improve readability. Common SQL best-practices recommend to have a new line for the `SELECT`, `FROM`, `WHERE` and similar clauses:
161
  * Wrong:
162
  ```python
163
  sql = """SELECT udt_name FROM information_schema.columns WHERE
164
           column_name = %s AND table_schema = 'qiita' AND (table_name = %s
165
           OR table_name = %s)"""
166
  ```
167
  * Correct:
168
  ```python
169
  sql = """SELECT udt_name
170
           FROM information_schema.columns
171
           WHERE column_name = %s AND table_schema = 'qiita'
172
               AND (table_name = %s OR table_name = %s)"""
173
  ```
174
4. Never, NEVER, use python string formatting to complete the SQL query parameters. Use the `sql_args` parameter from the transaction object. This is a strong recommendation from the psycopg2 developers to avoid SQL injection attacks (detailed explanation [here](http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters)):
175
  * Wrong:
176
  ```python
177
  sql = """SELECT processed_data_status
178
           FROM qiita.processed_data_status pds
179
              JOIN qiita.processed_data pd USING (processed_data_status_id)
180
              JOIN qiita.study_processed_data spd USING (processed_data_id)
181
           WHERE spd.study_id = %s""" % study.id
182
  with TRN:
183
      TRN.add(sql)
184
  ```
185
  * Correct:
186
  ```python
187
  sql = """SELECT processed_data_status
188
           FROM qiita.processed_data_status pds
189
              JOIN qiita.processed_data pd USING (processed_data_status_id)
190
              JOIN qiita.study_processed_data spd USING (processed_data_id)
191
           WHERE spd.study_id = %s"""
192
  with TRN:
193
      TRN.add(sql, [study.id])
194
  ```
195
5. However, python string formatting is allowed to provide table or column names, although this should be done through the `str.format` function. Table or column names as parameters are not supported by psycopg2. Using `str.format` is desirable because if you need to pass parameters to the SQL statement, the python string formatting will fail (see second example below):
196
  * Wrong:
197
  ```python
198
  table = "qiita_user"
199
  sql = "SELECT * FROM qiita.%s" % table
200
201
  # This will fail during execution with the following error:
202
  # TypeError: not enough arguments for format string
203
  table = "qiita_user"
204
  sql = "SELECT * FROM qiita.%s WHERE email = %s" % table
205
  ```
206
  * Correct:
207
  ```python
208
  table = "qiita_user"
209
  sql = "SELECT * FROM qiita.{0}".format(table)
210
  ```
211
6. The SQL command should be set up in a variable and use this variable as parameter to the `TRN.add` method, rather than defining the SQL statement in the method itself, unless the statement is short and fits in a single line:
212
  * Wrong:
213
  ```python
214
  TRN.add("""SELECT processed_data_status
215
           FROM qiita.processed_data_status pds
216
              JOIN qiita.processed_data pd USING (processed_data_status_id)
217
              JOIN qiita.study_processed_data spd USING (processed_data_id)
218
           WHERE spd.study_id = %s""", [study.id])
219
  ```
220
  * Correct:
221
  ```python
222
  sql = """SELECT processed_data_status
223
           FROM qiita.processed_data_status pds
224
              JOIN qiita.processed_data pd USING (processed_data_status_id)
225
              JOIN qiita.study_processed_data spd USING (processed_data_id)
226
           WHERE spd.study_id = %s"""
227
  TRN.add(sql, [study.id])
228
  TRN.add("SELECT * FROM qiita.qiita_user WHERE email=%s", [user.id])
229
  ```