b/CONTRIBUTING.md

# Contributing to medaCy
MedaCy seeks to create a unified platform to streamline research efforts in medical text mining while also providing an interface to easily apply models to real-world problems. Because of this, contributions to medaCy are often direct by-products of active research projects. However, without the contributions, bug fixes, bug reports, and suggestions of practitioners, medaCy could not grow and thrive.

This contribution guide is designed to inform:

1. **Researchers** on how they can efficiently use medaCy to make their work more accessible to practitioners.
2. **Practitioners** on how they can tune medaCy's cutting-edge functionality to their specific application.

## Table of contents
1. [Issues and Bug Reports](#issues-and-bug-reports)
2. [Development Set-up](#development-environment-setup)
3. [Running Unit Tests](#running-unit-tests)

## Issues And Bug Reports
Please search the existing issues before posting an issue or bug report - your problem may already be solved! If your search comes up with nothing, congratulations: you may have something to contribute!

## Development Environment Setup
At its most basic, one can fork medaCy, clone the fork, and develop in a favorite text editor. However, some up-front set-up effort goes a long way towards streamlining the contribution process and staying organized. This section details a suggested set-up for efficient development, testing, and experimentation with medaCy using [PyCharm](https://www.jetbrains.com/pycharm/).

**Assumptions of this section:**
- You are working in a UNIX-based operating system.
- Part 2 assumes you have PyCharm Professional installed, which is provided with the JetBrains University License. (This isn't strictly necessary, but the useful Remote Host feature is disabled in the Community Edition.)

**Part 1: Development Installation**
1. If you are shaky with git, [this link](https://nvie.com/posts/a-successful-git-branching-model/) provides an excellent description of the branching model medaCy follows to organize contributions.
2. Fork medaCy and copy the clone link.
3. On your machine, ensure you have Python 3 installed. Set up a [virtual environment](https://docs.python.org/3/library/venv.html) and activate it.
4. Run the shell commands `python --version` and `pip list`. Upgrade pip to the latest version if prompted. Your Python version should be above 3.4 and your installed packages should be few in number. If either condition does not hold, return to *Step 3*.
5. In a directory separate from the one created by the virtual environment set-up command, clone your fork of medaCy.
6. Inside your cloned fork, ensure you are on the *development* branch or a branch created from it. This can be verified by running `git status`, and you can switch branches with `git checkout <branch-name>`.
7. Run `pip install -e .`. This installs medaCy in editable mode inside your virtual environment and takes several minutes to install dependencies - medaCy stands on the shoulders of giants! Errors at this step most often come from installing SciPy and NumPy. Search for the error messages; they are usually fixed by installing some extra dependencies. Most likely, your Python installation is missing C headers required by SciPy.
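The checks in Step 4 can also be made from inside Python. The following is a minimal sketch and not part of medaCy: the function name `check_environment` is made up for illustration, "above 3.4" is read as 3.5 or newer, and the virtual environment test assumes the environment was created with the standard `venv` module. Running the file prints one warning per problem found and prints nothing when the environment looks fine.

```python
import sys


def check_environment(min_version=(3, 5)):
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    # "Above 3.4" is interpreted here as Python 3.5 or newer (an assumption).
    if sys.version_info < min_version:
        problems.append("Python %d.%d+ is required" % min_version)
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base installation.
    if sys.prefix == getattr(sys, "base_prefix", sys.prefix):
        problems.append("no virtual environment appears to be active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```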

**Part 2: Developing with PyCharm**
PyCharm can streamline development efforts, especially if you are developing locally and running medaCy on a remote machine for model building.

**Part 3: Logging**

MedaCy uses the [logging](https://docs.python.org/3/howto/logging.html#logging-basic-tutorial) module to give users insight into how medaCy is handling their data. Ensure you log critical steps in any functionality you implement, at the appropriate logging levels, to make it easy for users to debug.
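As an illustration of logging at different levels, here is a minimal sketch that uses only the standard `logging` module. The function `load_documents` and its messages are hypothetical and do not come from medaCy; they only show one plausible way to split messages between `INFO`, `WARNING`, and `DEBUG`.

```python
import logging

# Module-level logger named after the module. No handlers are configured
# here, so the calling application decides how messages are displayed.
logger = logging.getLogger(__name__)


def load_documents(paths):
    """Hypothetical helper: read text files, skipping empty ones."""
    # High-level progress that most users want to see.
    logger.info("Loading %d documents", len(paths))
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        if not text.strip():
            # A recoverable problem the user should know about.
            logger.warning("Skipping empty file: %s", path)
            continue
        # Fine-grained detail that is mainly useful when debugging.
        logger.debug("Read %d characters from %s", len(text), path)
        documents.append(text)
    return documents
```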

## Running Unit Tests
All components of medaCy have associated unit tests. Please ensure these all pass before submitting pull requests. When medaCy runs unit tests, it first automatically installs the [END dataset](https://github.com/NanoNLP/medaCy_dataset_end), then uses it to test various functionalities of the package. Some tests involve building a model over the dataset, so these may take some time to complete.

After installing medaCy for development, make sure that `pytest` is installed. Then:

1) For quick testing of the whole framework, run: \
`python setup.py test`.
2) For more fine-grained testing of individual files with colorful log output, run: \
`pytest -s tests/tools/test_data_manager.py -o log_cli=True --log-cli-level=INFO`.

This will show log output during tests and allow you to adjust the logging level for the test file being run. Read the pytest documentation for details.

Note that some of the unit tests require knowledge about the configuration of your machine; those tests will be skipped if the relevant settings are not specified in the config.json file. These settings include the location of a MetaMap binary file on your machine, which GPU core to use for certain tests, and the location of a word embeddings file. Your contributions may not affect functionality that depends on these features; however, all pull requests will be tested against the full unit test suite.
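The skipping behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not medaCy's actual test code: the helper `read_setting`, the key `metamap_path`, and the location of `config.json` in the working directory are all hypothetical. Because pytest also collects `unittest.TestCase` classes, a test written this way would be reported as skipped when the setting is missing.

```python
import json
import os
import unittest


def read_setting(key, config_path="config.json"):
    """Return the value stored under `key`, or None if it is not configured."""
    if not os.path.isfile(config_path):
        return None
    with open(config_path, encoding="utf-8") as handle:
        # Assumes the file holds a single JSON object at the top level.
        return json.load(handle).get(key)


# Hypothetical key; use whatever name the project's config.json defines.
METAMAP_PATH = read_setting("metamap_path")


class TestMetaMapDependentFeature(unittest.TestCase):
    # Skipped, not failed, when the machine-specific setting is absent.
    @unittest.skipUnless(METAMAP_PATH, "metamap_path is not set in config.json")
    def test_binary_exists(self):
        self.assertTrue(os.path.exists(METAMAP_PATH))
```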