Contributing to medaCy

MedaCy seeks to create a unified platform to streamline research efforts in medical text mining while also providing an
interface to easily apply models to real-world problems. As a result, contributions to medaCy are often direct
by-products of active research projects. However, without the contributions, bug fixes, bug reports,
and suggestions of practitioners, medaCy could not grow and thrive.

This contribution guide is designed to inform:

- Researchers on how they can efficiently use medaCy to make their work more accessible to practitioners.
- Practitioners on how they can tune medaCy's cutting-edge functionality to their specific application.

- Issues And Bug Reports
- Development Environment Setup
- Running Unit Tests

Issues And Bug Reports

Please search the existing issues before posting an issue or bug report - your problem may already be solved! If your
search comes up empty - congratulations, you may have something to contribute!

Development Environment Setup

At its most basic, one can fork medaCy, clone their fork, and use their favorite text editor to develop.
However, some up-front set-up effort goes a long way towards streamlining the contribution process and staying organized.
This section details a suggested set-up for efficient development, testing, and experimentation with medaCy using
PyCharm.

Assumptions of this section:
- You are working in a UNIX-based operating system.
- Part 2 assumes you have PyCharm Professional installed. PyCharm Professional is provided with the JetBrains
University License. (This isn't strictly necessary, but the useful Remote Host feature is disabled in the Community Edition.)

Part 1: Development Installation
1. If you are shaky with git - this link provides an
excellent description of the branching model medaCy follows to organize contributions.
2. Fork medaCy and copy the clone link.
3. On your machine, ensure you have Python 3 installed. Set up a virtual environment
and activate it.
4. Run the bash commands: python --version and pip list. Upgrade pip to the latest version as suggested.
Your Python version should be above 3.4 and your installed packages should be few in number - if either of these
conditions does not hold, return to Step 3.
5. In a directory separate from the one created by the virtual environment set-up command, clone your fork of medaCy.
6. Inside your cloned fork, ensure you are on at least the development branch or a branch of the development branch.
This can be verified by running git status, and branching can be done with git checkout <branch-name>
7. Run pip install -e . (note the trailing dot). This will install medaCy in editable mode inside of your virtual
environment and will take several minutes to install dependencies - medaCy stands on the shoulders of giants! Errors one
is likely to encounter here involve the installation of SciPy and NumPy. Search for the error messages online, as they
are usually fixed by installing some extra dependencies. Most likely, your Python installation is missing C headers
required by SciPy.
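
The two conditions in Step 4 can also be checked from inside the interpreter. The following is a minimal sketch and not
part of medaCy; the function names python_version_ok and in_virtual_environment are made up for illustration, and the
check assumes a virtual environment created with venv or virtualenv.

```python
import sys


def python_version_ok(version_info=None):
    """Return True when the interpreter is newer than Python 3.4."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor): "above 3.4" means 3.5 or later.
    return tuple(version_info[:2]) > (3, 4)


def in_virtual_environment():
    """Return True when running inside a venv/virtualenv environment."""
    # venv points sys.prefix at the environment and keeps the system
    # installation in sys.base_prefix; older virtualenv releases set
    # sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("Python version above 3.4:", python_version_ok())
    print("Inside a virtual environment:", in_virtual_environment())
```

If either line reports False, return to Step 3 before installing medaCy.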

Part 2: Developing with PyCharm
PyCharm can streamline development efforts - especially if you are developing locally and running medaCy on a remote
machine for model building.

Part 3: Logging

MedaCy uses the logging module to give users
insight into how medaCy is handling their data. Ensure you log the critical steps of any functionality you implement
at the appropriate logging levels, so that users can debug easily.
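
As an illustration of logging at different levels, here is a minimal sketch using only the standard library logging
module. The function count_tokens is a hypothetical helper invented for this example and is not part of medaCy's API;
how medaCy names or configures its own loggers is not covered here.

```python
import logging

# A module-level logger named after the module is a common convention.
logger = logging.getLogger(__name__)


def count_tokens(documents):
    """Hypothetical helper: count whitespace-separated tokens per document.

    `documents` maps a document name to its raw text.
    """
    # INFO: a high-level step that users will want to see.
    logger.info("Counting tokens in %d documents", len(documents))
    counts = {}
    for name, text in documents.items():
        if not text.strip():
            # WARNING: unexpected input that does not stop processing.
            logger.warning("Document %s is empty", name)
        counts[name] = len(text.split())
        # DEBUG: per-item detail, only useful when diagnosing a problem.
        logger.debug("Document %s has %d tokens", name, counts[name])
    return counts


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    count_tokens({"note_1": "Patient was given 5 mg of drug", "note_2": ""})
```

Passing arguments to the logger rather than pre-formatting the string means the message is only built when the level
is enabled.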

Running Unit Tests

All components of medaCy have associated unit tests. Please ensure these all pass before submitting pull requests.
When medaCy runs unit tests, it first automatically installs the END dataset
and then uses it to test various functionalities of the package. Some tests involve building a model over the dataset -
these may take some time to complete.

After installing medaCy for development, make sure that pytest is installed. Then:

1) For quick testing of the whole framework, run:
python setup.py test
2) For more fine-grained testing of individual files with colorful log output, run:
pytest -s tests/tools/test_data_manager.py -o log_cli=True --log-cli-level=INFO

This will show log output during tests and allow you to adjust the logging level for the test file being run.
Read the pytest documentation for details.

Note that some of the unit tests require knowledge about the configuration of your machine, and that those tests will
be skipped if the corresponding configuration settings are not specified in the config.json file. These settings include
the location of a MetaMap binary file on your machine, which GPU core to use for certain tests, and the location
of a word embeddings file. Your contributions may not affect functionality that depends on these features;
however, all pull requests will be tested against the full unit test suite.
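
The skipping behaviour described above can be sketched as follows. This is an illustration only and not medaCy's actual
test code: the key name metamap_path, the helper load_config, and the location of config.json in the working directory
are all assumptions made for the example. It uses the standard library unittest module, whose test cases pytest also
collects.

```python
import json
import os
import unittest


def load_config(path="config.json"):
    """Return the settings stored in `path`, or an empty dict if it is absent."""
    if not os.path.isfile(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


CONFIG = load_config()


class TestMetaMapDependentFeature(unittest.TestCase):
    # "metamap_path" is a hypothetical key; check the project's config.json
    # for the names medaCy actually expects.
    @unittest.skipUnless(CONFIG.get("metamap_path"),
                         "metamap_path is not set in config.json")
    def test_metamap_binary_exists(self):
        # Runs only when the setting is present; otherwise it is reported
        # as skipped rather than failed.
        self.assertTrue(os.path.isfile(CONFIG["metamap_path"]))


if __name__ == "__main__":
    unittest.main()
```

With this pattern a missing setting shows up as a skipped test in the report, which makes it easy to see which parts of
the suite were not exercised on your machine.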
