b/docs/explanation/using-ehrql-in-opensafely-projects.md

This page describes how ehrQL fits in with a full OpenSAFELY project.

In one sentence:

> Researchers develop an ehrQL query and analysis code on their own computers
> using dummy tables,
> then submit it to the [OpenSAFELY jobs site](https://jobs.opensafely.org)
> to run against real tables in an OpenSAFELY backend.

## Project workflow summary

The workflow for a single study using ehrQL is much like that for
[existing studies that use cohort-extractor](https://docs.opensafely.org/workflow/).

In summary:

1. Create a Git repository from the template repository provided and clone it to your local machine.
1. Write a dataset definition in ehrQL that specifies what data you want to extract from the database.
   **Only this step is specific to ehrQL.**
1. Develop analysis scripts in R, Stata, or Python that process and analyse the [dummy datasets](#dummy-datasets) created by ehrQL.
1. Test the code by running the analysis steps specified in the [project pipeline](https://docs.opensafely.org/actions-pipelines/).
1. Execute the analysis on the [real tables via OpenSAFELY's jobs site](#real-tables). This will generate outputs on the secure server.
1. Check the [output for disclosivity within the server, and redact if necessary](https://docs.opensafely.org/releasing-files/).
1. Release the [outputs on the jobs site](https://docs.opensafely.org/releasing-files/#2-requesting-release-of-outputs-from-the-server).
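
As an illustration of the analysis-script step above, here is a minimal sketch in Python of a downstream script that summarises a dataset file. The column names (`patient_id`, `sex`, `age`), the sample rows, and the function name `summarise` are assumptions made for this example and are not defined by ehrQL or by this page. A real script would read the file written by the ehrQL action rather than an inline string.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical dummy dataset; a real script would open the file
# produced by the ehrQL action instead of using this inline sample.
DUMMY_CSV = """patient_id,sex,age
1,female,34
2,male,51
3,female,47
4,male,
"""


def summarise(csv_file):
    """Count patients by sex and compute the mean age, skipping missing ages."""
    rows = list(csv.DictReader(csv_file))
    counts = Counter(row["sex"] for row in rows)
    ages = [int(row["age"]) for row in rows if row["age"]]
    return {
        "n_patients": len(rows),
        "by_sex": dict(counts),
        "mean_age": mean(ages) if ages else None,
    }


if __name__ == "__main__":
    print(summarise(io.StringIO(DUMMY_CSV)))
```

Since the same script is later run against real data that the researcher cannot inspect, it is worth handling cases such as missing values explicitly, as the sketch does for `age`.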

## Dummy datasets

Because OpenSAFELY doesn't allow researchers direct access to patient data,
researchers must use dummy datasets for developing their analysis code on their own computer.

When an ehrQL action is executed on a researcher's computer (see [Running ehrQL](../explanation/running-ehrql.md)),
ehrQL can generate dummy datasets based on the properties of the tables used in the dataset definition.
Alternatively, users can provide their own dummy tables.

Either approach allows the dataset definition to be checked for errors,
and produces dummy datasets that can be used to test downstream actions that depend on the output of the ehrQL action.
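
To make the idea of generating data from table properties more concrete, the sketch below produces random rows from a description of each column's type. It is a deliberately simplified illustration and not how ehrQL itself generates dummy data. The `COLUMN_SPECS` format, the column names, and the `generate_dummy_rows` function are all hypothetical and invented for this example.

```python
import random
from datetime import date, timedelta

# Hypothetical column descriptions: name -> (type, options).
# This format is an assumption for illustration, not an ehrQL structure.
COLUMN_SPECS = {
    "sex": ("category", ["female", "male", "intersex", "unknown"]),
    "date_of_birth": ("date", (date(1930, 1, 1), date(2020, 12, 31))),
    "n_appointments": ("integer", (0, 20)),
}


def generate_dummy_rows(specs, n_rows, seed=0):
    """Return n_rows dicts with random values that match each column's type."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = []
    for patient_id in range(1, n_rows + 1):
        row = {"patient_id": patient_id}
        for name, (kind, options) in specs.items():
            if kind == "category":
                row[name] = rng.choice(options)
            elif kind == "date":
                start, end = options
                offset = rng.randint(0, (end - start).days)
                row[name] = start + timedelta(days=offset)
            elif kind == "integer":
                low, high = options
                row[name] = rng.randint(low, high)
            else:
                raise ValueError(f"Unknown column type: {kind}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in generate_dummy_rows(COLUMN_SPECS, n_rows=3):
        print(row)
```

Rows like these have the right types and value ranges but no clinical meaning, which is enough to check that downstream code runs end to end.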

## Real tables

Executing a dataset definition against real tables in an OpenSAFELY backend involves running the study on the
[OpenSAFELY jobs site](https://jobs.opensafely.org).
More information about the jobs site and how to run a study can be found in the
[OpenSAFELY documentation](https://docs.opensafely.org/jobs-site/).