[e988c2]: / docs / explanation / output-formats.md

Download this file

100 lines (68 with data), 2.5 kB

Supported output formats

The following output formats are supported:

  • .arrow — Apache Arrow format
  • .csv.gz — compressed CSV format
  • .csv — uncompressed CSV format

⚠️ The uncompressed CSV format is not recommended,
because this produces much larger files than the alternative formats.

Unsupported output formats

These formats were supported in cohort-extractor,
but are not by ehrQL

  • .dta and .dta.gz — Stata formats

arrowload for Stata users

Stata itself does not directly support .arrow.
However, OpenSAFELY's Stata Docker image contains the arrowload library
that can load .arrow files in Stata.

Use arrowload as:

. arrowload /path/to/arrow/file

See the full documentation via running command-line Stata via OpenSAFELY:

opensafely exec stata-mp stata

and then running

. help arrowload

Selecting an output format

You select an output format
when you use the --output option to specify an output filename for ehrQL.
The filename extension — for example, .arrow — that you provide determines the output format file.

If you specify a filename extension that is not supported,
you will get an error telling you so.

:notepad_spiral: If you omit the --output option,
the output is not saved to a file.
Instead, the output is displayed at the command line.

Examples with opensafely exec

.arrow

opensafely exec ehrql:v1 generate-dataset "./dataset-definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.arrow"

.csv.gz

opensafely exec ehrql:v1 generate-dataset "./dataset-definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.csv.gz"

Example project.yaml

version: "4.0"

actions:
  extract_data:
    run: ehrql:v1 generate-dataset "./dataset_definition.py" --output "outputs/data_extract.arrow"
    outputs:
      highly_sensitive:
        population: outputs/data_extract.arrow

⚠️ The population filename must be identical to the output filename specified by --output.
Otherwise you will see the following error when you use opensafely run
to run the project actions:

$ opensafely run run_all
=> ProjectValidationError
   Invalid project:
   1 validation error for Pipeline
   __root__
     --output in run command and outputs must match (type=value_error)