# Inference

Once you have obtained a pipeline, either by composing rule-based components, training a model or loading one from disk, you can use it to make predictions on documents. This is referred to as inference. This page answers the following questions:

> How do we leverage computational resources to run a model on many documents?

> How do we connect to various data sources to retrieve documents?

Be sure to check out the [Processing multiple texts](/tutorials/multiple-texts) tutorial for a practical example of how to use EDS-NLP to process large datasets.

## Inference on a single document

In EDS-NLP, computing the prediction on a single document is done by calling the pipeline on the document. The input can be either:

- a text string
- or a [Doc](https://spacy.io/api/doc) object

```{ .python .no-check }
nlp = ...
text = "... my text ..."
doc = nlp(text)
```

If you're lucky enough to have a GPU, you can use it to speed up inference by moving the model to the GPU before calling the pipeline.

```{ .python .no-check }
nlp.to("cuda")  # same semantics as PyTorch
doc = nlp(text)
```

To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.
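
As a preview, here is a minimal sketch of multi-GPU inference; the model path and converter are placeholders, and the calls mirror the stream API detailed below:

```{ .python .no-check }
import edsnlp

nlp = edsnlp.load("path/to/model")

# Build a lazy stream of documents and apply the model to each of them
docs = edsnlp.data.read_json("path/to/json_folder", converter="...")
docs = docs.map_pipeline(nlp)
# Request 2 GPU workers for deep-learning pipes, 4 CPU workers for the rest
docs = docs.set_processing(num_cpu_workers=4, num_gpu_workers=2)
```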

## Streams {: #edsnlp.core.stream.Stream }

When processing multiple documents, we can optimize the inference by parallelizing the computation on a single core, across multiple cores and GPUs, or even across multiple machines.

These optimizations are enabled by performing *lazy inference*: the operations (e.g., reading a document, converting it to a Doc, running the different pipes of a model or writing the result somewhere) are not executed immediately but are instead scheduled in a [Stream][edsnlp.core.stream.Stream] object. The stream can then be executed by calling its `execute` method, by iterating over it, or by calling a writing method (e.g., `to_pandas`). In fact, data connectors like `edsnlp.data.read_json` return a stream, and so does the `nlp.pipe` method.

A stream contains:

- a `reader`: the source of the data (e.g., a file, a database, a list of strings, etc.)
- the list of operations to perform (`stream.ops`), each holding the function / pipe, keyword arguments and context of the operation
- an optional `writer`: the destination of the data (e.g., a file, a database, a list of strings, etc.)
- the execution `config`, containing the backend to use and its configuration, such as the number of workers, the batch size, etc.

All methods of the stream (`map()`, `map_batches()`, `map_gpu()`, `map_pipeline()`, `set_processing()`) are chainable, meaning that they return a new stream object (no in-place modification), as shown in the sketch below.
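
To make this laziness concrete, here is a small sketch, assuming `edsnlp.data.from_iterable` is available as an in-memory reader: each `map()` call only schedules an op on a new stream, and nothing runs until the stream is iterated.

```{ .python .no-check }
import edsnlp

# Nothing is computed here: ops are merely scheduled on the stream
stream = edsnlp.data.from_iterable(["FIRST TEXT", "SECOND TEXT"])
stream = stream.map(str.lower)  # returns a new stream with one more op
stream = stream.map(len)        # idem

# Iterating triggers the execution
print(list(stream))  # [10, 11]
```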

For instance, the following code will load a model, read a folder of JSON files, apply the model to each document and write the result to a Parquet folder, using 4 CPUs and 2 GPUs.

```{ .python .no-check }
import edsnlp

# Load or create a model
nlp = edsnlp.load("path/to/model")

# Read some data (this is lazy, no data will be read until the end of this snippet)
data = edsnlp.data.read_json("path/to/json_folder", converter="...")

# Apply each pipe of the model to our documents and split the data
# into batches such that each contains at most 100 000 padded words
# (padded_words = max doc size in batch * batch size)
data = data.map_pipeline(
    nlp,
    # optional arguments
    batch_size=100_000,
    batch_by="padded_words",
)
# or equivalently: data = nlp.pipe(data, batch_size=100_000, batch_by="padded_words")

# Sort the documents in chunks of 1024 documents
data = data.map_batches(
    lambda batch: sorted(batch, key=lambda doc: len(doc)),
    batch_size=1024,
)

data = data.map_batches(
    # Apply a function to each batch of documents
    lambda batch: [doc._.my_custom_attribute for doc in batch]
)

# Configure the execution
data = data.set_processing(
    # 4 CPUs to parallelize rule-based pipes, IO and preprocessing
    num_cpu_workers=4,
    # 2 GPUs to accelerate deep-learning pipes
    num_gpu_workers=2,
    # Show the progress bar
    show_progress=True,
)

# Write the result; this will execute the stream
data.write_parquet("path/to/output_folder", converter="...", write_in_worker=True)
```

Streams support a variety of operations, such as applying a function to each element of the stream, batching the elements, applying a model to the elements, etc. In each case, the operations are not executed immediately but are scheduled to run when the stream is iterated over, or when its `execute()`, `to_*()` or `write_*()` methods are called.
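
For instance, assuming `data` is the stream built above, any of the following triggers the execution (the converter string is a placeholder):

```{ .python .no-check }
data.execute()  # run the scheduled ops

df = data.to_pandas(converter="...")  # or collect the results in a DataFrame

docs = list(data)  # or simply iterate over the stream
```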

### `map()` {: #edsnlp.core.stream.Stream.map }

::: edsnlp.core.stream.Stream.map
    options:
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `map_batches()` {: #edsnlp.core.stream.Stream.map_batches }

To apply an operation to a stream in batches, you can use the `map_batches()` method. It takes a callable, an optional dictionary of keyword arguments, and batching arguments.
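
For example, as in the full snippet above, one can sort documents by length within chunks of 1024 so that downstream batches group documents of similar sizes:

```{ .python .no-check }
data = data.map_batches(
    # Sort each chunk of 1024 documents by length
    lambda batch: sorted(batch, key=len),
    batch_size=1024,
)
```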

::: edsnlp.core.stream.Stream.map_batches
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `map_pipeline()` {: #edsnlp.core.stream.Stream.map_pipeline }

::: edsnlp.core.stream.Stream.map_pipeline
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `map_gpu()` {: #edsnlp.core.stream.Stream.map_gpu }

::: edsnlp.core.stream.Stream.map_gpu
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `loop()` {: #edsnlp.core.stream.Stream.loop }

::: edsnlp.core.stream.Stream.loop
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `shuffle()` {: #edsnlp.core.stream.Stream.shuffle }

::: edsnlp.core.stream.Stream.shuffle
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### Configure the execution with `set_processing()` {: #edsnlp.core.stream.Stream.set_processing }

You can configure how the operations scheduled in the stream are executed by calling its `set_processing(...)` method. The following options are available:

::: edsnlp.core.stream.Stream.set_processing
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

## Backends

The `backend` parameter of `set_processing()` supports the following values:

### `simple` {: #edsnlp.processing.simple.execute_simple_backend }

::: edsnlp.processing.simple.execute_simple_backend
    options:
        heading_level: 3
        show_source: false

### `multiprocessing` {: #edsnlp.processing.multiprocessing.execute_multiprocessing_backend }

::: edsnlp.processing.multiprocessing.execute_multiprocessing_backend
    options:
        heading_level: 3
        show_source: false

### `spark` {: #edsnlp.processing.spark.execute_spark_backend }

::: edsnlp.processing.spark.execute_spark_backend
    options:
        heading_level: 3
        show_source: false
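
If you want to select a backend explicitly, pass its name to `set_processing()`; a minimal sketch, with arbitrary worker counts:

```{ .python .no-check }
data = data.set_processing(
    # Force the multiprocessing backend
    backend="multiprocessing",
    num_cpu_workers=4,
    show_progress=True,
)
```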

## Batching

Many operations rely on batching, either to be more efficient or because they require a fixed-size input. The `batch_size` and `batch_by` arguments of the `map_batches()` method let you specify the size of the batches and the function used to measure the size of a batch.

```{ .python .no-check }
# Accumulate in chunks of 1024 documents
lengths = data.map_batches(len, batch_size=1024)

# Accumulate in chunks of 100 000 words
lengths = data.map_batches(len, batch_size=100_000, batch_by="words")
# or
lengths = data.map_batches(len, batch_size="100_000 words")
```

We also support special values for `batch_size`, which use "sentinels" (i.e., markers inserted in the stream) to delimit the batches.

```{ .python .no-check }
# Accumulate every element of the input in a single batch,
# which is useful when looping over the data in training
lengths = data.map_batches(len, batch_size="dataset")

# Accumulate in chunks of fragments, in the case of Parquet datasets
lengths = data.map_batches(len, batch_size="fragments")
```

Note that these batch functions are only available under specific conditions:

- either `backend="simple"` or `deterministic=True` (the default) if `backend="multiprocessing"`, otherwise elements might be processed out of order
- every preceding op must have been elementwise (e.g., `map()`, `map_gpu()`, `map_pipeline()`, and no generator function), or `sentinel_mode` must have been explicitly set to `"split"` in `map_batches()`, otherwise the sentinels are dropped by default when the user requires batching