# Inference

Once you have obtained a pipeline, either by composing rule-based components, training a model, or loading a model from disk, you can use it to make predictions on documents. This is referred to as inference. This page answers the following questions:

> How do we leverage computational resources to run a model on many documents?

> How do we connect to various data sources to retrieve documents?

Be sure to check out the [Processing multiple texts](/tutorials/multiple-texts) tutorial for a practical example of how to use EDS-NLP to process large datasets.

## Inference on a single document

In EDS-NLP, computing the prediction on a single document is done by calling the pipeline on the document. The input can be either:

- a text string
- or a [Doc](https://spacy.io/api/doc) object

```{ .python .no-check }
nlp = ...  # a rule-based, trained or loaded pipeline
text = "... my text ..."
doc = nlp(text)
```

If you're lucky enough to have a GPU, you can use it to speed up inference by moving the model to the GPU before calling the pipeline.

```{ .python .no-check }
nlp.to("cuda")  # same semantics as PyTorch
doc = nlp(text)
```

To leverage multiple GPUs when processing multiple documents, refer to the [multiprocessing backend][edsnlp.processing.multiprocessing.execute_multiprocessing_backend] description below.

## Streams {: #edsnlp.core.stream.Stream }

When processing multiple documents, we can optimize inference by parallelizing the computation on a single core, multiple cores and GPUs, or even multiple machines.

These optimizations are enabled by performing *lazy inference*: the operations (e.g., reading a document, converting it to a Doc, running the different pipes of a model or writing the result somewhere) are not executed immediately but are instead scheduled in a [Stream][edsnlp.core.stream.Stream] object. The stream can then be executed by calling its `execute` method, iterating over it, or calling a writing method (e.g., `to_pandas`). In fact, data connectors like `edsnlp.data.read_json` return streams, and so does the `nlp.pipe` method.
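
To make this concrete, here is a minimal sketch of this laziness; the paths are placeholders, and the `"omop"` and `"ents"` converters are assumptions to adapt to your own data:

```{ .python .no-check }
import edsnlp

nlp = edsnlp.load("path/to/model")

# Nothing is read or computed yet: these calls only build the stream
stream = edsnlp.data.read_json("path/to/json_folder", converter="omop")
stream = stream.map_pipeline(nlp)

# Execution is only triggered here, by the writing method
df = stream.to_pandas(converter="ents")
```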
|
|
A stream contains:

- a `reader`: the source of the data (e.g., a file, a database, a list of strings, etc.)
- the list of operations to perform (`stream.ops`), each storing the function / pipe, keyword arguments and context of one operation
- an optional `writer`: the destination of the data (e.g., a file, a database, a list of strings, etc.)
- the execution `config`, containing the backend to use and its configuration, such as the number of workers, the batch size, etc.

All methods (`map()`, `map_batches()`, `map_gpu()`, `map_pipeline()`, `set_processing()`) of the stream are chainable, meaning that they return a new stream object (no in-place modification).
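
For example, here is a small sketch of this non-mutating behaviour, assuming `edsnlp.data.from_iterable` to build a stream from an in-memory list:

```{ .python .no-check }
import edsnlp

s1 = edsnlp.data.from_iterable(["first text", "second text"])
s2 = s1.map(str.upper)  # returns a new stream, s1 is unchanged

print(list(s1))  # ["first text", "second text"]
print(list(s2))  # ["FIRST TEXT", "SECOND TEXT"]
```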
|
|
For instance, the following code will load a model, read a folder of JSON files, apply the model to each document and write the result in a Parquet folder, using 4 CPUs and 2 GPUs.

```{ .python .no-check }
import edsnlp

# Load or create a model
nlp = edsnlp.load("path/to/model")

# Read some data (this is lazy, no data will be read until the end of this snippet)
data = edsnlp.data.read_json("path/to/json_folder", converter="...")

# Apply each pipe of the model to our documents and split the data
# into batches such that each contains at most 100 000 padded words
# (padded_words = max doc size in batch * batch size)
data = data.map_pipeline(
    nlp,
    # optional arguments
    batch_size=100_000,
    batch_by="padded_words",
)
# or equivalently: data = nlp.pipe(data, batch_size=100_000, batch_by="padded_words")

# Sort the documents in chunks of 1024 documents
data = data.map_batches(
    lambda batch: sorted(batch, key=lambda doc: len(doc)),
    batch_size=1024,
)

data = data.map_batches(
    # Apply a function to each batch of documents
    lambda batch: [doc._.my_custom_attribute for doc in batch]
)

# Configure the execution
data = data.set_processing(
    # 4 CPUs to parallelize rule-based pipes, IO and preprocessing
    num_cpu_workers=4,
    # 2 GPUs to accelerate deep-learning pipes
    num_gpu_workers=2,
    # Show the progress bar
    show_progress=True,
)

# Write the result, this will execute the stream
data.write_parquet("path/to/output_folder", converter="...", write_in_worker=True)
```

Streams support a variety of operations, such as applying a function to each element of the stream, batching the elements, applying a model to the elements, etc. In each case, the operations are not executed immediately but are scheduled to run when iterating over the stream, or when calling the `execute()`, `to_*()` or `write_*()` methods.

### `map()` {: #edsnlp.core.stream.Stream.map }

::: edsnlp.core.stream.Stream.map
    options:
        sections: ['text', 'parameters']
        header: false
        show_source: false
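
For instance, a minimal sketch applying a function to each element of a stream:

```{ .python .no-check }
import edsnlp

data = edsnlp.data.from_iterable(["Text A", "Text B"])
data = data.map(str.lower)  # applied lazily, one element at a time
print(list(data))  # ["text a", "text b"]
```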
|
|
### `map_batches()` {: #edsnlp.core.stream.Stream.map_batches }

To apply an operation to a stream in batches, you can use the `map_batches()` method. It takes a callable, an optional dictionary of keyword arguments, and batching arguments.

::: edsnlp.core.stream.Stream.map_batches
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
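
For instance, a small sketch; the `kwargs` parameter name for the dictionary of keyword arguments is an assumption to check against the parameters above:

```{ .python .no-check }
# Keep only the 10 longest documents of every batch of 1000 documents
def top_longest(batch, n):
    return sorted(batch, key=len, reverse=True)[:n]

data = data.map_batches(top_longest, kwargs={"n": 10}, batch_size=1000)
```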
|
|
### `map_pipeline()` {: #edsnlp.core.stream.Stream.map_pipeline }

::: edsnlp.core.stream.Stream.map_pipeline
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

### `map_gpu()` {: #edsnlp.core.stream.Stream.map_gpu }

::: edsnlp.core.stream.Stream.map_gpu
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
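
As a rough sketch, assuming the `prepare_batch(batch, device)` and `forward(batch)` callbacks described in the parameters above:

```{ .python .no-check }
import torch

def prepare_batch(batch, device):
    # Turn a batch of docs into a tensor and move it to the GPU
    return torch.as_tensor([len(doc) for doc in batch], device=device)

def forward(tensor):
    # Any torch computation, run on a GPU worker when one is available
    return tensor.float().log()

data = data.map_gpu(prepare_batch, forward)
```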
|
|
### `loop()` {: #edsnlp.core.stream.Stream.loop }

::: edsnlp.core.stream.Stream.loop
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
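
A tiny sketch of what looping looks like, typically to feed a training loop:

```{ .python .no-check }
import edsnlp

stream = edsnlp.data.from_iterable(["a", "b"]).loop()
it = iter(stream)
print([next(it) for _ in range(5)])  # ["a", "b", "a", "b", "a"]
```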
|
|
### `shuffle()` {: #edsnlp.core.stream.Stream.shuffle }

::: edsnlp.core.stream.Stream.shuffle
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false
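
For example, a sketch of a typical training setup, assuming `shuffle()` accepts the same batch sentinels as the Batching section below:

```{ .python .no-check }
# Reshuffle the whole dataset at each pass, then cycle over it indefinitely
data = data.shuffle("dataset").loop()
```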
|
|
### Configure the execution with `set_processing()` {: #edsnlp.core.stream.Stream.set_processing }

You can configure how the operations performed in the stream are executed by calling its `set_processing(...)` method. The following options are available:

::: edsnlp.core.stream.Stream.set_processing
    options:
        heading_level: 3
        sections: ['text', 'parameters']
        header: false
        show_source: false

## Backends

The `backend` parameter of the `set_processing()` method supports the following values:
### `simple` {: #edsnlp.processing.simple.execute_simple_backend }

::: edsnlp.processing.simple.execute_simple_backend
    options:
        heading_level: 3
        show_source: false

### `multiprocessing` {: #edsnlp.processing.multiprocessing.execute_multiprocessing_backend }

::: edsnlp.processing.multiprocessing.execute_multiprocessing_backend
    options:
        heading_level: 3
        show_source: false
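
A typical configuration might look like the following sketch; the `deterministic` flag is the one discussed in the Batching section below:

```{ .python .no-check }
data = data.set_processing(
    backend="multiprocessing",
    # CPU processes for IO, preprocessing and rule-based pipes
    num_cpu_workers=4,
    # GPU workers for deep-learning pipes
    num_gpu_workers=2,
    # Keep the output in the same order as the input
    deterministic=True,
)
```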
|
|
### `spark` {: #edsnlp.processing.spark.execute_spark_backend }

::: edsnlp.processing.spark.execute_spark_backend
    options:
        heading_level: 3
        show_source: false
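
For instance, a hedged sketch, assuming a Spark session is available and that the `"omop"` and `"ents"` converters fit your data:

```{ .python .no-check }
import edsnlp

nlp = edsnlp.load("path/to/model")

data = edsnlp.data.read_parquet("path/to/input_folder", converter="omop")
data = data.map_pipeline(nlp)
data = data.set_processing(backend="spark")
data.write_parquet("path/to/output_folder", converter="ents")
```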
|
|
## Batching

Many operations rely on batching, either to be more efficient or because they require a fixed-size input. The `batch_size` and `batch_by` arguments of the `map_batches()` method let you specify the size of the batches and the function used to compute the size of a batch.

```{ .python .no-check }
# Accumulate in chunks of 1024 documents
lengths = data.map_batches(len, batch_size=1024)

# Accumulate in chunks of 100 000 words
lengths = data.map_batches(len, batch_size=100_000, batch_by="words")
# or
lengths = data.map_batches(len, batch_size="100_000 words")
```

We also support special values for `batch_size` which use "sentinels" (i.e., markers inserted in the stream) to delimit the batches.

```{ .python .no-check }
# Accumulate every element of the input in a single batch
# which is useful when looping over the data in training
lengths = data.map_batches(len, batch_size="dataset")

# Accumulate in chunks of fragments, in the case of parquet datasets
lengths = data.map_batches(len, batch_size="fragments")
```

Note that these batch functions are only available under specific conditions:

- either `backend="simple"`, or `deterministic=True` (the default) if `backend="multiprocessing"`; otherwise elements might be processed out of order
- every preceding op must have been elementwise (e.g., `map()`, `map_gpu()`, `map_pipeline()`, and no generator function), or `sentinel_mode` must have been explicitly set to `"split"` in `map_batches()`; otherwise the sentinels are dropped by default when the user requires batching (see the sketch below)
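
To make the second condition concrete, here is a small sketch:

```{ .python .no-check }
# map_pipeline() is elementwise, so the "dataset" sentinel is preserved
data = data.map_pipeline(nlp)
# Propagate the sentinel explicitly through this batching op
lengths = data.map_batches(len, batch_size="dataset", sentinel_mode="split")
```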