Commit 22943e7

Move bulk helpers' docs outside of streaming_bulk's docstring
1 parent 50a9902 commit 22943e7

2 files changed: +84 -42 lines changed

docs/helpers.rst

Lines changed: 81 additions & 0 deletions
@@ -7,12 +7,93 @@ Collection of simple helper functions that abstract some specifics of the raw
 API.


+Bulk helpers
+------------
+
+There are several helpers for the ``bulk`` API since its requirement for
+specific formatting and other considerations can make it cumbersome if used directly.
+
+All bulk helpers accept an instance of the ``Elasticsearch`` class and an iterable
+``actions`` (any iterable, can also be a generator, which is ideal in most
+cases since it will allow you to index large datasets without the need to
+load them into memory).
+
+The items in the ``actions`` iterable should be the documents we wish to index,
+in one of several formats. The most common one is the same as returned by
+:meth:`~elasticsearch.Elasticsearch.search`, for example:
+
+.. code:: python
+
+    {
+        '_index': 'index-name',
+        '_type': 'document',
+        '_id': 42,
+        '_parent': 5,
+        '_ttl': '1d',
+        '_source': {
+            "title": "Hello World!",
+            "body": "..."
+        }
+    }
+
+Alternatively, if ``_source`` is not present, the helper pops all metadata fields
+from the doc and uses the rest as the document data:
+
+.. code:: python
+
+    {
+        "_id": 42,
+        "_parent": 5,
+        "title": "Hello World!",
+        "body": "..."
+    }
+
+The :meth:`~elasticsearch.Elasticsearch.bulk` API accepts ``index``, ``create``,
+``delete``, and ``update`` actions. Use the ``_op_type`` field to specify an
+action (``_op_type`` defaults to ``index``):
+
+.. code:: python
+
+    {
+        '_op_type': 'delete',
+        '_index': 'index-name',
+        '_type': 'document',
+        '_id': 42,
+    }
+    {
+        '_op_type': 'update',
+        '_index': 'index-name',
+        '_type': 'document',
+        '_id': 42,
+        'doc': {'question': 'The life, universe and everything.'}
+    }
+
+
+.. note::
+
+    When reading raw JSON strings from a file, you can also pass them in
+    directly (without decoding to dicts first). In that case, however, you lose
+    the ability to specify anything (index, type, even id) on a per-record
+    basis; all documents will just be sent to Elasticsearch to be indexed
+    as-is.
+
+
 .. py:module:: elasticsearch.helpers

 .. autofunction:: streaming_bulk

+.. autofunction:: parallel_bulk
+
 .. autofunction:: bulk

+
+Scan
+----
+
 .. autofunction:: scan

+
+Reindex
+-------
+
 .. autofunction:: reindex
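
The action formats documented in this file (a ``_source`` wrapper, a flat dict with metadata fields, an ``_op_type`` override, and raw JSON strings) can be illustrated with a small stand-alone sketch. This is not the library's implementation: `split_action` and `META_FIELDS` are hypothetical names, and the list of metadata fields is deliberately limited to the ones that appear in the examples above.

```python
# Hypothetical, abbreviated list of metadata keys, taken only from the
# examples in the docs above; the real helper may recognise other fields.
META_FIELDS = ("_index", "_type", "_id", "_parent", "_ttl")


def split_action(item):
    """Split one item into (action line, document body or None)."""
    # Raw JSON strings are passed through untouched, so no per-record
    # metadata (index, type, id) can be attached to them.
    if isinstance(item, str):
        return {"index": {}}, item

    item = dict(item)  # shallow copy so the caller's dict is not mutated
    op_type = item.pop("_op_type", "index")  # defaults to ``index``
    meta = {key: item.pop(key) for key in META_FIELDS if key in item}
    action = {op_type: meta}

    if op_type == "delete":
        return action, None  # deletes carry no document body
    # With ``_source`` present it is the body; otherwise whatever is left
    # after popping the metadata fields is used as the document data.
    return action, item.get("_source", item)


flat = {"_id": 42, "_parent": 5, "title": "Hello World!", "body": "..."}
action, body = split_action(flat)
```

Because each item is handled independently, the same function can be applied lazily, for example `(split_action(item) for item in actions)`, which keeps the memory benefit of passing a generator.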

elasticsearch/helpers/__init__.py

Lines changed: 3 additions & 42 deletions
@@ -142,45 +142,6 @@ def streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=100 * 1014 *
     bulk that returns summary information about the bulk operation once the
     entire input is consumed and sent.

-    This function expects the action to be in the format as returned by
-    :meth:`~elasticsearch.Elasticsearch.search`, for example::
-
-        {
-            '_index': 'index-name',
-            '_type': 'document',
-            '_id': 42,
-            '_parent': 5,
-            '_ttl': '1d',
-            '_source': {
-                ...
-            }
-        }
-
-    Alternatively, if `_source` is not present, it will pop all metadata fields
-    from the doc and use the rest as the document data.
-
-    When reading raw json strings from a file, you can also pass them in. In
-    that case, however, you lose the ability to specify anything (index, type,
-    even id) on a per-record basis, all documents will just be sent to
-    elasticsearch to be indexed as-is.
-
-    The :meth:`~elasticsearch.Elasticsearch.bulk` api accepts `index`, `create`,
-    `delete`, and `update` actions. Use the `_op_type` field to specify an
-    action (`_op_type` defaults to `index`)::
-
-        {
-            '_op_type': 'delete',
-            '_index': 'index-name',
-            '_type': 'document',
-            '_id': 42,
-        }
-        {
-            '_op_type': 'update',
-            '_index': 'index-name',
-            '_type': 'document',
-            '_id': 42,
-            'doc': {'question': 'The life, universe and everything.'}
-        }

     :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use
     :arg actions: iterable containing the actions to be executed
@@ -208,8 +169,8 @@ def bulk(client, actions, stats_only=False, **kwargs):
     information - number of successfully executed actions and either list of
     errors or number of errors if `stats_only` is set to `True`.

-    See :func:`~elasticsearch.helpers.streaming_bulk` for more information
-    and accepted formats.
+    See :func:`~elasticsearch.helpers.streaming_bulk` for more accepted
+    parameters.

     :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use
     :arg actions: iterator containing the actions
@@ -240,7 +201,7 @@ def parallel_bulk(client, actions, thread_count=4, chunk_size=500,
         max_chunk_bytes=100 * 1014 * 1024,
         expand_action_callback=expand_action, **kwargs):
     """
-    Parallel version of the bulk helper.
+    Parallel version of the bulk helper run in multiple threads at once.

     :arg client: instance of :class:`~elasticsearch.Elasticsearch` to use
     :arg actions: iterator containing the actions
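
The `bulk` docstring in this diff describes its return value as the number of successfully executed actions plus either a list of errors or, when `stats_only` is set, a count of errors. A minimal sketch of that aggregation follows. It assumes each per-action result arrives as an `(ok, item)` pair; that pair shape and the name `summarize` are assumptions made for illustration and are not taken from this diff.

```python
def summarize(results, stats_only=False):
    """Fold an iterable of (ok, item) pairs into (success_count, errors)."""
    success = 0
    # With stats_only we only count failures; otherwise we collect them.
    errors = 0 if stats_only else []

    for ok, item in results:
        if ok:
            success += 1
        elif stats_only:
            errors += 1
        else:
            errors.append(item)

    return success, errors


# Hypothetical results: two successes and one failure.
sample = [(True, {"index": {"_id": 1}}),
          (False, {"index": {"_id": 2, "error": "..."}}),
          (True, {"index": {"_id": 3}})]
full = summarize(sample)
counts = summarize(sample, stats_only=True)
```

Consuming the results one at a time means the summary is only available once the entire input has been processed, which matches the wording of the docstring above.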
