
Commit 92c1757

Allow people to just pass raw json strings to bulk helpers
This should allow for significant speedups when indexing json documents from a file
1 parent 5d4640d commit 92c1757

1 file changed: +11 -0 lines

elasticsearch/helpers/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -4,6 +4,8 @@
 from multiprocessing.dummy import Pool
 from operator import methodcaller
 
+import six
+
 from ..exceptions import ElasticsearchException, TransportError
 from ..compat import map

@@ -25,6 +27,10 @@ def expand_action(data):
     action/data lines needed for elasticsearch's
     :meth:`~elasticsearch.Elasticsearch.bulk` api.
     """
+    # when given a string, assume user wants to index raw json
+    if isinstance(data, six.string_types):
+        return '{"index": {}}', data
+
     # make sure we don't alter the action
     data = data.copy()
     op_type = data.pop('_op_type', 'index')
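To make the new branch concrete, here is a minimal sketch in plain Python 3 that mimics what `expand_action` now does with a string, and how the resulting (action, data) pairs end up in the newline-delimited bulk body. The names `expand_action_sketch` and `build_bulk_body` are hypothetical. `str` stands in for `six.string_types`, the dict path is deliberately simplified (only `_op_type`, `_index`, `_type` and `_id` are handled, unlike the real helper), and it assumes, as the speedup claim implies, that strings are written to the body without being re-serialized.

```python
import json


def expand_action_sketch(data):
    """Hypothetical stand-in for expand_action, illustrating the new string branch."""
    if isinstance(data, str):
        # Raw JSON string: nothing can be extracted, so pair it with a bare index action.
        return '{"index": {}}', data
    # Simplified dict path (assumption, not the library's full logic):
    # copy so the caller's dict is not altered, then move metadata into the action.
    data = data.copy()
    op_type = data.pop('_op_type', 'index')
    action = {op_type: {}}
    for key in ('_index', '_type', '_id'):
        if key in data:
            action[op_type][key] = data.pop(key)
    return action, data


def build_bulk_body(items):
    """Serialize items into a newline-delimited bulk request body."""
    lines = []
    for item in items:
        action, data = expand_action_sketch(item)
        for part in (action, data):
            # Strings are emitted untouched; only dicts go through json.dumps.
            lines.append(part if isinstance(part, str) else json.dumps(part))
    return '\n'.join(lines) + '\n'
```

A raw string is never parsed or re-encoded, which is presumably where the speedup comes from; the trade-off is that its action line is always the bare `{"index": {}}`.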
@@ -155,6 +161,11 @@ def streaming_bulk(client, actions, chunk_size=500, max_chunk_bytes=100 * 1014 *
     Alternatively, if `_source` is not present, it will pop all metadata fields
     from the doc and use the rest as the document data.
 
+    When reading raw json strings from a file, you can also pass them in. In
+    that case, however, you lose the ability to specify anything (index, type,
+    even id) on a per-record basis, all documents will just be sent to
+    elasticsearch to be indexed as-is.
+
     The :meth:`~elasticsearch.Elasticsearch.bulk` api accepts `index`, `create`,
     `delete`, and `update` actions. Use the `_op_type` field to specify an
     action (`_op_type` defaults to `index`)::
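The file-reading use case described in the new docstring paragraph might look like the sketch below, assuming a JSON Lines file (one complete document per line). Because the bulk body is newline-delimited, each raw string must fit on a single line; a pretty-printed, multi-line document would corrupt the request. `iter_raw_json_lines` and `index_json_lines` are hypothetical helpers. Forwarding `index` and `doc_type` through `helpers.bulk(**kwargs)` to `client.bulk()`, where they act as request-level defaults for actions that carry no metadata, is an assumption about elasticsearch-py releases from this era (newer releases drop `doc_type`).

```python
import io


def iter_raw_json_lines(fileobj):
    """Yield one raw JSON document per non-blank line, without parsing it."""
    for line in fileobj:
        line = line.strip()  # drop the trailing newline and surrounding whitespace
        if line:  # blank lines are not documents; skip them
            yield line


def index_json_lines(client, path, index, doc_type):
    """Hypothetical wrapper: bulk-index every line of `path` as-is.

    Assumption: `index` and `doc_type` are passed through **kwargs to
    client.bulk(), giving all raw documents the same target index and type.
    """
    from elasticsearch import helpers  # imported lazily; requires elasticsearch-py

    with io.open(path, encoding='utf-8') as f:
        return helpers.bulk(client, iter_raw_json_lines(f),
                            index=index, doc_type=doc_type)
```

Only the generator can be exercised without a cluster; `index_json_lines` needs a live `Elasticsearch` client. Per-document ids or indices still require dict actions rather than raw strings.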
