Skip to content

Conversation

@andreip
Copy link

@andreip andreip commented Dec 14, 2015

Description

I've called elasticsearch.helpers.reindex with a custom request_timeout present in both scan_kwargs and bulk_kwargs. But that's not sent down to client.scroll which would send it to self.transport.perform_request which would then use it.

The first call from scan helper does forward the **kwargs: client.search(body=query, scroll=scroll, **kwargs), so I'd say client.scroll should as well. Because this wasn't happening, I eventually got a timeout near the end of a huge reindexing with the below output, even though I supplied a request_timeout=600.

.... File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 342, in _change_doc_index for h in hits: File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 283, in scan resp = client.scroll(scroll_id, scroll=scroll) File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped return func(*args, params=params, **kwargs) File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 662, in scroll params=params, body=body) File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) File "/Users/andrei/.virtualenvs/genesis/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 86, in perform_request raise ConnectionTimeout('TIMEOUT', str(e), e) elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='139f46bdca40404618dec7e5a8ecad29.us-east-1.aws.found.io', port=9243): Read timed out. (read timeout=10))

Later edit

I saw the failing tests, **kwargs wasn't thought as being forwarded on scroll. And adding an inline case for forwarding only request_timeout there is a hack. But I was trying to make use of the fact that it's gonna be added to params in here. If there's an easy fix here I'd appreciate a suggestion though.

Sending `request_timeout` in **kwargs to scan or reindex helpers wouldn't be passed to `client.scroll` call, so that underlying `perform_request` would still use a default timeout=10.
@honzakral
Copy link
Contributor

Thanks for this PR! It makes a lot of sense but we need to revise it a little bit - the search request supports more kwargs than the scroll api, passing the same kwargs to both can therefore be dangerous and might produce errors.

What do you think would be a better solution?

  • have a global_kwargs (or some better name) that would be passed to both + search_ and scroll_ kwargs?
  • pass the kwargs to search and add another kwarg called scan_kwargs (similar to what reindex does)
  • try to detect which kwarg goes where (my least favorite option since it would tie it to specific api version and would make maintenance more difficult)
@andreip
Copy link
Author

andreip commented Jan 1, 2016

@honzakral I followed your first suggestion, that seemed a clean way to go for me. But now you can send kwargs to the client.search call in multiple ways:

  • the old way of calling scan was by forwarding all from **kwargs of the scan method to the client.search call
  • the new search_kwargs kwarg I added (due to symmetry and being more clean, since I also added scroll_kwargs and global_kwargs to the scan method)

If I were to remove the old way of sending parameters to the client.search (since having multiple ways might create confusion?), then that would be fine for this repo, only a few modifications I'd have to do, but the previous code of people using the library might not work with new release. Should I leave it backwards compatible as is now?

@dtran320
Copy link

👍 for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these need to be reversed - start with global_kwargs.copy() and then do .update(search_kwargs) - that way search_kwargs will have priority and will be able to override global_kwargs. Other way might be counter intuitive.

@honzakral
Copy link
Contributor

Sorry for the delay (and thanks to @dtran320 for the ping!). I left some line notes to clear things up a bit. I am more than happy to implement these but would like your opinion.

Thanks!

@fxdgear
Copy link
Contributor

fxdgear commented Sep 8, 2017

bum @andreip this PR is pretty out of date and has some merge conflicts. Would you be willing to update this?

@andreip
Copy link
Author

andreip commented Sep 30, 2017

@fxdgear thanks for the ping! I see that @honzakral fixed the main problem in d4efb81 . This is a great short fix, that was the issue I had too, only scroll wasn't using request_timeout; bulk was forwarding that properly to transport.perform_request.

Gonna close this. 👍

@andreip andreip closed this Sep 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants