
blob42 commented Feb 21, 2023

hwchase17 and others added 30 commits February 2, 2023 19:54
This does not involve a separator, and will naively chunk input text at the appropriate boundaries in token space. This is helpful if we have strict token length limits, need to follow the specified chunk size exactly, and can't use aggressive separators like spaces to guarantee the absence of long strings. CharacterTextSplitter will let these strings through without splitting them, which could cause overflow errors downstream. Splitting at arbitrary token boundaries is not ideal, but this is hopefully mitigated by a decent overlap quantity. This also produces chunks with exactly the desired number of tokens, instead of sometimes overcounting when shorter strings are concatenated. Potentially also helps with langchain-ai#528.
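A minimal sketch of token-space chunking, assuming the text has already been encoded into a list of integer token ids by some tokenizer (the helper `split_token_ids` is hypothetical, not the library's API). Each window holds at most `chunk_size` tokens, and consecutive windows share `chunk_overlap` tokens:

```python
from typing import List


def split_token_ids(token_ids: List[int], chunk_size: int, chunk_overlap: int) -> List[List[int]]:
    """Split a token-id sequence into fixed-size windows with overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start : start + chunk_size])
        # Stop once this window reaches the end, so the tail is not emitted twice.
        if start + chunk_size >= len(token_ids):
            break
        start += step
    return chunks


# Usage: ten token ids, windows of 4 tokens overlapping by 1.
print(split_token_ids(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk of ids would then be decoded back to text with the same tokenizer, so every chunk stays within the token limit by construction.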
Add the ability to retry when certain exceptions are raised by `openai.Completion.create`.

Test plan: ran all OpenAI integration tests.
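A library-agnostic sketch of retrying a call with exponential backoff when specific exceptions are raised. The exception class `TransientAPIError` and the helper `call_with_retry` are hypothetical stand-ins for whatever errors and wrapper the PR actually uses:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable error such as a rate limit or timeout."""


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransientAPIError,),
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn, retrying only on the given exceptions, with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Usage: a call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientAPIError("temporary failure")
    return "ok"


print(call_with_retry(flaky, base_delay=0.0))  # prints "ok"
```

Exceptions not listed in `retry_on` propagate immediately. In practice a library such as tenacity provides the same pattern with more options.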
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: Frank Liu <frank.liu@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
Co-authored-by: Frank Liu <frank@frankzliu.com>
Just noticed this little typo while reading the docs, thought I'd open a PR!
The `re.DOTALL` flag in Python's `re` (regular expression) module makes the `.` (dot) metacharacter match newline characters as well as every other character. Without `re.DOTALL`, `.` matches any character except a newline.
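A quick illustration of the difference, using only the standard `re` module:

```python
import re

text = "start\nend"

# Without re.DOTALL, "." stops at the newline, so the pattern cannot span both lines.
print(re.search(r"start.*end", text))  # None

# With re.DOTALL, "." also matches "\n", so the pattern matches the whole string.
match = re.search(r"start.*end", text, re.DOTALL)
print(match.group(0) if match else None)  # prints the full two-line string
```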
I was passing the prompt in directly as a string and getting nonsense outputs. I had to inspect the source code to realize that the first argument should be a list. It would be nice to have an explicit error or warning, since this seems like it could be a common mistake.
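The nonsense output is consistent with how Python treats a string passed where a list is expected: iterating over a string yields its individual characters, so each character would be treated as its own prompt. A minimal sketch of the kind of explicit check suggested above; `generate` here is a hypothetical stand-in, not the library's method:

```python
from typing import List


def generate(prompts: List[str]) -> List[str]:
    """Hypothetical batch API: returns one (fake) completion per prompt."""
    # A bare string is iterable, so without this check every character
    # would silently be treated as a separate prompt.
    if isinstance(prompts, str):
        raise TypeError("generate() expects a list of prompts; wrap a single prompt as [prompt]")
    return [f"completion for: {p}" for p in prompts]


print([p for p in "hi"])  # ['h', 'i']: what iterating over a string yields
print(generate(["hi"]))   # ['completion for: hi']
```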
PR to fix outdated environment details in the docs; see issue langchain-ai#897. I added code comments as pointers to where users can get API keys and where they can find the relevant environment variables.
Fix for issue langchain-ai#906: switches `[i : i + batch_size]` to `[i : i_end]` in the Pinecone `from_texts` method.
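A minimal sketch of the batching pattern, in which the end index `i_end` is computed once per batch and used for every slice of that batch. The helper `iter_batches` is hypothetical and only illustrates the indexing, not the Pinecone integration itself:

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Tuple[int, int, List[T]]]:
    """Yield (start, end, batch) triples; every slice of a batch shares the same bounds."""
    for i in range(0, len(items), batch_size):
        # Clamp the end index once so the last, shorter batch is handled uniformly.
        i_end = min(i + batch_size, len(items))
        yield i, i_end, list(items[i:i_end])


texts = ["a", "b", "c", "d", "e"]
for start, end, batch in iter_batches(texts, batch_size=2):
    print(start, end, batch)
```

Any parallel list for the same batch (ids, metadata) can then be sliced with the same `start:end` bounds, so all slices stay aligned.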
langchain-ai#899) This allows the LLM to correct its previous command by looking at the error message output by the shell. Additionally, this uses `subprocess.run`, because it is now recommended over `subprocess.check_output`: https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module

Co-authored-by: Amos Ng <me@amos.ng>
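A minimal sketch of the idea, assuming commands are passed as an argument list (the helper `run_command` is hypothetical). `subprocess.run` with `capture_output=True` collects stdout and stderr; on a non-zero exit code the error text is returned instead of raised, so it can be fed back to the LLM on its next step:

```python
import subprocess
import sys
from typing import List


def run_command(args: List[str]) -> str:
    """Run a command; on failure, return the error text instead of raising."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        # Hand the error back as ordinary text so the model can read it and fix its command.
        return f"Exit code {result.returncode}: {result.stderr.strip()}"
    return result.stdout


# Usage: one successful and one failing Python one-liner, run with the current interpreter.
print(run_command([sys.executable, "-c", "print('hello')"]))
print(run_command([sys.executable, "-c", "import sys; sys.exit('boom')"]))
```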
Basic integration test for Pinecone.
Co-authored-by: Eno Reyes <enoreyes@gmail.com>
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
Co-authored-by: Gabriel Simmons <simmons.gabe@gmail.com>
nan-wang and others added 28 commits February 19, 2023 21:15
Add missing links to the TOC.

Signed-off-by: Nan Wang <nan.wang@jina.ai>
Co-authored-by: Michael Chen <flamingdescent@gmail.com> Co-authored-by: Michael Chen <michaelchen@stripe.com>
- Fix notebook formatting, remove empty cells, and add scrolling for long text.

Co-authored-by: blob42 <spike@w530>
### Description

This PR adds a wrapper that adds support for the OpenSearch vector database. Using the opensearch-py client, it ingests the embeddings of the given texts into an OpenSearch cluster with the Bulk API.

We can perform `similarity_search` on the index using the three popular search methods of the OpenSearch k-NN plugin:

- `Approximate k-NN Search` uses approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch's script scoring functionality to execute a brute-force, exact k-NN search.
- `Painless Scripting` adds the distance functions as Painless extensions that can be used in more complex combinations. It also supports brute-force, exact k-NN search, like Script Scoring.

### Issues Resolved

langchain-ai#1054

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
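A sketch of the request bodies behind the first two search methods, modeled on the query DSL documented for the OpenSearch k-NN plugin. The field name `vector_field`, the default `space_type`, and the helper function names are assumptions for illustration, not the wrapper's actual defaults:

```python
from typing import Any, Dict, List


def approximate_knn_query(vector: List[float], k: int, field: str = "vector_field") -> Dict[str, Any]:
    """Approximate k-NN: the index's ANN structure (nmslib, faiss, or Lucene) finds the neighbors."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def script_score_query(
    vector: List[float], k: int, field: str = "vector_field", space_type: str = "l2"
) -> Dict[str, Any]:
    """Exact (brute-force) k-NN: every document matched by the inner query is scored by knn_score."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {"field": field, "query_value": vector, "space_type": space_type},
                },
            }
        },
    }


body = approximate_knn_query([0.1, 0.2, 0.3], k=4)
print(body["query"]["knn"]["vector_field"]["k"])  # 4
```

With opensearch-py, a body like this would be sent with `client.search(index=..., body=body)`. The approximate query trades some recall for speed, while script scoring visits every matching document and is exact.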
Lets a chain prompt the user for more input as a part of its execution.
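A minimal sketch of a step that pauses for user input, with the prompt and input functions injectable so it can be exercised without a terminal. `HumanInputStep` is a hypothetical name, not the class added by the PR:

```python
from dataclasses import dataclass
from typing import Callable


def _default_prompt(text: str) -> None:
    print(text)


@dataclass
class HumanInputStep:
    """Hypothetical chain step that asks the user a question and returns the answer."""

    prompt_func: Callable[[str], None] = _default_prompt
    input_func: Callable[[], str] = input

    def run(self, question: str) -> str:
        self.prompt_func(question)  # show the question to the user
        return self.input_func()    # block until the user answers


# Usage: a canned answer stands in for real keyboard input.
step = HumanInputStep(input_func=lambda: "blue")
print(step.run("What is your favourite colour?"))  # prints the question, then "blue"
```

Injecting `input_func` keeps the default interactive behavior while letting tests supply answers.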
Added a GitBook document loader. It lets you either (1) fetch text from any single GitBook page, or (2) fetch all relative paths and return their respective content as Documents. I've modified the `scrape` method in the `WebBaseLoader` to accept custom web paths if given, but I'm happy to remove that and move the logic into the `GitbookLoader` itself.
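A standard-library sketch of mode (2), collecting relative links from a page's HTML and resolving them against the site's base URL. The class and function names, and the rule that a relative path is an `href` starting with a single `/`, are assumptions; this is not necessarily how `GitbookLoader` selects its paths:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class RelativeLinkParser(HTMLParser):
    """Collect distinct href values of <a> tags that are site-relative paths like "/intro"."""

    def __init__(self) -> None:
        super().__init__()
        self.paths: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            # Keep "/path" links; skip absolute URLs and protocol-relative "//host" links.
            if name == "href" and value and value.startswith("/") and not value.startswith("//"):
                if value not in self.paths:
                    self.paths.append(value)


def relative_page_urls(html: str, base_url: str) -> List[str]:
    """Return absolute URLs for every distinct relative link found in html."""
    parser = RelativeLinkParser()
    parser.feed(html)
    return [urljoin(base_url, path) for path in parser.paths]


html = '<nav><a href="/intro">Intro</a><a href="/api">API</a><a href="https://example.com">Ext</a></nav>'
print(relative_page_urls(html, "https://docs.example.com"))
# ['https://docs.example.com/intro', 'https://docs.example.com/api']
```

Each resulting URL could then be fetched and turned into one Document per page.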
For persistence, it's convenient to have a default collection name which gets used everywhere.
langchain-ai#1153) It is useful to be able to specify `verbose` or `memory` while still keeping the chain's overall structure.

Co-authored-by: Francisco Ingham <>
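A minimal sketch of the idea: a constructor helper fixes the chain's structure but forwards extra keyword arguments such as `verbose` or `memory`. `SimpleChain` and `build_chain` are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SimpleChain:
    """Hypothetical chain whose structure (the prompt) is fixed by a factory."""

    prompt: str
    verbose: bool = False
    memory: Optional[Any] = None


def build_chain(**kwargs: Any) -> SimpleChain:
    """Fix the chain's structure while passing options like verbose or memory through."""
    return SimpleChain(prompt="Answer the question: {question}", **kwargs)


chain = build_chain(verbose=True)
print(chain.verbose)  # True
```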
Co-authored-by: OmriNach <32659330+OmriNach@users.noreply.github.com>
When I try to import the class HuggingFaceEndpoint, I get an ImportError: cannot import name 'HuggingFaceEndpoint' from 'langchain' (langchain version 0.0.88). These two imports work fine: `from langchain import HuggingFacePipeline` and `from langchain import HuggingFaceHub`. So I corrected the import statement in the example. There is probably a better solution, but this fixes the error for me.
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
Conceptually, there is no reason a tool should know what an "agent action" is. Unless there are objections, I can make this change in all callback handlers.
…ons (langchain-ai#1208)

### Summary

Corrects the install instruction for local inference to `pip install "unstructured[local-inference]"`.
blob42 closed this Feb 21, 2023
blob42 pushed a commit that referenced this pull request May 4, 2023
Without the `--no-sandbox` argument, loading documents from a URL with Selenium in Chrome fails with the following error:

```
Traceback (most recent call last):
  File "/data//playgroud/try_langchain.py", line 343, in <module>
    langchain_doc_loader()
  File "/data//playgroud/try_langchain.py", line 67, in langchain_doc_loader
    documents = loader.load()
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 102, in load
    driver = self._get_driver()
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 76, in _get_driver
    return Chrome(options=chrome_options)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
    super().__init__(
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
    self.start_session(capabilities, browser_profile)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55cf8da1bfe3 <unknown>
#1 0x55cf8d75ad36 <unknown>
#2 0x55cf8d783b20 <unknown>
#3 0x55cf8d77fa9b <unknown>
#4 0x55cf8d7c1af7 <unknown>
#5 0x55cf8d7c111f <unknown>
#6 0x55cf8d7b8693 <unknown>
#7 0x55cf8d78b03a <unknown>
#8 0x55cf8d78c17e <unknown>
#9 0x55cf8d9dddbd <unknown>
#10 0x55cf8d9e1c6c <unknown>
#11 0x55cf8d9eb4b0 <unknown>
#12 0x55cf8d9e2d63 <unknown>
#13 0x55cf8d9b5c35 <unknown>
#14 0x55cf8da06138 <unknown>
#15 0x55cf8da062c7 <unknown>
#16 0x55cf8da14093 <unknown>
#17 0x7f3da31a72de start_thread
```

The fix adds the option `chrome_options.add_argument("--no-sandbox")` for Chrome.
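A minimal sketch of assembling the Chrome arguments, kept free of a Selenium dependency so it runs anywhere; `chrome_arguments` is a hypothetical helper. `--no-sandbox` disables Chrome's sandbox, which is commonly required when Chrome runs as root (for example inside a container), but it also removes a security boundary, so it is best enabled only where needed:

```python
from typing import List, Optional


def chrome_arguments(extra: Optional[List[str]] = None, no_sandbox: bool = True) -> List[str]:
    """Build a de-duplicated list of Chrome command-line arguments (hypothetical helper)."""
    args: List[str] = list(extra or [])
    if no_sandbox and "--no-sandbox" not in args:
        # Chrome refuses to start as root unless the sandbox is disabled.
        args.append("--no-sandbox")
    return args


print(chrome_arguments(["--headless"]))  # ['--headless', '--no-sandbox']
```

Each argument would then be passed to Selenium with `chrome_options.add_argument(arg)` before constructing `Chrome(options=chrome_options)`, the call that fails in the traceback above.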