- Notifications
You must be signed in to change notification settings - Fork 19.7k
Description
System Info
langchain/document_loaders/web_base.py > works for me only when i change:
return await response.text() with:
body = await response.read() return body.decode('utf-8', errors='ignore') otherwise:
der Code produziert leider einen Fehler:
/home/codespace/.py
thon/current/bin/python3 /workspaces/b3rn_zero_ai/notebooks/ignite_vectorstore.py
Fetching pages: 13%|###8 | 33/256 [00:03<00:19, 11.18it/s]Traceback (most recent call last):
File "/workspaces/b3rn_zero_ai/notebooks/ignite_vectorstore.py", line 68, in
documents = loader.load()
File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/document_loaders/sitemap.py", line 142, in load
results = self.scrape_all([el["loc"].strip() for el in els if "loc" in el])
File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/document_loaders/web_base.py", line 168, in scrape_all
results = asyncio.run(self.fetch_all(urls))
File "/home/codespace/.local/lib/python3.10/site-packages/nest_asyncio.py", line 35, in run
return loop.run_until_complete(task)
File "/home/codespace/.local/lib/python3.10/site-packages/nest_asyncio.py", line 90, in run_until_complete
return f.result()
File "/home/codespace/.python/current/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/codespace/.python/current/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/document_loaders/web_base.py", line 148, in fetch_all
return await tqdm_asyncio.gather(
File "/home/codespace/.python/current/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in gather
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/home/codespace/.python/current/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
File "/home/codespace/.python/current/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/home/codespace/.python/current/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/codespace/.python/current/lib/python3.10/asyncio/tasks.py", line 234, in __step
result = coro.throw(exc)
File "/home/codespace/.python/current/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
return i, await f
File "/home/codespace/.python/current/lib/python3.10/asyncio/futures.py", line 285, in await
yield self # This tells Task to wait for completion.
File "/home/codespace/.python/current/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
future.result()
File "/home/codespace/.python/current/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/codespace/.python/current/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/document_loaders/web_base.py", line 136, in _fetch_with_rate_limit
return await self._fetch(url)
File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/document_loaders/web_base.py", line 120, in _fetch
return await response.text()
File "/home/codespace/.python/current/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1086, in text
return self._body.decode( # type: ignore[no-any-return,union-attr]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte
Fetching pages: 15%|####4 | 38/256 [00:04<00:23, 9.25it/s]
Who can help?
No response
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
I tried to make embedding from a website in "french" language.
Expected behavior
we need a solution when : UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte