
Conversation

@nfcampos (Contributor) commented Apr 3, 2023

No description provided.

@agola11 (Collaborator) left a comment


Exposing a generator on the callback handler via stream will work, but the main issue is that with the current API, we don't have control over which LLM will stream tokens. Several chains (like ConversationalRetrieval) make multiple calls to the LLM, and we'd only want to stream back the results from the last one. I think there are a couple of things we can do:

  1. Add the StreamCallbackHandler that @nfcampos introduced in this diff, but remove the llm parameter from the constructor of the LangChainPoeHandler class, replacing it with the callback handler. The set-up for streaming is done outside of the PoeHandler, since it will vary from chain to chain. It would look something like this (for ConversationalRetrievalQA):
```python
# Construct a ConversationalRetrievalChain with a streaming llm for combine docs
# and a separate, non-streaming llm for question generation
llm = OpenAI(temperature=0)
stream_handler = StreamCallbackHandler()
streaming_llm = OpenAI(
    streaming=True,
    callback_manager=AsyncCallbackManager([stream_handler]),
    verbose=True,
    temperature=0,
)
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(streaming_llm, chain_type="stuff", prompt=QA_PROMPT)
qa = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
)
poe_handler = LangChainPoeHandler(qa, stream_handler)
```

In `LangChainPoeHandler.get_response`:

```python
async def get_response(self, query):
    # Kick off the chain in the background, then consume tokens as they arrive.
    run = asyncio.create_task(self.chain.acall(query))
    async for token in self.callback_handler.stream():
        yield token
    await run
```

We can add an as_poe_handler helper to some of the common chains to cut down on this boilerplate. (A sketch of what the handler itself could look like follows after this list.)

  2. A second, more involved approach, which we probably don't want to do now but will want to revisit in the future, is to add the option of returning a generator of tokens on BaseChain via a method called stream (which is what we have for some LLMs already; see the sketch after this list). Initially, when we implemented streaming, we used callbacks because it was the easiest solution and didn't require any API changes to chains/etc. However, some users might want more flexibility and the ability to interact with the raw generator.
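For reference, here is a minimal sketch of the pattern behind approach 1: a callback handler that buffers streamed tokens on an asyncio.Queue and exposes them as an async iterator. Per the title change below, the PR ultimately landed this idea as AsyncIteratorCallbackHandler; the sketch uses the name from the comment and is an assumption about the pattern, not the merged code.

```python
import asyncio
from typing import Any, AsyncIterator

from langchain.callbacks.base import AsyncCallbackHandler


class StreamCallbackHandler(AsyncCallbackHandler):
    """Buffer streamed tokens on an asyncio.Queue and expose them as an
    async iterator. Simplified sketch, not the merged implementation."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.done = asyncio.Event()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Called by the streaming LLM for each newly generated token.
        # Only the LLM this handler is attached to streams here, which is
        # how the setup above streams just the final combine-docs call.
        self.queue.put_nowait(token)

    async def on_llm_end(self, response: Any, **kwargs: Any) -> None:
        # Signal consumers that no more tokens are coming.
        self.done.set()

    async def stream(self) -> AsyncIterator[str]:
        # Yield tokens until the LLM run finishes and the queue drains.
        while not (self.done.is_set() and self.queue.empty()):
            get_token = asyncio.create_task(self.queue.get())
            done_wait = asyncio.create_task(self.done.wait())
            finished, pending = await asyncio.wait(
                {get_token, done_wait}, return_when=asyncio.FIRST_COMPLETED
            )
            for task in pending:
                task.cancel()
            if get_token in finished:
                yield get_token.result()
```

And approach 2, if it ever lands, might reduce the call site to something like this (a hypothetical API shape, not anything that exists in this diff):

```python
# Hypothetical: BaseChain.stream() returns an async generator of tokens,
# removing the callback plumbing from the call site entirely.
async def get_response(self, query):
    async for token in self.chain.stream(query):
        yield token
```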
@nfcampos force-pushed the nc/poe-handler-standalone branch from 84112f6 to 04cf9f0 on April 4, 2023 at 12:48
@nfcampos changed the title from "Nc/poe handler standalone" to "Add AsyncIteratorCallbackHandler" on April 4, 2023
@nfcampos marked this pull request as ready for review on April 4, 2023 at 12:49
@hwchase17 merged commit 6f39e88 into master on April 8, 2023
@hwchase17 deleted the nc/poe-handler-standalone branch on April 8, 2023 at 21:34
@lmeyerov

Variants of #2 are definitely of interest here: we're trying to bind observers to various orchestrations as part of a more responsive experience, and potentially more interesting autonomy modes. Meanwhile, we're working through variants of #1, so we appreciate the confirmation of the approach. Overall, good to see this discussion!

