4 changes: 2 additions & 2 deletions README.md
@@ -269,7 +269,7 @@ To upgrade and rebuild `llama-cpp-python` add `--upgrade --force-reinstall --no-

The high-level API provides a simple managed interface through the [`Llama`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama) class.

-Below is a short example demonstrating how to use the high-level API to for basic text completion:
+Below is a short example demonstrating how to use the high-level API for basic text completion:

```python
from llama_cpp import Llama
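# The rest of this example is collapsed in the diff view. A minimal sketch of
# how it plausibly continues; the model path, prompt, and sampling parameters
# below are illustrative assumptions, not lines shown in this diff.
llm = Llama(model_path="./models/7B/llama-model.gguf")  # hypothetical local GGUF path
output = llm(
    "Q: Name the planets in the solar system? A: ",  # prompt to complete
    max_tokens=32,       # cap the number of generated tokens
    stop=["Q:", "\n"],   # stop at the start of a new question or a newline
    echo=True,           # include the prompt in the returned text
)
print(output["choices"][0]["text"])
```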
@@ -337,7 +337,7 @@ The high-level API also provides a simple interface for chat completion.
Chat completion requires that the model knows how to format the messages into a single prompt.
The `Llama` class does this using pre-registered chat formats (ie. `chatml`, `llama-2`, `gemma`, etc) or by providing a custom chat handler object.

-The model will will format the messages into a single prompt using the following order of precedence:
+The model will format the messages into a single prompt using the following order of precedence:
- Use the `chat_handler` if provided
- Use the `chat_format` if provided
- Use the `tokenizer.chat_template` from the `gguf` model's metadata (should work for most new models, older models may not have this)
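The precedence list above governs how `Llama.create_chat_completion` builds the prompt. For reference, a minimal sketch of chat completion through the high-level API; the model path and the `chat_format` value are assumptions for illustration, not content from this PR:

```python
from llama_cpp import Llama

# A minimal sketch, assuming a local GGUF model; the path and chat format
# below are placeholders.
llm = Llama(
    model_path="./models/7B/llama-model.gguf",  # hypothetical model path
    chat_format="llama-2",                      # one of the pre-registered chat formats
)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the planets in the solar system."},
    ]
)
print(response["choices"][0]["message"]["content"])
```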