Description
I'm experiencing an issue where memory allocated by loading a Tiktoken encoding (e.g., for "gpt-4o") is not released even after setting the tokenizer reference to None and calling gc.collect(). This leads to persistent high memory usage, which is problematic for long-running applications or environments with limited resources.
Steps to Reproduce
- Install tiktoken version 0.9.0.
- Run the following Python script:
    import psutil
    import gc

    process = psutil.Process()

    def get_memory():
        print(process.memory_info().rss / 1000000)

    get_memory()  # Baseline memory

    import tiktoken
    get_memory()  # After importing tiktoken

    tokenizer = tiktoken.encoding_for_model("gpt-4o").encode
    get_memory()  # After loading the encoding

    tokenizer = None
    gc.collect()
    get_memory()  # After releasing reference and GC

Expected Behavior
After setting tokenizer = None and calling gc.collect(), the memory usage should decrease back to near the level after importing tiktoken (around 18 MB in my test), as the encoding object is no longer referenced.
Actual Behavior
Memory remains at the elevated level (around 117 MB) even after releasing the reference and garbage collection. Output from my run:
    13.729792
    18.284544
    117.604352
    117.604352

Environment
- Python version: 3.11
- Tiktoken version: 0.9.0
- OS: macOS 15.3.1
Additional Context
Is there a recommended way to explicitly release the memory used by the tiktoken encoding? For example, does tiktoken cache encodings internally, or is there a method to unload them? Any workarounds or fixes would be appreciated, as this impacts memory management in production scenarios.
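To illustrate the kind of internal caching I suspect might be involved, here is a minimal sketch. It does not use tiktoken at all: `_CACHE`, `FakeEncoding`, `get_encoding`, and `is_alive_after_release` are hypothetical stand-ins, and whether tiktoken actually behaves this way is an assumption on my part. It only shows that an object held by a module-level cache survives `gc.collect()` after the caller drops its own reference.

```python
import gc
import weakref

# Hypothetical stand-in for a library-internal, module-level cache.
_CACHE = {}


class FakeEncoding:
    """Hypothetical stand-in for a large encoding object."""

    def __init__(self, name):
        self.name = name
        self.table = list(range(100_000))  # simulated large payload

    def encode(self, text):
        return [ord(ch) for ch in text]


def get_encoding(name):
    """Return a cached encoding, building it on first use."""
    if name not in _CACHE:
        _CACHE[name] = FakeEncoding(name)
    return _CACHE[name]


def is_alive_after_release(clear_cache):
    """Drop the caller's reference and report whether the object survives."""
    _CACHE.clear()  # start from an empty cache
    tokenizer = get_encoding("demo").encode  # bound method references the object
    probe = weakref.ref(tokenizer.__self__)  # observe the object without owning it
    tokenizer = None  # release the caller's only reference
    if clear_cache:
        _CACHE.clear()  # also release the cache's reference
    gc.collect()
    return probe() is not None


if __name__ == "__main__":
    print(is_alive_after_release(clear_cache=False))
    print(is_alive_after_release(clear_cache=True))
```

In this sketch the object is only freed once the cache entry is removed as well. Even then, the RSS reported by psutil might not drop, since allocators do not always return freed pages to the OS.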