Unable to Release Memory Used by Tiktoken Encoding After Setting to None and GC Collect #414

@kimhyun5u

Description

I'm experiencing an issue where memory allocated by loading a Tiktoken encoding (e.g., for "gpt-4o") is not released even after setting the tokenizer reference to None and calling gc.collect(). This leads to persistent high memory usage, which is problematic for long-running applications or environments with limited resources.

Steps to Reproduce

  1. Install tiktoken version 0.9.0.
  2. Run the following Python script:
     import psutil
     import gc

     process = psutil.Process()

     def get_memory():
         print(process.memory_info().rss / 1000000)

     get_memory()  # Baseline memory
     import tiktoken
     get_memory()  # After importing tiktoken
     tokenizer = tiktoken.encoding_for_model("gpt-4o").encode
     get_memory()  # After loading the encoding
     tokenizer = None
     gc.collect()
     get_memory()  # After releasing reference and GC

Expected Behavior

After setting tokenizer = None and calling gc.collect(), memory usage should drop back to roughly the level measured right after importing tiktoken (around 18 MB in my test), since the encoding object is no longer referenced.

Actual Behavior

Memory remains at the elevated level (around 117 MB) even after releasing the reference and garbage collection. Output from my run:

     13.729792
     18.284544
     117.604352
     117.604352

Environment

  • Python version: 3.11
  • Tiktoken version: 0.9.0
  • OS: macOS 15.3.1

Additional Context

Is there a recommended way to explicitly release the memory used by the tiktoken encoding? For example, does tiktoken cache encodings internally, or is there a method to unload them? Any workarounds or fixes would be appreciated, as this impacts memory management in production scenarios.
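To make the caching question concrete, here is a minimal sketch of how a module-level cache could produce this behavior. It uses only the standard library. The names `FakeEncoding`, `_CACHE`, `get_encoding`, and `is_alive_after_release` are hypothetical stand-ins for illustration, not tiktoken's actual internals. The sketch shows that dropping the caller's reference does not free an object while a cache still holds it.

```python
import gc
import weakref


class FakeEncoding:
    """Hypothetical stand-in for a large tokenizer object (not tiktoken's class)."""

    def __init__(self, name):
        self.name = name
        self.table = list(range(100_000))  # simulate a large vocabulary table

    def encode(self, text):
        return [ord(ch) for ch in text]


# Hypothetical module-level cache, as a library might keep internally.
_CACHE = {}


def get_encoding(name):
    """Return a cached FakeEncoding, creating it on first use."""
    if name not in _CACHE:
        _CACHE[name] = FakeEncoding(name)
    return _CACHE[name]


def is_alive_after_release(clear_cache):
    """Drop the caller's reference and report whether the object survives."""
    tokenizer = get_encoding("demo").encode  # bound method references the object
    probe = weakref.ref(tokenizer.__self__)  # observe the object without owning it
    tokenizer = None
    if clear_cache:
        _CACHE.clear()  # remove the last strong reference
    gc.collect()
    return probe() is not None


if __name__ == "__main__":
    print("alive with cache intact:", is_alive_after_release(clear_cache=False))
    print("alive after clearing cache:", is_alive_after_release(clear_cache=True))
```

If tiktoken keeps a similar cache, that would explain why setting `tokenizer = None` has no effect on its own. Note that even once an object is truly freed, RSS may not fall immediately, because the memory allocator can keep freed pages for reuse rather than returning them to the OS.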
