Unable to Release Memory Used by Tiktoken Encoding After Setting to None and GC Collect #414

Description

I'm experiencing an issue where memory allocated by loading a Tiktoken encoding (e.g., for "gpt-4o") is not released, even after setting the tokenizer reference to None and calling gc.collect(). In my test this leaves roughly 100 MB of resident memory allocated for the rest of the process's life, which is a problem for long-running applications and memory-constrained environments.

Steps to Reproduce

Install tiktoken version 0.9.0.
Run the following Python script:

import psutil
import gc

process = psutil.Process()

def get_memory():
    print(process.memory_info().rss / 1000000)

get_memory()  # Baseline memory
import tiktoken
get_memory()  # After importing tiktoken
tokenizer = tiktoken.encoding_for_model("gpt-4o").encode
get_memory()  # After loading the encoding
tokenizer = None
gc.collect()
get_memory()  # After releasing reference and GC

Expected Behavior

After setting tokenizer = None and calling gc.collect(), resident memory (RSS) should drop back to roughly the post-import level (around 18 MB in my test), since the encoding object is no longer referenced.

Actual Behavior

RSS stays at about 117 MB after the reference is released and gc.collect() runs. Output from my run (MB, in the order of the four get_memory() calls):

13.729792
18.284544
117.604352
117.604352
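RSS is an imperfect signal here: even after objects are freed, CPython's allocator and the system malloc may keep the pages mapped, so RSS need not fall. The standard-library `tracemalloc` module reports memory held by live Python allocations instead. The sketch below uses a hypothetical `build_table()` to allocate many small objects, then shows traced memory dropping once they are freed. Note that `tracemalloc` only traces allocations made through Python's memory allocators, so memory allocated inside a native extension (tiktoken's core is written in Rust) would not show up in it.

```python
import gc
import tracemalloc

tracemalloc.start()


def build_table(n):
    # Hypothetical workload: many small Python objects, loosely like a token table.
    return {i.to_bytes(4, "little"): i for i in range(n)}


table = build_table(200_000)
after_build, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes

table = None
gc.collect()
after_free, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"traced after build: {after_build / 1e6:.1f} MB")
print(f"traced after free:  {after_free / 1e6:.1f} MB (peak {peak / 1e6:.1f} MB)")
```

Running the same measurement alongside the psutil numbers would help separate "objects still alive" from "objects freed but RSS not returned to the OS".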

Environment

Python version: 3.11
Tiktoken version: 0.9.0
OS: macOS 15.3.1

Additional Context

Is there a recommended way to explicitly release the memory used by the tiktoken encoding? For example, does tiktoken cache encodings internally, or is there a method to unload them? Any workarounds or fixes would be appreciated, as this impacts memory management in production scenarios.
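One general workaround, independent of how tiktoken manages its cache, is to do the tokenization in a short-lived child process: whatever the child allocates, including native memory, is returned to the OS when it exits. The sketch below is an assumption-laden illustration: `count_tokens_isolated()` and the `_CHILD` script are hypothetical, and a whitespace split stands in for the real tokenizer so it runs without tiktoken installed.

```python
import json
import subprocess
import sys

# Hypothetical child script. In real use it would `import tiktoken` and load the
# encoding here; a whitespace split stands in so the sketch runs anywhere.
_CHILD = r"""
import json, sys
texts = json.load(sys.stdin)
json.dump([len(t.split()) for t in texts], sys.stdout)
"""


def count_tokens_isolated(texts):
    """Run tokenization in a fresh interpreter and return per-text token counts.

    Memory used by the child is reclaimed by the OS when the child exits,
    so the parent's RSS does not stay elevated.
    """
    result = subprocess.run(
        [sys.executable, "-c", _CHILD],
        input=json.dumps(texts),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(result.stdout)


print(count_tokens_isolated(["hello world", "one two three"]))  # [2, 3]
```

The trade-off is startup cost: each call pays for interpreter startup and, in real use, for loading the encoding again, so batching texts per call (or keeping one long-lived worker and restarting it periodically) is likely preferable.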
