
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).


wyzhang/JetStream

 
 


JetStream is a throughput and memory optimized engine for LLM inference on XLA devices.

About

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

JetStream Engine Implementation

Two reference engine implementations are currently available: one for JAX models and one for PyTorch models.

JAX

PyTorch

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

make install-deps 

Run local server & test

Use the following commands to run a server locally:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
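The start-then-query sequence above can also be scripted so that the server is always shut down after the check. The following is a minimal sketch, not something shipped with JetStream: `run_against_server` and its fixed `startup_wait` are hypothetical, and the sketch assumes the mock server is ready a few seconds after launch and that the requester exits with a nonzero status on failure.

```python
import subprocess
import sys
import time

# Module paths are taken from the commands above.
SERVER_CMD = [sys.executable, "-m", "jetstream.core.implementations.mock.server"]
REQUESTER_CMD = [sys.executable, "-m", "jetstream.tools.requester"]


def run_against_server(server_cmd, client_cmd, startup_wait=5.0):
    """Start server_cmd, run client_cmd once, and always stop the server.

    Hypothetical helper that returns the client's exit code. The fixed sleep
    is an assumption; a sturdier script would poll for readiness instead.
    """
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(startup_wait)
        # If the server has already exited, the client would only fail to connect.
        if server.poll() is not None:
            raise RuntimeError(f"server exited early with code {server.returncode}")
        return subprocess.run(client_cmd, check=False).returncode
    finally:
        # Stop the server whether the client succeeded, failed, or raised.
        if server.poll() is None:
            server.terminate()
            try:
                server.wait(timeout=10)
            except subprocess.TimeoutExpired:
                server.kill()
                server.wait()
```

Calling `run_against_server(SERVER_CMD, REQUESTER_CMD)` from an environment where JetStream is installed would then run one request against the mock server and return the requester's exit code.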

Test core modules

# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server

# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils
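The test modules listed above can also be run in one pass and summarized. This is a sketch under assumptions rather than a script that ships with JetStream: `CORE_TEST_MODULES` simply repeats the module names from the commands above, while `build_command`, `run_modules`, and the injectable `runner` parameter are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# Module names copied from the unittest commands above.
CORE_TEST_MODULES = [
    "jetstream.tests.core.test_orchestrator",
    "jetstream.tests.core.test_server",
    "jetstream.tests.engine.test_mock_engine",
    "jetstream.tests.engine.test_token_utils",
    "jetstream.tests.engine.test_utils",
]


def build_command(module):
    """Return the argv equivalent of `python -m unittest -v <module>`."""
    return [sys.executable, "-m", "unittest", "-v", module]


def run_modules(modules, runner=subprocess.run):
    """Run each module's tests and return the names of the modules that failed.

    `runner` is injectable (hypothetical design choice) so the loop can be
    exercised without JetStream installed.
    """
    failed = []
    for module in modules:
        result = runner(build_command(module), check=False)
        if result.returncode != 0:
            failed.append(module)
    return failed
```

`run_modules(CORE_TEST_MODULES)` returns an empty list when every module passes, so a wrapper script can exit with a nonzero status whenever the list is non-empty.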

