Cortex is an open source platform that takes machine learning models (trained with nearly any framework) and turns them into production web APIs in one command.
- Autoscaling: Cortex automatically scales APIs to handle production workloads (see the sketch after this list).
- Multi-framework: Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
- CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
- Rolling updates: Cortex updates deployed APIs without any downtime.
- Log streaming: Cortex streams logs from deployed models to your CLI.
- Prediction monitoring: Cortex monitors network metrics and tracks predictions.
- Minimal configuration: Deployments are defined in a single `cortex.yaml` file.
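As a rough sketch of what autoscaling configuration might look like, replica bounds can be set alongside the other `compute` resources in `cortex.yaml`. The `min_replicas` and `max_replicas` fields below are assumptions based on typical Cortex configuration and may differ across versions; check the documentation for your release:

```yaml
# Hypothetical autoscaling sketch: min_replicas and max_replicas are
# assumed field names and may vary by Cortex version.
- kind: api
  name: classifier
  predictor:
    path: predictor.py
  compute:
    min_replicas: 1
    max_replicas: 10
    gpu: 1
```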
```python
# predictor.py

model = download_my_model()

def predict(sample, metadata):
    return model.predict(sample["text"])
```
```yaml
# cortex.yaml

- kind: deployment
  name: sentiment

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  tracker:
    model_type: classification
  compute:
    gpu: 1
```
```bash
$ cortex deploy

created endpoint: http://***.amazonaws.com/sentiment/classifier
```
```bash
$ curl http://***.amazonaws.com/sentiment/classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "the movie was great!"}'

positive
```
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

class      count
positive   8
negative   4
```
The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), Flask, TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
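Since each deployed API is an ordinary HTTP web service, it can be queried from any HTTP client, not just `curl`. A minimal Python sketch, assuming the `requests` package and the endpoint URL printed by `cortex deploy` (the `***` placeholder stands in for the real load balancer hostname):

```python
import requests

# Endpoint printed by `cortex deploy`; substitute your actual URL.
endpoint = "http://***.amazonaws.com/sentiment/classifier"

# Send a JSON payload matching what predict() reads in predictor.py.
response = requests.post(endpoint, json={"text": "the movie was great!"})
print(response.text)  # e.g. "positive"
```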
- Sentiment analysis in TensorFlow with BERT
- Image classification in TensorFlow with Inception
- Text generation in PyTorch with DistilGPT2
- Iris classification in XGBoost / ONNX