17 June 2021
SCALING AI IN PRODUCTION USING PYTORCH
Geeta Chauhan, PyTorch Partner Engineering, Facebook AI (@chauhang)
MLOps World 2021
AGENDA
01 Challenges with ML in Production
02 TorchServe Overview
03 Best Practices for Production Deployment
PYTORCH COMMUNITY GROWTH
Source: https://paperswithcode.com/trends
CHALLENGES WITH ML IN DEPLOYMENT
(Diagram: serving pipeline, cloud or on-prem, with preprocessing, application logic, and postprocessing stages.)
Key concerns: performance, ease of use, cost efficiency, deployment at scale
INFERENCE AT SCALE
Deploying and managing models in production is difficult. Some of the pain points include:
• Loading and managing multiple models, on multiple servers or end devices
• Running pre-processing and post-processing code on prediction requests
• How to log, monitor and secure predictions
• What happens when you hit scale?
TORCHSERVE
Easily deploy PyTorch models in production at scale
• Default handlers for common tasks
• Low latency model serving
• Works with any ML environment
TORCHSERVE
• Default handlers for common use cases (e.g., image segmentation, text classification), custom handler support for other use cases, and a Model Zoo
• Multi-model serving, model versioning and ability to roll back to an earlier version
• Automatic batching of individual inferences across HTTP requests
• Logging including common metrics, and the ability to incorporate custom metrics
• Robust HTTP APIs: Management and Inference
(Architecture diagram: torch-model-archiver packages model .pth files into .mar archives in <path>/model_store; torchserve --start serves them, exposing the Inference API on http://localhost:8080/..., the Management API on http://localhost:8081/..., and a Metrics API, with logging, metrics, and multiple models served concurrently.)
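Once the server is running and a model is registered, clients talk to it over plain HTTP. A minimal sketch with Python's requests library, assuming TorchServe is running locally on the default ports; the model name "model1" and input file "kitten.jpg" are placeholders:

import requests

# Liveness check against the Inference API (default port 8080)
print(requests.get("http://localhost:8080/ping").json())        # {"status": "Healthy"}

# List registered models via the Management API (default port 8081)
print(requests.get("http://localhost:8081/models").json())

# Send a prediction request; the request body format depends on the model's handler
with open("kitten.jpg", "rb") as f:                              # placeholder input file
    resp = requests.post("http://localhost:8080/predictions/model1", data=f)
print(resp.json())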
TORCHSERVE DETAIL: MODEL HANDLERS
TorchServe has default model handlers that perform boilerplate data transforms for common cases:
• Image Classification
• Image Segmentation
• Object Detection
• Text Classification
You can also create custom model handlers for any model and inference task, for example:

import torch

class MyModelHandler(object):
    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # get GPU status & device handle
        # load model & supporting files (vocabularies etc.)
        self.initialized = True

    def preprocess(self, data):
        # put incoming data into a tensor
        # transform as needed for your model
        return data

    def inference(self, data):
        # run predictions with the loaded model
        return data

    def postprocess(self, output):
        # process inference output, e.g. extract top K
        # package output for web delivery
        return output

_service = MyModelHandler()

# Module-level entry point called by TorchServe for each request batch
def handle(data, context):
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    data = _service.preprocess(data)
    data = _service.inference(data)
    data = _service.postprocess(data)
    return data
MODEL ARCHIVE
torch-model-archiver: CLI tool for packaging all model artifacts into a single deployment unit
• Model checkpoints, or a model definition file with a state_dict
• TorchScript and eager mode support
• Extra files like vocab, config, index_to_name mapping

torch-model-archiver \
    --model-name BERTSeqClassification_Torchscript \
    --version 1.0 \
    --serialized-file Transformer_model/traced_model.pt \
    --handler ./Transformer_handler_generalized.py \
    --extra-files "./setup_config.json,./Seq_classification_artifacts/index_to_name.json"

setup_config.json:
{
    "model_name": "bert-base-uncased",
    "mode": "sequence_classification",
    "do_lower_case": "True",
    "num_labels": "2",
    "save_mode": "torchscript",
    "max_length": "150"
}

torchserve --start \
    --model-store model_store \
    --models <path-to model-file/s3-url/azure-blob-url>

https://github.com/pytorch/serve/tree/master/model-archiver#creating-a-model-archive
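As a rough illustration of the flow after archiving, the .mar can be registered through the Management API and queried through the Inference API. This sketch assumes the archive sits in the model store TorchServe was started with, and that the generalized Transformer handler accepts raw text in the request body; the input sentence is just an example:

import requests

# Register the archive and spin up one worker (Management API, port 8081)
requests.post(
    "http://localhost:8081/models",
    params={"url": "BERTSeqClassification_Torchscript.mar", "initial_workers": 1},
)

# Run a sequence-classification request (Inference API, port 8080)
resp = requests.post(
    "http://localhost:8080/predictions/BERTSeqClassification_Torchscript",
    data="Bloomberg has reported on the economy",    # example input text
)
print(resp.json())    # predicted label, mapped via index_to_name.json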
DYNAMIC BATCHING
Via custom handlers
• Model-configuration based
• batch_size: maximum batch size
• max_batch_delay: the maximum time TorchServe waits to receive batch_size requests before running inference
• (Coming soon) Batching support in default handlers

curl localhost:8081/models/resnet-152
{
  "modelName": "resnet-152",
  "modelUrl": "https://s3.amazonaws.com/model-server/model_archive_1.0/examples/resnet-152-batching/resnet-152.ma",
  "runtime": "python",
  "minWorkers": 1,
  "maxWorkers": 1,
  "batchSize": 8,
  "maxBatchDelay": 10,
  "workers": [
    {
      "id": "9008",
      "startTime": "2019-02-19T23:56:33.907Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 607715328
    }
  ]
}
https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md
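Batching parameters are passed when the model is registered. A small sketch of setting them through the Management API, assuming a resnet-152.mar archive is already in the model store:

import requests

# batch_size and max_batch_delay (ms) are per-model settings on registration
requests.post(
    "http://localhost:8081/models",
    params={
        "url": "resnet-152.mar",      # assumed to be present in the model store
        "batch_size": 8,
        "max_batch_delay": 10,        # wait up to 10 ms to fill a batch of 8
        "initial_workers": 1,
    },
)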
METRICS
Out-of-box metrics with the ability to extend
• CPU, disk, memory utilization
• Request type counts
• ts.metrics class for extension
• Types supported: size, percentage, counter, general metric
• Prometheus metrics support available

# Access context metrics as follows
metrics = context.metrics

# Create Dimension objects; dimensions are name/value pairs
from ts.metrics.dimension import Dimension
dim1 = Dimension(name, value)
...
dimN = Dimension(name_n, value_n)

# Add distance as a general metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
metrics.add_metric('DistanceInKM', distance, 'km', dimensions=dimensions)

# Add image size as a size metric
metrics.add_size('SizeOfImage', img_size, None, 'MB', dimensions)

# Add MemoryUtilization as a percentage metric
metrics.add_percent('MemoryUtilization', utilization_percent, None, dimensions)

# Create a counter with name 'LoopCount' and dimensions
metrics.add_counter('LoopCount', 1, None, dimensions)

# Log custom metrics
for metric in metrics.store:
    logger.info("[METRICS]%s", str(metric))

https://github.com/pytorch/serve/blob/master/docs/metrics.md
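For the Prometheus integration, TorchServe also exposes a separate Metrics API (port 8082 by default) in Prometheus text format. A quick sketch of reading it directly, assuming default ports:

import requests

metrics_text = requests.get("http://localhost:8082/metrics").text
for line in metrics_text.splitlines():
    if not line.startswith("#"):      # skip HELP/TYPE comment lines
        print(line)                   # e.g. ts_inference_requests_total{...}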
RECENT FEATURES
+ Ensemble model support, Captum model interpretability
+ Kubeflow Pipelines / KFServing integration with auto-scaling and canary rollout on any cloud or on-prem
+ GCP Vertex AI serverless pipelines
+ MLflow integration
+ Prometheus integration with Grafana
+ Multiple nodes on EC2, autoscaling on SageMaker/EKS, AWS Inferentia support
+ New MMF, NMT and DeepLabV3 examples
BEST PRACTICES FOR PRODUCTION DEPLOYMENTS
• Deployment models: standalone, primary/backup, orchestration, cloud vs. on-premises
• Optimizations: performance vs. latency, TorchScript, profiling, offline vs. real-time, cost
• Resilience: robust endpoint, auto-scaling, canary deployments, A/B testing
• Measurement: metrics, model performance, interpretability, feedback loop
• Responsible AI: fairness, human-centered design
RESPONSIBLE AI
Fairness by design
• Measure skewness of data, model bias, data bias; identify relevant metrics
• Transparency, explainable AI, inclusive design
Human-centered design
• Consider AI-driven decisions and their impact on people at the time of model design
• Provide the ability for human recourse rather than full automation; for example, a mortgage-application AI should not be able to reject applicants of a certain category or race with no path to human review
• For computer vision models, measure results across demographics; for example, include support for different skin tones and age groups
OPTIMIZATIONS
• Build with performance vs. latency goals in mind
• Reduce the size of the model: quantization, pruning, mixed-precision training
• Reduce latency: TorchScript model; use the SnakeViz profiler
• Evaluate GPU vs. CPU for low latency
• Evaluate REST vs. gRPC for your prediction service
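A sketch of two of these optimizations applied before packaging the model: post-training dynamic quantization of the linear layers and TorchScript conversion (the resulting .pt file is the kind of serialized file torch-model-archiver expects). Assumes torchvision is installed; the file name is arbitrary:

import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()

# Post-training dynamic quantization: store Linear weights as int8
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript via tracing; removes Python overhead at serving time
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(quantized, example)
traced.save("resnet50_quantized_traced.pt")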
QUANTIZATION
Model | fp32 accuracy | int8 accuracy (change) | Technique | CPU inference speedup
ResNet50 | 76.1 Top-1, ImageNet | 75.9 (-0.2) | Post-training | 2x (214ms → 102ms, Intel Skylake-DE)
MobileNetV2 | 71.9 Top-1, ImageNet | 71.6 (-0.3) | Quantization-aware training | 4x (75ms → 18ms, OnePlus 5, Snapdragon 835)
Translate / FairSeq | 32.78 BLEU, IWSLT 2014 de-en | 32.78 (0.0) | Dynamic (weights only) | 4x for encoder (Intel Skylake-SE)
These models and more are available on TorchHub: https://pytorch.org/hub/
BERT MODEL PROFILING
(SnakeViz profiles of BERT inference in eager mode vs. TorchScript mode.) TorchScript mode gives a 4x speedup over eager mode.
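A rough sketch of how such profiles can be generated: run the model under cProfile in eager mode and again after TorchScript tracing, then open the dumps with snakeviz. Assumes the Hugging Face transformers package; the model name and input text are just examples:

import cProfile
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True makes the model return tuples so it can be traced
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
).eval()
inputs = tokenizer("TorchServe makes deployment easier", return_tensors="pt")

with torch.no_grad():
    # Eager-mode profile; inspect with: snakeviz eager.prof
    profiler = cProfile.Profile()
    profiler.runcall(model, **inputs)
    profiler.dump_stats("eager.prof")

    # TorchScript (traced) profile; inspect with: snakeviz torchscript.prof
    traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))
    profiler = cProfile.Profile()
    profiler.runcall(traced, inputs["input_ids"], inputs["attention_mask"])
    profiler.dump_stats("torchscript.prof")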
OPTIMIZATIONS (CONTD.)
Offline vs. real-time predictions
• Offline: dynamic batching
• Online: async processing (push/poll)
• Pre-computed predictions for certain elements
Cost optimizations
• Spot instances for offline workloads
• Autoscaling based on metrics; on-demand clusters
• Evaluate supported AI accelerators like AWS Inferentia for a lower cost point
DEPLOYING MODELS IN PRODUCTION
(Matrix of deployment options across stages — develop/test, staging/experiments, production, large-scale production — and environments: on-prem, cloud, hybrid cloud, cloud managed.) Options include: install from source, standalone Docker, Minikube, self-managed Docker, AWS CloudFormation, cloud VMs/containers, microservices behind an API gateway, AWS SageMaker endpoints (BYOC), EKS/AKS/GKE, AWS SageMaker / GCP AI Platform, serverless functions, GCP Vertex AI, Kubernetes with Kubeflow/KFServing, MLflow and Kubeflow, Databricks Managed MLflow, primary/backup, ML microservices, autoscaling, and canary rollouts.
RESILIENCE
• Create a robust endpoint for serving, for example a SageMaker endpoint
• Auto-scaling with orchestrated deployments, multi-node on EC2, and other scenarios
• Canary deployments: test a new version of a model on a small subset of traffic before making it the default
• Shadow inference: deploy a new version of the model in parallel
• A/B testing of different versions of the model
MEASUREMENT
• Define model performance metrics, such as accuracy, while designing the AI service; these are use-case specific
• Add custom metrics as appropriate
• Use CloudWatch or Prometheus dashboards for monitoring model performance
• Model interpretability analysis via Captum
• Deploy with a feedback loop: if model accuracy drops over time or with a new version, analyze issues such as concept drift and stale data
FAIRNESS BY DESIGN
• Understand: how might the product's goals, its policy, and its implementation affect users from different subgroups? Identify contextual definitions of fairness
• Align: stakeholder conversations to find consensus and outline measurement and mitigation plans
• Measure: analyze model performance, label bias, outcomes, and other relevant signals
• Mitigate: address observed issues in datasets, models, policies, etc.
• Monitor: track the effect of mitigations on subgroups, and ensure the fairness analysis holds as the product adapts
CAPTUM
Model interpretability library for PyTorch: https://captum.ai/
(Example attribution visualization: text contributions 7.54, image contributions 11.19, total contributions 18.73.)
Support for attribution algorithms to interpret:
• Output predictions with respect to inputs
• Output predictions with respect to layers
• Neurons with respect to inputs
• Currently provides gradient- and perturbation-based approaches (e.g., Integrated Gradients)
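A small sketch of the Integrated Gradients workflow; the model and inputs here are made up for illustration:

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy two-class classifier standing in for a real model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
inputs = torch.rand(1, 4, requires_grad=True)

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(inputs, target=1, return_convergence_delta=True)
print(attributions)   # per-feature contribution to the class-1 score
print(delta)          # convergence delta (approximation error)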
DYNABOARD & FLORES 101 WMT COMPETITION
http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
https://github.com/facebookresearch/dynalab
https://dynabench.org/tasks/3#overall
COMMUNITY PROJECTS
https://github.com/cceyda/torchserve-dashboard
https://github.com/Unity-Technologies/SynthDet
https://medium.com/pytorch/how-wadhwani-ai-uses-pytorch-to-empower-cotton-farmers-14397f4c9f2b
FUTURE RELEASES
+ Improved memory and resource usage for better scalability
+ C++ backend for lower latency
+ Enhanced profiling tools
REFERENCES
• TorchServe: https://github.com/pytorch/serve
• Management API: https://github.com/pytorch/serve/blob/master/docs/management_api.md
• Inference API: https://github.com/pytorch/serve/blob/master/docs/inference_api.md
• Language translation ensemble example: https://github.com/pytorch/serve/tree/master/examples/Workflows/nmt_tranformers_pipeline
• BERT model example: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers
• Model Zoo: https://github.com/pytorch/serve/blob/master/docs/model_zoo.md
• SnakeViz visualizations: https://github.com/pytorch/serve/tree/master/benchmarks#visualize-snakeviz-results
• Logging: https://github.com/pytorch/serve/blob/master/docs/logging.md
• Metrics: https://github.com/pytorch/serve/blob/master/docs/metrics.md
• Prometheus metrics: https://github.com/pytorch/serve/blob/master/docs/metrics_api.md
• Batch inference: https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md
• Kubeflow Pipelines: https://github.com/kubeflow/pipelines/tree/master/components/PyTorch/pytorch-kfp-components
• Kubernetes support: https://github.com/pytorch/serve/blob/master/kubernetes/README.md
• TorchServe Dashboard (community): https://cceyda.github.io/blog/torchserve/streamlit/dashboard/2020/10/15/torchserve.html
• Custom handler community blog: https://towardsdatascience.com/deploy-models-and-create-custom-handlers-in-torchserve-fc2d048fbe91
• Captum interpretability for BERT models: https://github.com/pytorch/serve/blob/master/captum/Captum_visualization_for_bert.ipynb
• Operationalize, Scale and Infuse Trust in AI using KFServing: https://blog.kubeflow.org/release/official/2021/03/08/kfserving-0.5.html
QUESTIONS? Contact: Email: gchauhan@fb.com Linkedin: https://www.linkedin.com/in/geetachauhan/
