The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish machine learning models using Amazon SageMaker and AWS Step Functions. You can create machine learning workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.
- Workflow - A sequence of steps designed to perform some work
- Step - A unit of work within a workflow
- ML Pipeline - A type of workflow used in data science to create and train machine learning models
The AWS Step Functions Data Science SDK enables you to do the following.
- Easily construct and run machine learning workflows that use AWS infrastructure directly in Python
- Instantiate common training pipelines
- Create standard machine learning workflows in a Jupyter notebook from templates
- Getting Started With Sample Jupyter Notebooks
- Installing the AWS Step Functions Data Science SDK
- Overview
- AWS Permissions
- Licensing
- Verifying the Signature
The best way to quickly review how the AWS Step Functions Data Science SDK works is to review the related example notebooks. These notebooks provide code and descriptions for creating and running workflows in AWS Step Functions Using the AWS Step Functions Data Science SDK.
In Amazon SageMaker, example Jupyter notebooks are available in the example notebooks portion of a notebook instance. To run the example notebooks, do the following.
- Either Create a Notebook Instance or Access an Existing notebook instance.
- Select the SageMaker Examples tab.
- Choose a notebook in the Step Functions Data Science SDK section and select Use.
For more information, see Example Notebooks in the Amazon SageMaker documentation.
To run the AWS Step Functions Data Science SDK example notebooks locally, download the sample notebooks and open them in a working Jupyter instance.
- Install Jupyter: https://jupyter.readthedocs.io/en/latest/install.html
- Download the following files from: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/step-functions-data-science-sdk.
hello_world_workflow.ipynb
machine_learning_workflow_abalone.ipynb
training_pipeline_pytorch_mnist.ipynb
- Open the files in Jupyter.
The AWS Step Functions Data Science SDK is built to PyPI and can be installed with pip as follows.
pip install stepfunctions
You can install from source by cloning this repository and running a pip install command in the root directory of the repository:
git clone https://github.com/aws/aws-step-functions-data-science-sdk-python.git cd aws-step-functions-data-science-sdk-python pip install .
The AWS Step Functions Data Science SDK supports Unix/Linux and Mac.
The AWS Step Functions Data Science SDK is tested on:
- Python 2.7
- Python 3.6
The AWS Step Functions Data Science SDK provides a Python API that enables you to create data science and machine learning workflows using AWS Step Functions and SageMaker directly in your Python code and Jupyter notebooks.
Using this SDK you can:
- Create steps that accomplish tasks.
- Chain those steps together into workflows.
- Include retry, succeed, or fail steps.
- Review a graphical representation and definition for your workflow.
- Create a workflow in AWS Step Functions.
- Start and review executions in AWS Step Functions.
For a detailed API reference of the AWS Step Functions Data Science SDK, be sure to view this documentation on Read the Docs.
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that combine services such as Amazon SageMaker, AWS Lambda, and Amazon Elastic Container Service (Amazon ECS), into feature-rich applications. Workflows are made up of a series of steps, with the output of one step acting as input to the next.
The AWS Step Functions Data Science SDK provides access to AWS Step Functions so that you can easily create and run machine learning and data science workflows directly in Python, and inside your Jupyter Notebooks. Workflows are created locally in Python, but when they are ready for execution, the workflow is first uploaded to the AWS Step Functions service for execution in the cloud.
When you use the SDK to create, update, or execute workflows you are talking to the Step Functions service in the cloud. Your workflows live in AWS Step Functions and can be re-used.
You can execute a workflow as many times as you want, and you can optionally change the input each time. Each time you execute a workflow, it creates a new execution instance in the cloud. You can inspect these executions with SDK commands, or with the Step Functions management console. You can run more than one execution at a time.
Using this SDK you can create steps, chain them together to create a workflow, create that workflow in AWS Step Functions, and execute the workflow in the AWS cloud.
Once you have created your workflow in AWS Step Functions, you can execute that workflow in Step Functions, in the AWS cloud.
Step Functions creates workflows out of steps called States, and expresses that workflow in the Amazon States Language. When you create a workflow in the AWS Step Functions Data Science SDK, it creates a State Machine representing your workflow and steps in AWS Step Functions.
For more information about Step Functions concepts and use, see the Step Functions documentation.
You create steps using the SDK, and chain them together into sequential workflows. Then, you can create those workflows in AWS Step Functions and execute them in Step Functions directly from your Python code. For example, the following is how you define a pass step.
start_pass_state = Pass( state_id="MyPassState" )The following is how you define a wait step.
wait_state = Wait( state_id="Wait for 3 seconds", seconds=3 )The following example shows how to define a Lambda step, and then defines a Retry and a Catch.
lambda_state = LambdaStep( state_id="Convert HelloWorld to Base64", parameters={ "FunctionName": "MyLambda", #replace with the name of your function "Payload": { "input": "HelloWorld" } } ) lambda_state.add_retry(Retry( error_equals=["States.TaskFailed"], interval_seconds=15, max_attempts=2, backoff_rate=4.0 )) lambda_state.add_catch(Catch( error_equals=["States.TaskFailed"], next_step=Fail("LambdaTaskFailed") ))After you define these steps, chain them together into a logical sequence.
workflow_definition=Chain([start_pass_state, wait_state, lambda_state])Once the steps are chained together, you can define the workflow definition.
workflow = Workflow( name="MyWorkflow_v1234", definition=workflow_definition, role=stepfunctions_execution_role )The following generates a graphical representation of your workflow.
workflow.render_graph(portrait=False)The following renders the JSON of the Amazon States Language definition of the workflow you created.
print(workflow.definition.to_json(pretty=True))The following creates the workflow in AWS Step Functions.
workflow.create()The following starts an execution of your workflow in AWS Step Functions.
execution = workflow.execute(inputs={ "IsHelloWorldExample": True })The following generates an AWS CloudFormation Template to deploy your workflow.
get_cloudformation_template()The generated template contains only the StateMachine resource. To reuse the CloudFormation template in a different region, please make sure to update the region specific AWS resources (such as the Lambda ARN and Training Image) in the StateMachine definition.
As a managed service, AWS Step Functions performs operations on your behalf on AWS hardware that is managed by AWS Step Functions. AWS Step Functions can perform only operations that the user permits. You can read more about which permissions are necessary in the AWS Documentation.
The AWS Step Functions Data Science SDK should not require any additional permissions aside from what is required for using .AWS Step Functions. However, if you are using an IAM role with a path in it, you should grant permission for iam:GetRole.
AWS Step Functions Data Science SDK is licensed under the Apache 2.0 License. It is copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/

