r_examples/r_batch_transform/r_xgboost_batch_transform.ipynb (11 additions & 21 deletions)
@@ -8,13 +8,13 @@
 "\n",
 "**Note:** You will need to use the R kernel in SageMaker for this notebook.\n",
 "\n",
-"This sample Notebook describes how to do batch transform to make predictions for abalone age as measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
+"This sample Notebook describes how to do batch transform to make predictions for an abalone's age, which is measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
 "\n",
 "You can find more details about SageMaker's Batch Transform here: \n",
 "- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer\n",
 "\n",
 "We will use the `reticulate` library to interact with SageMaker:\n",
-"- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to make API calls [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
+"- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
 "\n",
 "Table of Contents:\n",
 "- [Reticulating the Amazon SageMaker Python SDK](#Reticulating-the-Amazon-SageMaker-Python-SDK)\n",
@@ -26,7 +26,7 @@
 "- [Download the Batch Transform Output](#Download-the-Batch-Transform-Output)\n",
 "\n",
 "\n",
-"**Note:** The first portion of this notebook focused on data ingestion and preparing the data for model training is inspired by the data preparation part outlined in [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook on AWS SageMaker Examples Github repository with some modifications."
+"**Note:** The first portion of this notebook, focused on data ingestion and preparing the data for model training, is inspired by the data preparation section outlined in the [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook in the AWS SageMaker Examples GitHub repository, with some modifications."
 ]
 },
 {
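The `reticulate` bullet above is the key setup step for both notebooks. As a minimal editor's sketch (not part of this diff) of what reticulating the SDK looks like, assuming the SageMaker Python SDK is installed in the active Python environment; the names `session`, `bucket`, and `role_arn` are illustrative:

```r
# Bridge R to the SageMaker Python SDK via reticulate.
library(reticulate)

sagemaker <- import("sagemaker")             # Python module exposed as an R object
session   <- sagemaker$Session()             # SageMaker session for the current region
bucket    <- session$default_bucket()        # default S3 bucket for this account
role_arn  <- sagemaker$get_execution_role()  # IAM role assumed by training jobs
```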
@@ -110,7 +110,7 @@
 "source": [
 "<h3>Downloading and Processing the Dataset</h3>\n",
 "\n",
-"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read the data, plot the data, and transform the data into ML format for Amazon SageMaker:"
+"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read, plot, and transform the data into ML format for Amazon SageMaker:"
 ]
 },
 {
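For readers skimming the diff, a hedged sketch of the download step this cell describes, assuming the standard UCI column order for the abalone file (the raw file has no header row):

```r
# Read the raw abalone data directly from the UCI repository; column names
# follow the dataset's documentation page.
library(readr)

abalone <- read_csv(
  "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
  col_names = c("sex", "length", "diameter", "height", "whole_weight",
                "shucked_weight", "viscera_weight", "shell_weight", "rings")
)
head(abalone)
```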
@@ -187,7 +187,7 @@
 "source": [
 "<h3>Preparing the Dataset for Model Training</h3>\n",
 "\n",
-"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithm require the target to be in the first column of the dataset."
+"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithms require the target to be in the first column of the dataset."
 ]
 },
 {
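One way the dummy-variable and column-reordering step could look in tidyverse style; this is a sketch assuming the `abalone` tibble from the earlier sketch, and the notebook's exact encoding may differ:

```r
library(dplyr)

abalone <- abalone %>%
  mutate(female = as.integer(sex == "F"),    # one indicator column per level
         male   = as.integer(sex == "M"),
         infant = as.integer(sex == "I")) %>%
  select(-sex) %>%
  select(rings, everything())                # SageMaker expects the target first
```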
@@ -231,24 +231,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Later in the notebook, we are going to use Batch Transform and Endpoint to make inference in two different ways and we will compare the results. The maximum number of rows that we can send to an endpoint for inference in one batch is 500 rows. We are going to reduce the number of rows for the test dataset to 500 and use this for batch and online inference for comparison. "
+"Upload the training and validation data to Amazon S3 so that you can train the model. First, write the training and validation datasets to the local filesystem in .csv format:"
 ...
+"Second, upload the two datasets to the Amazon S3 bucket into the `data` key:"
 ...
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -436,9 +426,9 @@
 "\n",
 "In many situations, using a deployed model for making inference is not the best option, especially when the goal is not to make online real-time inference but to generate predictions from a trained model on a large dataset. In these situations, using Batch Transform may be more efficient and appropriate.\n",
 "\n",
-"This section of the notebook explain how to set up the Batch Transform Job, and generate predictions.\n",
+"This section of the notebook explains how to set up the Batch Transform Job and generate predictions.\n",
 "\n",
-"To do this, we need to define the batch input data path on S3, and also where to save the generated predictions on S3."
+"To do this, we need to identify the batch input data path in S3 and specify where generated predictions will be stored in S3."
r_examples/r_xgboost_hpo_batch_transform/r_xgboost_hpo_batch_transform.ipynb (16 additions & 13 deletions)
@@ -6,13 +6,16 @@
 "source": [
 "<h1>Hyperparameter Optimization Using R with Amazon SageMaker</h1>\n",
 "\n",
-"This sample Notebook describes how to conduct Hyperparamter tuning and batch transform to make predictions for abalone age as measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
+"This sample Notebook demonstrates how to conduct Hyperparameter tuning and how to generate predictions for abalone age using two methods:\n",
 "\n",
-"We will use two methods to generate predictionsm after performin Hyperparameter Optimization (HPO). The goal is to demonstrate how each method works in R. These methods are:\n",
-"- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer\n",
-"- [Deploying the model](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html) as an endpoint and making inference using the endpoint \n",
+"- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer.\n",
+"- [Deploying the model](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html) as an endpoint and making online inferences. \n",
 "\n",
-"We will also use two different libraries to interact with SageMaker:\n",
+"The goal is to demonstrate how these methods work in R. \n",
+"\n",
+"Abalone age is measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). \n",
+"\n",
+"We will use two different libraries to interact with SageMaker:\n",
 "- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
 "- [`paws` library](https://cran.r-project.org/web/packages/paws/index.html): provides an interface to make API calls to AWS services, similar to how [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) works. `boto3` is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy-to-use, object-oriented API, as well as low-level access to AWS services. `paws` provides the same capabilities in R.\n",
 "\n",
@@ -33,8 +36,8 @@
 " - [Deleting the Endpoint](#Deleting-the-Endpoint)\n",
 "\n",
 "\n",
-"**Note:** The first portion of this notebook focused on data ingestion and preparing the data for model training is similar to the data preparation outlined in [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook on AWS SageMaker Examples Github repository with some modifications.\n",
-"Also the last portion of this notebook focused on making inference using an endpoint is inspired by the method outlined in the notebook referenced here."
+"**Note:** The first portion of this notebook, focused on data ingestion and preparing the data for model training, is similar to the data preparation outlined in the [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook in the AWS SageMaker Examples GitHub repository, with some modifications.\n",
+"The last portion of this notebook, focused on making inferences using an endpoint, is inspired by the method outlined in the notebook referenced [here](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/r_examples/r_end_2_end/r_sagemaker_abalone.ipynb)."
 ]
 },
 {
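As a quick sketch of the second method listed in the intro above (endpoint inference), assuming a fitted `HyperparameterTuner` named `tuner` as created later in the notebook; the instance type and names are illustrative:

```r
# Deploy the tuner's best model behind a real-time endpoint, then clean up.
endpoint <- tuner$deploy(initial_instance_count = 1L,
                         instance_type = "ml.t2.medium")

# ...call endpoint$predict(...) on serialized rows here...

endpoint$delete_endpoint()   # avoid ongoing charges once finished
```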
@@ -118,7 +121,7 @@
 "source": [
 "<h3>Downloading and Processing the Dataset</h3>\n",
 "\n",
-"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read the data, plot the data, and transform the data into ML format for Amazon SageMaker:"
+"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read, plot, and transform the data into ML format for Amazon SageMaker:"
 ]
 },
 {
@@ -195,7 +198,7 @@
 "source": [
 "<h3>Preparing the Dataset for Model Training</h3>\n",
 "\n",
-"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithm require the target to be in the first column of the dataset."
+"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithms require the target to be in the first column of the dataset."
 ]
 },
 {
@@ -322,7 +325,7 @@
 "source": [
 "<h3>Hyperparameter Tuning for the XGBoost Model</h3>\n",
 "\n",
-"Amazon SageMaker algorithm are available via a [Docker](https://www.docker.com/) container. To train an [XGBoost](https://en.wikipedia.org/wiki/Xgboost) model, specify the training containers in [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (Amazon ECR) for the AWS Region."
+"Amazon SageMaker algorithms are available via a [Docker](https://www.docker.com/) container. To train an [XGBoost](https://en.wikipedia.org/wiki/Xgboost) model, specify the training containers in [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (Amazon ECR) for the AWS Region."
 ]
 },
 {
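A sketch of looking up the regional XGBoost training image, assuming SageMaker Python SDK v2's `image_uris` module via the reticulated `sagemaker` object from the earlier sketch (older notebooks used `get_image_uri` instead; the version string is illustrative):

```r
# Resolve the ECR image URI for the built-in XGBoost algorithm in this region.
region    <- session$boto_region_name
container <- sagemaker$image_uris$retrieve(framework = "xgboost",
                                           region    = region,
                                           version   = "1.5-1")
```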
@@ -402,7 +405,7 @@
 "For tuning the hyperparameters, you also need to specify the type and range of the hyperparameters to be tuned. You can specify either a `ContinuousParameter` or an `IntegerParameter`, as outlined in the documentation. In addition, the algorithm documentation provides suggestions for the hyperparameter range.\n",
 "\n",
 "\n",
-"One the Estimator and its hyperparamters and tunable hyperparamter ranges are specified, you can create a `HyperparameterTuner` and then train (or fit) that tuner which will conduct the tuning and will select the most optimzied model that you can then use to do either Batch Transform, or deply as an endpoint and use for online inference."
+"Once the Estimator, its hyperparameters, and the tunable hyperparameter ranges are specified, you can create a `HyperparameterTuner` (tuner). Training (or fitting) the tuner conducts the tuning and selects the best-performing model. You can then generate predictions with this model using Batch Transform, or by deploying the model as an endpoint and using it for online inference."
 ]
 },
 {
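A hedged sketch of the tuner just described, assuming `xgb_estimator` and the `s3_train`/`s3_valid` inputs from the earlier sketches; the objective metric, ranges, and job counts are illustrative, not the notebook's exact settings:

```r
# Define tunable ranges, create the tuner, and fit it; SageMaker launches up
# to max_jobs training jobs and keeps the one with the lowest validation RMSE.
tuner_mod <- reticulate::import("sagemaker.tuner")

ranges <- list(
  eta       = tuner_mod$ContinuousParameter(0, 1),
  max_depth = tuner_mod$IntegerParameter(1L, 10L)
)

tuner <- tuner_mod$HyperparameterTuner(
  estimator             = xgb_estimator,
  objective_metric_name = "validation:rmse",
  objective_type        = "Minimize",
  hyperparameter_ranges = ranges,
  max_jobs              = 10L,
  max_parallel_jobs     = 2L
)

tuner$fit(inputs = list(train = s3_train, validation = s3_valid))
```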
@@ -593,7 +596,7 @@
 "\n",
 "We can extract the **ModelDataUrl** by describing the best training job using the `paws` library and the `describe_training_job()` method. [More details can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_training_job).\n",
 "\n",
-"Then we will create a model using this model container. We will use `paws` library and `create_model` method. [Documentaiton of this method can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). "
+"Then we will create a model using this model container, using the `paws` library and the `create_model` method. [Documentation of this method can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). "
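A sketch of the two `paws` calls just described, assuming `best_job` holds the best training job name from the tuner and `role_arn`/`container` come from the earlier sketches; the model name is a hypothetical example:

```r
# Describe the winning training job, pull its model artifact location, and
# register a SageMaker Model pointing the container at that artifact.
library(paws)
sm <- paws::sagemaker()

job <- sm$describe_training_job(TrainingJobName = best_job)
model_data_url <- job$ModelArtifacts$S3ModelArtifacts   # the ModelDataUrl

sm$create_model(
  ModelName        = "abalone-xgb-best-model",
  ExecutionRoleArn = role_arn,
  PrimaryContainer = list(Image        = container,
                          ModelDataUrl = model_data_url)
)
```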