r_examples/r_batch_transform/r_xgboost_batch_transform.ipynb (11 additions & 21 deletions)
@@ -8,13 +8,13 @@
 "\n",
 "**Note:** You will need to use the R kernel in SageMaker for this notebook.\n",
 "\n",
-"This sample Notebook describes how to do batch transform to make predictions for abalone age as measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
+"This sample Notebook describes how to do batch transform to make predictions for an abalone's age, which is measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
 "\n",
 "You can find more details about SageMaker's Batch Transform here: \n",
 "- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer\n",
 "\n",
 "We will use the `reticulate` library to interact with SageMaker:\n",
-"- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to make API calls [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
+"- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
 "\n",
 "Table of Contents:\n",
 "- [Reticulating the Amazon SageMaker Python SDK](#Reticulating-the-Amazon-SageMaker-Python-SDK)\n",
@@ -26,7 +26,7 @@
 "- [Download the Batch Transform Output](#Download-the-Batch-Transform-Output)\n",
 "\n",
 "\n",
-"**Note:** The first portion of this notebook focused on data ingestion and preparing the data for model training is inspired by the data preparation part outlined in [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook on AWS SageMaker Examples Github repository with some modifications."
+"**Note:** The first portion of this notebook, focused on data ingestion and preparing the data for model training, is inspired by the data preparation section outlined in the [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook in the AWS SageMaker Examples GitHub repository, with some modifications."
 ]
 },
 {
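The `reticulate` bullet above is the key setup step for both notebooks. As a minimal editor's sketch (not part of this diff) of what reticulating the SDK looks like, assuming the SageMaker Python SDK is installed in the active Python environment; the names `session`, `bucket`, and `role_arn` are illustrative:

```r
# Bridge R to the SageMaker Python SDK via reticulate.
library(reticulate)

sagemaker <- import("sagemaker")             # Python module exposed as an R object
session   <- sagemaker$Session()             # SageMaker session for the current region
bucket    <- session$default_bucket()        # default S3 bucket for this account
role_arn  <- sagemaker$get_execution_role()  # IAM role assumed by training jobs
```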
@@ -110,7 +110,7 @@
 "source": [
 "<h3>Downloading and Processing the Dataset</h3>\n",
 "\n",
-"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read the data, plot the data, and transform the data into ML format for Amazon SageMaker:"
+"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read, plot, and transform the data into ML format for Amazon SageMaker:"
 ]
 },
 {
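For readers skimming the diff, a hedged sketch of the download step this cell describes, assuming the standard UCI column order for the abalone file (the raw file has no header row):

```r
# Read the raw abalone data directly from the UCI repository; column names
# follow the dataset's documentation page.
library(readr)

abalone <- read_csv(
  "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
  col_names = c("sex", "length", "diameter", "height", "whole_weight",
                "shucked_weight", "viscera_weight", "shell_weight", "rings")
)
head(abalone)
```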
@@ -187,7 +187,7 @@
 "source": [
 "<h3>Preparing the Dataset for Model Training</h3>\n",
 "\n",
-"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithm require the target to be in the first column of the dataset."
+"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithms require the target to be in the first column of the dataset."
 ]
 },
 {
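One way the dummy-variable and column-reordering step could look in tidyverse style; this is a sketch assuming the `abalone` tibble from the earlier sketch, and the notebook's exact encoding may differ:

```r
library(dplyr)

abalone <- abalone %>%
  mutate(female = as.integer(sex == "F"),    # one indicator column per level
         male   = as.integer(sex == "M"),
         infant = as.integer(sex == "I")) %>%
  select(-sex) %>%
  select(rings, everything())                # SageMaker expects the target first
```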
@@ -231,24 +231,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Later in the notebook, we are going to use Batch Transform and Endpoint to make inference in two different ways and we will compare the results. The maximum number of rows that we can send to an endpoint for inference in one batch is 500 rows. We are going to reduce the number of rows for the test dataset to 500 and use this for batch and online inference for comparison. "
+"Upload the training and validation data to Amazon S3 so that you can train the model. First, write the training and validation datasets to the local filesystem in .csv format:"
 ...
+"Second, upload the two datasets to the Amazon S3 bucket into the `data` key:"
 ...
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -436,9 +426,9 @@
 "\n",
 "In many situations, using a deployed model for making inference is not the best option, especially when the goal is not to make online real-time inference but to generate predictions from a trained model on a large dataset. In these situations, using Batch Transform may be more efficient and appropriate.\n",
 "\n",
-"This section of the notebook explain how to set up the Batch Transform Job, and generate predictions.\n",
+"This section of the notebook explains how to set up the Batch Transform Job and generate predictions.\n",
 "\n",
-"To do this, we need to define the batch input data path on S3, and also where to save the generated predictions on S3."
+"To do this, we need to identify the batch input data path in S3 and specify where generated predictions will be stored in S3."
r_examples/r_xgboost_hpo_batch_transform/r_xgboost_hpo_batch_transform.ipynb (16 additions & 13 deletions)
@@ -6,13 +6,16 @@
 "source": [
 "<h1>Hyperparameter Optimization Using R with Amazon SageMaker</h1>\n",
 "\n",
-"This sample Notebook describes how to conduct Hyperparamter tuning and batch transform to make predictions for abalone age as measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
+"This sample Notebook demonstrates how to conduct Hyperparameter tuning and how to generate predictions for abalone age using two methods:\n",
 "\n",
-"We will use two methods to generate predictionsm after performin Hyperparameter Optimization (HPO). The goal is to demonstrate how each method works in R. These methods are:\n",
-"- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer\n",
-"- [Deploying the model](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html) as an endpoint and making inference using the endpoint \n",
+"- [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a Transformer.\n",
+"- [Deploying the model](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html) as an endpoint and making online inferences. \n",
 "\n",
-"We will also use two different libraries to interact with SageMaker:\n",
+"The goal is to demonstrate how these methods work in R. \n",
+"\n",
+"Abalone age is measured by the number of rings in the shell. The notebook will use the public [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) hosted by the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). \n",
+"\n",
+"We will use two different libraries to interact with SageMaker:\n",
 "- [`Reticulate` library](https://rstudio.github.io/reticulate/): provides an R interface to use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/latest/index.html) to make API calls to Amazon SageMaker. The `reticulate` package translates between R and Python objects, and Amazon SageMaker provides a serverless data science environment to train and deploy ML models at scale.\n",
 "- [`paws` library](https://cran.r-project.org/web/packages/paws/index.html): provides an interface to make API calls to AWS services, similar to how [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) works. `boto3` is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy-to-use, object-oriented API, as well as low-level access to AWS services. `paws` provides the same capabilities in R.\n",
 "\n",
@@ -33,8 +36,8 @@
 " - [Deleting the Endpoint](#Deleting-the-Endpoint)\n",
 "\n",
 "\n",
-"**Note:** The first portion of this notebook focused on data ingestion and preparing the data for model training is similar to the data preparation outlined in [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook on AWS SageMaker Examples Github repository with some modifications.\n",
-"Also the last portion of this notebook focused on making inference using an endpoint is inspired by the method outlined in the notebook referenced here."
+"**Note:** The first portion of this notebook, focused on data ingestion and preparing the data for model training, is similar to the data preparation outlined in the [\"Using R with Amazon SageMaker\"](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/r_kernel/using_r_with_amazon_sagemaker.ipynb) notebook in the AWS SageMaker Examples GitHub repository, with some modifications.\n",
+"The last portion of this notebook, focused on making inferences using an endpoint, is inspired by the method outlined in the notebook referenced [here](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/r_examples/r_end_2_end/r_sagemaker_abalone.ipynb)."
 ]
 },
 {
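As a quick sketch of the second method listed in the intro above (endpoint inference), assuming a fitted `HyperparameterTuner` named `tuner` as created later in the notebook; the instance type and names are illustrative:

```r
# Deploy the tuner's best model behind a real-time endpoint, then clean up.
endpoint <- tuner$deploy(initial_instance_count = 1L,
                         instance_type = "ml.t2.medium")

# ...call endpoint$predict(...) on serialized rows here...

endpoint$delete_endpoint()   # avoid ongoing charges once finished
```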
@@ -118,7 +121,7 @@
 "source": [
 "<h3>Downloading and Processing the Dataset</h3>\n",
 "\n",
-"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read the data, plot the data, and transform the data into ML format for Amazon SageMaker:"
+"The model uses the [abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone) from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). First, download the data and start the [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis). Use tidyverse packages to read, plot, and transform the data into ML format for Amazon SageMaker:"
 ]
 },
 {
@@ -195,7 +198,7 @@
 "source": [
 "<h3>Preparing the Dataset for Model Training</h3>\n",
 "\n",
-"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithm require the target to be in the first column of the dataset."
+"The model needs three datasets: one each for training, testing, and validation. First, convert `sex` into a [dummy variable](https://en.wikipedia.org/wiki/Dummy_variable_(statistics)) and move the target, `rings`, to the first column. Amazon SageMaker algorithms require the target to be in the first column of the dataset."
 ]
 },
 {
@@ -322,7 +325,7 @@
 "source": [
 "<h3>Hyperparameter Tuning for the XGBoost Model</h3>\n",
 "\n",
-"Amazon SageMaker algorithm are available via a [Docker](https://www.docker.com/) container. To train an [XGBoost](https://en.wikipedia.org/wiki/Xgboost) model, specify the training containers in [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (Amazon ECR) for the AWS Region."
+"Amazon SageMaker algorithms are available via a [Docker](https://www.docker.com/) container. To train an [XGBoost](https://en.wikipedia.org/wiki/Xgboost) model, specify the training containers in [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (Amazon ECR) for the AWS Region."
 ]
 },
 {
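A sketch of looking up the regional XGBoost training image, assuming SageMaker Python SDK v2's `image_uris` module via the reticulated `sagemaker` object from the earlier sketch (older notebooks used `get_image_uri` instead; the version string is illustrative):

```r
# Resolve the ECR image URI for the built-in XGBoost algorithm in this region.
region    <- session$boto_region_name
container <- sagemaker$image_uris$retrieve(framework = "xgboost",
                                           region    = region,
                                           version   = "1.5-1")
```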
@@ -402,7 +405,7 @@
 "For tuning the hyperparameters, you also need to specify the type and range of the hyperparameters to be tuned. You can specify either a `ContinuousParameter` or an `IntegerParameter`, as outlined in the documentation. In addition, the algorithm documentation provides suggestions for the hyperparameter range.\n",
 "\n",
 "\n",
-"One the Estimator and its hyperparamters and tunable hyperparamter ranges are specified, you can create a `HyperparameterTuner` and then train (or fit) that tuner which will conduct the tuning and will select the most optimzied model that you can then use to do either Batch Transform, or deply as an endpoint and use for online inference."
+"Once the Estimator, its hyperparameters, and the tunable hyperparameter ranges are specified, you can create a `HyperparameterTuner` (tuner). Training (or fitting) the tuner conducts the tuning and selects the best-performing model. You can then generate predictions with this model using Batch Transform, or by deploying the model as an endpoint and using it for online inference."
 ]
 },
 {
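A hedged sketch of the tuner just described, assuming `xgb_estimator` and the `s3_train`/`s3_valid` inputs from the earlier sketches; the objective metric, ranges, and job counts are illustrative, not the notebook's exact settings:

```r
# Define tunable ranges, create the tuner, and fit it; SageMaker launches up
# to max_jobs training jobs and keeps the one with the lowest validation RMSE.
tuner_mod <- reticulate::import("sagemaker.tuner")

ranges <- list(
  eta       = tuner_mod$ContinuousParameter(0, 1),
  max_depth = tuner_mod$IntegerParameter(1L, 10L)
)

tuner <- tuner_mod$HyperparameterTuner(
  estimator             = xgb_estimator,
  objective_metric_name = "validation:rmse",
  objective_type        = "Minimize",
  hyperparameter_ranges = ranges,
  max_jobs              = 10L,
  max_parallel_jobs     = 2L
)

tuner$fit(inputs = list(train = s3_train, validation = s3_valid))
```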
@@ -593,7 +596,7 @@
 "\n",
 "We can extract the **ModelDataUrl** by describing the best training job using the `paws` library and the `describe_training_job()` method. [More details can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_training_job).\n",
 "\n",
-"Then we will create a model using this model container. We will use `paws` library and `create_model` method. [Documentaiton of this method can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). "
+"Then we will create a model using this model container, using the `paws` library and the `create_model` method. [Documentation of this method can be found here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). "
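A sketch of the two `paws` calls just described, assuming `best_job` holds the best training job name from the tuner and `role_arn`/`container` come from the earlier sketches; the model name is a hypothetical example:

```r
# Describe the winning training job, pull its model artifact location, and
# register a SageMaker Model pointing the container at that artifact.
library(paws)
sm <- paws::sagemaker()

job <- sm$describe_training_job(TrainingJobName = best_job)
model_data_url <- job$ModelArtifacts$S3ModelArtifacts   # the ModelDataUrl

sm$create_model(
  ModelName        = "abalone-xgb-best-model",
  ExecutionRoleArn = role_arn,
  PrimaryContainer = list(Image        = container,
                          ModelDataUrl = model_data_url)
)
```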