Commit 7bf7745

Accommodating mvs PR comments
1 parent 044cbd8 commit 7bf7745

1 file changed

+11
-4
lines changed

sagemaker-python-sdk/tensorflow_script_mode_horovod/tensorflow_script_mode_horovod.ipynb

Lines changed: 11 additions & 4 deletions
@@ -108,14 +108,18 @@
 "metadata": {},
 "source": [
 "### 1. Accept `--model_dir` command-line argument\n",
-"Modify script to accept `--model_dir` as command-line argument which will define the directory path where the output model should be saved. This will be equal to `/opt/ml/model/`\n",
+"Modify the script to accept `--model_dir` as a command-line argument, which defines the directory path (i.e. `/opt/ml/model/`) where the output model should be saved. Because SageMaker destroys the entire cluster at the end of training, saving the model to the `/opt/ml/model/` directory keeps the trained model from being lost: at the end of training, SageMaker pushes all the data in `/opt/ml/model/` to S3.\n",
+"\n",
+"This also allows SageMaker training to integrate with other SageMaker services such as Inference, and allows you to host the trained model outside SageMaker.\n",
 "\n",
 "Here is the code that needs to be added to script:\n",
 "\n",
 "```\n",
 "parser = argparse.ArgumentParser()\n",
 "parser.add_argument('--model_dir', type=str)\n",
-"```"
+"```\n",
+"\n",
+"More details can be found [here](https://github.com/aws/sagemaker-containers/blob/master/README.md)."
 ]
 },
 {
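As a rough illustration of the `--model_dir` change in the hunk above, the argument parsing might be sketched as follows. The `parse_args` helper name, the `/opt/ml/model/` default (taken from the notebook text, used here only as a fallback for local runs), and the use of `parse_known_args` to tolerate other hyperparameters are assumptions of this sketch, not part of the commit.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments handed to the training script."""
    parser = argparse.ArgumentParser()
    # Directory where the trained model should be written.
    # The default is an assumption for local runs, based on the path
    # named in the notebook text.
    parser.add_argument('--model_dir', type=str, default='/opt/ml/model/')
    # Ignore any other arguments (e.g. hyperparameters) instead of failing.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == '__main__':
    args = parse_args()
    print('Model will be saved to', args.model_dir)
```

Passing `argv` explicitly, as in `parse_args(['--model_dir', '/tmp/model'])`, makes the helper easy to exercise outside a training job.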
@@ -135,7 +139,9 @@
 "\n",
 "x_test = np.load(os.path.join(os.environ['SM_CHANNEL_TEST'], 'test.npz'))['data']\n",
 "y_test = np.load(os.path.join(os.environ['SM_CHANNEL_TEST'], 'test.npz'))['labels']\n",
-"```\n"
+"```\n",
+"\n",
+"A list of all environment variables that SageMaker sets and that are accessible inside the training script can be found [here](https://github.com/aws/sagemaker-containers/blob/master/README.md)."
 ]
 },
 {
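The channel lookups in the hunk above repeat the same `os.environ[...]` and `os.path.join` pattern. A minimal sketch of a helper that resolves a file inside an input channel, and fails with a readable message when the variable is missing, could look like this. The `channel_file` name and its `environ` parameter are hypothetical; the only thing taken from the notebook is the `SM_CHANNEL_TEST` variable and the `test.npz` file name.

```python
import os


def channel_file(channel, filename, environ=os.environ):
    """Return the path of `filename` inside the input channel `channel`.

    The channel name is mapped to an environment variable of the form
    SM_CHANNEL_<NAME>, e.g. 'test' -> SM_CHANNEL_TEST.
    """
    var = 'SM_CHANNEL_' + channel.upper()
    try:
        base = environ[var]
    except KeyError:
        # Typically means the script is running outside a training job
        # or the channel was not configured.
        raise RuntimeError('Environment variable {} is not set'.format(var))
    return os.path.join(base, filename)
```

With such a helper, the archive can be opened once and both arrays read from it, for example `test = np.load(channel_file('test', 'test.npz'))` followed by `test['data']` and `test['labels']`.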
@@ -445,7 +451,7 @@
 "source": [
 "import boto3\n",
 "from botocore.exceptions import ClientError\n",
-"\n",
+"from time import sleep\n",
 "\n",
 "def create_vpn_infra(stack_name=\"hvdvpcstack\"):\n",
 "    cfn = boto3.client(\"cloudformation\")\n",
@@ -466,6 +472,7 @@
 "\n",
 "    while describe_stack[\"StackStatus\"] == \"CREATE_IN_PROGRESS\":\n",
 "        describe_stack = cfn.describe_stacks(StackName=stack_name)[\"Stacks\"][0]\n",
+"        sleep(0.5)\n",
 "\n",
 "    if describe_stack[\"StackStatus\"] != \"CREATE_COMPLETE\":\n",
 "        raise ValueError(\"Stack creation failed in state: {}\".format(describe_stack[\"StackStatus\"]))\n",
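The polling loop this commit throttles with `sleep(0.5)` has no upper bound on how long it waits. One possible variation, sketched below, adds a timeout and takes the status lookup as a callable so it can be exercised without AWS. The `wait_for_stack` name, its parameters, and the default interval and timeout values are assumptions of this sketch and do not appear in the notebook.

```python
import time


def wait_for_stack(describe, poll_seconds=5.0, timeout_seconds=1800.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `describe()` until the stack leaves CREATE_IN_PROGRESS.

    `describe` is any callable returning a dict with a "StackStatus" key.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = describe()["StackStatus"]
        if status != "CREATE_IN_PROGRESS":
            break
        if clock() >= deadline:
            raise TimeoutError("Stack still in {} after {} seconds".format(
                status, timeout_seconds))
        # Pause between calls so the API is not queried in a tight loop.
        sleep(poll_seconds)
    if status != "CREATE_COMPLETE":
        raise ValueError("Stack creation failed in state: {}".format(status))
    return status
```

Inside `create_vpn_infra` this could be called as `wait_for_stack(lambda: cfn.describe_stacks(StackName=stack_name)["Stacks"][0])`. boto3 also provides a built-in CloudFormation waiter, `cfn.get_waiter("stack_create_complete")`, which may be a simpler alternative to a hand-written loop.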
