@mkovalski mkovalski commented Nov 8, 2021

Adds the ability to profile Vertex training jobs using the TensorBoard profiler.

  • Add a base plugin and a TF profiler plugin to the cloud training tools.
  • Create helpers for uploading profiled items to the TensorBoard backend.
  • Add environment variables for setting the web server port.

Fixes #519

mkovalski and others added 30 commits August 23, 2021 15:10
@mkovalski mkovalski requested a review from a team as a code owner November 8, 2021 16:04
@product-auto-label product-auto-label bot added the api: aiplatform Issues related to the AI Platform API. label Nov 8, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 8, 2021
nicain commented Nov 8, 2021

@mkovalski: I am assigning you as owner of this PR; feel free to ping reviewers as needed to make sure the review progresses in a timely fashion, or suggest who might better own the process of getting the PR reviewed, passing continuous testing, and merged. Reach out if you have questions.

@mkovalski mkovalski requested a review from sasha-gitg November 10, 2021 17:15

if not environment_variables.http_handler_port:
    raise MissingEnvironmentVariableException(
        "'AIP_HTTP_HANDLER_PORT' must be set."
    )
Member

Should the user set this via an environment variable, or is it set by the service?

Contributor Author

This is set by the service.
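
For illustration, the check discussed in this thread might look roughly like the sketch below. It is not the code from the PR: `get_http_handler_port` is a hypothetical helper, and the exception class is a local stand-in for the `MissingEnvironmentVariableException` named in the diff.

```python
import os
from typing import Mapping, Optional


class MissingEnvironmentVariableException(Exception):
    """Local stand-in for the exception named in the diff."""


def get_http_handler_port(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from AIP_HTTP_HANDLER_PORT, or raise if it is unset.

    Hypothetical helper: `environ` defaults to os.environ and exists only
    so the lookup can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    port = env.get("AIP_HTTP_HANDLER_PORT")
    if not port:
        # The variable is expected to be set by the service, so a missing
        # value means the code is running outside a training job.
        raise MissingEnvironmentVariableException(
            "'AIP_HTTP_HANDLER_PORT' must be set."
        )
    return int(port)
```

Failing early with a named exception makes it clear that the profiler cannot start its web server outside the service, rather than failing later with an unrelated error.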


from google.cloud.aiplatform.training_utils.cloud_profiler.plugins import base_plugin
from typing import List
from werkzeug import wrappers
Member

Wrap this import in a try/except that raises an informative ImportError.
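
One way to act on this suggestion, sketched under assumptions: `import_optional` is a hypothetical helper, not part of the library, and the install hint is passed in by the caller because the exact extras name is not stated in this thread.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, re-raising ImportError with installation guidance.

    Hypothetical helper: `install_hint` is whatever command the caller
    wants shown to the user when the optional dependency is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"Could not import {module_name!r}, which the profiler plugin "
            f"needs. To install it, run: {install_hint}"
        ) from exc
```

With a helper like this, the bare import above could become something like `wrappers = import_optional("werkzeug.wrappers", "<install command for the profiler extra>")`, so a missing optional dependency produces an actionable message instead of a bare traceback.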

setup.py Outdated

full_extra_require = list(
set(tensorboard_extra_require + metadata_extra_require + xai_extra_require)
set(
Member

TF version should be handled explicitly since TB, XAI, and Profiler have different version bounds.

Contributor Author

Done.
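
As a rough sketch of what handling the TF version explicitly could mean when building the combined extra: drop the per-extra `tensorflow` entries and add a single pin chosen on purpose. The helper names and every version bound below are hypothetical placeholders, not the values used in `setup.py`.

```python
import re
from typing import Dict, Iterable, List


def requirement_name(requirement: str) -> str:
    """Return the lowercased package name at the start of a requirement string."""
    return re.split(r"[\s<>=!~;\[]", requirement.strip(), maxsplit=1)[0].lower()


def merge_extras(
    extras: Iterable[List[str]], explicit_pins: Dict[str, str]
) -> List[str]:
    """Union several extras, replacing selected packages with one explicit pin.

    `explicit_pins` maps a package name to the single requirement string
    that should appear in the merged list for that package.
    """
    merged = set()
    for requirements in extras:
        for requirement in requirements:
            # Skip per-extra bounds for packages that are pinned explicitly.
            if requirement_name(requirement) not in explicit_pins:
                merged.add(requirement)
    merged.update(explicit_pins.values())
    return sorted(merged)


# Placeholder bounds for illustration only.
tensorboard_extra = ["tensorflow >=2.3.0", "werkzeug"]
xai_extra = ["tensorflow >=2.2.0"]
profiler_extra = ["tensorflow >=2.4.0", "tensorboard-plugin-profile"]

full_extra = merge_extras(
    [tensorboard_extra, xai_extra, profiler_extra],
    {"tensorflow": "tensorflow >=2.4.0"},
)
```

A plain `set` union would keep all three `tensorflow` bounds side by side; choosing the pin explicitly makes the combined constraint a visible decision.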

@mkovalski mkovalski requested a review from sasha-gitg November 19, 2021 18:37
@sasha-gitg sasha-gitg merged commit 6d5c7c4 into googleapis:main Nov 29, 2021