In this tutorial, you use Model Garden to deploy the Gemma 1B open model to a GPU-backed Vertex AI endpoint. You must deploy a model to an endpoint before that model can be used to serve online predictions. Deploying a model associates physical resources with the model so it can serve online predictions with low latency.
After you deploy the Gemma 1B model, you run inference on the deployed model by using the PredictionServiceClient
to get online predictions. Online predictions are synchronous requests made to a model that is deployed to an endpoint.
Deploy Gemma using Model Garden
You can deploy the Gemma 1B model by using its model card in the Google Cloud console or programmatically.
For more information about setting up the Google Gen AI SDK or Google Cloud CLI, see the Google Gen AI SDK overview or Install the Google Cloud CLI.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
List the models that you can deploy and record the model ID to deploy. You can optionally list the supported Hugging Face models in Model Garden and even filter them by model names. The output doesn't include any tuned models.
View the deployment specifications for a model by using the model ID from the previous step. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.
Deploy a model to an endpoint. Model Garden uses the default deployment configuration unless you specify additional arguments and values. A sketch that covers these three steps follows this list.
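The following is a minimal sketch of the three steps above with the Vertex AI SDK for Python. It assumes the vertexai.model_garden module and its list_deployable_models, OpenModel.list_deploy_options, and OpenModel.deploy calls from a recent SDK release; the exact names and arguments may differ in your SDK version, so treat this as an illustration rather than the canonical sample.

import vertexai
from vertexai import model_garden

# Replace with your own project and region (assumed values).
vertexai.init(project="PROJECT_ID", location="us-central1")

# Step 1: list the deployable models and note the model ID (tuned models are not listed).
for model_name in model_garden.list_deployable_models(model_filter="gemma"):
    print(model_name)

# Step 2: view the machine type, accelerator type, and container image URI
# that Model Garden has verified for the model.
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")
for option in model.list_deploy_options():
    print(option)

# Step 3: deploy to an endpoint. Omitting the machine arguments uses the
# default deployment configuration; the values below are illustrative.
endpoint = model.deploy(
    accept_eula=True,
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)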
gcloud
Before you begin, specify a quota project to run the following commands. The commands you run are counted against the quotas for that project. For more information, see Set the quota project.
List the models that you can deploy by running the gcloud ai model-garden models list command. This command lists all model IDs and which ones you can self-deploy.

gcloud ai model-garden models list --model-filter=gemma
In the output, find the model ID to deploy. The following example shows an abbreviated output.
MODEL_ID                                 CAN_DEPLOY  CAN_PREDICT
google/gemma2@gemma-2-27b                Yes         No
google/gemma2@gemma-2-27b-it             Yes         No
google/gemma2@gemma-2-2b                 Yes         No
google/gemma2@gemma-2-2b-it              Yes         No
google/gemma2@gemma-2-9b                 Yes         No
google/gemma2@gemma-2-9b-it              Yes         No
google/gemma3@gemma-3-12b-it             Yes         No
google/gemma3@gemma-3-12b-pt             Yes         No
google/gemma3@gemma-3-1b-it              Yes         No
google/gemma3@gemma-3-1b-pt              Yes         No
google/gemma3@gemma-3-27b-it             Yes         No
google/gemma3@gemma-3-27b-pt             Yes         No
google/gemma3@gemma-3-4b-it              Yes         No
google/gemma3@gemma-3-4b-pt              Yes         No
google/gemma3n@gemma-3n-e2b              Yes         No
google/gemma3n@gemma-3n-e2b-it           Yes         No
google/gemma3n@gemma-3n-e4b              Yes         No
google/gemma3n@gemma-3n-e4b-it           Yes         No
google/gemma@gemma-1.1-2b-it             Yes         No
google/gemma@gemma-1.1-2b-it-gg-hf       Yes         No
google/gemma@gemma-1.1-7b-it             Yes         No
google/gemma@gemma-1.1-7b-it-gg-hf       Yes         No
google/gemma@gemma-2b                    Yes         No
google/gemma@gemma-2b-gg-hf              Yes         No
google/gemma@gemma-2b-it                 Yes         No
google/gemma@gemma-2b-it-gg-hf           Yes         No
google/gemma@gemma-7b                    Yes         No
google/gemma@gemma-7b-gg-hf              Yes         No
google/gemma@gemma-7b-it                 Yes         No
google/gemma@gemma-7b-it-gg-hf           Yes         No
The output doesn't include any tuned models or Hugging Face models. To view which Hugging Face models are supported, add the --can-deploy-hugging-face-models flag.

To view the deployment specifications for a model, run the gcloud ai model-garden models list-deployment-config command. You can view the machine type, accelerator type, and container image URI that Model Garden supports for a particular model.

gcloud ai model-garden models list-deployment-config \
    --model=MODEL_ID
Replace MODEL_ID with the model ID from the previous list command, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.

Deploy a model to an endpoint by running the gcloud ai model-garden models deploy command. Model Garden generates a display name for your endpoint and uses the default deployment configuration unless you specify additional arguments and values. To run the command asynchronously, include the --asynchronous flag.

gcloud ai model-garden models deploy \
    --model=MODEL_ID \
    [--machine-type=MACHINE_TYPE] \
    [--accelerator-type=ACCELERATOR_TYPE] \
    [--endpoint-display-name=ENDPOINT_NAME] \
    [--hugging-face-access-token=HF_ACCESS_TOKEN] \
    [--reservation-affinity reservation-affinity-type=any-reservation] \
    [--reservation-affinity reservation-affinity-type=specific-reservation, key="compute.googleapis.com/reservation-name", values=RESERVATION_RESOURCE_NAME] \
    [--asynchronous]
Replace the following placeholders:
- MODEL_ID: The model ID from the previous list command. For Hugging Face models, use the Hugging Face model URL format, such as stabilityai/stable-diffusion-xl-base-1.0.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4.
- ENDPOINT_NAME: A name for the deployed Vertex AI endpoint.
- HF_ACCESS_TOKEN: For Hugging Face models, if the model is gated, provide an access token.
- RESERVATION_RESOURCE_NAME: To use a specific Compute Engine reservation, specify the name of your reservation. If you specify a specific reservation, you can't specify any-reservation.
The output includes the deployment configuration that Model Garden used, the endpoint ID, and the deployment operation ID, which you can use to check the deployment status.
Using the default deployment configuration:
 Machine type: g2-standard-12
 Accelerator type: NVIDIA_L4
 Accelerator count: 1

The project has enough quota. The current usage of quota for accelerator type NVIDIA_L4 in region us-central1 is 0 out of 28.

Deploying the model to the endpoint. To check the deployment status, you can try one of the following methods:
1) Look for endpoint `ENDPOINT_DISPLAY_NAME` at the [Vertex AI] -> [Online prediction] tab in Cloud Console
2) Use `gcloud ai operations describe OPERATION_ID --region=LOCATION` to find the status of the deployment long-running operation
To see details about your deployment, run the gcloud ai endpoints list --list-model-garden-endpoints-only command:

gcloud ai endpoints list --list-model-garden-endpoints-only \
    --region=LOCATION_ID
Replace LOCATION_ID with the region where you deployed the model.
The output lists all endpoints that were created from Model Garden, including information such as the endpoint ID, the endpoint name, and whether the endpoint is associated with a deployed model. To find your deployment, look for the endpoint name that was returned from the previous command.
REST
List all deployable models and then get the ID of the model to deploy. You can then deploy the model with its default configuration and endpoint. Or, you can choose to customize your deployment, such as setting a specific machine type or using a dedicated endpoint.
List models that you can deploy
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID.
- QUERY_PARAMETERS: To list Model Garden models, add the following query parameters: listAllVersions=True&filter=can_deploy(true). To list Hugging Face models, set the filter to alt=json&is_hf_wildcard(true)+AND+labels.VERIFIED_DEPLOYMENT_CONFIG%3DVERIFIED_DEPLOYMENT_SUCCEED&listAllVersions=True.
HTTP method and URL:
GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS" | Select-Object -Expand Content
You receive a JSON response similar to the following.
{ "publisherModels": [ { "name": "publishers/google/models/gemma3", "versionId": "gemma-3-1b-it", "openSourceCategory": "GOOGLE_OWNED_OSS_WITH_GOOGLE_CHECKPOINT", "supportedActions": { "openNotebook": { "references": { "us-central1": { "uri": "https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gradio_streaming_chat_completions.ipynb" } }, "resourceTitle": "Notebook", "resourceUseCase": "Chat Completion Playground", "resourceDescription": "Chat with deployed Gemma 2 endpoints via Gradio UI." }, "deploy": { "modelDisplayName": "gemma-3-1b-it", "containerSpec": { "imageUri": "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01", "args": [ "python", "-m", "vllm.entrypoints.api_server", "--host=0.0.0.0", "--port=8080", "--model=gs://vertex-model-garden-restricted-us/gemma3/gemma-3-1b-it", "--tensor-parallel-size=1", "--swap-space=16", "--gpu-memory-utilization=0.95", "--disable-log-stats" ], "env": [ { "name": "MODEL_ID", "value": "google/gemma-3-1b-it" }, { "name": "DEPLOY_SOURCE", "value": "UI_NATIVE_MODEL" } ], "ports": [ { "containerPort": 8080 } ], "predictRoute": "/generate", "healthRoute": "/ping" }, "dedicatedResources": { "machineSpec": { "machineType": "g2-standard-12", "acceleratorType": "NVIDIA_L4", "acceleratorCount": 1 } }, "publicArtifactUri": "gs://vertex-model-garden-restricted-us/gemma3/gemma3.tar.gz", "deployTaskName": "vLLM 128K context", "deployMetadata": { "sampleRequest": "{\n \"instances\": [\n {\n \"@requestFormat\": \"chatCompletions\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\": \"What is machine learning?\"\n }\n ],\n \"max_tokens\": 100\n }\n ]\n}\n" } }, ...
Deploy a model
Deploy a model from Model Garden or a model from Hugging Face. You can also customize the deployment by specifying additional JSON fields.
Deploy a model with its default configuration.
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
Request JSON body:
{ "publisher_model_name": "MODEL_ID", "model_config": { "accept_eula": "true" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
PowerShell
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
@' { "publisher_model_name": "MODEL_ID", "model_config": { "accept_eula": "true" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Deploy a Hugging Face model
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL_ID: The Hugging Face ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: PUBLISHER_NAME/MODEL_NAME.
- ACCESS_TOKEN: If the model is gated, provide an access token.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
Request JSON body:
{ "hugging_face_model_id": "MODEL_ID", "hugging_face_access_token": "ACCESS_TOKEN", "model_config": { "accept_eula": "true" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
PowerShell
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
@' { "hugging_face_model_id": "MODEL_ID", "hugging_face_access_token": "ACCESS_TOKEN", "model_config": { "accept_eula": "true" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/us-central1LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/PUBLISHER_NAME/model/MODEL_NAME", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Deploy a model with customizations
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4.
- ACCELERATOR_COUNT: The number of accelerators to use in your deployment.
- reservation_affinity_type: To use an existing Compute Engine reservation for your deployment, specify any reservation or a specific one. If you specify this value, don't specify spot.
- spot: Whether to use spot VMs for your deployment.
- IMAGE_URI: The location of the container image to use, such as us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas.
- CONTAINER_ARGS: Arguments to pass to the container during the deployment.
- CONTAINER_PORT: A port number for your container.
- fast_tryout_enabled: When testing a model, you can choose to use a faster deployment. This option is available only for highly used models with certain machine types. If enabled, you can't specify model or deployment configurations.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
Request JSON body:
{ "publisher_model_name": "MODEL_ID", "deploy_config": { "dedicated_resources": { "machine_spec": { "machine_type": "MACHINE_TYPE", "accelerator_type": "ACCELERATOR_TYPE", "accelerator_count": ACCELERATOR_COUNT, "reservation_affinity": { "reservation_affinity_type": "ANY_RESERVATION" } }, "spot": "false" } }, "model_config": { "accept_eula": "true", "container_spec": { "image_uri": "IMAGE_URI", "args": [CONTAINER_ARGS ], "ports": [ { "container_port": CONTAINER_PORT } ] } }, "deploy_config": { "fast_tryout_enabled": false }, }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
PowerShell
Save the request body in a file named request.json
. Run the following command in the terminal to create or overwrite this file in the current directory:
@' { "publisher_model_name": "MODEL_ID", "deploy_config": { "dedicated_resources": { "machine_spec": { "machine_type": "MACHINE_TYPE", "accelerator_type": "ACCELERATOR_TYPE", "accelerator_count": ACCELERATOR_COUNT, "reservation_affinity": { "reservation_affinity_type": "ANY_RESERVATION" } }, "spot": "false" } }, "model_config": { "accept_eula": "true", "container_spec": { "image_uri": "IMAGE_URI", "args": [CONTAINER_ARGS ], "ports": [ { "container_port": CONTAINER_PORT } ] } }, "deploy_config": { "fast_tryout_enabled": false }, } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Console
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to deploy, and click its model card.
Click Deploy to open the Deploy model pane.
In the Deploy model pane, specify details for your deployment.
- Use or modify the generated model and endpoint names.
- Select a location to create your model endpoint in.
- Select a machine type to use for each node of your deployment.
To use a Compute Engine reservation, under the Deployment settings section, select Advanced.
For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.
- Automatically use created reservation: Vertex AI automatically selects an allowed reservation with matching properties. If there's no capacity in the automatically selected reservation, Vertex AI uses the general Google Cloud resource pool.
- Select specific reservations: Vertex AI uses a specific reservation. If there's no capacity for your selected reservation, an error is thrown.
- Don't use (default): Vertex AI uses the general Google Cloud resource pool. This value has the same effect as not specifying a reservation.
Click Deploy.
Terraform
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For more information, see the Terraform provider reference documentation.
Deploy a model
The following example deploys the gemma-3-1b-it
model to a new Vertex AI endpoint in us-central1
by using default configurations.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "6.45.0"
    }
  }
}

provider "google" {
  region = "us-central1"
}

resource "google_vertex_ai_endpoint_with_model_garden_deployment" "gemma_deployment" {
  publisher_model_name = "publishers/google/models/gemma3@gemma-3-1b-it"
  location             = "us-central1"

  model_config {
    accept_eula = true
  }
}
To deploy a model with customization, see Vertex AI Endpoint with Model Garden Deployment for details.
Apply the configuration
terraform init
terraform plan
terraform apply
After you apply the configuration, Terraform provisions a new Vertex AI endpoint and deploys the specified open model.
Clean up
To delete the endpoint and model deployment, run the following command:
terraform destroy
Inference Gemma 1B with the PredictionServiceClient
After you deploy Gemma 1B, you use the PredictionServiceClient
to get online predictions for the prompt: "Why is the sky blue?"
Code parameters
The PredictionServiceClient
code samples require you to update the following.
- PROJECT_ID: To find your project ID, follow these steps:
  - Go to the Welcome page in the Google Cloud console.
  - From the project picker at the top of the page, select your project.
  - The project name, project number, and project ID appear after the Welcome heading.
- ENDPOINT_REGION: This is the region where you deployed the endpoint.
- ENDPOINT_ID: To find your endpoint ID, view it in the console or run the gcloud ai endpoints list command. You'll need the endpoint name and region from the Deploy model pane.
Console
You can view the endpoint details by clicking Online prediction > Endpoints and selecting your region. Note the number that appears in the ID column.
gcloud
You can view the endpoint details by running the gcloud ai endpoints list command.

gcloud ai endpoints list \
    --region=ENDPOINT_REGION \
    --filter=display_name=ENDPOINT_NAME
The output looks like this.
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
ENDPOINT_ID: 1234567891234567891
DISPLAY_NAME: gemma2-2b-it-mg-one-click-deploy
Sample code
In the sample code for your language, update the PROJECT_ID, ENDPOINT_REGION, and ENDPOINT_ID. Then run your code.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
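The sketch below is a minimal stand-in for the full sample, not the canonical code from the page. It uses the PredictionServiceClient from the google-cloud-aiplatform package and assumes the default vLLM serving container, which accepts prompt-style instances; the instance fields (prompt, max_tokens, temperature) are assumptions, so adjust them to match your deployment's request format.

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# Replace these values with your project, endpoint region, and endpoint ID.
PROJECT_ID = "PROJECT_ID"
ENDPOINT_REGION = "ENDPOINT_REGION"
ENDPOINT_ID = "ENDPOINT_ID"

# The client must target the regional service endpoint where the model is deployed.
client_options = {"api_endpoint": f"{ENDPOINT_REGION}-aiplatform.googleapis.com"}
client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

# Build the request instance. These field names assume the vLLM container's
# prompt-style format; other containers may expect different fields.
instance = {"prompt": "Why is the sky blue?", "max_tokens": 128, "temperature": 0.9}
instances = [json_format.ParseDict(instance, Value())]

endpoint = client.endpoint_path(
    project=PROJECT_ID, location=ENDPOINT_REGION, endpoint=ENDPOINT_ID
)
response = client.predict(endpoint=endpoint, instances=instances)

for prediction in response.predictions:
    print(prediction)

Authenticate with Application Default Credentials (for example, by running gcloud auth application-default login) before running the script.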
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.