This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Commit db244c5

committed
Initial swipe at documenting existing metrics
1 parent e1857a0 commit db244c5

File tree

1 file changed

+142
-17
lines changed

website/docs/cloud-docs/agents/telemetry.mdx

Lines changed: 142 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -12,21 +12,6 @@ To configure your agent to emit telemetry data, you must include the `-otlp-addr
1212

1313
Optionally, you can pass the `-otlp-cert-file` flag or set the `TFC_AGENT_OTLP_CERT_FILE` environment variable. The agent will use the certificate at the supplied path to encrypt the client connection to the OpenTelemetry collector. When omitted, client connections are not secured.
1414

15-
## Metrics
16-
17-
An example of the metric names that the agent will emit is `tfc-agent.terraform.plan-json.generate.bytes`. Breaking down those sections:
18-
19-
* `tfc-agent`: Metric names will be namespaced with tfc-agent to distinguish them from other metrics your system may be emitting.
20-
* `terraform`: Metric names may have a prefix with a component name, when a component is applicable.
21-
* `plan-json`: This shows that the metric in question is about the step of the agent process where JSON representations of the Terraform plan and Terraform provider schema are generated and uploaded.
22-
* `generate`: Specifically, it is about the generation of the JSON artifacts.
23-
* `bytes`: When a metric requires a unit in order to be understood, an un-abbreviated unit will be the last component of the metric name.
24-
25-
In addition, agent metrics will follow some conventions around unit types:
26-
27-
* All timing metrics (other than runtime metrics) will be measured in milliseconds.
28-
* All data size metrics will be measured in bytes.
29-
3015
## Tracing
3116

3217
In addition to metrics, the agent emits tracing spans that can be consumed by various distributed tracing tools. Information about supported tools can be found on the [OpenTelemetry Registry](https://opentelemetry.io/registry/). A span is a single unit of work performed by the agent.
@@ -38,6 +23,146 @@ Spans conform to the following rules:
3823
* Span attributes in the `tfc` namespace will have information relevant to the entire operation.
3924
* Span attributes in the `debug` namespace will have information relevant to the current span's scope.
4025

41-
## Stability
26+
## Metrics
27+
28+
The Terraform Cloud Agent emits numerous metrics describing the agent's
29+
performance. The metrics documented on this page are considered stable and will
30+
not change in any significant way between stable releases of the same major
31+
version.
32+
33+
### Metric naming conventions
34+
35+
An example of the metric names that the agent will emit is
36+
`tfc-agent.terraform.plan-json.generate.bytes`. Breaking down those sections:
37+
38+
* `tfc-agent`: Metric names will be namespaced with tfc-agent to distinguish
39+
them from other metrics your system may be emitting.
40+
* `terraform`: Metric names may have a prefix with a component name, when a
41+
component is applicable.
42+
* `plan-json`: This shows that the metric in question is about the step of the
43+
agent process where JSON representations of the Terraform plan and Terraform
44+
provider schema are generated and uploaded.
45+
* `generate`: Specifically, it is about the generation of the JSON artifacts.
46+
* `bytes`: When a metric requires a unit in order to be understood, an
47+
un-abbreviated unit will be the last component of the metric name.
48+
49+
In addition, agent metrics will follow some conventions around unit types:
50+
51+
* All timing metrics (other than runtime metrics) will be measured in milliseconds.
52+
* All data size metrics will be measured in bytes.
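The naming convention above can be illustrated with a small parser. This is a minimal sketch, not part of the agent: the `split_metric_name` helper and the `KNOWN_UNITS` set are hypothetical names, and the unit list is an assumption drawn only from the suffixes that appear in the tables on this page.

```python
from typing import NamedTuple, Optional, Tuple

NAMESPACE = "tfc-agent"
# Assumption: unit suffixes seen in the metric tables on this page.
KNOWN_UNITS = {"bytes", "milliseconds", "nanoseconds", "count"}


class MetricName(NamedTuple):
    namespace: str
    path: Tuple[str, ...]   # component and step segments, e.g. ("terraform", "plan-json", "generate")
    unit: Optional[str]     # None when the name carries no unit suffix


def split_metric_name(name: str) -> MetricName:
    """Split a dotted metric name into namespace, path segments, and unit."""
    parts = name.split(".")
    if parts[0] != NAMESPACE:
        raise ValueError(f"{name!r} is not in the {NAMESPACE!r} namespace")
    # The unit, when present, is the last dotted segment.
    has_unit = len(parts) > 1 and parts[-1] in KNOWN_UNITS
    unit = parts[-1] if has_unit else None
    path = parts[1:-1] if has_unit else parts[1:]
    return MetricName(parts[0], tuple(path), unit)


if __name__ == "__main__":
    print(split_metric_name("tfc-agent.terraform.plan-json.generate.bytes"))
```

Names without a unit suffix, such as `tfc-agent.status.busy`, are returned with `unit` set to `None`.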
4253

43-
Metric names and definitions are not guaranteed to be stable at this time. HashiCorp will make an effort not to break existing monitoring of the agent, but metric names may change at any time. As the telemetry system matures, HashiCorp may add selected stable metrics to this documentation which will be covered by our versioning policy.
54+
While all of the metric names emitted by the tfc-agent use hyphens, some systems
55+
may automatically convert these to underscores.
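When matching documented names against what a monitoring backend reports, it can help to compare names with hyphens and underscores treated as equivalent. The following is a minimal sketch; `underscore_variant` and `same_metric` are hypothetical helpers, and the exact rewriting a given backend applies (it may also alter other characters) is not covered here.

```python
def underscore_variant(name: str) -> str:
    """Return the metric name with every hyphen replaced by an underscore."""
    return name.replace("-", "_")


def same_metric(documented: str, observed: str) -> bool:
    """Compare two metric names, ignoring hyphen/underscore differences."""
    return underscore_variant(documented) == underscore_variant(observed)


if __name__ == "__main__":
    documented = "tfc-agent.terraform.plan-json.generate.bytes"
    print(underscore_variant(documented))
```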
56+
57+
### Core metrics
58+
59+
The following metrics are generated by the Terraform Cloud Agent core program,
60+
and are related to generic operations performed regularly by all agents. All
61+
metrics in this section are prefixed by `tfc-agent.`.
62+
63+
| Metric name                  | Type  | Description                                                                          |
64+
| ---------------------------- | ----- | ------------------------------------------------------------------------------------ |
65+
| `status.busy` | Gauge | Number of agents in `busy` status. |
66+
| `status.idle` | Gauge | Number of agents in `idle` status. |
67+
| `register.milliseconds` | Timer | Time in milliseconds to register the agent with Terraform Cloud. |
68+
| `fetch-job.milliseconds` | Timer | Time in milliseconds to complete a job dequeue request. |
69+
| `update-status.milliseconds` | Timer | Time in milliseconds to send a status update from the agent to Terraform Cloud.      |
70+
71+
### Runtime metrics
72+
73+
The Terraform Cloud Agent produces a number of metrics which are generated by
74+
the application runtime, and are primarily useful in debugging the tfc-agent.
75+
These metrics are emitted periodically throughout the entire agent process
76+
lifecycle. It is important to note that these metrics do not represent a
77+
complete picture of resource utilization by the agent. The agent may fork child
78+
processes (such as the Terraform binary or other programs) which maintain their
79+
own distinct runtimes and consume resources independently of the agent. To
80+
monitor resource utilization comprehensively, consider monitoring VM or
81+
container metrics. All metrics in this section are prefixed by
82+
`tfc-agent.runtime.`.
83+
84+
| Metric name                     | Type  | Description                                                            |
85+
| ------------------------------- | ----- | ---------------------------------------------------------------------- |
86+
| `go.mem.heap-alloc.bytes` | Gauge | Amount of memory in bytes allocated to heap objects by the Go runtime. |
87+
| `go.mem.heap-idle.bytes`        | Gauge | Amount of memory in bytes allocated to the heap which is unused.       |
88+
| `go.mem.heap-inuse.bytes`       | Gauge | Amount of memory in bytes allocated to the heap which is in use.       |
89+
| `go.mem.heap-sys.bytes` | Gauge | Amount of memory in bytes obtained from the OS for the heap. |
90+
| `go.mem.heap-released.bytes` | Gauge | Amount of memory in bytes returned to the OS during GC. |
91+
| `go.mem.heap-objects.count` | Gauge | Number of allocated heap objects. |
92+
| `go.mem.lookups.count` | Gauge | Number of pointer lookups. |
93+
| `go.mem.malloc.count` | Gauge | Cumulative count of heap objects allocated. |
94+
| `go.mem.free.count` | Gauge | Cumulative count of heap objects freed. |
95+
| `go.gc.count` | Gauge | Number of completed GC cycles. |
96+
| `go.gc.pause-total.nanoseconds` | Timer | Cumulative time in nanoseconds spent in stop-the-world pauses. |
97+
| `uptime.milliseconds` | Timer | Cumulative time in milliseconds since the agent started. |
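Because the runtime table mixes nanosecond and millisecond timers, a consumer may want to normalize timer values before comparing them. A minimal sketch, assuming the unit is always the last dotted segment of the name as described under the naming conventions; `timer_to_seconds` is a hypothetical helper, not an agent API.

```python
# Divisors for the timer unit suffixes that appear on this page.
_UNITS_PER_SECOND = {
    "milliseconds": 1_000,
    "nanoseconds": 1_000_000_000,
}


def timer_to_seconds(metric_name: str, value: float) -> float:
    """Convert a timer value to seconds based on the metric name's unit suffix."""
    unit = metric_name.rsplit(".", 1)[-1]
    if unit not in _UNITS_PER_SECOND:
        raise ValueError(f"{metric_name!r} does not end in a known timer unit")
    return value / _UNITS_PER_SECOND[unit]


if __name__ == "__main__":
    print(timer_to_seconds("tfc-agent.runtime.go.gc.pause-total.nanoseconds", 2_000_000_000))
    print(timer_to_seconds("tfc-agent.runtime.uptime.milliseconds", 1_500))
```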
98+
99+
### Terraform component metrics
100+
101+
The following metrics are emitted by the `terraform` component, which is
102+
responsible for handling Terraform operations like plans and applies. All
103+
metrics in this section are prefixed by `tfc-agent.terraform.`.
104+
105+
| Metric name                                    | Type  | Description                                                                         |
106+
| ---------------------------------------------- | ----- | ----------------------------------------------------------------------------------- |
107+
| `handle-signal.milliseconds` | Timer | Time in milliseconds spent handling an incoming signal from Terraform Cloud. |
108+
| `execute.milliseconds` | Timer | Time in milliseconds spent handling a Terraform operation. |
109+
| `output-stream.upload-chunk.bytes` | Gauge | Size in bytes of a chunk of Terraform output uploaded. |
110+
| `output-stream.upload-chunk.milliseconds` | Timer | Time in milliseconds spent uploading a single chunk of Terraform output. |
111+
| `output-stream.upload-full.bytes` | Gauge | Size in bytes of a full Terraform log uploaded. |
112+
| `output-stream.upload-full.milliseconds` | Timer | Time in milliseconds spent uploading the full Terraform log. |
113+
| `output-stream.close.milliseconds` | Timer | Time in milliseconds spent finalizing a Terraform output stream. |
114+
| `persist-filesystem.milliseconds` | Timer | Time in milliseconds spent packing and uploading a filesystem image. |
115+
| `persist-filesystem.pack.bytes` | Gauge | Size in bytes of a packed up filesystem image. |
116+
| `persist-filesystem.pack.milliseconds` | Timer | Time in milliseconds spent packing the contents of a filesystem. |
117+
| `persist-filesystem.upload.milliseconds` | Timer | Time in milliseconds spent uploading a packed up filesystem image. |
118+
| `plan-json.generate.bytes` | Gauge | Size in bytes of a generated JSON-formatted plan. |
119+
| `plan-json.generate.milliseconds` | Timer | Time in milliseconds spent generating a JSON plan. |
120+
| `plan-json.upload.milliseconds` | Timer | Time in milliseconds spent uploading a JSON plan. |
121+
| `provider-schemas-json.generate.bytes` | Gauge | Size in bytes of a generated JSON-formatted provider schemas file. |
122+
| `provider-schemas-json.generate.milliseconds` | Timer | Time in milliseconds spent generating the provider schemas document. |
123+
| `provider-schemas-json.upload.milliseconds` | Timer | Time in milliseconds spent uploading a provider schemas document. |
124+
| `restore-filesystem.milliseconds`              | Timer | Time in milliseconds spent downloading and unpacking a filesystem image.            |
125+
| `restore-filesystem.download.bytes` | Gauge | Size in bytes of a downloaded filesystem image. |
126+
| `restore-filesystem.download.milliseconds` | Timer | Time in milliseconds spent downloading a filesystem image. |
127+
| `restore-filesystem.unpack.milliseconds` | Timer | Time in milliseconds spent unpacking the contents of a filesystem image. |
128+
| `run-meta.additions` | Gauge | Number of resources added or proposed to be added in a Terraform operation. |
129+
| `run-meta.changes` | Gauge | Number of resources changed or proposed to change in a Terraform operation. |
130+
| `run-meta.destructions` | Gauge | Number of resources destroyed or proposed to be destroyed in a Terraform operation. |
131+
| `setup-backend.milliseconds`                   | Timer | Time in milliseconds spent configuring Terraform CLI for Terraform Cloud.           |
132+
| `setup-terraform-binary.milliseconds`          | Timer | Time in milliseconds spent downloading and unpacking a Terraform OSS release.       |
133+
| `setup-terraform-binary.download.bytes` | Gauge | Size in bytes of a downloaded Terraform OSS version. |
134+
| `setup-terraform-binary.download.milliseconds` | Timer | Time in milliseconds spent downloading a Terraform OSS release. |
135+
| `setup-terraform-binary.unpack.bytes` | Gauge | Size in bytes of an unpacked Terraform OSS release. |
136+
| `setup-terraform-binary.unpack.milliseconds` | Timer | Time in milliseconds spent unpacking a Terraform OSS release. |
137+
| `setup-terraform-config.milliseconds` | Timer | Time in milliseconds spent downloading and unpacking a Terraform configuration. |
138+
| `setup-terraform-config.download.bytes` | Gauge | Size in bytes of a downloaded Terraform configuration. |
139+
| `setup-terraform-config.download.milliseconds` | Timer | Time in milliseconds spent downloading a Terraform configuration. |
140+
| `setup-terraform-config.unpack.milliseconds` | Timer | Time in milliseconds spent unpacking a downloaded Terraform configuration. |
141+
| `setup-terraform-config.verify.milliseconds` | Timer | Time in milliseconds spent verifying a Terraform configuration. |
142+
| `setup-terraform-variables.milliseconds` | Timer | Time in milliseconds spent configuring Terraform variables provided by TFC. |
143+
| `setup-terraform-variables.write_file.bytes` | Gauge | Size in bytes of a tfvars file, generated by TFC-provided input variables. |
144+
| `terraform-apply.milliseconds` | Timer | Time in milliseconds spent running `terraform apply`. |
145+
| `terraform-plan.milliseconds` | Timer | Time in milliseconds spent running `terraform plan`. |
146+
| `terraform-version.milliseconds` | Timer | Time in milliseconds spent running `terraform version`. |
147+
148+
### Policy component metrics
149+
150+
The following metrics are emitted by the `policy` component, which is responsible
151+
for handling OPA policy enforcement operations. All metrics in this section
152+
are prefixed by `tfc-agent.policy.`.
153+
154+
| Metric name                                    | Type  | Description                                                              |
155+
| ---------------------------------------------- | ----- | ------------------------------------------------------------------------ |
156+
| `execute.milliseconds` | Timer | Time in milliseconds spent handling a policy operation. |
157+
| `policy-set.download.bytes` | Gauge | Size in bytes of a downloaded policy set. |
158+
| `policy-set.download.milliseconds` | Timer | Time in milliseconds spent downloading a policy set. |
159+
| `policy-set.unpack.milliseconds` | Timer | Time in milliseconds spent unpacking a policy set. |
160+
| `generate-opa-input-file.milliseconds` | Timer | Time in milliseconds spent generating the OPA input file. |
161+
| `parse-opa-config.milliseconds` | Timer | Time in milliseconds spent parsing the OPA configuration. |
162+
| `plan-json-download.milliseconds` | Timer | Time in milliseconds spent downloading the Terraform JSON plan document. |
163+
| `plan-json-download.bytes` | Gauge | Size in bytes of a downloaded Terraform JSON plan document. |
164+
| `run-opa-eval.milliseconds` | Timer | Time in milliseconds spent evaluating OPA policies. |
165+
| `setup-opa-binary.milliseconds` | Timer | Time in milliseconds spent downloading the OPA binary. |
166+
| `setup-policies.milliseconds` | Timer | Time in milliseconds spent setting up individually managed policies. |
167+
| `setup-policy-engines.milliseconds` | Timer | Time in milliseconds spent setting up policy runtimes. |
168+
| `setup-subjects.milliseconds` | Timer | Time in milliseconds spent setting up subjects for policy enforcement. |
