Conversation

@sarah-witt
Contributor

What does this PR do?

Adds a README for IBM Spectrum LSF, as well as documentation for metrics. Also specifies which metrics are monitored by which parameters.

Motivation

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged
@codecov

codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.07%. Comparing base (6d8718a) to head (fc08a7b).
⚠️ Report is 5 commits behind head on master.

@sarah-witt sarah-witt marked this pull request as ready for review December 5, 2025 21:04
@sarah-witt sarah-witt requested review from a team as code owners December 5, 2025 21:04
@sarah-witt sarah-witt changed the title Update readme and metrics documentation [AI-5939] Update readme and metrics documentation Dec 5, 2025
@iadjivon
Contributor

iadjivon commented Dec 5, 2025

Hi Sarah, thanks for this PR! I've added this to our board for editorial review: DOCS-12849

@iadjivon iadjivon added the `editorial review` label (Waiting on a more in-depth review from a docs team editor) Dec 5, 2025
@github-actions

github-actions bot commented Dec 5, 2025

⚠️ Recommendation: Add qa/skip-qa label

This PR does not modify any files shipped with the agent.

To help streamline the release process, please consider adding the qa/skip-qa label if these changes do not require QA testing.

Contributor

@evazorro evazorro left a comment


Thanks for updating the README! I added some style/wording suggestions and then a couple bigger formatting notes that will be easier for you to change locally.


Add the `dd-agent` user as an LSF [administrator][10].

The integration runs commands such as `lsid`, `bhosts`, and `lsclusters`. In order to run these commands, the Agent needs them in its `PATH`. This is typically done by running `source $LSF_HOME/conf/profile.lsf`. However, the Datadog Agent uses upstart or systemd to orchestrate the datadog-agent service. Environment variables may need to be added to the service configuration files at the default locations of:

Suggested change:
- The integration runs commands such as `lsid`, `bhosts`, and `lsclusters`. In order to run these commands, the Agent needs them in its `PATH`. This is typically done by running `source $LSF_HOME/conf/profile.lsf`. However, the Datadog Agent uses upstart or systemd to orchestrate the datadog-agent service. Environment variables may need to be added to the service configuration files at the default locations of:
+ The integration runs commands such as `lsid`, `bhosts`, and `lsclusters`. In order to run these commands, the Agent needs them in its `PATH`. This is typically done by running `source $LSF_HOME/conf/profile.lsf`. However, the Datadog Agent uses upstart or systemd to orchestrate the `datadog-agent` service. You may need to add environment variables to the service configuration files at the default locations of:

To get the enviornment variables necessary for the agent service, locate the `<LSF_TOP_DIR>/conf/profile.lsf` file and run the following command:

`env -i bash -c "source <LSF_TOP_DIR>/conf/profile.lsf; env"`

Even though this is just one line, I would put it in a standalone code block (three backticks) rather than inline code formatting (one backtick). It'll be easier for customers to copy/paste the command in a standalone code block.


`env -i bash -c "source <LSF_TOP_DIR>/conf/profile.lsf; env"`

This will output a list of environment variables necessary to run the IBM Spectrum LSF commands.

Suggested change:
- This will output a list of environment variables necessary to run the IBM Spectrum LSF commands.
+ Running this command outputs a list of environment variables necessary to run the IBM Spectrum LSF commands.
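To see the pattern in action without an LSF installation, here is a small runnable sketch; `/tmp/profile.demo` and the variables it exports are stand-ins for the real `<LSF_TOP_DIR>/conf/profile.lsf` and its contents:

```shell
# Stand-in for <LSF_TOP_DIR>/conf/profile.lsf, used here only so the
# pattern is runnable; point the real command at your profile.lsf.
cat > /tmp/profile.demo <<'EOF'
export LSF_ENVDIR=/tmp/lsf/conf
export LSF_SERVERDIR=/tmp/lsf/etc
EOF

# Start from an empty environment, source the profile, and print the
# variables it sets -- these are the values the Agent service needs.
env -i bash -c "source /tmp/profile.demo; env" | grep '^LSF_'
```

`env -i` clears the inherited environment first, so the output contains only what the profile script itself exports (plus a few variables `bash` always sets).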
- Upstart: `/etc/init/datadog-agent.conf`
- Systemd: `/lib/systemd/system/datadog-agent.service`
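A minimal sketch of wiring those variables into the systemd service via a drop-in file; the drop-in path follows standard systemd conventions, and the variable values shown are hypothetical placeholders for the output of the `env -i` command:

```ini
# /etc/systemd/system/datadog-agent.service.d/lsf-env.conf
# Hypothetical values -- substitute the variables printed for your installation.
[Service]
Environment="LSF_ENVDIR=/opt/lsf/conf"
Environment="LSF_SERVERDIR=/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/etc"
```

After editing, run `sudo systemctl daemon-reload && sudo systemctl restart datadog-agent` so the service picks up the new environment. (For upstart, the equivalent is `env KEY=value` stanzas in `/etc/init/datadog-agent.conf`.)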

To get the enviornment variables necessary for the agent service, locate the `<LSF_TOP_DIR>/conf/profile.lsf` file and run the following command:

Suggested change:
- To get the enviornment variables necessary for the agent service, locate the `<LSF_TOP_DIR>/conf/profile.lsf` file and run the following command:
+ To get the environment variables necessary for the Agent service, locate the `<LSF_TOP_DIR>/conf/profile.lsf` file and run the following command:

## Troubleshooting

Use the `datadog-agent check` command to view the metrics the integration is collection, as well as debug logs from the check:

Suggested change:
- Use the `datadog-agent check` command to view the metrics the integration is collection, as well as debug logs from the check:
+ Use the `datadog-agent check` command to view the metrics the integration is collecting, as well as debug logs from the check:
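For reference, a hedged example invocation: the check name is assumed to match the integration (`ibm_spectrum_lsf`), running as the `dd-agent` user mirrors the service's environment, and the `--log-level debug` flag is assumed from the Agent CLI (drop it if your Agent version doesn't support it):

```
# One-off run of the check; prints collected metrics and, with the
# log level raised, debug output from the check itself.
sudo -u dd-agent datadog-agent check ibm_spectrum_lsf --log-level debug
```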

1. Edit the `ibm_spectrum_lsf.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your `ibm_spectrum_lsf` performance data. See the [sample ibm_spectrum_lsf.d/conf.yaml][4] for all available configuration options.

The IBM Spectrum LSF integration will run a series of management commands to collect data. To control what commands are run and what metrics are emitted, use the `metric_sources` configuration option. By default, data from the following commands are collected: `lsclusters`, `lshosts`, `bhosts`, `lsload`, `bqueues`, `bslots`, `bjobs`, but you can enable more optional metrics or opt-out of collecting any set of metrics.

Suggested change:
- The IBM Spectrum LSF integration will run a series of management commands to collect data. To control what commands are run and what metrics are emitted, use the `metric_sources` configuration option. By default, data from the following commands are collected: `lsclusters`, `lshosts`, `bhosts`, `lsload`, `bqueues`, `bslots`, `bjobs`, but you can enable more optional metrics or opt-out of collecting any set of metrics.
+ The IBM Spectrum LSF integration runs a series of management commands to collect data. To control which commands are run and which metrics are emitted, use the `metric_sources` configuration option. By default, data from the following commands are collected, but you can enable more optional metrics or opt out of collecting any set of metrics: `lsclusters`, `lshosts`, `bhosts`, `lsload`, `bqueues`, `bslots`, `bjobs`.


For example, if you would like to measure only GPU specific metrics, your metric sources will look like:

Suggested change:
- For example, if you would like to measure only GPU specific metrics, your metric sources will look like:
+ For example, if you want to only measure GPU-specific metrics, your `metric_sources` will look like:
```
metric_sources:
  - bhosts_gpu
```

The `badmin_perfmon` metric source collects fata from the `badmin perfmon view -json` command. This collects [overall statistics][12] about the cluster. To collect these metrics, performance collection must be enabled on your server using the `badmin perfmon start <COLLECTION_INTERVAL>` command. By default, the integration will run this command automatically (and stop collection once the agent is turned off). However, you can turn off this behavior by setting `badmin_perfmon_auto: false`.

Suggested change:
- The `badmin_perfmon` metric source collects fata from the `badmin perfmon view -json` command. This collects [overall statistics][12] about the cluster. To collect these metrics, performance collection must be enabled on your server using the `badmin perfmon start <COLLECTION_INTERVAL>` command. By default, the integration will run this command automatically (and stop collection once the agent is turned off). However, you can turn off this behavior by setting `badmin_perfmon_auto: false`.
+ The `badmin_perfmon` metric source collects data from the `badmin perfmon view -json` command. This collects [overall statistics][12] about the cluster. To collect these metrics, performance collection must be enabled on your server using the `badmin perfmon start <COLLECTION_INTERVAL>` command. By default, the integration runs this command automatically (and stops collection once the Agent is turned off). However, you can turn off this behavior by setting `badmin_perfmon_auto: false`.


Since collecting these metrics can add extra load on your server, we recommend setting a higher collection interval for these metrics, or at least 60. The exact depends on the load and size of your cluster. View IBM Spectrum LSF's [recommendations][13] for managing high query load.

Suggested change:
- Since collecting these metrics can add extra load on your server, we recommend setting a higher collection interval for these metrics, or at least 60. The exact depends on the load and size of your cluster. View IBM Spectrum LSF's [recommendations][13] for managing high query load.
+ Since collecting these metrics can add extra load on your server, we recommend setting a higher collection interval for these metrics, or at least 60. The exact interval depends on the load and size of your cluster. View IBM Spectrum LSF's [recommendations][13] for managing high query load.

Exact interval? Exact number?



Similarly, the `bhist` command collects information about completed jobs, which can be query intensive so we recommend monitoring this command with the `min_collection_interval` set to 60.

Suggested change:
- Similarly, the `bhist` command collects information about completed jobs, which can be query intensive so we recommend monitoring this command with the `min_collection_interval` set to 60.
+ Similarly, the `bhist` command collects information about completed jobs, which can be query-intensive, so we recommend monitoring this command with the `min_collection_interval` set to 60.
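Pulling the recommendations above into one place, a sketch of what an `ibm_spectrum_lsf.d/conf.yaml` instance might look like; the `instances:` layout is the standard Agent convention, and listing `bhist` as a metric source is an assumption based on the sources named in this README:

```yaml
instances:
  - metric_sources:
      - badmin_perfmon
      - bhist
    # Let the operator manage `badmin perfmon start`/stop manually.
    badmin_perfmon_auto: false
    # Query-intensive sources: collect at most once every 60 seconds.
    min_collection_interval: 60
```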