[SPARK-44153][CORE][UI] Support Heap Histogram column in Executors tab #41709
Conversation
Review comment on the new configuration:

```scala
val UI_HEAP_HISTOGRAM_ENABLED = ConfigBuilder("spark.ui.heapHistogramEnabled")
  .version("3.5.0")
  .booleanConf
  .createWithDefault(SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_11))
```
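As a sketch of what that default evaluates to: commons-lang3's `SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_11)` checks the running JVM's version, which follows the `java.specification.version` convention ("1.8" for Java 8, "11", "17", ...). The helper names below are hypothetical, for illustration only:

```scala
// Hypothetical sketch: parse a "java.specification.version" string
// ("1.8" for Java 8, "11", "17", ...) into a major version number.
def majorJavaVersion(spec: String): Int = {
  val parts = spec.split("\\.")
  if (parts(0) == "1") parts(1).toInt else parts(0).toInt
}

// The heap-histogram column defaults to enabled only when jmap is
// guaranteed to be bundled, i.e. Java 11+ (which is always a JDK).
def heapHistogramDefault(spec: String): Boolean = majorJavaVersion(spec) >= 11
```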
The Java 8 JRE doesn't have jmap. Java 8 JDK users are able to enable this manually.
Here is the answer to the jmap question, @viirya .
```scala
    context.reply(Utils.getThreadDump())

  case TriggerHeapHistogram =>
    context.reply(Utils.getHeapHistogram())
```
Hmm, what if the executor node doesn't have jmap installed?
All Java 11+ installations have jmap because they are JDKs~
```scala
/** Return a heap dump. Used to capture dumps for the web UI */
def getHeapHistogram(): Array[String] = {
  val pid = String.valueOf(ProcessHandle.current().pid())
  val builder = new ProcessBuilder("jmap", "-histo:live", pid)
```
If we get an error from executing it, can we get the error back?
executeAndGetOutput seems to be able to handle it?
Which error do you have in mind? At this layer, we don't handle it, like getThreadDump. For example, getThreadDump can throw SecurityException and UnsupportedOperationException, but we ignore them.
The error message will go to the stderr of the Executor log.
I took a look at executeAndGetOutput, but it's the same. It throws SparkException instead of getting the error back.
spark/core/src/main/scala/org/apache/spark/util/Utils.scala, lines 1378 to 1381 in 7398e93:

```scala
if (exitCode != 0) {
  logError(s"Process $command exited with code $exitCode: $output")
  throw new SparkException(s"Process $command exited with code $exitCode")
}
```
Should it respect JAVA_HOME, then PATH?
Like you pointed out, there is no problem running jmap, is there, @pan3793 ? IIRC, jmap is just a CLI command without any format change, especially if you are talking about JDKs here.
I suppose $JAVA_HOME/bin/jmap (in the above case, /opt/openjdk-11/bin/jmap instead of /opt/openjdk-8/bin/jmap) should be used, instead of the one first present in PATH.
And if only JAVA_HOME is set but jmap is not in PATH, the invocation will fail. For example: the JDK was installed from a TGZ and unarchived to /opt/openjdk-11 without being exposed on PATH or symlinked into /usr/bin. The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.
@pan3793 . Here, the contract on jmap is just any available binary, not the same JDK's jmap.
However, I'm interested in this case. Could you make a small reproducer with a Docker image?

> The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.
@dongjoon-hyun This is an issue - we should use $JAVA_HOME/bin/jmap (more specifically, whatever comes from System.getProperty("java.home")), not the first jmap that happens to be in the PATH. It is common to override JAVA_HOME to specify the Java version explicitly (or even to not have a JDK in the PATH at all).
Also, there are no compatibility guarantees that I am aware of between different versions of the JDK and jmap (for example, JDK 11 jmap against JDK 17, or vice versa) - if I missed any, please do let me know!
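A minimal sketch of this suggestion (helper name is hypothetical): resolve jmap relative to the running JVM's home directory rather than taking whatever jmap appears first on PATH.

```scala
import java.io.File

// Hypothetical sketch: prefer the jmap bundled with the running JVM.
// In practice javaHome would come from System.getProperty("java.home").
def resolveJmap(javaHome: String): String = {
  val candidate = new File(new File(javaHome, "bin"), "jmap")
  // Fall back to a plain PATH lookup if the JVM home has no jmap
  // (e.g. a JRE rather than a JDK).
  if (candidate.isFile) candidate.getAbsolutePath else "jmap"
}
```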
```scala
        <td></td>
      </tr>
    case _ =>
      // Ignore the first two lines and the last line
```
Additionally, we will ignore all irregular messages, not just these three lines.
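The filtering described here can be sketched as follows. This assumes the usual `jmap -histo` output shape (two header lines, regular rows like `   1:  12345  678900  java.lang.String`, and a trailing `Total` line); the names and the row regex below are illustrative, not the PR's actual code:

```scala
// Hypothetical sketch: keep only lines that match the regular
// histogram-row pattern; headers, the "Total" line, and any
// irregular messages fall through and are dropped.
val rowPattern = """\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+).*""".r

def histogramRows(lines: Seq[String]): Seq[(Long, Long, String)] =
  lines.collect {
    case rowPattern(_, instances, bytes, className) =>
      (instances.toLong, bytes.toLong, className)
  }
```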
So the error will go to the Executor log, and the UI just doesn't show the heap dump (it's empty)?
Yes, correct. We log and return None for Option[Array[String]], so the heap histogram UI will be empty. When the executor JVM dies for some other reason before the driver notices it, it will be handled in the same way.
spark/core/src/main/scala/org/apache/spark/SparkContext.scala, lines 756 to 775 in 284029b:

```scala
private[spark] def getExecutorHeapHistogram(executorId: String): Option[Array[String]] = {
  try {
    if (executorId == SparkContext.DRIVER_IDENTIFIER) {
      Some(Utils.getHeapHistogram())
    } else {
      env.blockManager.master.getExecutorEndpointRef(executorId) match {
        case Some(endpointRef) =>
          Some(endpointRef.askSync[Array[String]](TriggerHeapHistogram))
        case None =>
          logWarning(s"Executor $executorId might already have stopped and " +
            "can not request heap histogram from it.")
          None
      }
    }
  } catch {
    case e: Exception =>
      logError(s"Exception getting heap histogram from executor $executorId", e)
      None
  }
}
```
Oops. Sorry, @viirya . I forgot that ProcessHandle only exists from Java 9+. Let me convert this to a Draft.
dongjoon-hyun left a comment
I converted this back from Draft to normal. The last commit uses a Java 8-compatible way.
Thank you for the review and approval, @viirya !

```diff
- val pid = String.valueOf(ProcessHandle.current().pid())
+ // From Java 9+, we can use 'ProcessHandle.current().pid()'
+ val pid = getProcessName().split("@").head
```

```scala
// From Java 9+, we can use 'ProcessHandle.current().pid()'
val pid = getProcessName().split("@").head
val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
```
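For context, a sketch of how this Java 8-compatible PID lookup works (assuming `getProcessName()` wraps the standard `RuntimeMXBean.getName()`, which on HotSpot conventionally returns `"<pid>@<hostname>"`); the helper names below are hypothetical:

```scala
import java.lang.management.ManagementFactory

// Hypothetical sketch: RuntimeMXBean.getName() returns "<pid>@<hostname>"
// on HotSpot, so splitting on '@' recovers the PID without Java 9's
// ProcessHandle API.
def pidFromProcessName(name: String): String = name.split("@").head

def currentPid(): String =
  pidFromProcessName(ManagementFactory.getRuntimeMXBean.getName)
```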
Log errors in the invocation to executor logs instead of sending them to the driver as a response?
That's a totally different feature. This is designed as a part of the Spark Driver UI, @mridulm .
Sorry, I misunderstood this request. Sure, I'll try.
```scala
      if (line.nonEmpty) rows += line
      line = r.readLine()
    }
    rows.toArray
```
Use IOUtils.readLines or Source.getLines instead ?
> Use IOUtils.readLines or Source.getLines instead?

For this one, I was thinking about adding a new configuration to limit the results, like Top 100 or Top 1000. I'll handle this together with that new configuration.
```scala
val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
val p = builder.start()
val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
```
nit: This reader is not closed and/or we are not doing waitFor on the process.
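A standalone sketch of what addressing this nit could look like: close the reader and wait for the subprocess to exit. (The follow-up PR uses Spark's `Utils.tryWithResource`; this illustrative version expresses the same idea with try/finally, and `runAndCollect` is a hypothetical name.)

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: collect non-empty output lines from a subprocess,
// making sure the reader is closed and the child process is reaped.
def runAndCollect(command: String*): Array[String] = {
  val process = new ProcessBuilder(command: _*).start()
  val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
  val rows = ArrayBuffer.empty[String]
  try {
    var line = reader.readLine()
    while (line != null) {
      if (line.nonEmpty) rows += line
      line = reader.readLine()
    }
  } finally {
    reader.close()
    process.waitFor() // reap the child so it does not linger as a zombie
  }
  rows.toArray
}
```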
### What changes were proposed in this pull request?

This is a follow-up of #41709 to address the review comments.

### Why are the changes needed?

1. Use `JAVA_HOME`-prefixed `jmap` to ensure the same version's JVM and jmap.
2. Use the existing stderr instead of merging `stderr` and `stdout` via `redirectErrorStream`.
3. Use `tryWithResource`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #41731 from dongjoon-hyun/SPARK-44153-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR aims to use a JDK for the Spark 3.5+ Docker image. The Apache Spark Dockerfiles are updated already.

- apache/spark#45762
- apache/spark#45761

### Why are the changes needed?

Since Apache Spark 3.5.0, SPARK-44153 starts to use `jmap` like the following.

- apache/spark#41709

https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030

### Does this PR introduce _any_ user-facing change?

Yes, the user can use the `Heap Histogram` feature.

### How was this patch tested?

Pass the CIs.

Closes #66 from dongjoon-hyun/SPARK-49701.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR aims to support a `Heap Histogram` column in the `Executors` tab.

### Why are the changes needed?

Like the `Thread Dump` column, this is very helpful when we analyze executor live JVM status.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new column and we provide the `spark.ui.heapHistogramEnabled` configuration, like `spark.ui.threadDumpsEnabled`.

### How was this patch tested?

Manual review.