Conversation

Member

@dongjoon-hyun dongjoon-hyun commented Jun 23, 2023

What changes were proposed in this pull request?

This PR aims to support a Heap Histogram column in the Executor tab.

Why are the changes needed?

Like the Thread Dump column, this is very helpful when analyzing an executor's live JVM status.

(Two screenshots from 2023-06-22 showing the new column in the Executors tab and a sample heap histogram.)

Does this PR introduce any user-facing change?

Yes, but this is a new column, and we provide a spark.ui.heapHistogramEnabled configuration, similar to spark.ui.threadDumpsEnabled.

How was this patch tested?

Manual review.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-44153][CORE][UI] Support 'Heap Histogram' column in Executor tab" to "[SPARK-44153][CORE][UI] Support Heap Histogram column in Executor tab" Jun 23, 2023
@dongjoon-hyun
Member Author

Hi, @viirya. This is the Heap Histogram PR. Could you review it when you have some time?

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-44153][CORE][UI] Support Heap Histogram column in Executor tab" to "[SPARK-44153][CORE][UI] Support Heap Histogram column in Executors tab" Jun 23, 2023
val UI_HEAP_HISTOGRAM_ENABLED = ConfigBuilder("spark.ui.heapHistogramEnabled")
  .version("3.5.0")
  .booleanConf
  .createWithDefault(SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_11))
Member Author

@dongjoon-hyun dongjoon-hyun Jun 23, 2023

Java 8 JRE doesn't have jmap. Java 8 JDK users can enable this manually.
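Since the default is false below Java 11, a Java 8 JDK user could opt in explicitly; a minimal sketch assuming a standard SparkSession setup (this snippet is illustrative and not part of the PR):

import org.apache.spark.sql.SparkSession

// Hypothetical opt-in for a Java 8 JDK user: the column defaults to off below Java 11,
// but a JDK still ships jmap, so enabling the flag manually is possible.
val spark = SparkSession.builder()
  .appName("heap-histogram-demo")
  .config("spark.ui.heapHistogramEnabled", "true")
  .getOrCreate()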

Member Author

Here is the answer to the jmap question, @viirya.

  context.reply(Utils.getThreadDump())

case TriggerHeapHistogram =>
  context.reply(Utils.getHeapHistogram())
Member

Hmm, what if the executor node doesn't have jmap installed?

Member Author

All Java 11+ runtimes have jmap because Java 11+ ships only as a JDK.

/** Return a heap dump. Used to capture dumps for the web UI */
def getHeapHistogram(): Array[String] = {
  val pid = String.valueOf(ProcessHandle.current().pid())
  val builder = new ProcessBuilder("jmap", "-histo:live", pid)
Member

If we get an error from executing it, can we get the error back?

executeAndGetOutput seems to be able to handle it?

Member Author

Which error do you have in mind? At this layer, we don't handle it, just like getThreadDump. For example, getThreadDump can throw SecurityException and UnsupportedOperationException, but we ignore them.

Member Author

The error message will go to the executor's stderr log.

Member Author

I took a look at executeAndGetOutput, but it's the same. It throws SparkException instead of giving the error back.

if (exitCode != 0) {
  logError(s"Process $command exited with code $exitCode: $output")
  throw new SparkException(s"Process $command exited with code $exitCode")
}
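For comparison, a hedged sketch of how a caller could capture the process output and error text itself instead of relying on a thrown SparkException; the helper name and structure are illustrative, not code from this PR:

import scala.io.Source
import scala.util.{Failure, Success, Try}

// Illustrative only: run jmap and return either the histogram lines or the error text.
def heapHistogramOrError(pid: String): Either[String, Array[String]] =
  Try {
    val proc = new ProcessBuilder("jmap", "-histo:live", pid)
      .redirectErrorStream(true) // fold stderr into stdout so error text is captured too
      .start()
    val lines = Source.fromInputStream(proc.getInputStream).getLines().toArray
    val exitCode = proc.waitFor()
    if (exitCode != 0) sys.error(s"jmap exited with code $exitCode: ${lines.mkString("\n")}")
    lines
  } match {
    case Success(lines) => Right(lines)
    case Failure(e)     => Left(e.getMessage)
  }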

Member

Should it respect JAVA_HOME first, then PATH?

Member Author

@dongjoon-hyun dongjoon-hyun Jun 23, 2023

As you pointed out, there is no problem running jmap, is there, @pan3793? IIRC, jmap is just a CLI command without any format change, especially if you are talking about JDKs here.

Member

@pan3793 pan3793 Jun 23, 2023

I suppose $JAVA_HOME/bin/jmap (in the above case, /opt/openjdk-11/bin/jmap instead of /opt/openjdk-8/bin/jmap) should be used, instead of the one that happens to come first in PATH.

And if only JAVA_HOME is set but there is no jmap in PATH, the invocation will fail, e.g.:

The JDK was installed from a tarball and unarchived to /opt/openjdk-11 without being added to PATH or symlinked into /usr/bin. The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.

Member Author

@dongjoon-hyun dongjoon-hyun Jun 23, 2023

@pan3793, here the contract on jmap is just any available binary, not the same JDK's jmap.

However, I'm interested in this case. Could you make a small reproducer with a Docker image?

The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.

Contributor

@mridulm mridulm Jun 24, 2023

@dongjoon-hyun This is an issue - we should use $JAVA_HOME/bin/jmap (more specifically, whatever comes from System.getProperty("java.home")), not the first jmap that happens to be on the PATH. It is common to override JAVA_HOME to specify the Java version to be used explicitly (or even to not have the JDK on the PATH at all).

Also, there are no compatibility guarantees that I am aware of between different versions of the JDK and jmap (for example, JDK 11's jmap against JDK 17 or vice versa) - if I missed any, please do let me know!
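A minimal sketch of the resolution described here, preferring the jmap that ships with the running JVM over whatever is on the PATH (an illustrative helper, not the PR's code):

import java.io.File

// Prefer <java.home>/bin/jmap so the tool version matches the running JVM;
// fall back to a bare "jmap" (PATH lookup) only if that file is missing.
def resolveJmap(): String = {
  val javaHome = System.getProperty("java.home")
  val bundled = new File(new File(javaHome, "bin"), "jmap")
  if (bundled.isFile) bundled.getAbsolutePath else "jmap"
}

// Usage sketch: val builder = new ProcessBuilder(resolveJmap(), "-histo:live", pid)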

Member Author

Got it. Let me take a look at it from that perspective, @mridulm and @pan3793.

In the worst case, I'll limit this feature to K8s-only environments, because it was initially developed for the K8s environment.

<td></td>
</tr>
case _ =>
// Ignore the first two lines and the last line
Member Author

In addition to these three lines, we will also ignore all irregular messages.
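For illustration, one way the parsing could keep only the per-class rows of jmap -histo:live output, dropping the header, the Total footer, and any irregular warning lines (a sketch of the idea, not the PR's exact code):

// Data rows look like "   1:        123456       7890123  java.lang.String",
// so a rank-number-plus-colon prefix identifies them reliably.
def histogramRows(output: Array[String]): Array[String] =
  output.filter(_.trim.matches("""\d+:\s+\d+\s+\d+\s+\S.*"""))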

Member

So the error will go to the executor log, and in the UI it just doesn't show the heap dump (empty)?

Member Author

@dongjoon-hyun dongjoon-hyun Jun 23, 2023

Yes, correct. We log the error and return None for Option[Array[String]], so the heap histogram UI will be empty. If the executor JVM has died for some other reason before the driver notices it, that is handled in the same way.

private[spark] def getExecutorHeapHistogram(executorId: String): Option[Array[String]] = {
  try {
    if (executorId == SparkContext.DRIVER_IDENTIFIER) {
      Some(Utils.getHeapHistogram())
    } else {
      env.blockManager.master.getExecutorEndpointRef(executorId) match {
        case Some(endpointRef) =>
          Some(endpointRef.askSync[Array[String]](TriggerHeapHistogram))
        case None =>
          logWarning(s"Executor $executorId might already have stopped and " +
            "can not request heap histogram from it.")
          None
      }
    }
  } catch {
    case e: Exception =>
      logError(s"Exception getting heap histogram from executor $executorId", e)
      None
  }
}

Member Author

@dongjoon-hyun dongjoon-hyun left a comment

Oops. Sorry, @viirya. I forgot that ProcessHandle exists only from Java 9+. Let me convert this to Draft.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft June 23, 2023 05:06
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review June 23, 2023 05:30
Member Author

@dongjoon-hyun dongjoon-hyun left a comment

I converted this back from Draft to normal. The last commit uses a Java 8-compatible way.

- val pid = String.valueOf(ProcessHandle.current().pid())
+ // From Java 9+, we can use 'ProcessHandle.current().pid()'
+ val pid = getProcessName().split("@").head
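For context, the Java 8-compatible pid lookup relies on the JVM's runtime name, which is conventionally "pid@hostname" on HotSpot; a standalone sketch of the same idea (the getProcessName() helper above is assumed to wrap this call):

import java.lang.management.ManagementFactory

// "12345@myhost" -> "12345"; works on Java 8, unlike ProcessHandle.current().pid()
val processName = ManagementFactory.getRuntimeMXBean.getName
val pid = processName.split("@").head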
@dongjoon-hyun
Member Author

Thank you for the review and approval, @viirya!

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 3.5.0.
Thank you, @viirya and @pan3793.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44153 branch June 23, 2023 07:49
// From Java 9+, we can use 'ProcessHandle.current().pid()'
val pid = getProcessName().split("@").head
val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
Contributor

Log invocation errors to the executor logs instead of sending them to the driver as a response?

Member Author

@dongjoon-hyun dongjoon-hyun Jun 25, 2023

That's a totally different feature. This is designed as part of the Spark driver UI, @mridulm.
Sorry, I misunderstood this request. Sure, I'll try.

  if (line.nonEmpty) rows += line
  line = r.readLine()
}
rows.toArray
Contributor

Use IOUtils.readLines or Source.getLines instead?

Member Author

Use IOUtils.readLines or Source.getLines instead?

For this one, I was thinking about adding a new configuration to limit the results, e.g. to the top 100 or top 1000 rows. I'll handle this together with that new configuration.
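A hypothetical sketch of that idea, streaming the output with Source.getLines and keeping only the top N rows; the config name mentioned in the comment is illustrative and not part of this PR:

import scala.io.Source

// e.g. driven by a hypothetical setting such as spark.ui.heapHistogramTopN
val maxRows = 100

def topHistogramRows(pid: String): Seq[String] = {
  val proc = new ProcessBuilder("jmap", "-histo:live", pid).start()
  val source = Source.fromInputStream(proc.getInputStream)
  try {
    // Drain the full output so the subprocess can exit cleanly, then truncate.
    val rows = source.getLines().toList
    proc.waitFor()
    rows.take(maxRows)
  } finally {
    source.close()
  }
}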

val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
val p = builder.start()
val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
Contributor

@mridulm mridulm Jun 24, 2023

nit: This reader is not closed and/or we are not doing waitFor on the process.
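A sketch of one way to address both points, closing the reader deterministically and waiting for the subprocess to exit; plain try/finally is used here so the snippet stands alone (the follow-up PR uses Spark's tryWithResource):

import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable.ArrayBuffer

def readHeapHistogram(pid: String): Array[String] = {
  val process = new ProcessBuilder("jmap", "-histo:live", pid).start()
  val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
  try {
    val rows = ArrayBuffer.empty[String]
    var line = reader.readLine()
    while (line != null) {
      if (line.nonEmpty) rows += line
      line = reader.readLine()
    }
    rows.toArray
  } finally {
    reader.close()    // addresses the unclosed-reader nit
    process.waitFor() // don't leave a zombie jmap process behind
  }
}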

dongjoon-hyun added a commit that referenced this pull request Jun 26, 2023
### What changes were proposed in this pull request?
This is a follow-up of #41709 to address the review comments.

### Why are the changes needed?
1. Use `JAVA_HOME` prefixed `jmap` to ensure the same version's `JVM` and jmap.
2. Use the existing stderr instead of merging `stderr` and `stdout` via `redirectErrorStream`.
3. Use `tryWithResource`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual review.

Closes #41731 from dongjoon-hyun/SPARK-44153-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit to apache/spark-docker that referenced this pull request Sep 18, 2024
### What changes were proposed in this pull request?
This PR aims to use a JDK for the Spark 3.5+ Docker image. The Apache Spark Dockerfiles are updated already.
- apache/spark#45762
- apache/spark#45761

### Why are the changes needed?
Since Apache Spark 3.5.0, SPARK-44153 starts to use `jmap` like the following.
- apache/spark#41709

https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030

### Does this PR introduce _any_ user-facing change?
Yes, the user can use the `Heap Histogram` feature.

### How was this patch tested?
Pass the CIs.

Closes #66 from dongjoon-hyun/SPARK-49701.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>