[SPARK-44153][CORE][UI] Support Heap Histogram column in Executors tab #41709
Conversation
Review comment on the new configuration:

```scala
val UI_HEAP_HISTOGRAM_ENABLED = ConfigBuilder("spark.ui.heapHistogramEnabled")
  .version("3.5.0")
  .booleanConf
  .createWithDefault(SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_11))
```
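As a sketch of what that default evaluates to: commons-lang3's `SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_11)` checks the running JVM's version, which follows the `java.specification.version` convention ("1.8" for Java 8, "11", "17", ...). The helper names below are hypothetical, for illustration only:

```scala
// Hypothetical sketch: parse a "java.specification.version" string
// ("1.8" for Java 8, "11", "17", ...) into a major version number.
def majorJavaVersion(spec: String): Int = {
  val parts = spec.split("\\.")
  if (parts(0) == "1") parts(1).toInt else parts(0).toInt
}

// The heap-histogram column defaults to enabled only when jmap is
// guaranteed to be bundled, i.e. Java 11+ (which is always a JDK).
def heapHistogramDefault(spec: String): Boolean = majorJavaVersion(spec) >= 11
```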
The Java 8 JRE doesn't have jmap. Java 8 JDK users are able to enable this manually.
Here is the answer to the jmap question, @viirya .
```scala
    context.reply(Utils.getThreadDump())

  case TriggerHeapHistogram =>
    context.reply(Utils.getHeapHistogram())
```
Hmm, what if the executor node doesn't have jmap installed?
All Java 11+ installations have jmap because they are JDKs~
```scala
/** Return a heap dump. Used to capture dumps for the web UI */
def getHeapHistogram(): Array[String] = {
  val pid = String.valueOf(ProcessHandle.current().pid())
  val builder = new ProcessBuilder("jmap", "-histo:live", pid)
```
If we get an error from executing it, can we get the error back?
executeAndGetOutput seems to be able to handle it?
Which error do you have in mind? At this layer, we don't handle it, like getThreadDump. For example, getThreadDump can throw SecurityException and UnsupportedOperationException, but we ignore them.
The error message will go to the stderr of the Executor log.
I took a look at executeAndGetOutput, but it's the same. It throws SparkException instead of getting the error back.
spark/core/src/main/scala/org/apache/spark/util/Utils.scala, lines 1378 to 1381 in 7398e93:

```scala
if (exitCode != 0) {
  logError(s"Process $command exited with code $exitCode: $output")
  throw new SparkException(s"Process $command exited with code $exitCode")
}
```
Should it respect JAVA_HOME, then PATH?
Like you pointed out, there is no problem running jmap, is there, @pan3793 ? IIRC, jmap is just a CLI command without any format change, especially if you are talking about JDKs here.
I suppose $JAVA_HOME/bin/jmap (in the above case, /opt/openjdk-11/bin/jmap instead of /opt/openjdk-8/bin/jmap) should be used, instead of the one first present in PATH.
And if only JAVA_HOME is set but jmap is not in PATH, the invocation will fail. For example: the JDK was installed from a TGZ and unarchived to /opt/openjdk-11 without being exposed on PATH or symlinked into /usr/bin. The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.
@pan3793 . Here, the contract on jmap is just any available binary, not the same JDK's jmap.
However, I'm interested in this case. Could you make a small reproducer with a Docker image?

> The Spark executor process respects JAVA_HOME to find $JAVA_HOME/bin/java, but the subprocess cannot find jmap even though it knows where JAVA_HOME is.
@dongjoon-hyun This is an issue - we should use $JAVA_HOME/bin/jmap (more specifically, whatever comes from System.getProperty("java.home")), not the first jmap that happens to be in the PATH. It is common to override JAVA_HOME to specify the Java version explicitly (or even to not have a JDK in the PATH at all).
Also, there are no compatibility guarantees that I am aware of between different versions of the JDK and jmap (for example, JDK 11 jmap against JDK 17, or vice versa) - if I missed any, please do let me know!
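A minimal sketch of this suggestion (helper name is hypothetical): resolve jmap relative to the running JVM's home directory rather than taking whatever jmap appears first on PATH.

```scala
import java.io.File

// Hypothetical sketch: prefer the jmap bundled with the running JVM.
// In practice javaHome would come from System.getProperty("java.home").
def resolveJmap(javaHome: String): String = {
  val candidate = new File(new File(javaHome, "bin"), "jmap")
  // Fall back to a plain PATH lookup if the JVM home has no jmap
  // (e.g. a JRE rather than a JDK).
  if (candidate.isFile) candidate.getAbsolutePath else "jmap"
}
```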
```scala
        <td></td>
      </tr>
    case _ =>
      // Ignore the first two lines and the last line
```
Additionally, we will ignore all irregular messages, not just these three lines.
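The filtering described here can be sketched as follows. This assumes the usual `jmap -histo` output shape (two header lines, regular rows like `   1:  12345  678900  java.lang.String`, and a trailing `Total` line); the names and the row regex below are illustrative, not the PR's actual code:

```scala
// Hypothetical sketch: keep only lines that match the regular
// histogram-row pattern; headers, the "Total" line, and any
// irregular messages fall through and are dropped.
val rowPattern = """\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+).*""".r

def histogramRows(lines: Seq[String]): Seq[(Long, Long, String)] =
  lines.collect {
    case rowPattern(_, instances, bytes, className) =>
      (instances.toLong, bytes.toLong, className)
  }
```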
So the error will go to the Executor log, and the UI just doesn't show the heap dump (it's empty)?
Yes, correct. We log and return None for Option[Array[String]], so the heap histogram UI will be empty. When the executor JVM dies for some other reason before the driver notices it, it will be handled in the same way.
spark/core/src/main/scala/org/apache/spark/SparkContext.scala, lines 756 to 775 in 284029b:

```scala
private[spark] def getExecutorHeapHistogram(executorId: String): Option[Array[String]] = {
  try {
    if (executorId == SparkContext.DRIVER_IDENTIFIER) {
      Some(Utils.getHeapHistogram())
    } else {
      env.blockManager.master.getExecutorEndpointRef(executorId) match {
        case Some(endpointRef) =>
          Some(endpointRef.askSync[Array[String]](TriggerHeapHistogram))
        case None =>
          logWarning(s"Executor $executorId might already have stopped and " +
            "can not request heap histogram from it.")
          None
      }
    }
  } catch {
    case e: Exception =>
      logError(s"Exception getting heap histogram from executor $executorId", e)
      None
  }
}
```
Oops. Sorry, @viirya . I forgot that ProcessHandle only exists from Java 9+. Let me convert this to a Draft.
dongjoon-hyun left a comment
I converted this back from Draft to normal. The last commit uses a Java 8-compatible way.
Thank you for the review and approval, @viirya !

```diff
- val pid = String.valueOf(ProcessHandle.current().pid())
+ // From Java 9+, we can use 'ProcessHandle.current().pid()'
+ val pid = getProcessName().split("@").head
```

```scala
// From Java 9+, we can use 'ProcessHandle.current().pid()'
val pid = getProcessName().split("@").head
val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
```
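For context, a sketch of how this Java 8-compatible PID lookup works (assuming `getProcessName()` wraps the standard `RuntimeMXBean.getName()`, which on HotSpot conventionally returns `"<pid>@<hostname>"`); the helper names below are hypothetical:

```scala
import java.lang.management.ManagementFactory

// Hypothetical sketch: RuntimeMXBean.getName() returns "<pid>@<hostname>"
// on HotSpot, so splitting on '@' recovers the PID without Java 9's
// ProcessHandle API.
def pidFromProcessName(name: String): String = name.split("@").head

def currentPid(): String =
  pidFromProcessName(ManagementFactory.getRuntimeMXBean.getName)
```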
Log errors in the invocation to executor logs instead of sending them to the driver as a response?
That's a totally different feature. This is designed as a part of the Spark Driver UI, @mridulm .
Sorry, I misunderstood this request. Sure, I'll try.
```scala
      if (line.nonEmpty) rows += line
      line = r.readLine()
    }
    rows.toArray
```
Use IOUtils.readLines or Source.getLines instead ?
> Use IOUtils.readLines or Source.getLines instead?

For this one, I was thinking about adding a new configuration to limit the results, like Top 100 or Top 1000. I'll handle this together with that new configuration.
```scala
val builder = new ProcessBuilder("jmap", "-histo:live", pid)
builder.redirectErrorStream(true)
val p = builder.start()
val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
```
nit: This reader is not closed and/or we are not doing waitFor on the process.
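A standalone sketch of what addressing this nit could look like: close the reader and wait for the subprocess to exit. (The follow-up PR uses Spark's `Utils.tryWithResource`; this illustrative version expresses the same idea with try/finally, and `runAndCollect` is a hypothetical name.)

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: collect non-empty output lines from a subprocess,
// making sure the reader is closed and the child process is reaped.
def runAndCollect(command: String*): Array[String] = {
  val process = new ProcessBuilder(command: _*).start()
  val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
  val rows = ArrayBuffer.empty[String]
  try {
    var line = reader.readLine()
    while (line != null) {
      if (line.nonEmpty) rows += line
      line = reader.readLine()
    }
  } finally {
    reader.close()
    process.waitFor() // reap the child so it does not linger as a zombie
  }
  rows.toArray
}
```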
### What changes were proposed in this pull request?

This is a follow-up of #41709 to address the review comments.

### Why are the changes needed?

1. Use `JAVA_HOME`-prefixed `jmap` to ensure the same version's JVM and jmap.
2. Use the existing stderr instead of merging `stderr` and `stdout` via `redirectErrorStream`.
3. Use `tryWithResource`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #41731 from dongjoon-hyun/SPARK-44153-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR aims to use a JDK for the Spark 3.5+ Docker image. The Apache Spark Dockerfiles are updated already.

- apache/spark#45762
- apache/spark#45761

### Why are the changes needed?

Since Apache Spark 3.5.0, SPARK-44153 starts to use `jmap` like the following.

- apache/spark#41709

https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030

### Does this PR introduce _any_ user-facing change?

Yes, the user can use the `Heap Histogram` feature.

### How was this patch tested?

Pass the CIs.

Closes #66 from dongjoon-hyun/SPARK-49701.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR aims to support a `Heap Histogram` column in the `Executors` tab.

### Why are the changes needed?

Like the `Thread Dump` column, this is very helpful when we analyze executor live JVM status.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new column and we provide the `spark.ui.heapHistogramEnabled` configuration, like `spark.ui.threadDumpsEnabled`.

### How was this patch tested?

Manual review.