
Conversation

@dongjoon-hyun (Member) commented Sep 18, 2024:

What changes were proposed in this pull request?

This PR aims to use a JDK for the Spark 3.5+ Docker images. The Apache Spark Dockerfiles have already been updated accordingly.

Why are the changes needed?

Since Apache Spark 3.5.0, SPARK-44153 has used `jmap`, as shown below.

https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030
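`jmap` ships only with a JDK (under `$JAVA_HOME/bin`) and is absent from JRE-only runtimes, which is why the image needs a full JDK. As a hedged illustration (not the exact Spark source; `<pid>` is a placeholder), the code path reduces to running something like:

```sh
# Illustration only: print the top of a live-object heap histogram for a
# running JVM. The :live option counts only reachable objects (it triggers
# a full GC first). <pid> is a placeholder for the target JVM's process id.
jmap -histo:live <pid> | head -n 20
```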

Does this PR introduce any user-facing change?

Yes, users can now use the Heap Histogram feature (surfaced on the Executors tab of the Spark UI).

How was this patch tested?

Pass the CIs.

@dongjoon-hyun (Member, Author) commented:

Thank you, @viirya.

@dongjoon-hyun (Member, Author) commented:

Merged to master.

@dongjoon-hyun deleted the SPARK-49701 branch on September 18, 2024 at 23:08.
@dongjoon-hyun (Member, Author) commented:

For the record, the fixed images have been released.

```
$ docker run -it --rm apache/spark:4.0.0-preview1 jmap | head -n3
Usage:
    jmap -clstats <pid>
        to connect to running process and print class loader statistics
```
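The usage banner above confirms `jmap` is on the image's `PATH`. A stricter spot check, assuming `JAVA_HOME` is set inside the image as in the Temurin-based builds, is to look for the binary directly:

```sh
# Assumption: the image sets JAVA_HOME. A JDK-based image ships jmap under
# $JAVA_HOME/bin; a JRE-only image does not.
docker run --rm apache/spark:4.0.0-preview1 sh -c 'ls "$JAVA_HOME/bin" | grep jmap'
```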
dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Sep 19, 2024
### What changes were proposed in this pull request?

This PR aims to propose to use `apache/spark` images instead of `spark` because `apache/spark` images are published first. For example, the following are only available in `apache/spark` as of now.

- apache/spark-docker#66
- apache/spark-docker#67
- apache/spark-docker#68

### Why are the changes needed?

To apply the latest bits earlier.

### Does this PR introduce _any_ user-facing change?

There is no change from `Apache Spark K8s Operator`. Only the underlying images are changed.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #128 from dongjoon-hyun/SPARK-49706.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@anish97IND commented:

Hi Team, I think a similar change is required for Java 11 as well. At the moment we are using `spark:3.5.2-scala2.12-java11-python3-ubuntu`. We wanted to take a heap dump but ran into an issue; it shows something like the below:
[screenshot of the error output]

We went ahead and checked the Java installation within the Docker image for `jmap` by doing a `kubectl exec -ti <executor-pod> sh`; however, even within the installed Java we did not find the heap dump tooling. Will the change in Java version work for 3.5.2?
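For anyone repeating that check, a cleaned-up sketch (with `<executor-pod>` as a placeholder, and assuming a POSIX shell is available in the image) would be:

```sh
# Hedged sketch: report whether the executor image's Java installation ships jmap.
kubectl exec -ti <executor-pod> -- sh -c \
  'command -v jmap || echo "jmap not found (likely a JRE-only image)"'
```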

@dongjoon-hyun (Member, Author) commented:

To @anish97IND, please use the latest Spark 3.5.3 images instead of 3.5.2.

FYI, there are correctness fixes in Apache Spark 3.5.3.
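A quick pre-deployment check, assuming the 3.5.3 tag follows the same naming scheme as the 3.5.2 tag quoted above (verify the exact tag on Docker Hub):

```sh
# Hedged check: on a JDK-based 3.5.3 image, jmap prints its usage banner
# instead of a "command not found" error.
docker run -it --rm spark:3.5.3-scala2.12-java11-python3-ubuntu jmap | head -n3
```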
