Skip to content

Conversation

@Jolanrensen
Copy link
Collaborator

adds sc: JavaSparkSession to KSparkSession

adds (Java)RDD.toDS() functions

includes tests

@Jolanrensen Jolanrensen requested a review from asm0dey February 21, 2022 16:57
@Jolanrensen Jolanrensen requested a review from asm0dey February 22, 2022 12:07
@Jolanrensen
Copy link
Collaborator Author

So, can I merge it? :) @asm0dey

inline fun <reified T> List<T>.toDS() = toDS(spark)
inline fun <reified T> Array<T>.toDS() = spark.dsOf(*this)
inline fun <reified T> dsOf(vararg arg: T) = spark.dsOf(*arg)
inline fun <reified T> RDD<T>.toDS() = toDS(spark)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should T be Serializable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't hold for all right? For instance primitives, are those Serializable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there can't be generic with primitive inside

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, so Int and other primitves that do not implement Serializable are allowed when having inline fun <reified T : Serializable> RDD<T>.toDS() = toDS(spark)? That's news to me. But useful!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they will be boxed to serializable Integer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'll add it to broadcast too!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no wait that doesn't hold. You can broadcast a non Serializable List for instance. And... you can also make an RDD of a non Serializable List

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List itself is not serializable, but its implementations are

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, but enforcing it to be Serializable requires users to for instance wrap their list like ArrayList(listOf(1, 2, 3)) which is not ideal...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nono, if "just" listOf doesn't work — let's not implement it.

it.nullable() shouldBe true
}
}
should("Easily convert a (Java)RDD to a Dataset") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this huge test should be split into smaller ones?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@asm0dey
Copy link
Contributor

asm0dey commented Feb 23, 2022

You will hate me, @Jolanrensen, but I've added more comments :)

@Jolanrensen
Copy link
Collaborator Author

Haha they're always welcome ;)

Added kotlin official code style prop for later
@Jolanrensen
Copy link
Collaborator Author

Alright, after this is done checking I'll merge it.

@Jolanrensen Jolanrensen merged commit 165c1b0 into spark-3.2 Feb 24, 2022
@Jolanrensen Jolanrensen added this to the 1.1.0 milestone May 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants