Merged
Changes from 1 commit
Commits
27 commits
c57eaaf
added streaming and exploring how it works with kotlin spark api
Jolanrensen Feb 18, 2022
bb39fc7
Adds helpful rdd to dataset conversion, as well as a new withSpark fu…
Jolanrensen Feb 21, 2022
4bd3fe1
makes javaRDD toDS function more generic.
Jolanrensen Feb 21, 2022
c0ead09
added encoders: Duration, Period, ByteArray (Binary, now actually wor…
Jolanrensen Feb 21, 2022
09e9bb5
Arity is now Serializable, removed sc.stop(), sc is now lazy, updates…
Jolanrensen Feb 22, 2022
597b6f1
copying over some other missing parts from ScalaReflection.scala. Did…
Jolanrensen Feb 22, 2022
0f585bd
serializing binary works!
Jolanrensen Feb 23, 2022
92ed60e
serializing binary works!
Jolanrensen Feb 23, 2022
1fb680b
fixed serializing CalendarInterval, added tests and fixes for Decimal…
Jolanrensen Feb 23, 2022
4f8ae68
updating all tests to shouldBe instead of just show
Jolanrensen Feb 23, 2022
38486bb
removed .show() from rdd test
Jolanrensen Feb 23, 2022
0acd3e2
Update README.md
Jolanrensen Feb 23, 2022
bcf99b8
split up rdd tests, added list test.
Jolanrensen Feb 24, 2022
165c1b0
Merge pull request #132 from JetBrains/rdd-related-helpers
Jolanrensen Feb 24, 2022
2fdba6a
Update docs generation
Jolanrensen Feb 24, 2022
5d785e0
added jira issue
Jolanrensen Feb 24, 2022
518d1a1
Update ApiTest.kt
Jolanrensen Feb 25, 2022
e620896
added encoders: Duration, Period, ByteArray (Binary, now actually wor…
Jolanrensen Feb 21, 2022
e466df6
copying over some other missing parts from ScalaReflection.scala. Did…
Jolanrensen Feb 22, 2022
1557fa4
serializing binary works!
Jolanrensen Feb 23, 2022
680e5b1
serializing binary works!
Jolanrensen Feb 23, 2022
ba0c452
fixed serializing CalendarInterval, added tests and fixes for Decimal…
Jolanrensen Feb 23, 2022
054d626
updating all tests to shouldBe instead of just show
Jolanrensen Feb 23, 2022
66dc40e
added jira issue
Jolanrensen Feb 24, 2022
31f56d8
rebasing on spark 3.2 branch
Jolanrensen Feb 25, 2022
9b1c2c9
Merge remote-tracking branch 'origin/encoders-and-data-types' into en…
Jolanrensen Feb 28, 2022
d56b5a4
spark 3.2.1
Jolanrensen Feb 28, 2022
added streaming and exploring how it works with kotlin spark api
Jolanrensen committed Feb 18, 2022
commit c57eaaf5c2109b1d46eb7c93dc6fd49a5965a373
5 changes: 5 additions & 0 deletions examples/pom-3.2_2.12.xml
@@ -24,6 +24,11 @@
            <artifactId>spark-sql_${scala.compat.version}</artifactId>
            <version>${spark3.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.compat.version}</artifactId>
            <version>${spark3.version}</version>
        </dependency>
    </dependencies>

    <build>
@@ -0,0 +1,61 @@
/*-
 * =LICENSE=
 * Kotlin Spark API: Examples for Spark 3.2+ (Scala 2.12)
 * ----------
 * Copyright (C) 2019 - 2022 JetBrains
 * ----------
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * =LICENSEEND=
 */
package org.jetbrains.kotlinx.spark.examples

import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.jetbrains.kotlinx.spark.api.withSpark
import scala.Tuple2
import java.io.Serializable

data class Row @JvmOverloads constructor(
    var word: String = "",
) : Serializable

fun main() = withSpark {

    val context = JavaStreamingContext(
        SparkConf()
            .setMaster("local[*]")
            .setAppName("Test"),
        Durations.seconds(1),
    )

    val lines = context.socketTextStream("localhost", 9999)

    val words = lines.flatMap { it.split(" ").iterator() }

    words.foreachRDD { rdd, time ->

        // todo convert rdd to dataset using kotlin data class?

        val rowRdd = rdd.map { Row(it) }

        val dataframe = spark.createDataFrame(rowRdd, Row::class.java)


    }


    context.start()
    context.awaitTermination()
}
6 changes: 6 additions & 0 deletions kotlin-spark-api/3.2/pom_2.12.xml
@@ -36,6 +36,12 @@
            <version>${spark3.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.compat.version}</artifactId>
            <version>${spark3.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Test dependencies -->
        <dependency>