Name	Name	Last commit message	Last commit date
Latest commit History 4,100 Commits
.github	.github
binder	binder
core	core
data	data
dataframe-arrow	dataframe-arrow
dataframe-csv	dataframe-csv
dataframe-excel	dataframe-excel
dataframe-geo	dataframe-geo
dataframe-jdbc	dataframe-jdbc
dataframe-openapi-generator	dataframe-openapi-generator
dataframe-openapi	dataframe-openapi
docs	docs
examples	examples
gradle	gradle
plugins	plugins
tests	tests
.editorconfig	.editorconfig
.gitignore	.gitignore
CODE_OF_CONDUCT.md	CODE_OF_CONDUCT.md
CONTRIBUTING.md	CONTRIBUTING.md
KDOC_PREPROCESSING.md	KDOC_PREPROCESSING.md
LICENSE	LICENSE
README.md	README.md
RELEASE_CHECK_LIST.md	RELEASE_CHECK_LIST.md
build.gradle.kts	build.gradle.kts
gradle.properties	gradle.properties
gradlew	gradlew
gradlew.bat	gradlew.bat
settings.gradle.kts	settings.gradle.kts

Kotlin DataFrame: typesafe in-memory structured data processing for JVM

Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPL.

Hierarchical — represents hierarchical data structures, such as JSON or a tree of JVM objects.
Functional — data processing pipeline is organized in a chain of DataFrame transformation operations. Every operation returns a new instance of DataFrame reusing underlying storage wherever it's possible.
Readable — data transformation operations are defined in DSL close to natural language.
Practical — provides simple solutions for common problems and the ability to perform complex tasks.
Minimalistic — simple, yet powerful data model of three column kinds.
Interoperable — convertable with Kotlin data classes and collections.
Generic — can store objects of any type, not only numbers or strings.
Typesafe — on-the-fly generation of extension properties for type safe data access with Kotlin-style care for null safety.
Polymorphic — type compatibility derives from column schema compatibility. You can define a function that requires a special subset of columns in a dataframe but doesn't care about other columns.

Integrates with Kotlin kernel for Jupyter. Inspired by krangl, Kotlin Collections and pandas

Documentation

Explore documentation for details.

You could find the following articles there:

What's new

Check out this notebook with new features in v0.15.

The DataFrame compiler plugin has reached public preview! Here's a compiler plugin demo project that works with IntelliJ IDEA 2024.2.

Setup

implementation("org.jetbrains.kotlinx:dataframe:0.15.0")

Optional Gradle plugin for enhanced type safety and schema generation https://kotlin.github.io/dataframe/schemasgradle.html

id("org.jetbrains.kotlinx.dataframe") version "0.15.0"

Check out the custom setup page if you don't need some of the formats as dependencies, for Groovy, and for configurations specific to Android projects.

Getting started

import org.jetbrains.kotlinx.dataframe.* import org.jetbrains.kotlinx.dataframe.api.* import org.jetbrains.kotlinx.dataframe.io.*

val df = DataFrame.read("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv") df["full_name"][0] // Indexing https://kotlin.github.io/dataframe/access.html df.filter { "stargazers_count"<Int>() > 50 }.print()

Getting started with data schema

Requires Gradle plugin to work

id("org.jetbrains.kotlinx.dataframe") version "0.15.0"

Plugin generates extension properties API for provided sample of data. Column names and their types become discoverable in completion.

// Make sure to place the file annotation above the package directive @file:ImportDataSchema( "Repository", "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv", ) package example import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema import org.jetbrains.kotlinx.dataframe.api.* fun main() { // execute `assemble` to generate extension properties API val df = Repository.readCSV() df.fullName[0] df.filter { stargazersCount > 50 } }

Getting started in Jupyter Notebook / Kotlin Notebook

Install the Kotlin kernel for Jupyter

Import the stable dataframe version into a notebook:

%use dataframe

or a specific version:

%use dataframe(<version>)

val df = DataFrame.read("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv") df // the last expression in the cell is displayed

When a cell with a variable declaration is executed, in the next cell DataFrame provides extension properties based on its data

df.filter { stargazers_count > 50 }

Data model

DataFrame is a list of columns with equal sizes and distinct names.
DataColumn is a named list of values. Can be one of three kinds:
- ValueColumn — contains data
- ColumnGroup — contains columns
- FrameColumn — contains dataframes

Syntax example

Let us show you how data cleaning and aggregation pipelines could look like with DataFrame.

Create:

// create columns val fromTo by columnOf("LoNDon_paris", "MAdrid_miLAN", "londON_StockhOlm", "Budapest_PaRis", "Brussels_londOn") val flightNumber by columnOf(10045.0, Double.NaN, 10065.0, Double.NaN, 10085.0) val recentDelays by columnOf("23,47", null, "24, 43, 87", "13", "67, 32") val airline by columnOf("KLM(!)", "{Air France} (12)", "(British Airways. )", "12. Air France", "'Swiss Air'") // create dataframe val df = dataFrameOf(fromTo, flightNumber, recentDelays, airline) // print dataframe df.print()

Clean:

// typed accessors for columns // that will appear during // dataframe transformation val origin by column<String>() val destination by column<String>() val clean = df // fill missing flight numbers .fillNA { flightNumber }.with { prev()!!.flightNumber + 10 } // convert flight numbers to int .convert { flightNumber }.toInt() // clean 'airline' column .update { airline }.with { "([a-zA-Z\\s]+)".toRegex().find(it)?.value ?: "" } // split 'fromTo' column into 'origin' and 'destination' .split { fromTo }.by("_").into(origin, destination) // clean 'origin' and 'destination' columns .update { origin and destination }.with { it.lowercase().replaceFirstChar(Char::uppercase) } // split lists of delays in 'recentDelays' into separate columns // 'delay1', 'delay2'... and nest them inside original column `recentDelays` .split { recentDelays }.inward { "delay$it" } // convert string values in `delay1`, `delay2` into ints .parse { recentDelays }

Aggregate:

clean // group by the flight origin renamed into "from" .groupBy { origin named "from" }.aggregate { // we are in the context of a single data group // total number of flights from origin count() into "count" // list of flight numbers flightNumber into "flight numbers" // counts of flights per airline airline.valueCounts() into "airlines" // max delay across all delays in `delay1` and `delay2` recentDelays.maxOrNull { delay1 and delay2 } into "major delay" // separate lists of recent delays for `delay1`, `delay2` and `delay3` recentDelays.implode(dropNA = true) into "recent delays" // total delay per destination pivot { destination }.sum { recentDelays.colsOf<Int?>() } into "total delays to" }

Check it out on Datalore to get a better visual impression of what happens and what the hierarchical dataframe structure looks like.

Explore more examples here.

Kotlin, Kotlin Jupyter, OpenAPI, Arrow and JDK versions

This table shows the mapping between main library component versions and minimum supported Java versions.

Kotlin DataFrame Version	Minimum Java Version	Kotlin Version	Kotlin Jupyter Version	OpenAPI version	Apache Arrow version
0.10.0	8	1.8.20	0.11.0-358	3.0.0	11.0.0
0.10.1	8	1.8.20	0.11.0-358	3.0.0	11.0.0
0.11.0	8	1.8.20	0.11.0-358	3.0.0	11.0.0
0.11.1	8	1.8.20	0.11.0-358	3.0.0	11.0.0
0.12.0	8	1.9.0	0.11.0-358	3.0.0	11.0.0
0.12.1	8	1.9.0	0.11.0-358	3.0.0	11.0.0
0.13.1	8	1.9.22	0.12.0-139	3.0.0	15.0.0
0.14.1	8	2.0.20	0.12.0-139	3.0.0	17.0.0
0.15.0	8	2.0.20	0.12.0-139	3.0.0	18.1.0

Code of Conduct

This project and the corresponding community are governed by the JetBrains Open Source and Community Code of Conduct. Please make sure you read it.

License

Kotlin DataFrame is licensed under the Apache 2.0 License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Kotlin DataFrame: typesafe in-memory structured data processing for JVM

Documentation

What's new

Setup

Getting started

Getting started with data schema

Getting started in Jupyter Notebook / Kotlin Notebook

Data model

Syntax example

Kotlin, Kotlin Jupyter, OpenAPI, Arrow and JDK versions

Code of Conduct

License

About

Uh oh!

Releases 12

Uh oh!

Contributors 51

Languages

License

Kotlin/dataframe

Folders and files

Latest commit

History

Repository files navigation

Kotlin DataFrame: typesafe in-memory structured data processing for JVM

Documentation

What's new

Setup

Getting started

Getting started with data schema

Getting started in Jupyter Notebook / Kotlin Notebook

Data model

Syntax example

Kotlin, Kotlin Jupyter, OpenAPI, Arrow and JDK versions

Code of Conduct

License

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 12

Uh oh!

Contributors 51

Languages