dataframe-persistent
Persistent database integration for the Haskell DataFrame library.
Overview
This package connects the dataframe library to the persistent database library, allowing you to:
- Load database entities directly into DataFrames
- Perform DataFrame operations on database data
- Save DataFrame results back to the database
- Work with type-safe database entities
Installation
Add to your package.yaml:
    dependencies:
      - dataframe ^>= 0.3
      - dataframe-persistent ^>= 0.1
      - persistent >= 2.14
      - persistent-sqlite >= 2.13  # or your preferred backend
Or to your .cabal file:
    build-depends:
        dataframe ^>= 0.3,
        dataframe-persistent ^>= 0.1,
        persistent >= 2.14,
        persistent-sqlite >= 2.13
Quick Start
    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE DerivingStrategies #-}
    {-# LANGUAGE FlexibleInstances #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE GeneralizedNewtypeDeriving #-}
    {-# LANGUAGE MultiParamTypeClasses #-}
    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE QuasiQuotes #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE TemplateHaskell #-}
    {-# LANGUAGE TypeApplications #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE UndecidableInstances #-}

    import Control.Monad.IO.Class (liftIO)
    import Data.Text (Text)  -- needed for the Text field in the entity below
    import Database.Persist
    import Database.Persist.Sqlite
    import Database.Persist.TH
    import qualified DataFrame as DF
    import qualified DataFrame.Functions as F
    import DataFrame.IO.Persistent
    import DataFrame.IO.Persistent.TH
    import qualified Data.Vector as V

    -- Define your entities
    share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
    TestUser
        name Text
        age Int
        active Bool
        deriving Show Eq
    |]

    -- Derive DataFrame instances
    $(derivePersistentDataFrame ''TestUser)

    -- Example usage
    main :: IO ()
    main = runSqlite "example.db" $ do
        -- Run migrations
        runMigration migrateAll

        -- Insert some test data
        _ <- insert $ TestUser "Alice" 25 True
        _ <- insert $ TestUser "Bob" 30 False
        _ <- insert $ TestUser "Charlie" 35 True

        -- Load from database
        allUsersDF <- fromPersistent @TestUser []
        liftIO $ putStrLn $ "Loaded " ++ show (DF.nRows allUsersDF) ++ " users"

        -- Load with filters
        activeUsersDF <- fromPersistent @TestUser [TestUserActive ==. True]
        liftIO $ putStrLn $ "Active users: " ++ show (DF.nRows activeUsersDF)

        -- Process with DataFrame operations
        let youngUsers = DF.filter @Int "age" (< 30) allUsersDF
            ages = V.toList $ DF.columnAsVector @Int "age" youngUsers
        liftIO $ putStrLn $ "Young user ages: " ++ show ages

        -- Custom configuration
        let config = defaultPersistentConfig
                { pcIdColumnName = "user_id"
                , pcIncludeId = True
                }
        customDF <- fromPersistentWith @TestUser config []
        liftIO $ putStrLn $ "Columns with custom config: " ++ show (DF.columnNames customDF)
Features
- Type-safe conversions between Persistent entities and DataFrames
- Template Haskell support for automatic instance generation
- Configurable loading with batch size and column selection
- Column name cleaning - removes table prefixes automatically (e.g., test_user_name → name)
- Type preservation - maintains proper types for Text, Int, Bool, Day, etc.
- Empty DataFrame support - preserves column structure even with no data
- Support for all Persistent backends (SQLite, PostgreSQL, MySQL, etc.)
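The column name cleaning listed above can be pictured with a small standalone sketch. This is not the package's actual implementation: `cleanColumnName` is a hypothetical helper, and it assumes the prefix is simply the table name followed by an underscore.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- | Hypothetical helper (not part of dataframe-persistent): drop a
-- "<table>_" prefix from a column name. Names that do not carry the
-- prefix are returned unchanged.
cleanColumnName :: String -> String -> String
cleanColumnName table column =
  fromMaybe column (stripPrefix (table ++ "_") column)

main :: IO ()
main =
  -- Print the cleaned form of each sample column name, one per line.
  mapM_ (putStrLn . cleanColumnName "test_user")
    ["test_user_name", "test_user_age", "active"]
```

Under that assumption, test_user_name becomes name, while a column without the prefix, such as active, passes through untouched.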
Configuration Options
    data PersistentConfig = PersistentConfig
        { pcBatchSize :: Int      -- Number of records to fetch at once (default: 10000)
        , pcIncludeId :: Bool     -- Whether to include entity ID as column (default: True)
        , pcIdColumnName :: Text  -- Name for the ID column (default: "id")
        }
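Individual options are usually overridden with record update syntax on the default configuration. The sketch below declares a local mirror of the record shown above so that it compiles on its own; in real code you would import `PersistentConfig` and `defaultPersistentConfig` from DataFrame.IO.Persistent instead. The defaults are taken from the comments above, and `smallBatchConfig` is a hypothetical name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Local mirror of the documented record, for illustration only.
data PersistentConfig = PersistentConfig
  { pcBatchSize :: Int
  , pcIncludeId :: Bool
  , pcIdColumnName :: Text
  } deriving (Show, Eq)

-- Defaults as stated in the field comments of the documented record.
defaultPersistentConfig :: PersistentConfig
defaultPersistentConfig = PersistentConfig
  { pcBatchSize = 10000
  , pcIncludeId = True
  , pcIdColumnName = "id"
  }

-- Hypothetical configuration: smaller batches and no ID column.
-- Fields that are not mentioned keep their default values.
smallBatchConfig :: PersistentConfig
smallBatchConfig = defaultPersistentConfig
  { pcBatchSize = 500
  , pcIncludeId = False
  }

main :: IO ()
main = print smallBatchConfig
```

A value built this way would then be passed to fromPersistentWith, as in the Quick Start example.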
Advanced Usage
You can also extract fields from individual entities:
    let user = TestUser "Alice" 25 True
        columns = persistFieldsToColumns user
    -- Result: [("name", SomeColumn ["Alice"]), ("age", SomeColumn [25]), ("active", SomeColumn [True])]
Working with Vector Data
    -- Extract specific column data
    let names = V.toList $ DF.columnAsVector @Text "name" df
        ages = V.toList $ DF.columnAsVector @Int "age" df
        activeFlags = V.toList $ DF.columnAsVector @Bool "active" df
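Once a column has been converted to a list, ordinary list functions apply. As a minimal sketch, the hypothetical helper `meanAge` below averages a list of ages and returns Nothing for an empty list, which matters when a filter matches no rows. The input is hard-coded with the sample ages from Quick Start so the block runs without a database.

```haskell
-- | Hypothetical helper: arithmetic mean of a list of Ints.
-- Returns Nothing for an empty list instead of dividing by zero.
meanAge :: [Int] -> Maybe Double
meanAge [] = Nothing
meanAge xs = Just (fromIntegral (sum xs) / fromIntegral (length xs))

main :: IO ()
main = do
  -- Stand-in for: V.toList $ DF.columnAsVector @Int "age" df
  let ages = [25, 30, 35]
  print (meanAge ages)
  print (meanAge [])
```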
Examples
For comprehensive examples and test cases, see:
Status
This package is actively maintained and tested. Current test coverage includes:
- ✅ Entity loading with and without filters
- ✅ Custom configuration options
- ✅ DataFrame operations on Persistent data
- ✅ Empty result set handling
- ✅ Field extraction utilities
- ✅ Multi-table relationships
Documentation
For detailed documentation, see:
License
GPL-3.0-or-later (same as the main dataframe package)