DEV Community

vindarel
vindarel

Posted on • Edited on

Read CSV files in Common Lisp (cl-csv, data-table)

I just helped friends for some CSV manipulation and did it for the first time from CL, here's a quick usage overview.

If you know a better© way please comment.

EDIT: lisp-stat's data-frame is pretty awesome.

I added the data-table utility in CIEL: http://ciel-lang.org/#/libraries?id=csv

cl-csv and a table-like data structure

We'll use cl-csv, data-table and cl-csv-data-tables (a system defined in cl-csv).

cl-csv allows to read and write a CSV file, string or stream.

data-table allows to work with a "table-like" data-structure where
you can access columns by name (instead of only by index), coerce
column types, and more.

cl-csv-data-tables brings a couple helpers.

WARN: below, csv is a nickname to ciel-csv which exports symbols from both cl-csv and data-table.

Here are the first cl-csv examples:

Read a file into a list of lists:

(csv:read-csv #P"file.csv") ;; <--- note the #P => (("1" "2" "3") ("4" "5" "6")) ;; read csv from a string (streams are also supported) (csv:read-csv "1,2,3 4,5,6") => (("1" "2" "3") ("4" "5" "6")) 
Enter fullscreen mode Exit fullscreen mode

Read a file that's TAB delimited:

(csv:read-csv #P"file.tab" :separator #\Tab) 
Enter fullscreen mode Exit fullscreen mode

loop over a CSV for side effects with do-csv:

(let ((sum 0)) (csv:do-csv (row #P"file.csv") (incf sum (parse-integer (nth 0 row)))) sum) 
Enter fullscreen mode Exit fullscreen mode

We can use map-fn to do something at each row.

Below we read a file and return a list of objects created from each row. We could create instances of our own objects like this.

(cl-csv:read-csv #P"file.csv" :map-fn #'(lambda (row) (make-instance 'object :foo (nth 0 row) :baz (nth 2 row)))) 
Enter fullscreen mode Exit fullscreen mode

note that we have to access each column by index.

Use a table, guess column types

Read a CSV, create a data-table object, assume headers are on the
first row (:has-column-names), guess the column types (:munge-types):

(csv:get-data-table-from-csv #p"file.csv") ;; <--- still the #P ;; #<DATA-TABLE:DATA-TABLE {10018A9F63}> (describe *) ;; => COLUMN-NAMES = ("Date" "Type" "Quantity" "Total") COLUMN-TYPES = (STRING STRING INTEGER DOUBLE-FLOAT) ROWS = (("9 jan. 1975" "Sell" 1 9.90) ) 
Enter fullscreen mode Exit fullscreen mode

This function is roughly as if you read the CSV file with cl-csv:read-csv, checked that the first row contains column names, created the data-table object with (make-instance 'data-table:data-table :column-names (first rows) :rows (rest rows), and coerced the columns' types with (data-table:coerce-data-table-of-strings-to-types dt) and (data-table::ensure-column-data-types dt) (unexported function).

Access rows and columns

do that with:

  • csv:rows
  • csv:data-table-value dt &key row row-idx col-name col-idx
  • and more

Write to file (or streams)

Write the data-table to a file with data-table-to-csv dt &optional stream).

Real-world example

That's my CIEL script that I did to help friends of a non-profit.

;;;; ;;;; Run this script on many CSV files: ;;;; ;;;; $ ciel sumuputils.lisp Rapports* ;;;; ;;;; (in-package :ciel-user) ;; Needs the latest CIEL, or this library. ;; (ql:quickload "data-table" :silent t) ;;; Download the CSV file from Sum Up (defvar *file* #p"/path/to/Rapport-ventes-2024-11-01_2024-11-30.csv" "Only for testing.") (defvar *dt* nil "devel only") (defun parse-csv (file) "Parse CSV, return a data-table object with column names and rows. file: a pathname (not just a string)." ;; This takes headers as the first row ;; and guesses the columns' types (string, int, float). (csv:get-data-table-from-csv (pathname file))) (defun get-all-days (dt) (remove-duplicates (loop for row in (data-table:rows dt) for date/time = (data-table:data-table-value dt :row row :col-name "Date") for day = (str:unwords (split-sequence #\Space date/time :count 3)) collect day) :test #'equal)) ;; (get-all-days (parse-csv *FILE*)) ;; ("1 nov. 2024" "2 nov. 2024" "5 nov. 2024" "7 nov. 2024" "8 nov. 2024" …) (defun get-all-conso-types (dt) (sort (remove-duplicates (loop for row in (data-table:rows dt) for description = (data-table:data-table-value dt :row row :col-name "Description") when (str:non-blank-string-p description) collect description) :test #'equal) #'string<)) (defun get-all-offert-types (types) (filter (^ (it) (str:containsp "offert" it)) types)) #++ (get-all-offert-types (get-all-conso-types (parse-csv *FILE*))) (defun rows-offerts-for-day (day dt) "day: string, like '23 nov" (loop for row in (data-table:rows dt) when (and (str:containsp day (data-table:data-table-value dt :row row :col-name "Date")) (str:containsp "offert" (data-table:data-table-value dt :row row :col-name "Description"))) collect row)) (defun sum-quantities-offerts-for-day (day dt) ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre) (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Quantité") when qty sum qty)) ;; (sum-quantities-offerts-for-day "23 nov" *dt*) ;; 15 (defun sum-quantities-offerts-for-day/by-type (day dt desc) ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre) (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Quantité") when (and qty (str:containsp desc (data-table:data-table-value dt :row row :col-name "Description"))) sum qty)) (defun report-sum-quantities-offerts-for-day/by-type (file &key (stream t) &aux dt) (setf dt (parse-csv file)) (loop for day in (get-all-days dt) do (format stream "~&~%Nombre de consos offertes le ~a~&~%" day) (loop for desc in (get-all-offert-types (get-all-conso-types dt)) do (format stream "~a: ~a~&" desc (sum-quantities-offerts-for-day/by-type day dt desc))))) (defun sum-total-offerts-for-day (day dt) ;; le + important (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Prix (TTC)") when qty sum qty)) ;; (sum-total-offerts-for-day "23 nov" *dt*) ;; 49.0d0 (defun report-totals-offert-for-days (file &key (stream t)) (let ((dt (parse-csv file))) (format stream "~&~%Total TTC des offerts~&~%") (format stream "~&(somme des colonnes Prix (TTC) pour toutes les lignes du jour comportant 'conso offerte')~&") (loop for day in (get-all-days dt) do (format stream "~20a: ~4f~&" day (sum-total-offerts-for-day day dt))))) (defun report-number-offert-for-days (file &key (stream t)) (let ((dt (parse-csv file))) (format stream "~&Nombre des consos offertes par jour~&") (format stream "~&(somme des colonnes 'Quantité' pour toutes les lignes du jour comportant 'conso offerte')~&") (loop for day in (get-all-days dt) do (format stream "~20a: ~4a~&" day (sum-quantities-offerts-for-day day dt))))) #+ciel #+ciel (let ((files (mapcar #'pathname (rest ciel-user:*script-args*)))) (mapcar #'report-totals-offert-for-days files) (mapcar #'report-sum-quantities-offerts-for-day/by-type files)) 
Enter fullscreen mode Exit fullscreen mode

Call it from the command line:

$ ciel sumuputils.lisp Rapport*csv > rapport.txt 
Enter fullscreen mode Exit fullscreen mode

Show data in table

This is our CSV, by the way:

Date,Type,Réf. transaction,Moyen de paiement,Quantité,Description,Devise,Prix avant réduction,Réduction,Prix (TTC),Prix (HT),TVA,Taux de TVA,Compte 1 nov. 2024 22:09,Vente,T3YKZYZANT,Mastercard - Crédit,1,Montant personnalisé,EUR,3.00,0.00,3.00,3.00,0.00,,us@gmail.com … 
Enter fullscreen mode Exit fullscreen mode

Can we display it nicely in a table? We'll use our resurrected cl-ansi-term library.

We'll use the "raw" CSV data, where the headers are on the first line, instead of the data-value object.

That way we just have to pass it to term:table *csv*, and we choose the columns to display:

(setf *rawcsv* (csv:read-csv #p"test.csv")) (term:table *RAWCSV* :keys '("Date" "Quantité" "Prix (HT)")) 
Enter fullscreen mode Exit fullscreen mode

Result:

+-----------------+--------+---------+ |Date |Quantité|Prix (HT)| +-----------------+--------+---------+ |1 nov. 2024 22:09|1 |3.00 | +-----------------+--------+---------+ |2 nov. 2024 19:10|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 20:25|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 20:44|1 |10.00 | +-----------------+--------+---------+ |2 nov. 2024 21:05|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 21:13|1 |3.00 | +-----------------+--------+---------+ |2 nov. 2024 21:45|1 |7.00 | +-----------------+--------+---------+ |2 nov. 2024 21:49|1 |6.00 | +-----------------+--------+---------+ 
Enter fullscreen mode Exit fullscreen mode

Lisp-stat's data-frame

lisp-stat's data-frame is the smart way.

(ql:quickload :lisp-stat) ;; … (in-package :ls-user) ;; Give a name to our dataframe, ;; read CSV. (defdf sumup (read-csv #p"sells.csv")) ;; similar to (defparameter *data-frame* …) but with future goodies. ;; The goodies are that each column is created as a variable for our "sumup" dataframe. ;; Here we get all the "date" columns: LS-USER> sumup:date ;; #("1 nov. 2024 19:36" "1 nov. 2024 19:36" …) ;; We didn't need to LOOP. ;; We can also use  (select sumup t 'date) ;; We have rows an column manipulation functions. ;; Let's get the data of only 2 columns: (columns sumup '(date quantité)) ;; Let's inspect our data: (head sumup) ;; DATE TYPE RÉF--TRANSACTION MOYEN-DE-PAIEMENT QUANTITÉ DESCRIPTION DEVISE PRIX-AVANT-RÉDUCTION RÉDUCTION PRIX-(TTC) PRIX-(HT) TVA TAUX-DE-TVA COMPTE  ;; 0 1 nov. 2024 19:36 Vente TE4PL6DA9X Mastercard - Débit 1 Montant personnalisé EUR 9.0 0.0 9.0 9.0 0.0 NA us@gmail.com ;; the data is printed as a nice table, respecting the columns width out of the box. ;; print-data would print all the data. ;; DESCRIBE output is pimped: (describe sumup) SUMUP A data-frame with 863 observations of 14 variables Variable | Type | Unit | Label -------- | ---- | ---- | ----------- DATE | NIL | NIL | NIL TYPE | NIL | NIL | NIL RÉF--TRANSACTION | NIL | NIL | NIL MOYEN-DE-PAIEMENT | NIL | NIL | NIL QUANTITÉ | NIL | NIL | NIL … ;; We can guess the column types: (heuristicate-types sumup) ;; Our types are not NIL now: (describe sumup) SUMUP A data-frame with 863 observations of 14 variables Variable | Type | Unit | Label -------- | ---- | ---- | ----------- DATE | STRING | NIL | NIL TYPE | STRING | NIL | NIL RÉF--TRANSACTION | STRING | NIL | NIL MOYEN-DE-PAIEMENT | STRING | NIL | NIL QUANTITÉ | INTEGER | NIL | NIL PRIX-AVANT-RÉDUCTION | DOUBLE-FLOAT | NIL | NIL  
Enter fullscreen mode Exit fullscreen mode

And we have rows and colums manipulation functions at our disposal. Much more, without looping.

See also

discussion: https://www.reddit.com/r/Common_Lisp/comments/1kht5ht/read_csv_files_in_common_lisp_clcsv_datatable/

If all you want is fast CSV parsing, redditors mentioned that DuckDB through cl-duckdb is pretty good.

  • lisp-stat's dataframe can read CSV. It's the smart way.
    • easy overview
    • direct access to row and columns manipulation, with no need of loop-ing.
  • auto-text, automatic detection for text files (encoding, end of line, column width, csv delimiter etc). inquisitor for detection of asian and far eastern languages.
  • CLAWK, an AWK implementation embedded into Common Lisp, to parse files line-by-line.

Lisp?!

Top comments (0)