I just helped friends for some CSV manipulation and did it for the first time from CL, here's a quick usage overview.
If you know a better© way please comment.
EDIT: lisp-stat's data-frame is pretty awesome.
I added the data-table utility in CIEL: http://ciel-lang.org/#/libraries?id=csv
cl-csv and a table-like data structure
We'll use cl-csv, data-table and cl-csv-data-tables
(a system defined in cl-csv).
cl-csv
allows to read and write a CSV file, string or stream.
data-table
allows to work with a "table-like" data-structure where
you can access columns by name (instead of only by index), coerce
column types, and more.
cl-csv-data-tables
brings a couple helpers.
WARN: below, csv
is a nickname to ciel-csv
which exports symbols from both cl-csv and data-table.
Here are the first cl-csv examples:
Read a file into a list of lists:
(csv:read-csv #P"file.csv") ;; <--- note the #P => (("1" "2" "3") ("4" "5" "6")) ;; read csv from a string (streams are also supported) (csv:read-csv "1,2,3 4,5,6") => (("1" "2" "3") ("4" "5" "6"))
Read a file that's TAB delimited:
(csv:read-csv #P"file.tab" :separator #\Tab)
loop over a CSV for side effects with do-csv
:
(let ((sum 0)) (csv:do-csv (row #P"file.csv") (incf sum (parse-integer (nth 0 row)))) sum)
We can use map-fn
to do something at each row.
Below we read a file and return a list of objects created from each row. We could create instances of our own objects like this.
(cl-csv:read-csv #P"file.csv" :map-fn #'(lambda (row) (make-instance 'object :foo (nth 0 row) :baz (nth 2 row))))
note that we have to access each column by index.
Use a table, guess column types
Read a CSV, create a data-table object, assume headers are on the
first row (:has-column-names
), guess the column types (:munge-types
):
(csv:get-data-table-from-csv #p"file.csv") ;; <--- still the #P ;; #<DATA-TABLE:DATA-TABLE {10018A9F63}> (describe *) ;; => COLUMN-NAMES = ("Date" "Type" "Quantity" "Total") COLUMN-TYPES = (STRING STRING INTEGER DOUBLE-FLOAT) ROWS = (("9 jan. 1975" "Sell" 1 9.90) …)
This function is roughly as if you read the CSV file with cl-csv:read-csv
, checked that the first row contains column names, created the data-table object with (make-instance 'data-table:data-table :column-names (first rows) :rows (rest rows)
, and coerced the columns' types with (data-table:coerce-data-table-of-strings-to-types dt)
and (data-table::ensure-column-data-types dt)
(unexported function).
Access rows and columns
do that with:
csv:rows
csv:data-table-value dt &key row row-idx col-name col-idx
- and more
Write to file (or streams)
Write the data-table to a file with data-table-to-csv dt &optional stream)
.
Real-world example
That's my CIEL script that I did to help friends of a non-profit.
;;;; ;;;; Run this script on many CSV files: ;;;; ;;;; $ ciel sumuputils.lisp Rapports* ;;;; ;;;; (in-package :ciel-user) ;; Needs the latest CIEL, or this library. ;; (ql:quickload "data-table" :silent t) ;;; Download the CSV file from Sum Up (defvar *file* #p"/path/to/Rapport-ventes-2024-11-01_2024-11-30.csv" "Only for testing.") (defvar *dt* nil "devel only") (defun parse-csv (file) "Parse CSV, return a data-table object with column names and rows. file: a pathname (not just a string)." ;; This takes headers as the first row ;; and guesses the columns' types (string, int, float). (csv:get-data-table-from-csv (pathname file))) (defun get-all-days (dt) (remove-duplicates (loop for row in (data-table:rows dt) for date/time = (data-table:data-table-value dt :row row :col-name "Date") for day = (str:unwords (split-sequence #\Space date/time :count 3)) collect day) :test #'equal)) ;; (get-all-days (parse-csv *FILE*)) ;; ("1 nov. 2024" "2 nov. 2024" "5 nov. 2024" "7 nov. 2024" "8 nov. 2024" …) (defun get-all-conso-types (dt) (sort (remove-duplicates (loop for row in (data-table:rows dt) for description = (data-table:data-table-value dt :row row :col-name "Description") when (str:non-blank-string-p description) collect description) :test #'equal) #'string<)) (defun get-all-offert-types (types) (filter (^ (it) (str:containsp "offert" it)) types)) #++ (get-all-offert-types (get-all-conso-types (parse-csv *FILE*))) (defun rows-offerts-for-day (day dt) "day: string, like '23 nov" (loop for row in (data-table:rows dt) when (and (str:containsp day (data-table:data-table-value dt :row row :col-name "Date")) (str:containsp "offert" (data-table:data-table-value dt :row row :col-name "Description"))) collect row)) (defun sum-quantities-offerts-for-day (day dt) ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre) (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Quantité") when qty sum qty)) ;; (sum-quantities-offerts-for-day "23 nov" *dt*) ;; 15 (defun sum-quantities-offerts-for-day/by-type (day dt desc) ;; optionnel: pour chq jour, combien de tartines, alcool, soft… d'offerts? (en nombre) (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Quantité") when (and qty (str:containsp desc (data-table:data-table-value dt :row row :col-name "Description"))) sum qty)) (defun report-sum-quantities-offerts-for-day/by-type (file &key (stream t) &aux dt) (setf dt (parse-csv file)) (loop for day in (get-all-days dt) do (format stream "~&~%Nombre de consos offertes le ~a~&~%" day) (loop for desc in (get-all-offert-types (get-all-conso-types dt)) do (format stream "~a: ~a~&" desc (sum-quantities-offerts-for-day/by-type day dt desc))))) (defun sum-total-offerts-for-day (day dt) ;; le + important (loop for row in (rows-offerts-for-day day dt) for qty = (data-table:data-table-value dt :row row :col-name "Prix (TTC)") when qty sum qty)) ;; (sum-total-offerts-for-day "23 nov" *dt*) ;; 49.0d0 (defun report-totals-offert-for-days (file &key (stream t)) (let ((dt (parse-csv file))) (format stream "~&~%Total TTC des offerts~&~%") (format stream "~&(somme des colonnes Prix (TTC) pour toutes les lignes du jour comportant 'conso offerte')~&") (loop for day in (get-all-days dt) do (format stream "~20a: ~4f~&" day (sum-total-offerts-for-day day dt))))) (defun report-number-offert-for-days (file &key (stream t)) (let ((dt (parse-csv file))) (format stream "~&Nombre des consos offertes par jour~&") (format stream "~&(somme des colonnes 'Quantité' pour toutes les lignes du jour comportant 'conso offerte')~&") (loop for day in (get-all-days dt) do (format stream "~20a: ~4a~&" day (sum-quantities-offerts-for-day day dt))))) #+ciel #+ciel (let ((files (mapcar #'pathname (rest ciel-user:*script-args*)))) (mapcar #'report-totals-offert-for-days files) (mapcar #'report-sum-quantities-offerts-for-day/by-type files))
Call it from the command line:
$ ciel sumuputils.lisp Rapport*csv > rapport.txt
Show data in table
This is our CSV, by the way:
Date,Type,Réf. transaction,Moyen de paiement,Quantité,Description,Devise,Prix avant réduction,Réduction,Prix (TTC),Prix (HT),TVA,Taux de TVA,Compte 1 nov. 2024 22:09,Vente,T3YKZYZANT,Mastercard - Crédit,1,Montant personnalisé,EUR,3.00,0.00,3.00,3.00,0.00,,us@gmail.com …
Can we display it nicely in a table? We'll use our resurrected cl-ansi-term library.
We'll use the "raw" CSV data, where the headers are on the first line, instead of the data-value object.
That way we just have to pass it to term:table *csv*
, and we choose the columns to display:
(setf *rawcsv* (csv:read-csv #p"test.csv")) (term:table *RAWCSV* :keys '("Date" "Quantité" "Prix (HT)"))
Result:
+-----------------+--------+---------+ |Date |Quantité|Prix (HT)| +-----------------+--------+---------+ |1 nov. 2024 22:09|1 |3.00 | +-----------------+--------+---------+ |2 nov. 2024 19:10|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 20:25|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 20:44|1 |10.00 | +-----------------+--------+---------+ |2 nov. 2024 21:05|1 |6.00 | +-----------------+--------+---------+ |2 nov. 2024 21:13|1 |3.00 | +-----------------+--------+---------+ |2 nov. 2024 21:45|1 |7.00 | +-----------------+--------+---------+ |2 nov. 2024 21:49|1 |6.00 | +-----------------+--------+---------+
Lisp-stat's data-frame
lisp-stat's data-frame is the smart way.
(ql:quickload :lisp-stat) ;; … (in-package :ls-user) ;; Give a name to our dataframe, ;; read CSV. (defdf sumup (read-csv #p"sells.csv")) ;; similar to (defparameter *data-frame* …) but with future goodies. ;; The goodies are that each column is created as a variable for our "sumup" dataframe. ;; Here we get all the "date" columns: LS-USER> sumup:date ;; #("1 nov. 2024 19:36" "1 nov. 2024 19:36" …) ;; We didn't need to LOOP. ;; We can also use (select sumup t 'date) ;; We have rows an column manipulation functions. ;; Let's get the data of only 2 columns: (columns sumup '(date quantité)) ;; Let's inspect our data: (head sumup) ;; DATE TYPE RÉF--TRANSACTION MOYEN-DE-PAIEMENT QUANTITÉ DESCRIPTION DEVISE PRIX-AVANT-RÉDUCTION RÉDUCTION PRIX-(TTC) PRIX-(HT) TVA TAUX-DE-TVA COMPTE ;; 0 1 nov. 2024 19:36 Vente TE4PL6DA9X Mastercard - Débit 1 Montant personnalisé EUR 9.0 0.0 9.0 9.0 0.0 NA us@gmail.com ;; the data is printed as a nice table, respecting the columns width out of the box. ;; print-data would print all the data. ;; DESCRIBE output is pimped: (describe sumup) SUMUP A data-frame with 863 observations of 14 variables Variable | Type | Unit | Label -------- | ---- | ---- | ----------- DATE | NIL | NIL | NIL TYPE | NIL | NIL | NIL RÉF--TRANSACTION | NIL | NIL | NIL MOYEN-DE-PAIEMENT | NIL | NIL | NIL QUANTITÉ | NIL | NIL | NIL … ;; We can guess the column types: (heuristicate-types sumup) ;; Our types are not NIL now: (describe sumup) SUMUP A data-frame with 863 observations of 14 variables Variable | Type | Unit | Label -------- | ---- | ---- | ----------- DATE | STRING | NIL | NIL TYPE | STRING | NIL | NIL RÉF--TRANSACTION | STRING | NIL | NIL MOYEN-DE-PAIEMENT | STRING | NIL | NIL QUANTITÉ | INTEGER | NIL | NIL PRIX-AVANT-RÉDUCTION | DOUBLE-FLOAT | NIL | NIL …
And we have rows and colums manipulation functions at our disposal. Much more, without loop
ing.
See also
discussion: https://www.reddit.com/r/Common_Lisp/comments/1kht5ht/read_csv_files_in_common_lisp_clcsv_datatable/
If all you want is fast CSV parsing, redditors mentioned that DuckDB through cl-duckdb is pretty good.
- lisp-stat's dataframe can read CSV. It's the smart way.
- easy overview
- direct access to row and columns manipulation, with no need of loop-ing.
- auto-text, automatic detection for text files (encoding, end of line, column width, csv delimiter etc). inquisitor for detection of asian and far eastern languages.
- CLAWK, an AWK implementation embedded into Common Lisp, to parse files line-by-line.
Lisp?!
Top comments (0)