DEV Community

t-o-d

Split a CSV by number of lines using only shell

  • Sometimes I have to deal with large CSV files.
  • In that case, splitting the file and sorting the pieces into folders in advance makes it easier to handle.
  • This post describes how to split a file by number of lines and place each piece in its own folder, using only shell commands.

Result

  • The following is the directory structure before execution.
.
├── sample.csv
└── main.sh
  • The following is the content of main.sh.
    • sample.csv has 100 rows of data.
    • Note: error handling is omitted.
#!/bin/sh
set -e

# File path (exit if the input file does not exist)
[ ! -e "$1" ] && exit 1 || datafile="$1"

# Strip the file extension
filename="${datafile%.*}"

# Get the number of lines
row=$(grep -c '' "$datafile")

# Number of lines per split file
sep="$2"

# Number of directories to create (row / sep, rounded up)
dir_cnt=$(awk -v row="$row" -v sep="$sep" 'BEGIN {
  i = row / sep
  printf("%d\n", i += i < 0 ? 0 : 0.999)
}')

# Create the directories
seq -f "${filename}_%01.0f" 1 "${dir_cnt}" | xargs mkdir -p

# Split the file
split -l "${sep}" -a 2 "$datafile" "${filename}_"

# Move each piece into its directory, adding the .csv extension
# (plain POSIX arithmetic is used so the script also runs under a strict sh)
count=1
for i in $(find . -type f -name "${filename}_*" | sort)
do
  mv "$i" "${filename}_${count}/${filename}_${count}.csv"
  count=$((count + 1))
done
  • Run as follows.
sh main.sh sample.csv 25 
  • After executing, check that the directory structure is as follows.
.
├── main.sh
├── sample.csv
├── sample_1
│   └── sample_1.csv
├── sample_2
│   └── sample_2.csv
├── sample_3
│   └── sample_3.csv
└── sample_4
    └── sample_4.csv
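To confirm the split went as expected, the line count of each generated file can be checked. The following is a minimal sketch, assuming the `sample_N/sample_N.csv` layout shown above; `count_split_lines` is a hypothetical helper name that is not part of main.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of main.sh): print the line count of every
# <prefix>_N/<prefix>_N.csv file and the total across all of them.
count_split_lines() {
  prefix="$1"
  total=0
  for f in "${prefix}"_*/"${prefix}"_*.csv; do
    # Skip the unexpanded glob pattern when no file matches
    [ -e "$f" ] || continue
    n=$(grep -c '' "$f")
    echo "$f: $n"
    total=$((total + n))
  done
  echo "total: $total"
}

# Run in the directory that contains the sample_N folders
count_split_lines sample
```

For the 100-row sample.csv split by 25, each of the four files should report 25 lines and the total should match the original 100.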

Supplement

Number of created directories

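  • Rounding up
    • When the number of rows is not evenly divisible by the number of lines per file, such as 100/15 (about 6.67), truncating the quotient would create one directory too few.
    • The script therefore adds 0.999 to the quotient in awk and prints it with printf's %d, which drops the fractional part, so the result is rounded up to an integer (7 in this example).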
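The rounding can also be expressed with integer arithmetic only. The sketch below is an alternative to the awk approach, not the script's own method; `ceil_div` is a hypothetical helper name, and both arguments are assumed to be positive integers.

```shell
#!/bin/sh
# Hypothetical helper: number of pieces needed to hold `rows` lines
# at `per` lines per piece, i.e. rows / per rounded up.
# Assumes both arguments are positive integers.
ceil_div() {
  rows="$1"
  per="$2"
  echo $(( (rows + per - 1) / per ))
}

ceil_div 100 15   # prints 7
ceil_div 100 25   # prints 4
```

One reason to consider this form: adding 0.999 rounds up only when the fractional part is at least 0.001, so a case such as 10001 rows at 10000 lines per file (quotient 1.0001) would yield 1 with the awk expression, whereas the integer form yields 2.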

File splitting and moving

  • The .csv extension is added by mv.
    • The option for adding an extension in split (--additional-suffix) is not available by default on Mac etc.
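The split-then-rename step can be isolated as follows. This is a minimal sketch that assumes the two-letter suffixes produced by `split -a 2`; `split_csv` is a hypothetical helper name, and it leaves the pieces in the current directory instead of moving them into folders.

```shell
#!/bin/sh
# Hypothetical helper: split `src` into pieces of `lines` lines named
# <prefix>_aa, <prefix>_ab, ... and then append ".csv" to each piece with mv,
# without relying on split's --additional-suffix option.
split_csv() {
  src="$1"
  lines="$2"
  prefix="$3"
  split -l "$lines" -a 2 "$src" "${prefix}_"
  for piece in "${prefix}"_[a-z][a-z]; do
    # Skip the unexpanded glob pattern when nothing matches
    [ -f "$piece" ] || continue
    mv "$piece" "${piece}.csv"
  done
}

# Example: split_csv sample.csv 25 sample
```

Where GNU split is available, `split -l 25 -a 2 --additional-suffix=.csv sample.csv sample_` should produce the same file names in one step; the mv loop is the portable fallback.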
