
I'm using Ansible to set up configuration on several nodes. As part of this setup I need to split one big file into chunks of n lines and copy each chunk to a remote file, without creating a local copy of each chunk (as the split command does). Ansible can't do this by default (or I just haven't found out how yet), so I decided to use GNU Parallel. I found out here that copying from stdin can easily be done like this:

~$ echo "Lots of data" | ssh [email protected] 'cat > big.txt' 

But I want to do this for several hosts simultaneously. Here is an example input:

~$ cat hosts.txt
1.1.1.1
2.2.2.2
3.3.3.3
~$ cat data.txt
lots
of
...
lines

I calculate the number of lines per node by running "wc -l" on both files and dividing the second number (the line count of data.txt) by the first (the line count of hosts.txt). So the next step would be something like this:

~$ cat data.txt | parallel -S `cat hosts.txt | tr "\n" ","` -N $LINES_PER_HOST --pipe "ssh $HOST 'cat > /data/piece.txt'" 
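A minimal sketch of how the LINES_PER_HOST value used in this command might be computed. The function name lines_per_host and the throwaway demo files are assumptions made for illustration; rounding up is also an assumption, chosen so that no lines are left over when the count does not divide evenly.

```shell
#!/bin/sh
# Print how many lines each host should receive, rounding up so that
# nothing is left over when the line count does not divide evenly.
# (lines_per_host is a hypothetical helper, not part of any tool.)
lines_per_host() {
    data_file=$1
    hosts_file=$2
    # Redirecting stdin makes wc print only the number, without a file name.
    # Note: wc -l counts newline characters, so both files should end with one.
    total=$(wc -l < "$data_file")
    hosts=$(wc -l < "$hosts_file")
    # Ceiling division: (a + b - 1) / b
    echo $(( (total + hosts - 1) / hosts ))
}

# Demo with throwaway files: 10 lines of data and 3 hosts.
demo_dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo_dir/data.txt"
printf '%s\n' 1.1.1.1 2.2.2.2 3.3.3.3 > "$demo_dir/hosts.txt"
LINES_PER_HOST=$(lines_per_host "$demo_dir/data.txt" "$demo_dir/hosts.txt")
echo "$LINES_PER_HOST"
rm -r "$demo_dir"
```

With real input the call would be lines_per_host data.txt hosts.txt. Plain integer division would round down instead, which leaves a remainder of lines that belongs to no host.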

But how can I launch one command per host, and what should I replace $HOST with? I thought about combining two inputs (one being the list of hosts), but I still have no idea how to do it.

Would really appreciate any thoughts.

1 Answer


Works from version 20150922:

parallel-20150922 -a bigfile --roundrobin --pipepart --slf hosts.txt -j1 'cat > giraf' 
  • Great, thanks! I only see one limitation: I can only split the file by blocks, not by lines. Will it always handle line endings correctly in this case? I don't really care if one node gets a few more lines than another, but I need the lines to be complete and readable. Commented Jun 27, 2016 at 18:09
  • It splits on \n, so you should be safe. Commented Jun 28, 2016 at 6:34
  • I ran it on a file with 10000 lines and 2 nodes, using a block size a little larger than the 'du -b' output divided by two. One node got ~4850 lines, the other ~4900, and the rest were lost. Is there any way to ensure all lines are copied? Or should I calculate the block size some other way? Commented Jun 29, 2016 at 9:34
  • You should never lose lines. Can you post the exact command you ran? If you run the command I gave, --block is not needed. Commented Jun 29, 2016 at 14:41
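One way to check the "lines lost" report in the comments is to add up the line counts of the pieces and compare the sum with the line count of the source file. The sketch below is only an illustration: check_total is a hypothetical helper, and the figures are the approximate ones quoted in the comment, not measured values.

```shell
#!/bin/sh
# Compare the sum of per-host line counts with the expected total.
# Usage: check_total EXPECTED COUNT...
# (check_total is a hypothetical helper written for this example.)
check_total() {
    expected=$1
    shift
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    if [ "$total" -eq "$expected" ]; then
        echo "ok: all $expected lines accounted for"
    else
        echo "mismatch: $(( expected - total )) of $expected lines missing"
        return 1
    fi
}

# Approximate figures from the comment: 10000 lines split across two hosts.
# "|| true" keeps the demo running even if the shell was started with set -e.
RESULT=$(check_total 10000 4850 4900) || true
echo "$RESULT"
```

In practice the expected value could come from wc -l < bigfile, and each per-host count from something like ssh HOST 'wc -l < giraf', assuming the piece was written to the remote login directory as in the answer's command.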
