In Fintech, the following scenario seems fairly common:

You've paid for access to a huge collection of data, but it is delivered as millions of little files: each is around 300 kB, yet together they amount to roughly 1 TB (which works out to more than three million files). Some of the files are stored zip-compressed on the remote machine; some aren't. Furthermore, the files can only be accessed via FTP, and you are limited to one connection to the server at a time.

What is the fastest way to get copies of these files?

  • Things I've tried: scripted (S)FTP; using find to traverse the remote file tree and piping its output to (s)ftp; mounting the remote collection on the local file system with curlftpfs, then copying files with standard Linux commands; and mounting with curlftpfs, zipping entire directories, then copying those. All of these methods eventually cause the local Linux box to freeze. Checking resource usage shows neither a memory leak nor a shortage of RAM. Commented Sep 12, 2016 at 0:40
  • But I'm not interested in what I've tried; I'm interested in what other people would try. Commented Sep 12, 2016 at 0:41
  • FTP != SFTP. Please edit your question to include precise details on the protocols that are available, in addition to what you tried, what worked, what didn't work, etc. Commented Sep 12, 2016 at 0:42
  • Well, if you want help, you need to write a good question. The amount of effort and detail you put into your question has a direct bearing on the quality of answers received. Commented Sep 12, 2016 at 0:43
  • ftp has a subcommand called mget. With proper scripting of the directory tree, an mget of the files in each directory, and a reasonable connection, it should take no more than a few days. ftp does not verify copies, but part of scripting the directory tree will be to collect the file sizes in bytes on the remote host. Use binary transfer mode, compare the local sizes with that catalog, and re-retrieve any discrepancies at the end. This whole process used to be very common; I am sure one can find code lying around that does it. Commented Sep 12, 2016 at 3:46
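
The catalog-then-verify procedure from the last comment can be sketched in Python with the standard-library ftplib module. This is a minimal sketch, not tested against any real data vendor's server, and it makes several assumptions: the server supports MLSD (RFC 3659) and reports "type" and "size" facts by default; everything runs over a single connection, as the question requires; and the helper names (`build_catalog`, `download`, `mismatches`, `mirror`) and the local mirror layout are hypothetical. Instead of `mget`, it issues one RETR per file through `retrbinary`, which puts the session in binary (TYPE I) mode.

```python
import ftplib
import os
import posixpath


def build_catalog(ftp: ftplib.FTP, root: str = "/") -> dict[str, int]:
    """Walk the remote tree with MLSD and return {remote_path: size_in_bytes}.

    Assumes the server supports MLSD and reports "type" and "size" facts
    without an OPTS MLST request; older servers may need LIST parsing instead.
    """
    catalog: dict[str, int] = {}
    pending = [root]
    while pending:
        directory = pending.pop()
        for name, facts in ftp.mlsd(directory):
            path = posixpath.join(directory, name)
            kind = facts.get("type", "").lower()
            if kind == "dir":
                pending.append(path)
            elif kind == "file":
                catalog[path] = int(facts["size"])
            # "cdir"/"pdir" entries (the listed directory and its parent) are ignored.
    return catalog


def local_path_for(remote_path: str, local_root: str) -> str:
    """Map a remote POSIX path onto the (hypothetical) local mirror directory."""
    return os.path.join(local_root, *remote_path.strip("/").split("/"))


def size_matches(remote_path: str, size: int, local_root: str) -> bool:
    """True if the local copy exists and has exactly the catalogued byte count."""
    target = local_path_for(remote_path, local_root)
    return os.path.isfile(target) and os.path.getsize(target) == size


def download(ftp: ftplib.FTP, remote_path: str, local_root: str) -> None:
    """Fetch one file; retrbinary switches the session to binary mode."""
    target = local_path_for(remote_path, local_root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        ftp.retrbinary("RETR " + remote_path, fh.write)


def mismatches(catalog: dict[str, int], local_root: str) -> list[str]:
    """Remote paths whose local copy is missing or differs in size from the catalog."""
    return sorted(
        path for path, size in catalog.items()
        if not size_matches(path, size, local_root)
    )


def mirror(ftp: ftplib.FTP, local_root: str, root: str = "/") -> list[str]:
    """Catalog, download, re-retrieve discrepancies once, return what is still wrong."""
    catalog = build_catalog(ftp, root)
    for remote_path, size in catalog.items():
        # Skipping files that already match lets an interrupted run resume cheaply.
        if not size_matches(remote_path, size, local_root):
            download(ftp, remote_path, local_root)
    # Second pass: re-fetch anything whose size disagrees with the catalog.
    for remote_path in mismatches(catalog, local_root):
        download(ftp, remote_path, local_root)
    return mismatches(catalog, local_root)
```

To use it, one would open a connection with `ftplib.FTP(host)`, call `login(user, passwd)`, and then call `mirror(ftp, "./mirror")`; any paths it returns still disagree with the catalog after one retry. Because files whose sizes already match are skipped, rerunning it after a crash or a dropped connection resumes the job rather than restarting it. A size match is only the weak check the comment describes, not a content checksum.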
