While doing some research, I found that Ruby already ships a tar writer, `Gem::Package::TarWriter`, which Discourse can load with `require 'rubygems/package'`: http://ruby-doc.org/stdlib-2.0.0/libdoc/rubygems/rdoc/Gem/Package/TarWriter.html
Using it should let Discourse take backups without needing free disk space of double the backup size or more. The entire archive is streamed to disk through an in-process tar and an in-process gzip, so no uncompressed intermediate `.tar` file is ever written.
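Here is a minimal standalone sketch of that streaming pipeline, outside of Discourse. `stream_tar_gz` is a hypothetical helper name, and for simplicity it archives only regular files and directories (symlinks and special files are skipped). Data flows from each file into the TarWriter, then the GzipWriter, then disk in small chunks, so an uncompressed tar never exists on disk. It uses `add_file_simple`, which takes each file's size up front, because as far as I can tell `add_file` seeks back to rewrite headers and therefore needs a seekable IO, which a gzip stream is not.

```ruby
require "rubygems/package"
require "zlib"
require "find"

# Hypothetical helper: streams everything under src_dir into a gzipped tar at
# target. Entry names are relative to src_dir; only regular files and
# directories are archived (symlinks and special files are skipped).
def stream_tar_gz(src_dir, target)
  Zlib::GzipWriter.open(target) do |gz|
    # The block form of TarWriter.new writes the end-of-archive blocks on exit
    Gem::Package::TarWriter.new(gz) do |tar|
      Find.find(src_dir) do |path|
        next if path == src_dir

        name = path.delete_prefix("#{src_dir}/")
        stat = File.lstat(path)
        if stat.directory?
          tar.mkdir(name, stat.mode)
        elsif stat.file?
          # The header (including size) is written first and never revisited,
          # so the gzip stream never has to seek
          tar.add_file_simple(name, stat.mode, stat.size) do |io|
            File.open(path, "rb") { |src| IO.copy_stream(src, io) }
          end
        end
      end
    end
  end
  target
end
```

Reading the result back with `Zlib::GzipReader` and `Gem::Package::TarReader` is an easy way to sanity-check an archive produced this way.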
Usage should look like the following:
```ruby
require "rubygems/package"
require "zlib"

destination = File.open(target_filename, "wb")
gz_stream = Zlib::GzipWriter.new(destination, 5)
@tar_writer = Gem::Package::TarWriter.new(gz_stream)

# TarWriter#add_file seeks back to rewrite each entry's header, which a
# GzipWriter cannot do, so use add_file_simple and pass the size up front.
log "Archiving data dump..."
@tar_writer.add_file_simple "dump.sql.gz", 0644, File.size(@dump_filename) do |tf|
  File.open(@dump_filename, "rb") { |df| IO.copy_stream(df, tf) }
end

rel_directory = File.join(Rails.root, "public")
upload_directory = File.join(rel_directory, "uploads", @current_db)

log "Archiving uploads..."
last_progress = Time.now
files_since_progress = 0

Dir[File.join(upload_directory, "**/*")].each do |file|
  stat = File.stat(file)
  # Store entries as "uploads/..." rather than "/uploads/..."
  relative = file.delete_prefix("#{rel_directory}/")

  if stat.directory?
    @tar_writer.mkdir relative, stat.mode
  else
    files_since_progress += 1
    if files_since_progress > 100 || last_progress < 15.seconds.ago
      log "Archiving #{file}"
      files_since_progress = 0
      last_progress = Time.now
    end

    @tar_writer.add_file_simple relative, stat.mode, stat.size do |tf|
      File.open(file, "rb") { |df| IO.copy_stream(df, tf) }
    end
  end
end

log "Finishing up archive..."
@tar_writer.close # writes the end-of-archive blocks
gz_stream.close   # writes the gzip trailer and also closes destination
remove_tmp_directory
```

The above code does not have:
- proper error reporting
- real progress indicators (it only logs a line every 100 files or 15 seconds)
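Both gaps could be addressed along these lines. This is a minimal standalone sketch, not Discourse code: `archive_files` and `ArchiveError` are hypothetical names, progress is measured in bytes (assuming that summing file sizes before archiving is cheap enough), and a failure deletes the partial archive and raises a wrapped error whose `cause` is the original exception.

```ruby
require "rubygems/package"
require "zlib"

# Hypothetical error class so callers can report a failed backup cleanly;
# the original exception stays available via #cause.
class ArchiveError < StandardError; end

# Hypothetical helper: archives base_dir/rel for each rel in relative_paths
# into a .tar.gz at target, yielding an integer percentage (by bytes) after
# each file. On any failure the partial archive is deleted and an ArchiveError
# naming the path being processed is raised.
def archive_files(base_dir, relative_paths, target)
  current = nil
  started = false
  # Sum sizes first so progress can be reported as a percentage of bytes
  total = relative_paths.sum { |rel| File.size(File.join(base_dir, rel)) }
  done = 0

  started = true
  Zlib::GzipWriter.open(target) do |gz|
    Gem::Package::TarWriter.new(gz) do |tar|
      relative_paths.each do |rel|
        current = File.join(base_dir, rel)
        stat = File.stat(current)
        tar.add_file_simple(rel, stat.mode, stat.size) do |io|
          File.open(current, "rb") { |src| IO.copy_stream(src, io) }
        end
        done += stat.size
        yield(total.zero? ? 100 : done * 100 / total) if block_given?
      end
    end
  end
  target
rescue StandardError => e
  # Only remove the archive if this call got as far as creating it
  File.delete(target) if started && File.exist?(target)
  # Raising inside rescue sets ArchiveError#cause to e automatically
  raise ArchiveError, "archiving failed at #{current || base_dir}: #{e.message}"
end
```

In the backup code, the progress block could simply call `log "#{pct}% archived"`, and the caller could rescue `ArchiveError` to mark the backup as failed.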