Harut Margaryan

Using Async in Ruby on Rails for CSV export

In this article, we'll walk through an approach to asynchronous CSV export in Ruby on Rails.

Problem: when a CSV export covers a large dataset, the server can't build the file within the request/response cycle, and the client request times out (on Heroku, for example, the router cuts requests off after 30 seconds).

The solution is to move the CSV generation into a background worker, so it doesn't block the web request, and to notify the client when the file is ready.

We implemented a general solution using the Command design pattern, which lets us run the same pipeline for many kinds of CSV output.
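To make the pattern concrete, here is a minimal sketch of a command class. The name CsvExportDataGenerator and the #call/#file_name interface come from the code shown later in this article; the model, columns, and query inside are hypothetical placeholders.

```ruby
require 'csv'

# Hypothetical command: a command only needs to respond to #call
# (returning the CSV payload) and #file_name.
class CsvExportDataGenerator
  def initialize(params)
    @params = params
  end

  # Returns the generated CSV as a string.
  def call
    CSV.generate do |csv|
      csv << %w[id name email]                     # header row (assumed columns)
      Member.where(@params).find_each do |member|  # Member is a placeholder model
        csv << [member.id, member.name, member.email]
      end
    end
  end

  def file_name
    "members-#{Time.current.to_i}.csv"
  end
end
```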

Technology used

  • Heroku
  • Ruby on Rails
  • Sidekiq for background job processing
  • Redis for keeping the state of CSV export processing
  • Filestack or any other cloud storage service

Here, two enum-like frozen hashes with the following structure are used.

```ruby
EXPORTABLE_REDIS_STATUSES = {
  processing: 'Processing',
  complete: 'Processed',
}.freeze

EXPORTABLE_REDIS_KEYS = {
  members_csv: 'MEMBERS_CSV_GENERATORS',
  all_tasks_csv: 'ALL_TASKS_CSV',
  my_tasks_csv: 'MY_TASKS_CSV',
}.freeze
```

Here we have two states: processing and processed. The first is set when the job is picked up, and the second is set once the CSV file has been generated and uploaded.
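In Redis, the lifecycle of a single export (keyed by the uuid we later hand to the client) looks roughly like this; the file URL shown is a placeholder:

```ruby
# While the worker is running:
REDIS.hget('MEMBERS_CSV_GENERATORS', uuid)
# => '{"status":"Processing"}'

# Once the file has been generated and uploaded:
REDIS.hget('MEMBERS_CSV_GENERATORS', uuid)
# => '{"status":"Processed","exportable":"https://cdn.example.com/abc123.csv"}'
```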

The second hash acts as a whitelist, preventing CSV exports that aren't registered with the application.

This is how the command invoker class, which runs as a Sidekiq-backed job, looks.

```ruby
class Exports::ExportableCommandJob < ApplicationJob
  # Mark the export as "processing" as soon as the job is queued,
  # so the client can start polling right away.
  after_enqueue do |job|
    uuid = job.arguments.first[:uuid]
    redis_key = redis_collection_key(job.arguments.first[:redis_key])
    REDIS.hset(
      redis_key,
      uuid,
      { status: Constants::EXPORTABLE_REDIS_STATUSES[:processing] }.to_json
    )
  end

  def perform(uuid:, redis_key:, command:, params: {}, cleanup_interval: nil)
    params = JSON.parse(params).symbolize_keys unless params.is_a?(Hash)
    command = command.constantize.new(params)
    redis_key = redis_collection_key(redis_key)

    # Run the command and write its CSV output to a temp file.
    file_data = command.call
    tmp_file = Tempfile.new('upload', encoding: 'ascii-8bit')
    tmp_file << file_data
    tmp_file.flush
    tmp_file.rewind

    # Upload the file to cloud storage and grab its public URL.
    file_name = command.file_name
    uploaded_file = UploadFileService::UploadableFile.new(file: tmp_file, filename: file_name)
    details = UploadFileService.upload_file(uploaded_file)
    tmp_file.unlink
    file_path = details.metadata[:fileurl]

    # Flip the state to "processed" and expose the download URL.
    generator = JSON.parse(REDIS.hget(redis_key, uuid))
    generator['status'] = Constants::EXPORTABLE_REDIS_STATUSES[:complete]
    generator['exportable'] = file_path
    REDIS.hset(redis_key, uuid, generator.to_json)
  end

  # Schedule cleanup of the Redis entry and the uploaded file.
  after_perform do |job|
    uuid = job.arguments.first[:uuid]
    redis_key = redis_collection_key(job.arguments.first[:redis_key])
    ExportableCleanup.set(wait: job.arguments.first[:cleanup_interval] || 1.hour)
                     .perform_later(uuid: uuid, redis_key: redis_key)
  end

  private

  def redis_collection_key(key)
    redis_key = key.to_sym
    Constants::EXPORTABLE_REDIS_KEYS[redis_key] || key
  end
end
```

We use the uuid and redis_key to set the job's status to processing as soon as it is enqueued, which lets us monitor its progress at any time.

The perform method accepts a command class name and its arguments via params, constantizes the class, and calls it, expecting CSV data in return. We then write the data to a temporary file and upload it through Filestack or any other cloud storage service.
Once the file is on the cloud we obtain its URL, store it in Redis, and flip the task state from processing to processed. The client can now poll for completion and receive the generated URL for downloading.
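UploadFileService is our thin wrapper around the storage client. As a rough sketch, assuming the filestack Ruby gem: the Struct shapes mirror what the job above expects (UploadableFile and details.metadata[:fileurl]), while the client calls and everything else are assumptions, not the actual implementation.

```ruby
require 'filestack' # filestack-ruby gem

# Sketch only: a thin wrapper whose return shape matches what the job reads.
module UploadFileService
  UploadableFile = Struct.new(:file, :filename, keyword_init: true)
  UploadResult   = Struct.new(:metadata, keyword_init: true)

  def self.upload_file(uploadable)
    client = FilestackClient.new(ENV['FILESTACK_API_KEY'])
    filelink = client.upload(filepath: uploadable.file.path)
    UploadResult.new(metadata: { fileurl: filelink.url })
  end
end
```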

The private method redis_collection_key does the filtering, resolving a registered key to its Redis hash name.

Finally, after_perform schedules a cleanup task, shown below.

```ruby
class ExportableCleanup < ApplicationJob
  def perform(uuid:, redis_key:)
    exportable_json = REDIS.hget(redis_key, uuid)

    unless exportable_json.nil?
      generator = JSON.parse(exportable_json)
      file_url = generator['exportable']
      UploadFileService.remove_file(file_url) unless file_url.blank?
    end

    REDIS.hdel(redis_key, uuid)
  end
end
```

It simply removes the entry from Redis and deletes the file from cloud storage.

Enqueuing the invoker looks like this:

```ruby
def export_csv_async(args, redis_key)
  uuid = SecureRandom.uuid
  Exports::ExportableCommandJob.perform_later(
    uuid: uuid,
    redis_key: redis_key,
    command: 'CsvExportDataGenerator',
    params: args.to_h.to_json,
  )
  uuid
end
```

The helper returns the uuid, which we send back to the client so it can make the state-checking calls described above.
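For example, a hypothetical client-facing endpoint could kick off the export and hand the uuid back (assuming export_csv_async is available to the controller, e.g. via a concern; the controller and its params are placeholders):

```ruby
# Hypothetical endpoint that starts an export and returns the uuid.
class MembersController < ApplicationController
  def export
    uuid = export_csv_async(export_params, :members_csv)
    render json: { uuid: uuid }, status: :accepted
  end

  private

  def export_params
    params.permit(:team_id, :status) # placeholder filter params
  end
end
```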

Finally, here is a simple controller action that checks the process state in Redis:

```ruby
class ExportableGeneratorsController < ActionController::API
  include HttpErrorHandling

  before_action :load_resource

  def show
    render json: {
      status: @generator['status'],
      fileUrl: @generator['exportable']
    }
  end

  private

  def load_resource
    @exportable_key = Constants::EXPORTABLE_REDIS_KEYS[params[:key].to_sym]
    gen = REDIS.hget(@exportable_key, params[:uuid])
    return not_found('Process not found') if gen.nil?

    @generator = JSON.parse(gen)
  end
end
```
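The route for this action isn't shown in the article; a plausible mapping would be:

```ruby
# config/routes.rb — hypothetical route for the status endpoint
get 'exports/:key/:uuid', to: 'exportable_generators#show'
```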

Top comments (1)

Abrar ahmed

Great post!
For large datasets, you might consider batching rows inside the worker:

```ruby
CSV.open(tmp_file.path, 'w') do |csv|
  YourModel.find_in_batches(batch_size: 1000) do |batch|
    # CSV#<< expects an array of fields, so pass the attribute values
    batch.each { |row| csv << row.attributes.values }
  end
end
```

Also, adding a TTL to cloud files (e.g., using S3’s lifecycle rules) can help with cleanup.
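For instance, with the aws-sdk-s3 gem a one-day expiry on an exports/ prefix might look like this (bucket name and prefix are placeholders):

```ruby
require 'aws-sdk-s3'

# Sketch: expire objects under exports/ one day after creation.
Aws::S3::Client.new.put_bucket_lifecycle_configuration(
  bucket: 'my-app-exports',
  lifecycle_configuration: {
    rules: [
      {
        id: 'expire-csv-exports',
        status: 'Enabled',
        filter: { prefix: 'exports/' },
        expiration: { days: 1 }
      }
    ]
  }
)
```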
Looking forward to more Rails tips like this!