Asynchronously Synchronous.

.... the amazing task

One of the interesting tasks I have dealt with...

CSV - prepare and download using background process.

Serving a static file is easy. But when filters are applied to a large data set, generating the file on the fly adds load to the server, and the user experience isn't that good.

To overcome this, I used the delayed_job gem to do the work asynchronously; you could choose any other background-job gem as well. Here's how I went about it...

The Synchronous way.

def download
  csv_string = CSV.generate do |csv|
    csv << ["ID", .....]
    @search.each do |site|
      csv << [site.id, site.so.on....]
    end
  end
  send_data csv_string,
            :type => 'text/csv',
            :filename => 'sites_list.csv',
            :disposition => 'attachment'
end

The table was huge, around 50k+ records. When all records (site information, here) were requested, the request held up my machine; that's how heavy the load was.

After research and more research (that is, googling and ..ing) I told myself: gather yourself and deal with it, you will find a way.

The delayed job gem.

Having not dealt with background processes before, I started with the Railscasts episode on delayed job.

Having fixed in my mind how to go about this task, I started, but it was easier said than done.

The plan was: make use of a background process to prepare the CSV file. Just like that, this part was done and dusted.

def download
  ExportCsv.new(@search.to_a.map(&:id), current_user.id).delay.perform
end

The GET request to the download method should be an AJAX call.

download.js.erb
  alert("Preparing file to download, you will be notified once it's complete...");
  timeout('<%= SOME_DELAY %>'); // Will be explained.

The custom class for delayed job operations.

lib/export_csv.rb
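require 'csv'

class ExportCsv < Struct.new(:site_ids, :user_id)

  # Runs in the background worker: writes the CSV to disk.
  def perform
    sites = Site.where(id: site_ids)
    CSV.open("tmp/sites_xls/#{user_id}.csv", "w+") do |csv|
      csv << ["ID", .....]
      sites.each do |site|
        csv << [site.id, site.so.on....]
      end
    end
  end

  # delayed_job hook, called once perform has run.
  def after(job)
    User.find(user_id).update_attributes(csv_download: true)
  end

  # No handle_asynchronously here: the controller already queues the job
  # with .delay, and wrapping perform again would enqueue it a second time.

end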

Add a new boolean column csv_download to the users table.

The attribute is set to true once the after method has run, which means the file has been generated asynchronously and is sitting on the server.

The preparation happens in the background, so the server isn't held up.
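
To make the order of events concrete, here is a minimal sketch in plain Ruby, without Rails or delayed_job. FakeUser, ExportCsvSketch, run_job and the two-column header are hypothetical stand-ins (the real job writes the site columns and updates the users table); run_job only imitates what a worker does, calling perform and then the after hook.

```ruby
require "csv"
require "tmpdir"

# Hypothetical stand-in for the User model: it just holds the flag.
FakeUser = Struct.new(:id, :csv_download)

# Same shape as the job above, but with plain data instead of ActiveRecord.
ExportCsvSketch = Struct.new(:rows, :user, :dir) do
  def path
    File.join(dir, "#{user.id}.csv")
  end

  def perform
    CSV.open(path, "w") do |csv|
      csv << ["ID", "Name"]          # illustrative header, not the real columns
      rows.each { |row| csv << row }
    end
  end

  # Hook: in this sketch it runs only after perform has written the file.
  def after(_job)
    user.csv_download = true
  end
end

# Imitates a worker: perform first, then the after hook.
def run_job(job)
  job.perform
  job.after(job)
end

user = FakeUser.new(7, false)
job  = ExportCsvSketch.new([[1, "alpha"], [2, "beta"]], user, Dir.mktmpdir)
run_job(job)

puts File.read(job.path)
puts user.csv_download
```

The point of the sketch is only the sequencing: the flag flips after the file has been written, never before.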

The poll function.

But why?

Once the CSV preparation is complete, the file sits in the application directory, so the client needs to be notified that the file is ready to download.

So what lies inside the timeout function? It polls a Rails method to check whether the file is ready to serve and, when it is, pops up a confirmation box to download.

function timeout(time) {
    setTimeout(function() {
        $.ajax({ url: "/controller/check_if_ready?",
            type : 'GET',
            dataType : 'json',
            success: function(response)
            {
                if(response.value === "success") {
                    // File is ready: ask the user whether to download it.
                    if (confirm("File ready to download ?")) {
                        window.location="/controller/download_csv";
                    } else {
                        // Declined: ask the server to discard the file.
                        $.ajax({ url: "/controller/remove_file"})
                    }
                }
                else
                {
                    timeout(4000) // not ready yet, poll again in 4 seconds
                }
            }
        });
    }, time);
}

The Poll method - Ajax-ify.

check_if_ready?

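# @file_path is assumed to be set elsewhere (for example in a before filter)
# and to point at the CSV generated for the current user.
def check_if_ready?
  if current_user.csv_download? && File.exist?(@file_path)
    render json: { value: "success" }
  else
    render json: { value: nil }
  end
end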
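
The readiness rule itself is small enough to try in isolation. Below is a rough sketch in plain Ruby; csv_ready? and poll_payload are hypothetical helper names, and the flag is passed in as a plain boolean rather than read from current_user.

```ruby
require "tmpdir"

# True only when the job has flagged completion AND the file is on disk.
def csv_ready?(flag, file_path)
  flag == true && !file_path.nil? && File.exist?(file_path)
end

# The hash the poll action would render as JSON.
def poll_payload(flag, file_path)
  { value: csv_ready?(flag, file_path) ? "success" : nil }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "7.csv")
  p poll_payload(true, path)    # flag set, but file not written yet
  File.write(path, "ID\n1\n")
  p poll_payload(false, path)   # file exists, but flag not set
  p poll_payload(true, path)    # both conditions hold
end
```

Checking both the flag and the file guards against a stale flag left over from an earlier download whose file has already been removed.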

Navigation to different parts of application.

It seems everything's fine, but what if the user navigates to another page while the background process is still working on the CSV?

Update accordingly

download.js.erb
<% unless session[:download] %>
    alert("Preparing file to download, you will be notified once it's complete...");
  <% session[:download] = 1 %>
    timeout('<%= INITIAL_DELAY %>'); // initial delay.
<% else %>
    alert('Wait for the earlier file to complete...');
<% end %>

Call the function (globally, if need be) in whichever layouts need it.

%script
    - if session[:download]
      timeout('#{DELAY}'); 

Use a flag (session[:download]) to decide whether to make poll calls; once the download is complete, toggle it back to a falsy value.
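
The flag handling is simple state. As a sketch, with a plain Hash standing in for the Rails session and hypothetical helper names (start_download, polling?, finish_download):

```ruby
# A plain Hash stands in for the Rails session in this sketch.

# Returns :started for the first request, :busy while one is still pending.
def start_download(session)
  return :busy if session[:download]
  session[:download] = 1
  :started
end

# Layouts would call timeout(...) only while this is true.
def polling?(session)
  !!session[:download]
end

# Called once the file has been downloaded (or discarded).
def finish_download(session)
  session.delete(:download)
end

session = {}
puts start_download(session)   # first request
puts start_download(session)   # second request while the first is pending
finish_download(session)
puts polling?(session)
```

Deleting the key, rather than setting it to false, keeps the session free of leftover entries; either gives a falsy value.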

Kick-start the background process.

RAILS_ENV=environment nice -n 15 script/delayed_job -n 3 restart

Final few points

  • The delay value can be configured based on the record count, to reduce the number of Ajax calls. But when the user navigates to different pages you might want a smaller delay, as there can be another request within that delay. A countdown timer is probably the best option.
  • Delete the file and reset the flag values after the download.
  • What if multiple files are requested? I have allowed the user to make one download request at a time. You might also want to handle this situation.
  • Important: during a Capistrano deployment you will have to restart the delayed job process and make sure no download is in progress. If this isn't handled, the Ajax call may get into a loop and the requests to the poll method may never end; a counter value can be used to stop polling after a set number of attempts.
  • If the user requests a download and closes the session, there is no concern about the poll calls looping, as the JavaScript function is only called based on the session value.
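
Two of the points above, a delay that scales with the record count and a counter that stops endless polling, can be sketched in plain Ruby. All the numbers here (1 ms per 10 records, the 2 to 30 second bounds, the attempt limits) are made-up illustrative values, and poll_delay_ms and poll_until_ready are hypothetical names; in the real page the loop lives in the JavaScript timeout function.

```ruby
# Delay grows with the record count but stays within fixed bounds.
def poll_delay_ms(record_count, min_ms: 2_000, max_ms: 30_000)
  (record_count / 10).clamp(min_ms, max_ms)
end

# Polls until the block returns true or max_attempts is reached.
# Returns the number of attempts used, or nil if it gave up.
def poll_until_ready(max_attempts: 20, delay_ms: 0)
  1.upto(max_attempts) do |attempt|
    return attempt if yield
    sleep(delay_ms / 1000.0)
  end
  nil
end

puts poll_delay_ms(50_000)

calls = 0
puts poll_until_ready(max_attempts: 5) { (calls += 1) >= 3 }   # ready on the 3rd check
p poll_until_ready(max_attempts: 5) { false }                  # never ready: gives up
```

Giving up after a fixed number of attempts is what keeps a restarted or crashed worker from leaving the browser polling forever.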

Finally, I'm done with this one. There are some pros and cons, and I have mostly covered them in the points above. I would love to have your feedback. Oh, no comment section up here.