Streaming CSV Downloads in Rails: A Practical Guide

Jan 27, 2025 - 19:25

Streaming CSV exports directly from your Rails application lets you serve large datasets without building the whole file in memory first. Rows are written to the client as they are generated, so memory use stays roughly flat and the download starts right away instead of after the entire file has been assembled. In this article, we'll walk through how to implement streaming CSV downloads in a Rails application using a generic Post model as an example.

Model

Let’s start with the Post model. This model will be responsible for generating the CSV data. Here's the code:

require "csv" # On Ruby 3.4+, csv is a bundled gem: also add gem "csv" to the Gemfile

class Post < ApplicationRecord
  # Generates a CSV string in memory (for smaller datasets)
  def self.to_csv
    CSV.generate(headers: true) do |csv|
      csv << csv_headers
      find_each(batch_size: 1000) { |post| csv << csv_row(post) }
    end
  end

  # Streams CSV data directly to an output stream (e.g., HTTP response)
  def self.stream_csv_to(output_stream)
    output_stream.write CSV.generate_line(csv_headers)
    find_each(batch_size: 1000) do |post|
      output_stream.write CSV.generate_line(csv_row(post))
    end
  end

  # Defines the CSV headers
  def self.csv_headers
    csv_attributes.map { |attr| human_attribute_name(attr) }
  end

  # Generates a CSV row for a given post (public_send only calls public methods)
  def self.csv_row(post)
    csv_attributes.map { |attr| post.public_send(attr) }
  end

  # Specifies the attributes to include in the CSV
  def self.csv_attributes
    %w[id title content author_name created_at updated_at]
  end
end

Explanation:

  • to_csv: This method generates the whole CSV as a single string in memory before anything is sent. It’s convenient for smaller datasets but can exhaust memory on large ones.
  • stream_csv_to: This method writes the CSV data directly to an output stream (e.g., an HTTP response). It writes the header line first, then loads records in batches of 1,000 with find_each and writes one line per record, so only one batch is held in memory at a time. Note that find_each iterates in primary-key order, so any custom order on the relation is ignored.
  • csv_headers: This method returns the header row, using human_attribute_name to turn each attribute into a human-readable (and I18n-aware) label.
  • csv_row: This method generates a CSV row for a given post by mapping the csv_attributes to the corresponding values in the post.
  • csv_attributes: This method defines the attributes that should be included in the CSV. In this case, we’re including id, title, content, author_name, created_at, and updated_at.
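The header-then-rows pattern used by stream_csv_to can be tried outside Rails. The sketch below is a minimal plain-Ruby stand-in, assuming records are simple Structs rather than ActiveRecord models; PostRecord, CSV_ATTRIBUTES, and stream_posts_to are hypothetical names. It writes to a StringIO, but any object that responds to write (such as response.stream or a File) would be used the same way. It also shows that CSV.generate_line takes care of quoting commas, double quotes, and newlines inside field values.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for the Post model: a plain Struct with similar fields.
PostRecord = Struct.new(:id, :title, :content, :author_name, keyword_init: true)

# Hypothetical attribute list, mirroring Post.csv_attributes minus the timestamps.
CSV_ATTRIBUTES = %i[id title content author_name].freeze

# Writes one header line, then one line per record, to any object with #write.
# The function itself only builds the current line; whether written data is
# kept around depends on the output object (a StringIO keeps it, a socket does not).
def stream_posts_to(output, records)
  # Rough imitation of human_attribute_name: "author_name" -> "Author name"
  headers = CSV_ATTRIBUTES.map { |attr| attr.to_s.tr("_", " ").capitalize }
  output.write CSV.generate_line(headers)
  records.each do |record|
    output.write CSV.generate_line(CSV_ATTRIBUTES.map { |attr| record.public_send(attr) })
  end
end

posts = [
  PostRecord.new(id: 1, title: "Hello, world", content: "First post", author_name: "Ada"),
  PostRecord.new(id: 2, title: 'Say "hi"', content: "Line one\nLine two", author_name: "Grace")
]

io = StringIO.new
stream_posts_to(io, posts)
puts io.string
```

The second record is written as 2,"Say ""hi""","Line one\nLine two",Grace (with a real line break inside the quoted field), which is why building lines with CSV.generate_line is safer than joining values with commas by hand.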

Controller

Next, let’s look at the controller, which handles the HTTP request and response:

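class PostsController < ApplicationController
  # Required for real streaming: without it, writes to response.stream are
  # buffered and only sent after the action has finished.
  include ActionController::Live

  def index
    @posts = Post.all

    respond_to do |format|
      format.html do
        # Render the HTML view (e.g., a list of posts)
        render :index
      end
      format.csv do
        set_csv_headers
        stream_csv
      end
    end
  end

  private

  # Sets the necessary HTTP headers for the CSV response (before the first write)
  def set_csv_headers
    response.headers["Content-Type"] = "text/csv"
    response.headers["Content-Disposition"] = %(attachment; filename="posts-#{Date.current}.csv")
    response.headers["Cache-Control"] = "no-cache"
    # Also keeps Rack::ETag from buffering the whole body to compute an ETag
    response.headers["Last-Modified"] = Time.now.httpdate
  end

  # Streams the CSV data for @posts to the response
  def stream_csv
    @posts.stream_csv_to(response.stream)
  ensure
    response.stream.close
  end
end

  • include ActionController::Live: Makes each write to response.stream go out to the client as it happens. Rails runs every action of such a controller in a separate thread, so you may prefer to move the export into a controller of its own.
  • index: This action handles both HTML and CSV formats. For HTML, it renders a standard view (e.g., a list of posts). For CSV, it sets the appropriate headers and streams the CSV data.
  • set_csv_headers: This method sets the HTTP headers for the CSV response. They must be set before the first write, because the headers are sent together with the first chunk:
      • Content-Type: Marks the response as a CSV file.
      • Content-Disposition: Tells the browser to download the file under the given (quoted) filename.
      • Cache-Control: Asks browsers and proxies not to serve a cached copy.
      • Last-Modified: Sets the last-modified date to the current time. The presence of this header also stops Rack::ETag from buffering the entire body to compute an ETag, which would defeat streaming.
  • stream_csv: This method calls the model's stream_csv_to through the @posts relation, passing response.stream as the output stream, so any scope applied to @posts is respected. The ensure block closes the stream even if an error occurs partway through the export.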
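Writing to response.stream from an ActionController::Live controller is one way to stream. Another approach some apps use is to assign an object that responds to each, such as an Enumerator, to self.response_body, so the web server pulls CSV lines one at a time. The plain-Ruby sketch below shows only the lazy enumerator part of that approach; csv_line_enumerator and the endless rows source are hypothetical stand-ins (rows plays the role of find_each), and the Rails wiring is described after the code rather than run here.

```ruby
require "csv"

# Hypothetical helper: returns an Enumerator that yields one CSV line at a time.
# A line is generated only when the consumer (e.g. the web server iterating
# the response body) asks for the next one.
def csv_line_enumerator(headers, rows)
  Enumerator.new do |yielder|
    yielder << CSV.generate_line(headers)
    rows.each { |row| yielder << CSV.generate_line(row) }
  end
end

# A lazy, endless row source standing in for find_each; it counts how many
# rows have actually been produced.
produced = 0
rows = Enumerator.new do |y|
  loop do
    produced += 1
    y << [produced, "Post #{produced}"]
  end
end

body = csv_line_enumerator(%w[id title], rows)

# Taking the first three lines (header + 2 rows) only generates two data rows.
first_lines = body.take(3)
puts first_lines.join
puts "rows produced: #{produced}"
```

In a controller, this would look roughly like setting the same headers (Last-Modified included) and then assigning self.response_body = csv_line_enumerator(Post.csv_headers, Post.find_each.lazy.map { |post| Post.csv_row(post) }). This variant does not need ActionController::Live, although the database work still happens while the server is writing the response.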

Nginx Configuration

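If Nginx sits in front of Puma, it must pass the response on as Rails writes it rather than buffering it. Here’s an example configuration:

location / {
    proxy_pass http://puma;
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Proto https;
    proxy_redirect off;
}

  • proxy_pass: Forwards requests to the Puma application server (puma here is an upstream or hostname defined elsewhere in the configuration).
  • proxy_http_version 1.1: Uses HTTP/1.1 for the upstream connection instead of Nginx’s default HTTP/1.0, which allows Puma to send the body with chunked transfer encoding.
  • proxy_buffering off: Relays data to the client as soon as it arrives from Puma instead of collecting it in Nginx’s buffers first. To disable buffering only for the export, you can leave this directive out and send an X-Accel-Buffering: no response header from the controller instead.
  • proxy_set_header: Passes the original host, client IP address, and scheme through to Rails. The Upgrade and Connection headers are only needed for WebSocket traffic (e.g., Action Cable) and play no part in CSV streaming.
  • proxy_redirect off: Leaves Location headers from Puma unchanged instead of rewriting them; it does not affect streaming.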
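To check that nothing between the client and Puma is holding the response back, you can record when each piece of the body arrives. The sketch below uses Ruby's standard Net::HTTP streaming interface (read_body with a block); chunk_arrival_times is a hypothetical helper and any URL you pass it is your own. If a proxy is buffering, the first bytes typically arrive later and in fewer, larger pieces than Rails writes them.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: downloads url and returns [seconds_since_start, bytes]
# for every body piece as Net::HTTP reads it from the socket.
def chunk_arrival_times(url)
  uri = URI(url)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  arrivals = []

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      # With a block, read_body yields the body incrementally instead of
      # reading it all into one string.
      response.read_body do |chunk|
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        arrivals << [elapsed.round(3), chunk.bytesize]
      end
    end
  end

  arrivals
end
```

For example, chunk_arrival_times("https://example.com/posts.csv").each { |seconds, bytes| puts "#{seconds}s: #{bytes} bytes" } prints one line per piece received; the example.com URL is a placeholder for your own export endpoint.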

How It Works

  1. When a user requests the CSV export (e.g., by visiting /posts.csv), the index action in the PostsController is triggered.
  2. The set_csv_headers method sets the response headers before any data is written.
  3. The stream_csv method calls stream_csv_to with response.stream, which writes the header row and then one line per post, each reaching the client as it is written.
  4. Nginx forwards the request to Puma and relays the response chunks back to the client; with proxy buffering disabled, it passes each chunk on as soon as it arrives.

Benefits of Streaming CSV Downloads

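  • Memory Efficiency: Only one batch of records and the current CSV line are held in memory at a time, so memory use stays roughly flat no matter how many rows are exported.
  • Faster Response Times: The download starts as soon as the header row is written, instead of after the whole file has been generated.
  • Scalability: Large exports no longer risk exhausting server memory. Keep in mind that each download still occupies a server thread and a database connection until it finishes.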

Conclusion

Streaming CSV downloads in Rails is a powerful technique for handling large datasets efficiently. Using a Post model as an example, we’ve shown how to implement it from the model through the controller to the Nginx configuration. The same approach can be adapted to other models and data export scenarios.