As a Django developer, there are many times that you are asked for CSV export of your database tables.
To do this, Python’s CSV module is a simple library to use. But when your dataset is large, it becomes so inefficient that often leads to timeout error. There are two problem with this method:
- Python is not very fast.
- Generating model objects by Django Orm is very resource consuming.
As an alternative approach, I suggest using Postgres CSV functions. This way, not only we don’t need to create thousands of model objects, but also we rapidly generate CSV thanks to Postgres speed.
But do I need to write complex SQL queries?
What should I do if I currently have a filtered queryset, for example, in Django admin?
Don’t worry. There is no need to write SQL. Here, we write a function that accepts a queryset and a filename and returns a CSV response
from django.db import connection from django.utils import timezone from django.http import HttpResponse def qs_to_csv_response(qs, filename): sql, params = qs.query.sql_with_params() sql = f"COPY ({sql}) TO STDOUT WITH (FORMAT CSV, HEADER, DELIMITER E'\t')" filename = f'{filename}-{timezone.now():%Y-%m-%d_%H-%M-%S}.csv' response = HttpResponse(content_type='text/csv') response['Content-Disposition'] = f'attachment; filename={filename}' with connection.cursor() as cur: sql = cur.mogrify(sql, params) cur.copy_expert(sql, response) return response
Suppose we have these models:
from django.db import models from django.utils.translation import ugettext_lazy as _ class City(models.Model): name = models.CharField(max_length=50) class Place(models.Model): PLACE_TYPE_CHOICES = ( (1, _('Park')), (2, _('Cafe')), (3, _('Resturant')), (4, _('Cinema')), ) name = models.CharField(max_length=50) city = models.ForeignKey(City, on_delete=models.CASCADE) place_type = models.PositiveSmallIntegerField( choices=PLACE_TYPE_CHOICES )
If we want to have a CSV export from Place table containing place_id, place_name, place_type and city_name, we pass this query to the function above:
from django.db.models import F Place.objects.values( 'id', 'name', 'place_type', city_name=F('city__name') )
There is some caveat here. Postgres doesn’t know anything about our pretty, human readable and translated place types and putting some numbers in place_type columns can be completely useless.
The function below can be used to tell Postgres about this kind of mappings using Case and When:
from django.db.models import Case, When, Value, CharField def map_choices(field_name, choices): return Case( *[When(**{field_name: value, 'then': Value(str(representation))}) for value, representation in choices], output_field=CharField() )
And then, we use it in the query:
Place.objects.values( 'id', 'name', verbose_type=map_choices('place_type', Place.PLACE_TYPE_CHOICES), city_name=F('city__name') )
By using Django ORM tools like F, Func, ExpressionWrapper, RawSQL … you can easily write more complex queries in a performant way.
Top comments (1)
Neat solution! This made my day!