When it comes to optimizing ORM usage, your biggest weapon is understanding how the ORM works, at least at a high level. That understanding makes it easier to make sense of the rules and guidelines for writing fast applications. I therefore strongly recommend reading the Django documentation on this topic at least once. My purpose in writing this article is to condense those tips and tricks into an easy-to-reference compilation and add some of my own.
Caveats
Before blindly following any of the guidance below, keep the following in mind.
- Only use optimizations that obfuscate or complicate the code if it is absolutely necessary. Prioritize readability and maintainability where possible.
- Not all of these tips are hard-and-fast rules. Use your judgement to determine what improvements make sense for your code.
Django ORM Optimization Tips
1. Profile
2. Be aware of QuerySet’s lazy evaluation.
Perhaps the most important aspect of the Django ORM to understand is how QuerySets work. Since QuerySets are lazily evaluated, you can chain filter() and exclude() all day without actually hitting the database. Keep this in mind so that you evaluate QuerySets only when you actually need to.
When QuerySets are evaluated:
- # Iteration
- for person in Person.objects.all():
-     ...  # Some logic
- # Indexing (a plain slice stays lazy unless it uses a step)
- Person.objects.all()[0]
- # Pickling (i.e. serialization)
- pickle.dumps(Person.objects.all())
- # Evaluation functions
- repr(Person.objects.all())
- len(Person.objects.all())
- list(Person.objects.all())
- bool(Person.objects.all())
- # Other
- [person for person in Person.objects.all()] # List comprehensions
- person in Person.objects.all() # `in` checks
When QuerySets are not cached:
- # Not reusing evaluated QuerySets
- print([p.name for p in Person.objects.all()]) # QuerySet evaluated and cached
- print([p.name for p in Person.objects.all()]) # New QuerySet is evaluated and cached
- # Slicing/indexing unevaluated QuerySets
- queryset = Person.objects.all()
- print(queryset[0]) # Queries the database
- print(queryset[0]) # Queries the database again
- # Printing
- print(Person.objects.all())
When QuerySets are cached:
- # Reusing an evaluated QuerySet
- queryset = Person.objects.all()
- print([p.name for p in queryset]) # QuerySet evaluated and cached
- print([p.name for p in queryset]) # Cached results are used
- # Slicing/indexing evaluated QuerySets
- queryset = Person.objects.all()
- list(queryset) # Queryset evaluated and cached
- print(queryset[0]) # Cache used
- print(queryset[0]) # Cache used
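The evaluation and caching rules above can be mimicked with a toy class. This is a simplified sketch in plain Python, not Django's implementation: `LazyQuery`, `fetch_count`, and the in-memory `ROWS` list are hypothetical names invented for illustration.

```python
# Toy model of a lazily evaluated, cached query (NOT Django's real code).
ROWS = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 12}]  # stands in for a table


class LazyQuery:
    fetch_count = 0  # how many times the pretend database was hit

    def __init__(self, predicates=()):
        self._predicates = tuple(predicates)
        self._result_cache = None  # filled on first evaluation

    def filter(self, predicate):
        # Chaining builds a new, unevaluated query; nothing is fetched here.
        return LazyQuery(self._predicates + (predicate,))

    def _fetch_all(self):
        if self._result_cache is None:
            LazyQuery.fetch_count += 1
            self._result_cache = [
                row for row in ROWS if all(p(row) for p in self._predicates)
            ]
        return self._result_cache

    def __iter__(self):
        # Iteration forces evaluation, then reuses the cache.
        return iter(self._fetch_all())

    def __len__(self):
        return len(self._fetch_all())


adults = LazyQuery().filter(lambda r: r["age"] >= 18).filter(lambda r: r["name"] != "")
hits_after_chaining = LazyQuery.fetch_count  # chaining alone fetched nothing

names_first = [r["name"] for r in adults]  # evaluates and caches
names_again = [r["name"] for r in adults]  # served from the cache
hits_after_reuse = LazyQuery.fetch_count

fresh = LazyQuery().filter(lambda r: r["age"] >= 18)
len(fresh)  # a new query object starts with its own empty cache
hits_after_fresh = LazyQuery.fetch_count
```

The point to take away is that the cache belongs to the query object: reusing the same variable reuses the results, while building the query again starts from scratch.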
3. Be aware of which model attributes are not cached.
When Django evaluates a QuerySet, foreign-key relationships and reverse relationships are not included in the query, and thus not included in the cache, unless specified otherwise.
- ## Not initially retrieved/cached
- # Foreign-key related objects
- person = Person.objects.get(id=1)
- person.father # foreign object is retrieved and cached
- person.father # cached version is used
- ## Never cached
- # Callable attributes
- person = Person.objects.get(id=1)
- person.children.all() # Database hit
- person.children.all() # Another database hit
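4. Use select_related() and prefetch_related() when you will need foreign-key/reverse related objects.
These tools tell Django that you will actually need these objects, so it queries and caches them up front. select_related() follows foreign-key relationships with a SQL join in the same query, while prefetch_related() runs one additional query per relationship and matches the results up in Python, which also covers reverse and many-to-many relationships. The common pitfall is not using them when they are needed, which results in a lot of unnecessary database queries.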
- # DON'T
- queryset = Person.objects.all()
- for person in queryset:
-     person.father  # Foreign key relationship results in a database hit each iteration
- # DO
- queryset = Person.objects.all().select_related('father')  # Foreign key object is included in query and cached
- for person in queryset:
-     person.father  # Hits the cache instead of the database
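To make the query counts concrete, here is a rough sketch of the three access patterns at the SQL level, using the standard library's sqlite3 module instead of Django. The `person` table, the sample rows, and the `run` helper are assumptions made up for this illustration, and the SQL is only an approximation of what the ORM would send.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        father_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cid', 1);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it, so round trips are visible."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the people, then one more per father lookup.
query_count = 0
pairs_n_plus_one = []
for pid, name, father_id in run("SELECT id, name, father_id FROM person ORDER BY id"):
    father = run("SELECT name FROM person WHERE id = ?", (father_id,)) if father_id else []
    pairs_n_plus_one.append((name, father[0][0] if father else None))
n_plus_one_queries = query_count

# JOIN pattern (the approach select_related() takes): everything in one query.
query_count = 0
pairs_join = run("""
    SELECT p.name, f.name FROM person AS p
    LEFT JOIN person AS f ON f.id = p.father_id
    ORDER BY p.id
""")
join_queries = query_count

# Two-query pattern (the approach prefetch_related() takes): fetch the parents,
# fetch all children with a single IN (...) query, then match them up in Python.
query_count = 0
parents = run("SELECT id, name FROM person ORDER BY id")
placeholders = ", ".join("?" for _ in parents)
children = run(
    f"SELECT name, father_id FROM person WHERE father_id IN ({placeholders}) ORDER BY id",
    [pid for pid, _ in parents],
)
children_by_father = {pid: [] for pid, _ in parents}
for child_name, father_id in children:
    children_by_father[father_id].append(child_name)
prefetch_queries = query_count
```

With only three rows the difference is small, but the first pattern grows with the number of rows while the other two stay at one and two queries. In ORM terms, the third pattern corresponds to something like `Person.objects.prefetch_related('children')`.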
5. Try to avoid database queries in a loop.
This is something you will most likely run into, as trying to write clean code can often lead to this pitfall. Calling get() or evaluating a QuerySet in a loop can be very bad for performance. Instead, do as much of the database work as you can before entering the loop.
Here is a contrived example:
- # DON'T (contrived example)
- filtered = Person.objects.filter(first_name='Shallan', last_name='Davar')
- for age in range(18):
-     person = filtered.get(age=age)  # Database query on each iteration
- # DO (contrived example)
- filtered = Person.objects.filter(  # Narrow down the QuerySet to only what you need
-     first_name='Shallan',
-     last_name='Davar',
-     age__gte=0,  # Field lookups use a double underscore
-     age__lt=18,  # Matches range(18), which stops at 17
- )
- lookup = {person.age: person for person in filtered}  # Evaluate the QuerySet once and construct lookup
- for age in range(18):
-     person = lookup[age]  # No database query
6. Use iterator() to iterate through a very large QuerySet only once.
If you know your QuerySet could be very large, and you only need to iterate over it once, it makes sense to skip the cache to save memory and other overhead. iterator() provides just this ability.
- # Save memory by not caching anything
- for person in Person.objects.iterator():
-     ...  # Some logic
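The memory trade-off can be sketched without Django. The example below uses the standard library's sqlite3 module and a hypothetical helper, `stream_rows`, to read a result set in small chunks rather than holding every row in one list. It only illustrates the idea; Django's iterator() has its own implementation, and the table and chunk size here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO person (age) VALUES (?)", [(age,) for age in range(10)])


def stream_rows(connection, sql, chunk_size=4):
    """Yield rows one at a time, fetching at most chunk_size rows per round."""
    cursor = connection.execute(sql)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


# Cached style: the whole result set lives in memory and can be reused.
all_rows = conn.execute("SELECT age FROM person").fetchall()
total_cached = sum(age for (age,) in all_rows)

# Streaming style: rows are consumed once and never stored together.
rows = stream_rows(conn, "SELECT age FROM person")
total_streamed = sum(age for (age,) in rows)
leftover = list(rows)  # the generator is exhausted, so a second pass sees nothing
```

The empty second pass is the cost of streaming: if you need the rows again, you have to run the query again, which is why this only pays off for a single pass over a large result set.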
7. Do work in the database rather than in Python.
Your database can do almost anything data-related much faster than Python can. If at all possible, do your work in the database. Django provides many tools to make this possible.
Use filter() and exclude() for filtering:
- # DON'T
- for person in Person.objects.all():
-     if person.age >= 18:
-         ...  # Do something
- # DO
- for person in Person.objects.filter(age__gte=18):
-     ...  # Do something
Use F expressions:
- from django.db.models import F
- # DON'T
- for person in Person.objects.all():
-     person.age += 1
-     person.save()
- # DO
- Person.objects.update(age=F('age') + 1)  # One UPDATE statement; the database does the arithmetic
Do aggregation in the database:
- from django.db.models import Max
- # DON'T
- max_age = 0
- for person in Person.objects.all():
-     if person.age > max_age:
-         max_age = person.age
- # DO
- max_age = Person.objects.all().aggregate(Max('age'))['age__max']  # None if the table is empty
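For a sense of what "doing the work in the database" looks like underneath, here is a minimal sketch with the standard library's sqlite3 module. The `person` table and its rows are made up for the example, and the SQL is an approximation of what filter(), an F-expression update, and aggregate() translate to, not the exact statements Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Ann", 41), ("Bob", 17), ("Cid", 65)],
)

# filter(age__gte=18) style: only matching rows are sent back to Python.
adult_names = [
    name
    for (name,) in conn.execute("SELECT name FROM person WHERE age >= 18 ORDER BY id")
]

# update(age=F('age') + 1) style: one statement, arithmetic done by the database.
rows_updated = conn.execute("UPDATE person SET age = age + 1").rowcount

# aggregate(Max('age')) style: the database returns a single value.
(max_age,) = conn.execute("SELECT MAX(age) FROM person").fetchone()
```

In each case a single statement replaces a Python loop over every row, so the amount of data transferred no longer grows with the size of the table.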
8. Use values() and values_list() to get only the things you need.
values() returns dictionaries and values_list() returns tuples (or bare values with flat=True) containing only the fields you specify.
Use values():
- # DON'T
- age_lookup = {
-     person.name: person.age
-     for person in Person.objects.all()
- }
- # DO
- age_lookup = {
-     person['name']: person['age']
-     for person in Person.objects.values('name', 'age')
- }
Use values_list():
- # DON'T
- person_ids = [person.id for person in Person.objects.all()]
- # DO
- person_ids = list(Person.objects.values_list('id', flat=True))  # list() forces evaluation; without it you get a lazy QuerySet
9. Use defer() and only() when you only need certain fields.
Caveats:
- Use these in favor of values() when you need model instances instead of dicts.
- May only make a difference if the fields you are excluding require a lot of processing to be converted to a Python object.
- Accessing a deferred field later triggers an extra query for that instance.
Use defer():
- queryset = Person.objects.defer('age')  # Imagine age is computationally expensive
- for person in queryset:
-     print(person.id)
-     print(person.name)
Use only():
- queryset = Person.objects.only('name')
- for person in queryset:
-     print(person.name)
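10. Use count() and exists() when you don’t need the contents of the QuerySet.
Caveats:
- Only use these when you don’t need to evaluate the QuerySet for other reasons. If the QuerySet will be evaluated anyway, len() or a truth test on the cached results avoids an extra query.
Use count():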
- # DON'T
- count = len(Person.objects.all()) # Evaluates the entire queryset
- # DO
- count = Person.objects.count() # Executes more efficient SQL to determine count
Use exists():
- # DON'T
- exists = len(Person.objects.all()) > 0
- # DO
- exists = Person.objects.exists()
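The reason these methods are cheaper is visible at the SQL level. The sketch below uses the standard library's sqlite3 module with a made-up `person` table; the statements show roughly the shape of query that count() and exists() rely on, and the exact SQL Django emits may differ by backend and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Ann", 41), ("Bob", 17), ("Cid", 65)],
)

# len(queryset) style: pull every row into Python, then count the list.
count_in_python = len(conn.execute("SELECT id, name, age FROM person").fetchall())

# count() style: the database returns a single number.
(count_in_db,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()

# exists() style: ask for at most one row and check whether anything came back.
has_people = conn.execute("SELECT 1 FROM person LIMIT 1").fetchone() is not None
has_centenarians = (
    conn.execute("SELECT 1 FROM person WHERE age >= 100 LIMIT 1").fetchone() is not None
)
```

Both approaches give the same count, but the second and third transfer one small row no matter how large the table is.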
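11. Use delete() and update() when possible.
Instead of deleting or updating model instances one at a time, delete() and update() let you do this in bulk. Keep in mind that these QuerySet methods bypass any custom save() or delete() methods defined on the model.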
Use delete():
- # DON'T
- for person in Person.objects.all():
-     person.delete()
- # DO
- Person.objects.all().delete()
Use update():
- # DON'T
- for person in Person.objects.all():
-     person.age = 0
-     person.save()
- # DO
- Person.objects.update(age=0)
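12. Use bulk_create() when possible.
Caveats:
- This works a bit differently than calling create(). For example, the model's save() method is not called and the pre_save and post_save signals are not sent.
- Read more about it in the Django docs.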
- names = ['Jeff', 'Beth', 'Tim']
- creates = []
- for name in names:
-     creates.append(
-         Person(name=name, age=0)
-     )
- Person.objects.bulk_create(creates)
Similarly, bulk-add to many-to-many fields:
- person = Person.objects.get(id=1)
- person.jobs.add(job1, job2, job3)
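The saving from bulk_create() comes from sending fewer statements. As a rough illustration with the standard library's sqlite3 module (the `person` table and the `run` helper are assumptions for this sketch, not Django internals), compare one INSERT per row with a single multi-row INSERT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
names = ["Jeff", "Beth", "Tim"]

statements = 0


def run(sql, params=()):
    """Execute one statement and count it."""
    global statements
    statements += 1
    return conn.execute(sql, params)


# create() in a loop: one INSERT statement per object.
for name in names:
    run("INSERT INTO person (name, age) VALUES (?, ?)", (name, 0))
per_row_statements = statements

run("DELETE FROM person")  # reset the table for the second approach

# bulk_create() style: one INSERT carrying every row.
statements = 0
placeholders = ", ".join("(?, ?)" for _ in names)
params = [value for name in names for value in (name, 0)]
run(f"INSERT INTO person (name, age) VALUES {placeholders}", params)
bulk_statements = statements

rows = conn.execute("SELECT name, age FROM person ORDER BY id").fetchall()
```

The resulting rows are the same either way; only the number of round trips changes. For very large lists, bulk_create() also accepts a batch_size argument, described in the Django docs, to split the work into several statements.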
13. Use foreign key values directly.
When Django loads a model instance, the raw foreign key value is already stored on it as an attribute with an _id suffix, so read that attribute instead of loading the related object just to get its id.
- # DON'T
- father_id = Person.objects.get(id=1).father.id # Causes a needless database query
- # DO
- father_id = Person.objects.get(id=1).father_id # The foreign key is already cached. No query
Closing Remarks
These 13 tips should help you resolve most of the common bottlenecks with the Django ORM.