Showing posts with label performance. Show all posts

In my case, the application is an API built on Django REST Framework that uses a handful of custom permissions and some other bells and whistles. It had a bug that only happened in production, so it was difficult to trace what the heck was going on there, because production gives you no debug insight.

Lots of people use django-debug-toolbar for development.
Of course, never enabling debug in production is the rule, but best practices are one thing and reality is, "sometimes", another.

Long story short:
You can enable Django debug toolbar in production on demand just using one of its features and a simple browser extension. This is how.

This is the debug toolbar config in my Django settings:

DEBUG_TOOLBAR_CONFIG = {
    # Toolbar options
    "SHOW_TOOLBAR_CALLBACK": "mymodule.utils.show_debug_toolbar",
    "SHOW_COLLAPSED": True,
    "SQL_WARNING_THRESHOLD": 70,  # milliseconds
}

Then in your file mymodule/utils.py:

from django.conf import settings


def show_debug_toolbar(request):
    # Show the toolbar when the secret header matches; otherwise fall back to DEBUG
    if "HTTP_MYAPPKEY" in request.META:
        return request.META["HTTP_MYAPPKEY"] == settings.SECRET_KEY
    return settings.DEBUG
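If you want to be a bit more careful when comparing the secret, a constant-time comparison is one option. This is only a sketch: to keep it runnable without Django, the secret and the debug flag are passed in as arguments (the real callback receives only `request`), and the request object is faked with `SimpleNamespace`.

```python
import hmac
from types import SimpleNamespace


def show_debug_toolbar(request, secret_key: str, debug: bool) -> bool:
    """Decide whether to show the toolbar for this request.

    secret_key and debug are explicit parameters only for this sketch;
    in a real project they would come from the Django settings.
    """
    supplied = request.META.get("HTTP_MYAPPKEY")
    if supplied is not None:
        # compare_digest takes the same time whether or not the values match early
        return hmac.compare_digest(supplied.encode(), secret_key.encode())
    return debug


# Fake request objects standing in for Django's HttpRequest
with_header = SimpleNamespace(META={"HTTP_MYAPPKEY": "s3cret"})
wrong_header = SimpleNamespace(META={"HTTP_MYAPPKEY": "nope"})
no_header = SimpleNamespace(META={})

print(show_debug_toolbar(with_header, secret_key="s3cret", debug=False))
print(show_debug_toolbar(wrong_header, secret_key="s3cret", debug=True))
print(show_debug_toolbar(no_header, secret_key="s3cret", debug=False))
```

Note that a wrong header value hides the toolbar even when debug is on, which matches the behaviour of the original function.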

And finally, find a browser extension that can inject an HTTP header into your browser's requests. I am using "Modify Header Value for Chrome", but it's up to you; there are several of them out there.

To see it in action, you only need to open the extension and set up your application URL, the header name, and the value of your settings.SECRET_KEY:


Take care with one subtle detail:
In the show_debug_toolbar function the header is named HTTP_MYAPPKEY, while in the browser extension it is set as MYAPPKEY.
This is because of how Django handles the headers received in the request: it uppercases the name, replaces hyphens with underscores, and adds a leading "HTTP_".
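The renaming rule can be sketched in a few lines of plain Python. The helper name `header_to_meta_key` is made up for this illustration; it is not a Django function, it just mimics the documented conversion.

```python
def header_to_meta_key(header_name: str) -> str:
    """Return the request.META key Django uses for an HTTP header name."""
    key = header_name.upper().replace("-", "_")
    # Content-Length and Content-Type are the documented exceptions:
    # they are stored without the HTTP_ prefix.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key


print(header_to_meta_key("MYAPPKEY"))         # HTTP_MYAPPKEY
print(header_to_meta_key("X-Forwarded-For"))  # HTTP_X_FORWARDED_FOR
```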

Now try it yourself, and in under 5 minutes you will be the happy owner of an application you can debug in production :-)

Important note:
This should only be used over HTTPS, for obvious reasons: you don't want anyone sniffing the wire, or transparently proxying your connection, to see your APPKEY in clear text. You can of course use a different header name and secret value than the ones shown here.

Hope this is useful for at least one person because it took more time writing this post than fixing the issue :-P

 


When it comes to optimizing an ORM, your biggest weapon is understanding how the ORM works (at least at a high level). That makes it easier to understand the rules and guidelines for building fast applications. I therefore strongly recommend that you read the Django documentation on this topic at least once. My purpose in writing this article is to condense those tips and tricks into an easy-to-reference compilation and to add some of my own.

Caveats

Before blindly following any of the below guidance, keep the following in mind.

  • Only use optimizations that obfuscate or complicate the code if it is absolutely necessary. Prioritize readability and maintainability where possible.
  • Not all of these tips are hard-and-fast rules. Use your judgement to determine what improvements make sense for your code.

Django ORM Optimization Tips

1. Profile

2. Be aware of QuerySet’s lazy evaluation.

Perhaps the most important aspect of the Django ORM to understand is how QuerySets work. Since QuerySets are lazily-evaluated, you can chain filter() and exclude() all day without actually hitting the database. Look out for this in order to evaluate QuerySets only when you actually need to.

When QuerySets are evaluated:

# Iteration
for person in Person.objects.all():
    pass  # Some logic

# Indexing (a plain slice stays lazy unless it uses a step)
Person.objects.all()[0]

# Pickling (i.e. serialization)
pickle.dumps(Person.objects.all())

# Evaluation functions
repr(Person.objects.all())
len(Person.objects.all())
list(Person.objects.all())
bool(Person.objects.all())

# Other
[person for person in Person.objects.all()]  # List comprehensions
person in Person.objects.all()  # `in` checks

When QuerySets are not cached:

# Not reusing evaluated QuerySets
print([p.name for p in Person.objects.all()])  # QuerySet evaluated and cached
print([p.name for p in Person.objects.all()])  # New QuerySet is evaluated and cached

# Slicing/indexing unevaluated QuerySets
queryset = Person.objects.all()
print(queryset[0])  # Queries the database
print(queryset[0])  # Queries the database again

# Printing
print(Person.objects.all())

When QuerySets are cached:

# Reusing an evaluated QuerySet
queryset = Person.objects.all()
print([p.name for p in queryset])  # QuerySet evaluated and cached
print([p.name for p in queryset])  # Cached results are used

# Slicing/indexing evaluated QuerySets
queryset = Person.objects.all()
list(queryset)  # QuerySet evaluated and cached
print(queryset[0])  # Cache used
print(queryset[0])  # Cache used

3. Be aware of which model attributes are not cached.

When Django evaluates a QuerySet, foreign-key relationships and reverse relationships are not included in the query, and thus not included in the cache, unless specified otherwise.

## Not initially retrieved/cached
# Foreign-key related objects
person = Person.objects.get(id=1)
person.father  # Foreign object is retrieved and cached
person.father  # Cached version is used

## Never cached
# Callable attributes
person = Person.objects.get(id=1)
person.children.all()  # Database hit
person.children.all()  # Another database hit

4. Use select_related() and prefetch_related() when you will need foreign-key/reverse related objects.

These tools tell Django that you actually will need these objects, so that it will go ahead and query and cache them for you. The common pitfall here is to not use these when they are needed. This results in a lot of unnecessary database queries.

# DON'T
queryset = Person.objects.all()
for person in queryset:
    person.father  # Foreign key relationship results in a database hit each iteration

# DO
queryset = Person.objects.all().select_related('father')  # Foreign key object is included in query and cached
for person in queryset:
    person.father  # Hits the cache instead of the database
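To see why this matters at the SQL level, the sketch below uses the standard library's sqlite3 module instead of the Django ORM. The table layout and the `run` helper are assumptions made for the illustration; the JOIN is only roughly what select_related('father') would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, father_id INTEGER);
    INSERT INTO person VALUES (1, 'Tim', NULL), (2, 'Jeff', 1), (3, 'Beth', 1);
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, like a crude query log."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the people, plus one more per father lookup
people = run("SELECT id, name, father_id FROM person WHERE father_id IS NOT NULL")
for _id, _name, father_id in people:
    run("SELECT name FROM person WHERE id = ?", (father_id,))
n_plus_one = query_count

# JOIN pattern: people and their fathers come back in a single query
query_count = 0
joined = run("""
    SELECT p.name, f.name
    FROM person AS p JOIN person AS f ON f.id = p.father_id
""")
single = query_count

print(n_plus_one, single)  # 3 1
```

With three rows the difference is trivial, but the first pattern grows by one query per row while the second stays at one.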

5. Try to avoid database queries in a loop.

This is something you will most likely run into, as trying to write clean code can often result in this pitfall. Using get() or evaluating a QuerySet in a loop can be very bad for performance. Instead, do what you can to do the database work before entering the loop.

Here is a contrived example:

# DON'T (contrived example)
filtered = Person.objects.filter(first_name='Shallan', last_name='Davar')
for age in range(18):
    person = filtered.get(age=age)  # Database query on each iteration

# DO (contrived example)
filtered = Person.objects.filter(  # Narrow down the QuerySet to only what you need
    first_name='Shallan',
    last_name='Davar',
    age__gte=0,
    age__lte=18,
)
lookup = {person.age: person for person in filtered}  # Evaluate the QuerySet and construct lookup
for age in range(18):
    person = lookup[age]  # No database query

6. Use iterator() to iterate through a very large QuerySet only once.

If you know your QuerySet could be very large, and you only need to iterate over it once, it makes sense to skip the cache in order to save memory and avoid other overhead. iterator() provides just this ability.

# Save memory by not caching anything
for person in Person.objects.iterator():
    pass  # Some logic

7. Do work in the database rather than in Python.

Your database can do almost anything data-related much faster than Python can. If at all possible, do your work in the database. Django provides many tools to make this possible.

Use filter() and exclude() for filtering:

# DON'T
for person in Person.objects.all():
    if person.age >= 18:
        pass  # Do something

# DO
for person in Person.objects.filter(age__gte=18):
    pass  # Do something

Use F expressions:

from django.db.models import F

# DON'T
for person in Person.objects.all():
    person.age += 1
    person.save()

# DO
Person.objects.update(age=F('age') + 1)

Do aggregation in the database:

from django.db.models import Max

# DON'T
max_age = 0
for person in Person.objects.all():
    if person.age > max_age:
        max_age = person.age

# DO
max_age = Person.objects.all().aggregate(Max('age'))['age__max']
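The three "DO" variants above each boil down to a single SQL statement. The sketch below shows roughly equivalent statements using the standard library's sqlite3 module rather than Django; the table and sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Jeff", 30), ("Beth", 17), ("Tim", 45)],
)

# Filtering: the WHERE clause returns only matching rows, like filter(age__gte=18)
adults = conn.execute(
    "SELECT name FROM person WHERE age >= 18 ORDER BY name"
).fetchall()

# F expression: one UPDATE computes the new value inside the database
conn.execute("UPDATE person SET age = age + 1")

# Aggregation: MAX runs in the database and returns a single row
(max_age,) = conn.execute("SELECT MAX(age) FROM person").fetchone()

print(adults, max_age)  # [('Jeff',), ('Tim',)] 46
```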

8. Use values() and values_list() to get only the things you need.

values() returns dictionaries and values_list() returns tuples (or single values with flat=True), and both fetch only the fields you specify instead of building full model instances.

Use values():

# DON'T
age_lookup = {
    person.name: person.age
    for person in Person.objects.all()
}

# DO
age_lookup = {
    person['name']: person['age']
    for person in Person.objects.values('name', 'age')
}

Use values_list():

# DON'T
person_ids = [person.id for person in Person.objects.all()]

# DO
person_ids = Person.objects.values_list('id', flat=True)

9. Use defer() and only() when you only need certain fields.

Caveats:

  • Use these in favor of values() when you need model instances instead of dicts.
  • May only make a difference if the fields you are excluding require a lot of processing to be converted to a Python object.

Use defer():

queryset = Person.objects.defer('age')  # Imagine age is computationally expensive
for person in queryset:
    print(person.id)
    print(person.name)

Use only():

queryset = Person.objects.only('name')
for person in queryset:
    print(person.name)

10. Use count() and exists() when you don’t need the contents of the QuerySet.

Caveats:

  • Only use these when you don’t need to evaluate the QuerySet for other reasons.

Use count():

# DON'T
count = len(Person.objects.all())  # Evaluates the entire queryset

# DO
count = Person.objects.count()  # Executes more efficient SQL to determine count

Use exists():

# DON'T
exists = len(Person.objects.all()) > 0

# DO
exists = Person.objects.exists()
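The difference is easier to see in SQL. This sketch uses the standard library's sqlite3 module in place of Django, with a made-up table; the statements are rough equivalents of what len(), count() and exists() end up doing, not the exact SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)", [("Jeff",), ("Beth",), ("Tim",)]
)

# len(queryset) equivalent: every row is transferred, then counted in Python
count_slow = len(conn.execute("SELECT id, name FROM person").fetchall())

# count() equivalent: the database returns a single number
(count_fast,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()

# exists() equivalent: the database can stop at the first matching row
has_rows = conn.execute("SELECT 1 FROM person LIMIT 1").fetchone() is not None

print(count_slow, count_fast, has_rows)  # 3 3 True
```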

11. Use delete() and update() when possible.

Instead of deleting or updating model instances one at a time, delete() and update() allow you to do this in bulk.

Use delete():

# DON'T
for person in Person.objects.all():
    person.delete()

# DO
Person.objects.all().delete()

Use update():

# DON'T
for person in Person.objects.all():
    person.age = 0
    person.save()

# DO
Person.objects.update(age=0)

12. Use bulk_create() when possible.

Caveats:

  • This works a bit differently than calling create().
  • Read more about it in the Django docs.

names = ['Jeff', 'Beth', 'Tim']
creates = []
for name in names:
    creates.append(
        Person(name=name, age=0)
    )
Person.objects.bulk_create(creates)

Similarly, bulk-add to many-to-many fields:

person = Person.objects.get(id=1)
person.jobs.add(job1, job2, job3)
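The gain from bulk operations comes from sending one statement instead of one per row. As a rough illustration with the standard library's sqlite3 module (not Django, and with a made-up table), a multi-row INSERT is similar in spirit to what bulk_create() sends to the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

names = ["Jeff", "Beth", "Tim"]

# Build one INSERT with a "(?, 0)" group per name, so all rows go in one statement
placeholders = ", ".join(["(?, 0)"] * len(names))
conn.execute(f"INSERT INTO person (name, age) VALUES {placeholders}", names)

rows = conn.execute("SELECT name, age FROM person ORDER BY id").fetchall()
print(rows)  # [('Jeff', 0), ('Beth', 0), ('Tim', 0)]
```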

13. Use foreign key values directly.

Django stores the value of a foreign key on the model instance itself as <field>_id (here, father_id), so read that attribute instead of loading the related object just to get its ID.

# DON'T
father_id = Person.objects.get(id=1).father.id  # Causes a needless database query

# DO
father_id = Person.objects.get(id=1).father_id  # The foreign key value is already loaded. No extra query

Closing Remarks

Using just these 13 tips, you can resolve most bottlenecks with the Django ORM. 
