try:
    start_transaction()
    # your view code
    commit()
except:
    rollback()

This is great, but since it is a middleware, it is not applicable to background processing; there the default autocommit-on-save behaviour is used instead. This can be inefficient when you are processing large amounts of data (using Celery, or some cron-scheduled script). Each time you commit some changes, the DBMS has to perform lazy tasks that usually require I/O operations (updating indexes, checking constraints etc.). This greatly increases the total processing time, yet if the process dies you still have some data... which tends to be useless without the missing data. So why not apply the above pattern to this problem? Django supports manual transaction management; all you have to do is use the commit_on_success or commit_manually decorators:
from django.db import transaction


@transaction.commit_on_success
def sophisticated_background_processing():
    # your code comes here :-)
    # (...)
    pass


@transaction.commit_manually
def some_other_background_processing():
    try:
        # your code
        # (...)
        transaction.commit()
    except SomeException:
        # handle the exception, then commit what did succeed
        transaction.commit()
    except:
        # unhandled exceptions: discard pending changes and re-raise
        transaction.rollback()
        raise
The commit_on_success decorator acts just like TransactionMiddleware does for a view, and in most cases it will do. If we need more flexibility, we can always try the commit_manually decorator. It lets you commit or roll back the data whenever you want. Just make sure all execution paths end with an appropriate action, or Django will raise a TransactionManagementError.
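To make that last point concrete, here is a minimal sketch of the failure mode; the Record model is a hypothetical placeholder, not from the original post. If a decorated function leaves the transaction dirty without committing or rolling back, Django complains when the function returns:

from django.db import transaction

@transaction.commit_manually
def forgetful_processing():
    # hypothetical model write; this leaves the transaction dirty
    Record.objects.create(name="example")
    # no commit() or rollback() on this path, so Django raises
    # TransactionManagementError when the function exits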
Using manual commits instead of autocommit improved my accounting script's performance by roughly 5-10x, depending on the instance size (the processing is specific, and the data model is rather horizontal).
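For this kind of bulk processing the usual trick is to batch the commits. The sketch below is an illustration of that idea under stated assumptions, not the original script: the Item model, the double_value() helper and the batch size of 1000 are all hypothetical placeholders to be replaced with your own workload.

from django.db import transaction

BATCH_SIZE = 1000  # hypothetical; tune for your workload and DBMS

def double_value(item):
    # hypothetical per-row transformation
    return item.value * 2

@transaction.commit_manually
def batched_background_processing(queryset):
    try:
        for i, item in enumerate(queryset.iterator(), start=1):
            item.value = double_value(item)
            item.save()
            if i % BATCH_SIZE == 0:
                # pay the commit cost (index updates, constraint
                # checks) once per batch instead of once per row
                transaction.commit()
        transaction.commit()  # flush the final, partial batch
    except:
        transaction.rollback()
        raise

This keeps the crash-safety property the post describes: if the process dies, everything up to the last committed batch is preserved, while the per-row commit overhead is divided by the batch size.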