My thoughts on computers, programming, ancient spell scrolls and other magic devices...

Sunday, August 31, 2014

Possible I/O Errors while starting a new process from a daemon script (Popen)

I recently had a problem with one of my document processing applications, which crashed while trying to revive malfunctioning workers with an:

OSError: [Errno 5] Input/output error

After identifying the problem I logged in to the server via SSH, restarted the service with debug logging enabled and started processing a sequence of documents that guaranteed the workers would have to be restarted. I was really surprised to see that the service managed to cope with the problem and bring up all the workers.

I verified that the only place where the service could have crashed was while starting the new worker process, before the actual fork. Tracing through the python standard library, I found that starting a new process eventually calls Popen(self).

Inspecting multiprocessing/forking.py:

if sys.platform != 'win32': 

    # some unimportant stuff
    class Popen(object):

        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed() 
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

So before os.fork is actually called, the script tries to flush the standard output and error streams. The OSError was therefore caused by our script trying to flush stderr/stdout to the /dev/tty device, which became unavailable after a period of time (the SSH session was dropped after starting the daemon). This would also explain why the restart over SSH worked: the terminal was still attached. I searched the script for any leftover print statements / logging StreamHandlers. After a long investigation it turned out that a 3rd-party library occasionally logged some errors using a StreamHandler...

So the lesson learned is: always verify that your daemon scripts don't write to stdout/stderr, or make sure the streams are redirected in the init script. Besides the fact that stderr won't provide any valuable information when you're off-line, you can easily run into similar problems.
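If the init script can't be trusted to do the redirection, the daemon can do it itself early on. Below is a minimal sketch, not the fix I actually deployed: redirect_fds is a made-up helper name, and it assumes that pointing file descriptors 1 and 2 at /dev/null (or a log file) is acceptable for your service.

```python
import os


def redirect_fds(target=os.devnull, fds=(1, 2)):
    """Point the given file descriptors (stdout/stderr by default) at target.

    Hypothetical helper: after this call a flush of sys.stdout/sys.stderr
    ends up in `target` instead of a terminal that may no longer exist.
    """
    target_fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for fd in fds:
            os.dup2(target_fd, fd)  # fd now refers to the same file as target_fd
    finally:
        # Close the extra descriptor unless it is one of the redirected ones.
        if target_fd not in fds:
            os.close(target_fd)


# Typical use, right after the process has daemonized:
#     redirect_fds()                      # discard everything
#     redirect_fds("/tmp/my_daemon.log")  # or keep it in a file (example path)
```

Calling it before any worker is spawned means the flush inside Popen.__init__ no longer depends on the terminal the service was started from.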

KR

Saturday, January 25, 2014

Python list, set and dict comprehensions 2.7+

Python has supported list comprehensions since v2.0. These expressions truly revolutionized python, making various functions much simpler and more readable. Let's see a basic LC in action:

>>> [x for x in "test"]
['t', 'e', 's', 't']

Now there is often a need to generate a set or dict in a similar way, so I frequently see code like this:

>>> set([x for x in "test"])
set(['s', 'e', 't'])
>>> dict([(x,x) for x in "test"])
{'s': 's', 'e': 'e', 't': 't'}

This is good:
- it works!
- it's more readable than implementing a for loop.

But with python 2.7+ you can do it better! Python 2.7.X and 3.X support dict and set comprehensions - now this is pythonic! You can achieve the same results the following way:

>>> {x for x in "test"}
set(['s', 'e', 't'])
>>> {x: x for x in "test"}
{'s': 's', 'e': 'e', 't': 't'}

This is excellent!
- it works!
- it's more readable than creating a dict/set from a LC
- it's faster!!!

Simple performance comparison:

>>> import timeit
>>> timeit.timeit('set([x for x in "test"])')
0.44252514839172363
>>> timeit.timeit('{x for x in "test"}')
0.37139105796813965

>>> timeit.timeit('dict([(x,x) for x in "test"])')
0.8899600505828857
>>> timeit.timeit('{x: x for x in "test"}')
0.3909759521484375
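If you want to repeat the measurement on your own machine, here is a small sketch that first checks both spellings build the same object and then times them. The compare function and the CASES table are names I made up for this example; the numbers you get will differ from mine.

```python
import timeit

# Each entry: (old style built from a list comprehension, comprehension syntax)
CASES = {
    "set": ('set([x for x in "test"])', '{x for x in "test"}'),
    "dict": ('dict([(x, x) for x in "test"])', '{x: x for x in "test"}'),
}


def compare(number=100000):
    """Return {name: (old_seconds, new_seconds)} for every case."""
    results = {}
    for name, (old, new) in CASES.items():
        # Both spellings must produce equal objects, otherwise timing is moot.
        if eval(old) != eval(new):
            raise ValueError("%s variants differ" % name)
        results[name] = (
            timeit.timeit(old, number=number),
            timeit.timeit(new, number=number),
        )
    return results


if __name__ == "__main__":
    for name, (old_time, new_time) in sorted(compare().items()):
        print("%s: old=%.4fs new=%.4fs" % (name, old_time, new_time))
```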

Cheers!
KR

Monday, January 13, 2014

Preventing python from generating *pyc files at runtime

Whenever you import a module, CPython compiles it to byte code and saves it under the same path and filename, except with a *.pyc extension (valid for python 2.X). A python script converted to byte code does not run any faster; the only advantage is that pyc files are loaded faster. Although this is generally desired, it may cause problems during development when our application imports an existing pyc file instead of our freshly modified sources (for example, a leftover pyc of a module whose source file was renamed or deleted). Such problems should not occur too often, but when they do, we usually don't have a clue what's going on (I just fixed it, and it still crashes!?).

Remove *pyc files!

Of course you can create a script that performs a search-and-destroy on all *pyc files located in all of your project's subdirectories. This is cool, but preventing python from generating *pyc files (in dev) is even better.
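For completeness, the search-and-destroy variant could look roughly like the sketch below. remove_pyc_files is just a name I picked for this example, and it assumes that deleting every *.pyc under the given directory is really what you want.

```python
import os


def remove_pyc_files(root):
    """Delete every *.pyc file below root and return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".pyc"):
                path = os.path.join(dirpath, filename)
                os.remove(path)
                removed.append(path)
    return removed


# Example: remove_pyc_files(".") cleans the current project directory.
```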

So there are basically three ways to achieve it in python 2.7:

1. During script run, use the -B parameter
python -B some_script.py

2. Project level, insert these lines at the top of your application / script (*py)
import sys
sys.dont_write_bytecode = True

3. Environment level, set the following env. variable:
export PYTHONDONTWRITEBYTECODE=1

Have fun eradicating *pyc files in your dev environments!

Cheers!
KR

Monday, December 9, 2013

A few words on boolean casting/mapping in python

So I thought I'd write a few words about truth value testing in python. It's quite common to write if some_object conditionals, and I believe that programmers are not always aware of what actually happens when such a line is evaluated; they tend to perceive it only as len(some_object) > 0 if it's a collection, or some_object is not None in other cases. We may refer to the python docs to verify it:

object.__nonzero__(self): Called to implement truth value testing and the built-in operation bool(); should return False or True, or their integer equivalents 0 or 1. When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__() nor __nonzero__(), all its instances are considered true.

So basically, when we write something like this:

bool(some_obj)

it is evaluated roughly like this:

some_obj.__nonzero__() if hasattr(some_obj, '__nonzero__') else len(some_obj) != 0 if hasattr(some_obj, '__len__') else True
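To see the lookup order in action, here is a tiny sketch with three made-up classes (Basket, Switch and Plain are purely illustrative). Python 3 renamed __nonzero__ to __bool__, so the example defines both names to run on either version.

```python
class Basket(object):
    """Defines only __len__, so truthiness follows the length."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch(object):
    """Defines both; __len__ is ignored for truth value testing."""

    def __init__(self, on):
        self.on = on

    def __len__(self):
        return 10

    def __bool__(self):  # python 3 name
        return self.on

    __nonzero__ = __bool__  # python 2 name


class Plain(object):
    """Defines neither, so every instance is considered true."""


print(bool(Basket([])), bool(Basket([1])))  # empty vs. non-empty
print(bool(Switch(False)))                  # __len__ says 10, still false
print(bool(Plain()))
```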

So such conditionals should be written with premeditation, not out of laziness, especially when using 3rd-party libraries/frameworks. For example, when using python requests:

>>> import requests
>>> res = requests.get('https://google.pl/not_exists')
>>> res
<Response [404]>
>>> bool(res)
False
>>> res = requests.get('https://google.pl/')
>>> res
<Response [200]>
>>> bool(res)
True

In the case of python-requests, HTTP errors are not raised by default; instead, if a status code of 4XX or 5XX is returned, the __nonzero__ method returns False. Meanwhile res is not None is True in both of the above cases.
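The difference between the two checks is easy to show without touching the network. FakeResponse below is a hypothetical stand-in that only mimics the truthiness described above, it is not the real requests class, and describe is a made-up helper.

```python
class FakeResponse(object):
    """Hypothetical stand-in: falsy for 4XX/5XX status codes."""

    def __init__(self, status_code):
        self.status_code = status_code

    def __bool__(self):  # python 3 name
        return self.status_code < 400

    __nonzero__ = __bool__  # python 2 name


def describe(res):
    """Spell out which condition we actually mean."""
    if res is None:
        return "no response at all"
    if not res:
        return "error response (%d)" % res.status_code
    return "successful response (%d)" % res.status_code


print(describe(None))
print(describe(FakeResponse(404)))
print(describe(FakeResponse(200)))
```

A plain if res: would have lumped the first two cases together, which is exactly the kind of surprise worth avoiding.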

Thursday, February 21, 2013

Python logging introspection

How many times have you used a print statement instead of logger.debug or logger.info? Well, I used to do it frequently. The thing is, setting up a logger in an application that already has many of its own is problematic. There is a tool, however, that may help you identify the right place for your logger (or identify the logger you want to use).

~ $ pip install logging_tree

So what does this package do? In practice it is a logging introspection tool that recreates the tree structure of your current loggers (along with their handlers and filters). This is very useful since you may immediately identify which logger you should use, or at least confirm that adding a new logger is necessary.

For example, if you type the following in a python terminal:

>>> import logging_tree
>>> logging_tree.printout()
<--""
   Level WARNING

Well, this is quite obvious: no modules are loaded, thus no custom logger was registered. On the other hand, let's look at the logging tree of a young django app:

In [1]: import logging_tree

In [2]: logging_tree.printout()

<--""
   Level WARNING
   |
   o<--"core"
   |   Level INFO
   |   Handler File '/tmp/wt.log'
   |
   o<--"django"
   |   Handler Stream <open file '<stderr>', mode 'w' at 0x7f21e9890270>
   |     Filter <django.utils.log.RequireDebugTrue object at 0x1676350>
   |   |
   |   o<--[django.db]
   |   |   |
   |   |   o<--"django.db.backends"
   |   |
   |   o<--"django.request"
   |       Level ERROR
   |       Handler <django.utils.log.AdminEmailHandler object at 0x1676790>
   |
   o<--"nose"
   |   |
   |   o<--"nose.case"
   |   |
   |   o<--"nose.config"
   |   |
   |   o<--"nose.core"
   |   |
   |   o<--"nose.failure"
   |   |
   |   o<--"nose.importer"
   |   |
   |   o<--"nose.inspector"
   |   |
   |   o<--"nose.loader"
   |   |
   |   o<--"nose.plugins"
   |   |   |
   |   |   o<--"nose.plugins.attrib"
   |   |   |
   |   |   o<--"nose.plugins.capture"
   |   |   |
   |   |   o<--"nose.plugins.collect"
   |   |   |
   |   |   o<--"nose.plugins.cover"
   |   |   |
   |   |   o<--"nose.plugins.doctests"
   |   |   |
   |   |   o<--"nose.plugins.isolation"
   |   |   |
   |   |   o<--"nose.plugins.logcapture"
   |   |   |
   |   |   o<--"nose.plugins.manager"
   |   |   |
   |   |   o<--"nose.plugins.multiprocess"
   |   |   |
   |   |   o<--"nose.plugins.testid"
   |   |
   |   o<--"nose.proxy"
   |   |
   |   o<--"nose.result"
   |   |
   |   o<--"nose.selector"
   |   |
   |   o<--"nose.suite"
   |
   o<--[py]
   |   |
   |   o<--"py.warnings"
   |       Handler Stream <open file '<stderr>', mode 'w' at 0x7f21e9890270>
   |         Filter <django.utils.log.RequireDebugTrue object at 0x1676350>
   |
   o<--"south"
       Handler <south.logger.NullHandler object at 0x20c9350>

It's much easier to read this tree output than to get familiar with your application's logging configuration along with the documentation of other packages that use the logging module.

Hope this saves you a lot of time.
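If you can't install the package on some box, a rough flat listing can be had from the standard library alone. This is only a sketch under my own assumptions, not how logging_tree is implemented: describe_loggers is a made-up name, and it relies on the loggerDict attribute of the logging manager, which is not a documented public API.

```python
import logging


def describe_loggers():
    """Return one line per known logger: name, level and handler count."""
    root = logging.getLogger()
    lines = ['"" level=%s handlers=%d'
             % (logging.getLevelName(root.level), len(root.handlers))]
    registry = logging.Logger.manager.loggerDict  # name -> Logger or placeholder
    for name in sorted(registry):
        obj = registry[name]
        if isinstance(obj, logging.Logger):
            lines.append('"%s" level=%s handlers=%d'
                         % (name, logging.getLevelName(obj.level),
                            len(obj.handlers)))
        else:
            # Placeholder: a parent name that was never requested directly.
            lines.append('[%s]' % name)
    return lines


if __name__ == "__main__":
    logging.getLogger("demo_app.db.backends").setLevel(logging.ERROR)
    print("\n".join(describe_loggers()))
```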
Cheers!

KR

Thursday, February 14, 2013

A non-production function decorator

As most developers know, not every piece of code is meant to be run on a production server. Instead of scattering a lot of "ifs" here and there, I suggest implementing a framework-specific "non_production" decorator. A simple django-specific implementation could look like this:

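import functools


def non_production(func):
    """Refuse to run the decorated function on a production server."""

    def is_production():
        # django specific; imported lazily so settings are only read on call
        from django.conf import settings
        return getattr(settings, "PRODUCTION", False)

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapped(*args, **kwargs):
        if is_production():
            raise Exception("%s is not meant to be run on a production server"
                            % func.__name__)
        return func(*args, **kwargs)
    return wrapped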

Now all you have to do is apply it to your dev/test-only functions:

@non_production
def test_something(a, b):
    pass
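The same idea works outside of django. Here is a framework-free sketch that reads the flag from an environment variable; the APP_ENV variable name and the reset_test_data function are assumptions made up for this example, so adapt them to whatever your deployment really uses.

```python
import functools
import os


def non_production(func):
    """Refuse to run func when APP_ENV (a hypothetical variable) is 'production'."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        if os.environ.get("APP_ENV") == "production":
            raise RuntimeError("%s is not meant to be run on a production server"
                               % func.__name__)
        return func(*args, **kwargs)
    return wrapped


@non_production
def reset_test_data(a, b):
    """Dev-only helper used to demonstrate the decorator."""
    return a + b
```

The environment is checked on every call rather than at import time, so the function can't slip through just because the module was imported before the flag was set.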

Cheers!
KR

Thursday, January 31, 2013

Python code readability - PEP 8

Everybody familiar with python should be aware that there is a bunch of documents called PEPs (Python Enhancement Proposals). As the name states, these documents are intended to help improve python, not only by adding new features to the interpreter and enhancing the standard libraries; they also give guidelines (proposals) about coding conventions. Many experienced and respected members of the python community participate in the PEP life-cycle, thus the documents are really reliable.

The thing I would like to talk about is PEP 8, also known as the Style Guide for Python Code. As you may know, the proposed naming convention differs a bit from other high-level programming languages (like Java, which is known for its UltimateLongAndDescriptiveClassName naming convention, along with other funny things like evenLongerAndMoreSophisticatedMethodNames). Anyway, it's very intuitive: 4 spaces instead of a tab, underscores inside multi-word variable names, etc. Sounds cool, and you may even download a package that checks your code against PEP 8:

~ $ pip install pep8

Now you may check your source code, which you believe to be perfectly compatible with PEP 8:

~ $ pep8 app/models.py

and to your surprise get a result similar to this:

app/models.py:37:13: E128 continuation line under-indented for visual indent
app/models.py:36:58: E502 the backslash is redundant between brackets
app/models.py:42:1: W293 blank line contains whitespace
app/models.py:44:5: E303 too many blank lines (5)
app/models.py:63:80: E501 line too long (82 > 79 characters)
app/models.py:66:80: E501 line too long (95 > 79 characters)
app/models.py:67:80: E501 line too long (103 > 79 characters)
app/models.py:69:80: E501 line too long (82 > 79 characters)
app/models.py:70:80: E501 line too long (83 > 79 characters)
app/models.py:72:1: E302 expected 2 blank lines, found 1
app/models.py:80:1: W391 blank line at end of file

A clean and tidy source file prints so many errors... well, yes, it does. In fact, without additional editor/IDE features it's nearly impossible to write PEP 8 valid code. If you're using vim, you can get a PEP 8 validation plug-in that opens a quick-fix buffer with a list of PEP 8 incompatible statements. This is good for shaping your coding habits, but don't get too orthodox - don't ever change an existing project's coding convention, just keep to the current one.
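For reference, this is roughly what the conventions mentioned earlier look like in practice. The names (MAX_WORD_LENGTH, DocumentParser, count_words) are invented for this illustration and don't come from any real project.

```python
MAX_WORD_LENGTH = 79  # module-level constant: upper case with underscores


class DocumentParser(object):  # class name: CapWords
    """Two blank lines above a top-level class, 4 spaces per indent level."""

    def __init__(self, source_path):
        self.source_path = source_path  # attribute: lower case with underscores

    def count_words(self, raw_text):  # method name: lower case with underscores
        word_list = raw_text.split()
        return len([word for word in word_list
                    if len(word) <= MAX_WORD_LENGTH])
```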

~Thus spoketh KR,