
Sunday, August 31, 2014

Possible I/O Errors while starting a new process from a daemon script (Popen)

I recently had a problem with one of my document processing applications crashing while trying to revive malfunctioning workers, with an:

OSError: [Errno 5] Input/output error

To reproduce the problem, I logged in to the server via SSH, restarted the service with debug logging enabled, and started processing a sequence of documents that guaranteed the workers would have to be restarted. I was really surprised to see that the service managed to cope with the problem and bring up all the workers.

I worked out that the only place the service could have crashed was while starting a new worker process, just before the actual fork. Tracing through the Python standard library showed that starting a new process eventually calls Popen(self).

Inspecting multiprocessing/forking.py:


if sys.platform != 'win32':

    # ... (some unrelated code omitted) ...
    class Popen(object):

        def __init__(self, process_obj):
            # still in the parent process here - flush before forking
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None

            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap()
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

So before os.fork is actually called, the parent process tries to flush the standard output and error streams. The OSError was caused by our script trying to flush stdout/stderr to the /dev/tty device, which became unavailable some time after the daemon was started (the SSH session that launched it was dropped). I investigated the script for any left-behind print statements / logging StreamHandlers. After a long investigation it turned out that a third-party library was occasionally logging errors using a StreamHandler...
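To hunt down handlers like that, you can walk the logging hierarchy and list everything that writes to the terminal streams. A minimal diagnostic sketch (assuming the offending handlers were registered through the standard logging module):

import logging
import sys

# Check the root logger plus every logger registered so far and report
# any handler that writes to the process's stdout/stderr streams.
loggers = [logging.root] + [l for l in logging.Logger.manager.loggerDict.values()
                            if isinstance(l, logging.Logger)]
for logger in loggers:
    for handler in logger.handlers:
        if getattr(handler, 'stream', None) in (sys.stdout, sys.stderr):
            print('%s -> %r' % (logger.name, handler))

Run it after all your imports, since third-party libraries often attach their handlers at import time.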

So the lesson learned: always verify that your daemon scripts don't write to stdout/stderr, or make sure the streams are redirected in the init script. Besides the fact that stderr won't provide any valuable information when you're offline, you can easily run into problems like the one above.
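For example, the daemon can detach its standard streams right after start-up. A minimal sketch (the function name and log path are made up for this example):

import os
import sys

def detach_std_streams(log_path='/var/log/mydaemon.log'):
    # Point stdout/stderr at a log file and stdin at /dev/null, so a
    # later flush (e.g. in multiprocessing's Popen) can't hit a dead tty.
    sys.stdout.flush()
    sys.stderr.flush()
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    null_fd = os.open(os.devnull, os.O_RDONLY)
    os.dup2(null_fd, sys.stdin.fileno())
    os.dup2(log_fd, sys.stdout.fileno())
    os.dup2(log_fd, sys.stderr.fileno())
    os.close(null_fd)
    os.close(log_fd)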

KR

Saturday, January 25, 2014

Python list, set and dict comprehensions 2.7+

Python has supported list comprehensions since v2.0. These expressions truly revolutionized Python, making many functions much simpler and more readable. Let's see a basic LC in action:

>>> [x for x in "test"]
['t', 'e', 's', 't']


Often there is a need to generate a set or dict in a similar way, so I often see code like this:

>>> set([x for x in "test"])
set(['s', 'e', 't'])
>>> dict([(x,x) for x in "test"])
{'s': 's', 'e': 'e', 't': 't'}


This is good:
- it works!
- it's more readable than an explicit for loop.

But using Python 2.7+ you can do it better! Python 2.7.X and 3.X support dict and set comprehensions - now this is Pythonic! You can achieve the same results the following way:

>>> {x for x in "test"}
set(['s', 'e', 't'])
>>> {x: x for x in "test"}
{'s': 's', 'e': 'e', 't': 't'}


This is excellent!
- it works!
- it's more readable than creating a dict/set from an LC
- it's faster!!!

Simple performance comparison:

>>> import timeit
>>> timeit.timeit('set([x for x in "test"])')
0.44252514839172363
>>> timeit.timeit('{x for x in "test"}')
0.37139105796813965

>>> timeit.timeit('dict([(x,x) for x in "test"])')
0.8899600505828857
>>> timeit.timeit('{x: x for x in "test"}')
0.3909759521484375
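Comprehensions also compose nicely with filtering; a small sketch with made-up data (the exact ordering of the set/dict output may differ):

>>> words = ["spam", "ham", "eggs", "spam"]
>>> {w: len(w) for w in words if len(w) > 3}
{'spam': 4, 'eggs': 4}
>>> {w.upper() for w in words}
set(['EGGS', 'HAM', 'SPAM'])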


Cheers!
KR

Monday, January 13, 2014

Preventing Python from generating *.pyc files at runtime


Whenever you import a module, CPython compiles it to byte code and saves the result using the same path and filename, except for a *.pyc extension (valid for Python 2.X). A Python script converted to byte code does not run any faster; the only advantage is that *.pyc files are loaded faster. Although this is generally desired, it may cause problems during development, when our application imports existing *.pyc files instead of compiling our freshly modified source files. Such problems should not occur too often, but when they do, we usually don't have a clue what's going on (I just fixed it, and it still crashes!?).
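You can check which file was actually loaded for a module; for example on Python 2 (the exact path will vary per system):

>>> import json
>>> json.__file__
'/usr/lib/python2.7/json/__init__.pyc'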

Remove *.pyc files!

Of course you can create a script that performs a search-and-destroy on all *.pyc files located in your project's subdirectories. This is cool, but preventing Python from generating *.pyc files (in dev) is even better.
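Such a cleanup could look like this (a minimal sketch; the function name is made up):

import os

# Walk the project tree and remove every compiled *.pyc file found.
def remove_pyc_files(root='.'):
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.pyc'):
                os.remove(os.path.join(dirpath, name))

remove_pyc_files()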

There are basically three ways to achieve this in Python 2.7 (a quick verification sketch follows the list):

1. During a single script run, use the -B parameter:
python -B some_script.py

2. At project level, insert these lines at the top of your application / script (*.py):
import sys
sys.dont_write_bytecode = True

3. At environment level, set the following environment variable:
export PYTHONDONTWRITEBYTECODE=1
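Whichever way you pick, here is a quick check that it took effect (a minimal sketch):

import sys

# True when -B was passed or PYTHONDONTWRITEBYTECODE is set;
# imports will then skip writing *.pyc files.
print(sys.dont_write_bytecode)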

Have fun eradicating *.pyc files in your dev environments!

Cheers!
KR