My thoughts on computers, programming, ancient spell scrolls and other magic devices...: September 2012

Sunday, September 30, 2012

Regular expression based process termination in Linux.

Even though Unix-based systems are generally stable, some (usually non-kernel) processes could benefit from a kill from time to time. GUI applications may hang, batch programs may leak memory, and many other things can go wrong. Fortunately, Linux provides a set of tools for such situations. You can use the ps command to list active processes: it reports a snapshot of the current processes, including the process ID (PID), parent process ID (PPID), priority, memory usage, current state and more. There are also the kill and pkill commands, which terminate processes identified by a specific ID or by executable name, respectively.

If you are running multiple instances of a program and want to terminate only a few of them, it's hard to apply these commands: pkill will terminate all instances (which is not desired), while kill requires you to look up the PIDs first (which takes some work). It may be easier to locate the processes by their command line arguments, execution paths or other run parameters, either literally or using regular expressions.

The following script does the job:

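#!/bin/bash
# Kill every process whose "PID command line" entry matches the given ERE.
if [ "$1" ]
then
    # Keep column 3 (PID) and columns 13+ (command with its arguments).
    ps lax | tr -s ' ' | cut -d ' ' -f 3,13- | \
    grep -E "$1" | cut -d ' ' -f 1 | \
    xargs kill -9 2>/dev/null
else
    echo "Usage: $0 <extended regular expression>"
fi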

First, a process list is obtained and adjusted for further processing (the text output is so lousy...): tr squeezes repeated spaces so that cut can split on single spaces. cut then keeps only the third column (the PID) and everything from the 13th column on (the command with its whitespace-separated arguments). Next, the output is matched against the provided extended regular expression (ERE). Be warned, though: the tested string starts with the process ID, so starting the ERE with a "^" is a bad idea (starting with "^[0-9]+" may work, but you'll eventually end up restarting your system :-)). Also keep in mind that the script's own command line contains the pattern, so the script itself will often be among the matches.
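If you'd rather check what a pattern would hit before killing anything, the same column logic can be sketched in Python. This is only an illustration, not a replacement for the script: matching_pids and SAMPLE are names made up for this example, and the sample text assumes the usual 13-column layout of `ps lax`.

```python
import re

# Hypothetical `ps lax` output, shortened for the example.
SAMPLE = """F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0  24000  2000 poll_s Ss   ?          0:01 /sbin/init
0  1000  2345  2000  20   0 100000  5000 wait   S    pts/0      0:00 python worker.py --queue=mail
0  1000  2346  2000  20   0 100000  5000 wait   S    pts/0      0:00 python worker.py --queue=sms
"""

def matching_pids(ps_output, pattern):
    """Return the PIDs of rows whose "PID command" text matches the pattern."""
    regex = re.compile(pattern)
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 12)         # 13th field keeps command + arguments
        if len(fields) < 13:
            continue                          # not a complete process row
        pid, command = fields[2], fields[12]
        # Same string the shell script tests: PID first, then the command line.
        if regex.search(pid + " " + command):
            pids.append(int(pid))
    return pids

print(matching_pids(SAMPLE, r"worker\.py --queue=mail"))
```

Once the list looks right, each PID could be passed to os.kill(pid, signal.SIGKILL), which is the Python counterpart of kill -9.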

Cheers!

~KR

Friday, September 28, 2012

Setting up a custom bash prompt

If you spend a lot of time working in the terminal, not only on a single PC but also on other servers via ssh, it is good practice to have your bash prompt properly configured. If all your command prompts look like this (example):

~ chriss $

it's almost impossible to determine where you currently are (which server, which directory); you only see the user name, which may be the same on all machines. The whole magic behind bash prompt configuration is in the $PS1 shell variable. There are also the variables $PS2, $PS3 and $PS4, but $PS1 is the primary bash prompt string.

So let's create two prompts: one for the servers you usually connect to via ssh, and one for your local machine. Let's look at the possibilities:

\d : date in "Weekday Month Day" format (e.g. "Tue May 26")
\e : an escape character
\h : hostname up to the first "."
\H : full hostname
\j : number of jobs currently managed by the shell
\n : newline character
\t : time / 24h HH:MM:SS
\T : time / 12h HH:MM:SS
\@ : time / 12h am/pm
\A : time / 24h HH:MM
\u : username
\w : current working directory, with $HOME abbreviated to ~
\W : basename of the current working directory, with $HOME abbreviated to ~

There are more options for configuring the prompt, but I personally find these the most useful. Let's start with the local machine. Since it's the place you start your adventure from, and the prompt you eventually return to once your work on remote machines is done, I suggest skipping most of the information and keeping the prompt to something like this:

export PS1="~ \u \w $"

Which results in:

~ kr ~/prj/python $

It's easy: if you don't see a hostname in the prompt, you're probably still on the local machine. Displaying the working directory may save you a lot of ls / pwd executions.

If you're connected to a remote machine, it's best to have more context. A good addition to the username is the hostname (the full one, if you have problems telling remote machines apart using only the first subdomain). The current working directory has proven useful not only on remote machines. Finally, if you're connecting to machines in different time zones, it's a good idea to display the system time. Summing up, we end up with something like this:

export PS1="[\A] \u@\H \W $"

This will result in the following prompt message:

[21:45] kr@some.secret.server current_dir $

This way you will never get confused about your terminal session. Remember that this only sets the prompt for the current session. If you want your prompt to be configured every time you start a session, add this line to your ~/.bashrc file.
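To get a feel for how the escapes expand without touching your shell, here is a rough Python sketch that renders a handful of them. It is not how bash does it internally: render_prompt and its parameters are invented for this example, and only \u, \H, \h, \w, \W, \A and \t are handled.

```python
import os
import re
from datetime import datetime

def render_prompt(ps1, user, host, cwd, home, now):
    """Expand a small subset of bash prompt escapes in the given PS1 string."""
    # \w shows the full path with the home directory shortened to "~".
    short_cwd = cwd
    if cwd == home or cwd.startswith(home + "/"):
        short_cwd = "~" + cwd[len(home):]
    values = {
        "u": user,
        "H": host,                       # full hostname
        "h": host.split(".")[0],         # hostname up to the first dot
        "w": short_cwd,
        "W": "~" if cwd == home else (os.path.basename(cwd) or "/"),
        "A": now.strftime("%H:%M"),      # 24h HH:MM
        "t": now.strftime("%H:%M:%S"),   # 24h HH:MM:SS
    }
    return re.sub(r"\\([uHhwWAt])", lambda m: values[m.group(1)], ps1)

print(render_prompt(r"[\A] \u@\H \W $", user="kr", host="some.secret.server",
                    cwd="/home/kr/current_dir", home="/home/kr",
                    now=datetime(2012, 9, 28, 21, 45)))
```

Passing the user, host, directory and time in as arguments keeps the function predictable, so the same PS1 string can be previewed for several machines in one go.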

Cheers!

~KR

Monday, September 24, 2012

Logging JavaScript exceptions using raven-js and sentry.

Let's face it: it's hard to reach 100% unit-test code coverage, and likewise it's nearly impossible to implement functional tests for all possible scenarios. A good way of recognizing and locating problems is logging exceptions and other informative messages. For small projects logging to stdout/stderr or to files is fine, but if there are hundreds of people using your application every minute, you should think of a more scalable solution.

Sentry is a good option: it's a real-time event logging platform that can be set up as a standalone application. Other pros are:

support of different levels of logging / dynamic filtering
logging data from many independent projects
persistent log storage (database)
user privilege configuration / email notification
there are clients for many popular programming languages
it's based on python/django
it's based on python/django
it's based on python/django
....

I can go on like this forever :-) You can easily install sentry via pip.

~ virtualenv --no-site-packages sentry_env && cd sentry_env
~ source bin/activate
(sentry_env) ~ pip install sentry
(sentry_env) ~ sentry init

Now all you have to do is configure database access and other important parameters; check out the sentry configuration guide for more information.

But let's get back to the main thought. Among the programming languages that have clients for sentry there is also JavaScript. This wouldn't be a surprise if not for the fact that (besides node.js) JS is usually executed on the client side (web browser). Raven-js (don't confuse it with the client for RavenDB) can log messages / catch exceptions and send them via AJAX requests to your sentry application. In order to set up the logging script you should first configure sentry, create a project and obtain the generated project public key. Then use the following code:

Setting up raven-js and observing how your scripts crash on IE is just priceless :-)

~KR

Tuesday, September 18, 2012

Recursive call optimization

Sooner or later every programmer writes their first recursive function (common examples are calculating a factorial or a Fibonacci number). Recursive functions are usually quite easy to understand and implement, however they may lack performance and run out of stack space when the recursion gets deep, and that includes tail-recursive calls (I don't want to get too much into low-level programming, but if you're really interested there are tons of articles about it).

A tail-recursive call occurs when the last thing your function does is return the result of a call to itself. Something like this:

def tail_rec_func(x):
    if x > 0:
        return tail_rec_func(x-1)
    else:
        return 0

Now this is one useless piece of code, but it will be a good educational example. Here's what you get when you execute it:

>>> tail_rec_func(5)
0
>>> tail_rec_func(1000)
(...)
RuntimeError: maximum recursion depth exceeded

Interesting: this code raises a runtime error stating that our recursion stack is full. Python has a default recursion limit of 1000 (see sys.getrecursionlimit()) which protects the interpreter from a real stack overflow (yes, this is even worse than a runtime error), so increasing the limit will not really solve our problem. An obvious solution is replacing tail recursion with iteration (it's always possible). Many compiled languages will optimize such a function in this way, but Python does not perform tail-call optimization, so you are responsible for optimizing the code yourself.

In our case:

def iter_tail_rec_func(x):
    # Same logic as tail_rec_func, with the recursive call turned into a loop step.
    while True:
        if x > 0:
            x -= 1
        else:
            return 0
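To see the difference between the two versions, here is a small self-contained check. The helper hits_recursion_limit is made up for this example; it catches RuntimeError, which also covers RecursionError on newer Python versions because that class derives from RuntimeError.

```python
import sys

def tail_rec_func(x):
    if x > 0:
        return tail_rec_func(x - 1)
    return 0

def iter_tail_rec_func(x):
    while x > 0:
        x -= 1
    return 0

def hits_recursion_limit(func, x):
    """Return True if calling func(x) blows the interpreter's recursion limit."""
    try:
        func(x)
    except RuntimeError:
        return True
    return False

# A depth just past whatever limit the interpreter is currently using.
depth = sys.getrecursionlimit() + 10
print(hits_recursion_limit(tail_rec_func, depth))
print(hits_recursion_limit(iter_tail_rec_func, depth))
```

The recursive version needs one stack frame per step, while the loop reuses a single frame no matter how large x gets.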

If you frequently implement tail-recursive functions, you had better implement a decorator that converts them to iterations, or someday you may find your code crashing with a runtime/stack overflow error.
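One way such a decorator could look is a so-called trampoline. This is just a minimal sketch under my own assumptions, not a standard library feature: the names tail_recursive, _TailCall and the .call helper are invented here, and the decorated function has to cooperate by returning func.call(...) instead of calling itself directly.

```python
import functools

class _TailCall(object):
    """Marker object: "call me again with these arguments"."""
    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs

def tail_recursive(func):
    """Run func in a loop for as long as it keeps asking for another call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, _TailCall):
            # The previous frame has already returned, so the stack does not grow.
            result = func(*result.args, **result.kwargs)
        return result

    def call(*args, **kwargs):
        return _TailCall(args, kwargs)

    wrapper.call = call
    return wrapper

@tail_recursive
def countdown(x):
    if x > 0:
        return countdown.call(x - 1)   # request the next step instead of recursing
    return 0

@tail_recursive
def factorial(n, acc=1):
    if n <= 1:
        return acc
    return factorial.call(n - 1, acc * n)

print(countdown(100000))
print(factorial(5))
```

The price is a slightly unusual calling convention inside the function, but the depth of the computation is no longer tied to the recursion limit.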

Cheers!

~KR

Saturday, September 8, 2012

Dynamic class attribute access (get/set) via string names

Today I'd like to share some of my experience with dynamic class attribute access in Python. In most cases you can easily predict which attributes you will want to access, but sometimes, especially when the attributes/fields depend on some input data, this may be a problem.

There are three magic Python built-in functions that help a lot: setattr, getattr and hasattr, responsible for setting named attributes, getting them, and checking whether they exist, respectively.

A simple example to show how they work:

>>> class A():
...     pass
... 
>>> my_cls = A()
>>> hasattr(my_cls, 'test_attr')
False
>>> my_cls.test_attr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: A instance has no attribute 'test_attr'
>>> setattr(my_cls, 'test_attr', 'some value')
>>> hasattr(my_cls, 'test_attr')
True
>>> my_cls.test_attr
'some value'
>>> getattr(my_cls, 'test_attr', 'default_value')
'some value'

This is quite intuitive. You may use these functions to update class attributes according to a dict:

>>> my_dict = {'attr1': 1, 'attr2' : 5, 'attr3' : 10}
>>> for k,v in my_dict.items():
...     setattr(my_cls, k, v)
... 
>>> my_cls.attr1
1
>>> getattr(my_cls, "attr3", -1)
10
>>> getattr(my_cls, "attr4", -1)
-1

Since getattr supports default values, you can manage without using hasattr.

These functions are great when you can't predict which attributes you will need to access, or when you are not sure whether some attributes are set. A good usage example is global configuration files. For example, in Django:

from django.conf import settings

some_param = getattr(settings, 'MY_CONF_PARAM', 100)

Just keep in mind that by using setattr you may overwrite ANY attribute, so make sure you implement some validation mechanism. And if you ever need it, deleting attributes is also possible: try delattr.
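A validation mechanism can be as simple as a whitelist of attribute names. The sketch below is only one possible approach; Config and update_attributes are hypothetical names used for illustration.

```python
class Config(object):
    """Hypothetical settings holder with class-level defaults."""
    timeout = 30
    retries = 3

def update_attributes(obj, values, allowed):
    """Set only the whitelisted attributes and reject everything else."""
    for name, value in values.items():
        if name not in allowed:
            raise AttributeError("refusing to set %r" % name)
        setattr(obj, name, value)

conf = Config()
update_attributes(conf, {"timeout": 60}, allowed={"timeout", "retries"})
print(conf.timeout)

# delattr removes the instance attribute, so the class default is visible again.
delattr(conf, "timeout")
print(conf.timeout)
```

With the whitelist in place, input such as {"__class__": ...} is rejected before setattr ever sees it.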

~KR