My thoughts on computers, programming, ancient spell scrolls and other magic devices...: October 2012

Tuesday, October 30, 2012

Evaluating minimum/maximum regex match length.

Using regular expressions for data validation is standard nowadays. A good regular expression in the right context may save the day; I believe no one can deny it. However, sometimes you (or other people using your software) need more feedback when the provided data fails to match the test pattern. A typical situation may look like this:

def test_id(user_id):
    import re
    regex = r"(secret\s?)?agent\s([0-9]{3}|james bond)"
       
    if not re.match(regex, user_id):
        print "Incorrect user ID: %s" % user_id
    else: 
        print "OK: %s" % user_id

test_id('agent james bond')
test_id('secret agent 007')
test_id('agent jack black')
test_id('agent j')

Running this code results in the following output:

OK: agent james bond
OK: secret agent 007
Incorrect user ID: agent jack black
Incorrect user ID: agent j

So what if we want to give a hint about the minimum and maximum length? In many cases guessing or manually evaluating the potential match size should be easy. However, if you plan to use more sophisticated regular expressions, you should not rely on your intuition - but on calculations:

def test_id(user_id):
    import re
    import sre_parse

    regex = r"(secret\s?)?agent\s([0-9]{3}|james bond)"
    match_range = sre_parse.parse(regex).getwidth()

    if len(user_id) < match_range[0] or \
            len(user_id) > match_range[1]:
        print "User ID %s should be between %s and %s characters" % \
                (user_id, match_range[0], match_range[1])
    elif not re.match(regex, user_id):
        print "Incorrect user ID: %s" % user_id
    else: 
        print "OK: %s" % user_id



test_id('agent james bond')
test_id('secret agent 007')
test_id('agent jack black')
test_id('agent j')

This time the output should look like this:

OK: agent james bond
OK: secret agent 007
Incorrect user ID: agent jack black
User ID agent j should be between 9 and 23 characters

It certainly is more descriptive.
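The listings above are Python 2. For Python 3, a minimal sketch of the same idea could look like the one below. Note the assumptions: the helper names match_width and check_id are made up for this example, the parser is imported from re._parser on Python 3.11+ (a private module that may change, with sre_parse as the fallback on older versions), and re.fullmatch is used instead of re.match so that trailing garbage is rejected as well.

```python
import re

try:
    # Python 3.11+ keeps the regex parser here (private module, may change).
    from re import _parser as sre_parse
except ImportError:
    # Older Python versions expose it as a top-level module.
    import sre_parse

REGEX = r"(secret\s?)?agent\s([0-9]{3}|james bond)"


def match_width(pattern):
    """Return the (min, max) number of characters a match of pattern can span."""
    return sre_parse.parse(pattern).getwidth()


def check_id(user_id, pattern=REGEX):
    """Return a feedback message for user_id instead of printing it."""
    lo, hi = match_width(pattern)
    if not lo <= len(user_id) <= hi:
        return "User ID %s should be between %s and %s characters" % (user_id, lo, hi)
    # fullmatch (an assumption of this sketch) requires the whole string to match.
    if not re.fullmatch(pattern, user_id):
        return "Incorrect user ID: %s" % user_id
    return "OK: %s" % user_id


for candidate in ("agent james bond", "secret agent 007",
                  "agent jack black", "agent j"):
    print(check_id(candidate))
```

Returning the message instead of printing it makes the check easier to reuse, for example in a form validator.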
Cheers!

~KR

Saturday, October 27, 2012

Determine Django form field clean order

Django provides some excellent out-of-the-box form validation/processing mechanisms. These include regular forms, model-based forms, and formsets. One of their features is the ease of implementing data validation procedures. According to the documentation, the validation process executes the following methods in the presented order:

1. to_python (field)
2. validate (field)
3. run_validators (field)
4. clean (field)
5. clean_<fieldname> (form)
6. clean (form)

Since there is usually no need to extend form fields, let's focus on the form-side validation. Let's suppose you have the following forms:

from django import forms

class MyBaseForm(forms.Form):
    base_field1 = forms.CharField()
    base_field2 = forms.CharField()

    def clean_base_field1(self):
        #(...)
        return self.cleaned_data['base_field1']
       
    def clean_base_field2(self):
        #(...)
        return self.cleaned_data['base_field2']

class MyForm(MyBaseForm):
    field1 = forms.CharField()

    def clean_field1(self):
        #(...)
        return self.cleaned_data['field1']

All methods implemented in this example refer to the fifth step: clean_<fieldname>. So what would be the execution order if we try validating MyForm? The Django manual states:

"These methods are run in the order given above, one field at a time. That is, for each field in the form (in the order they are declared in the form definition), the Field.clean() method (or its override) is run, then clean_<fieldname>(). Finally, once those two methods are run for every field, the Form.clean() method, or its override, is executed."

According to this statement, the expected order would be: clean_base_field1, clean_base_field2, clean_field1. What if we don't like this order, should we rearrange the form definition? No such thing! There is a more elegant way to change this order. We may use fields.keyOrder to achieve it:

class MyForm(MyBaseForm):
    field1 = forms.CharField()

    def clean_field1(self):
        #(...)
        return self.cleaned_data['field1']

    def __init__(self, *args, **kwargs):
        super(MyForm, self).__init__(*args, **kwargs)
        self.fields.keyOrder = ['base_field1', 'field1', 'base_field2']

You don't need to extend a form to use this. You may also specify a partial order (if there are too many fields to be named explicitly) by putting only a few field names in the list and extending it with a list comprehension generated from the remaining self.fields.keys().
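The partial order trick can be sketched without Django at all. The helper name full_field_order below is hypothetical, not part of Django; inside a form's __init__ its result would be assigned to self.fields.keyOrder, with self.fields.keys() passed as the first argument.

```python
def full_field_order(all_fields, first):
    """Put the names listed in `first` up front, keep the rest in declared order."""
    return list(first) + [name for name in all_fields if name not in first]


# Field names in declaration order, as in the MyForm example.
declared = ['base_field1', 'base_field2', 'field1']

# Only 'field1' is named explicitly; the remaining fields follow.
print(full_field_order(declared, ['field1']))
```

Keep in mind that keyOrder belongs to the dictionary type Django used for form fields at the time of writing. Later Django releases changed how fields are stored, so check the documentation for your version.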

Cheers!

~KR

Tuesday, October 23, 2012

Hard links vs. symbolic links

Everybody uses links in their everyday virtual life. I don't think anyone could imagine navigating between websites without hyperlinks, but let's move on to links that refer to our file system. If you have some experience with MS Windows, you're probably familiar with shortcuts, which play a role similar to symbolic links.

In general, symbolic links may be interpreted as pointers directed to our file's logical layer. In a larger scope this may look like a pointer to the pointer of physical data. If this still looks confusing, have a look at an example:

~ $ echo "Test file" > f1.txt
~ $ ln -s f1.txt f2.txt

This may be visualized the following way ("Test file" is the physical data here):

f2.txt -> f1.txt -> "Test file"

Now you can access the physical data using both f1.txt and f2.txt. However, if you delete f1.txt, the physical data will be lost (no label will point to it). A different situation occurs when you use hard links instead. Each time you create a file, a label is hard linked to it. In the previous example the hard link was created by executing:

~ $ echo "Test file" > f1.txt

By default each chunk of physical data has only 1 hard link attached, but more may be attached. For example:

~ $ ln f1.txt f2.txt

This creates a hard link labeled f2.txt to the physical data of f1.txt. Let's visualize it:

f1.txt -> "Test file" <- f2.txt

You can access the physical data both via f1.txt and f2.txt. Unlike symbolic links, each of the hard links works even if the other ceases to exist. In order to delete the physical data you need to unlink all hard links that point to it (rm f1.txt alone will not do...).

To sum up, symbolic links are cool because they operate on the logical layer, and thus are not limited to a single file system. Furthermore symbolic links may point to directories which is an important feature.

Hard links also have some features that their symbolic cousins lack. Hard links are always bound to the physical source of data, which makes them move/rename proof (symbolic links are not updated if you move/rename the hard link they point to).
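The difference is easy to check with a small experiment. The sketch below uses Python's os.link and os.symlink in a temporary directory; the function name link_demo and the file names are made up for the example, and it assumes a Unix-like file system that supports both link types.

```python
import os
import tempfile


def link_demo():
    """Create one hard link and one symbolic link, then rename the original."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        f1 = os.path.join(tmp, "f1.txt")
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")

        with open(f1, "w") as handle:
            handle.write("Test file\n")

        os.link(f1, hard)     # like: ln f1.txt hard.txt
        os.symlink(f1, soft)  # like: ln -s <absolute path of f1.txt> soft.txt

        # f1.txt and hard.txt now label the same physical data.
        results["link_count"] = os.stat(f1).st_nlink

        # Rename the original label.
        os.rename(f1, os.path.join(tmp, "renamed.txt"))

        # The hard link still reaches the data.
        with open(hard) as handle:
            results["hard_link_content"] = handle.read()

        # os.path.exists follows the symlink, so a dangling link reports False,
        # while os.path.islink only checks that the link itself is still there.
        results["soft_link_resolves"] = os.path.exists(soft)
        results["soft_link_still_listed"] = os.path.islink(soft)
    return results


print(link_demo())
```

After the rename the hard link still returns the file content, while the symbolic link is left dangling.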

Hope this overview helps you choose the right link for each situation.

Cheers!

~KR

Monday, October 15, 2012

Renaming Mercurial branches

I believe there is no need to introduce Mercurial (since you got here, you should be familiar with it anyway). I'd like to present a way of renaming / replacing branches. By default, without using extensions, it's impossible ("branches are permanent and global...")... but there are other ways to deal with it. Let us suppose we want to rename branch A to B. We can achieve it the following way:

hg update A
last_head=$(hg id -i)
hg ci -m "close A" --close-branch
hg update $last_head

hg branch B
hg ci -m "branch renamed to B"

That's it, our branch is now named B. In practice we just closed branch A and created a new branch B from its last commit. In other words, the old head of A gets two children: the closing commit on A and the first commit on B.

Replacing an existing branch with another is a bit more tricky, here's what you have to do:

hg update A
hg ci -m "close A" --close-branch
hg update B
hg branch A -f #force, branch A exists
hg ci -m "rename to A"
#optional close branch B

The general idea: the old head of A is closed, and a new commit on top of B's head carries the branch name A, so A continues from where B was.

In order to create a branch with a name that already exists we have to use the force switch (hg branch -f). Nothing should go wrong if the previous head of branch A was closed; otherwise you'll just end up creating another head.
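If you rename branches often, the rename steps from the first listing can be wrapped in a small helper. This is only a sketch: the functions hg and rename_branch are made up for this example, it assumes hg is on your PATH and that the working directory has no uncommitted changes, and the run parameter exists so the command sequence can be tried out without touching a real repository.

```python
import subprocess


def hg(*args):
    """Run one hg command in the current directory and return its output as text."""
    return subprocess.check_output(("hg",) + args).decode()


def rename_branch(old, new, run=hg):
    """Close branch `old` and reopen its last commit as branch `new`."""
    run("update", old)
    # Remember the head of the old branch before closing it.
    last_head = run("id", "-i").strip()
    run("commit", "-m", "close %s" % old, "--close-branch")
    # Go back to the commit just before the closing one.
    run("update", last_head)
    run("branch", new)
    run("commit", "-m", "branch renamed to %s" % new)
    return last_head
```

Calling rename_branch("A", "B") inside a repository runs the same six commands as the shell listing.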

Experimenting with hg is cool, just remember: before you try anything experimental, commit your work! This may save you a lot of nerves.

Cheers!

~KR

Saturday, October 13, 2012

Serve text file via HTTP protocol using netcat

Unix-based systems provide a lot of cool network tools. Today I'd like to show the potential of netcat. Netcat is a utility that may be used for "just about anything under the sun involving TCP and UDP" (netcat BSD manual). That's a pretty broad description, but let's go on to some practical stuff.

If we want to make a useful script, we should make it work with a popular protocol, such as HTTP. This way we don't have to worry about a client application; we may use an ordinary web browser to test our script. More formal information about the HTTP protocol can be found in RFC 2616.

So a bash script that generates a simple HTTP response and serves a text file may look like this:

#!/bin/bash
# Print an HTTP response that offers the given text file as a download.
if [ -z "$1" ]
then
    echo "Usage: $0 <text file to serve>"
    exit 1
fi

filename=$1

# The status line has to be the very first line of the output,
# so the text starts right after the opening quote.
echo "HTTP/1.1 200 OK
Date: $(LANG=en_US date -u)
Server: NetCatFileServe
Last-Modified: $(LANG=en_US date -u)
Accept-Ranges: bytes
Content-Length: $(wc -c < "$filename")
Content-Type: application/force-download
Content-Disposition: attachment; filename=\"$filename\"

$(cat "$filename")
"

If an HTTP client receives such a response, it should try to save the attached data as a file (a typical save-file-as window in a web browser).

Now that we have an HTTP response generated, we need a server that will serve it - that's where netcat comes in handy. To serve some data we need to run it in listening mode. For example:

~ $ ./ncserve.sh test.txt | nc -l 8888

If you now enter http://127.0.0.1:8888 (or your machine's other IP address) in a web browser, you should be able to download the test.txt file. You may also test it using curl:

~ $ curl -i 127.0.0.1:8888
HTTP/1.1 200 OK
Date: Sat Oct 13 10:40:27 UTC 2012
Server: NetCatFileServe
Last-Modified: Sat Oct 13 10:40:27 UTC 2012
Accept-Ranges: bytes
Content-Length: 71
Content-Type: application/force-download
Content-Disposition: attachment; filename="test.txt"

This is a simple text file
bla bla bla
downloaded via netcatserver
:-)

This setup only serves the file once and then dies. If you want it to act like a regular HTTP server, you should run it in an infinite loop, for example: while true; do ./ncserve.sh test.txt | nc -l 8888; done
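For comparison, here is a rough Python stand-in for the script plus the loop. It is a sketch under a few assumptions: the function names build_response and serve_forever are made up, the file name is assumed to be plain ASCII, and the headers are a trimmed version of the ones used in the bash script, with a Connection: close header added. Unlike the echo-based script it ends header lines with CRLF and formats the date the way HTTP expects.

```python
import os
import socket
from email.utils import formatdate


def build_response(path):
    """Return the raw bytes of an HTTP response that offers `path` as a download."""
    with open(path, "rb") as handle:
        body = handle.read()
    headers = [
        "HTTP/1.1 200 OK",
        "Date: " + formatdate(usegmt=True),  # HTTP-style date ending in "GMT"
        "Server: NetCatFileServe",
        "Content-Length: %d" % len(body),
        "Content-Type: application/force-download",
        'Content-Disposition: attachment; filename="%s"' % os.path.basename(path),
        "Connection: close",
    ]
    # Header lines end with CRLF; an empty line separates headers from the body.
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body


def serve_forever(path, port=8888):
    """Answer every connection with the same response, one client at a time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("127.0.0.1", port))
        server.listen(1)
        while True:
            client, _ = server.accept()
            with client:
                client.recv(4096)  # read and ignore the request
                client.sendall(build_response(path))
```

Calling serve_forever("test.txt") blocks and keeps answering requests on port 8888 until you interrupt it.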

Cheers!

~KR