
Tuesday, October 30, 2012

Evaluating minimum/maximum regex match length.

Using regular expressions for data validation is standard practice nowadays. A good regular expression in the right context can save the day; I believe no one would deny it. However, sometimes you (or other people using your software) need more feedback when the provided data fails to match the pattern. A typical situation may look like this:

import re

def test_id(user_id):
    # Accepts e.g. "agent 007", "secret agent 007" or "agent james bond".
    regex = r"(secret\s?)?agent\s([0-9]{3}|james bond)"

    if not re.match(regex, user_id):
        print("Incorrect user ID: %s" % user_id)
    else:
        print("OK: %s" % user_id)

test_id('agent james bond')
test_id('secret agent 007')
test_id('agent jack black')
test_id('agent j')


Running this code results in the following output:

OK: agent james bond
OK: secret agent 007
Incorrect user ID: agent jack black
Incorrect user ID: agent j


So what if we want to give a hint about the minimum and maximum length? In many cases, working out the potential match size by hand is easy. However, if you plan to use more sophisticated regular expressions, you should rely on calculation rather than intuition:


import re
import sre_parse  # undocumented internal module used by re

def test_id(user_id):
    regex = r"(secret\s?)?agent\s([0-9]{3}|james bond)"
    # getwidth() returns the (minimum, maximum) length of a possible match.
    min_len, max_len = sre_parse.parse(regex).getwidth()

    if not min_len <= len(user_id) <= max_len:
        print("User ID %s should be between %s and %s characters" %
              (user_id, min_len, max_len))
    elif not re.match(regex, user_id):
        print("Incorrect user ID: %s" % user_id)
    else:
        print("OK: %s" % user_id)



test_id('agent james bond')
test_id('secret agent 007')
test_id('agent jack black')
test_id('agent j')


This time the output should look like this:

OK: agent james bond
OK: secret agent 007
Incorrect user ID: agent jack black
User ID agent j should be between 9 and 23 characters
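
To see where the 9 and 23 come from, the pattern can be split into its four parts and each part measured separately. The sketch below is only an illustration: the `width` helper and the `parts` list are names made up here, and the import fallback assumes that on Python 3.11+ the parser lives in the private module `re._parser` (the old `sre_parse` name is deprecated there).

```python
# Sketch: break the post's pattern into its parts and ask the parser
# for the (min, max) width of each one.
try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # older Pythons, as used in the post


def width(pattern):
    """Return the (min, max) match width reported by the regex parser."""
    return tuple(sre_parse.parse(pattern).getwidth())


# The four consecutive parts of the user ID pattern.
parts = [
    r"(secret\s?)?",
    r"agent",
    r"\s",
    r"([0-9]{3}|james bond)",
]

part_widths = [width(p) for p in parts]
# Widths of consecutive parts simply add up.
total = (sum(lo for lo, hi in part_widths), sum(hi for lo, hi in part_widths))

for pattern, (lo, hi) in zip(parts, part_widths):
    print("%-26s min=%d max=%d" % (pattern, lo, hi))
print("total: %s" % (total,))
```

The optional prefix contributes 0 to 7 characters, "agent" exactly 5, the whitespace exactly 1, and the alternation 3 to 10, which adds up to the same 9 and 23 reported for the whole pattern.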


It certainly is more descriptive.
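
Two caveats are worth keeping in mind. First, for open-ended quantifiers such as `+` or `{3,}` the reported maximum is a huge sentinel value rather than a real limit, so it should not be shown to users. Second, `re.match` only anchors the start of the string, so the maximum is only a true bound when the whole string has to match. The following is a rough sketch of one way to handle both, not a definitive implementation: `length_hint`, `describe` and `ID_PATTERN` are hypothetical names, `re.fullmatch` requires Python 3.4+, and the import fallback assumes the parser is available as `re._parser` on Python 3.11+.

```python
import re

try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # earlier versions, as in the post

ID_PATTERN = r"(secret\s?)?agent\s([0-9]{3}|james bond)"


def length_hint(pattern):
    """Describe the allowed match length in words (hypothetical helper)."""
    lo, hi = sre_parse.parse(pattern).getwidth()
    # Open-ended quantifiers such as + or {3,} report a sentinel maximum
    # of at least MAXREPEAT, which is meaningless to show to a user.
    if hi >= sre_parse.MAXREPEAT:
        return "at least %d characters" % lo
    if lo == hi:
        return "exactly %d characters" % lo
    return "between %d and %d characters" % (lo, hi)


def describe(user_id, pattern=ID_PATTERN):
    """Return the feedback message for user_id instead of printing it."""
    lo, hi = sre_parse.parse(pattern).getwidth()
    if not lo <= len(user_id) <= hi:
        return "User ID %s should be %s" % (user_id, length_hint(pattern))
    # fullmatch also rejects trailing characters, which re.match
    # would let through.
    if not re.fullmatch(pattern, user_id):
        return "Incorrect user ID: %s" % user_id
    return "OK: %s" % user_id


print(length_hint(r"[a-z]{3,}"))
print(describe("agent j"))
print(describe("agent 007xyz"))
```

With plain `re.match`, an input like 'agent 007xyz' would pass both the length check and the pattern check; matching the whole string closes that gap.
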
Cheers!

~KR


