Submit Blog  RSS Feeds

Thursday, August 30, 2012

Extracting data from excel spreadsheets (xls) using xlrd

So what do you usually say when somebody asks you to extract some data from an excel spreadsheet for additional processing? I believe something like: "--beep--, can't you convert it to CSV?" (censored). Usually converting a spreadsheet to CSV is THE way to go.

Unfortunately sometimes such spreadsheets have to stay up to date and people keep modifying them. And when it comes to using formulas, maintaining a spreadsheet is a must (CSV preserves only final values).

I had such a situation in one of my past projects, clients uploaded XLS files via a web form and a python script had to decode appropriate evaluated cells and load them to the database. There are many platform dependent (windows) packages for this purpose requiring some M$ Office DLLs present, but since I didn't want to deploy this application on windows I started looking for a cross-platform solution. I decided to used xlrd, a cross-platform tool compatible with python 2.7.  You can get it using pip:

~ $ sudo pip install xlrd

Now for an example:

import xlrd

#open the spreadsheet
workbook = xlrd.open_workbook("test.xls")

#select the first sheet
sheet = workbook.sheet_by_index(0)

#get data from column C (2), row 10 (9)
#non-negative numeration
data = sheet.cell(colx=2,rowx=9).value
print "Cell C,10: %s" % data

It's quite easy, however its best to implement some mapping function, so you don't have to evaluate the column number for every letter (cell AXC, 19221 could be quite tricky ;-))

~KR

No comments:

Post a Comment

free counters