
Thursday, April 19, 2012

Search and replace many files based on regular expressions

In the past weeks our team has put a lot of effort into scaling our system so that it can handle more traffic. A good way to reduce the load on the primary server is to serve static files from another machine. Django supports this: the MEDIA_URL setting in the settings file acts as a prefix for media file URLs, including images, CSS and JS files... well, at least it should. As it turned out, our template files (HTML) contained hard-coded absolute paths instead. I decided this was a great opportunity to fix those templates. There were a lot of files to check, and each of them contained many media file references.

This is where bash lends a helping hand. Since I did not intend to spend the whole morning copy-pasting through hundreds of entries, I wrote a script that does the work for me:

  1 for f in $(find . | grep '\.html$' | xargs grep -EH '"/media[^"]+[a-z]"' | cut -d ":" -f1 | sort | uniq)
  2 do
  3     echo "$f"    # show which file is being rewritten
  4     sed -r 's/(src|href)="\/media\/([^"]*\.[a-z]+)"/\1="{{MEDIA_URL}}\2"/g' "$f" > "$f.tmp"
  5     mv "$f.tmp" "$f"    # replace the original with the rewritten copy
  6 done
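
The file selection on the first line can be checked on its own before anything is rewritten. Below is a minimal sketch of that step, not the script itself: it assumes GNU grep (for `-r` and `--include`) and swaps the find/xargs/cut pipeline for `grep -l`, which prints each matching file name once. The function name `list_media_templates` and the throwaway directory layout are made up for the illustration.

```shell
# Hypothetical helper: list the .html files under a directory (default: .)
# that contain a quoted path starting with /media.
# Assumes GNU grep for -r and --include.
list_media_templates() {
    grep -rlE --include='*.html' '"/media[^"]+[a-z]"' "${1:-.}" | sort
}

# Build a small throwaway tree to try the selection on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/templates"
printf '<img src="/media/img/logo.png">\n<img src="/media/img/bg.png">\n' \
    > "$demo_dir/templates/base.html"
printf '<a href="/about/">About</a>\n' > "$demo_dir/templates/about.html"
printf 'see "/media/img/logo.png"\n' > "$demo_dir/notes.txt"

# Only base.html should be listed, and only once, even though it has two
# matching lines; about.html has no /media path and notes.txt is not HTML.
matches=$(list_media_templates "$demo_dir")
printf '%s\n' "$matches"

rm -rf "$demo_dir"
```

Running the listing first gives a chance to review which templates would be touched.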

So what does it do? The script iterates over all HTML files in the current directory and its subdirectories that contain an absolute path starting with /media enclosed in quotation marks (the sort and uniq at the end of the pipeline ensure that each file is processed at most once). Line no. 4 replaces the absolute path with a template variable. For example, it changes:

(...) src="/media/some_path/some_image.jpg" (...), to
(...) src="{{MEDIA_URL}}some_path/some_image.jpg" (...)
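
The substitution can be tried on a single line before letting it loose on real templates. The sketch below pipes one sample line through the same expression; the function name `rewrite_media` and the sample markup are invented for the illustration, and `-E` is used in place of `-r` (GNU sed treats the two options the same).

```shell
# Hypothetical helper: apply the substitution from the script to stdin.
rewrite_media() {
    sed -E 's/(src|href)="\/media\/([^"]*\.[a-z]+)"/\1="{{MEDIA_URL}}\2"/g'
}

# One media reference and one ordinary link that must stay untouched.
sample='<img src="/media/some_path/some_image.jpg"> <a href="/about/">About</a>'

result=$(printf '%s\n' "$sample" | rewrite_media)
printf '%s\n' "$result"
# prints: <img src="{{MEDIA_URL}}some_path/some_image.jpg"> <a href="/about/">About</a>
```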

Using back references (groups) is essential: without them the search/replace context would be insufficient, which could result in modifying parts that don't refer to static media content. The first group captures the attribute name (href or src), while the second group captures the file path that follows /media/.
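
To see why the attribute group matters, here is a small comparison under made-up input. The "loose" expression is my own strawman, not something from the script: it rewrites every quoted /media/ path. The "strict" one is the expression from line no. 4 (again with `-E` instead of `-r`).

```shell
# Sample line: a real media reference plus a quoted path in plain text.
line='<script src="/media/js/app.js"></script> <p>See "/media/readme.txt" for details.</p>'

# Loose (strawman): no attribute context, so any quoted /media/ path matches.
loose=$(printf '%s\n' "$line" |
    sed -E 's/"\/media\/([^"]*)"/"{{MEDIA_URL}}\1"/g')

# Strict (as in the script): the path must be the value of src or href.
strict=$(printf '%s\n' "$line" |
    sed -E 's/(src|href)="\/media\/([^"]*\.[a-z]+)"/\1="{{MEDIA_URL}}\2"/g')

printf 'loose:  %s\n' "$loose"
printf 'strict: %s\n' "$strict"
```

The loose version also rewrites the quoted path inside the paragraph text, while the strict version leaves it alone and only changes the src attribute.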

Cheers!
