Everybody familiar with MySQL knows mysqldump, the ultimate MySQL backup tool. It has some more interesting features, though: it can be used to dump only specific tables, or even to selectively extract data from the database.
So if you want to dump a single table, you should execute:
mysqldump -u db_user -p db_name table_name > table_and_data.sql
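For the record, such a dump can later be restored simply by feeding the file back to the mysql client (same hypothetical names as above):

mysql -u db_user -p db_name < table_and_data.sql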
This is the default behaviour, quite useful for recreating tables or making backups. But as mentioned before, we just want the data. Fortunately mysqldump comes with many options: --no_create_info, for instance, dumps only the data, wrapped in INSERT INTO statements. Another nice option, --where, lets you provide custom row filtering rules (a plain MySQL WHERE clause). Let's try a query that dumps only those rows of a table that are related, through a foreign key, to rows of another table matching a condition:
mysqldump -u db_user -p db_name table_name --no_create_info \
  --lock-all-tables --where 'id in (select tn.id from table_name as tn
  join related_table as rt on tn.related_table_id = rt.id
  where rt.some_field = 1)' > data.sql
The idea is to use a subselect to get the table_name rows that have foreign keys pointing at related_table rows that meet the rt.some_field = 1 requirement. This works well; however, if you access tables not specified in mysqldump's command parameters, you have to lock them yourself. In this example I use --lock-all-tables, which is overkill (if you're using a transactional storage engine, such as InnoDB, you can use --single-transaction instead - no table locking will be required).
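So for InnoDB the same dump could look like this (a sketch, same hypothetical names as above):

mysqldump -u db_user -p db_name table_name --no_create_info \
  --single-transaction --where 'id in (select tn.id from table_name as tn
  join related_table as rt on tn.related_table_id = rt.id
  where rt.some_field = 1)' > data.sql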
Cheers!
~KR
Tuesday, November 27, 2012
Exporting MySQL queries into a CSV file.
Every now and then I need to do some non-standard data processing, which requires grabbing a bit of data from a database. Usually these "non-standard data processing things" are simple scripts, and extending them to fetch data directly from a database would kill their main advantages: simplicity and a short setup time.
CSV files are way better for this kind of processing. Since the data is located in a database, not a CSV file, we'll have to export it. Now there are two basic ways:
a) The MySQL geek way.
b) The bash/sh geek way.
So let's start with the pure MySQL way. Usually when queries are made using the standard mysql console client, the retrieved data is formatted into something that should resemble a table... well, maybe it does, but "broad" result tables aren't readable anyway.
select * from some_table into outfile "/tmp/output.csv"
fields terminated by ";" lines terminated by "\n";
All we have to do is change the field and line terminators to match our needs. As for the file, you should pass a path that is writable for the mysql server user. If the file already exists, or the MySQL server has no write permissions, the query will fail.
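If the exported fields may themselves contain semicolons, it is safer to quote them - the outfile syntax supports this too (a sketch, same hypothetical table as above):

select * from some_table into outfile "/tmp/output.csv"
fields terminated by ";" optionally enclosed by '"'
lines terminated by "\n";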
Let's move on to the bash geek way - personally I don't fancy MySQL tricks, therefore I usually rely on bash. So here we go:
echo "select * from some_table;" | \ mysql -u my_usr -p my_db | sed 's/\t/;/g' | \ tail -n +2 > /home/chriss/output.csvWe execute a query, passing it via an input stream to the MySQL client. The client returns a tab separated list of fields (one per line), sed replaces those with semicolons. Next we dispose of the first line (using tail), since it contains column names. Finally we save the output as a file. Unlike in the MySQL way, you can write the file according to your write permissions, not mysql's users.
Both approaches have their advantages and disadvantages, so which one to use should depend on the context.
Wednesday, June 13, 2012
Django "is substring" like filter (MySQL)
Recently I came across the problem of selecting rows from a table on the condition that one of the fields is a substring of a given phrase. It's a bit hard to explain, but in short I wanted to achieve the equivalent of:
filter(lambda x: "some phrase".find(x.some_field) >= 0, MyModel.objects.all())
on the SQL/ORM level.
So I searched the django documentation again and again, and failed to find anything useful. If the problem cannot be solved on the ORM level, it must be solved with raw SQL:
SELECT * FROM myapp_mymodel WHERE "some phrase" LIKE CONCAT('%',some_field,'%');
Since the presented where clause cannot be generated using the QuerySet.filter method, we have to use the extra method instead. The django equivalent looks like this:
MyModel.objects.extra(where=["%s LIKE CONCAT('%%', some_field, '%%')"],
                      params=["some phrase"])
We should remember that with MySQL's default collations the SQL LIKE operator is case insensitive (in this case that is desirable); if you want a case sensitive filter, try using LIKE BINARY instead.
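The case sensitive variant would then look like this (a sketch, untested):

MyModel.objects.extra(where=["%s LIKE BINARY CONCAT('%%', some_field, '%%')"],
                      params=["some phrase"])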
~KR
Thursday, May 10, 2012
Dropping a multi-column unique constraint in MySQL with Django South
South is a great tool for managing database migrations (compatible with Django). It generates migrations by analysing the difference between the current data model and the previous one (stored in migration script files). However, strange things may occur if you try to drop a multi-column unique constraint (the django model defines it as unique_together). For example, we have:
1 class MyModel(models.Model):
2     class Meta:
3         app_label = 'myapp'
4         unique_together = (('field1', 'field2',),)
We remove line 4, and run schemamigration:
~ ./manage.py schemamigration myapp --auto
South generates the following forward migration:
class Migration(SchemaMigration):
    def forwards(self, orm):
        db.delete_unique('myapp_mymodel', ['field1', 'field2'])
Let's try to execute it:
~ ./manage.py migrate myapp
And we get something like this:
ValueError: Cannot find a UNIQUE constraint on table myapp_mymodel, columns ['field1', 'field2']
We can investigate it using the mysql client:
mysql> SHOW CREATE TABLE myapp_mymodel;
(...)
UNIQUE KEY `myapp_mymodel_field1_667dc28f4f7b310_uniq` (`field1`,`field2`),
(...)
So the unique constraint really exists, but South fails to drop it. To solve this problem we have to drop that index ourselves. We should modify the forward migration in the following way:
class Migration(SchemaMigration):
    def forwards(self, orm):
        import south
        south.db.db.execute(
            'drop index myapp_mymodel_field1_667dc28f4f7b310_uniq '
            'on myapp_mymodel;')
That does the trick! Farewell multi-column unique constraints!
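If the migration should also be reversible, the backwards method of the same Migration class can recreate the constraint - a sketch using South's db.create_unique:

    def backwards(self, orm):
        db.create_unique('myapp_mymodel', ['field1', 'field2'])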
~KR