Sunday, March 20, 2011

MySQL partitioning with ActiveRecord

I want to take advantage of the new partitioning in MySQL 5.1 but using a standard ActiveRecord model. The problem I have encountered is that to use the partitioning the primary key on the table must include the columns used in the partitioning function. (http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-partitioning-keys-unique-keys.html)

Now, I want to have a auto-incrementing 'id' field (which is usually used as the primary key in Rails) but I want to partition using another column, in this case 'task_id'. Crucially I don't want to use composite primary keys in my Rails application. My question is this:

If I set up the table as follows:

"CREATE TABLE `annotations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `task_id` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `value` text,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`, `task_id`),
  KEY `index_annotations_on_task_id` (`task_id`),
  KEY `index_annotations_on_user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
PARTITION BY RANGE(task_id)(
PARTITION 1_to_5 VALUES LESS THAN (6),
PARTITION 6_to_10 VALUES LESS THAN (11),
PARTITION 11_to_15 VALUES LESS THAN (16))"

where I have the primary key composed of the 'id' and 'task_id' columns, and I don't do anything special with the ActiveRecord model, am I setting myself up for any headaches in the future?

Basically, I'm creating a table with a primary key based upon two columns but not telling ActiveRecord about the second column as I still have an auto-incrementing id field - can anyone see any problems with this approach?

From stackoverflow
  • I believe what you describe, create the ActiveRecord model then alter it directly in MySQL will work. The possible down side is that Rails will not know about 'task_id' and of course would not include it in the queries. The problem being that the major advantage of partitioning is the query optimizer would see that the query included "task_id = 7" and know the query only needed to be run in the second partition.

    You should still see some advantages as joins (particularly from the task table) would include this key and the query optimizer could do the right thing.

    Rails / Active Record also has a habit of updating every column regardless of if it has changed. MySQL usually would notice your were updating a column to the value that it was already - but watch this because if MySQL thinks it needs to move a record from one partition to another your updates would take a hit.

  • Although not exactly what you are looking for, as an alternative have you seen DataFabric? It is an application level way of sharding your data across multiple databases. It also gives you a way to use master->slave replication. Like I said, not exactly the same as database level partitioning, but seems like it would be easier to implement.

  • OK, thanks for your comments. Yes I have been looking at DataFabric and this looks to have some pretty useful features for sharding and replicating databases. Anyway, thanks for the input, I'll have to read around a little more before making a decision.

0 comments:

Post a Comment