Friday, April 15, 2011

How do I UPDATE a row in a table or INSERT it if it doesn't exist?

I have the following table of counters:

CREATE TABLE cache (
    key text PRIMARY KEY,
    generation int
);

I would like to increment one of the counters, or set it to zero if the corresponding row doesn't exist yet. Is there a way to do this without concurrency issues in standard SQL? The operation is sometimes part of a transaction, sometimes separate.

The SQL must run unmodified on SQLite, PostgreSQL and MySQL, if possible.

A search yielded several ideas which either suffer from concurrency issues, or are specific to a database:

  • Try to INSERT a new row, and UPDATE if there was an error. Unfortunately, the error on INSERT aborts the current transaction.

  • UPDATE the row, and if no rows were modified, INSERT a new row.

  • MySQL has an ON DUPLICATE KEY UPDATE clause.

EDIT: Thanks for all the great replies. It looks like Paul is right, and there's not a single, portable way of doing this. That's quite surprising to me, as it sounds like a very basic operation.

From stackoverflow
  • I don't know that you are going to find a platform-neutral solution.

    This is commonly called an "UPSERT".

    See some related discussions:

    Ryan Graham : Is that solution really applicable? and portable?
  • If you don't have a common way to atomically update or insert (e.g., via a transaction) then you can fallback to another locking scheme. A 0-byte file, system mutex, named pipe, etc...

  • Could you use an insert trigger? If it fails, do an update.

    depesz : Trigger (at least in PostgreSQL) is running when the command worked. i.e. you can't have trigger that runs when base command failed.
    Michael Todd : Hunh. Didn't know that. Thanks.
  • MySQL (and subsequently SQLite) also support the REPLACE INTO syntax:

    REPLACE INTO my_table (pk_id, col1) VALUES (5, '123');
    

    This automatically identifies the primary key and finds a matching row to update, inserting a new one if none is found.

    Kyle Cronin : > (and subsequently SQLite) ...you totally stole that from my answer :-)
    andygeers : Sorry - hope that's not considered rude?
    Learning : if you found it useful , you could've atleast upvoted it :)
    Evan : Actually, to be specific, MySQL's REPLACE always does an insert, but it will delete the row if it already exists, first. http://dev.mysql.com/doc/refman/4.1/en/replace.html
    Zoredache : It is important to understand that it is an insert+delete and never and update. The consequence of this, is that you will always want to make sure when you do a replace, you should always include data for all the fields.
    j_random_hacker : +1 Zoredache, that's important info.
  • SQLite supports replacing a row if it already exists:

    INSERT OR REPLACE INTO [...blah...]
    

    You can shorten this to

    REPLACE INTO [...blah...]
    

    This shortcut was added to be compatible with the MySQL REPLACE INTO expression.

    Learning : +1 for sqlite context.
  • Standard SQL provides the MERGE statement for this task. Not all DBMS support the MERGE statement.

  • In PostgreSQL there is no merge command, and actually writing it is not trivial - there are actually strange edge cases that make the task "interesting".

    The best (as in: working in the most possible conditions) approach, is to use function - such as one shown in manual (merge_db).

    If you don't want to use function, you can usually get away with:

    updated = db.execute(UPDATE ... RETURNING 1)
    if (!updated)
      db.execute(INSERT...)
    

    Just remember that it is not fault proof and it will fail eventually.

  • I would do something like the following:

    INSERT INTO cache VALUES (key, generation)
    ON DUPLICATE KEY UPDATE (key = key, generation = generation + 1);
    

    Setting the generation value to 0 in code or in the sql but the using the ON DUP... to increment the value. I think that's the syntax anyway.

    Fire Crow : I've used this solution in php/mysql applications it works very well.
  • the ON DUPLICATE KEY UPDATE clause is the best solution because: REPLACE does a DELETE followed by an INSERT so for an ever so slight period the record is removed creating the ever so slight possibility that a query could come back having skipped that if the page was viewed during the REPLACE query.

    I prefer INSERT ... ON DUPLICATE UPDATE ... for that reason.

    jmoz's solution is the best: though I prefer the SET syntax to the parentheses

    INSERT INTO cache VALUES 
    SET key = 'key', generation = 'generation'
    ON DUPLICATE KEY 
    UPDATE key = 'key', generation = (generation + 1)
    ;
    

0 comments:

Post a Comment