Tuesday, November 30, 2010

Databases and Threads

The usual disclaimer applies to this post: millions of people probably already know this, and will be thinking 'What? And you call yourself a programmer??'

What could possible be wrong with the following function:-

public int RunIdentityInsert(String sql) throws SQLException {
Statement stmt = (Statement) conn.createStatement();
stmt.execute(sql);
return getScalarAsInt("SELECT @@IDENTITY AS NewID");
}

How about the bane of my programming life, threads? I've got loads of programs that have been running on servers for years using the above code, and mostly without a problem. Until yesterday.

My function added a record to a table containing about 6,000 rows, but the insert returned an id of about 250,000. It was only after looking at all the other possible reasons that this probably cause hit me: what if another thread ran another insert statement at pretty much exactly the same time? That would surely cause the same symptoms.

To be honest, I don't actually know if that was the cause, but it seems to me that the code should look like this:-


public int RunIdentityInsert(String sql) throws SQLException {
synchronized (conn) { // Need in case another thread also does an insert!
Statement stmt = (Statement) conn.createStatement();
stmt.execute(sql);
return getScalarAsInt("SELECT @@IDENTITY AS NewID");
}
}

4 comments:

Charlie said...

You use threads without using synchronized?!?!

How haven't you killed the Internet by now!

Always, ALWAYS, sychronized (foo) { } when doing threading in Java.

Steve said...

True, but your statement needs qualifying better as I don't want to pepper every function with a sycnhronized(). Something like "always use synchronized when... accessing databases"?

Anonymous said...

You can avoid synchronized{} all together if you batch your insert and select statements. (I'm assuming your DB back-end has transactional support.) I write multi-threaded programs all the time in Java, and I don't even worry about synchronization at the database level because the DBMS takes care of it for me.

Steve said...

I'm glad to hear that, since after adding the sync() the database inserts have slowed down a lot. I don't think a batch update will work here as I need the auto_increment id for each insert. Maybe a solution is to have two insert() functions, one that's syncd() and one that's not (where the id isn't required).