Upsert for sqlalchemy
I've been working on a codebase that included someone's implementation of an upsert function for sqlalchemy. It's a problem that you can approach a bunch of different ways, including at the SQL layer, but a prior engineer had written a method in a mixin class that boils down to:
```python
existing_val = getattr(existing, column.name, None)
new_val = getattr(self, column.name, None)
if new_val != existing_val:
    setattr(existing, column.name, new_val)
```
The reason I ended up spending a morning with this method is that there were nulls showing up in timestamp columns, columns which shouldn't have been null. There was only one place that explicitly set that column, so it seemed like it had to be happening as part of some method that worked on the whole instance. `upsert` stood out like a sore thumb there.
You might have already noticed that this method is a bit naive in that it only cares that `new_val` and `existing_val` are different. In most cases that makes sense: update them if they've changed. The problem occurs when you are updating based on an object that was never read from the database, because any column you never set on that object simply reads back as `None`.
In this case, an app was scraping data that largely didn't change and using a key based on that data, so it was very reliably doing a full update of that data into the database. The problem started when I added timestamp columns to keep track of the last time related data had been scraped and, suddenly, that column was showing up null after it had been filled in. What was happening was that the new instance of the class, built only from the scraped data, had no timestamp information, so `if new_val != existing_val:` was effectively `if None != some_timestamp:`, and then `None` got assigned to the field.
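To make the failure concrete, here is a minimal, self-contained reproduction, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The `Item` model and the `naive_copy` helper are hypothetical stand-ins for the real model and the mixin's method; the loop body is the comparison quoted at the top of the post.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):
    # Hypothetical model standing in for the scraped data.
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    last_scraped = Column(DateTime, nullable=True)


def naive_copy(new, existing):
    # Hypothetical helper wrapping the original comparison: copy every column that differs.
    for column in Item.__table__.columns:
        existing_val = getattr(existing, column.name, None)
        new_val = getattr(new, column.name, None)
        if new_val != existing_val:
            setattr(existing, column.name, new_val)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Item(id=1, name="widget", last_scraped=datetime(2024, 1, 1)))
    session.commit()

    # A fresh instance built only from scraped data; last_scraped is never set.
    scraped = Item(id=1, name="widget")
    naive_copy(scraped, session.get(Item, 1))
    session.commit()

    result = session.get(Item, 1).last_scraped
    print(result)  # None: the stored timestamp was overwritten
```

The fresh `Item` is transient, so reading its unset `last_scraped` yields `None`, and the `!=` check treats that as a change worth writing.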
My first thought was to use `if new_val is not None and new_val != existing_val:`, which would handle the case that was causing the problem, but would never allow you to set a column to `NULL`, which is a legitimate thing to want to do. So I went in search of another way to handle this.
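A small pure-Python sketch of that `is not None` guard shows the trade-off. `SimpleNamespace` objects stand in for mapped instances here (an assumption to keep it database-free), and `guarded_copy` is a hypothetical helper. An explicit `None` looks exactly like a value that was never set, so there is no way to ask for `NULL`.

```python
from types import SimpleNamespace


def guarded_copy(new, existing, names):
    # Hypothetical helper: copy differing values, but skip anything that is None.
    for name in names:
        new_val = getattr(new, name, None)
        existing_val = getattr(existing, name, None)
        if new_val is not None and new_val != existing_val:
            setattr(existing, name, new_val)


existing = SimpleNamespace(name="widget", last_scraped="2024-01-01")

# The caller renames the item and deliberately wants to clear last_scraped.
update = SimpleNamespace(name="gadget", last_scraped=None)
guarded_copy(update, existing, ["name", "last_scraped"])

print(existing.name)          # gadget: ordinary updates still go through
print(existing.last_scraped)  # 2024-01-01: the requested NULL was ignored
```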
I came across `sqlalchemy`'s `inspect` function, which gives you access to a lot of the bookkeeping that `sqlalchemy` does under the hood. I found that I could get information about the state of an attribute on an instance, including whether it had been set or not, by getting an `AttributeState` object and looking at its `loaded_value` attribute, i.e.:
```python
from sqlalchemy import inspect
from sqlalchemy.orm.attributes import NO_VALUE
from sqlalchemy.orm.state import AttributeState, InstanceState

instance_state: InstanceState = inspect(instance)
attribute_state: AttributeState = instance_state.attrs['column_name']
should_update_column = attribute_state.loaded_value is not NO_VALUE
```
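The distinction `loaded_value` draws can be checked directly on two transient instances. This is a sketch assuming SQLAlchemy 1.4 or 2.0, and the `Item` model is hypothetical. Plain attribute access returns `None` for both objects, but only the one where `last_scraped` was never assigned reports `NO_VALUE`.

```python
from sqlalchemy import Column, DateTime, Integer, inspect
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm.attributes import NO_VALUE

Base = declarative_base()


class Item(Base):
    # Hypothetical model used only for this illustration.
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    last_scraped = Column(DateTime, nullable=True)


never_set = Item(id=1)                       # last_scraped never assigned
set_to_none = Item(id=1, last_scraped=None)  # last_scraped explicitly None

# Check loaded_value first: it reports what is stored on the instance itself.
never_set_unassigned = inspect(never_set).attrs["last_scraped"].loaded_value is NO_VALUE
set_to_none_unassigned = inspect(set_to_none).attrs["last_scraped"].loaded_value is NO_VALUE

print(never_set_unassigned, set_to_none_unassigned)  # True False

# Ordinary attribute access cannot tell the two apart.
print(never_set.last_scraped, set_to_none.last_scraped)  # None None
```

As a precaution, the sketch reads `loaded_value` before touching the attributes the ordinary way, so the plain reads cannot influence what the inspection reports.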
So the version of the code from the beginning of the post became:
```python
new_val = getattr(self, column.name)
# It's only useful to set the new value if the
# loaded_value of self's attribute is not NO_VALUE,
# meaning it has actually been set to some value by
# the user. (self_attribute_state holds inspect(self).attrs.)
if self_attribute_state[column.name].loaded_value is not NO_VALUE:
    setattr(existing, column.name, new_val)
```
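Putting the pieces together, here is a self-contained sketch of a mixin built on the same `loaded_value` check, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. `UpsertMixin`, `copy_set_columns_onto`, and the `Item` model are hypothetical names, not the original codebase's. It exercises both behaviours the post is after: an unset timestamp is left alone, and an explicit `None` still clears the column.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import NO_VALUE

Base = declarative_base()


class UpsertMixin:
    # Hypothetical mixin; the name and method signature are illustrative only.
    def copy_set_columns_onto(self, existing):
        """Copy onto `existing` only the column attributes explicitly set on self."""
        state = inspect(self)
        for prop in state.mapper.column_attrs:
            key = prop.key
            # NO_VALUE means the attribute was never assigned on self,
            # so whatever `existing` already holds is left untouched.
            if state.attrs[key].loaded_value is not NO_VALUE:
                setattr(existing, key, getattr(self, key))
        return existing


class Item(UpsertMixin, Base):
    # Hypothetical model for the demonstration.
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    last_scraped = Column(DateTime, nullable=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Item(id=1, name="widget", last_scraped=datetime(2024, 1, 1)))
    session.commit()

    # Scraped data without a timestamp: the name changes, last_scraped survives.
    Item(id=1, name="gadget").copy_set_columns_onto(session.get(Item, 1))
    session.commit()
    kept = session.get(Item, 1)
    kept_name, kept_ts = kept.name, kept.last_scraped

    # An explicit None is still honoured, so the column can be cleared.
    Item(id=1, last_scraped=None).copy_set_columns_onto(session.get(Item, 1))
    session.commit()
    cleared_ts = session.get(Item, 1).last_scraped
```

One design choice differs from the snippet above: the sketch walks the mapper's `column_attrs` and uses each attribute's `key` instead of `__table__.columns` and `column.name`, so it keeps working if a mapped attribute is named differently from its database column.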