Obergefell v. Hodges: the database engineering perspective

Previously

All the computer systems that exist exist to support requirements and processes arising in the real world. In particular, this includes databases intended to store personal information about human beings, such as who is married to whom, when they got married, when (if ever) they got divorced, and so on. Accurately storing this information is important because marriage often confers real-world legal privileges and obligations to the married people. It is also often important to the married people themselves that their union be public, and well-recorded, and recognised by the organisation doing the recording. The responsibility for accurately and reliably storing this information falls in part on the database administrator (DBA).

Each jurisdiction in the world imposes its own constraints upon who is allowed to marry whom. These constraints can be arbitrarily wide or narrow, but nevertheless they always exist. Because computer systems are not just structureless masses of information, marriage databases also impose constraints upon what marriages can or cannot be stored inside them. As a result, one of the responsibilities of the DBA is to ensure that their database is structured to be capable of storing all possible legal marriages. It may also be the DBA's responsibility to ensure that the database is incapable of storing illegal marriages (or combinations of marriages)... however, given that the laws surrounding this question can be arbitrarily complicated, the extent of this responsibility may be limited to only very simple rules, such as "nobody may marry themselves", with the remaining responsibility falling on application programmers and/or real people entering the data.

Today (2015-06-26)'s SCOTUS ruling changes the constraints of marriage across a great deal of the United States. This change is long-anticipated and the first marriages under the new constraints will have already happened by the time I finish writing this. This means DBAs (and many other people!) are going to have to react quickly to adjust their systems to accommodate marriages which, previously, there was no need to accommodate. And, depending on how the old systems were set up, and how well DBAs have foreseen this requirement, this may be an easy job, or a difficult one.

In my 2008 essay Gay marriage: the database engineering perspective I went through more than a dozen database schemas for storing marriages, gradually iterating towards more and more flexibility. That essay starts with some laughably stupid schemas and ends up considering some dubiously probable futures, such as scenarios where Alice is married to Bob but Bob is not married to Alice, or where Alice and Bob are somehow double-married, such that after a divorce, they would merely be married.

To investigate the specific ramifications of today's ruling, however, here's the schema we're probably starting with:

people
- id
- birthdate
- name
- gender

marriages
- id
- husband_id
- wife_id
- marriage_date
- divorce_date

Already the constraints on a schema like this are quite complicated. husband_id and wife_id are both foreign keys for column people.id. Check constraints ensure that the value of marriages.husband_id always points to a people row with gender set to "male" and the value of marriages.wife_id always points to a row with gender set to "female". (Exactly how the gender column should be structured is outside the scope of this essay, but the values "male" and "female", at least, should be available. Structuring the name column is even further out of scope, because yikes.) divorce_date is nullable. Probably there ought to be another check constraint which ensures that divorce_date doesn't come before marriage_date.

It might be required to incorporate some sort of check for duplicate combinations of husband_id and wife_id... but then again, this could make it impossible for a couple to e.g. marry in 1994, separate in 2009 and then remarry in 2015.

That's what we've got. Naturally, any of these constraints could be implemented at the application level instead, or even manually by the person entering the marriage data in the first place, and some of them probably should be implemented in this way.

Now the ruling comes through and we need to support same-sex marriage. Naively this would be a relatively simple matter. First, rename a few columns:

people
- id
- birthdate
- name
- gender

marriages
- id
- partner_1_id
- partner_2_id
- marriage_date
- divorce_date

And then, drop the check constraints for gender.

All done? Well, maybe. You almost certainly broke some applications by renaming those columns, but leaving them with their old names would be regrettable — and possibly deeply insulting to many couples, as well as highly confusing to newcomer DBAs. Those applications are going to need rewriting anyway.

A few edge cases are suddenly freed up. Married and divorced individuals may now change gender at will; previously, the database would forbid this due to the violated check constraint. In fact, it's no longer necessary to store gender information for the purposes of this now-missing constraint, and the column people.gender, and all the data in it, could in theory be dropped entirely.

But the more interesting thing is that you just incidentally let in a whole bunch of edge cases. Up until now, it wasn't possible for an individual to marry themself. Now it is, and you need a new check constraint to ensure that partner_1_id and partner_2_id are different. Regardless of concerns about duplicate rows/couples remarrying, you also now have to contend with swapped partners: Alice marries Eve, and also Eve marries Alice, resulting in two rows recording the same marriage. This can typically be prevented by ensuring that partner_2_id is greater than partner_1_id, which would incidentally also prevent self-marriage as described above. Note that this could in turn invalidate previously-existing heterosexual marriages where the husband_id was lower than the wife_id. This constraint would have to be applied for future inserts only, or the disordered rows would need to be swapped.

Overall it looks like a relatively straightforward day's work for a well-prepared DBA with cooperative application developers. Of course, these are big assumptions! The difficulty of making any given alteration to a system is unbounded.

MongoDB, NoSQL and structureless data

NoSQL wasn't a thing when I wrote the original essay, but now it is.

This kind of change is apparently trivial in NoSQL databases because these are explicitly not relational databases, and lack structure, foreign keys, check constraints and so on. In theory there is nothing stopping a NoSQL DBA from creating a "marriages" collection and then just adding any kind of data to it. This includes self-marriages, convoluted weighted, directed, polygamous graphs, things not involving humans, structures not recognisable as marriages at all, and so on.

Of course in practice, it is well-understood that for NoSQL databases, the structure is still there but it is implicit, imposed by the way applications interact with the database. The analogy is like liquid filling a complicatedly-shaped vessel. The shape taken by the fluid (the data in the database) is determined by the shape of vessel (structure and behaviour of the applications).

In other words, in this situation, the DBA has nearly nothing to do. It's the application developers who need to pick up the slack.

Lessons learned?

This change was seemingly simple but had a little subtlety. Like any change in requirements, the difficulty of handling this particular new requirement is inversely correlated with one's preparedness. In the case of same-sex marriage in US states where it was previously illegal, it should have been blindingly obvious for years that this moment was coming.

This isn't about being closed-minded or homophobic or anything like that, it's just a matter of domain knowledge: if you manage a database full of X then you should probably keep your ear to the ground regarding new developments in the field of X, particularly if those developments are likely to hit you and require immediate action within hours of developing. I don't want to cast being a bad/ill-prepared database administrator as some kind of moral failing. (I say this largely to cover my own back because I, myself, am a terrible database administrator. In fact, full disclosure: I am not now, and have never been, a database administrator of any kind. All of the above advice could well be complete gibberish.)

Looking into the realistic future, it seems as if the next major change in this area is going to relate to polygamy. Some marriage DBAs reading this (if there actually are any?) may already be familiar with the challenges and constraints associated with storing polygamous marriages. Are there Mormons in your state? Boom.

These changes will be more complex than the ones we've seen here, and I'm not going to go over them for several reasons. Firstly, I kind of already did. Secondly, it doesn't strike me as being so imminent at the time of writing... although, as I mentioned, keep your ear to the ground. Thirdly, supporting a highly complex database at a time when none of that complexity is needed or used can be tiring and unrewarding, particularly if it turns out that that complexity is never going to be needed.

But fourthly, it's always possible that something else could happen instead, some new legislative development which requires breaking your formerly-perfect marriage database down and building it up again into a new shape.

You can't fully prepare for that kind of thing because you don't know what it's going to be. But there are decisions you can make now which can make future changes easier (or harder) to make. Get used to making schema changes, botching them, rolling them back, losing no data, and trying again. Make backups.

Because marriage is a big deal. By definition it is a far bigger deal to the people getting married than it is to you, the person who takes care of the computer which logs the event. After all the formidable legislative barriers have fallen, you don't want to be the jerk who raises a trivial technological barrier to replace them.

And in case it isn't obvious: all of this generalises massively. All of us who create and maintain computer systems are simultaneously creating implicit and explicit restrictions on how those systems are used, and what kind of data they can store, and what users can do with them. Careless design can make the lives of users difficult or even nightmarish, in ways which are genuinely important.

And while carelessness can be forgivable, and a failure to prepare for all possible futures is unavoidable, not fixing the problem in a timely fashion — or even not being able to fix it — is a very unhappy place for both you and the user to be.

Discussion (19)

2015-06-27 05:30:10 by Creaphis:

Out of all the possible responses to today's events, this is the one you chose to write. Very original. DBAs truly are the unsung heroes behind the face of history.

2015-06-27 06:56:58 by qntm:

My point was that DBAs can very easily be the obstruction to progress. Not being an obstruction isn't heroic, it's just the job.

2015-06-27 13:55:03 by skztr:

I was complaining all of yesterday that every article I clicked on with an interesting-sounding title *didn't* turn out to be exactly this.

2015-06-28 18:55:09 by anonymouse:

The Soviet Union clearly understood that marriage was all about databases: the traditional wedding was held at the Civil Registry, and the actual key moment of the ceremony was when the couple signed their names in the registry book, which is in a sense simply the insertion of a row into an archaic sort of database.

2015-06-28 19:06:51 by Paul:

This is interesting enough, but as you say, this should have long been anticipated by any DBA worth their random cryptography string. Anyone still creating marriage tables with husband_id and wife_id is either culturally sluggish or deliberately intransigent. Besides which, this has theoretically been problematic even before same sex marriage was an issue, since some states allow people to marry based on their post-surgery gender identity. Let's say you had employed Bruce Jenner. He was married. He was therefore listed in the husband_id field. Now he's she, Caitlyn Jenner. If she lived in a state that (prior to Friday) only allowed her to marry a man, and she got married, you'd presumably put that ID in the wife_id field. That means, to make sure you're not allowing someone who's currently married to get married to someone new, you'd need to check her ID in both the husband_id and wife_id fields (where divorce_date is NOT NULL, of course). So this has been problematic, theoretically, since state laws allowed people to get married based on their presentation gender. That's been a while. So really, since gender is already recorded in your hypothetical PEOPLE table, it seems redundant to have husband_id and wife_id fields, instead of spouse1_id and spouse2_id fields, even prior to Friday.

2015-07-01 16:52:06 by MichaelSzegedy:

@anonymouse: We still do this in Hungary. I'm not sure when the electronic step happens, if at all; writing it in the book is the important part.

2015-07-02 00:52:14 by Bill Stewart:

Creaphis, it's not actually that original a response - it's a straight-forward current-events update to (Sam?'s) 2008 article http://qntm.org/gay which *was* very original and well thought out and looks at a lot more cases. (Now for nitpicking) The people database probably needs a death_date field as well; some kinds of family/marriage law apply after death in ways that don't apply after divorce (e.g. even with mandatory monogamy, a remarried widow/widower has a legal interest in the first marriage, such as a spouse's pension.) Also, the database shouldn't enforce all the legal constraints as database integrity constraints - sometimes people are in two active marriages at once, even when that's illegal, and maybe the state wants to be able to look that up, and sometimes database records don't get recorded properly (e.g. a death or a divorce in some other jurisdiction might not get entered into the database, and the DBA should be able to detect that), and of course, a person's name isn't always a single attribute/value pair, especially if they get married or divorced or are an actor or writer or spy.

2015-07-03 10:11:26 by LNR:

I'd argue that the database should NOT be structured so as to attempt to enforce the law. That sort of thing is how you let a simple data entry error turn into a huge problem in somebody's life. For instance, say we have very well-designed rules that prevent a person from entering two marriages at once (because multiple marriage is illegal in this jurisdiction). Alice and Bob get divorced, but the clerk fails to hit Save and their marriage is still marked as active. If Alice later tries to marry Charlie, the database structure does not allow it. There is no way to document their marriage until some other clerk has time to investigate the problem and decide whether it should be fixed. This is certain to be tremendously inconvenient, and further, it may have serious legal repercussions. A marriage that is not properly documented may not legally exist! When the existence of the record is important, it makes more sense for the database to accept the entry, detect errors, and flag them for human review. That's what we generally do in the medical field. It would be stupid if the system prevented a physician from charting with something like, "Error: Can't save hysterectomy documentation for male patient." Instead, a well-designed EMR just notes the inconsistency, marks a little icon on the chart, and then if the issue is not addressed sends an automated email to the physician. (And if he doesn't fix or validate it within N days then it goes to a second human.)

2015-07-03 18:10:41 by Bork:

@LNR: Interesting. What happens if it's a physician that isn't male that doesn't fix or validate the inconsistency within N days? Does the system behave differently?

2015-07-03 21:04:16 by qntm:

I totally agree that adding the suggested constraints at the database level is not really appropriate. It's not job of the DBA to enforce that kind of rule. This, in itself - what is the DBA responsible for, what are they not responsible for - is a worthwhile question. On the other hand, no matter how the database is structured, it is always possible to produce data which cannot fit into it. It doesn't matter how hard the DBA tries to stand back and allow/enable others to do their work, every database, by virtue of its structure, imposes explicit constraints on how it can be used. The gay marriage example doesn't illustrate this case very well! That's because it's actually a relatively piffling alteration to the structure of data which needs storing, a relaxation of a fairly arbitrary constraint. Other marriage concepts, however, illustrate this point better. Polygamy is probably the next "big one" on the list. Polygamous marriages can't be stored using the schema outlined above. (Or if they can, it would be a hacky workaround.) It would be better to develop a whole new schema. Point being, even though really it is not the DBA's job to say "No, all three of you cannot get married", unfortunately this is the position that an ill-prepared DBA can find themselves in.

2015-07-03 21:06:23 by qntm:

Although, having said "no matter how the database is structured, it is always possible to produce data which cannot fit into it", it sounds like the medical community has that whole problem firmly locked down. How did you folks manage that?

2015-07-04 06:17:59 by LNR:

We write a lot of notes. Every electronic medical record allows entry of free text of arbitrary length. In fact the expectation is that free text is used often. Standard practice is that each physician writes or dictates a note about each patient encounter, in their own words. For a hospital stay they write a new note each day. Nurses and other staff write notes with widely varying frequency depending on the situation, which could be once per shift or many times in an hour. The chart contains lots of data as various checkboxes, selectors, flowsheets, and so on. That's all available for purposes of visualization and data mining. ("How has this patient's blood pressure changed over the past week?") But when you want to know the clinical impression, the rationale for a particular diagnosis, or the gist of a conversation with the patient, that doesn't easily boil down to a database entry. The text explanation is necessary.

2015-07-04 06:25:59 by LNR:

As an aside, I'll note that electronic medical records (EMR) are not yet a terribly mature technology. They were originally developed as a direct replacement for paper charts, and the first generations just mimicked the same workflow of charting on paper. It is only within the last ten or even five years that they have become ubiquitious, and have really started to take advantage of the stuff a database can do that a paper file cannot.

2015-08-23 14:50:22 by Tim McCormack:

Maybe this was covered in the original post, but do any municipal marriage databases *actually* have referential integrity on humans -- or even database IDs for them? Based on the paperwork I've seen, it seems like the schema is more like: - husband_name - husband_SSN - husband_etc - wife_name - wife_SSN - wife_etc ...and if you want to find collisions, you have to do a self-join and think very carefully about how people's names and SSNs and whatnot can change over their lifetimes.

2016-09-29 17:02:38 by Nicholas:

Having husband_id and wife_id sounds like you haven't normalized your database enough. Instead, have Person (id, name), Marriage (id, start_date, end_date) and Marriage_Participant (id, marriage_id, person_id). This would be a relatively simple way to support changing definitions and prevent you from having "person_1_id, person_2_id, etc" as your columns.

2018-05-22 16:43:24 by NoSQL new?:

Ummm...IBM (previously Lotus) Notes was NoSQL before NoSQL was cool. If structure is required, the application layer logic provides it. This makes changes to rules a lot easier to implement since there's no impact at the db level. Additionally forms and fields can have multiple synonyms so if 'husband_id becomes too restrictive, just add the synonym for 'partner_id' and the meaning remains clear and current.

2019-03-11 13:47:02 by qntm:

"Husband" and "partner" aren't synonyms.

2022-01-18 21:48:26 by Scott:

Partner1id != Partner2id solves the > problem handily, right?

2022-03-22 20:44:40 by Joshua:

What happens with old uncorrected inconsistencies is they end up in a cleanup report. There's no way to enforce "fix this by X days" so it doesn't. But every so often we still get requirements that want bonkers hard constraints. Those tend to be destructive. My favorite one: cardiac surgery result: deceased, but discharge status: alive. Yes, it happened. No, you can't clean that data up. No, it's not a case of crossover to the wrong patient's record. He was found alive in the hospital morgue.

New comment by :

Plain text only. Line breaks become <br/>
The square root of minus one: