I’ve often pondered how to record and work with ambiguous dates in software. I typically have one of two reasons:
- Scheduling: an item that needs to be done next week or some time in February or a summer Saturday afternoon. The user can schedule it for Monday, or Feb. 1, then, at the end of every day it goes undone, reschedule it for the following qualifying day. This works, but the role software is to automate that sort of responsibility. If the user has to do something the computer could do just as well, the software is lacking.
- Dating pictures: You kids today and your fancy digital cameras with your fancy exif data and your geotagging. In my day, you had to write the date on the back of the photo. By hand! Like they did in the dark ages. Some pictures from this era have complete or partial dates. Some dates were obviously written decades later. Others you could probably guess and be accurate within about 5 years. In addition to storing a date, each part of the date (day, month, year) may have differing levels of accuracy, and it would be nice to be able to store its certainty with it.
I usually work with MySQL for my application’s data storage, so I’m going to assume that’s the tool we’re storing ambiguous dates in. It’s native Timestamp/Datetime format is YYYY-MM-DD hh:mm:ss. I usually store dates this way. In some contexts I might simplify my life by storing the Unix timestamp instead, but I like the dates to easily human-readable straight out of the database. Any ambiguous date storage system must consider all six pieces of information (year, month, day, hour, minute, second). An obvious table to store this sort of information might be:
create table ambiguous_dates(
id int unsigned not null auto-increment
primary key,
year int(4) unsigned,
month int(2) unsigned,
day int(2) unsigned,
hour int(2) unsigned,
minute int(2) unsigned,
second int(2) unsigned
);
Regarding the inclusion of all six pieces of information: while a system for cataloging decades-old pictures won’t have any use for the time of day portion, the idea here is to develop the most complex version I’m likely to need, then scale it back for individual projects if necessary.
I have three issues with this implementation:
- Day of the week. The only way to deal with hat is to add another column, but that introduces the possibility that parts of the date will conflict with one another (ie, the YYYY-MM-DD portion is given as 1962-03-23, which was a Friday, but the day_of_week column claims Wednesday.)
- Certainty of each field. This is especially important if the parts of the date can conflict with on another as in the example above. We could add another six fields for year_certainty, etc, but that feels heavy-handed to me; I suspect there’s a better way.
- According to this article, it just reinvents functionality already built in to MySQL.
create table ambiguous_dates(
id int unsigned not null auto-increment
primary key,
year int(4) unsigned,
month int(2) unsigned,
day int(2) unsigned,
day_of_week int(1) unsigned,
hour int(2) unsigned,
minute int(2) unsigned,
second int(2) unsigned
);
create table ambiguous_dates_certainty(
date_id int not null,
field varchar(11) not null default '',
certainty int(1) not null default 0,
);
This is closer to what I want, but it doesn’t join gracefully with the above table (you sure can join, but you’ll get up to seven rows of results for one date, which is awkward), so you risk having to run a second query, which is expensive in terms of processing time, and still gives you two sort of awkward sets of data. Additionally, I’m not sure I like just having an int for certainty. If it’s am multi-user application, one person’s seven may be another person’s three. Also, unless very explicitly programmed otherwise, some users may see 1 as very certain and others may think of 9 as the most certain. And even then, since when do users take the time to read the whole prompt when they think they have the gist of it?
We could add a text column indicating the source of each portion of the date so that we’d know whether it was from a reliable source or somebody’s best guess half a century after the picture was taken, but that also feels excessive to me. What would the corresponding interface for that look like? Is it fair to expect to enter each portion of the date separately, then add meta information about each portion? Especially when they’re likely to be (as in the example of the picture catalog) cataloging dozens or hundreds of ambiguous dates? Eventually, whether programmer or user, you have to accept that some data will always get lost to the sands of time, even if it means never knowing how old grandma was in that picture of her with her dogs.
A compromise might be to just have one column for notes in the ambiguous_dates table. If programming for my own use/enjoyment, I don’t think I’d take that compromise. My usual MO is to try to retain as much information as possible because once forgotten, it can’t be brought back, whereas if it’s retained needlessly, it’s easy to ignore.
I don’t think I’ve made any particularly good suggestions, but it is a starting point. I googled the topic and found very little written on it (but I did find this thread about another kind of ambiguous date), although I may not be using the best terminology. I’d be interested in hearing what the good people of the Internet have to say.