A Simple Question
One of the most frequent questions that comes up in the realm of enterprise applications is:
Who has touched/modified this data?
Short questions usually have long answers, and this one is no exception. To understand the implications of this question, one has to understand the principle of temporal versioning. This essentially means that time is treated as a dimension over which changes to data are persisted and observed. A full discussion on this topic can be found here. This post identifies a new feature in SQL Server 2016 called temporal tables and contrasts it with the manual process of creating and managing temporal tables in previous versions of SQL Server. The following example provides a basis for further discussion which in turn answers the question. Imagine a table with the following schema:
CREATE TABLE [dbo].[Orders] ( [Id] [int] IDENTITY(1,1) NOT NULL, [Amount] [money] NOT NULL, [Comments] [nvarchar](4000) NULL, [LastModifiedBy] [nvarchar](255) NOT NULL, CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([Id] ASC) )
A simple orders table with Id as the primary key, Amount to track the money spent on an order, a Comments column to track qualitative details about the order, and a LastModifiedBy column to track who modified it.
In the process of consuming this table, imagine that a row was inserted:
INSERT INTO [dbo].Orders ([Amount],[Comments],[LastModifiedBy]) VALUES (5.32, 'Tall Caramel Macchiato', 'DOMAINpranav.kothare')
If we ask the question who modified this data, the answer is simple: the user DOMAINpranav.kothare was the person who last touched this row.
Now assume that this row was inserted with an error and that the amount needs to be updated from $5.32 to $4.89. Also assume that the correction was completed by a different user as follows:
UPDATE [dbo].[Orders] SET [Amount] = 4.89, [LastModifiedBy] = 'DOMAINkayla.funke' WHERE [Id] = 1
If we ask the same question again, the answer from the system’s perspective is simple: DOMAINkayla.funke is the user that last touched this row.
While the answer from the system’s perspective is simple, it is also incomplete. More often than not, due to compliance reasons, businesses not only want to know who modified the data the last time, but also who modified it since it was inserted into the table.
A naïve implementation would be to simply attach a timestamp column with well-defined semantics of how to determine valid rows versus invalid ones. But in the process of doing this, we cannot maintain Id as a primary key because we end up violating the uniqueness constraint. Including the timestamp column in the primary key is also just as bad because it confounds the process of maintaining foreign key relationships against the table. So what is the solution then?
Prior to SQL Server 2016
In earlier versions of SQL Server, the solution was to create a table with a similar schema that was identified as a history table. The following schema defines a history table that corresponds to the Orders table that we established earlier:
CREATE TABLE [dbo].[Orders_History] ( [Id] [int] NOT NULL, [Amount] [money] NOT NULL, [Comments] [nvarchar](4000) NULL, [LastModifiedBy] [nvarchar](255) NOT NULL, [ValidFrom] [datetime2] NOT NULL, [ValidTo] [datetime2] NOT NULL )
Note that Id is not a primary key and that this schema has two additional columns ValidFrom and ValidTo. These additional columns help identify the validity of the row over time. I have omitted additional DML triggers that would have to be written against the Orders table. However, the entire process would work as follows:
- When a row is inserted into the Orderstable, a row would also be inserted into for the Orders_History table with the same values for Id, Amounts, Comments, and LastModifiedBy. Furthermore, ValidFrom would have a value of the time of insertion into the Orders table and ValidTo would have the value of EndOfTimes (a sybolic value that indicates that this row is still valid, you can use a value like December 31st 9999 23:59:59).2. If a row is modified in the Orders table, a row in the Orders_History table with a matching primary key, that is still valid based on the ValidTo value, is expired by updating itsValidTo value to the time of when the update occurred on the Orders table. After this step 1 is repeated for the Orders_History table with the new values of the updated row.3. If a row is deleted in the Orders table, we would perform the same procedure as step 2, except the ValidTo value would be updated to the time of deletion from the Orders table.
So for the series of events described earlier (just an insert and update on the Orders table), we would have to do something like this via triggers:
-- Initial insert INSERT INTO [dbo].Orders_History ( [Id], [Amount], [Comments], [LastModifiedBy], [ValidFrom], [ValidTo] ) VALUES ( 1, 5.32, 'Tall Caramel Macchiato', 'DOMAINpranav.kothare', GETDATE(), '9999.12.31 23:59:59' ) -- Update DECLARE @now datetime2 = GETDATE() UPDATE [dbo].Orders_History SET [ValidTo] = @now WHERE [Id] = 1 AND [ValidTo] = '9999.12.31 23:59:59' INSERT INTO [dbo].Orders_History ( [Id], [Amount], [Comments], [LastModifiedBy], [ValidFrom], [ValidTo] ) VALUES ( 1, 4.89, 'Tall Caramel Macchiato', 'DOMAINkayla.funke', @now, '9999.12.31 23:59:59' )
At this point, if we are asked the same question again, we have the ability to dive into the history table and obtain information about all users that have modified a certain row of data in the OLTP Orders table.
While this feature allows us to provide a richer experience for end users of an enterprise application, it is non-trivial, very tedious, and stressful to manage especially if you’re trying to evolve your schema.
SQL Server 2016 CTP 2
As of SQL Server 2016 CTP 2, all this has been taken care of for us. We now have the ability to define a history table without directly having to manage the schema of the history table (even though we have the option / flexibility to do so). This in turn allows us to work with OLTP tables / data transparently and develop richer features more efficiently.
For the Orders table we can simply create the table as follows:
CREATE TABLE [dbo].[Orders] ( [Id] [int] IDENTITY(1,1) NOT NULL, [Amount] [money] NOT NULL, [Comments] [nvarchar](4000) NULL, [LastModifiedBy] [nvarchar](255) NOT NULL, [ValidFrom] [datetime2] GENERATED ALWAYS AS ROW START NOT NULL, [ValidTo] [datetime2] GENERATED ALWAYS AS ROW END NOT NULL PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([Id] ASC) ) WITH (SYSTEM_VERSIONING = ON)
That’s it! Everything else can pretty much stay the same. For more information on how you can use this feature to your advantage, see Getting Started with System-Versioned Temporal Tables.
With this new feature, it’s going to be so much easier to report on historical artifacts in database, from an audit perspective or a statistical perspective. One last note about the temporal feature is the general size of the database. Even though this feature is meant to capture data by slowly changing dimensions (SCD), slowly for any given businesses is a relative term, and database sizes can grow to be very large relatively quickly. While there is not much on the topic yet, I suspect that StretchDB, another feature of SQL Server 2016, will play a key role in mitigating the size of databases with large history tables.
EasyBox, a product developed at Easy Dynamics, leverages the concept of temporal tables to enable data auditing at a very low (database) level. The development of the product began well before 2012 and the implementation used application code to populate the history tables. Even so, it’s great to see that Microsoft has made this an OOTB feature for 2016. Read our EasyBox blog posts to learn even more about how it can improve your organization.
How do you plan to implement temporal tables to improve your work? Share your wisdom with us in a comment below! If you’re looking for inspiration, check out this video that explores a deep dive into the topic