SQL Server, Persistent Memory on NVDIMMs and DAX

Paradigm Shift!

What do I mean by that? Every once in a while a technology comes along and changes the way things are done, moves the bar…well last week Microsoft released a Channel 9 video on persistent memory using NVDIMMs and DAX on Windows 2016…then combining it with SQL Server! This is one of those technologies that moves the bar! Check it out here.

Why is this important?

Relational databases like SQL Server use a transaction log to ensure the durability of the transactional operations to the database. This is so it can ensure its data is consistent in the event of a system failure. SQL Server uses a logging protocol called write ahead logging (WAL). This means that the data change operations must be written to stable, persistent storage before the process can change data in the database file…so this means our database throughput is at the mercy of the performance of the device the changes (log records) are written to.

Another facet of this is disk latency. Historically this has been terribly inefficient, especially in the case of spinning HDDs due to seek time and write latency. Disk latency has gotten much better recently with SSDs and NVMe. To combat disk latency, SQL Server doesn’t perform a disk write for each and every log record. SQL Server writes log records to what’s called a log buffer in main memory, once the log buffer is full it or a transaction commits (or aborts) it flushes that buffer to disk in sector aligned IO operations. This means your transactions have to wait for the disk IO to complete before they’re considered hardened to disk. This is a core bottleneck in RDBMS systems.

What changed?

Well, in NVDIMMs we have a DIMM, which is normally used for main memory, is now backed with a battery backed flash array. This means we can have access to stable storage at main memory speeds, think less than 10 microseconds, not milliseconds. An order of magnitude performance gain! NVDIMMs have two modes of operation, storage mode and memory mode. Storage mode will present the device as a block device to your OS. Memory mode will present this as a memory mapped device. Windows Server 2016 has the drivers to deal with both modes. In the first case you’ll get a block device and all the code that goes along with making a block IO request, the advantage of this mode is no application code changes, it’s just a block device like any other. In the second mode, memory mode, it presents byte addressable memory to the OS as storage, but you may need to change code to take full advantage of this. So, Microsoft has developed a feature called DirectAcces (DAX) in Windows Server 2016 to present memory mode storage to the OS bypassing traditional block IO code. This is where the insanity ensues! You’ll literally have stable storage available to your applications at main memory speeds, committing to stable storage is now just a memory copy!!! Watch the video for performance numbers!

How this will impact you?

Well, in the video Lindsey Allen highlights a code change in SQL Server that modifies how the log buffer writes data. Log writes are immediately considered hardened when committed to the log buffer. We no longer have to wait for the disk IO to complete, side stepping disk latency and IO altogether. This leads to a 2x improvement in the demo application in the video. These tests compare NVMe and DAX, and NVMe is PCI-express attached flash which is as good as it gets for persistent storage today (well until memory mode NVDIMMS take over). You’ll notice that CPU spikes during their load tests, this isn’t an uncommon issue when you remove your storage bottlenecks from your system.

What’s it going to cost?

Looking online, HPE has a 8GB NVDIMM available for $800 at online wholesalers, this is in DIMM form factor. Some are built directly into the motherboard, others are on cards.

Questions that I have?

  1. Foreign Memory Access – In a NUMA system is there be a performance penalty or inconsistent performance when threads need to read or write to a foreign memory area. Log writing is single threaded, so this should be something that we can manage. It does go multithreaded in 2016, so the best thing to do here is test your workload and measure.
  2. Hypervisor Support – can we leverage this awesomeness in a virtual machine?
  3. Error Detection – how to errors percolate back up into the OS/SQL so we know when something goes wrong. Likely using standard disk IO alerts will apply here.

Well hopefully I’ve convinced you how big of a deal this is and you’re as excited as I am!

Please let reach out to me if you like to know more about this – aen@centinosystems.com