A Peek Behind the Curtain
While an SSD might look like a simple device on the surface, there is a great deal of complexity behind the scenes. In order to ensure that your SSD stays in prime working condition, the SSD controller must manage complex performance and lifetime optimization algorithms. To understand why the design and implementation of these algorithms are so crucial to your SSD experience, you first need to understand the limitations of NAND technology and what the controller does behind the scenes to compensate for them.
Writing and Erasing NAND
Before we can understand the various maintenance algorithms that the controller employs to keep your SSD neat and tidy, we need to understand a few basics about how data is read from and written to a NAND chip. Data is stored in a unit called a “page,” which is finite in size and can only be written to when it is empty. Therefore, in order to write to a page that already contains data, it must first be erased. This would be a simple process, except each page belongs to a group of pages collectively known as a “block.” While data is written in pages, it can only be erased in blocks. To help illustrate the concept of pages and blocks, let’s use an analogy. Think of the Etch-A-Sketch toys many of us used as children. You could continue drawing on the screen until it was full. Think of each artistic flourish you add to the canvas as a separate “page” and the total canvas as the “block” that all of the pages exist in. When you wanted to erase something, however, you had to shake the toy and erase everything at once. Extending this analogy, there would be many Etch-A-Sketches inside each NAND chip. This restriction has significant implications for how the controller manages the NAND flash that stores your data.
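The page and block rules above can be sketched in a few lines of Python. This is only a toy model, not how any real controller is written: the `Block` class, its method names, and the four-page block size are all assumptions made for illustration.

```python
class Block:
    """A hypothetical NAND block: pages are written one at a time, erased together."""

    def __init__(self, pages_per_block=4):
        # None marks an erased (empty) page that may still be written.
        self.pages = [None] * pages_per_block

    def program(self, page_index, data):
        # A page can only be written while it is empty.
        if self.pages[page_index] is not None:
            raise ValueError("page already holds data; erase the block first")
        self.pages[page_index] = data

    def erase(self):
        # Erasing always clears the whole block, never a single page.
        self.pages = [None] * len(self.pages)


block = Block()
block.program(0, "photo")
block.program(1, "song")

try:
    block.program(0, "new photo")   # an in-place overwrite is refused
except ValueError as err:
    print("write refused:", err)

block.erase()                       # page 0 and page 1 are wiped together
block.program(0, "new photo")
print(block.pages)
```

Note that erasing the block to free page 0 also destroys the data in page 1, which is exactly the Etch-A-Sketch problem.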
Performance Over Time
With the above limitations in mind, let’s look at what would happen if we wanted to reuse a page that contained old data. If the SSD wanted to reuse an existing, no longer valid page, the other (valid) pages in the block would have to be copied to a temporary location while the entire block was erased. The SSD would then rewrite the entire set of valid data, old and new, to the freshly erased block. This process is complex and time-consuming. Thus, the SSD controller avoids directly “overwriting” existing data in favor of working some magic through the Flash Translation Layer (FTL), a special mechanism that helps coordinate communication between the NAND flash and the host system. The host system addresses the drive using Logical Block Addressing (LBA), and the FTL maintains a logical-to-physical map that translates each logical address to a physical NAND page. As a result, physical NAND writes are not required to correspond directly to the space the host system requests. Instead of performing all of the copies described above to overwrite a piece of old data, the SSD writes the new data to the next available page and simply marks the old data as “invalid.”
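To make the “write elsewhere and mark the old copy invalid” idea concrete, here is a minimal sketch of a logical-to-physical map. The `SimpleFTL` class and its fields are hypothetical names chosen for this example; a real FTL is far more elaborate.

```python
class SimpleFTL:
    """Toy Flash Translation Layer: maps logical addresses to physical pages."""

    def __init__(self, total_pages=8):
        self.flash = [None] * total_pages   # physical pages
        self.mapping = {}                   # logical address -> physical page
        self.invalid = set()                # physical pages holding stale data
        self.next_free = 0                  # next never-written physical page

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages left; garbage collection needed")
        old_page = self.mapping.get(lba)
        if old_page is not None:
            # Do not erase anything now: just mark the old copy as invalid.
            self.invalid.add(old_page)
        self.flash[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        return self.flash[self.mapping[lba]]


ftl = SimpleFTL()
ftl.write(10, "version 1")
ftl.write(10, "version 2")   # same logical address, new physical page

print(ftl.read(10))
print(ftl.mapping[10])
print(ftl.invalid)
```

The host keeps using logical address 10 throughout. Only the map knows that the data moved from physical page 0 to physical page 1, and that page 0 is now waiting to be cleaned up.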
In reality, the technique just described only delays the inevitable – at some point the SSD will suffer some sort of performance deficit regardless of whether or not it has to perform the complex overwrite procedure above. Let’s think of what happens when the drive is full. As an SSD is filled with more and more data, there will naturally be fewer free blocks readily available. The SSD is then forced to actively consolidate valid data and prepare free blocks in order to write new data and perform maintenance. This process of moving and consolidating data takes time, which is perceived as decreased performance, and requires free space. This is why Over Provisioning, which guarantees a certain amount of free swap space to use for Garbage Collection and other maintenance activities, is so important for SSD performance – it allows the Garbage Collection algorithm to prepare free space in advance through data consolidation.
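One common way to express the amount of Over Provisioning is as spare capacity relative to the capacity the user can see. The short calculation below assumes that definition. The function name and the 256/240 figures are illustrative only and do not describe any particular drive.

```python
def over_provisioning_percent(physical_capacity, user_capacity):
    """Spare capacity as a percentage of the user-visible capacity."""
    spare = physical_capacity - user_capacity
    return spare / user_capacity * 100


# Hypothetical drive: 256 units of raw NAND, 240 units exposed to the user.
print(round(over_provisioning_percent(256, 240), 2))
```

The more spare capacity the controller has, the more room Garbage Collection has to consolidate data before free blocks run out.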
Now that we know a bit about why the SSD needs free blocks to be readily available, we can talk about one of the ways the SSD controller prepares free blocks. One of those processes is called “garbage collection.” This name is a bit misleading, as the process actually involves the collection of good data, not garbage. The concept, however, is rather simple.
By nature, because of the nuances we discussed above, SSDs are a bit obsessive about how they organize their valid data. They prefer to have it neatly stacked together, and they run more efficiently when they can keep it that way because it is easier to find free space to write to. Unfortunately, everyday use makes this type of neatness difficult, because the OS is constantly writing and deleting data of various sizes, leaving data strewn haphazardly throughout the SSD.
Garbage Collection remedies this by combing through the Swiss cheese of data left behind, collecting any valid data and carefully placing it together. By doing this, invalid data is left separate and can be erased to make more free space, which means no waiting is necessary when you try to write new data to a page that may have previously been filled with “garbage” data.
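A rough sketch of that consolidation step follows. It assumes a simple “pick the block with the most invalid pages” policy, which is only one of several possible victim-selection strategies. The `collect` function and the data layout are made up for this illustration.

```python
def collect(blocks, spare):
    """Reclaim the block that contains the most invalid pages.

    blocks: list of blocks; each block is a list of (data, is_valid) tuples.
    spare:  an already-erased block (an empty list) that receives valid pages.
    Returns the block that was erased.
    """
    # The block with the most invalid pages frees the most space
    # for the least amount of copying.
    victim = max(blocks, key=lambda b: sum(1 for _, valid in b if not valid))
    # Collect the good data and place it together in the spare block.
    spare.extend(page for page in victim if page[1])
    # Everything left in the victim is garbage, so erase the whole block.
    victim.clear()
    return victim


blocks = [
    [("a", True), ("b", False), ("c", False), ("d", True)],
    [("e", True), ("f", True), ("g", False), ("h", True)],
]
spare = []
freed = collect(blocks, spare)

print(spare)
print(blocks[0])
```

After the call, the two valid pages from the first block sit together in the spare block, and the first block is empty and ready for new writes.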
Modern operating systems have enabled another form of SSD maintenance, TRIM. TRIM is a facility by which the OS can notify the SSD when data is either marked for erase or no longer valid. TRIM helps to make Garbage Collection more efficient by preparing invalid data for deletion.
Remember, SSDs are a relatively new technology, and computers were built to interface with traditional hard disks. Hard disks are not subject to the same write/erase limitations that SSDs are; they can easily overwrite data in an existing location without erasing it first. Therefore, when the OS “deletes” data, the data does not actually go anywhere. The space in which it resides is simply marked as “free space” that may be used later. By default, because it doesn’t know it’s not working with its longtime HDD companion, the OS doesn’t let the SSD know that a particular piece of data is no longer valid and that its corresponding memory location is now free. After all, with a hard disk there is no reason to do so. With the introduction of SSDs, however, there is now a compelling reason to increase communication about file validity between the OS and the storage device.
Enter TRIM. TRIM allows the OS to inform the SSD which data are no longer valid, allowing the SSD to skip over invalid data when performing Garbage Collection instead of moving around old data. Once a block is full of pages that all contain invalid data, that block is considered free and may be erased. The TRIM command is sent to the SSD controller automatically by the OS every time it deletes a file. As it requires OS support, not all users will be able to use native TRIM functionality. On PCs, TRIM is supported in Windows 7 or later. On Macs, TRIM is only supported for Apple’s OEM SSDs and is not supported for Samsung’s (or any other manufacturer’s) aftermarket SSDs.
Users of older Windows operating systems (Windows XP, Windows Vista) may use Magician’s built-in “Performance Optimization” feature to manually pass the TRIM command to the SSD on demand (or via a user-specified schedule).
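The effect of TRIM on the drive’s bookkeeping can be sketched as follows. This is a simplified model with hypothetical names (`Drive`, `trim`). It does not reflect the actual command format, only the idea that the drive learns which logical addresses no longer matter.

```python
class Drive:
    """Toy SSD bookkeeping: which physical pages does the drive think are live?"""

    def __init__(self):
        self.mapping = {}    # logical address -> physical page
        self.valid = set()   # physical pages the drive believes hold live data

    def write(self, lba, page):
        self.mapping[lba] = page
        self.valid.add(page)

    def trim(self, lbas):
        # The OS reports logical addresses whose contents are no longer needed.
        for lba in lbas:
            page = self.mapping.pop(lba, None)
            if page is not None:
                self.valid.discard(page)


drive = Drive()
for lba in range(4):
    drive.write(lba, page=lba)

# The OS deletes a file stored at logical addresses 0 and 1. Without TRIM,
# the drive is never told, so it still counts all four pages as valid.
pages_to_preserve_without_trim = len(drive.valid)

drive.trim([0, 1])
pages_to_preserve_with_trim = len(drive.valid)

print(pages_to_preserve_without_trim, pages_to_preserve_with_trim)
```

In this toy example, Garbage Collection would have to carry four pages along before the TRIM and only two after it.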
Bad Block Management & Error Correcting Code (ECC)
In addition to maintenance at the drive level, the SSD must also perform maintenance at the chip level. In every NAND chip, each page contains a few bytes of extra capacity that the SSD controller uses to store “parity bits.” Error-Correcting Code (ECC) uses these parity bits to compensate for other bits that may fail during normal operation of the drive. When the controller detects a read failure, it will invoke ECC to try to recover from it. If recovery is not possible, the firmware’s bad block management feature will retire the block and replace it with one of several free “reserved blocks.” “Bad blocks” can arise during read, program, or erase operations and are actively managed to guarantee expected SSD performance.
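To show how a few extra parity bits can repair a failed bit, the sketch below uses the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. It is a stand-in chosen for brevity. Real SSD controllers use much stronger codes, and nothing here describes the code used in any particular drive.

```python
def encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword (3 parity bits added)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the bad bit (0 means no error).
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
stored = encode(data)

damaged = list(stored)
damaged[4] ^= 1          # simulate one bit failing inside the NAND

print(decode(damaged) == data)
```

If more bits fail than the code can correct, recovery is not possible, and that is the point at which bad block management takes over.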
NAND flash memory suffers from one final limitation: each cell has a finite lifespan and can only withstand a limited number of program/erase cycles (called P/E cycles). The specific number of P/E cycles depends on the process technology (e.g. 27nm, 21nm, 19nm) and on the program mechanism (e.g. SLC, MLC). In order to overcome this limitation, the SSD firmware employs a wear-leveling algorithm that guarantees that write operations are spread evenly among all NAND cells. Using this technique, no single cell should be unduly stressed and prematurely fail. If too many cells were to fail, the entire block would have to be retired as discussed above. There are only a limited number of reserved blocks, however, so this event should be avoided to prolong overall drive life.
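A minimal sketch of the wear-leveling idea is to always reuse the block that has been erased the fewest times. Real algorithms are more sophisticated, and the `pick_block` helper and the four-block drive below are assumptions made for illustration.

```python
def pick_block(erase_counts):
    """Return the index of the least-worn block (lowest erase count)."""
    return min(range(len(erase_counts)), key=lambda i: erase_counts[i])


erase_counts = [0, 0, 0, 0]   # hypothetical drive with four blocks

# Simulate ten program/erase cycles, each directed to the least-worn block.
for _ in range(10):
    target = pick_block(erase_counts)
    erase_counts[target] += 1

print(erase_counts)
print(max(erase_counts) - min(erase_counts))
```

Because each cycle goes to the least-worn block, no block ends up more than one erase cycle ahead of any other in this simulation.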
Fortunately, all of the above procedures (with the exception of TRIM if you’re using an older Windows OS) happen transparently and without action on the part of the user. While specific implementations will vary, most modern SSDs include all of these features. In fact, without features like Wear Leveling and ECC (to extend drive life and protect data integrity) and TRIM and Garbage Collection (to maintain SSD performance), SSD quality and user experience would suffer.
Maintenance procedures like Wear Leveling and Garbage Collection, which are created to overcome the unique properties of NAND flash memory, work together to help ensure that your SSD performs well over extended use. At the same time, these algorithms actually increase write activity to the NAND, which reduces overall lifespan. Thus, the key in designing a great SSD is finding the optimum balance among lifespan, performance, and reliability. As the #1 player in the memory business for over 20 years and the largest global supplier of SSDs in the preinstalled storage business, Samsung has unrivaled knowledge of and experience with SSD technology. Samsung’s unique, integrated approach to SSD manufacturing affords it full control of every component. You can trust that Samsung’s expertise is safeguarding your precious data, and your productivity, when you purchase a Samsung SSD.