Last month, I attended the Flash Memory Summit (FMS) in Santa Clara, California, a conference that focuses on the various flash technologies and their adoption in the data center. FMS 2018 reported over 6,000 registrations, making it one of the largest independent storage conferences. There were over 16 keynotes and many more educational sessions, as well as an exhibition hall with companies showing off their latest flash chips, devices, interconnects, and even complete storage arrays. Verizon, Toshiba, WDC/DCS, Micron, SK Hynix, Yangtze Memory, Intel, Solarflare, Xilinx, Marvell, ScaleFlux, Microsemi, Shannon Systems, NGD Systems, Smart IOPS, and Huawei Technologies provided keynotes. If you want a deep education in some of the latest flash technologies or even on how to develop your own flash controller, this is the conference for you—but it also has high-level sessions mixed into the educational offerings.
The data economy is driving flash adoption
In their FMS 2018 keynotes, Marvell, Micron, and others declared that the next era after cloud is the "data economy" era. Now that flash devices and SSDs have been on the market for a while and their usefulness is proven, the focus has shifted to processing data quickly and efficiently by combining flash/SSDs with high-performance compute. Many of the keynotes described how flash technologies can be deployed to deliver faster results and better efficiency in data-processing scenarios.
Western Digital (WDC) went on to point out that the economy is quickly becoming data driven around two types of data: big data and "fast data," generated by innovations around cloud and enterprise data centers, edge and client computing, industrial and consumer IoT, smart communities, and autonomous cars. Fast data is data that is used in real time for insight and decision making. Toshiba estimated that this really fast, or "hot," data is only 10% of all generated data. Big data, on the other hand, commonly serves as a repository for analysis data. This repository data can be highly varied in nature and is typically accumulated in large volumes over time.
There were three undercurrents to the Flash Memory Summit under which the various keynotes, sessions, and company presentations can be grouped:
- Density improvements in NAND flash and the introduction of even faster persistent memory
- Removing bottlenecks in plumbing and interconnects to provide improved high-performance connection between storage and compute
- Creating new architectures and solutions to take advantage of the benefits of flash
Flash devices are getting faster on one hand, cheaper on the other
For years we've heard about storage-class memory (SCM) and how it will be a quantum leap in persistent memory performance. These new chips and devices are finally hitting the market. They deliver significantly more performance than today's NAND flash, but they will be priced at a premium for the foreseeable future and will start out as a niche technology. Meanwhile, the industry continues to advance workhorse NAND flash, extending its historical price declines while raising its performance at modest rates.
- FMS 2018 showcased revolutionary types of persistent memory, such as 3D XPoint, that are much faster than current NAND technologies but not as fast as DDR memory. They are positioned between DDR at the performance high end and the now-traditional NAND flash in the performance mid-range. The big benefit of these new high-performance, high-durability technologies is that, unlike DDR, they are persistent. All of this makes them very interesting as a large cache in front of slower media in the IO stack. Several vendors discussed the potential benefits of this emerging class of memory for real-time processing and AI, with emphasis on use cases requiring the lowest latency.
- Broadly adopted NAND flash is continuing its evolution toward being much lower cost and incrementally faster. NAND flash has quickly become the workhorse storage media for most transactional workloads, replacing 15K and 10K rpm high-performance spinning hard disk drives (HDD). Vendors are on track to increase flash chip density and thus continue or even accelerate the price declines we've seen over the past several years. According to SK Hynix, "bits per wafer has increased 60x over the past 10 years," and the company expects to be able to put "8 TB on one package chip by 2025." Much of this density increase comes from stack layering: stacks are just now reaching 96 layers, but Micron said it sees a roadmap to as many as 200 layers.
- In addition to NAND flash becoming lower cost, companies are also trying to improve its performance. According to SK Hynix, we should expect TLC and QLC NAND speeds to increase by 30%. These increases could be even greater if future NAND SSDs implement new internal architectures that better connect the flash chips to internal buses, stated WDC.
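SK Hynix's density claim above implies a steep but steady compound growth rate, which a quick back-of-the-envelope calculation makes concrete (the 60x-over-10-years figure is from the keynote; the arithmetic is mine):

```python
# Implied compound annual growth rate (CAGR) from SK Hynix's claim that
# bits per wafer grew 60x over 10 years.
growth_factor = 60
years = 10
cagr = growth_factor ** (1 / years) - 1
print(f"Implied annual density growth: {cagr:.0%}")  # roughly 51% per year
```

In other words, density has been roughly cutting cost per bit in half every couple of years, which is why the article's expectation of continued price declines is plausible.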
System bottlenecks are next
Flash is fast and getting faster, so the infrastructure interconnect around flash must also evolve. Panel participants pointed out that it takes only about four typical NAND SSDs to saturate a 100 Gbps link, so the raw device performance is there. With this huge increase in basic device performance, new bottlenecks have come to the forefront. Inside the server, improvements to the physical path from the SSD to the processor are being implemented. At the networked storage level (storage external to the server), companies are driving toward NVMe over fabric (NVMeoF) to create shared storage systems that remove most of the bottlenecks in external-to-server data access.
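The panel's "four SSDs saturate a 100 Gbps link" claim checks out with simple arithmetic. Note the 3.2 GB/s per-drive figure below is my assumption for a typical PCIe Gen 3 x4 NVMe SSD of that era, not a number from the conference:

```python
# Sanity check: how many NVMe SSDs does it take to fill a 100 Gbps link?
link_gbps = 100
link_gbytes_per_s = link_gbps / 8   # 12.5 GB/s of raw link bandwidth
ssd_gbytes_per_s = 3.2              # assumed sequential read rate per SSD
drives_to_saturate = link_gbytes_per_s / ssd_gbytes_per_s
print(f"Drives to saturate the link: {drives_to_saturate:.1f}")  # about 3.9
```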
- Recently, PCIe interconnects have proven themselves as great high-speed NVMe connections for flash drives. Plugging flash directly into the server over PCIe provides a proven path to low-latency performance, because this direct-plug architecture gives the media a high-speed connection to compute. The industry is now moving through the PCIe Gen 4 transition, which doubles per-lane bandwidth to flash. Alongside that come new PCIe switches and bridges that enable server capacity expansion via JBOF (Just a Bunch of Flash), or external PCIe-attached disk boxes. Looking forward, the whole PCIe Gen 4 ecosystem still needs to be built out, including NVMe adapters and drivers, data protection, connection to GPUs, orchestration, and composability management.
- There are still many data transfer bottlenecks to correct in networked storage architectures. Until this year, all-flash arrays were commonly based on SAS (or SATA) drive interconnects. Host connections likewise have not been updated and still use tried-and-true FC or iSCSI. For years, almost all data movement was based on SCSI, but after the success of local NVMe in improving performance, storage network vendors are adopting the same NVMe protocols and running them over a switching fabric. This allows physical storage capacity to be efficiently shared by multiple hosts and provisioned dynamically. Many of the conference sessions debated the advantages and disadvantages of the various NVMeoF protocols, such as FC-NVMe, RoCE, iWARP, and InfiniBand. The future of NVMeoF appears bright, and products for early adopters are starting to be released, but a clear winner has not emerged.
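The PCIe Gen 4 per-lane doubling mentioned above follows directly from the published transfer rates. A quick calculation, using the 8 GT/s and 16 GT/s rates and 128b/130b encoding from the PCIe specifications (the x4 lane count is simply the common NVMe SSD configuration):

```python
# Usable per-lane bandwidth for PCIe Gen 3 vs. Gen 4.
ENCODING = 128 / 130  # 128b/130b line coding used by both generations

def lane_bandwidth_gbs(transfer_gt_s: float) -> float:
    """Usable bandwidth per lane in GB/s (1 transfer = 1 bit per lane)."""
    return transfer_gt_s * ENCODING / 8

for gen, rate in {"Gen 3": 8.0, "Gen 4": 16.0}.items():
    per_lane = lane_bandwidth_gbs(rate)
    print(f"PCIe {gen}: {per_lane:.2f} GB/s per lane, "
          f"{per_lane * 4:.2f} GB/s for an x4 SSD")
```

An x4 NVMe SSD thus moves from roughly 3.9 GB/s to roughly 7.9 GB/s of available bandwidth, which is why Gen 4 matters for the fastest flash devices.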
New architectures and solutions are needed
The effectiveness of flash in the server has demonstrated how useful flash can be for driving increased data processing performance. But until now, the external storage players have done little in the storage infrastructure to exploit the new performance levels of flash. Fully benefiting from flash requires a cascading set of infrastructure innovations, all the way from the flash device to the server CPU. Two noteworthy solutions and data center-level innovations discussed at FMS 2018 were computational storage and composable infrastructure.
- Computational storage (CS) captured the attention of conference attendees this year; in fact, this new category of storage and its associated vendors won several show awards. Currently, facial recognition, with its need for high-throughput vector indexing and results search, seems to be the "killer app" for this innovation. A computational storage subsystem performs IO as close to the flash media as possible, skipping a data move across the PCIe bus and exploiting the inherent parallelism of multiple SSDs. Instead of moving the data to the CPU, a portion of the data is processed in the CS SSD subsystem, and only the result is transferred to the main CPU for holistic application interaction. Computational storage capacity is typically for short-term data use and is generally not replicated or protected, so other types of storage may need to be integrated into the overall AI workflow.
- Composable infrastructure is a new style of managing storage in which resources are logically pooled and then automatically provisioned, so that administrators don't have to physically configure hardware to support a specific software application. WDC's keynote claimed that this type of composable infrastructure is key to achieving improved scalability, efficiency, agility, and performance for processing data. I can't help but think of the approach as a next-generation storage network. To realize this style of provisioning resources, new orchestration frameworks need to be developed as industry standards.
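The computational storage pattern described above can be sketched in a few lines. This is an illustrative toy, not a real vendor API: each simulated "drive" filters its own data in parallel and returns only matches, so the host aggregates small result sets instead of ingesting raw data:

```python
# Toy model of the computational storage idea: filter near the media,
# ship only results to the host CPU. (Hypothetical names; no real CS API.)
from concurrent.futures import ThreadPoolExecutor

def on_drive_filter(drive_data, predicate):
    """Runs 'near the media': returns matching records, not the raw data."""
    return [record for record in drive_data if predicate(record)]

def query_cs_subsystem(drives, predicate):
    # The host only merges each drive's (much smaller) result set,
    # exploiting the inherent parallelism of multiple SSDs.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda d: on_drive_filter(d, predicate), drives)
    return [record for partial in partials for record in partial]

drives = [list(range(i, 100, 4)) for i in range(4)]  # 4 simulated SSDs
matches = query_cs_subsystem(drives, lambda x: x % 10 == 0)
print(sorted(matches))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

The design point is the one the article makes: the expensive step (scanning all the data) happens where the data already lives, and only the answer crosses the bus.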
The bottom line
Several keynotes pointed out that disk IOPS hadn't improved significantly until SSDs came along. HDD storage performance was held back by the physics and geometries of spinning disk, which in turn masked many other system and interconnect bottlenecks. This slow pace of improvement generally forced compute architectures to evolve around very slow disks.
When flash disk first came out, flash was consumed as a simple replacement for spinning disk, not requiring changes to infrastructure. With broad adoption of flash SSD, disk is no longer the bottleneck, leading FMS attendees to one key question: What is the next bottleneck and how can it be removed?
Innovations and hot topics to keep your eye on:
- the robustness of NVMe as a replacement for SCSI protocols
- the certification of NVMeoF external-to-server protocols in support of shareable storage
- how quickly 100Gb Ethernet becomes the de facto bandwidth for storage networking
- new internal architectures for storage arrays that remove bottlenecks and allow performance scaling
- the emergence of new composable storage software
- the new storage category of computational storage
Going forward, to truly consume the growing performance levels of flash, many things in the architecture need to change. These changes are what many of the participants at the 2018 Flash Memory Summit are currently and visibly grappling with.
As always, I welcome your feedback.

Dennis Hahn
Sr Analyst, Data Center Storage
+1 316 648 8567