Challenge: Extracting Business Value from Big Data
Big data is a term applied to data sets too large for commonly used software to capture, manage, and process within a tolerable time. The size that qualifies keeps moving and currently ranges from a few dozen terabytes to many petabytes.
Organizations today face two growing challenges to their ability to extract business value from their big data. First, due to the proliferation of new devices coupled with declining hardware costs, data continues to grow relentlessly. This has led to organizations storing many terabytes, if not petabytes, of data. Second, the complexity of that data increases as customers store both structured and unstructured data, including Word and PDF files, videos and images, and geo-spatial data.
Microsoft SQL Server, a relational database management system (RDBMS), offers a robust and scalable platform for storing and analyzing big data. When applications need data from a database, they run queries against the RDBMS engine, which retrieves the data in the most efficient manner. The engine first checks whether the data was recently loaded into memory (warm cache); if it was not, the engine reads it from disk (cold cache).
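The memory-first, disk-second lookup described above can be sketched as a toy page cache. This is only an illustration: the class name, the dictionary standing in for the disk, and the least-recently-used eviction rule are all assumptions, and none of it reflects how SQL Server actually manages its buffers.

```python
from collections import OrderedDict


class BufferPool:
    """Toy page cache: serve from memory when possible, else read from disk."""

    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical backing store: page_id -> bytes
        self.capacity = capacity      # maximum number of pages kept in memory
        self.pages = OrderedDict()    # cached pages, least recently used first

    def read(self, page_id):
        """Return (data, source), where source is 'memory' or 'disk'."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id], "memory"
        data = self.disk[page_id]             # slow path: fetch from disk
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return data, "disk"

    def clear(self):
        """Drop all cached pages, returning the pool to a cold state."""
        self.pages.clear()


disk = {1: b"orders", 2: b"parts"}
pool = BufferPool(disk, capacity=2)
first = pool.read(1)    # cold: the page comes from disk
second = pool.read(1)   # warm: the same page now comes from memory
```

Calling `pool.clear()` before a read forces the slow path again, which is the toy equivalent of a cold cache run.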
Since data loaded into memory can be accessed roughly a million times faster (nanoseconds versus milliseconds) than data on HDDs, a key goal for the IT industry is to load entire databases into memory (the "in-memory" solution).
Proof of Concept by Samsung and Microsoft
Samsung Semiconductor and Microsoft performed a Proof of Concept (PoC) at the MTC Paris to illustrate a decision support system scenario running on Microsoft Windows Server 2012 and SQL Server 2012 paired with Samsung 20nm-class DRAM and SSDs. The results show the measured impact on performance (query duration) and energy savings (power consumption), which can translate into a more responsive user experience and lower operating costs.
Tests were run in parallel on identical servers, eliminating the need to switch memory modules and HDDs/SSDs between tests; this setup also helped reduce the PoC duration. Both servers were set up to dual-boot Windows Server 2008 R2 and Windows Server 2012. One server was populated with mainstream 50nm-class memory and HDDs and ran Windows Server 2008 R2 SP1 ("Standard & Non-Green Server"), while the other used Samsung 20nm-class Green memory and SSDs and ran Windows Server 2012 Release Candidate ("High Performance & Green Server"). MS SQL Server 2012 was set up on both systems, and a 1TB database was generated.
Test Scenarios
MTC Paris and Samsung used the TPC-H benchmark from www.tpc.org to generate the load, which simulates decision support system (DSS) activities against a database. To test the impact of Samsung 20nm-class DDR3 memory and PM830 SSDs (Datacenter version) with Windows Server 2012 on system performance and power saving, MTC Paris generated a 1TB database and ran a set of four predefined queries against it (hereafter called scenarios).
Test Protocol
To avoid variance between tests, the SQL Server data buffers were emptied before each query. The first run (cold cache) therefore tests SSD throughput; the query was then run again (warm cache) to measure the memory's raw performance.
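The protocol above can be sketched as a small timing harness. The source does not say which commands were used to empty the buffers; the T-SQL string below (CHECKPOINT followed by DBCC DROPCLEANBUFFERS) is a commonly used approach and is an assumption here, as is the hypothetical `execute` callable that sends SQL to the server and blocks until it completes.

```python
import time

# Assumption: T-SQL commonly used to flush SQL Server's clean data pages.
# The source does not state which commands the PoC actually used.
FLUSH_BUFFERS_SQL = "CHECKPOINT; DBCC DROPCLEANBUFFERS;"


def run_cold_then_warm(execute, query_sql, clock=time.perf_counter):
    """Time one query twice: after a buffer flush (cold), then again (warm).

    `execute` is a hypothetical callable that sends a SQL string to the
    server and returns once the statement has finished.
    """
    execute(FLUSH_BUFFERS_SQL)   # empty the data buffers
    start = clock()
    execute(query_sql)           # cold run: data must be read from storage
    cold = clock() - start

    start = clock()
    execute(query_sql)           # warm run: data is already in memory
    warm = clock() - start
    return {"cold_seconds": cold, "warm_seconds": warm}
```

In a real run, `execute` would wrap a database driver call; injecting it (and the clock) keeps the harness independent of any particular driver.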
Scenario 1: Minimum Cost Supplier Query
This query finds which supplier should be selected to place an order for a given part in a given region.
Cold Cache Test
The High Performance & Green Server solution is 13.4 times faster than the Standard & Non-Green Server solution and consumes 83% less power when running the minimum cost supplier query against the 1TB database.
Warm Cache Test
The High Performance & Green Server solution consumes 25% less power than the Standard & Non-Green Server solution when running the minimum cost supplier query.
OS Performance Improvement
The minimum cost supplier query runs 4.3% faster on Windows Server 2012 RC than on Windows Server 2008 R2 in the cold cache test (no difference was observed in the warm cache test).
Scenario 2: Forecasting Revenue Change Query
This query quantifies the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year. This type of what-if can be used to find ways to increase revenue.
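To make the what-if concrete, here is a minimal sketch of the calculation over a few made-up rows. It follows the general shape of the TPC-H forecasting revenue change query as commonly described (a ship-date year, a discount band of plus or minus 0.01, and a quantity ceiling); the field names, the sample rows, and the parameter values are hypothetical, and this is plain Python rather than the SQL that the benchmark actually runs.

```python
from datetime import date
from decimal import Decimal


def forecast_revenue_change(lineitems, year, discount, quantity_limit):
    """Sum extended_price * discount over the line items that qualify.

    An item qualifies if it shipped in `year`, its discount lies within
    0.01 of `discount`, and its quantity is below `quantity_limit`.
    """
    band = Decimal("0.01")
    low, high = discount - band, discount + band
    year_start, year_end = date(year, 1, 1), date(year + 1, 1, 1)
    total = Decimal("0")
    for item in lineitems:
        shipped_in_year = year_start <= item["ship_date"] < year_end
        in_discount_band = low <= item["discount"] <= high
        if shipped_in_year and in_discount_band and item["quantity"] < quantity_limit:
            total += item["extended_price"] * item["discount"]
    return total


# Hypothetical sample rows; only the first and last satisfy every filter.
rows = [
    {"ship_date": date(1994, 3, 1), "discount": Decimal("0.06"), "quantity": 10, "extended_price": Decimal("1000.00")},
    {"ship_date": date(1994, 7, 15), "discount": Decimal("0.02"), "quantity": 5, "extended_price": Decimal("500.00")},
    {"ship_date": date(1995, 1, 1), "discount": Decimal("0.06"), "quantity": 10, "extended_price": Decimal("800.00")},
    {"ship_date": date(1994, 12, 31), "discount": Decimal("0.07"), "quantity": 30, "extended_price": Decimal("200.00")},
    {"ship_date": date(1994, 12, 31), "discount": Decimal("0.05"), "quantity": 23, "extended_price": Decimal("200.00")},
]
revenue = forecast_revenue_change(rows, 1994, Decimal("0.06"), 24)
```

`Decimal` is used instead of floats so that the discount band comparison behaves exactly at its boundaries.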
Cold Cache Test
The High Performance & Green Server solution is 14.8 times faster than the Standard & Non-Green Server solution and consumes 94% less power when running the forecasting revenue change query against the 1TB database.
Warm Cache Test
The High Performance & Green Server solution consumes 27% less power than the Standard & Non-Green Server solution when running the forecasting revenue change query.
OS Performance Improvement
The forecasting revenue change query runs 5.3% faster on Windows Server 2012 RC than on Windows Server 2008 R2 in the cold cache test (no significant difference was observed in the warm cache test).
Scenario 3: Promotion Effect Query
This query monitors the market response to a promotion such as a TV commercial or a special campaign.
Cold Cache Test
The High Performance & Green Server solution is 13.3 times faster than the Standard & Non-Green Server solution and consumes 94% less power when running the promotion effect query against the 1TB database.
Warm Cache Test
The High Performance & Green Server solution consumes 23% less power than the Standard & Non-Green Server solution when running the promotion effect query.
OS Performance Improvement
The promotion effect query runs 4.8% faster in the cold cache test and 17.3% faster in the warm cache test on Windows Server 2012 RC than on Windows Server 2008 R2.
Scenario 4: Potential Part Promotion Query
This query identifies suppliers in a particular nation with parts that may be candidates for a promotional offer.
Cold Cache Test
The High Performance & Green Server solution is 17.1 times faster than the Standard & Non-Green Server solution and consumes 95% less power when running the potential part promotion query against the 1TB database.
Warm Cache Test
The High Performance & Green Server solution consumes 33% less power than the Standard & Non-Green Server solution when running the potential part promotion query.
OS Performance Improvement
The potential part promotion query runs 12.4% faster on Windows Server 2012 RC than on Windows Server 2008 R2 in the cold cache test (no difference was observed in the warm cache test).
Summary of Results
Cold Cache Test
The cold cache test primarily highlights SSD performance. Memory usage is only 5%, which means all the data is located on disk before the benchmark starts.
The High Performance & Green Server takes 11 minutes 8 seconds and consumes 126Wh to run the four TPC-H decision support (DSS) queries, while the Standard & Non-Green Server takes 2 hours 41 minutes 33 seconds and consumes 2,196Wh.
The High Performance & Green Server is 14.5 times faster and reduces system power consumption by 94%.
This scenario clearly shows the advantage of SSDs over HDDs.
Warm Cache Test
The warm cache test highlights the benefits of in-memory computing and the power savings of 20nm-class DDR3 modules. Memory usage is 95%, which means almost all the data is located in memory before the benchmark starts.
The High Performance & Green Server takes 1 minute 45 seconds and consumes 23Wh to run the four TPC-H decision support (DSS) queries, while the Standard & Non-Green Server takes 1 minute 47 seconds and consumes 32Wh.
The High Performance & Green Server is only 2% faster than the Standard & Non-Green Server, since almost all the data is located in memory, but it still reduces system power consumption by 28%.
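The headline ratios for both the cold and warm cache summaries can be re-derived from the reported durations and watt-hour figures. The sketch below does only that arithmetic; the function names are illustrative, and "faster" is taken to mean the ratio of the two durations, which is an assumption about how the report computed it.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a duration given in hours, minutes and seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def compare(baseline_seconds, baseline_wh, green_seconds, green_wh):
    """Return (speedup factor, consumption saved in percent) versus the baseline."""
    speedup = baseline_seconds / green_seconds
    saved_pct = (1 - green_wh / baseline_wh) * 100
    return speedup, saved_pct


# Cold cache: 2 h 41 min 33 s and 2,196 Wh versus 11 min 8 s and 126 Wh.
cold_speedup, cold_saved = compare(to_seconds(2, 41, 33), 2196, to_seconds(0, 11, 8), 126)

# Warm cache: 1 min 47 s and 32 Wh versus 1 min 45 s and 23 Wh.
warm_speedup, warm_saved = compare(to_seconds(0, 1, 47), 32, to_seconds(0, 1, 45), 23)

print(f"cold: {cold_speedup:.1f}x faster, {cold_saved:.0f}% less consumption")
print(f"warm: {(warm_speedup - 1) * 100:.0f}% faster, {warm_saved:.0f}% less consumption")
# prints:
# cold: 14.5x faster, 94% less consumption
# warm: 2% faster, 28% less consumption
```

The results match the rounded figures quoted in the summaries. Note that watt-hours measure energy over the whole run, so the reported savings combine a shorter run time with any difference in instantaneous power draw.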
Dual Boot Test
Windows Server 2012 RC (Release Candidate) performs 6.5% faster in the cold cache test and 7.1% faster in the warm cache test than Windows Server 2008 R2 SP1, as illustrated by the picture below.
Test Environment
Conclusion
In cold cache testing, the High Performance & Green Server significantly outperformed the Standard & Non-Green Server in both speed and power savings. In warm cache testing, where almost all the data is already in memory, the two servers ran at nearly the same speed, but the High Performance & Green Server still consumed 28% less power. Samsung's new flash-based SSDs, with access times in microseconds, offer organizations a reasonable solution with a good cost-to-performance ratio.