ARM® TechCon™ 2012: Heterogeneous big.LITTLE™ Processing on Exynos

January 23, 2013

Given that ARM® big.LITTLE™ processing is a leading mobile systems technology, it's no surprise that a session focusing on the topic was part of this year's ARM® TechCon™. Samsung Electronics' William Kang led a session explaining heterogeneous big.LITTLE processing and its implications for future Exynos systems-on-chip (SoCs). We've reviewed the highlights of his presentation, as well as Samsung's implementation of this exciting technology.

Why big.LITTLE Processing?
Before delving into a technical explanation of big.LITTLE, William Kang addressed the reasons why we increasingly need this type of high-performance, low-power technology. Our smart mobile devices provide us with a PC-like user experience, offering everything from web browsing to high-quality multimedia playback. With the increase in computational expectations, there has to be an increase in processor performance; however, smart devices are mobile, making battery life a major concern.

The dilemma lies in the pace of battery technology, which lags far behind the performance requirements of mobile devices. Apps are becoming increasingly demanding, with many users expecting full web browsing, 3D game rendering and augmented reality capabilities - all on displays that continue to push resolution higher. If battery capacity isn't advancing fast enough to keep up with performance needs, then how do we close the power gap?

Kang explained that big.LITTLE processing is part of the answer, in combination with power optimization at different levels of the system. Many of these power-saving features are already part of the Exynos architecture, such as Panel Self-Refresh, Dynamic Voltage/Frequency Scaling (DV/FS) and system-level caching. Process technologies like High-k Metal Gate and FinFET are also designed to increase performance while optimizing power usage, making a significant impact on system efficiency as a whole. So what exactly does big.LITTLE processing add to the mix?

What is big.LITTLE Processing?

ARM® big.LITTLE processing involves a single SoC architecture that contains two types of CPUs - one "big" CPU cluster (Cortex™-A15) that specializes in high-performance tasks and one "LITTLE" CPU cluster (Cortex™-A7) that excels in power efficiency. Depending on the intensity of the task at hand, the system can switch workloads between the two types of CPUs to achieve optimal power efficiency. When making a voice call, texting or sending an email, the system can rely on the Cortex-A7 CPU to execute the tasks while minimizing power usage. For web browsing, Flash playback or intensive gaming, the powerhouse Cortex-A15 CPU takes over the task.

The big CPU cluster offers the highest performance in the mobile power envelope. Using a complex, out-of-order, multi-issue pipeline, the Cortex-A15 boasts up to five times the performance capability of today's mainstream smartphone processors. On the other hand, the LITTLE CPU cluster is the most energy-efficient CPU that ARM® has to offer. With a simple, in-order, 8-stage pipeline, the Cortex-A7 is not only extremely power-efficient, it's also capable of performing many tasks with ease. Simply put, big.LITTLE is all about using the right processor for the right job.
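To make the "right processor for the right job" idea concrete, here is a minimal sketch of a load-based selector with hysteresis. It is an illustration only, not Samsung's or ARM's switching logic: the `CoreSelector` class, the `trace` helper and both threshold values are hypothetical, and `load` is assumed to be a normalized demand figure between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration (not from the talk).
UP_THRESHOLD = 0.85    # above this load, move work to the big core
DOWN_THRESHOLD = 0.30  # below this load, move work back to the LITTLE core


@dataclass
class CoreSelector:
    """Tracks which core type currently runs the workload."""
    current: str = "LITTLE"

    def update(self, load: float) -> str:
        """Return the core type to use for this load sample.

        Two separate thresholds give hysteresis, so a load hovering
        around one value does not cause constant switching.
        """
        if self.current == "LITTLE" and load > UP_THRESHOLD:
            self.current = "big"
        elif self.current == "big" and load < DOWN_THRESHOLD:
            self.current = "LITTLE"
        return self.current


def trace(loads):
    """Run a sequence of load samples through a fresh selector."""
    selector = CoreSelector()
    return [selector.update(load) for load in loads]


if __name__ == "__main__":
    # Light work, a burst of heavy work, then light work again.
    print(trace([0.1, 0.5, 0.9, 0.6, 0.2, 0.1]))
```

In this sketch a light workload such as texting stays on the LITTLE core, a burst of heavy work moves to the big core, and the work returns to the LITTLE core only once the load has clearly dropped.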

Why Does big.LITTLE Processing Work?
Many people initially assume that the big CPU cluster does all of the important work. If that were the case, big.LITTLE processing wouldn't be as successful at conserving power as it is. The power-efficient Cortex-A7 cluster can handle many different workloads all on its own, so it truly pulls its weight within the system. This is why big.LITTLE is simultaneously competitive in both power and performance.

Big.LITTLE processing has another significant benefit - flexibility. There are multiple software configuration options, even when using the same hardware and architecture. With CPU migration, which involves a modified DV/FS driver, only one core of each big.LITTLE pair can be on at one time. With cluster migration, however, only one cluster (big or LITTLE) can be on at one time. The increased usage of the A15 CPU in this latter configuration results in a smaller power benefit than CPU migration offers. Finally, with multi-processing, any combination of cores can be on at any time, which requires a modified CPU scheduler.
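The three software configurations differ mainly in which combinations of powered-on cores they allow. The sketch below encodes those rules as a simple check. The `Mode` enum and `is_valid_power_state` function are hypothetical names invented for this example, and pairing big core `i` with LITTLE core `i` is an assumption made to keep the model small. "Only one" is read here as "at most one".

```python
from enum import Enum


class Mode(Enum):
    CPU_MIGRATION = "cpu_migration"
    CLUSTER_MIGRATION = "cluster_migration"
    MULTI_PROCESSING = "multi_processing"


def is_valid_power_state(mode, big_on, little_on):
    """Check whether a set of powered-on cores is allowed in a given mode.

    big_on and little_on are lists of booleans, one entry per core.
    For CPU migration, big_on[i] and little_on[i] are assumed to form a pair.
    """
    if mode is Mode.CPU_MIGRATION:
        if len(big_on) != len(little_on):
            raise ValueError("CPU migration assumes equal-sized clusters")
        # At most one core of each big.LITTLE pair may be on.
        return all(not (b and l) for b, l in zip(big_on, little_on))
    if mode is Mode.CLUSTER_MIGRATION:
        # At most one whole cluster may be on.
        return not (any(big_on) and any(little_on))
    if mode is Mode.MULTI_PROCESSING:
        # Any combination of cores is allowed.
        return True
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    mixed = ([True, False], [False, True])  # big core 0 and LITTLE core 1 on
    for mode in Mode:
        print(mode.value, is_valid_power_state(mode, *mixed))
```

The mixed state in the usage example (one big core and one LITTLE core from different pairs) is accepted under CPU migration and multi-processing but rejected under cluster migration, which is the practical difference between the first two configurations.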

Implementation of big.LITTLE Processing
In order to implement big.LITTLE processing in a mobile device, there is a series of system changes that must be made to CPU architecture, OS software and the build of the SoC itself. In fact, engineers have to create a whole infrastructure to support big.LITTLE, which requires a lot of investment and effort. Conversely, device manufacturers don't have to do anything differently for the mobile devices themselves or the applications they run.

One of the most important components needed is a cache coherent interconnect between the CPU clusters. This component reduces switching time, the time it takes to hand a workload and its data from the outbound processor to the inbound processor. This smart interconnect reduces DRAM accesses during core switching through snooping, and enables faster cache warm-up for the inbound core. By bringing switching time down to 30-50 milliseconds, cache coherency ensures that a device will realize the full benefits of big.LITTLE; without it, switching time would be too high and performance would be disrupted.

Big.LITTLE Results with Exynos

Samsung has already implemented big.LITTLE processing in its family of mobile processors, and the implementation will be released soon. Big.LITTLE architecture also performs at an A15-comparable level, meaning that there is no observable difference in performance between a big.LITTLE configuration and an A15-only build, but the difference in power savings is substantial. Exynos processors rely on big.LITTLE architecture because mobile SoCs using the technology boast the highest performance per watt among all mobile devices on the market.

What does all of this mean for the common smartphone user? If seamless performance and extended battery life are important to you, if you love high-quality media or intensive games but hate to see battery power leak away, and if you want a reliable powerhouse of a mobile device, then you should choose a smartphone or tablet with the technological means to meet those requirements: one that runs on an Exynos big.LITTLE mobile application processor.