NVIDIA won't talk about Tegra GPU architecture, but ARM is more than willing to talk about the Cortex A9.

I'm not used to seeing so much pipeline variance between microprocessor cores. The ARM11 core was introduced in 2003 and featured a single-issue 8-stage integer pipeline. The Cortex A8 was announced in 2005 and doubled the front end width. The A8 has a dual-issue in-order 13-stage integer pipeline. Doubling issue width increased IPC (instructions per clock) and the deeper pipeline gave it frequency headroom.

The Cortex A9 goes back down to an 8-stage pipeline. It's still a dual-issue pipeline, but instructions can execute out of order.

What's even more ridiculous are the frequencies you can get out of this core. TI is going to be shipping a 750MHz and 1GHz SoC based on the Cortex A9. And even ARM is willing to supply Cortex A9 designs that can run at up to 2GHz on TSMC's 40nm process. Privately I've heard that designs scaling beyond 2GHz, especially at 28nm, are going to be possible.

Cortex A9 has a shallower pipeline compared to A8, so it does more per clock. It also has an out of order execution engine, allowing it to also do more per clock. At the same clock speed, A9 should destroy A8. ARM estimates that the A8 can do up to 2 DMIPS per MHz (or 2000 DMIPS at 1GHz), whereas the A9 can do 2.5 DMIPS per MHz (2500 DMIPS at 1GHz). Given that most A8 implementations have been at or below 600MHz (1200 DMIPS), and TI's A9s are running at 750MHz or 1GHz (1875 DMIPS or 2500 DMIPS), I'd expect anywhere from a 30 - 100% performance improvement over existing Cortex A8 designs.

At 40nm there's enough room to cram two of these out of order cores on a single SoC. That's what NVIDIA's doing at first with Tegra 2. Two cores together running multithreaded code and now you're looking at multiples of Cortex A8 performance. I'm talking iPhone to 3GS levels of performance improvement.

The shallower pipeline is very important for keeping power consumption low. Mispredicted branches have a much lower performance and power impact on shallow pipelines than they do on deep ones.

Each Cortex A9 MPCore has its own private L1 instruction and data caches. I'd expect these to be 32KB in size (each) just as they are today on the A8s. The L2 cache is shared by all cores on the SoC. A shared L2 makes sense, especially with a dual-core design. The architecture can scale up to 8MB of L2, but it seems a bit excessive. I'd expect L2 sizes to stay at around 256KB or 512KB. The L2 can run at the CPU's clock speed or, for extremely high clocked versions of the A9, it can run at a divider.

What we're seeing is a repetition of the sort of evolution we had in the desktop microprocessor, just on a much smaller scale. The Pentium processor was Intel's last high end in-order chip. The Pentium Pro brought out of order execution into the mix. ARM took that same evolutionary step going from the Cortex A8 to A9.

The world is very different today than it was when the Pentium Pro first came out. Multithreaded code is far more commonplace and thus we see that ARM's first out-of-order processor is also multi-core capable. Technically ARM11 could be used in multi-core environments, it just wasn't (at least not commonly). Even NVIDIA's Tegra 1 used the ARM11 MPCore processor, but only used one of them on its SoC.

The first implementations announced by TI as well as NVIDIA are dual-core designs. The next stage in smartphone evolution is enabling usable multitasking through interfaces like what we saw on the Palm Pre. In order to enable good performance in smartphone multitasking you'll need multiple cores.

There is of course a single core version of the Cortex A9.
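
For anyone who wants to check the DMIPS arithmetic quoted in this article, here is a minimal sketch. It assumes only ARM's per-MHz estimates as cited (2 DMIPS/MHz for the A8, 2.5 DMIPS/MHz for the A9) and a 600MHz A8 as the baseline; the function names `dmips` and `speedup_percent` are made up for illustration. DMIPS is a synthetic figure, so treat the output as a rough ceiling rather than a prediction of real-world gains.

```python
# ARM's estimates as quoted in the article (DMIPS per MHz).
A8_DMIPS_PER_MHZ = 2.0
A9_DMIPS_PER_MHZ = 2.5


def dmips(clock_mhz, dmips_per_mhz):
    """Peak Dhrystone MIPS for a core at the given clock speed."""
    return clock_mhz * dmips_per_mhz


def speedup_percent(baseline, new):
    """Percentage improvement of `new` over `baseline`."""
    return (new / baseline - 1.0) * 100.0


# Baseline assumption: a typical 600MHz Cortex A8 implementation.
a8_baseline = dmips(600, A8_DMIPS_PER_MHZ)

# TI's announced single-core A9 clock speeds.
for clock_mhz in (750, 1000):
    a9 = dmips(clock_mhz, A9_DMIPS_PER_MHZ)
    gain = speedup_percent(a8_baseline, a9)
    print(f"A9 @ {clock_mhz}MHz: {a9:.0f} DMIPS, {gain:.0f}% over A8 @ 600MHz")
```

On raw DMIPS alone the gap against a 600MHz A8 works out to roughly 56% at 750MHz and 108% at 1GHz, and 25% at identical clocks. The 30 - 100% range in the text is a more conservative expectation, which is reasonable given that DMIPS rarely maps one-to-one onto application performance.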