Dr. Dobb's is part of the Informa Tech Division of Informa PLC



Microarchitecture Performance


Increasing the Efficiency of Out-of-order Processing

A traditional challenge in speeding up memory access is ambiguity: when the processor wants to fetch data from memory early, it may not yet know whether an earlier store writes to the same location. This ambiguity is one of the main sources of latency in out-of-order processing.

Advanced memory disambiguation resolves this by giving execution cores the built-in intelligence to speculatively load data for instructions that are about to execute, before all previous store instructions have completed.

In implementations without memory disambiguation, each load instruction that needs to read data from main memory must wait until all previous store instructions have completed before it can read that data in. Loads can't be rescheduled ahead of stores because the microprocessor can't tell whether doing so would violate a data-location dependency. Yet in many cases, loads don't depend on a previous store.

Memory disambiguation uses built-in prediction algorithms to evaluate whether a load can be executed ahead of a preceding store. If the processor speculates that the load does not depend on the store, it schedules the load before the store, so the processor spends less time waiting and more time processing. To avoid putting additional requirements on the system, disambiguation is done during periods when the system bus and memory subsystems have spare bandwidth available.

In the rare event that a speculative load turns out to be invalid, memory disambiguation has built-in intelligence to detect the conflict, reload the correct data, and re-execute the instruction.
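The trade-off between waiting and speculating can be illustrated with a toy timing model. The following Python sketch is only an illustration under stated assumptions, not a description of the actual hardware: the `Store` and `Load` records, the cycle numbers, and the `REPLAY_PENALTY` value are all hypothetical, and the model always speculates, whereas a real core would first predict whether speculation is safe.

```python
from dataclasses import dataclass

REPLAY_PENALTY = 5  # hypothetical cost, in cycles, of re-executing a mis-speculated load


@dataclass(frozen=True)
class Store:
    addr: int
    done: int   # cycle at which the store completes (made-up value)


@dataclass(frozen=True)
class Load:
    addr: int
    ready: int  # earliest cycle at which the load could issue (made-up value)


def conservative_cycle(load, earlier_stores):
    """Without disambiguation: wait until every earlier store has completed."""
    return max([load.ready] + [s.done for s in earlier_stores])


def speculative_cycle(load, earlier_stores):
    """With speculation: issue as soon as ready; replay if a conflict is detected."""
    conflicts = [s.done for s in earlier_stores
                 if s.addr == load.addr and s.done > load.ready]
    if not conflicts:
        return load.ready, False
    # The load read stale data: reload after the conflicting store and re-execute.
    return max(conflicts) + REPLAY_PENALTY, True


def run(trace):
    """Walk the trace in program order; return (conservative, speculative, replayed) per load."""
    stores, results = [], []
    for op in trace:
        if isinstance(op, Store):
            stores.append(op)
        else:
            spec, replayed = speculative_cycle(op, stores)
            results.append((conservative_cycle(op, stores), spec, replayed))
    return results


TRACE = [
    Store(addr=0x100, done=10),
    Load(addr=0x200, ready=2),   # independent of the earlier store
    Store(addr=0x300, done=14),
    Load(addr=0x300, ready=4),   # reads the address the second store writes
    Load(addr=0x400, ready=6),   # independent of both stores
]

RESULTS = run(TRACE)
for conservative, speculative, replayed in RESULTS:
    print(conservative, speculative, "replayed" if replayed else "ok")
```

In this made-up trace, the two independent loads finish at cycles 2 and 6 instead of 10 and 14, while the one load that really depends on an earlier store pays the replay penalty and finishes at cycle 19 instead of 14. Speculation pays off only when such conflicts are rare, which is the case the article describes.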

Memory disambiguation is a sophisticated technique that helps avoid the wait states imposed by less capable microarchitectures. The result is faster execution and more efficient use of processor resources.

Doubling the Number of Prefetchers

Microarchitectures based on the new 65-nm process are also doubling the number of advanced prefetchers available per cache. Prefetchers do just that: they "prefetch" memory contents before the data is requested, so the data can be placed in cache and readily accessed when needed. By increasing the number of loads that are served from cache rather than from main memory, these microarchitectures reduce memory latency and improve performance.

Specifically, to ensure data is where each execution core needs it, there are now two prefetchers per L1 cache and two prefetchers per L2 cache. These prefetchers detect multiple streaming and strided access patterns simultaneously. This lets them ready data in the L1 cache for "just-in-time" execution. The prefetchers for the L2 cache analyze accesses from the cores to help make sure the L2 cache holds the data that the cores may need in the future.
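How detecting a strided pattern turns misses into hits can be sketched in a few lines of Python. This is a simplified illustration under assumptions of our own, not the actual prefetcher design: the `StridePrefetcher` class, its `degree` parameter, the 64-byte stride, and the unbounded cache are all hypothetical, and the sketch tracks a single access stream where the hardware described above tracks several at once.

```python
class StridePrefetcher:
    """Toy stride detector: once the same stride is seen twice in a row, fetch ahead."""

    def __init__(self, degree=1):
        self.last_addr = None
        self.last_stride = None
        self.degree = degree  # how many strides ahead to prefetch (assumed parameter)

    def observe(self, addr):
        """Record one access and return the list of addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * i for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


def count_hits(addresses, prefetcher=None):
    """Count cache hits for an access sequence, using an unbounded toy cache."""
    cache = set()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
        cache.add(addr)
        if prefetcher is not None:
            cache.update(prefetcher.observe(addr))
    return hits


# Sixteen accesses, each 64 bytes past the previous one (a strided pattern).
STRIDED = [0x1000 + 64 * i for i in range(16)]

HITS_WITHOUT = count_hits(STRIDED)
HITS_WITH = count_hits(STRIDED, StridePrefetcher())
print(HITS_WITHOUT, HITS_WITH)
```

With these assumptions, every access misses without the prefetcher, since each address is touched only once. With it, the first three accesses miss while the stride is being learned, and the remaining 13 of the 16 are already in the cache when they are requested.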

The combination of advanced prefetchers and memory disambiguation delivers significantly improved execution throughput. The result is better performance through the highest possible instruction-level parallelism.

