Open Access Repository

Cache performance in multithreaded processor architectures

Neumeyer, Paul Grant 1999, 'Cache performance in multithreaded processor architectures', PhD thesis, University of Tasmania.

Full text restricted
Available under University of Tasmania Standard License.


Multithreading techniques used within computer processors aim to provide the
computer system with a means to tolerate long-latency operations and to
dynamically convert variable software concurrency into the maximum parallelism in
hardware. To meet this challenge, resources must be allocated to threads. Cache in
the memory hierarchy poses a problem because it is not an allocated resource. The
resulting interference in the cache between coactive threads can lead to poorer
performance than the same threads achieve when executing without interference on
similar hardware.

The memory cache poses a resource management problem in a multithreaded
architecture because it does not distinguish between accesses from different threads
and thus is transparent to the memory model. Short-lived or intermittent threads can
displace cache lines still in active use by other threads. This interference voids
the benefit of the cache to those other threads on their next access to a displaced
cache line. Techniques that reduce this interference will increase the performance
achieved from multithreaded hardware.
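
As a rough illustration of the displacement described above, the following toy
simulation replays two access traces through one shared direct-mapped cache. It is
only a sketch under stated assumptions, not one of the models proposed in the
thesis: the cache geometry (8 lines of 16 bytes), the thread names "A" and "B", the
traces, and the function `run` are all hypothetical.

```python
def run(trace, num_lines=8, line_size=16):
    """Replay (thread_id, address) accesses through one direct-mapped cache
    shared by all threads and return the miss count per thread."""
    # One tag per cache line. No thread id is stored, so the cache cannot
    # tell which thread owns a line (the property the abstract describes).
    tags = [None] * num_lines
    misses = {}
    for thread_id, address in trace:
        block = address // line_size
        index = block % num_lines
        misses.setdefault(thread_id, 0)
        if tags[index] != block:
            misses[thread_id] += 1
            tags[index] = block  # displace whatever was there
    return misses


# Assumed workload: thread "A" loops over 8 blocks that exactly fill the cache.
loop_a = [("A", block * 16) for block in range(8)]
alone = loop_a * 4

# Assumed workload: thread "B" is intermittent. After each pass of A it touches
# blocks 8 and 9, which map to the same cache lines as A's blocks 0 and 1.
burst_b = [("B", block * 16) for block in (8, 9)]
shared = (loop_a + burst_b) * 4

misses_alone = run(alone)
misses_shared = run(shared)
print(misses_alone)   # {'A': 8}
print(misses_shared)  # {'A': 14, 'B': 8}
```

In this sketch thread A incurs only its 8 cold misses when run alone, but 14 misses
when B's short bursts displace two of its active lines on every pass.
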

We propose analytical models for the instantaneous miss rate of the cache that enable
the interference between threads to be determined from their dynamic interaction.
Inputs to the models are the cache design parameters and the memory behaviour of
each thread. Sharing, latency effects, cache design, and memory usage are
investigated with the models. A type of miss, the latency miss, is also investigated.
This category of miss has not been described previously and occurs between threads
with shared memory in non-blocking multithreaded architectures. We also propose
thread pulling, a scheduling scheme aimed at reducing interference by increasing
the locality of reference of concurrent threads.

New methods were devised that produce mathematically tractable measurements
of memory behaviour for the analytical cache miss rate models. These methods
minimise the dependence on arbitrary factors used while making the measurements.
Using them, we develop a calculation of the working set of a thread at an instant
that does not require an arbitrary choice of the working set parameter.
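
For context, the conventional working set (in Denning's formulation) is the set of
distinct blocks referenced in a window of the most recent references, so its size
depends on the chosen window length. The sketch below shows only that conventional
definition and its sensitivity to the parameter; the parameter-free calculation
developed in the thesis is not described in this abstract and is not reproduced
here. The trace, the name `working_set`, and the window values are assumptions.

```python
def working_set(trace, t, tau):
    """Distinct blocks referenced in the last `tau` references,
    up to and including position `t` of the trace."""
    start = max(0, t - tau + 1)
    return set(trace[start:t + 1])


# Assumed block-reference trace for a single thread.
trace = [0, 1, 2, 0, 1, 2, 7, 0, 1, 2, 9, 0]
t = len(trace) - 1

# The measured working set size changes with the (arbitrary) window length.
sizes = {tau: len(working_set(trace, t, tau)) for tau in (2, 4, 8, 12)}
print(sizes)  # {2: 2, 4: 4, 8: 5, 12: 5}
```
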

Two cache miss models are proposed, validated, and used to analyse the performance
of the cache under a range of multithreaded loads. The first is a single-threaded
analytical cache miss model at a point in time. The second is a multithreaded
analytical cache miss rate model that uses the working sets of the active threads at
a point in time. Significant agreement was found between the predictions of the
models and measurements of real cache performance. Our analytical cache miss rate
models provide insight to computer architects and compiler writers for
performance-related optimisations.

Item Type: Thesis - PhD
Authors/Creators: Neumeyer, Paul Grant
Keywords: Simultaneous multithreading processors, Cache memory
Copyright Holders: The Author
Copyright Information:

Copyright 1999 the author - The University is continuing to endeavour to trace the copyright
owner(s) and in the meantime this item has been reproduced here in good faith. We
would be pleased to hear from the copyright owner(s).

Additional Information:

Thesis (Ph.D.)--University of Tasmania, 2000. Includes bibliographical references
