Open Access Repository
Cache performance in multithreaded processor architectures
PDF (Whole thesis): whole_NeumeyerP...pdf | Full text restricted; request a copy. Available under University of Tasmania Standard License.
Abstract
Multithreading techniques within computer processors aim to give the computer system a means of tolerating long-latency operations and of dynamically converting variable software concurrency into the maximum available hardware parallelism. Meeting this challenge requires that resources be allocated to threads. The cache in the memory hierarchy poses a problem because it is not an allocated resource. The resulting interference in the cache between coactive threads can lead to poor performance compared with the same threads executing, without interference, on similar hardware.
The memory cache poses a resource management problem in a multithreaded architecture because it does not distinguish between accesses from different threads and is therefore transparent to the memory model. Short-lived or intermittent threads can displace cache lines still in active use by other threads. This interference negates the benefit of the cache for those threads the next time a displaced but still active cache line is accessed. Techniques that reduce this interference will increase the performance obtained from multithreaded hardware.
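The displacement effect described above can be reproduced with a toy simulation. The sketch below is illustrative only and is not the apparatus used in the thesis: it pushes the interleaved address streams of two hypothetical threads through a small direct-mapped cache, with the cache geometry, address ranges, and access patterns all invented for the example.

```python
# Illustrative only: a tiny direct-mapped cache shared by two address streams.
# Interleaving the streams lets each thread displace lines the other is still
# reusing, so the combined miss count far exceeds the sum of the solo runs.

def misses(trace, num_lines=64, line_size=32):
    cache = [None] * num_lines                # one tag per direct-mapped line
    count = 0
    for addr in trace:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if cache[index] != tag:               # line currently holds other data
            cache[index] = tag
            count += 1
    return count

# Two hypothetical threads, each repeatedly sweeping its own 1 KiB array.
t0 = [0x1000 + 4 * i for i in range(256)] * 4
t1 = [0x2000 + 4 * i for i in range(256)] * 4

alone = misses(t0) + misses(t1)               # each thread has the cache to itself
coactive = misses([a for pair in zip(t0, t1) for a in pair])   # interleaved
print(f"misses run alone: {alone}, run interleaved: {coactive}")
```

Run as written, each thread alone misses only on its first sweep, while the interleaved stream misses on every access because the two sweeps map to the same cache indices and keep displacing each other.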
We propose analytical models for the instantaneous miss rate of the cache that allow the interference between threads to be determined from their dynamic interaction. The inputs to the models are the cache design parameters and the memory behaviour of each thread. Sharing, latency effects, cache design, and memory usage are investigated with the models. A new category of miss, the latency miss, is identified and investigated; it has not been described previously and occurs between threads with shared memory in non-blocking multithreaded architectures. We also propose thread pulling, a scheduling scheme aimed at reducing interference by increasing the locality of reference of concurrent threads.
New methods were developed to produce mathematically tractable measurements of memory behaviour for the analytical cache miss rate models. These methods minimise the dependence on arbitrary factors introduced while making the measurements. Using them, we develop a calculation of the working set of a thread at an instant that does not require an arbitrary choice of the working set parameter.
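For context, the sketch below computes the classical parametric working set W(t, tau): the distinct memory blocks a thread touched in its last tau references before time t. The window length tau is precisely the kind of arbitrary measurement parameter the thesis's methods aim to remove; the trace, block size, and window lengths here are invented for illustration.

```python
# Background sketch (not the thesis's method): the classical working set
# W(t, tau) is the set of distinct blocks referenced in the tau accesses
# before time t. Its size varies with the arbitrary window parameter tau.

def working_set_size(trace, t, tau, line_size=32):
    window = trace[max(0, t - tau):t]                    # last tau references
    return len({addr // line_size for addr in window})   # distinct blocks

trace = [0x1000 + 4 * (i % 300) for i in range(5000)]    # hypothetical sweep
for tau in (50, 200, 1000):
    print(f"tau={tau:4d}: |W(t, tau)| = {working_set_size(trace, 4000, tau)}")
```

The three window lengths give three different answers for the same trace at the same instant, which is exactly the dependence on an arbitrary parameter that the measurement methods in the thesis set out to eliminate.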
Two cache miss models are proposed, validated, and used to analyse the performance of the cache under a range of multithreaded loads: first, a single-threaded analytical cache miss model at a point in time; second, a multithreaded analytical cache miss rate model that uses the working sets of the active threads at a point in time. Significant agreement was found between the predictions of the models and measurements of real cache performance. Our analytical cache miss rate models offer computer architects and compiler writers insight into performance-related optimisations.
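As a rough intuition for why per-thread working sets at an instant are useful inputs to a multithreaded miss rate model, the sketch below compares the combined demand of the coactive threads against the cache capacity. It is a deliberately crude back-of-the-envelope check, not the validated models of the thesis, and all of the sizes are invented.

```python
# Crude illustration (not the thesis's validated models): if the working sets
# of the coactive threads at an instant exceed the cache capacity, the
# overflow hints at the extra interference misses the mix will suffer.

def cache_pressure(working_set_sizes, cache_lines):
    demand = sum(working_set_sizes)          # lines wanted by all threads now
    overflow = max(0, demand - cache_lines)  # lines that cannot all be resident
    return demand, overflow

demand, overflow = cache_pressure(working_set_sizes=[180, 90, 60], cache_lines=256)
print(f"demand {demand} lines vs 256 cache lines -> overflow {overflow} lines")
```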
| Item Type: | Thesis - PhD |
|---|---|
| Authors/Creators: | Neumeyer, Paul Grant |
| Keywords: | Simultaneous multithreading processors, Cache memory |
| Copyright Holders: | The Author |
| Copyright Information: | Copyright 1999 the author. The University is continuing to endeavour to trace the copyright |
| Additional Information: | Thesis (PhD)--University of Tasmania, 2000. Includes bibliographical references. |