Methods for Efficient Data Access and Communication in Many-core Architectures
Digital Document
Handle |
Handle
http://hdl.handle.net/11134/20002:860652989
|
||||||
---|---|---|---|---|---|---|---|
Persons |
Persons
Creator (cre): Hijaz, Farrukh
Major Advisor (mja): Khan, Omer
Associate Advisor (asa): Chandy, John
Associate Advisor (asa): Van Dijk, Marten
|
||||||
Title |
Title
Title
Methods for Efficient Data Access and Communication in Many-core Architectures
|
||||||
Origin Information |
Origin Information
|
||||||
Parent Item |
Parent Item
|
||||||
Resource Type |
Resource Type
|
||||||
Digital Origin |
Digital Origin
born digital
|
||||||
Description |
Description
The trend of increasing processor performance by boosting frequency has been halted due to excessive power dissipation. However, transistor density has continued to grow which has enabled integration of many cores on a single chip to meet the performance requirements of future applications. Scaling to hundreds of cores on a single chip present a number of challenges, mainly efficient data access and on-chip communication. Near-threshold voltage (NTV) operation has been identified as the most energy efficient region to operate in. Running at NTV can facilitate efficient data access, however, it introduces bit-cell faults in the SRAMs which needs to be dealt with. Another avenue to extract data access efficiency is by improving on-chip data locality. Shared memory abstraction dominates the traditional small computer and embedded space due to its ease of programming. For efficiency, shared memory is often implemented with hardware support for synchronization and cache coherence among the cores. However, accesses to shared data with frequent writes results in wasteful invalidations, synchronous write-backs, and cache line ping-pong leading to low spatio-temporal locality. Moreover, communication through coherent caches and shared memory primitives is inefficient because it can take many instructions to coordinate between cores. This thesis focuses on mitigating the effects of the data access and communication challenges and make architectural contributions to enable efficient and scalable many-core processors. The main idea is to minimize data movement and make each necessary data access more efficient. In this regard, a novel private level-1 cache architecture is presented to enable efficient and fault-free operation at near-threshold voltages. To better exploit data locality, a last-level cache (LLC) data replication scheme is proposed that co-optimizes data locality and off-chip miss rate. It utilizes an in-hardware predictive mechanism to classify data and only replicate high reuse data in the local LLC bank. Finally, a hybrid shared memory, explicit messaging architecture is proposed to enable efficient on-chip communication. In this architecture the shared memory model is retained, however, a set of lightweight in-hardware explicit message passing style instructions are introduced in the instruction set architecture (ISA) that enable efficient movement of computation to where data is located.
|
||||||
Genre |
Genre
|
||||||
Organizations |
Organizations
Degree granting institution (dgg): University of Connecticut
|
||||||
Held By | |||||||
Rights Statement |
Rights Statement
|
||||||
Use and Reproduction |
Use and Reproduction
These materials are provided for educational and research purposes only.
|
||||||
Local Identifier |
Local Identifier
OC_d_1119
|