Methods for Efficient Data Access and Communication in Many-core Architectures

Metadata

Handle

http://hdl.handle.net/11134/20002:860652989

Persons

Creator (cre): Hijaz, Farrukh

Major Advisor (mja): Khan, Omer

Associate Advisor (asa): Chandy, John

Associate Advisor (asa): Van Dijk, Marten

Title

Methods for Efficient Data Access and Communication in Many-core Architectures

Origin Information

Event Place	Storrs, CT
Date Created	2016
Publisher	University of Connecticut

Parent Item

Dissertations

Resource Type

Text

Digital Origin

born digital

Description

The trend of increasing processor performance by boosting frequency has been halted by excessive power dissipation. However, transistor density has continued to grow, which has enabled the integration of many cores on a single chip to meet the performance requirements of future applications. Scaling to hundreds of cores on a single chip presents a number of challenges, mainly efficient data access and on-chip communication. Near-threshold voltage (NTV) operation has been identified as the most energy-efficient operating region. Running at NTV can facilitate efficient data access; however, it introduces bit-cell faults in the SRAMs that must be dealt with. Another avenue for improving data access efficiency is better on-chip data locality. The shared memory abstraction dominates the traditional small computer and embedded space because of its ease of programming. For efficiency, shared memory is often implemented with hardware support for synchronization and cache coherence among the cores. However, accesses to shared data with frequent writes result in wasteful invalidations, synchronous write-backs, and cache line ping-pong, leading to low spatio-temporal locality. Moreover, communication through coherent caches and shared memory primitives is inefficient because it can take many instructions to coordinate between cores.
This thesis focuses on mitigating the effects of these data access and communication challenges and makes architectural contributions to enable efficient and scalable many-core processors. The main idea is to minimize data movement and make each necessary data access more efficient. To this end, a novel private level-1 cache architecture is presented to enable efficient and fault-free operation at near-threshold voltages. To better exploit data locality, a last-level cache (LLC) data replication scheme is proposed that co-optimizes data locality and off-chip miss rate. It uses an in-hardware predictive mechanism to classify data and replicates only high-reuse data in the local LLC bank. Finally, a hybrid shared memory and explicit messaging architecture is proposed to enable efficient on-chip communication. This architecture retains the shared memory model but adds to the instruction set architecture (ISA) a set of lightweight, in-hardware, explicit message-passing-style instructions that enable efficient movement of computation to where the data is located.

Genre

doctoral dissertations

Organizations

Degree granting institution (dgg): University of Connecticut

Held By

Archives & Special Collections, University of Connecticut Library

Rights Statement

IN COPYRIGHT

Use and Reproduction

These materials are provided for educational and research purposes only.

Local Identifier

OC_d_1119