Accelerating Synchronization on Futuristic 1000-cores Multicore Processor with Moving Compute to Data Model
Handle: http://hdl.handle.net/11134/20002:860655934
Persons
Creator (cre): Dogan, Halit
Major Advisor (mja): Khan, Omer
Associate Advisor (asa): Chandy, John
Associate Advisor (asa): Van Dijk, Marten
Digital Origin: born digital
Description
Single-chip multicore processors are now prevalent, and processors with hundreds of cores are being proposed and explored by both academia and industry. Shared memory cache coherence is the state-of-the-art technology for these processors to enable synchronization and communication between cores. However, synchronizing cores on shared data using hardware cache coherence suffers from instruction retries and cache line ping-pong overheads, which prevents performance scaling as core counts increase on a chip. This thesis proposes a novel moving computation to data model (MC) to overcome this synchronization bottleneck in a 1000-core-scale shared memory multicore processor. The proposed MC model pins shared data to dedicated cores called service cores. Worker cores explicitly request that critical code sections be executed at the service cores. In this way, cache line bouncing between cores is prevented, which enables data locality optimization. The proposed MC model uses auxiliary in-hardware explicit messaging for the critical section requests to enable efficient fine-grained blocking and non-blocking communication between cores. To show the effectiveness of the proposed model, workloads with a wide range of synchronization requirements from the graph analytics, machine learning, and database domains are implemented. The proposed model is then prototyped and exhaustively evaluated on a 72-core machine, the Tilera Tile-Gx72 multicore platform, because it incorporates in-hardware core-to-core messaging as an auxiliary capability to the shared memory cache coherence paradigm. Since the Tile-Gx72 machine includes only 72 cores, it is used for evaluation at core counts from 8 to 64. For further analysis at higher core counts, a simulated RISC-V multicore environment is built, and the performance and dynamic energy scaling advantages of the MC model are evaluated against various baseline synchronization models at up to 1024 cores.
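
The service-core idea described above can be illustrated with a minimal software sketch. This is not the thesis implementation: it assumes Python threads stand in for cores and `queue.Queue` stands in for the in-hardware messaging, and the names `service_core`, `worker`, `increment`, and `run` are hypothetical. Only the service thread ever touches the shared data; workers either wait for a reply (blocking request) or continue immediately (non-blocking request).

```python
import queue
import threading


def service_core(requests, shared):
    """Owns `shared` and runs every critical section sent to it, one at a time."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel (an assumption of this sketch)
            break
        critical_section, reply = msg
        result = critical_section(shared)
        if reply is not None:  # blocking request: send the result back
            reply.put(result)


def increment(shared):
    """Example critical section: update a counter pinned to the service core."""
    shared["counter"] += 1
    return shared["counter"]


def worker(requests, n_blocking, n_nonblocking):
    """Ships critical sections to the service core instead of taking a lock."""
    for _ in range(n_blocking):
        reply = queue.Queue(maxsize=1)
        requests.put((increment, reply))
        reply.get()  # wait until the service core has executed the request
    for _ in range(n_nonblocking):
        requests.put((increment, None))  # fire and forget


def run(n_workers=4, n_blocking=100, n_nonblocking=100):
    requests = queue.Queue()
    shared = {"counter": 0}
    service = threading.Thread(target=service_core, args=(requests, shared))
    service.start()
    workers = [
        threading.Thread(target=worker, args=(requests, n_blocking, n_nonblocking))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)  # queued after all worker requests, so none are lost
    service.join()
    return shared["counter"]
```

Because the counter is modified by a single thread, no lock or atomic retry loop is needed around it; the request queue serializes the critical sections. Calling `run()` with the defaults performs 4 × (100 + 100) increments.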
Organizations
Degree granting institution (dgg): University of Connecticut
Use and Reproduction: These Materials are provided for educational and research purposes only.
Degree Name: Doctor of Philosophy
Degree Level: Doctoral
Degree Discipline: Electrical Engineering
Local Identifier: OC_d_2026