[1]
A. Benczur, "The digital universe – an information theoretical analysis", In: Proceedings of the 14th International Conference on Computer Systems and Technologies, CompSysTech’13, New York, NY, USA, 2013, pp. 1-10.
[2]
Intel Corporation, Intel Xeon Phi Processor 7290, 2018. Available from: https://ark.intel.com/products/95831/Intel-Xeon-Phi-Processor-7290F-16GB-1_50-GHz-72-core (Accessed: 30th Jun 2018).
[3]
Mellanox Technologies, TILE-Gx72 Processor, 2018. Available from: http://www.mellanox.com/page/products_dyn?product_family=238&mtag=tile_gx72 (Accessed: 30th Jun 2018).
[4]
MIT Computer Science and Artificial Intelligence Laboratory, The Angstrom Project. Available from: http://projects.csail.mit.edu/angstrom (Accessed: 23rd Jan 2012).
[5]
D. Modha, "The brain's architecture, efficiency on a chip", Cognitive Computing, IBM Research-Almaden, Systems, 2016. Available from: https://www.ibm.com/blogs/research/2016/12/the-brains-architecture-efficiency-on-a-chip/ (Accessed: 4th Nov 2018).
[6]
N. Muralimanohar, and R. Balasubramonian, "CACTI 6.0: A tool to understand large caches". Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.3834 (Accessed: 4th Nov 2018).
[7]
C. Kim, D. Burger, and S.W. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches", SIGARCH Comput. Archit. News, vol. 30, no. (5), pp. 211-222, 2002.
[8]
Z. Chishti, M.D. Powell, and T.N. Vijaykumar, "Distance associativity for high-performance energy-efficient non-uniform cache architectures", In: Proceedings 36th Annual IEEE/ACM International Symposium on Microarchitecture, San Diego, CA, USA, 2003, pp. 55-66.
[9]
Y. Wang, L. Zhang, Y. Han, H. Li, and X. Li, "Address remapping for static NUCA in NoC-based degradable chip-multiprocessors", In: 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing, 2010, pp. 70-76.
[10]
H. Dybdahl, and P. Stenstrom, "An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors", In: 2007 IEEE 13th International Symposium on High Performance Computer Architecture, Washington, DC, USA, 2007, pp. 2-12.
[11]
B.M. Beckmann, and D.A. Wood, "Managing wire delay in large chip-multiprocessor caches", In: 37th International Symposium on Microarchitecture, Washington, DC, USA, 2004, pp. 319-330.
[12]
A. Arora, M. Harne, H. Sultan, A. Bagaria, and S.R. Sarangi, "FP-NUCA: A fast NoC layer for implementing large NUCA caches", IEEE Trans. Parallel Distrib. Syst., vol. 26, no. (9), pp. 2465-2478, 2015.
[13]
Y. Jin, E.J. Kim, and K.H. Yum, "Design and analysis of on-chip networks for large-scale cache systems", IEEE Trans. Comp., vol. 59, no. (3), pp. 332-344, 2010.
[14]
M. Zhang, and K. Asanovic, "Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors", In: Proceedings of the 32nd Annual International Symposium on Computer Architecture, ISCA ’05, Washington, DC, USA, 2005, pp. 336-345.
[15]
J. Merino, V. Puente, P. Prieto, and J.A. Gregorio, "SP-NUCA: A cost effective dynamic non-uniform cache architecture", ACM SIGARCH Comp. Architec. News, vol. 36, pp. 64-71, 2008.
[16]
J. Merino, V. Puente, and J.A. Gregorio, "ESP-NUCA: A low-cost adaptive non-uniform cache architecture", In: HPCA - 16 2010 The 16th International Symposium on High-Performance Computer Architecture, Bangalore, India, 2010, pp. 1-10.
[17]
R.M. Yoo, C.J. Hughes, C. Kim, Y-K. Chen, and C. Kozyrakis, "Locality-aware task management for unstructured parallelism: A quantitative limit study", In: Proceedings of the Twenty-fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’13, New York, NY, USA, 2013, pp. 315-325.
[18]
M. Wang, and Z. Li, "A spatial and temporal locality-aware adaptive cache design with network optimization for tiled many-core architectures", IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 25, no. (9), pp. 2419-2433, 2017.
[19]
M. Zahran, and S.A. McKee, "Global management of cache hierarchies", In: International Conference on Computing Frontiers, New York, NY, USA, 2010, pp. 131-140.
[20]
M. S. Souahi, S. Niar, M. Zahran, and M. Benmohammed, "Towards dynamic cache block placement for multi-processor NUCA", In: ICM 2011 Proceeding, Penang, Malaysia, 2011, pp. 1-3.
[21]
J. Liao, and S. Chen, "Optimization of reading data via classified block access patterns in file systems", IEEE Access, vol. 4, pp. 9421-9427, 2016.
[22]
M. Soltaniyeh, I. Kadayif, and O. Ozturk, "Classifying data blocks at subpage granularity with an on-chip page table to improve coherence in tiled CMPs", IEEE Trans. Computer-Aided Design Integrated Circuits Syst., vol. 37, no. (4), pp. 806-819, 2018.
[23]
A. Jadidi, M. Arjomand, M.T. Kandemir, and C.R. Das, "Hybrid-comp: A criticality-aware compressed last-level cache", In: 2018 19th International Symposium on Quality Electronic Design (ISQED), 2018, pp. 25-30.
[24]
M.R. Guthaus, J.S. Ringenberg, D. Ernst, T.M. Austin, T. Mudge, and R.B. Brown, "MiBench: A free, commercially representative embedded benchmark suite", In: Proceedings of the Workload Characterization, WWC-4, 2001 IEEE International Workshop, WWC ’01, Washington, DC, USA, 2001, pp. 3-14.
[25]
S. Bartolini, P. Foglia, and C.A. Prete, "Exploring the relationship between architectures and management policies in the design of NUCA-based chip multicore systems", Future Gener. Comp. Syst., vol. 78, pp. 481-501, 2018.
[26]
A. Pathania, and J. Henkel, "Task scheduling for many-cores with S-NUCA caches", In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 557-562.
[27]
A.K. Ziabari, R.U. Tena, D. Schaa, and D. Kaeli, "A framework for visualization of OpenCL applications execution: A tutorial", In: Proceedings of the 3rd International Workshop on OpenCL, IWOCL’15, New York, NY, USA, 2015, pp. 22:1-22:2.