TY - GEN
T1 - GCD2
T2 - 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
AU - Niu, Wei
AU - Guan, Jiexiong
AU - Shen, Xipeng
AU - Wang, Yanzhi
AU - Agrawal, Gagan
AU - Ren, Bin
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions. This work was supported in part by the National Science Foundation (NSF) under awards CCF-2047516 (CAREER), CCF-2146873, CCF-2232813, CCF-2146852, CCF-2131509, CCF-2034850, and CCF-2007793, and by the Army Research Office/Army Research Laboratory via grant W911-NF-20-1-0167 to Northeastern University. Any errors and opinions are not those of the NSF, Army Research Office, or Department of Defense, and are attributable solely to the author(s).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system GCD2, developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and the VLIW pipeline with its notion of soft dependencies. GCD2 comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0× speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5×, 3.4×, and 4.0× speedup, respectively. GCD2 is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.
AB - More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system GCD2, developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and the VLIW pipeline with its notion of soft dependencies. GCD2 comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to 6.0× speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to 4.5×, 3.4×, and 4.0× speedup, respectively. GCD2 is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time.
KW - compiler optimization
KW - deep neural network
KW - mobile devices
KW - VLIW instruction packing
UR - http://www.scopus.com/inward/record.url?scp=85141665744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141665744&partnerID=8YFLogxK
U2 - 10.1109/MICRO56248.2022.00044
DO - 10.1109/MICRO56248.2022.00044
M3 - Conference contribution
AN - SCOPUS:85141665744
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 512
EP - 529
BT - Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
PB - IEEE Computer Society
Y2 - 1 October 2022 through 5 October 2022
ER -