DNNFusion: Accelerating deep neural networks execution with advanced operator fusion

Wei Niu; Jiexiong Guan; Yanzhi Wang; Gagan Agrawal; Bin Ren

doi:10.1145/3453483.3454083

Computer & Cyber Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

48 Scopus citations

Abstract

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3× speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
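The two ideas at the center of the abstract, operator fusion and graph rewriting based on mathematical properties, can be illustrated with a minimal Python sketch. It is not DNNFusion's algorithm or code: the operators (`scale2`, `add1`, `relu`), the tuple-based expression format, and every helper name are hypothetical. The first part contrasts an unfused chain of element-wise operators, which makes a full pass and allocates a new intermediate buffer per operator, with a fused version that applies the whole chain in one pass. The second part applies distributivity, rewriting a*b + a*c into a*(b + c), to remove one multiplication.

```python
from typing import Callable, Dict, List, Tuple, Union

# --- Part 1: operator fusion (hypothetical element-wise operators) ---
Op = Callable[[float], float]


def scale2(x: float) -> float:
    return 2.0 * x


def add1(x: float) -> float:
    return x + 1.0


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def run_unfused(ops: List[Op], data: List[float]) -> List[float]:
    """Run each operator as a separate 'kernel': one pass and one new buffer per operator."""
    buf = data
    for op in ops:
        buf = [op(v) for v in buf]  # intermediate result is materialized here
    return buf


def fuse(ops: List[Op]) -> Op:
    """Compose a chain of element-wise operators into one per-element function."""
    def fused(x: float) -> float:
        for op in ops:
            x = op(x)
        return x
    return fused


def run_fused(ops: List[Op], data: List[float]) -> List[float]:
    """Single pass over the input; no intermediate buffers between operators."""
    f = fuse(ops)
    return [f(v) for v in data]


# --- Part 2: rewriting with a mathematical property (distributivity) ---
# Hypothetical expression format: a leaf is a variable name (str);
# an inner node is ("add" | "mul", left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def rewrite_distributive(e: Expr) -> Expr:
    """Rewrite a*b + a*c into a*(b + c) at the root; otherwise return e unchanged."""
    if isinstance(e, tuple) and e[0] == "add":
        lhs, rhs = e[1], e[2]
        if (isinstance(lhs, tuple) and isinstance(rhs, tuple)
                and lhs[0] == "mul" and rhs[0] == "mul" and lhs[1] == rhs[1]):
            return ("mul", lhs[1], ("add", lhs[2], rhs[2]))
    return e


def count_muls(e: Expr) -> int:
    """Count multiplication nodes, a crude stand-in for evaluation cost."""
    if isinstance(e, tuple):
        return int(e[0] == "mul") + count_muls(e[1]) + count_muls(e[2])
    return 0


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Evaluate an expression with variable values taken from env."""
    if isinstance(e, str):
        return env[e]
    a, b = evaluate(e[1], env), evaluate(e[2], env)
    return a + b if e[0] == "add" else a * b


if __name__ == "__main__":
    print(run_fused([scale2, add1, relu], [-1.0, 0.0, 2.5]))  # [0.0, 1.0, 6.0]
    print(rewrite_distributive(("add", ("mul", "a", "b"), ("mul", "a", "c"))))
    # ('mul', 'a', ('add', 'b', 'c'))
```

In an actual DNN, the benefit of fusion comes from avoiding writes and re-reads of large intermediate tensors; Python lists only make the extra passes and buffers visible. The sketch also skips the legality and profitability checks a real fusion framework needs, for instance when an operator's output elements depend on many input elements. Note that over floating point, a*(b + c) can round slightly differently from a*b + a*c.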

Original language	English (US)
Title of host publication	PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
Editors	Stephen N. Freund, Eran Yahav
Publisher	Association for Computing Machinery
Pages	883-898
Number of pages	16
ISBN (Electronic)	9781450383912
DOIs	https://doi.org/10.1145/3453483.3454083
State	Published - Jun 18 2021
Event	42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021 - Virtual, Online, Canada
Duration	Jun 20 2021 → Jun 25 2021

Publication series

Name	Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Conference

Conference	42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021
Country/Territory	Canada
City	Virtual, Online
Period	6/20/21 → 6/25/21

Keywords

Compiler Optimization
Deep Neural Network
Mobile Devices
Operator Fusion

ASJC Scopus subject areas

Software

Access to Document

10.1145/3453483.3454083

Cite this

Niu, W., Guan, J., Wang, Y., Agrawal, G., & Ren, B. (2021). DNNFusion: Accelerating deep neural networks execution with advanced operator fusion. In S. N. Freund, & E. Yahav (Eds.), PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (pp. 883-898). (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). Association for Computing Machinery. https://doi.org/10.1145/3453483.3454083

DNNFusion: Accelerating deep neural networks execution with advanced operator fusion. / Niu, Wei; Guan, Jiexiong; Wang, Yanzhi et al.
PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. ed. / Stephen N. Freund; Eran Yahav. Association for Computing Machinery, 2021. p. 883-898 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).

Niu, W, Guan, J, Wang, Y, Agrawal, G & Ren, B 2021, DNNFusion: Accelerating deep neural networks execution with advanced operator fusion. in SN Freund & E Yahav (eds), PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Association for Computing Machinery, pp. 883-898, 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021, Virtual, Online, Canada, 6/20/21. https://doi.org/10.1145/3453483.3454083

Niu W, Guan J, Wang Y, Agrawal G, Ren B. DNNFusion: Accelerating deep neural networks execution with advanced operator fusion. In Freund SN, Yahav E, editors, PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Association for Computing Machinery. 2021. p. 883-898. (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). doi: 10.1145/3453483.3454083

Niu, Wei ; Guan, Jiexiong ; Wang, Yanzhi et al. / DNNFusion : Accelerating deep neural networks execution with advanced operator fusion. PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. editor / Stephen N. Freund ; Eran Yahav. Association for Computing Machinery, 2021. pp. 883-898 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).

@inproceedings{e8ccacf1b97a4af8982ac38b2f2fedd6,

title = "DNNFusion: Accelerating deep neural networks execution with advanced operator fusion",

abstract = "Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3× speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.",

keywords = "Compiler Optimization, Deep Neural Network, Mobile Devices, Operator Fusion",

author = "Wei Niu and Jiexiong Guan and Yanzhi Wang and Gagan Agrawal and Bin Ren",

note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021 ; Conference date: 20-06-2021 Through 25-06-2021",

year = "2021",

month = jun,

day = "18",

doi = "10.1145/3453483.3454083",

language = "English (US)",

series = "Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)",

publisher = "Association for Computing Machinery",

pages = "883--898",

editor = "Freund, {Stephen N.} and Eran Yahav",

booktitle = "PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation",

}

TY - GEN

T1 - DNNFusion

T2 - 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021

AU - Niu, Wei

AU - Guan, Jiexiong

AU - Wang, Yanzhi

AU - Agrawal, Gagan

AU - Ren, Bin

PY - 2021/6/18

Y1 - 2021/6/18

N2 - Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3× speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.

AB - Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aim to improve the efficiency of DNN inference. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3× speedup.
The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.

KW - Compiler Optimization

KW - Deep Neural Network

KW - Mobile Devices

KW - Operator Fusion

UR - http://www.scopus.com/inward/record.url?scp=85108902126&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108902126&partnerID=8YFLogxK

U2 - 10.1145/3453483.3454083

DO - 10.1145/3453483.3454083

M3 - Conference contribution

AN - SCOPUS:85108902126

T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

SP - 883

EP - 898

BT - PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

A2 - Freund, Stephen N.

A2 - Yahav, Eran

PB - Association for Computing Machinery

Y2 - 20 June 2021 through 25 June 2021

ER -
