Combining SIMD and Many/Multi-core Parallelism for Finite-state Machines with Enumerative Speculation

Peng Jiang, Yang Xia, Gagan Agrawal

Research output: Contribution to journal › Article › peer-review

The Finite-state Machine (FSM) is the key kernel behind many popular applications, including regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is extremely difficult because of their strong dependencies and unpredictable memory accesses. Previous efforts have largely focused on multi-core parallelization and used different approaches, including speculative and enumerative execution, both of which have been effective but also have limitations. With increasing width and improving flexibility in SIMD instruction sets, this article focuses on combining SIMD and many/multi-core parallelism for FSMs. We have developed a novel strategy, called enumerative speculation. Instead of speculating on a single state as in speculative execution, or enumerating all possible states as in enumerative execution, our strategy speculates transitions from several possible states, reducing both the prediction overhead of the speculative approach and the large amount of redundant work of the enumerative approach. A simple lookback approach produces a set of guessed states, achieving high speculation success rates in our enumerative speculation. In addition, to enable continued scalability of enumerative speculation with a large number of threads, we have developed a parallel merge method. We evaluate our method with four popular FSM applications: Huffman decoding, regular expression matching, HTML tokenization, and Div7. We obtain up to 2.5× speedup using SIMD on 1 core and up to 95× combining SIMD with 60 cores of an Intel Xeon Phi. On a single core, we outperform the best single-state speculative execution version by an average of 1.6×, and when combining SIMD and many-core parallelism, we outperform enumerative execution by an average of 2×. Finally, when evaluated on a GPU, we show that our parallel merge implementations are 2.02–6.74× more efficient than the corresponding sequential merge implementations and achieve better scalability on an Nvidia V100 GPU.
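The core idea of enumerative speculation, as summarized in the abstract, can be illustrated with a minimal sketch: the input is split into chunks, a short "lookback" over the tail of the previous chunk narrows the possible start states of each chunk to a small set, every chunk is then processed speculatively from each guessed state, and a merge phase picks the result matching the true state (falling back to reprocessing on a misprediction). All function names below are hypothetical, and this sequential sketch only models the algorithm's structure; the paper's actual implementation vectorizes the speculative runs with SIMD and parallelizes the merge.

```python
# Illustrative sketch of enumerative speculation for parallel FSM execution.
# Hypothetical names; the real implementation uses SIMD lanes and threads.

def run_fsm(trans, state, chunk):
    """Run the FSM transition table over a chunk, returning the end state."""
    for sym in chunk:
        state = trans[state][sym]
    return state

def lookback_guess(trans, num_states, tail):
    """Run a short lookback suffix from every state; transitions typically
    converge, leaving only a few plausible start states for the next chunk."""
    return {run_fsm(trans, s, tail) for s in range(num_states)}

def enumerative_speculation(trans, start, chunks, lookback=4):
    # Guess a small set of start states for each chunk via lookback on
    # the tail of the preceding chunk (the first chunk's state is known).
    guesses = [{start}]
    for prev in chunks[:-1]:
        guesses.append(lookback_guess(trans, len(trans), prev[-lookback:]))
    # Speculative phase (parallel in the paper): process every chunk
    # from each of its guessed start states.
    results = [{s: run_fsm(trans, s, c) for s in g}
               for g, c in zip(guesses, chunks)]
    # Merge phase: chain the chunks together, reusing a speculative
    # result on a hit and reprocessing the chunk on a miss.
    state = start
    for c, r in zip(chunks, results):
        state = r[state] if state in r else run_fsm(trans, state, c)
    return state
```

For example, with the Div7 FSM (states 0-6, transition `(state * 10 + digit) % 7`), running the chunks speculatively and merging yields the same final state as a fully sequential pass over the whole input.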

Original language: English (US)
Article number: 15
Pages (from-to): 1-26
Journal: ACM Transactions on Parallel Computing
Issue number: 3
State: Published - Aug 2020


Keywords

  • break dependence
  • Finite-state machine
  • SIMD

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics
