TY - JOUR
T1 - Towards optimal packed string matching
AU - Ben-Kiki, Oren
AU - Bille, Philip
AU - Breslauer, Dany
AU - Ga̧sieniec, Leszek
AU - Grossi, Roberto
AU - Weimann, Oren
N1 - Funding Information:
* Corresponding author. E-mail address: oren@cs.haifa.ac.il (O. Weimann). 1 Partially supported by the European Research Council (ERC) Project SFEROT and by the Israeli Science Foundation Grants 686/07, 347/09 and 864/11. 2 Partially supported by Italian project PRIN AlgoDEEP (2008TFBWL4) of MIUR.
PY - 2014/3/13
Y1 - 2014/3/13
N2 - In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, thus an n-character string occupies n/α memory words. (a) We extend the Crochemore-Perrin constant-space O(n)-time string-matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually. Our macro-level algorithm only uses the standard AC0 instructions of the word-RAM model (i.e. no integer multiplication) plus two specialized micro-level AC0 word-size packed-string instructions. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work. (b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for the theoretically efficient emulation using parallel algorithms techniques to emulate wssm and using the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct to facilitate our results we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.
AB - In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, thus an n-character string occupies n/α memory words. (a) We extend the Crochemore-Perrin constant-space O(n)-time string-matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually. Our macro-level algorithm only uses the standard AC0 instructions of the word-RAM model (i.e. no integer multiplication) plus two specialized micro-level AC0 word-size packed-string instructions. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work. (b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for the theoretically efficient emulation using parallel algorithms techniques to emulate wssm and using the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct to facilitate our results we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.
KW - Packed strings
KW - String matching
KW - Word-RAM
UR - http://www.scopus.com/inward/record.url?scp=84895927769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84895927769&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2013.06.013
DO - 10.1016/j.tcs.2013.06.013
M3 - Article
AN - SCOPUS:84895927769
SN - 0304-3975
VL - 525
SP - 111
EP - 129
JO - Theoretical Computer Science
JF - Theoretical Computer Science
ER -