根据String searching algorithm,"The Boyer–Moore string search algorithm has been the standard benchmark for the practical string search literature."它是's not always the fastest, but in general it'的路要走 .
E:\_KAZE_Benchmark_2013>RUNME.bat
E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Gulliver
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529
Doing Search for each word with Railgun_Quadruplet_7Gulliver ...
Railgun_Quadruplet_7Gulliver performance: 1944+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 131,873,881,926
Function Really Traversed: 1,142,740,439KB
E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529
Doing Search for each word with Railgun_Quadruplet_7 ...
Railgun_Quadruplet_7 performance: 1747+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 146,826,792,351
Function Really Traversed: 1,142,740,439KB
E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Hasherezade
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529
Doing Search for each word with Railgun_Hasherezade ...
Railgun_Hasherezade performance: 1774+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 128,900,655,391
Function Really Traversed: 1,142,740,439KB
2 回答
根据String searching algorithm,"The Boyer–Moore string search algorithm has been the standard benchmark for the practical string search literature."它是's not always the fastest, but in general it'的路要走 .
当你谈论一个像10,000个字符的小文本缓冲区时,Knuth-Morris-Pratt和Boyer-Moore的运行时间非常接近 . 在现代计算机上运行时,甚至在10K缓冲区上进行简单的字符串搜索也会非常快 . 我怀疑你会发现KMP和Boyer-Moore在10,000字符缓冲区中搜索10个字符的字符串之间的区别将是纳秒级 .
这种情况下最好的搜索算法?这取决于你需要多久调用一次 . 如果它被称为每秒几次调用(最多),我可能会写一个天真的搜索并留在那里 . 与程序的运行时间相比,Boyer-Moore和那个小缓冲区上的天真搜索之间的区别是微不足道的,并且您的优化工作将更好地用于其他地方 . 如果我不得不每秒呼叫数百次或数千次,我会花时间编写一个优化的Boyer-Moore搜索 .
要回答您的问题,您只需访问一个链接:http://www.codeproject.com/Articles/250566/Fastest-strstr-like-function-in-C
事实上,最快的文本搜索者的主页是这样的:http://www.sanmayce.com/Railgun/index.html
如果你有一台快速机器,比如i7 37 *,我想看看我最新的控制台基准测试包的结果(测试微软v16与英特尔12.1编译器):
http://www.sanmayce.com/Downloads/_KAZE_Benchmark_2013.zip
在我的T7500 2200Mhz笔记本上测试:
当使用32位和64位代码时,微软和英特尔之间的利润也很大,这有些起伏不定 .
Gulliver 的"engine"是订单2 BM-Horspool 由我调整 .
正如你在我不起眼的笔记本电脑上看到的 Gulliver 在 1898MB/s 搜索模式(平均模式长度:11),即使是超级漂亮的 BNDM 也在这里弯曲 .