青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

posts - 297,  comments - 15,  trackbacks - 0

      The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

    The algorithm was conceived by Donald Knuth and Vaughan Pratt and independently by J. H. Morris in 1977, but the three published it jointly.

In this article, we will be using zero-based arrays for our strings; thus the 'C' in W (shown below) will be denoted W[2].

Worked example of the search algorithm

To illustrate the algorithm's details, we work through a (relatively artificial) run of the algorithm. At any given time, the algorithm is in a state determined by two integers, m and i, which denote respectively the position within S which is the beginning of a prospective match for W, and the index in W denoting the character currently under consideration. This is depicted, at the start of the run, like

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

We proceed by comparing successive characters of W to "parallel" characters of S, moving from one to the next if they match. However, in the fourth step, we get S[3] is a space and W[3] = 'D', a mismatch. Rather than beginning to search again at S[1], we note that no 'A' occurs between positions 0 and 3 in S except at 0; hence, having checked all those characters previously, we know there is no chance of finding the beginning of a match if we check them again. Therefore we move on to the next character, setting m = 4 and i = 0.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:     ABCDABD
i:     0123456

We quickly obtain a nearly complete match "ABCDAB" when, at W[6] (S[10]), we again have a discrepancy. However, just prior to the end of the current partial match, we passed an "AB" which could be the beginning of a new match, so we must take this into consideration. As we already know that these characters match the two characters prior to the current position, we need not check them again; we simply reset m = 8, i = 2 and continue matching the current character. Thus, not only do we omit previously matched characters of S, but also previously matched characters of W.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:         ABCDABD
i:         0123456

This search fails immediately, however, as the pattern still does not contain a space, so as in the first trial, we return to the beginning of W and begin searching at the next character of S: m = 11, reset i = 0.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:            ABCDABD
i:            0123456

Once again we immediately hit upon a match "ABCDAB" but the next character, 'C', does not match the final character 'D' of the word W. Reasoning as before, we set m = 15, to start at the two-character string "AB" leading up to the current position, set i = 2, and continue matching from the current position.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:                ABCDABD
i:                0123456

This time we are able to complete the match, whose first character is S[15].


Description of and pseudocode for the search algorithm

The above example contains all the elements of the algorithm. For the moment, we assume the existence of a "partial match" table T, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. The entries of T are constructed so that if we have a match starting at S[m] that fails when comparing S[m + i] to W[i], then the next possible match will start at index m + i - T[i] in S (that is, T[i] is the amount of "backtracking" we need to do after a mismatch). This has two implications: first, T[0] = -1, which indicates that if W[0] is a mismatch, we cannot backtrack and must simply check the next character; and second, although the next possible match will begin at index m + i - T[i], as in the example above, we need not actually check any of the T[i] characters after that, so that we continue searching from W[T[i]]. The following is a sample pseudocode implementation of the KMP search algorithm.

algorithm kmp_search:
input:
an array of characters, S (the text to be searched)
an array of characters, W (the word sought)
output:
an integer (the zero-based position in S at which W is found)
define variables:
an integer, m ← 0 (the beginning of the current match in S)
an integer, i ← 0 (the position of the current character in W)
an array of integers, T (the table, computed elsewhere)
while m + i is less than the length of S, 
do
if W[i] = S[m + i], let i ← i + 1 if i equals the length of W, return m otherwise{
 let m ← m + i - T[i], // this if else statement is utilized to reposition i
if
T[i] is greater than -1, let i ← T[i] else let i ← 0 }
done (if we reach here, we have searched all of S unsuccessfully) return the length of S, that is m+i.
[edit] Efficiency of the search algorithm

Assuming the prior existence of the table T, the search portion of the Knuth–Morris–Pratt algorithm has complexity O(k), where k is the length of S and the O is big-O notation. As except for the fixed overhead incurred in entering and exiting the function, all the computations are performed in the while loop, we will calculate a bound on the number of iterations of this loop; in order to do this we first make a key observation about the nature of T. By definition it is constructed so that if a match which had begun at S[m] fails while comparing S[m + i] to W[i], then the next possible match must begin at S[m + (i - T[i])]. In particular the next possible match must occur at a higher index than m, so that T[i] < i.

Using this fact, we will show that the loop can execute at most 2k times. For in each iteration, it executes one of the two branches in the loop. The first branch invariably increases i and does not change m, so that the index m + i of the currently scrutinized character of S is increased. The second branch adds i - T[i] to m, and as we have seen, this is always a positive number. Thus the location m of the beginning of the current potential match is increased. Now, the loop ends if m + i = k; therefore each branch of the loop can be reached at most k times, since they respectively increase either m + i or m, and m ≤ m + i: if m = k, then certainly m + i ≥ k, so that since it increases by unit increments at most, we must have had m + i = k at some point in the past, and therefore either way we would be done.

Thus the loop executes at most 2k times, showing that the time complexity of the search algorithm is O(k).


"Partial match" table (also known as "failure function")

The goal of the table is to allow the algorithm not to match any character of S more than once. The key observation about the nature of a linear search that allows this to happen is that in having checked some segment of the main string against an initial segment of the pattern, we know exactly at which places a new potential match which could continue to the current position could begin prior to the current position. In other words, we "pre-search" the pattern itself and compile a list of all possible fallback positions that bypass a maximum of hopeless characters while not sacrificing any potential matches in doing so.

We want to be able to look up, for each position in W, the length of the longest possible initial segment of W leading up to (but not including) that position, other than the full segment starting at W[0] that just failed to match; this is how far we have to backtrack in finding the next match. Hence T[i] is exactly the length of the longest possible proper initial segment of W which is also a segment of the substring ending at W[i - 1]. We use the convention that the empty string has length 0. Since a mismatch at the very start of the pattern is a special case (there is no possibility of backtracking), we set T[0] = -1, as discussed above.


Worked example of the table-building algorithm

We consider the example of W = "ABCDABD" first. We will see that it follows much the same pattern as the main search, and is efficient for similar reasons. We set T[0] = -1. To find T[1], we must discover a proper suffix of "A" which is also a prefix of W. But there are no proper suffixes of "A", so we set T[1] = 0. Likewise, T[2] = 0.

Continuing to T[3], we note that there is a shortcut to checking all suffixes: let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible); then its first character is a proper prefix of a proper prefix of W, hence a proper prefix itself, and it ends at W[1], which we already determined cannot occur in case T[2]. Hence at each stage, the shortcut rule is that one needs to consider checking suffixes of a given size m>1 only if a valid suffix of size m-1 was found at the previous stage (e.g. T[x]=m-1).

Therefore we need not even concern ourselves with substrings having length 2, and as in the previous case the sole one with length 1 fails, so T[3] = 0.

We pass to the subsequent W[4], 'A'. The same logic shows that the longest substring we need consider has length 1, and although in this case 'A' does work, recall that we are looking for segments ending before the current character; hence T[4] = 0 as well.

Considering now the next character, W[5], which is 'B', we exercise the following logic: if we were to find a subpattern beginning before the previous character W[4], yet continuing to the current one W[5], then in particular it would itself have a proper initial segment ending at W[4] yet beginning before it, which contradicts the fact that we already found that 'A' itself is the earliest occurrence of a proper segment ending at W[4]. Therefore we need not look before W[4] to find a terminal string for W[5]. Therefore T[5] = 1.

Finally, we see that the next character in the ongoing segment starting at W[4] = 'A' would be 'B', and indeed this is also W[5]. Furthermore, the same argument as above shows that we need not look before W[4] to find a segment for W[6], so that this is it, and we take T[6] = 2.

Therefore we compile the following table:

i 0 1 2 3 4 5 6
W[i] A B C D A B D
T[i] -1 0 0 0 0 1 2

Description of and pseudocode for the table-building algorithm

The example above illustrates the general technique for assembling the table with a minimum of fuss. The principle is that of the overall search: most of the work was already done in getting to the current position, so very little needs to be done in leaving it. The only minor complication is that the logic which is correct late in the string erroneously gives non-proper substrings at the beginning. This necessitates some initialization code.

algorithm kmp_table:
input:
an array of characters, W (the word to be analyzed)
an array of integers, T (the table to be filled)
output:
nothing (but during operation, it populates the table)
define variables:
an integer, pos ← 2 (the current position we are computing in T)
an integer, cnd ← 0 (the zero-based index in W of the next 
character of the current candidate substring) (the first few values are fixed but different from what the algorithm
might suggest) let T[0] ← -1, T[1] ← 0 while pos is less than the length of W, do: (first case: the substring continues) if W[pos - 1] = W[cnd],
let T[pos] ← cnd + 1, pos ← pos + 1, cnd ← cnd + 1 (second case: it doesn't, but we can fall back) otherwise, if cnd > 0, let cnd ← T[cnd] (third case: we have run out of candidates. Note cnd = 0) otherwise, let T[pos] ← 0, pos ← pos + 1

Efficiency of the table-building algorithm

The complexity of the table algorithm is O(n), where n is the length of W. As except for some initialization all the work is done in the while loop, it is sufficient to show that this loop executes in O(n) time, which will be done by simultaneously examining the quantities pos and pos - cnd. In the first branch, pos - cnd is preserved, as both pos and cnd are incremented simultaneously, but naturally, pos is increased. In the second branch, cnd is replaced by T[cnd], which we saw above is always strictly less than cnd, thus increasing pos - cnd. In the third branch, pos is incremented and cnd is not, so both pos and pos - cnd increase. Since pos ≥ pos - cnd, this means that at each stage either pos or a lower bound for pos increases; therefore since the algorithm terminates once pos = n, it must terminate after at most 2n iterations of the loop, since pos - cnd begins at 1. Therefore the complexity of the table algorithm is O(n).


Efficiency of the KMP algorithm

Since the two portions of the algorithm have, respectively, complexities of O(k) and O(n), the complexity of the overall algorithm is O(n + k).

As is apparent from the worked example, the algorithm makes its greatest gains over a naive string-searching algorithm when it can skip the most characters at a time. That is, the less it has to backtrack, the faster it goes, which is reflected in the table T by the presence of zeroes. A word such as "ABCDEFG" works well with this algorithm because it has no repetitions of its beginning, so that its table is simply all zeroes with a -1 at the beginning. On the opposite end of the spectrum, W = "AAAAAAA" works terribly, because its table is

i 0 1 2 3 4 5 6
W[i] A A A A A A A
T[i] -1 0 1 2 3 4 5

This is the worst possible pattern for T, and it can be exploited by a word such as S = "AAAAAABAAAAAABAAAAAAA", in which the algorithm tries to match each 'A' against each 'B' before giving up; this results in the maximum possible number of iterations, verging on twice the number of characters of S as the number of repetitions of "AAAAAAB" increases. Although the table-building routine is rapid for this word (since no backtracking ever occurs), this routine is run only once for a given W, while the search routine is potentially run many times. If each time, this W is searched for in strings like S, the overall efficiency will be as poor as possible. By way of comparison, this particular combination is a best case for the Boyer-Moore string search algorithm.

Note that in practice, the KMP algorithm is not good at searching in natural language texts because it can only skip characters when the first part of the pattern actually matches a part of the text. In practice this only happens occasionally in natural language texts. For instance, consider how many times a prefix of the pattern "text" matches something in this paragraph.


there are also some references and furtuer explanation for kmp algorithm:
1.http://www.ics.uci.edu/~eppstein/161/960227.html
2.http://www.ics.uci.edu/~eppstein/161/kmp/
3.http://www-igm.univ-mlv.fr/~lecroq/string/node8.html
4.http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm


from:
1.http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm


posted on 2010-03-08 22:51 chatler 閱讀(722) 評(píng)論(0)  編輯 收藏 引用 所屬分類: Algorithm
<2010年3月>
28123456
78910111213
14151617181920
21222324252627
28293031123
45678910

常用鏈接

留言簿(10)

隨筆分類(307)

隨筆檔案(297)

algorithm

Books_Free_Online

C++

database

Linux

Linux shell

linux socket

misce

  • cloudward
  • 感覺(jué)這個(gè)博客還是不錯(cuò),雖然做的東西和我不大相關(guān),覺(jué)得看看還是有好處的

network

OSS

  • Google Android
  • Android is a software stack for mobile devices that includes an operating system, middleware and key applications. This early look at the Android SDK provides the tools and APIs necessary to begin developing applications on the Android platform using the Java programming language.
  • os161 file list

overall

搜索

  •  

最新評(píng)論

閱讀排行榜

評(píng)論排行榜

青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品
  • <ins id="pjuwb"></ins>
    <blockquote id="pjuwb"><pre id="pjuwb"></pre></blockquote>
    <noscript id="pjuwb"></noscript>
          <sup id="pjuwb"><pre id="pjuwb"></pre></sup>
            <dd id="pjuwb"></dd>
            <abbr id="pjuwb"></abbr>
            亚洲欧美一区二区精品久久久| 亚洲已满18点击进入久久| 欧美呦呦网站| 欧美一区在线直播| 久久精品免费看| 久久综合激情| 欧美α欧美αv大片| 欧美黄色日本| 国产精品国产三级欧美二区| 欧美天堂亚洲电影院在线播放| 国产精品成av人在线视午夜片| 欧美色欧美亚洲另类七区| 国产精品美女久久| 尤物在线精品| 99热免费精品| 久久经典综合| 91久久精品国产91久久性色tv| 亚洲国产视频直播| 亚洲影视中文字幕| 久久一区二区三区av| 欧美日韩国产美| 国产一区二区三区久久悠悠色av| 在线观看av一区| 亚洲图片欧洲图片av| 美女网站久久| 亚洲一区久久| 欧美激情一区二区三区在线视频| 国产乱肥老妇国产一区二| 91久久久亚洲精品| 久久九九免费| 亚洲视频在线观看视频| 欧美大片一区二区三区| 国产性做久久久久久| 亚洲午夜电影网| 欧美激情一区二区三区成人 | 亚洲最新在线| 久久综合久色欧美综合狠狠| 午夜视频精品| 亚洲免费在线精品一区| 欧美gay视频| 国产一区自拍视频| 亚洲影视在线播放| 91久久国产精品91久久性色| 久久精品国产2020观看福利| 欧美日韩在线直播| 亚洲精品小视频| 欧美成人精品h版在线观看| 性做久久久久久免费观看欧美| 欧美日韩在线第一页| 亚洲精品美女| 亚洲国产成人tv| 蜜乳av另类精品一区二区| 国内在线观看一区二区三区| 久久aⅴ国产紧身牛仔裤| 亚洲一区二区三区免费视频| 欧美色一级片| 亚洲一区制服诱惑| 亚洲天堂成人在线观看| 欧美性大战久久久久久久| 宅男精品视频| av成人激情| 国产精品日韩精品欧美在线| 亚洲欧美国产77777| 亚洲尤物视频网| 国产日本欧美一区二区三区| 欧美一区永久视频免费观看| 亚洲欧美中文另类| 海角社区69精品视频| 老司机午夜精品视频| 蜜臀久久99精品久久久画质超高清| 在线日韩欧美视频| 亚洲国产天堂久久综合| 欧美人与性动交α欧美精品济南到 | 亚洲综合导航| 午夜精品久久久久久久99热浪潮 | 亚洲欧洲日韩综合二区| 亚洲黄色毛片| 国产精品乱码一区二区三区| 欧美一级在线播放| 久久精品视频免费| 亚洲老司机av| 亚洲伊人一本大道中文字幕| 国产一区二区精品久久| 欧美激情精品久久久久久久变态| 欧美精品免费播放| 欧美在线观看视频一区二区| 久久婷婷国产综合国色天香| 一本色道久久88精品综合| 亚洲综合激情| 亚洲人成人一区二区三区| 一区二区三区日韩| 亚洲私人黄色宅男| 亚洲精品一区在线观看| 国产精品久久久久久久久免费| 久久精品人人爽| 欧美国产国产综合| 久久久久久综合| 欧美日韩国产综合一区二区| 久久九九免费视频| 欧美四级伦理在线| 欧美大片网址| 国产人久久人人人人爽| 亚洲精品国产精品国产自| 国产日韩欧美亚洲一区| 亚洲精品免费在线播放| 国产亚洲亚洲| 亚洲精品偷拍| 亚洲福利国产精品| 亚洲欧美日韩在线播放| aa级大片欧美三级| 久热精品视频在线观看一区| 香蕉成人啪国产精品视频综合网| 六月天综合网| 久久久久九九九九| 国产精品欧美久久| 亚洲欧洲精品一区| 1000精品久久久久久久久| 亚洲欧美日韩在线不卡| 亚洲一区精品在线| 欧美日韩精品一区二区天天拍小说 | 亚洲主播在线观看| 欧美xx视频| 农夫在线精品视频免费观看| 国产一级揄自揄精品视频| 亚洲免费视频一区二区| 亚洲欧美精品中文字幕在线| 欧美激情精品久久久久久黑人| 麻豆精品视频| 伊人久久综合| 久久天天躁狠狠躁夜夜av| 老司机精品久久| 在线看欧美视频| 欧美高清视频www夜色资源网| 午夜欧美精品| 亚洲女人天堂av| 欧美午夜精品久久久久免费视 | 久久成人18免费观看| 久久精品亚洲一区二区| 国产亚洲精品成人av久久ww| 亚洲一二区在线| 午夜久久影院| 国产一区二区剧情av在线| 久久国产成人| 欧美成人激情视频| 日韩视频免费观看| 欧美日韩在线看| 亚洲男同1069视频| 久久精品国产亚洲a| 国产亚洲精品bv在线观看| 欧美一二三视频| 牛夜精品久久久久久久99黑人| 亚洲国产精品va在线看黑人| 欧美成人国产一区二区| 红桃视频成人| 久久综合一区二区三区| 久久人人爽人人爽| 亚洲第一黄色网| 欧美国产欧美亚洲国产日韩mv天天看完整 | 亚洲国产精品va在线看黑人动漫| 在线观看一区二区视频| 欧美成人第一页| 这里只有精品视频| 久久精品中文字幕免费mv| 亚洲盗摄视频| 欧美午夜片在线观看| 香蕉久久一区二区不卡无毒影院| 久久欧美肥婆一二区| 亚洲精品资源美女情侣酒店| 国产精品大全| 久久综合给合久久狠狠狠97色69| 亚洲经典自拍| 久久精品亚洲精品国产欧美kt∨| 亚洲国产精品尤物yw在线观看| 欧美日韩免费视频| 久久精品在这里| 夜夜嗨av一区二区三区网站四季av| 欧美中文字幕在线播放| 亚洲另类春色国产| 激情自拍一区| 国产精品wwwwww| 欧美插天视频在线播放| 午夜在线一区| 99伊人成综合| 亚洲第一区中文99精品| 欧美在线一二三| 一区二区免费在线视频| 伊人久久综合97精品| 国产精品每日更新在线播放网址| 美女在线一区二区| 欧美一区二视频| 一区二区三区久久精品| 亚洲福利视频三区| 久久婷婷国产麻豆91天堂| 亚洲欧美日韩成人| av成人免费| 亚洲乱码国产乱码精品精98午夜| 国产综合欧美| 国产日韩欧美精品一区| 国产精品av久久久久久麻豆网| 欧美刺激性大交免费视频|