Talk:Knuth–Morris–Pratt algorithm
This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
Early text
[edit]Removed the following text from the article, for I'm not sure if it's correct. It is clear, however, that it's not C, but rather a hybrid between C and Pascal...
The following implementation is written in C:
INPUT: Text T[0,n-1], Pattern P[0,m-1] Output alle matches of P in T
InitNext(P); j=0; for (i=1;i<=n;i++) { while ((j>0) && (P[j+1] != T[i])) { j=next[j]; } if (P[j+1] == T[i]) j++; if (j == m-1) { print "Found P"; j = next[j]; } }
Procedure InitNext(P) Input: Pattern P
next[1] = 0; for (q=2;q<=m;q++) { while ((l>0) && (P[l+1] != P[q])) { l = next[l]; } if (P[l+1] == P[q]) l++; next[q] = l; }
— Preceding unsigned comment added by Madoka (talk • contribs) 03:19, 24 March 2004 (UTC)
First, I've removed...
[edit]First, I've removed the rather antique discussion topic that was already here, as the article has been revised substantially since then and anyway, it was more than a year old.
Second, are there any thoughts on whether I should replace the verbal algorithms with the plain C code? Barely any knowledge of C is necessary to comprehend what is going on in the code; I have tried to write it in such a way that it avoids all idioms even at the cost of being a little too verbose, and the language itself is self-documenting in its usage for the most part. But now that the code is there it seems to reproduce very closely the English-language algorithms and creates a redundancy. I think, given the choice, I would rather keep the code, since it's clearer for the most part, in addition to being more useful, but then, I am comfortable with C and others may not be. So, keep both forms, or just one?
- The verbal description reflects the C description too closely, probably because it was written to the C description. Replace it with something more high-level, if you can. No verbal description should say stuff like "if i > 0, set i = e." Deco 07:59, 8 December 2005 (UTC)
- There are two verbal descriptions of the code; the lengthy example was the first thing written (actually, it was the reason I rewrote the article, since in the original it was all there was, and quite bad). The second verbal description, which I wrote in vague emulation of Knuth's style for describing algorithms, was written to the C code after I decided that it was unwise to have the only formal description of the algorithm written in a single programming language that some casual readers might not know. Those who do know can verify the two are the same and, if curious, can compile my code and run the algorithm. Ryan Reich 14:12, 21 February 2006 (UTC)
- When an old discussion is taking up space on a talk page, please archive it instead of deleting it! ᛭ LokiClock (talk) 13:35, 28 May 2010 (UTC)
Mistake? "Thus, in the fourth step..."
[edit]Mistake? "Thus in the fourth step, m = 0 and i = 3." should that be m = 3 and i = 3? — Preceding unsigned comment added by 70.27.57.14 (talk) 06:10, 8 December 2005 (UTC)
In the the efficiency of the search algorithm section...
[edit]In the The efficiency of the search algorithm section, the second paragraph starts:
- Using this fact, we will show that the loop can execute at most times.
but the third paragraph has this contradiction:
- Thus the loop executes at most times,
It seems it should be 2l times (for example searching AAAAA for AB, if I've read the code right), so I changed it. Secondly I tried but failed to make the math tags work on the letters l, m i, and so on, but they stayed obstinately unclear (1 l and I looked practically indistinguisale on my browser) so I resorted to boldifying each math term. I guess there's some preference setting for making the < math > terms look consistent? -Wikibob 02:23, 18 December 2005 (UTC)
- That's a fine solution. Computer users have been unable to distinguish the letters l and I, and the number 1, for at least the last fifteen years. I blame the invention of Times New Roman. Ryan Reich 14:12, 21 February 2006 (UTC)
T[i-1] instead of T[i+1]?
[edit]Is the line below supposed to read "T[i-1]" instead of "T[i+1]"?--
"...T[i] in the code is equivalent to T[i + 1] above..."
O(n+l)
[edit]Shouldn't O(n) + O(1) be O(n) and not O(n+1)? Constants are not taken into account in O() notation. PedroCR
- You're talking about the very last section? That's not a 1 ("one"), it's a l ("ell"). It's the length of the string. Ryan Reich 12:10, 10 April 2006 (UTC)
- Maybe it would be better to write uppercase 'L' and 'N', but lowercase 'i'? T0ljan 12:53, 17 April 2006 (UTC)
Code -> pseudocode
[edit]Since some IP editor decided that it would be neat to write Java code for this algorithm, I realized that having any language-specific code in here is a bad idea. Of course, the Java code was practically identical to the C code, with minor differences mostly centered on how the length of a string is determined, so it added nothing. I've replaced my C code, their Java code, and also my previous attempt at pseudocode with new pseudocode formatted as suggested in the WikiProject Computer Science manual of style. If you are a future editor reading this article, and for some reason you also read the talk page before editing, please don't implement the algorithm in your favorite language, as it adds nothing. I've also warned you in the page's source. Ryan Reich 19:10, 10 August 2006 (UTC)
History of the algorithm
[edit]The algorithm described in Item 179 of HAKMEM seems quite similar to KMP, despite that HAKMEM is from 1972 so it predates the publication by KMP. Were variants of the algorithm for fast string matching known long before this publication or am I misinterpreting something? Should this be mentioned in the article? Thanks, – b_jonas 19:07, 9 November 2006 (UTC)
- I've now also asked this at the Reference desk/Mathematics. – b_jonas 15:53, 29 September 2007 (UTC)
- That HAKMEM entry is cited by Knuth in "Selected Papers on Design of Algorithms" Chapter 9 "Fast Pattern Matching in Strings" for the idea of inlining the table into the search loop when compiling to machine instructions, but not for the original idea. Perhaps also cited in the paper from 1977? Aureooms (talk) 16:10, 10 May 2021 (UTC)
The efficiency of the KMP algorithm
[edit]"A word such as "ABCDEFG" works well with this algorithm because it has no repetitions of its beginning, so that its table is simply all zeroes with a -1 at the beginning. On the opposite end of the spectrum, W = "AAAAAAA" works terribly"
- here where this algorithm is being compared to the naive search, surely 'ABCDEFG' with a table of all zeroes has no actual gain over the naive search, and the whole point of this algorithm is to reduce the no. of comparisons made when patterns like 'AAAAAA' are encountered?
e.g. if string is 'aaaaaaaaaabcdef' (15 chars) and pattern to find is 'abcdef': both naive search and KMP will make 15 comparisons (after computing fail array)
but if string is 'aaaaaaaaaaaaaab' (15 chars) and pattern to find is 'aaaaab': naive search will make (6x9)+6 = 60 comparisons whereas KMP will only do 15 comparisons again.
fail table would be [-1][1][2][3][4][5] and the gain is made from the fact KMP only needs to check for the last character instead of checking through the whole pattern each time - the fail table is a way of 'remembering' you already matched 5 a's, hence this being a good algorithm for small alphabets or bitstreams.
(I'm not confident enough of this to make changes, nor can I think of a concise way to word it, but I think it should be looked at again)
The comment about the Boyer-Moore algorithm's worst case:
"...while the Boyer-Moore algorithm achieves its best case at the great expense of its worst."
is incorrect. The worst case is still linear in the text size. Sustik 22:28, 15 June 2007 (UTC)
- So fix it? Ryan Reich 22:07, 16 June 2007 (UTC)
- I came here precisely to point out the same thing, and seeing that I'm
- not the first, I will fix it. So there. :) 128.210.4.214 (talk) 18:58, 7 March 2008 (UTC)
- Funny, I was just remembering this question this morning, after months of neglect. I'm glad to see my goading has encouraged people to action :) Aside from my enthusiasm for the KMP algorithm I am not even distantly connected to this area, and I don't know how these algorithms compare; it seemed, from a long-ago look at the Boyer-Moore page, that the claim was right, but I guess not. Since obviously I'm uninformed, I didn't feel like I could make the change. Ryan Reich (talk) 19:25, 7 March 2008 (UTC)
The confusion may involve the fact that the version of Boyer-Moore in some textbooks is a simplified one with a slower worst case, not the full linear-time algorithm. —David Eppstein (talk) 19:17, 7 March 2008 (UTC)
Terminal substring
[edit]What is terminal substring (this term is used w/o definition and I cannot find one with google)? 217.21.164.43 09:18, 10 October 2007 (UTC)
- Perhaps I was writing too much under the influence of mathematical jargon (though you won't find it there either). "Terminal" just means "at the end", so a "terminal substring" of a string S is a substring located at the end of S. For example if S is the string "abcdefg", then the substrings "efg" and even "bcdefg" are terminal, whereas "a" and "cde" are not (to pick just a few examples). Ryan Reich 13:40, 10 October 2007 (UTC)
- Somehow, I'd feel a LOT more comfortable if somebody used "proper prefix" or "proper suffix" rather than "proper initial substring" or "proper terminal substring". I mean, it's a mouthful of syllables that clutters the rest of the article --202.168.251.139 16:22, 10 October 2007 (UTC)
- Here's an opportunity to gain comfort with editing Wikipedia, then. Why don't you change the terminology you don't like to something clearer? Ryan Reich 18:13, 10 October 2007 (UTC)
How to go on
[edit]Well, the pseudocode doesn't state how far to shift once we found a match (there could be more). I've learned the algo with a table building function that returns an array that is one longer than the pattern: e.g.
ABCDABCD -10000123
wouldn't tell you that you have to shift by 4 to find a potential match in ABCDABCDABCD, but
ABCDABCD -100001234
does. I think there is something wrong here. Or did I miss something? Cheers, 88.73.111.77 16:49, 3 November 2007 (UTC)
- Well, I don't get your example since the string ABCDABCDABCD contains the pattern ABCDABCD, so in fact no shifting is ever necessary. However, say the text were ABCDABCCABCDABCD (in other words, it fails when matching the second D when starting from the beginning of the text). So we do the following:
m: 0123456789012345 S: ABCDABCCABCDABCD W: ABCDABCD i: 01234567
- and you see that with m = 0, the search fails once when i = 7. According to the pseudocode, we then replace m with m + i - T[i] which is 0 + 7 - 3 = 4. And indeed, that where the first false match "ABC..." begins. So all is well. I certainly hope the code is right, since I agonized over it for some days when writing this article, including writing it into an actual C program and testing it. I'm fairly sure I've got the indices correct. I also think the "worked example" is general enough that the details of the algorithm can be verified by checking them against the computations done there. Do they not match up after all? Ryan Reich 19:26, 3 November 2007 (UTC)
Error in "working example of ..."?
[edit]In the following text: "As we already know that these characters match the two characters prior to the current position, we need not check them again; we simply reset m = 8, i = 2 and continue matching the current character.", shouldn't it be "m = 10, i = 2"? m[10] and i[2] are the next characters to be compared, so this seems wrong to me. --132.199.235.61 (talk) 19:06, 15 May 2008 (UTC)
- No. You don't understand the notation: m is the beginning of the word, and i is the position within the word. So, we position W so that W[0] = S[8] (that's what m = 8 means) and then start matching at W[2] (which is S[10]). Ryan Reich (talk) 20:47, 15 May 2008 (UTC)
- I was confused and came here to ask the same question. The comment by Ryan Reich explains this perfectly, maybe (some of) his text should be added to that step? --Cloudmonkey (talk) 20:43, 2 January 2019 (UTC)
Possible Error in Pseudocode
[edit]In the pseudocode for kmp_search, should the line "if T[i] is greater than 0," be "if T[i] is greater than -1,"? —Preceding unsigned comment added by Byronknoll (talk • contribs) 21:20, 30 November 2009 (UTC)
- I discussed the pseudocode with two friends and they also agreed it is a mistake. I have corrected it on the main article.--Byron (talk) 06:21, 1 December 2009 (UTC).
A very minor change, may be not work
[edit]I just inserted a new row above the m:0123... with the numbers 1,2, in order to read that columns of m as 10, 20, vertically. That change is very minor, it looks correctly aligned in my browser (Mozilla FireFox). It should be reverted, i.e. erase the row with 1 2 above m:01234..., if it is not seen aligned in other browsers. — Preceding unsigned comment added by Elias (talk • contribs) 13:36, 22 January 2010 (UTC)
Error in the example???
[edit]Shouldn't the example read:
i
|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
W[i]
|
P | A | R | T | I | C | I | P | A | T | E | I | N | P | A | R | A | C | H | U | T | E | ||
T[i]
|
-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | *3* | 0 | 0 | 0 | 0 | 0 |
Instead of:
i
|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
W[i]
|
P | A | R | T | I | C | I | P | A | T | E | I | N | P | A | R | A | C | H | U | T | E | ||
T[i]
|
-1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | *0* | 0 | 0 | 0 | 0 | 0 |
Apologies if I am mistaken. Tony (talk) 16:06, 5 October 2010 (UTC)
I ran the C code last night, and it does indeed output a 3. I am changing the article. Tony (talk) 15:52, 6 October 2010 (UTC)
The example output is:
[-1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0]
But the output from the pseudo-code function is:
[-1, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 2, 3, 4, 0, 0, 0, 0, 0]
This is the result in an off by one error in the pseudo-code. The two lines cnd ← cnd + 1
and T[pos] ← cnd
got reversed a few months ago. Fixed. NeilFraser (talk) 04:37, 22 October 2010 (UTC)
T used before it is defined
[edit]The article is confusing to read because T is referred to before it is defined in the description of the algorithm in the first section. —Preceding unsigned comment added by 12.47.208.58 (talk) 00:30, 13 April 2011 (UTC)
- Agree. Confuses me as well. — Preceding unsigned comment added by 80.254.148.43 (talk) 09:43, 3 November 2011 (UTC)
- Yes, this is confusing. Anubhab91 (talk) 15:27, 23 February 2012 (UTC)
- Also confused at first by this methodology. It's unclear. Vote for prioritizing the above suggestion. Dan McCarty (talk) 15:08, 6 September 2012 (UTC)
Difficulty in understanding Algorithms
[edit]It seems to m that the Algorithms given here are difficult to comprehend. Can anyone please modify the algorithms in simpler forms? Thank you. And especially in the explanation of building the T table, it is really obscure to understand the process. As for the pseudocode, the second branch (if cnd > 0, let cnd ← T[cnd]) looks very weird. I hope this part can be better presented with a good example if possible. Anubhab91 (talk) 15:33, 23 February 2012 (UTC)
- Do you have any concrete suggestions? Otherwise I'm tempted to point to the story of Euclid and Alexander. I'm sure there are things that could be improved about our description but the main reason that it's difficult to understand is that it's a highly-nontrivial algorithm. —David Eppstein (talk) 05:01, 4 February 2013 (UTC)
Explanation skips a step
[edit]This part of the explanation--"we note that no 'A' occurs between positions 0 and 3 in S except at 0"--leaves out the step of how we "note" it. Is it related to the table T[]? Were we tracking previous starts of the search string? It's unclear. Thx. Dan McCarty (talk) 15:16, 6 September 2012 (UTC)
no output, input is output?
[edit]algorithm kmp_table: input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table)
That is silly, the output is T, it is not input. NotYourFathersOldsmobile (talk) 10:45, 20 September 2015 (UTC)
are characters matched more than once?
[edit]In the section partial match the first sentence is:
"The goal of the table is to allow the algorithm not to match any character of S more than once."
However, the Worked example shows that `S[3]` is matched twice: once against `W[3]` and once against `W[0]`.
Who is wrong: the example, that sentence, or me? — Preceding unsigned comment added by 83.85.105.41 (talk) 11:21, 21 April 2016 (UTC)
- When W[3] was tested agains S[3], the characters did not match. That is, position S[3] has not matched against anything yet, so the algorithm continues to search for a match at that position. Glrx (talk) 23:49, 23 April 2016 (UTC)
I believe the pseudocode is wrong
[edit]The following is my python code which run the same example as the original article. However, firstly, the pseudocode in section 2.2 is very different from what described in section 2.1. Secondly, I believe the pseudocode is wrong. If T[i] > -1 and there is a mismatch, then the algorithm will never increase m, thus enters a dead loop.
if __name__ == '__main__': w = 'ABCDABD' s = 'ABC ABCDAB ABCDABCDABDE' t = [-1,0,0,0,0,1,2] m = 0 i = 0 while m + i < len(s): if w[i] == s[m+i]: if i == len(w) - 1: print 'found', m break i += 1 else: if t[i] > -1: i = t[i] else: m += 1 i = 0 print 'end' — Preceding unsigned comment added by Bohu88 (talk • contribs) 03:06, 14 June 2016 (UTC)
Mistake in pseudocode for the table-building algorithm
[edit]The value of cnd
can become equal to -1
, and then W[cnd]
becomes meaningless. This happens for example if the word to be analyzed W
is "aabaaaa"
when we try to compute T[6]
using the given algorithm. — Preceding unsigned comment added by Agostino1984 (talk • contribs) 16:33, 22 August 2016 (UTC)
Efficiency of the search algorithm
[edit]Without mentioning that T can be equal to -1 only for i == 0 (so that m ← m + 1, i ← 0 is executed) the complexity analysis does not make any sense. When you read Efficiency of the search algorithm section you know nothing about T. So you can assume that i can be set to 0 after every mismatch making the algorithm O(n^2). — Preceding unsigned comment added by 31.179.124.4 (talk) 21:55, 9 November 2016 (UTC)
What algorithm form are easy to understand?
[edit]Algorithm seems difficult. Number of versions of this page contain errors. Current version https://en.wikipedia.org/w/index.php?title=Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm&oldid=781303383 (indexes increment when match, pattern shift when mismatch) differ from presented in many books and internet pages (pattern shift when mismatch, then indexes increment). What version are more simple (especially table-building algorithm)? Avhohlov (talk) 22:41, 25 May 2017 (UTC)
algorithm kmp_search: (Current version https://en.wikipedia.org/w/index.php?title=Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm&oldid=805665102) input: an array of characters, S (the text to be searched) an array of characters, W (the word sought) output: an array of integers, P (positions in S at which W is found) an integer, nP (number of positions) define variables: an integer, m ← 0 (the beginning of the current match in S) an integer, i ← 0 (the position of the current character in W) an array of integers, T (the table, computed elsewhere) let nP ← 0 while m + i < length(S) do if W[i] = S[m + i] then let i ← i + 1 if i = length(W) then (occurrence found, if only first occurrence is needed, m may be returned at this point) let P[nP] ← m, nP ← nP + 1 let m ← m + i - T[i], i ← T[i] (T[length(W)] can't be -1) else if T[i] > -1 then let m ← m + i - T[i], i ← T[i] else let m ← m + i + 1, i ← 0
algorithm kmp_search: {Current version with j variable instead of m + i expression} input: an array of characters, S (the text to be searched) an array of characters, W (the word sought) output: an array of integers, P (positions in S at which W is found) an integer, nP (number of positions) define variables: an integer, j ← 0 (the position of the current character in S instead of m - the beginning of the current match in S) an integer, i ← 0 (the position of the current character in W) an array of integers, T (the table, computed elsewhere) let nP ← 0 while j < length(S) do if W[i] = S[j] then let j ← j + 1 let i ← i + 1 if i = length(W) then (occurrence found, if only first occurrence is needed, m may be returned at this point) let P[nP] ← j - i, nP ← nP + 1 let i ← T[i] (T[length(W)] can't be -1, j remains unchanged) else let i ← T[i] if i < 0 then let j ← j + 1 let i ← i + 1 (i = 0)
algorithm kmp_search: {Initial KMP version with zero-based indexes} input: an array of characters, S (the text to be searched) an array of characters, W (the word sought) output: an array of integers, P (positions in S at which W is found) an integer, nP (number of positions) define variables: an integer, j ← 0 (the position of the current character in S) an integer, i ← 0 (the position of the current character in W) an array of integers, T (the table, computed elsewhere) let nP ← 0 while j < length(S) do while i >= 0 and W[i] <> S[j] do let i ← T[i] let j ← j + 1 let i ← i + 1 if i = length(W) then (occurrence found, if only first occurrence is needed, m may be returned at this point) let P[nP] ← j - i, nP ← nP + 1 let i ← T[i] (T[length(W)] can't be -1, j remains unchanged)
algorithm kmp_table: (Current version https://en.wikipedia.org/w/index.php?title=Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm&oldid=805665102, maybe one-letter names j and i would be better then pos and cnd?) input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table) define variables: an integer, pos ← 1 (the current position we are computing in T) an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring) let T[0] ← -1 while pos < length(W) do if W[pos] = W[cnd] then let T[pos] ← T[cnd], pos ← pos + 1, cnd ← cnd + 1 else let T[pos] ← cnd let cnd ← T[cnd] (to increase performance) while cnd >= 0 and W[pos] <> W[cnd] do let cnd ← T[cnd] let pos ← pos + 1, cnd ← cnd + 1 let T[pos] ← cnd (only need when all word occurrences searched)
algorithm kmp_table: (Same version with two nested loops) input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table) define variables: an integer, pos ← 1 (the current position we are computing in T) an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring) let T[0] ← -1 while pos <= length(W) do while pos < length(W) and W[pos] = W[cnd] do let T[pos] ← T[cnd], pos ← pos + 1, cnd ← cnd + 1 let T[pos] ← cnd while pos < length(W) and cnd >= 0 and W[pos] <> W[cnd] do let cnd ← T[cnd] let pos ← pos + 1, cnd ← cnd + 1
algorithm kmp_table: (Initial KMP version with zero-based indexes) input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table) define variables: an integer, pos ← 0 (the current position we are computing in T) an integer, cnd ← -1 (the zero-based index in W of the next character of the current candidate substring) let T[0] ← -1 while pos < length(W) do while cnd >= 0 and W[pos] <> W[cnd] do let cnd ← T[cnd] let pos ← pos + 1, cnd ← cnd + 1 if pos < length(W) and W[pos] = W[cnd] then let T[pos] ← T[cnd] else let T[pos] ← cnd
Algorithm in "Introduction to Algorithms" book by Cormen, Leiserson, Rivest, Stein differ from original paper?
[edit]It is seems that difference in action when mismatch occurs. In original paper "Fast pattern matching in strings" (SIAM J. COMPUT. Vol. 6, No. 2, June 1977) by D.Knuth, J.Morris and V.Pratt pattern shifted always, in CLR book version only when match length greater then zero. So, in CLR book version pattern shift by match length plus one character are impossible.
algorithm kmp_search: {CLR} input: an array of characters, S (the text to be searched, all arrays one-based) an array of characters, W (the word sought) define variables: an integer, i ← 1 (the position of the current character in S) an integer, q ← 0 (match length, not position) an array of integers, T (the table, computed elsewhere) while i <= length(S) do while q > 0 and S[i] != W[q + 1] do let q ← T[q] if S[i] = W[q + 1] then let q ← q + 1 let i ← i + 1 if q >= length(W) then (occurrence found) let q ← T[q]
algorithm kmp_search: {SIAM J. Comput. modified for all occurences search} input: an array of characters, S (the text to be searched, all arrays one-based) an array of characters, W (the word sought) define variables: an integer, k ← 1 (the position of the current character in S) an integer, j ← 1 (the position of the current character in W, not length) an array of integers, T (the table, computed elsewhere) while k <= length(S) do while j > 0 and S[k] != W[j] do let j ← T[j] let k ← k + 1 let j ← j + 1 if j > length(W) then (occurrence found) let j ← T[j]
algorithm kmp_search: {CLR equivalent} input: an array of characters, S (the text to be searched, all arrays one-based) an array of characters, W (the word sought) define variables: an integer, i ← 1 (the position of the current character in S) an integer, q ← 0 (match length, not position) an array of integers, T (the table, computed elsewhere) while i <= length(S) do if S[i] = W[q + 1] then let i ← i + 1 let q ← q + 1 if q >= length(W) then (occurrence found) let q ← T[q] else if q > 0 then let q ← T[q] else let i ← i + 1
algorithm kmp_search: {CLR equivalent, k = q + 1} input: an array of characters, S (the text to be searched, all arrays one-based) an array of characters, W (the word sought) define variables: an integer, i ← 1 (the position of the current character in S) an integer, k ← 1 (the position of the current character in W) an array of integers, T (the table, computed elsewhere) while i <= length(S) do if S[i] = W[k] then let i ← i + 1 let k ← k + 1 if k > length(W) then (occurrence found) let k ← T[k - 1] + 1 else if k > 1 then let k ← T[k - 1] + 1 else let i ← i + 1 — Preceding unsigned comment added by Avhohlov (talk • contribs) 22:58, 14 March 2018 (UTC)
algorithm kmp_search: {SIAM J. Comput. equivalent} input: an array of characters, S (the text to be searched, all arrays one-based) an array of characters, W (the word sought) define variables: an integer, k ← 1 (the position of the current character in S) an integer, j ← 1 (the position of the current character in W, not length) an array of integers, T (the table, computed elsewhere) while k <= length(S) do if S[k] = W[j] then let k ← k + 1 let j ← j + 1 if j > length(W) then (occurrence found) let j ← T[j] else let j ← T[j] if j < 1 then let k ← k + 1 let j ← j + 1
Example walkthrough of the algorithm
[edit]Not stating where the index m starts or if it's incremented each step or not makes the algorithm very hard to follow. Would love to see this cleaned up a bit so people not familiar with the algorithm would be able to learn it more easily. — Preceding unsigned comment added by 2605:6000:EC17:D800:8E3:EC21:F3A7:8490 (talk) 17:24, 11 May 2018 (UTC)
Mistake in the kmp_search pseudocode... I reverted it.
[edit]Someone added 'let j ← j + k - T[k]' to the kmp_search algorithm which breaks it (and the change doesn't really make any sense... I think)
Checkout my tests here that show it's broken that way. Hope that's OK!
https://repl.it/@Craxic/ImprobableSpanishRectangles#main.py — Preceding unsigned comment added by Craxic (talk • contribs) 16:11, 7 June 2020 (UTC)
Is there a contradiction in example?
[edit]According to "Hence at each stage, the shortcut rule is that one needs to consider checking suffixes of a given size m+1 only if a valid suffix of size m was found at the previous stage (i.e. T[x] = m
) and should not bother to check m+2, m+3, etc."
and using formal logic (A only if B <=> If A, then B <=> If NOT(B), then NOT(A)) i indestand this rule as "if valid suffix of size m was NOT found at previous stage, then you DON'T need to check of m+1, m+2, etc. So, in text the text below:
"Furthermore, the same argument as above shows that we need not look before W[4]
to find a segment for W[6]
, so that this is it, and we take T[6] = 2
.".
But T[5]
is the previous stage for T[6]
and T[5]= 0
(in other words valid suffix of length = 1 IS NOT found) and W[5] ≠ W[0]
. And according to logic in text above we don't need to check suffixes with length>1. But T[6] = 2
. I suppose that it's a contradiction. PavelBarb (talk) 19:59, 14 October 2024 (UTC)