algorithm - Find word in string buffer/paragraph/text -
this asked in amazon telephonic interview - "can write program (in preferred language c/c++/etc.) find given word in string buffer of big size ? i.e. number of occurrences "
i still looking perfect answer should have given interviewer.. tried write linear search (char char comparison) , rejected.
given 40-45 min time telephonic interview, perfect algorithm he/she looking ???
the kmp algorithm popular string matching algorithm.
kmp algorithm
checking char char inefficient. if string has 1000 characters , keyword has 100 characters, don't want perform unnecessary comparisons. kmp algorithm handles many cases can occur, imagine interviewer looking case where: when begin (pass 1), first 99 characters match, 100th character doesn't match. now, pass 2, instead of performing entire comparison character 2, have enough information deduce next possible match can begin.
// c program implementation of kmp pattern searching // algorithm #include<stdio.h> #include<string.h> #include<stdlib.h> void computelpsarray(char *pat, int m, int *lps); void kmpsearch(char *pat, char *txt) { int m = strlen(pat); int n = strlen(txt); // create lps[] hold longest prefix suffix // values pattern int *lps = (int *)malloc(sizeof(int)*m); int j = 0; // index pat[] // preprocess pattern (calculate lps[] array) computelpsarray(pat, m, lps); int = 0; // index txt[] while (i < n) { if (pat[j] == txt[i]) { j++; i++; } if (j == m) { printf("found pattern @ index %d \n", i-j); j = lps[j-1]; } // mismatch after j matches else if (i < n && pat[j] != txt[i]) { // not match lps[0..lps[j-1]] characters, // match anyway if (j != 0) j = lps[j-1]; else = i+1; } } free(lps); // avoid memory leak } void computelpsarray(char *pat, int m, int *lps) { int len = 0; // length of previous longest prefix suffix int i; lps[0] = 0; // lps[0] 0 = 1; // loop calculates lps[i] = 1 m-1 while (i < m) { if (pat[i] == pat[len]) { len++; lps[i] = len; i++; } else // (pat[i] != pat[len]) { if (len != 0) { // tricky. consider example // aaacaaaa , = 7. len = lps[len-1]; // also, note not increment here } else // if (len == 0) { lps[i] = 0; i++; } } } } // driver program test above function int main() { char *txt = "ababdabacdababcabab"; char *pat = "ababcabab"; kmpsearch(pat, txt); return 0; }
this code taken site teaches algorithms: geeks geeks kmp