Rabin Karp
The RabināKarp algorithm is a string-searching algorithm used to find a pattern inside a larger text. Itās especially useful when you need to search for multiple patterns or do repeated searches efficiently.
It uses rolling hashing.
text: a b c d b a c dāØ
Pattern Hash: 7 (4+2+1) Add all the values of Hash.
- a = 1
- b = 2
- c = 3
- d = 4
- e = 5
- f = 6
- g = 7
- h = 8
- i = 9
- j = 10
Now, I will proceed with the testing and create a window with the same pattern and dimensions as the original pattern.
-
Now for a b c
window hash = 1 + 2 + 3 = 6
Is the windowHash matching the pattern hash?Noā
a b c is not a valid pattern. -
Now, check for b c d
window hash = 2 + 3 + 4 = 9
Is the windowHash matching the pattern hash?Noā
b c d is not a valid pattern. -
Now, check for c d b
window hash = 3 + 4 + 2 = 9
Is the windowHash matching the pattern hash?Noā
c d b is not a valid pattern. -
Now, check for d b a
window hash = 4 + 2 + 1 = 7
Is the windowHash matching the pattern hash?Yesā
Now the hash is matching. -
Now, check each and every character.
If the pattern matches,then we can say itās a valid pattern.
d b a is a valid pattern.
Spurious Hits
In the RabināKarp algorithm, a spurious hit happens when:
The hash value of the pattern and the current text window are the same,
but the actual strings are different.
So the algorithm thinks it found a match based on the hash, but after algorithm thinks it found a match based on the hash, it turns out to be false.
Note: To reduce spurious hits, we need to use a stronger hash function.
How can we calculate a strong hash?
text = a b c d b a c d
Pattern = d b a
- a = 1
- b = 2
- c = 3
- d = 4
- e = 5
- f = 6
- g = 7
- h = 8
- i = 9
- j = 10
- k = 11
- l = 12
- m = 13
- n = 14
- o = 15
- p = 16
- q = 17
- r = 18
- s = 19
- t = 20
- u = 21
- v = 22
- w = 23
- x = 24
- y = 25
- z = 26
StrongHash:
(102 x 4) + (101 x 2) + (100 x 1)
400 + 20 + 1 = 421
In the RabināKarp algorithm, 256 is commonly used as the base because the ASCII character set contains 256 possible characters. This helps generate stronger and more unique hash values for efficient string matching.
text = a b c d b a c d
Pattern = d b a
StrongHash:
(262 x 4) + (261 x 2) + (260 x 1)
2704 + 52 + 1 = 2757
Using a larger base such as 26 or 256 does not change the theoretical time complexity of RabināKarp, but it reduces hash collisions and spurious hits, making the algorithm more efficient in practice.
