Explore Swift Challenge #3: Boyer-Moore Solution
@Eiam8821 has been running the Explore Swift Challenge as a way to explore what it means to write Swift code and how we can solve actual programming problems in a true Swift-like manner. (See also Reddit.)
Challenge #3 goes as follows:
Implement a
findIndexOf()
extension onString
.findIndexOf()
returns theString.Index?
of the starting character in the suppliedString
For example:
As with the other challenges, the idea is to write the solution in pure Swift. You canât import Cocoa, Foundation, AppKit, or UIKit.
First, a brute-force solution:
This looks at each character in the source string in turn. If the character equals the first character of the search pattern, then it enters the inner loop that checks whether the rest of the pattern matches. If no match is found, the outer loop continues where it left off. This repeats until a complete match is found or the end of the source string is reached.
The brute-force approach works OK, but itâs not very efficient (or pretty). As it turns out, you donât have to look at each character from the source string â you can often skip ahead multiple characters.
That skip-ahead algorithm is called Boyer-Moore and it has been around for a long time. There are probably faster string search algorithms out there, but Boyer-Moore is no slouch either.
Hereâs how youâd write it in Swift:
This code is based on the article âFaster String Searchesâ by Costas Menico from Dr Dobbâs magazine, July 1989. (Yes, 1989! Sometimes itâs useful to keep those old magazines around.)
It works as follows. You line up the search pattern with the source string and see what character matches the last character of the search pattern:
source string: Hello, World
search pattern: World
^
There are three possibilities:
-
The two characters are equal. Youâve found a possible match.
-
The characters are not equal, but the source character does appear in the search pattern elsewhere.
-
The source character does not appear in the search pattern at all.
In the example, the characters d
and o
do not match, but o
does appear in the search pattern. That means we can skip ahead several positions:
source string: Hello, World
search pattern: World
^
Note how the two o
characters line up now. Again you compare the last character of the search pattern with the search text: d
vs W
. These are not equal but the W
does appear in the pattern. So we skip ahead again to line up those two W
characters:
source string: Hello, World
search pattern: World
^
This time the two characters are equal and there is a possible match. To verify the match you do a brute-force search, but backwards, from the end of the search pattern to the beginning. And thatâs all there is to it.
The amount to skip ahead at any given time is determined by the âskip tableâ, which is a dictionary of all the characters in the search pattern and the amount to skip by. The skip table in the example looks like:
W: 4
o: 3
r: 2
l: 1
d: 0
The closer a character is to the end of the pattern, the smaller the skip amount. If a character appears more than once in the pattern, the one nearest to the end of the pattern determines the skip value for that character.
Note that if the pattern is just one character, itâs faster to do a brute-force search. Thereâs probably a trade-off between the time it takes to build the skip table and doing brute-force for short patterns.
The code appears to work well, but I donât really like the following bit:
This performs the backwards search that happens when the last character from the search pattern matches the character from the source string. We need to step backwards through both strings until we find a character that doesnât match, or until weâve reached the beginning of the search pattern.
I wonder if there is a more Swift-ish way to write thisâŠ
The following would work:
This steps back through the source string until where the pattern starts, and then compares the substring to the search pattern. The code is a lot shorter but it may be slower because it potentially does more work.