We all know the well-known VLOOKUP() function that helps us combine data from different tables. However, this function has one significant drawback - it cannot combine similar values, that is, if there is an error in the word, then there will be no match.
To be able to combine approximate values, we can create our own function. Let's call it FuzzyLookup().
Let's imagine that we have two lists. Both have approximately the same elements, but they may be written slightly differently. The task is to find for each element in the first list the most similar element from the second list, i.e. implement a search for the nearest maximally similar text.
The big question, in this case, is what to consider the criterion of "similarity". Just the number of matching characters? Is the number of consecutive matches? Should character case or spaces be considered? What to do with different arrangement of words in a phrase? There are many options and there is no single solution - for each situation one or the other will be better than others.
In our case, we implement the simplest option - search by the maximum number of character matches. It's not perfect, but it works pretty well for most situations.
To add function FuzzyLookup , open the menu Tools - Macros - Edit Macros... , select Module1 and copy the following text into the module:
Function FuzzyLOOKUP(LookupValue As String, SrcTable As Variant, Optional SimThreshold As Single) As String
' moonexcel.com.ua
Dim Str As String
Dim CellArray As Variant
Dim StrArray As Variant
If IsMissing(SimThreshold) Then SimThreshold = 0
Str = LCase(LookupValue)
StrArray = Split(Str)
StrExt = UBound(StrArray)
For Each Cell In SrcTable
CellArray = Split(LCase(Cell))
CellExt = UBound(CellArray)
CellRate = 0
' We check each word in the search phrase
For x = 0 To StrExt
StrWord = StrArray(x)
If Len(StrWord) = 0 Then GoTo continue_x
MaxStrWordRate = 0
' We check each word in the next cell from the original table of values
For i = 0 To CellExt
CellWord = CellArray(i)
If Len(CellWord) = 0 Then GoTo continue_i
FindCharNum = OccurrenceNum(StrWord, CellWord)
StrWordRate = FindCharNum / Max(Len(StrWord),Len(CellWord))
If StrWordRate > MaxStrWordRate Then MaxStrWordRate = StrWordRate
continue_i:
Next i
CellRate = CellRate + MaxStrWordRate
continue_x:
Next x
' We keep the best match
If CellRate > MaxCellRate Then
MaxCellRate = CellRate
BestCell = Cell
FindCharNum = OccurrenceNum(Str, Cell)
SimRate = FindCharNum / Max(Len(Str),Len(Cell))
End If
Next Cell
IF SimRate >= SimThreshold Then
IF SimThreshold = -1 Then
ReturnValue = BestCell + " (" + Format(SimRate, "0.00") + ")"
ElseIf SimThreshold = -2 Then
ReturnValue = Format(SimRate, "0.00")
Else
ReturnValue = BestCell
End If
Else
ReturnValue = ""
End If
FuzzyLOOKUP = ReturnValue
End Function
Function OccurrenceNum(ByVal SourceString As String, ByVal TargetString As String)
For i = 1 To Len(SourceString)
' We are looking for the occurrence of each symbol
Position = InStr(1, TargetString, Mid(SourceString, i, 1), 1)
' We increase the counter of coincidences
If Position > 0 Then
Count = Count + 1
' Remove the found symbol
TargetString = Left(TargetString, Position - 1) + Right(TargetString, Len(TargetString) - Position)
End If
Next i
OccurrenceNum = Count
End Function
Function Max(ByVal value1 As Variant, ByVal value2 As Variant)
If value1 > value2 Then
Result = value1
Else
Result = value2
End If
Max = Result
End Function
Next, close Macro Editor and return to the worksheet LibreOffice Calc - now you can use our new feature FuzzyLookup() .
You can also use the feature FUZZYLOOKUP() by installing the free extension YouLibreCalc.oxt or its full-featured version YLC_Utilities.oxt .
After that, this function will be available in all files that will be opened in LibreOffice Calc.