This is a question for anyone using Examine with fuzzy searching.
Fuzzy search takes a fuzziness parameter, a float between 0 and 2. However, in my testing I get the same results with a fuzziness of 0.0000001 as with 2; there is nothing in between.
I’m just trying to see if anyone has a confirmed working fuzzy search scenario where fuzziness was used to “dial in” proper search results. I believe it’s currently broken.
roam~
This search will find terms like foam and roams.
Starting with Lucene 1.9, an additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example:
roam~0.8
The default that is used if the parameter is not given is 0.5.
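If you’d rather drive this from code than type raw query strings, Examine’s fluent API has a Fuzzy() extension for the same thing. Here’s a rough sketch; the “ExternalIndex” alias and “nodeName” field are just the usual Umbraco defaults, so adjust for your setup, and I haven’t verified the behaviour beyond what the docs describe:

```csharp
using System;
using Examine;
using Examine.Search;

public class FuzzySearchExample
{
    private readonly IExamineManager _examineManager;

    public FuzzySearchExample(IExamineManager examineManager)
        => _examineManager = examineManager;

    public ISearchResults Search(string term, float fuzziness)
    {
        if (!_examineManager.TryGetIndex("ExternalIndex", out var index))
            throw new InvalidOperationException("ExternalIndex not found");

        // Fuzzy() is the knob this whole thread is about; it should end up as
        // the same kind of fuzzy term query as nodeName:roam~0.8 in the backoffice.
        return index.Searcher
            .CreateQuery("content")
            .Field("nodeName", term.Fuzzy(fuzziness))
            .Execute();
    }
}
```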
Very interesting results, thank you!
I didn’t know you could supply the tilde character directly in the backoffice; that simplifies testing for me greatly.
Do you know if we are using Lucene 1.9 under the hood in 15.2.2?
I hadn’t seen before that the float range is supposedly 0-1 rather than the 0-2 that the Examine docs claim, so there’s even more potential for confusion there.
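I suppose I can at least check locally which Lucene.NET assembly Examine actually loads; something quick like this should print it (assuming the Lucene.Net packages are pulled in transitively by Examine, which they should be):

```csharp
using System;
using Lucene.Net.Search;

// Print the version of the Lucene.NET assembly that is loaded at runtime.
var lucene = typeof(FuzzyQuery).Assembly.GetName();
Console.WriteLine($"{lucene.Name} version: {lucene.Version}");
```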
You have a node there, 1126 “Meetups”.
What happens if your search is meetupzz~0.001? I reckon you still get a match?
I just don’t understand what’s different between our environments. I too have 15.2.2, fresh install.
I now have a node called Meetups, for example, and I run this query:
I was under the impression that fuzziness dictates the allowed Levenshtein distance, i.e. how far off you can misspell a term and still get a match (quick sketch of what I mean below).
Right now it only acts as a boolean rather than a gradual scale in all my tests.
But mainly: I always get the same score as soon as I get a match, yet evidently you can get different scores depending on the fuzziness, as you can see in your first and third screenshots for that node 1126.
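Just to be explicit about what I mean by edit distance, here’s a throwaway sketch of plain Levenshtein distance, nothing Examine-specific, just the textbook dynamic programming version. By that measure meetupzz is 2 edits away from meetups, so I’d expect a strict fuzziness to exclude it and a looser one to include it:

```csharp
using System;

// Plain Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn one string into another.
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];

    for (var i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (var j = 0; j <= b.Length; j++) d[0, j] = j;

    for (var i = 1; i <= a.Length; i++)
    {
        for (var j = 1; j <= b.Length; j++)
        {
            var cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,    // deletion
                         d[i, j - 1] + 1),   // insertion
                d[i - 1, j - 1] + cost);     // substitution
        }
    }

    return d[a.Length, b.Length];
}

Console.WriteLine(Levenshtein("meetups", "meetupzz")); // prints 2 (s -> z, plus one extra z)
```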
If it’s 0-1 in steps of 0.1 for fuzzy, then my guess would be that your two tests are both defaulting to 0.5?
In other words, 0 < value < 1 in 0.1 increments, so 0.1 through 0.9 are allowed, and anything outside that falls back to 0.5?
Discobot keeps prompting me to mark a solution here, so I’ll conclude this thread by saying that I believe fuzzy querying with Examine does not work the way the Examine docs explain it.
The float value is probably being interpreted as a whole number of steps, so the Levenshtein distance ends up as either 0, 1, or 2 edits and nothing in between.
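If that is what’s happening, my guess at the collapse would look something like the sketch below. To be clear, this is not the actual Examine or Lucene.NET code, just my rough understanding of how a similarity-style float could end up as whole edit steps, and it would at least explain why 0.0000001 and 2 behave identically for me:

```csharp
using System;

// Guess: values >= 1 are taken as an explicit edit count, values below 1 are
// treated as a legacy similarity and converted relative to the term length,
// and the result is always capped at 2 edits. Not actual library code.
static int ApproximateEditSteps(float fuzziness, int termLength)
{
    const int maxSupportedEdits = 2;

    if (fuzziness >= 1f)
        return Math.Min((int)fuzziness, maxSupportedEdits);

    // e.g. 0.8 on an 8-character term: (1 - 0.8) * 8 = 1.6 -> 1 edit
    return Math.Min((int)((1f - fuzziness) * termLength), maxSupportedEdits);
}

Console.WriteLine(ApproximateEditSteps(0.0000001f, "meetupzz".Length)); // 2
Console.WriteLine(ApproximateEditSteps(0.8f, "meetupzz".Length));       // 1
Console.WriteLine(ApproximateEditSteps(2f, "meetupzz".Length));         // 2
```

Whichever way the conversion actually works, the behaviour I see is consistent with whole edit steps only, never the gradual scale I expected.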
I cannot reproduce the expected fuzzy query results locally. Lucene is mysterious, and I’m hopeful that ElasticSearch, with expanded search integration capabilities, arrives in the future.