Terms of use?
By using this project or its source code, for any purpose and in any shape or form, you grant your implicit agreement to all the following statements:
- You condemn Russia and its military aggression against Ukraine
- You recognize that Russia is an occupant that unlawfully invaded a sovereign state
- You support Ukraine's territorial integrity, including its claims over temporarily occupied territories of Crimea and Donbas
- You reject false narratives perpetuated by Russian state propaganda
To learn more about the war and how you can help, click here. Glory to Ukraine! 🇺🇦
A fast C# .NET fuzzy string matching implementation of SeatGeek's well-known Python FuzzyWuzzy algorithm.
A nitrous-boosted, bit-parallel accelerated version of the original FuzzySharp.
Benchmark comparison of the naive DP Levenshtein distance calculation (baseline), FuzzySharp, Fastenshtein, and Quickenshtein:
Random words of 3 to 1024 random characters (LevenshteinLarge.cs):
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NaiveDp | 231.563 ms | 57.5403 ms | 3.1540 ms | 1.00 | 0.02 | 43500.0000 | 34500.0000 | 275312920 B | 1.000 |
| FuzzySharp | 141.820 ms | 4.0905 ms | 0.2242 ms | 0.61 | 0.01 | - | - | 1545732 B | 0.006 |
| Fastenshtein | 123.356 ms | 13.0959 ms | 0.7178 ms | 0.53 | 0.01 | - | - | 34028 B | 0.000 |
| Quickenshtein | 12.918 ms | 12.8046 ms | 0.7019 ms | 0.06 | 0.00 | - | - | 12 B | 0.000 |
| Raffinert.FuzzySharp | 4.970 ms | 0.3311 ms | 0.0181 ms | 0.02 | 0.00 | - | - | 3051 B | 0.000 |
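The NaiveDp baseline above is the textbook full-matrix dynamic program. As a reference point, here is a minimal sketch of that approach (illustrative only; the actual benchmark code lives in LevenshteinLarge.cs):
// Textbook O(m*n) time, O(m*n) memory Levenshtein distance -- the kind of
// "naive DP" used as the benchmark baseline (sketch, not the benchmark code itself).
static int NaiveLevenshtein(string s, string t)
{
    int m = s.Length, n = t.Length;
    var d = new int[m + 1, n + 1];
    for (int i = 0; i <= m; i++) d[i, 0] = i; // delete all i chars of s
    for (int j = 0; j <= n; j++) d[0, j] = j; // insert all j chars of t
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,   // deletion
                         d[i, j - 1] + 1),  // insertion
                d[i - 1, j - 1] + cost);    // match or substitution
        }
    }
    return d[m, n];
}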
Install-Package Raffinert.FuzzySharp
or
dotnet add package Raffinert.FuzzySharp
Fuzz.Ratio("mysmilarstring", "myawfullysimilarstirng");
// 72
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97
Fuzz.PartialRatio("similar", "somewhresimlrbetweenthisstring");
// 71
Fuzz.TokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.PartialTokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.TokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.PartialTokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics and Space Administration");
// 89
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 53
Fuzz.PartialTokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 100
Fuzz.TokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 40
Fuzz.PartialTokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 67
Fuzz.WeightedRatio("The quick brown fox jimps ofver the small lazy dog", "the quick brown fox jumps over the small lazy dog");
// 95
Find the best match(es) from a collection of choices.
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
Process.ExtractTop("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, limit: 3);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: bing, score: 22, index: 1), ...]
// With score cutoff
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, cutoff: 40);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractSorted("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7), ...]
Extraction uses WeightedRatio and Full preprocessing by default. Override these in the method parameters to use different scorers and processing:
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" }, s => s, ScorerCache.Get<DefaultRatioScorer>());
// (string: Dallas Cowboys, score: 57, index: 3)
Extraction can operate on objects of any type. Use the processor parameter to reduce the object to the string it should be compared on:
var events = new[]
{
new[] { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" },
new[] { "new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm" },
new[] { "atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm" },
};
var query = new[] { "new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm" };
var best = Process.ExtractOne(query, events, strings => strings[0]);
// (value: { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" }, score: 95, index: 0)
The Process.Configure() fluent builder creates reusable, immutable pipelines with preconfigured scoring, caching, and parallel execution.
Equivalent to the static Process methods, but reusable across multiple queries:
var pipeline = Process.Configure().Build();
var result1 = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
var result2 = pipeline.ExtractOne("chicago cubs", baseballStrings);
Configure a custom scorer with WithScorer():
var pipeline = Process.Configure()
.WithScorer(ScorerCache.Get<DefaultRatioScorer>())
.Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
Enable multi-threaded processing for large choice sets:
var pipeline = Process.Configure()
.Parallel()
.Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
With ParallelOptions for fine-grained control:
var pipeline = Process.Configure()
.Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
.Build();
Automatic caching creates a CachedWeightedRatioScorer per extraction call, pre-initializing internal data structures for the query string:
var pipeline = Process.Configure()
.Cached()
.Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
Combine caching and parallelism. Builder methods are order independent -- .Cached().Parallel() and .Parallel().Cached() produce identical results:
var pipeline = Process.Configure()
.Cached()
.Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
.Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
For maximum performance when running the same query against different choice sets, provide an externally managed ICachedRatioScorer. The scorer pre-initializes once and is reused across all extraction calls:
using var scorer = new CachedWeightedRatioScorer("new york mets at atlanta braves");
var pipeline = Process.Configure()
.Cached(scorer)
.Parallel()
.Build();
var results1 = pipeline.ExtractAll(choiceSet1);
var results2 = pipeline.ExtractAll(choiceSet2);
Note: External cached scorers implement IDisposable. Use using to ensure proper cleanup.
Pass a CancellationToken via ParallelOptions to cancel long-running parallel extractions:
var cts = new CancellationTokenSource();
var pipeline = Process.Configure()
.Cached()
.Parallel(new ParallelOptions { CancellationToken = cts.Token })
.Build();
// Throws OperationCanceledException if cancelled
var results = pipeline.ExtractAll(query, largeChoicesList).ToList();
Stateless scorers for use with Process static methods and the WithScorer() builder method:
var ratio = ScorerCache.Get<DefaultRatioScorer>();
var partialRatio = ScorerCache.Get<PartialRatioScorer>();
var tokenSet = ScorerCache.Get<TokenSetScorer>();
var partialTokenSet = ScorerCache.Get<PartialTokenSetScorer>();
var tokenSort = ScorerCache.Get<TokenSortScorer>();
var partialTokenSort = ScorerCache.Get<PartialTokenSortScorer>();
var tokenAbbreviation = ScorerCache.Get<TokenAbbreviationScorer>();
var partialTokenAbbrev = ScorerCache.Get<PartialTokenAbbreviationScorer>();
var weighted = ScorerCache.Get<WeightedRatioScorer>();
Pre-initialize with a query string for repeated comparisons. These implement IDisposable:
using var scorer = new CachedWeightedRatioScorer("search query");
int score = scorer.Score("candidate string");
Available cached scorers:
- CachedWeightedRatioScorer -- weighted combination (default for .Cached())
- CachedDefaultRatioScorer -- simple Levenshtein ratio
- CachedTokenSortScorer -- token sort ratio
- CachedTokenSetScorer -- token set ratio
- CachedPartialTokenSetScorer -- partial token set ratio
- CachedTokenDifferenceScorer -- token difference ratio
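A single pre-initialized scorer can be reused across a whole candidate list. This sketch uses only the constructor and Score method shown above (the candidate list is made up for illustration):
using var scorer = new CachedWeightedRatioScorer("new york mets");
var candidates = new[] { "new york mets vs chicago cubs", "boston red sox", "new york yankees" };
foreach (var candidate in candidates)
{
    // The query-side state is built once in the constructor and reused for every call
    Console.WriteLine($"{candidate}: {scorer.Score(candidate)}");
}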
Low-level access to the bit-parallel Levenshtein distance implementation:
// Edit distance
int distance = Levenshtein.Distance("kitten", "sitting");
// 3
// Normalized similarity (1.0 = identical, 0.0 = completely different)
double similarity = Levenshtein.NormalizedSimilarity("kitten", "sitting");
// Edit operations to transform one string into another
EditOp[] ops = Levenshtein.GetEditOps("kitten", "sitting");
// [Replace(0->0), Equal, Equal, Equal, Insert(4->4), Replace(5->6)]
The Levenshtein, Indel, and LongestCommonSubsequence classes also offer an instance API for one-to-many comparisons. The constructor pre-computes a bit-parallel pattern match vector from the source string, which is then reused across all subsequent calls. This avoids rebuilding the internal data structure on every comparison, giving a significant speedup when comparing one source against many targets.
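The pattern match vector mentioned above is the per-character bitmask table at the heart of Myers' bit-parallel algorithm. The following standalone sketch of that classic algorithm (limited to source strings of up to 64 characters) is illustrative only and is not the library's internal code:
// Sketch of Myers' (1999) bit-parallel Levenshtein distance for patterns up to
// 64 characters (one ulong block) -- not Raffinert.FuzzySharp's implementation.
static int BitParallelDistance(string pattern, string text)
{
    int m = pattern.Length;
    if (m == 0) return text.Length;
    if (m > 64) throw new ArgumentException("This sketch only handles patterns up to 64 chars.");
    // Pattern match vector: bit i of peq[c] is set when pattern[i] == c.
    var peq = new Dictionary<char, ulong>();
    for (int i = 0; i < m; i++)
    {
        peq.TryGetValue(pattern[i], out ulong bits);
        peq[pattern[i]] = bits | (1UL << i);
    }
    ulong pv = ulong.MaxValue; // positive vertical deltas
    ulong mv = 0;              // negative vertical deltas
    ulong last = 1UL << (m - 1);
    int score = m;
    foreach (char c in text)
    {
        peq.TryGetValue(c, out ulong eq);
        ulong xv = eq | mv;
        ulong xh = unchecked((((eq & pv) + pv) ^ pv) | eq);
        ulong ph = mv | ~(xh | pv);
        ulong mh = pv & xh;
        if ((ph & last) != 0) score++;
        if ((mh & last) != 0) score--;
        ph = (ph << 1) | 1;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;
    }
    return score;
}
Real implementations handle longer sources by splitting the pattern into 64-bit blocks; that blocked variant is where most of the implementation complexity lives.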
All three implement IDisposable -- use using to return pooled arrays.
using var lev = new Levenshtein("chicago cubs vs new york mets");
int d1 = lev.DistanceFrom("new york mets vs chicago cubs");
int d2 = lev.DistanceFrom("atlanta braves vs pittsburgh pirates");
Indel distance counts only insertions and deletions (no replacements). NormalizedSimilarityWith returns a value between 0.0 (completely different) and 1.0 (identical):
using var indel = new Indel("chicago cubs");
int distance = indel.DistanceFrom("chicago white sox");
double similarity = indel.NormalizedSimilarityWith("chicago white sox");
A generic variant IndelT<T> is available for comparing sequences of any IEquatable<T>:
using var indel = new IndelT<string>(new[] { "hello", "world" });
int distance = indel.DistanceFrom(new[] { "hello", "there" });
double similarity = indel.NormalizedSimilarityWith(new[] { "hello", "there" });
LCS distance is defined as max(len1, len2) - LCS_length:
using var lcs = new LongestCommonSubsequence("chicago cubs");
int distance = lcs.DistanceFrom("chicago white sox");
By default, Fuzz methods compare strings as-is. Pass PreprocessMode.Full to normalize whitespace, lowercase, and strip non-alphanumeric characters before comparing:
Fuzz.Ratio("new york mets", "NEW YORK METS");
// < 100 (case sensitive)
Fuzz.Ratio("new york mets", "NEW YORK METS", PreprocessMode.Full);
// 100 (case insensitive after preprocessing)
Process extraction methods use PreprocessMode.Full by default. Pass a custom processor function to override this behavior.
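For example, a processor that only lowercases (a sketch reusing the processor parameter from the extraction examples above, assuming it is applied to both the query and each choice as in those examples):
var choices = new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" };
// Lowercase only, instead of the default Full preprocessing
var result = Process.ExtractOne("DALLAS cowboys", choices, s => s.ToLowerInvariant());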
- Adam Cohen (seatgeek/fuzzywuzzy)
- Antti Haapala (python-Levenshtein)
- David Necas (python-Levenshtein)
- Jacob Bayer (original FuzzySharp library)
- Max Bachmann (RapidFuzz)
- Mikko Ohtamaa (python-Levenshtein)
- Panayiotis (Java implementation I heavily borrowed from)
Support the project through GitHub Sponsors or via PayPal.
See CHANGELOG.md for release history.