Description
When using the V2 rolling checksum algorithm, diffing a file against a basis that is identical or only very slightly different produces a huge delta: the whole new file gets added to the delta.
Environment
- repo freshly cloned from the current master branch (commit d87ee31)
- VS2022 17.2.6 on Windows 10 x64
I had to make a small code change so the command line app would use the V2 algorithm by default:
diff --git a/source/Octodiff/Core/SupportedAlgorithms.cs b/source/Octodiff/Core/SupportedAlgorithms.cs
index 2cc2aa5..5552f13 100644
--- a/source/Octodiff/Core/SupportedAlgorithms.cs
+++ b/source/Octodiff/Core/SupportedAlgorithms.cs
@@ -52,7 +52,7 @@ namespace Octodiff.Core
         public virtual IRollingChecksum Default()
         {
-            return Adler32Rolling();
+            return Adler32Rolling(true);
         }
         public virtual IRollingChecksum Create(string algorithm)
Steps to reproduce
- grab a random binary file; my test was kernel32.dll from windows\system32
- create 2 copies of it: copy1.dll and copy2.dll
- modify copy2.dll very slightly; I simply changed the first byte from 'M' to 'A'
- run octodiff to create the deltas:
Octodiff.exe signature kernel32.dll signature.bin
Octodiff.exe delta signature.bin copy1.dll delta1.bin
Octodiff.exe delta signature.bin copy2.dll delta2.bin
- observe that the delta files are not small at all: each one contains the whole new file
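The steps above can also be scripted. This is only a minimal Python sketch, not something from the original repro: it assumes an Octodiff.exe built with the V2 change above is available at the path you pass in, and the helper names (`prepare_copies`, `octodiff_commands`, `reproduce`) are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_copies(source, workdir):
    """Create copy1.dll (identical) and copy2.dll (first byte set to 'A')."""
    source, workdir = Path(source), Path(workdir)
    copy1 = workdir / "copy1.dll"
    copy2 = workdir / "copy2.dll"
    shutil.copyfile(source, copy1)
    shutil.copyfile(source, copy2)
    # Open in read/write binary mode so only the first byte is overwritten.
    with open(copy2, "r+b") as f:
        f.write(b"A")
    return copy1, copy2


def octodiff_commands(octodiff, basis, copies, workdir):
    """Build the same signature/delta command lines as in the steps above."""
    workdir = Path(workdir)
    signature = workdir / "signature.bin"
    commands = [[octodiff, "signature", str(basis), str(signature)]]
    for index, copy in enumerate(copies, start=1):
        delta = workdir / f"delta{index}.bin"
        commands.append([octodiff, "delta", str(signature), str(copy), str(delta)])
    return commands


def reproduce(octodiff, basis, workdir):
    """Run the repro and return file sizes so the deltas can be compared."""
    basis, workdir = Path(basis), Path(workdir)
    copies = prepare_copies(basis, workdir)
    for command in octodiff_commands(octodiff, basis, copies, workdir):
        subprocess.run(command, check=True)  # assumes octodiff is runnable here
    files = [basis] + sorted(workdir.glob("delta*.bin"))
    return {path.name: path.stat().st_size for path in files}
```

Calling something like `reproduce("Octodiff.exe", "kernel32.dll", ".")` should return the size of the basis file next to the sizes of delta1.bin and delta2.bin, which makes it easy to see whether the deltas are about as large as the input.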
Other notes
The V1 version of the algorithm produces small delta files, as expected.