Find the difference between two CSVs with ludicrous speed! e.g. compare two CSVs with 1M rows x 9 columns in under 600ms!
Table of Contents | Source: src/cmd/diff.rs | 🚀
Description | Examples | Usage | Diff Options | Common Options
Description ↩
Find the difference between two CSVs with ludicrous speed.
NOTE: diff does not support stdin. A file path is required for both arguments. Further, PRIMARY KEY VALUES MUST BE UNIQUE WITHIN EACH CSV.
To check whether a CSV has unique primary key values, run qsv extdedup on the same key columns using the --select option:

    $ qsv extdedup --select keycol data.csv --no-output

The duplicate count will be printed to stderr.
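For intuition about the uniqueness requirement, the same check can be sketched outside qsv. This is a minimal Python illustration, not how qsv extdedup is implemented; `duplicate_keys` and the sample data are hypothetical names introduced here, and the sketch assumes the first row is a header.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_columns):
    """Return the key tuples that occur more than once, mapped to their counts."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first row is a header and is skipped
    counts = Counter(tuple(row[i] for i in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


sample = "id,name\n1,alice\n2,bob\n1,carol\n"
print(duplicate_keys(sample, [0]))  # {('1',): 2}
```

An empty result means the chosen key columns identify every record uniquely, which is what diff requires of each input.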
Examples ↩
Find the difference between two CSVs:

    qsv diff left.csv right.csv

Find the difference between two CSVs when the right CSV has no headers:

    qsv diff left.csv --no-headers-right right-noheaders.csv

Find the difference between two CSVs when the left CSV uses a tab delimiter:

    qsv diff --delimiter-left '\t' left-tab.tsv right.csv

Find the difference between two CSVs when the left CSV uses a semicolon delimiter:

    qsv diff --delimiter-left ';' left-semicolon.csv right.csv

Find the difference between two CSVs and write output with tab delimiter to a file:

    qsv diff -o diff-tab.tsv --delimiter-output '\t' left.csv right.csv

Find the difference between two CSVs and write output with semicolon delimiter to a file:

    qsv diff -o diff-semicolon.csv --delimiter-output ';' left.csv right.csv

Find the difference comparing records with the same values in the first two columns:

    qsv diff --key 0,1 left.csv right.csv

Find the difference using first two columns as key and sort result by those columns:

    qsv diff -k 0,1 --sort-columns 0,1 left.csv right.csv

Find the difference but replace equal field values with empty string (key fields still appear):

    qsv diff --drop-equal-fields left.csv right.csv

Find the difference but do not output headers in the result:

    qsv diff --no-headers-output left.csv right.csv

Find the difference when both CSVs have no headers (generic headers _col_1, _col_2, etc. are used):

    qsv diff --no-headers-left --no-headers-right left.csv right.csv

For more examples, see tests.
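The --key examples treat the chosen columns as a record identifier: rows are matched across the two files by key, and a row whose key exists on both sides but whose content differs is reported as a "delete" plus an "add". The Python sketch below illustrates only that matching idea. It is not qsv's implementation or output format; `keyed_rows`, `diff_by_key`, and the sample data are hypothetical, and headers are assumed to be present in both inputs.

```python
import csv
import io


def keyed_rows(csv_text, key_columns):
    """Index data rows by their key tuple (keys are assumed unique)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first row is a header and is skipped
    return {tuple(row[i] for i in key_columns): row for row in reader}


def diff_by_key(left_text, right_text, key_columns=(0,)):
    """Return a list of ("delete" | "add", row) pairs describing the difference."""
    left = keyed_rows(left_text, key_columns)
    right = keyed_rows(right_text, key_columns)
    changes = []
    for key, row in left.items():
        if key not in right:
            changes.append(("delete", row))      # only in the left file
        elif right[key] != row:
            changes.append(("delete", row))      # modified: old version
            changes.append(("add", right[key]))  # modified: new version
    for key, row in right.items():
        if key not in left:
            changes.append(("add", row))         # only in the right file
    return changes


left = "id,name,city\n1,ann,oslo\n2,bob,rome\n3,cat,lima\n"
right = "id,name,city\n1,ann,oslo\n2,bob,paris\n4,dan,cairo\n"
for action, row in diff_by_key(left, right):
    print(action, row)
```

If key values repeat within a file, a dictionary index like this silently keeps only the last row for that key, which shows why unique keys matter for a key-based diff.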
Usage ↩

    qsv diff [options] [<input-left>] [<input-right>]
    qsv diff --help

Diff Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| ‑‑no‑headers‑left | flag | When set, the first row will be considered as part of the left CSV to diff. (When not set, the first row is the header row and will be skipped during the diff. It will always appear in the output.) | |
| ‑‑no‑headers‑right | flag | When set, the first row will be considered as part of the right CSV to diff. (When not set, the first row is the header row and will be skipped during the diff. It will always appear in the output.) | |
| ‑‑no‑headers‑output | flag | When set, the diff result won't have a header row in its output. If not set and both CSVs have no headers, headers in the result will be: _col_1, _col_2, etc. | |
| ‑‑delimiter‑left | string | The field delimiter for reading CSV data on the left. Must be a single character. | , |
| ‑‑delimiter‑right | string | The field delimiter for reading CSV data on the right. Must be a single character. | , |
| ‑‑delimiter‑output | string | The field delimiter for writing the CSV diff result. Must be a single character. | , |
| ‑k, ‑‑key | string | The column indices that uniquely identify a record as a comma-separated list of 0-based indices, e.g. 0,1,2 or column names, e.g. name,age. Note that when selecting columns by name, only the left CSV's headers are used to match the column names and it is assumed that the right CSV has the same selected column names in the same order as the left CSV. | 0 |
| ‑‑sort‑columns | string | The column indices by which the diff result should be sorted as a comma-separated list of indices, e.g. 0,1,2 or column names, e.g. name,age. Records in the diff result that are marked as "modified" ("delete" and "add" records that have the same key, but have different content) will always be kept together in the sorted diff result and so won't be sorted independently from each other. Note that when selecting columns by name, only the left CSV's headers are used to match the column names and it is assumed that the right CSV has the same selected column names in the same order as the left CSV. | |
| ‑‑drop‑equal‑fields | flag | Drop values of equal fields in modified rows of the CSV diff result (and replace them with the empty string). Key field values will not be dropped. | |
| ‑j, ‑‑jobs | string | The number of jobs to run in parallel. When not set, the number of jobs is set to the number of CPUs detected. | |
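
The effect of --drop-equal-fields on a single "modified" pair can be pictured with a short Python sketch. It only mirrors the behaviour described in the table (equal fields become the empty string, key fields are kept) and is not taken from qsv's source; `drop_equal_fields` and the sample rows are hypothetical.

```python
def drop_equal_fields(deleted_row, added_row, key_columns=(0,)):
    """Blank out non-key fields that are identical in both rows of a modified pair."""
    keys = set(key_columns)
    deleted_out, added_out = [], []
    for i, (old, new) in enumerate(zip(deleted_row, added_row)):
        keep = i in keys or old != new  # key fields and changed fields survive
        deleted_out.append(old if keep else "")
        added_out.append(new if keep else "")
    return deleted_out, added_out


old_row = ["2", "bob", "rome"]
new_row = ["2", "bob", "paris"]
print(drop_equal_fields(old_row, new_row))
# (['2', '', 'rome'], ['2', '', 'paris'])
```

Only the changed field and the key remain visible, which makes wide rows with a single differing column easier to scan.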
Common Options ↩
| Option | Type | Description | Default |
|---|---|---|---|
| ‑h, ‑‑help | flag | Display this message | |
| ‑o, ‑‑output | string | Write output to `<file>` instead of stdout. | |
| ‑d, ‑‑delimiter | string | Set ALL delimiters to this character. Overrides --delimiter-right, --delimiter-left and --delimiter-output. | |