input

Read CSV data with special commenting, quoting, trimming, line-skipping & non-UTF-8 encoding handling rules. Typically used to "normalize" a CSV for further processing with other qsv commands.

Table of Contents | Source: src/cmd/input.rs

Description | Usage | Input Options | Common Options

Description

Read CSV data with special commenting, quoting, trimming, line-skipping & non-UTF-8 encoding rules and transform it into a "normalized", UTF-8 encoded CSV.

Generally, all qsv commands support basic options like specifying the delimiter used in CSV data. However, this does not cover all possible types of CSV data. For example, some CSV files don't use '"' for quotes or use different escaping styles.
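To make the idea of "normalizing" a non-standard quoting style concrete, here is a minimal Python sketch. It is not qsv's implementation (which lives in src/cmd/input.rs); it only illustrates the concept using Python's standard csv module. The `normalize` helper and the sample data are hypothetical, and the sketch assumes an input that quotes with ' and escapes quotes with a backslash.

```python
import csv
import io


def normalize(text: str, quote: str = "'", escape: str = "\\") -> str:
    """Re-read CSV text that uses a non-standard quote/escape style and
    write it back with standard double quotes, quoting only when necessary.

    Hypothetical helper for illustration; not part of qsv.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        quotechar=quote,
        escapechar=escape,
        doublequote=False,  # quotes are escaped with `escape`, not doubled
    )
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample: single-quoted fields, backslash-escaped quote.
sample = "name,motto\n'Smith, J','It\\'s fine'\n"
normalized = normalize(sample)
print(normalized)
```

In this sketch, only the field containing the delimiter ends up quoted in the output, which roughly corresponds to the "necessary" quote style described under Input Options.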

Also, CSVs with preamble lines can have them skipped with the --skip-lines & --auto-skip options. Similarly, --skip-lastlines allows epilogue lines to be skipped.
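The effect of skipping a fixed number of preamble and epilogue lines can be sketched in a few lines of Python. This is an illustration of the idea only, under the assumption that "lines" are plain newline-terminated lines; qsv's own handling may differ in detail, and the sniffing done by --auto-skip is not modelled. The `skip_lines` helper and the sample report are hypothetical.

```python
def skip_lines(text: str, skip_first: int = 0, skip_last: int = 0) -> str:
    """Drop `skip_first` leading lines and `skip_last` trailing lines.

    Hypothetical helper for illustration; not part of qsv.
    """
    lines = text.splitlines(keepends=True)
    end = len(lines) - skip_last
    return "".join(lines[skip_first:end])


# Hypothetical export with a two-line preamble and a one-line epilogue.
report = (
    "Quarterly report\n"
    "Internal use only\n"
    "id,amount\n"
    "1,10\n"
    "2,20\n"
    "Total,30\n"
)
body = skip_lines(report, skip_first=2, skip_last=1)
print(body)
```

What remains is the header row plus the data rows, which is the shape other CSV tooling generally expects.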

Finally, non-UTF-8 encoded files are converted to UTF-8 "lossily" by default: every invalid UTF-8 sequence is replaced with � (U+FFFD). Note that this is not true transcoding.

If you need to properly transcode non-UTF-8 files, use a tool like iconv before processing them with qsv, e.g. to convert an ISO-8859-1 encoded file to UTF-8: iconv -f ISO-8859-1 -t UTF-8 input.csv -o utf8_output.csv.
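The difference between lossy conversion and true transcoding can be seen with a small Python sketch. It uses only Python's built-in codecs and is not how qsv or iconv are implemented; the sample bytes are hypothetical and assume the file was saved as ISO-8859-1.

```python
# Bytes as they might be stored in an ISO-8859-1 (Latin-1) encoded file.
raw = "café,naïve\n".encode("iso-8859-1")

# Lossy: treat the bytes as UTF-8 and replace whatever is invalid.
# The accented characters are lost and become U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# Transcoding: decode with the real source encoding, then re-encode as UTF-8.
# No information is lost, but the source encoding must be known up front.
transcoded = raw.decode("iso-8859-1")
utf8_bytes = transcoded.encode("utf-8")

print(repr(lossy))
print(repr(transcoded))
```

The lossy result is always valid UTF-8, but the original characters cannot be recovered from it, which is why transcoding first is the safer route when the source encoding is known.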

You can change how encoding errors are handled with the --encoding-errors option.

See https://github.com/dathere/qsv#utf-8-encoding for more details.

This command is typically used at the beginning of a data pipeline (thus the name input) to normalize & prepare CSVs for further processing with other qsv commands.

For examples, see https://github.com/dathere/qsv/blob/master/tests/test_input.rs.

Usage

qsv input [options] [<input>]
qsv input --help

Input Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| --quote | string | The quote character to use. | " |
| --escape | string | The escape character to use. When not specified, quotes are escaped by doubling them. | |
| --no-quoting | flag | Disable quoting completely when reading CSV data. | |
| --quote-style | string | The quoting style to use when writing CSV data. Possible values: all, necessary, nonnumeric and never. All: Quotes all fields. Necessary: Quotes fields only when necessary, i.e. when fields contain a quote, delimiter or record terminator. Quotes are also necessary when writing an empty record (which is indistinguishable from a record with one empty field). NonNumeric: Quotes all fields that are non-numeric. Never: Never write quotes, even if it produces invalid CSV. | necessary |
| --skip-lines | string | The number of preamble lines to skip. | |
| --auto-skip | flag | Sniffs a CSV for preamble lines and automatically skips them. Takes precedence over the --skip-lines option. Does not work with <stdin>. | |
| --skip-lastlines | string | The number of epilogue lines to skip. | |
| --trim-headers | flag | Trim leading & trailing whitespace & quotes from header values. | |
| --trim-fields | flag | Trim leading & trailing whitespace from field values. | |
| --comment | string | The comment character to use. When set, lines starting with this character will be skipped. | |
| --encoding-errors | string | How to handle UTF-8 encoding errors. Possible values: replace, skip, strict. replace: Replace invalid UTF-8 sequences with �. skip: Fields with encoding errors are "". strict: Fail on any encoding errors. | replace |
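The three --encoding-errors modes can be pictured with the following Python sketch, which applies each mode to a single field's raw bytes. It is a reading of the option descriptions above, not qsv's actual code; the `decode_field` helper is hypothetical, and it assumes that "fail" in strict mode can be modelled as raising an exception.

```python
def decode_field(raw: bytes, mode: str = "replace") -> str:
    """Decode one field's bytes as UTF-8 under the given error mode.

    Hypothetical helper for illustration; not part of qsv.
    """
    if mode == "replace":
        # Invalid sequences become U+FFFD (the replacement character).
        return raw.decode("utf-8", errors="replace")
    if mode == "skip":
        # A field with any encoding error becomes the empty string.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return ""
    if mode == "strict":
        # Raises UnicodeDecodeError on the first invalid sequence.
        return raw.decode("utf-8")
    raise ValueError(f"unknown mode: {mode!r}")


bad = b"caf\xe9"  # hypothetical Latin-1 bytes, invalid as UTF-8
print(repr(decode_field(bad, "replace")))
print(repr(decode_field(bad, "skip")))
```

Fields that are already valid UTF-8 pass through unchanged in all three modes; the modes differ only in how they treat invalid input.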

Common Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| -h, --help | flag | Display this message | |
| -o, --output | string | Write output to <file> instead of stdout. | |
| -d, --delimiter | string | The field delimiter for reading CSV data. Must be a single character. | , |
