Standardization Input Formats
Currently, Standardization supports the following input file formats:
- Cobol (see Cobrix GitHub)
- CSV (see rfc4180)
- FixedWidth (see link to be added; the repository does not exist yet)
- JSON (see json.org)
- Parquet (see Apache Parquet)
- XML (see xml.com)
When running Standardization, one of the formats from the list above has to be specified, together with any format-specific options:
```
...standardization options...
--format <format>
--format-specific-optionX valueY
```
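The template above can be made concrete with a small Python sketch that assembles the format part of the command line from an option map. The helper `build_format_args` is hypothetical. It leaves out the other Standardization options, and writing booleans as lowercase `true`/`false` is an assumption.

```python
from typing import Dict, List

# Formats listed on this page.
SUPPORTED_FORMATS = {"cobol", "csv", "fixed-width", "json", "parquet", "xml"}


def build_format_args(fmt: str, options: Dict[str, object]) -> List[str]:
    """Hypothetical helper: build '--format <format>' plus format-specific options.

    Only the flags documented on this page are assembled; the remaining
    Standardization options are not covered.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    args = ["--format", fmt]
    for name, value in options.items():
        # Assumption: booleans are passed as lowercase "true"/"false".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += [f"--{name}", text]
    return args


print(" ".join(build_format_args("csv", {"delimiter": ";", "header": True})))
# --format csv --delimiter ; --header true
```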
Cobol
The Cobol format value is cobol. Its format options are:
| Option | Values domain | Description | Default |
|---|---|---|---|
| charset | Any valid charset name | The character set of the input. | UTF-8 |
| cobol-encoding | ascii or ebcdic | Specifies the encoding of mainframe files | - |
| cobol-is-text | Boolean | Specifies whether the mainframe file is an ASCII text file | false |
| cobol-trimming-policy | none, left, right, both | Specifies the string trimming policy for mainframe files | none |
| copybook | String | Path to a copybook for the COBOL data format | - |
| is-xcom | Boolean | Specifies whether the mainframe file in COBOL format contains XCOM record headers | false |
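To make `cobol-encoding` and `cobol-trimming-policy` more concrete, here is a minimal Python sketch of their effect on a single text field. It does not reflect how Cobrix works internally. The helper names are hypothetical, and mapping `ebcdic` to Python's `cp037` code page (EBCDIC US/Canada) is an assumption made only for this example.

```python
# Assumption: "ebcdic" is illustrated with the cp037 code page.
CODECS = {"ascii": "ascii", "ebcdic": "cp037"}


def decode_field(raw: bytes, cobol_encoding: str) -> str:
    """Hypothetical helper: decode one field's bytes according to cobol-encoding."""
    return raw.decode(CODECS[cobol_encoding])


def apply_trimming_policy(value: str, policy: str = "none") -> str:
    """Hypothetical helper mirroring the none/left/right/both policy values."""
    if policy == "none":
        return value
    if policy == "left":
        return value.lstrip()
    if policy == "right":
        return value.rstrip()
    if policy == "both":
        return value.strip()
    raise ValueError(f"unknown cobol-trimming-policy: {policy}")


# "  Hello " encoded in EBCDIC cp037 (0x40 is the EBCDIC space).
raw = bytes([0x40, 0x40, 0xC8, 0x85, 0x93, 0x93, 0x96, 0x40])
text = decode_field(raw, "ebcdic")
print(repr(apply_trimming_policy(text, "right")))  # '  Hello'
```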
CSV
The CSV format value is csv. Its format options are:
| Option | Values domain | Description | Default |
|---|---|---|---|
| charset | Any valid charset name | The character set. | UTF-8 |
| csv-escape | Any char | Escape character. Escaped quote characters are not treated as quotes. | \ |
| csv-quote | Any char | Quote character. Delimiters inside quotes are not treated as delimiters. | " |
| delimiter | Any char or Unicode code point such as U+00A1 | Character separating the column values in a row | , |
| header | Boolean | Specifies whether the input data has a CSV-style header | false |
| null-value | String | Defines how null values are represented in a CSV file | "" (empty string) |
Fixed Width
Fixed Width is a custom, in-house format. It requires width metadata; see Usage - Schema for details.
The Fixed Width format value is fixed-width. Its format options are:
| Option | Values domain | Description | Default |
|---|---|---|---|
| charset | Any valid charset name | The character set. | UTF-8 |
| empty-values-as-nulls | Boolean | If true, treats empty values as nulls | false |
| null-value | String | Defines how null values are represented in a fixed-width file format | "" (empty string) |
| trimValues | Boolean | Uses Java’s String.trim method to remove whitespace from both ends of each value. Required if the data is to be cast to any numeric type | false |
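A fixed-width line is split into columns purely by position, which is why width metadata is required. The following Python sketch shows how `trimValues` and `empty-values-as-nulls` could interact for one line. The function `parse_fixed_width` is hypothetical. It receives the widths directly instead of reading them from the schema, and Python's `str.strip` only approximates Java's `String.trim`: it removes Unicode whitespace rather than characters up to U+0020.

```python
from typing import List, Optional, Sequence


def parse_fixed_width(line: str, widths: Sequence[int],
                      trim_values: bool = False,
                      empty_values_as_nulls: bool = False) -> List[Optional[str]]:
    """Hypothetical helper: cut one line into fields by per-column widths."""
    fields: List[Optional[str]] = []
    pos = 0
    for width in widths:
        raw = line[pos:pos + width]
        pos += width
        # Approximation of Java's String.trim (both ends).
        value = raw.strip() if trim_values else raw
        fields.append(None if empty_values_as_nulls and value == "" else value)
    return fields


line = "  42" + "Alice " + "    "  # three columns with widths 4, 6 and 4
print(parse_fixed_width(line, [4, 6, 4]))
# ['  42', 'Alice ', '    ']
print(parse_fixed_width(line, [4, 6, 4], trim_values=True, empty_values_as_nulls=True))
# ['42', 'Alice', None]
```

Without trimming, the padded last column is `"    "`, which is not empty, so `empty-values-as-nulls` alone would keep it as a string.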
JSON
The JSON format value is json. Its format options are:
| Option | Values domain | Description | Default |
|---|---|---|---|
| charset | Any valid charset name | The character set. | UTF-8 |
Parquet
Parquet has no format-specific options; only --format parquet is required.
XML
The XML format value is xml. Its format options are:
| Option | Values domain | Description | Default |
|---|---|---|---|
| charset | Any valid charset name | The character set. | UTF-8 |
| row-tag | String | The XML tag to treat as a row. For example, in `<books><book>...</book><book>...</book></books>` the appropriate value would be book. | - |
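The effect of `row-tag` can be sketched with Python's `xml.etree.ElementTree`: every element whose tag matches becomes one row, and its child elements become columns. The helper `rows_for_tag` is hypothetical and ignores attributes and nested structures, which a real XML reader would also need to handle.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_for_tag(xml_text: str, row_tag: str) -> List[Dict[str, str]]:
    """Hypothetical helper: one row per element whose tag equals row_tag."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter(row_tag):
        # Each child element becomes a column: tag -> text.
        rows.append({child.tag: (child.text or "") for child in element})
    return rows


doc = """<books>
  <book><title>Dune</title><year>1965</year></book>
  <book><title>Emma</title><year>1815</year></book>
</books>"""
print(rows_for_tag(doc, "book"))
# [{'title': 'Dune', 'year': '1965'}, {'title': 'Emma', 'year': '1815'}]
```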