File Formats

Spice currently supports the CSV, JSON, and Parquet file formats for data connectors that read files from a file system or cloud object storage (e.g. s3://, abfs://, file://). Support for Iceberg and other file formats is on the roadmap.

This page details the parameters supported for each file format.

Parquet

Spice automatically supports reading any Parquet file, regardless of the compression codec or data encoding used.

Compression codecs:

UNCOMPRESSED
SNAPPY
GZIP
LZO
BROTLI
LZ4 (deprecated in favor of LZ4_RAW)
LZ4_RAW
ZSTD

Data encodings:

PLAIN
PLAIN_DICTIONARY / RLE_DICTIONARY
RLE
BIT_PACKED (deprecated in favor of RLE)
DELTA_BINARY_PACKED
DELTA_LENGTH_BYTE_ARRAY
DELTA_BYTE_ARRAY
BYTE_STREAM_SPLIT

CSV

Parameters

csv_has_header: Optional. Indicates whether the CSV file has a header row. Defaults to true.
csv_quote: Optional. A one-character string used to quote fields that contain special characters. Defaults to " (double quote).
csv_escape: Optional. A one-character string used to escape special characters, so that characters normally interpreted as delimiters or newlines can appear within a field value. Defaults to null.
csv_schema_infer_max_records: Optional. The maximum number of records to scan when inferring the schema. Defaults to 1000.
csv_delimiter: Optional. A one-character string used to separate individual fields. Defaults to , (comma).
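
To illustrate what these parameters control, here is a minimal sketch that maps them onto Python's standard csv module. It is not Spice's parser: the functions read_csv_text, infer_column_type, and infer_schema are hypothetical helpers, the column_N naming for headerless files is an assumption, and the int/float/string inference is a deliberate simplification of real schema inference.

```python
import csv
import io


def read_csv_text(text, csv_has_header=True, csv_quote='"',
                  csv_escape=None, csv_delimiter=","):
    """Parse CSV text and return (column_names, rows).

    Hypothetical helper: argument names mirror the parameters above,
    but the parsing is done by Python's csv module, not by Spice.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=csv_delimiter,
        quotechar=csv_quote,
        escapechar=csv_escape,
    )
    rows = list(reader)
    if not rows:
        return [], []
    if csv_has_header:
        return rows[0], rows[1:]
    # Assumption: synthesize names when the file has no header row.
    names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return names, rows


def infer_column_type(values):
    """Return 'int', 'float', or 'string' for a list of string values."""
    if not values:
        return "string"
    for type_name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(names, rows, csv_schema_infer_max_records=1000):
    """Infer a type per column from at most the first N records."""
    sample = rows[:csv_schema_infer_max_records]
    return {
        name: infer_column_type([row[i] for row in sample])
        for i, name in enumerate(names)
    }


# A semicolon-delimited file with a quoted field containing the delimiter.
sample_text = 'id;name;score\n1;"Doe; Jane";7\n2;Smith;9.5\n'
names, rows = read_csv_text(sample_text, csv_delimiter=";")
full_schema = infer_schema(names, rows)
limited_schema = infer_schema(names, rows, csv_schema_infer_max_records=1)
```

In this sketch, limiting inference to one record types the score column as int because the float value on the second record is never scanned. This is the general trade-off of a record limit: a smaller value makes inference cheaper but can miss values that appear later in the file.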

JSON

Parameters

json_format: Optional. Specifies the JSON format to parse. Valid values are array, ndjson, and jsonl. Defaults to jsonl.
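
The difference between the formats can be sketched with Python's standard json module. This is an illustration under stated assumptions, not Spice's implementation: parse_json_records is a hypothetical helper, and it treats ndjson and jsonl as the same newline-delimited layout, which is an assumption about how the two values behave.

```python
import json


def parse_json_records(text, json_format="jsonl"):
    """Return a list of records parsed from JSON text.

    Hypothetical helper: json_format mirrors the parameter above.
    """
    if json_format == "array":
        # The whole document is a single top-level JSON array of records.
        records = json.loads(text)
        if not isinstance(records, list):
            raise ValueError("expected a top-level JSON array")
        return records
    if json_format in ("ndjson", "jsonl"):
        # One JSON value per line; blank lines are skipped.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported json_format: {json_format!r}")


array_text = '[{"id": 1}, {"id": 2}]'
lines_text = '{"id": 1}\n{"id": 2}\n'

from_array = parse_json_records(array_text, json_format="array")
from_lines = parse_json_records(lines_text, json_format="jsonl")
```

Both calls yield the same two records. The practical difference is that newline-delimited input can be processed one line at a time, while an array must be parsed as a single document.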
