Parquet
Apache Parquet is a free and open-source column-oriented data storage format.
Structure
The expected schema for Parquet files is as follows:
schema.parquet
assetId: { type: 'UTF8' },
measures: {
  repeated: true,
  fields: {
    name: { type: 'UTF8' }
  }
},
data: {
  repeated: true,
  fields: {
    time: { type: 'UTF8' },
    measureName: { type: 'UTF8' },
    value: { type: 'DOUBLE', optional: true },
    stringValue: { type: 'UTF8', optional: true }
  }
}
The root of the schema is an object containing the following fields:
assetId
: a UTF8 string containing a unique mapping ID for a sensor. If no sensor with that mapping ID exists, a new sensor is provisioned automatically.
measures
: an array of the measure names included within the data array.
data
: an array of data points. There should be one entry per timestamp.
data.time
: the timestamp of the entry. Should be a UTF8 string conforming to RFC 3339 dates and times, including the timezone. More details on the required format are available here.
data.measureName
: a string name given to the measurement. This value will be the name associated with the time series within Senseye.
data.value or data.stringValue
: the value of the measure. This can be either a double-precision floating-point number or a string.
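As an illustration of the fields described above, the following minimal Python sketch builds one record as a plain dictionary. The helper names `rfc3339` and `build_record` are hypothetical and not part of any Senseye library, and the sample measures are invented. The sketch only shapes the data; writing it to a Parquet file is left to whichever Parquet library you use.

```python
from datetime import datetime, timezone


def rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 string."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must include a timezone")
    return moment.isoformat()


def build_record(asset_id, points):
    """Build one record; points is a list of (datetime, measure name, value)."""
    data = []
    for moment, measure_name, value in points:
        entry = {"time": rfc3339(moment), "measureName": measure_name}
        # Text goes in `stringValue`; anything else is stored as a double in `value`.
        if isinstance(value, str):
            entry["stringValue"] = value
        else:
            entry["value"] = float(value)
        data.append(entry)
    # `measures` lists each distinct measure name once, in first-seen order.
    names = list(dict.fromkeys(entry["measureName"] for entry in data))
    return {
        "assetId": asset_id,
        "measures": [{"name": name} for name in names],
        "data": data,
    }


record = build_record(
    "motorM23",
    [
        (datetime(2022, 4, 28, 12, 37, 38, tzinfo=timezone.utc), "temperature", 71.5),
        (datetime(2022, 4, 28, 12, 38, 38, tzinfo=timezone.utc), "state", "running"),
    ],
)
```

Each entry sets exactly one of `value` or `stringValue`, which matches both fields being optional in the schema.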
File
Name
To avoid file collisions, the name of the file should be unique. To aid debugging, we recommend composing the file name from two components separated by @: an ID for the asset being monitored and a Unix timestamp. For example: motorM23@1651149458.parquet.
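The recommended naming pattern can be sketched in Python as follows. The function name `parquet_file_name` is hypothetical, and the timestamp is assumed to be in whole seconds, as in the example file name.

```python
import time


def parquet_file_name(asset_id, unix_seconds=None):
    """Compose '<asset ID>@<unix timestamp>.parquet'."""
    if unix_seconds is None:
        # Default to the current time in whole seconds.
        unix_seconds = int(time.time())
    return f"{asset_id}@{int(unix_seconds)}.parquet"


print(parquet_file_name("motorM23", 1651149458))
```

If several files for the same asset can be produced within one second, the name would need a further distinguishing component to stay unique.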
Size
Please limit each Parquet file to no more than 1 MB.
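A size check before upload might look like the sketch below. The names are hypothetical, and 1 MB is assumed to mean 1,000,000 bytes, the stricter of the two common readings, so a file that passes also fits under a 1,048,576-byte limit.

```python
import os

# Assumption: 1 MB is read as 10**6 bytes.
MAX_FILE_BYTES = 1_000_000


def fits_limit(size_bytes, limit=MAX_FILE_BYTES):
    """Return True when a file of size_bytes is within the limit."""
    return size_bytes <= limit


def file_fits_limit(path, limit=MAX_FILE_BYTES):
    """Check the on-disk size of the file at path against the limit."""
    return fits_limit(os.path.getsize(path), limit)
```

A file that exceeds the limit can be split by time range and written as several smaller files.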
Version
The file must use Parquet format version 1.0 exactly.
Encoding
The data file must use the Parquet binary format and be encoded with one of the following supported encodings:
- PLAIN
- RLE
Compression
The supported compression methods are:
- UNCOMPRESSED
- GZIP
- SNAPPY
- LZO
- BROTLI
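The version, encoding, and compression requirements can be gathered into one set of writer options. The sketch below assumes the file is written with pyarrow's `pyarrow.parquet.write_table`, which this document does not prescribe; the helper `writer_options` is hypothetical. It returns keyword arguments only and does not import pyarrow itself. Disabling dictionary encoding is assumed to be what keeps column data in PLAIN encoding. LZO is permitted by the list above but left out of this sketch, on the assumption that pyarrow cannot write it.

```python
# Supported compression names from the list above, mapped to the values
# that pyarrow.parquet.write_table is assumed to accept for `compression`.
PYARROW_COMPRESSION = {
    "UNCOMPRESSED": "NONE",
    "GZIP": "GZIP",
    "SNAPPY": "SNAPPY",
    "BROTLI": "BROTLI",
}


def writer_options(compression="SNAPPY"):
    """Return keyword arguments intended for pyarrow.parquet.write_table."""
    key = compression.upper()
    if key not in PYARROW_COMPRESSION:
        raise ValueError(f"compression not covered by this sketch: {compression}")
    return {
        "version": "1.0",          # Parquet format version 1.0
        "compression": PYARROW_COMPRESSION[key],
        "use_dictionary": False,   # avoid dictionary-encoded pages
    }
```

With pyarrow installed, the options would be passed as `pq.write_table(table, path, **writer_options("GZIP"))`, where `table` and `path` are your own table and destination.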