Introduction


Omniscope supports reading and writing Apache Parquet files. This functionality was introduced in a daily build in December 2021 and will be available in a Rock build in 2022.


Parquet is a popular column-oriented file format widely used in Hadoop-based systems. It is designed to store large data sets efficiently and uses the file extension .parquet.
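To make "column-oriented" concrete, here is a toy Python sketch that pivots a few row records into one list per column and then run-length encodes two of those columns. It is an illustration only, not Parquet's actual on-disk encoding: the table, its column names and both helper functions are hypothetical, and Parquet's real encodings (such as dictionary and run-length encoding) are considerably more involved.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical table stored row by row, as a CSV file would hold it.
rows: List[Dict[str, Any]] = [
    {"region": "North", "year": 2021, "sales": 120},
    {"region": "North", "year": 2021, "sales": 95},
    {"region": "South", "year": 2021, "sales": 143},
    {"region": "South", "year": 2022, "sales": 88},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
# Reading one column touches only that column's values.
print(columns["sales"])                      # [120, 95, 143, 88]
# Values from one column sit together and often repeat, so they compress well.
print(run_length_encode(columns["region"]))  # [('North', 2), ('South', 2)]
print(run_length_encode(columns["year"]))    # [(2021, 3), (2022, 1)]
```

Keeping each column's values together lets a reader skip columns it does not need and places similar values next to each other, which is one reason a column-oriented format can store large data sets compactly.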


Reading a Parquet file


Inside your Omniscope workflow, add a new File input block. Double-click the block to open its options and select the location of the Parquet file. If the file has the expected .parquet extension, Omniscope will automatically select the Parquet file format. Click the Play button to execute the block and read the data.
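Omniscope picks the Parquet format from the .parquet extension. As an illustration only, the Python sketch below shows that idea next to a second check based on the Parquet file layout, in which a file starts and ends with the 4-byte marker PAR1 and the 4 bytes before the trailing marker hold the footer metadata length as a little-endian integer. The helper names are hypothetical and this is not Omniscope's own code; it may be useful as a quick check that a file really is Parquet before pointing a File input block at it.

```python
import io
import os
import struct
from typing import BinaryIO, Optional

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def format_from_extension(path: str) -> Optional[str]:
    """Hypothetical helper: choose a reader from the file extension alone."""
    ext = os.path.splitext(path)[1].lower()
    return "parquet" if ext == ".parquet" else None


def looks_like_parquet(f: BinaryIO) -> bool:
    """Check the leading/trailing PAR1 markers and a plausible footer length."""
    size = f.seek(0, os.SEEK_END)
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return False
    f.seek(0)
    head = f.read(4)
    f.seek(size - 8)
    tail = f.read(8)
    # The 4 bytes before the trailing magic hold the metadata length (little-endian).
    (footer_len,) = struct.unpack("<I", tail[:4])
    return head == PARQUET_MAGIC and tail[4:] == PARQUET_MAGIC and footer_len <= size - 12


if __name__ == "__main__":
    # Synthetic bytes with the right markers; this is not a readable Parquet file.
    fake = PARQUET_MAGIC + b"\x00" * 10 + struct.pack("<I", 10) + PARQUET_MAGIC
    print(format_from_extension("sales.PARQUET"))         # parquet
    print(looks_like_parquet(io.BytesIO(fake)))           # True
    print(looks_like_parquet(io.BytesIO(b"a,b\n1,2\n")))  # False
    # For a real file: open(path, "rb") and pass the file object instead.
```

A file that passes the marker check can still be corrupt or contain unsupported types, so treat it as a sanity check rather than full validation.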



Writing a Parquet file


Inside your Omniscope workflow, add a new File output block and connect the data you want to write to it.



Double-click the File output block to open its options. Select the location and name of the file you want to create, then change the Format to Apache Parquet (.parquet file). Click the Play button to write the data.



Limitations


When reading a Parquet file, Omniscope supports only the following logical types: STRING, ENUM, INTEGER, DECIMAL, DATE, TIME, TIMESTAMP and JSON. Other types, such as LIST and MAP, are not currently supported. If you need to import data containing one or more unsupported types, please get in touch with us, as it may be possible for us to develop support if required.
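If you want to spot unsupported columns before importing, the Python sketch below compares a column-to-logical-type mapping against the list above. It assumes you have already obtained that mapping with some schema inspection tool (not shown); the helper name and example schema are hypothetical. Columns with no logical type annotation are passed as None and skipped, because the list above covers logical types only.

```python
from typing import Dict, List, Optional

# Logical types listed above as supported when reading a Parquet file.
SUPPORTED_LOGICAL_TYPES = {
    "STRING", "ENUM", "INTEGER", "DECIMAL", "DATE", "TIME", "TIMESTAMP", "JSON",
}


def unsupported_columns(schema: Dict[str, Optional[str]]) -> List[str]:
    """Return columns whose logical type is not in the supported list.

    `schema` maps column name -> logical type name, or None when the column
    has no logical type annotation (such columns are not checked here).
    """
    return [
        name
        for name, logical_type in schema.items()
        if logical_type is not None
        and logical_type.upper() not in SUPPORTED_LOGICAL_TYPES
    ]


# Hypothetical schema for illustration.
example_schema: Dict[str, Optional[str]] = {
    "order_id": "INTEGER",
    "customer": "STRING",
    "ordered_at": "TIMESTAMP",
    "amount": None,
    "tags": "LIST",
    "attributes": "MAP",
}
print(unsupported_columns(example_schema))  # ['tags', 'attributes']
```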