Drive.Parquet

Type: Direct provider

Read/write: Read

Author: Finbourne

Availability: Provided with LUSID

The Drive.Parquet provider enables you to write a Luminesce query that extracts data from one or more Apache Parquet files stored in Drive.

Note: The LUSID user running the query must have sufficient access control permissions to both use the provider and enumerate target files and folders in Drive. This should automatically be the case if you are the domain owner.

The query returns a table of data assembled from the contents of the file or files in the order they are read.

See also: Drive.Excel, Drive.Csv, Drive.Xml, Drive.RawText

Basic usage

@x = use Drive.Parquet
<options>
enduse;
select * from @x

Options

Drive.Parquet has options that enable you to filter or refine a query.

Note: The --file option is mandatory.

An option takes the form --<option>=<value>, for example --file=trade-file.parquet. Note that no spaces are allowed on either side of the = operator. If an option:

  • Takes a boolean value, then specifying that option (for example --addFileName) sets it to True; omitting the option specifies False.

  • Takes multiple string values, then specify a comma-separated list, for example --select=My,Column,Names.
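If you generate queries programmatically, the formatting rules above can be sketched in Python. This is a minimal illustration only and is not part of Luminesce: the helper names render_option and build_query are hypothetical, and the sketch uses only option names already mentioned on this page.

```python
def render_option(name, value):
    """Render one option as a line of a `use` block, or None if it is omitted."""
    if isinstance(value, bool):
        # Boolean options are bare flags: present means True, absent means False.
        return f"--{name}" if value else None
    if isinstance(value, (list, tuple)):
        # Multiple string values become a comma-separated list.
        value = ",".join(str(v) for v in value)
    # No spaces on either side of the = operator.
    return f"--{name}={value}"


def build_query(provider, options):
    """Assemble the query text: a `use` block followed by a select."""
    lines = [f"@x = use {provider}"]
    for name, value in options.items():
        rendered = render_option(name, value)
        if rendered is not None:
            lines.append(rendered)
    lines += ["enduse;", "select * from @x"]
    return "\n".join(lines)


query = build_query(
    "Drive.Parquet",
    {
        "file": "/trade-files/eod.parquet",
        "select": ["column3", "column7"],
        "addFileName": False,  # left out of the query text, which means False
    },
)
print(query)
```

The printed text has the same shape as the queries in the Examples section: one option per line between the use and enduse; lines.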

To see a help screen of available options, their data types, default values, and an explanation for each, run the following query using a suitable tool:

@x = use Drive.Parquet
enduse;
select * from @x

Examples

In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.

Note: For more examples, try the Luminesce GitHub repo.

Example 1: Extract data from a particular Parquet file

@x = use Drive.Parquet
--file=/trade-files/eod.parquet
enduse;
select * from @x

Example 2: Extract specific columns from a Parquet file

In this example, just column3 and column7 are extracted.

@x = use Drive.Parquet
--file=/trade-files/eod.parquet
--select=column3,column7
enduse;
select * from @x

Example 3: Extract data from a particular Parquet file stored in a ZIP archive

In this example, daily.zip is stored in the root Drive folder and contains one or more Parquet files. Data is extracted from the archived Parquet file specified by the --zipFilter option.

@x = use Drive.Parquet
--file=daily.zip
--zipFilter=eod.parquet
enduse;
select * from @x