| Type | Read/write | Author | Availability |
|---|---|---|---|
| | Read | Finbourne | Provided with LUSID |
The Drive.Parquet provider enables you to write a Luminesce query that extracts data from one or more Apache Parquet files stored in Drive.
The query returns a table of data assembled from the contents of the file or files in the order they are read.
See also: Drive.Excel, Drive.Csv, Drive.Xml, Drive.RawText
Basic usage
@x = use Drive.Parquet
<options>
enduse;
select * from @x
Options
Drive.Parquet has options that enable you to filter or refine a query.
Note: The --file option is mandatory.
An option takes the form --<option>=<value>, for example --file=trade-file.parquet. Note no spaces are allowed either side of the = operator. If an option:
- Takes a boolean value, then specifying that option (for example --addFileName) sets it to True; omitting the option specifies False.
- Takes multiple string values, then specify a comma-separated list, for example --select=My,Column,Names.
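To illustrate the rules above, the following sketch combines a boolean option with a multi-value option. The file path and column names are borrowed from the examples later in this article and are assumptions about what exists in your Drive, so adapt them to your own data.

```sql
-- Sketch only: the path and column names are hypothetical.
-- Inside the use block, each option sits on its own line with no spaces around '='.
@x = use Drive.Parquet
--file=/trade-files/eod.parquet
--addFileName
--select=column3,column7
enduse;
select * from @x
```

Here --addFileName is present with no value, which sets it to True, and --select receives its column names as a single comma-separated value.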
Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:
@x = use Drive.Parquet
enduse;
select * from @x
| Current options | Explanation |
|---|---|
| --file | Mandatory. The file to read. It may also be a folder, in which case --folderFilter is also required to specify which files in the folder to process. [String] |
| --folderFilter | Denotes that an entire folder structure is being searched, and provides a regular expression matching the path/file names within it that should be processed. All matches should be of the same format. [String] |
| --zipFilter | Denotes that the file is a Zip file, and provides a regular expression matching the path/file names within it that should be processed. All matches should be of the same format. [String] |
| | If a file/folder does not exist, don't throw an error but return an empty table, with column names and types created as well as possible given the other options. [Boolean] |
| --addFileName | Adds a column (the first column) to the result set containing the file the row came from. [Boolean] |
| --select | Columns (by name) that should be returned, as a comma-delimited list. [String] |
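As a sketch of how the folder-related options might fit together, the query below points --file at a folder and uses --folderFilter to choose which files to read. The folder name /trade-files and the regular expression are assumptions for illustration; whether the expression is matched against the full path or only the file name should be checked in the online help.

```sql
-- Sketch only: '/trade-files' is a hypothetical Drive folder.
-- '.*[.]parquet' is an illustrative regular expression for names ending in '.parquet'.
@x = use Drive.Parquet
--file=/trade-files
--folderFilter=.*[.]parquet
--addFileName
enduse;
select * from @x
```

Specifying --addFileName here is a design choice rather than a requirement: when several files feed one table, the extra column makes it possible to tell which file each row came from.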
Examples
In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.
Note: For more examples, try the Luminesce GitHub repo.
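The final select does not have to be select *. As a minimal sketch, assuming the file contains columns named column3 and column7 (hypothetical names) and that the standard limit clause is available in your environment, the table variable can be queried like any other table:

```sql
@x = use Drive.Parquet
--file=/trade-files/eod.parquet
enduse;
-- Return two named columns and at most ten rows (column names are hypothetical).
select column3, column7 from @x limit 10
```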
Example 1: Extract data from a particular Parquet file
@x = use Drive.Parquet
--file=/trade-files/eod.parquet
enduse;
select * from @x
Example 2: Extract specific columns from a Parquet file
In this example, just column3 and column7 are extracted.
@x = use Drive.Parquet
--file=/trade-files/eod.parquet
--select=column3,column7
enduse;
select * from @x
Example 3: Extract data from a particular Parquet file stored in a ZIP archive
In this example, daily.zip is stored in the root Drive folder and contains one or more Parquet files. Data is extracted from the archived Parquet file specified by the --zipFilter option.
@x = use Drive.Parquet
--file=daily.zip
--zipFilter=eod.parquet
enduse;
select * from @x
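Because --zipFilter takes a regular expression, a variation on this example could read every Parquet file in the archive rather than a single one. The following is a sketch under that assumption; the pattern is illustrative, and all matched files should share the same format.

```sql
-- Sketch only: reads every entry in daily.zip whose name ends in '.parquet'.
@x = use Drive.Parquet
--file=daily.zip
--zipFilter=.*[.]parquet
--addFileName
enduse;
select * from @x
```

With --addFileName specified, the first column of the result identifies the file each row came from.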