Type | Read/write | Author | Availability |
Read | Finbourne | Provided with LUSID |
The Drive.Excel
provider enables you to write a Luminesce query that extracts data from one or more Excel (.xlsx
or .xlsm
) files stored in Drive.
Note: The LUSID user running the query must have sufficient access control permissions to both use the provider and enumerate target files and folders in Drive. This should automatically be the case if you are the domain owner.
The query returns a table of data assembled from the contents of the file or files in the order they are read.
See also: Drive.Csv, Drive.Sqlite, Drive.Xml, Drive.RawText, Drive.File
Basic usage
@x = use Drive.Excel
<options>
enduse;
select * from @x
Options
Drive.Excel
has options that enable you to filter or refine a query.
Note: The
--file
option is mandatory.
An option takes the form --<option>=<value>
, for example --file=pnl.xlsx
. Note no spaces are allowed either side of the =
operator.
If an option:
Takes a boolean value, then specifying that option (for example
--noHeader
) sets it to True; omitting the option specifies False.Takes multiple string values, then specify a comma-separated list, for example
--names=My,Column,Names
.
To see a help screen of available options, their data types, default values, and an explanation for each, run the following query using a suitable tool:
@x = use Drive.Excel
enduse;
select * from @x
Examples
In the following examples, the select * from @x
syntax at the end prints the table of data assembled by the query.
Note: For more examples, try the Luminesce Github repo.
Example 1: Extract data from particular cells in a particular Excel worksheet
The --range
option enables you to select particular cells, for example:
A1:B3
selects every cell between A1 and B3.5,3,20,9
selects every cell from row 5 to row 20 and in columns 3 to 9.,3,,9
selects every populated cell in columns 3 to 9 from the first row to the last populated row.,3,,
selects every populated cell from column 3 to the last populated column.
Note if you omit --range
all the populated cells in a worksheet are selected.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:B3
--worksheet=Portfolio1
enduse;
select * from @x
Example 2: Extract data from a matching Excel file
In this example, @@today
is a scalar variable that must resolve to exactly one column and one row of data.
@@today = select strftime('%Y%m%d', 'now');
@x = use Drive.Excel with @@today
--file=/calculation/pnl-{@@today}.xlsx
--range=A1:C12
enduse;
select * from @x
Example 3: Extract data from multiple matching Excel files in a particular folder
If --file
is a folder in Drive, you must specify the --folderFilter
option with a regular expression to nominate one or more files to read.
In this example, data is extracted from the same cells in three files pnl-1.xlsx
, pnl-2.xlsx
and pnl-3.xlsx
, but not from pnl-4.xlsx
. The --addFileName
option adds an extra column to the table of results showing the source of each record.
@x = use Drive.Excel
--file=/calculation
--folderFilter=pnl-[1-3].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @x
Example 4: Extract data from Excel files stored in a ZIP archive
In this example, daily.zip
is stored in the root Drive folder; data is extracted from any Excel file within it that just has alphabetic characters in its file name.
@x = use Drive.Excel
--file=daily.zip
--zipFilter=[a-zA-Z].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @x
Example 5: Extract data from an Excel file with no header row, instead supplying column names
Note the number of column names specified must match the inbound number of columns.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:C12
--noHeader
--names=Equity,Units,Cost
enduse;
select * from @x
Example 6: Infer data types of columns from the first 10 rows
In this example, the --ignoreInvalid
option is specified to ignore cells that cannot be successfully converted.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:Z12
--inferTypeRowCount=10
--ignoreInvalid
enduse;
select * from @x