Type | Read/write | Author | Availability |
|---|---|---|---|
Read | Finbourne | Provided with LUSID |
The Drive.Excel provider enables you to write a Luminesce query that extracts data from one or more Excel (.xlsx or .xlsm) files stored in Drive.
The query returns a table of data assembled from the contents of the file or files in the order they are read.
See also: Drive.Csv, Drive.Sqlite, Drive.Xml, Drive.RawText, Drive.File
Basic usage
@x = use Drive.Excel
<options>
enduse;
select * from @xOptions
Drive.Excel has options that enable you to filter or refine a query.
Note: The
--fileoption is mandatory.
An option takes the form --<option>=<value>, for example --file=pnl.xlsx. Note no spaces are allowed either side of the = operator. If an option:
Takes a boolean value, then specifying that option (for example
--noHeader) sets it to True; omitting the option specifies False.Takes multiple string values, then specify a comma-separated list, for example
--names=My,Column,Names.
Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:
@x = use Drive.Excel
enduse;
select * from @xCurrent options | Explanation |
|---|---|
| Mandatory. The file to read. Should be an Excel file (.xlsx or .xlsm). It may also be a folder, in which case --folderFilter is also required to specify which files in the folder to process. [String] |
| Denotes this is searching an entire folder structure and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String] |
| Denotes this is a Zip file and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String] |
| Should a file/folder simply not exist, don't throw an error but return an empty table with column names and types created as best possible given other options. [Boolean] |
| Adds a column (the first column) to the result set which contains the file the row came from. [Boolean] |
| Column Names either overrides the header row or steps in when there is no header row (comma-delimited list). [String] |
| Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified) [String] |
| If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified. [Int32] |
| Trim both leading and trailing whitespace when reading values. [Boolean] |
| Set this if there is no header row. [Boolean] |
| Whether to attempt a calculation of the imported cell range prior to import. [Boolean] |
| If specified will be used as the password used for password protected workbooks. [String] |
| The worksheet containing the cell range to import (name or index, will default to first). [String] |
| The cell range to import as either a specified range or a table name. Range formats can be any of:
Defaults to the used dimensions of the worksheet (equivalent to ',,,') [String] |
| If specified cells which can not be successfully converted to the target type will be ignored. [Boolean] |
| If the entire rows has only blank cells it will be ignored will be ignored. [Boolean] |
| Regex of values to map to 'null' in the returned data; You must Specify --convertValuesToNulls for this to be used. [String, Default: (^[nN][uU][lL]{1,2}$)|((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|(^\s*$)] |
| Uses the Regex in 'valuesToMakeNull' to decide if some values should be converted to nulls. [Boolean] |
| Adds a column (the first or second column) to the result set which contains the worksheet name. [Boolean] |
Examples
In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.
Note: For more examples, try the Luminesce Github repo.
Example 1: Extract data from particular cells in a particular Excel worksheet
The --range option enables you to select particular cells, for example:
A1:B3selects every cell between A1 and B3.5,3,20,9selects every cell from row 5 to row 20 and in columns 3 to 9.,3,,9selects every populated cell in columns 3 to 9 from the first row to the last populated row.,3,,selects every populated cell from column 3 to the last populated column.
Note if you omit --range all the populated cells in a worksheet are selected.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:B3
--worksheet=Portfolio1
enduse;
select * from @xExample 2: Extract data from a matching Excel file
In this example, @@today is a scalar variable that must resolve to exactly one column and one row of data.
@@today = select strftime('%Y%m%d', 'now');
@x = use Drive.Excel with @@today
--file=/calculation/pnl-{@@today}.xlsx
--range=A1:C12
enduse;
select * from @xExample 3: Extract data from multiple matching Excel files in a particular folder
If --file is a folder in Drive, you must specify the --folderFilter option with a regular expression to nominate one or more files to read.
In this example, data is extracted from the same cells in three files pnl-1.xlsx, pnl-2.xlsx and pnl-3.xlsx, but not from pnl-4.xlsx. The --addFileName option adds an extra column to the table of results showing the source of each record.
@x = use Drive.Excel
--file=/calculation
--folderFilter=pnl-[1-3].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @xExample 4: Extract data from Excel files stored in a ZIP archive
In this example, daily.zip is stored in the root Drive folder; data is extracted from any Excel file within it that just has alphabetic characters in its file name.
@x = use Drive.Excel
--file=daily.zip
--zipFilter=[a-zA-Z].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @xExample 5: Extract data from an Excel file with no header row, instead supplying column names
Note the number of column names specified must match the inbound number of columns.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:C12
--noHeader
--names=Equity,Units,Cost
enduse;
select * from @xExample 6: Infer data types of columns from the first 10 rows
In this example, the --ignoreInvalid option is specified to ignore cells that cannot be successfully converted.
@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:Z12
--inferTypeRowCount=10
--ignoreInvalid
enduse;
select * from @x