Drive.Excel

Prev Next

Type

Read/write

Author

Availability

Direct provider

Read

Finbourne

Provided with LUSID

The Drive.Excel provider enables you to write a Luminesce query that extracts data from one or more Excel (.xlsx or .xlsm) files stored in Drive.

The query returns a table of data assembled from the contents of the file or files in the order they are read.

See also: Drive.Csv, Drive.Sqlite, Drive.Xml, Drive.RawText, Drive.File

Basic usage

@x = use Drive.Excel
<options>
enduse;
select * from @x

Options

Drive.Excel has options that enable you to filter or refine a query.

Note: The --file option is mandatory.

An option takes the form --<option>=<value>, for example --file=pnl.xlsx. Note no spaces are allowed either side of the = operator. If an option:

  • Takes a boolean value, then specifying that option (for example --noHeader) sets it to True; omitting the option specifies False.

  • Takes multiple string values, then specify a comma-separated list, for example --names=My,Column,Names.

Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:

@x = use Drive.Excel
enduse;
select * from @x

Current options

Explanation

file (-f)

Mandatory. The file to read. Should be an Excel file (.xlsx or .xlsm). It may also be a folder, in which case --folderFilter is also required to specify which files in the folder to process. [String]

folderFilter

Denotes this is searching an entire folder structure and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String]

zipFilter (-z)

Denotes this is a Zip file and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String]

allowMissing

Should a file/folder simply not exist, don't throw an error but return an empty table with column names and types created as best possible given other options. [Boolean]

addFileName

Adds a column (the first column) to the result set which contains the file the row came from. [Boolean]

names

Column Names either overrides the header row or steps in when there is no header row (comma-delimited list). [String]

types

Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified) [String]

inferTypeRowCount

If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified. [Int32]

trim

Trim both leading and trailing whitespace when reading values. [Boolean]

noHeader

Set this if there is no header row. [Boolean]

calculate

Whether to attempt a calculation of the imported cell range prior to import. [Boolean]

password

If specified will be used as the password used for password protected workbooks. [String]

worksheet (-w)

The worksheet containing the cell range to import (name or index, will default to first). [String]

range (-r)

The cell range to import as either a specified range or a table name. Range formats can be any of:

  • 'A1:B3' (standard Excel range format)

  • '1,3,7,9' (rows 1-7, columns 3-9)

  • ',3,,9' (all populated rows, but only columns 3-9)

  • ',3,,' (all populated rows, but only columns 3 to the last populated one)

Defaults to the used dimensions of the worksheet (equivalent to ',,,') [String]

ignoreInvalid

If specified cells which can not be successfully converted to the target type will be ignored. [Boolean]

ignoreBlankRows

If the entire rows has only blank cells it will be ignored will be ignored. [Boolean]

valuesToMakeNull

Regex of values to map to 'null' in the returned data; You must Specify --convertValuesToNulls for this to be used. [String, Default: (^[nN][uU][lL]{1,2}$)|((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|(^\s*$)]

convertValuesToNulls

Uses the Regex in 'valuesToMakeNull' to decide if some values should be converted to nulls. [Boolean]

addWorksheetName

Adds a column (the first or second column) to the result set which contains the worksheet name. [Boolean]

Examples

In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.

Note: For more examples, try the Luminesce Github repo.

Example 1: Extract data from particular cells in a particular Excel worksheet

The --range option enables you to select particular cells, for example:

  • A1:B3 selects every cell between A1 and B3.

  • 5,3,20,9 selects every cell from row 5 to row 20 and in columns 3 to 9.

  • ,3,,9 selects every populated cell in columns 3 to 9 from the first row to the last populated row.

  • ,3,, selects every populated cell from column 3 to the last populated column.

Note if you omit --range all the populated cells in a worksheet are selected.

@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:B3
--worksheet=Portfolio1
enduse;
select * from @x

Example 2: Extract data from a matching Excel file

In this example, @@today is a scalar variable that must resolve to exactly one column and one row of data.

@@today = select strftime('%Y%m%d', 'now');
@x = use Drive.Excel with @@today
--file=/calculation/pnl-{@@today}.xlsx
--range=A1:C12
enduse;
select * from @x

Example 3: Extract data from multiple matching Excel files in a particular folder

If --file is a folder in Drive, you must specify the --folderFilter option with a regular expression to nominate one or more files to read.

In this example, data is extracted from the same cells in three files pnl-1.xlsx, pnl-2.xlsx and pnl-3.xlsx, but not from pnl-4.xlsx. The --addFileName option adds an extra column to the table of results showing the source of each record.

@x = use Drive.Excel
--file=/calculation
--folderFilter=pnl-[1-3].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @x

Example 4: Extract data from Excel files stored in a ZIP archive

In this example, daily.zip is stored in the root Drive folder; data is extracted from any Excel file within it that just has alphabetic characters in its file name.

@x = use Drive.Excel
--file=daily.zip
--zipFilter=[a-zA-Z].xlsx
--range=A1:C12
--addFileName
enduse;
select * from @x

Example 5: Extract data from an Excel file with no header row, instead supplying column names

Note the number of column names specified must match the inbound number of columns.

@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:C12
--noHeader
--names=Equity,Units,Cost
enduse;
select * from @x

Example 6: Infer data types of columns from the first 10 rows

In this example, the --ignoreInvalid option is specified to ignore cells that cannot be successfully converted.

@x = use Drive.Excel
--file=/calculation/pnl.xlsx
--range=A1:Z12
--inferTypeRowCount=10
--ignoreInvalid
enduse;
select * from @x