Type | Read/write | Author | Availability |
|---|---|---|---|
Read | Finbourne | Provided with LUSID |
The Drive.Csv provider enables you to write a Luminesce query that extracts data from one or more CSV or similar pipe-delimited or row-based text files stored in Drive.
The query returns a table of data assembled from the contents of the file or files in the order they are read.
See also: Drive.Excel, Drive.Sqlite, Drive.Xml, Drive.RawText, Drive.File
Basic usage
@x = use Drive.Csv
<options>
enduse;
select * from @xOptions
Drive.Csv has options that enable you to filter or refine a query.
Note: The
--fileoption is mandatory.
An option takes the form --<option>=<value>, for example --file=trade-file.csv. Note no spaces are allowed either side of the = operator. If an option:
takes a boolean value, then specifying that option (for example
--noHeader) sets it to True; omitting the option specifies Falsetakes multiple string values, then specify a comma-separated list, for example
--names=My,Column,Names
Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:
@x = use Drive.Csv
enduse;
select * from @xCurrent options | Explanation |
|---|---|
| Mandatory. The file to read. Should be text (any extension) or compressed (.zip, .gz or .tar.gz). It may also be a folder, in which case --folderFilter is also required to specify which files in the folder to process. [String] |
| Denotes this is searching an entire folder structure and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String] |
| Denotes this is a Zip file and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String] |
| Should a file/folder simply not exist, don't throw an error but return an empty table with column names and types created as best possible given other options. [Boolean] |
| Adds a column (the first column) to the result set which contains the file the row came from. [Boolean] |
| Column Names either overrides the header row or steps in when there is no header row (comma-delimited list) [String] |
| Column (by Name) that should be returned (comma delimited list) [String] |
| Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified) [String] |
| If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified [Int32] |
| Trim both leading and trailing whitespace when reading values [Boolean] |
| Set this if there is no header row [Boolean] |
| The delimiter between values (\t for tab) [String, Default: ,] |
| Character used to escape the 'Quote' character when within a value [String, Default: "] |
| Specific name of a file encoding. Supported are: utf-16, utf-16BE, utf-32, utf-32BE, us-ascii, iso-8859-1, utf-8 [String] |
| Character used around any field containing the 'delimiter' or a line break. [String, Default: "] |
| Regex of values to map to 'null' in the returned data. [String, Default: (^[nN][uU][lL]{1,2}$)|((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|(^\s*$)] |
| Number of rows to ignore before the header row [Int32] |
| Number of rows to ignore after the header row [Int32] |
| Skip invalid data rows (totally invalid ones). This also allows for potentially wrong data if it can be handled somewhat e.g. embedded quotes misused (and still returns such rows). In either case a warning will show in the progress feedback. [Boolean] |
Examples
In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.
Note: For more examples, try the Luminesce Github repo.
Example 1: Extract data from a particular CSV file
@x = use Drive.Csv
--file=/trade-files/eod.csv
enduse;
select * from @xExample 2: Extract data from a matching CSV file
In this example, @@today is a scalar variable that must resolve to exactly one column and one row of data.
@@today = select strftime('%Y%m%d', 'now');
@x = use Drive.Csv with @@today
--file=/trade-files/eod-{@@today}.csv
enduse;
select * from @xExample 3: Extract data from multiple CSV files using a regular expression
If --file is a folder in Drive, you must specify the --folderFilter option with a regular expression to nominate one or more files to read.
In this example, data is extracted from three files eod-1.csv, eod-2.csv and eod-3.csv, but not from eod-4.csv. The --addFileName option adds an extra column to the table of results showing the source of each record.
@x = use Drive.Csv
--file=/trade-files
--folderFilter=eod-[1-3].csv
--addFileName
enduse;
select * from @xIn this example, data is extracted from any file with a File_ prefix, 8 digits, an underscore, and then 6 digits, for example File_01072021_123456.csv:
@datatoload = use Drive.Csv
--file=/TestLocation/
--folderFilter=File_\d{8}_\d{6}.csv
enduse;
select * from @datatoloadExample 4: Extract data from multiple files stored in a ZIP archive
In this example, daily.zip is stored in the root Drive folder; data is extracted from any file within it that just has alphabetic characters in its file name.
@x = use Drive.Csv
--file=daily.zip
--zipFilter=[a-zA-Z].csv
--addFileName
enduse;
select * from @xExample 5: Extract data from a CSV file with no header row, instead supplying column names
Note the number of column names specified must match the inbound number of columns.
@x = use Drive.Csv
--file=/trade-files/daily.csv
--noHeader
--names=Equity,Units,Cost
enduse;
select * from @xExample 6: Extract encoded data from a CSV file
In this example, the --encoding option is specified to extract UTF-8 encoded data. See the table above for all supported encoding systems.
@x = use Drive.Csv
--file=/trade-files/daily.csv
--encoding=utf-8
enduse;
select * from @xExample 7: Infer data types of columns from the first 10 rows
In this example, the --skipInvalidRows option is specified to ignore cells that cannot be successfully converted.
@x = use Drive.Csv
--file=/trade-files/daily.csv
--inferTypeRowCount=10
--skipInvalidRows
enduse;
select * from @x