Drive.RawText

Prev Next

Type

Read/write

Author

Availability

Direct provider

Read

Finbourne

Provided with LUSID

The Drive.RawText provider enables you to write a Luminesce query that extracts data from one or more plain text files stored in Drive.

The query returns a table of data consisting of one row and one column per file, in the order multiple files are read (if the case). Note you can use this provider in conjunction with Tools.JsonExpand to read data in from JSON files stored in Drive and parse the JSON into a table of constituent data objects.

See also: Drive.Excel, Drive.Csv, Drive.Sqlite, Drive.Xml, Drive.File

Basic usage

@x = use Drive.RawText
<options>
enduse;
select * from @x

Options

Drive.RawText has options that enable you to filter or refine a query.

Note: The --file option is mandatory.

An option takes the form --<option>=<value>, for example --file=meeting.txt. Note no spaces are allowed either side of the = operator. If an option:

  • Takes a boolean value, then specifying that option (for example --addFileName) sets it to True; omitting the option specifies False.

  • Takes multiple string values, then specify a comma-separated list.

Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:

@x = use Drive.RawText
enduse;
select * from @x

Current options

Explanation

file (-f)

Mandatory. The file to read. It may also be a folder, in which case --folderFilter is also required to specify which files in the folder to process. [String]

folderFilter

Denotes this is searching an entire folder structure and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String]

zipFilter (-z)

Denotes this is a Zip file and provides a Regular Expression of path/file names within it that should be processed. All matches should be of the same format. [String]

allowMissing

Should a file/folder simply not exist, don't throw an error but return an empty table with column names and types created as best possible given other options. [Boolean]

addFileName

Adds a column (the first column) to the result set which contains the file the row came from. [Boolean]

encoding

Specific name of a file encoding. Supported are: utf-16, utf-16BE, utf-32, utf-32BE, us-ascii, iso-8859-1, utf-8 [String]

Examples

In the following examples, the select * from @x syntax at the end prints the table of data assembled by the query.

Note: For more examples, try the Luminesce Github repo.

Example 1: Extract data from a particular text file

@x = use Drive.RawText
--file=/notes/meeting.txt
enduse;
select * from @x

Example 2: Extract data from a matching text file

In this example, @@today is a scalar variable that must resolve to exactly one column and one row of data.

@@today = select strftime('%Y%m%d', 'now');
@x = use Drive.RawText with @@today
--file=/notes/meeting-{@@today}.txt
enduse;
select * from @x

Example 3: Extract data from multiple text files matched using a regular expression

If --file is a folder in Drive, you must specify the --folderFilter option with a regular expression to nominate one or more files to read. 

In this example, data is extracted from three files meeting-1.txt, meeting-2.txt and meeting-3.txt, but not from meeting-4.txt. The --addFileName option adds an extra column to the table of results showing the source of each record.

@x = use Drive.RawText
--file=/notes
--folderFilter=meeting-[1-3].txt
--addFileName
enduse;
select * from @x

In this example, data is extracted from any file with a Meeting_ prefix, 8 digits, an underscore, and then 6 digits, for example Meeting_01072021_123456.txt.

@datatoload = use Drive.RawText
--file=/notes/
--folderFilter=Meeting_\d{8}_\d{6}.txt
enduse;
select * from @datatoload

Example 4: Extract data from multiple files stored in a ZIP archive

In this example, meetings.zip is stored in the root Drive folder; data is extracted from any .txt file within it that just has alphabetic characters in its file name.

@x = use Drive.RawText
--file=meetings.zip
--zipFilter=[a-zA-Z].txt
--addFileName
enduse;
select * from @x