Tools.Parse.Csv

Prev Next

Type

Read/write

Author

Availability

Direct provider

Read

Finbourne

Provided with LUSID

The Tools.Parse.Csv provider enables you to write a Luminesce query that reads CSV, text and more from cells within a table.

See also: Tools.Parse.Xml

Basic usage

@data = 
select 
  '<filename>' as Filename,
  '<column-name>[, <column-name>...]
<some-data>[, <some-data>...]' as Content;

@parsed = 
use Tools.Parse.Csv with @input
--<optional-arguments>  
enduse;

select * from @parsed;

Input tables

Tools.Parse.Csv takes in one input table and outputs a table of data, see example 1.

Options

Tools.Parse.Csv has options that enable you to refine a query.

An option takes the form --<option>=<value>, for example --file=trade-file.csv. Note no spaces are allowed either side of the = operator. If an option:

  • takes a boolean value, then specifying that option (for example --noHeader) sets it to True; omitting the option specifies False

  • takes multiple string values, then specify a comma-separated list, for example --names=My,Column,Names

Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:

@x = use Tools.Parse.Csv 
--help
enduse;
select * from @x;

Current options

Explanation

fileFilter (-f)

A regex for the filenames to process [String]

addFileName

Adds a column (the first column) to the result set which contains the file the row came from. [Boolean]

names

Column Names either overrides the header row or steps in when there is no header row (comma-delimited list) [String]

select

Column (by Name) that should be returned (comma delimited list) [String]

types

Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified) [String]

inferTypeRowCount

If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified [Int32]

trim

Trim both leading and trailing whitespace when reading values [Boolean]

noHeader

Set this if there is no header row [Boolean]

delimiter

The delimiter between values (\t for tab) [String, Default: ,]

escape

Character used to escape the 'Quote' character when within a value [String, Default: "]

encoding

Specific name of a file encoding. Supported are: utf-16, utf-16BE, utf-32, utf-32BE, us-ascii, iso-8859-1, utf-8 [String]

quote

Character used around any field containing the 'delimiter' or a line break. [String, Default: "]

valuesToMakeNull

Regex of values to map to 'null' in the returned data. [String, Default: (^[nN][uU][lL]{1,2}$)|((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|(^\s*$)]

skipPreHeader

Number of rows to ignore before the header row [Int32]

skipPostHeader

Number of rows to ignore after the header row [Int32]

skipInvalidRows

Skip invalid data rows (totally invalid ones). This also allows for potentially wrong data if it can be handled somewhat e.g. embedded quotes misused (and still returns such rows). In either case a warning will show in the progress feedback. [Boolean]

Examples

Note: For more example Luminesce queries, visit our GitHub repo.

Example 1: Reading a single CSV

@data = 
select 
  'MyFile.Csv' as Filename,
  'TextColumn1, TextColumn2, NumberColumn1, NumberColumn2
some text, more text, 1, 2' as Content
;

@parsed = 
use Tools.Parse.Csv with @data 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 2: Reading multiple CSVs at the same time

In this example, the --addFileName option is specified to ensure the first column in the table of results contains the filename corresponding to each row of data.

@data = 
select 
  'MyFile.Csv' as Filename,
  'TextColumn1, TextColumn2, NumberColumn1, NumberColumn2
some text, more text, 1, 2' as Content
union all
select
  'MyOtherFile.Csv' as Filename,
  'TextColumn1, TextColumn2, NumberColumn1, NumberColumn2
text from my other file, more text from my other file, 10, 20' as Content
union all
select
  'MyThirdFile.Csv' as Filename,
  'TextColumn1, TextColumn2, NumberColumn1, NumberColumn2
some text from my third file, more text from my third file, 100, 200' as Content
;

@parsed = 
use Tools.Parse.Csv with @data
--addFileName 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 3: Returning only rows with a particular filename

In this example, the --fileFilter option is specified to only select rows with a Filename value that contains ReportFile.

@data = 
select 
  'ReportFile_1.Csv' as Filename,
  'EmployeeName, Id
Jane Bloggs, 26' as Content
union all
select
  'ReportFile_2.Csv',
  'EmployeeName, Id
Joe Bloggs, 58'
union all
select
  'UnrelatedFile.Csv' as Filename,
'UnrelatedColumn1, UnrelatedColumn2
some unrelated text, some more text'
;

@parsed = 
use Tools.Parse.Csv with @data
--fileFilter=ReportFile.* 
enduse;

select * from @parsed;

The table of data returned looks like this, with UnrelatedFile.Csv being ignored for not matching the specified --fileFilter value:

Example 4: Returning only particular rows without any filenames

In this example, the --fileFilter option is specified to only select particular rows to process. As no Filename column is specified in @data, Tools.Parse.Csv gives each row a sequential name which can then be referred to in the --fileFilter option. 

@data = 
select 
  'EmployeeName, Id
Jane Bloggs, 26' as Content
union all
select
  'EmployeeName, Id
Joe Bloggs, 58'
union all
select 
  'EmployeeName, Id 
Jane Doe, 73'
union all 
select 
  'EmployeeName, Id
John Doe, 107'
;

@parsed = 
use Tools.Parse.Csv with @data
--fileFilter=File[2,4] 
enduse;

select * from @parsed;

The table of data returned looks like this, containing only the second and fourth rows from @data: