Tools.Parse.Xml

Type	Read/write	Author	Availability
Direct provider	Read	Finbourne	Provided with LUSID

The Tools.Parse.Xml provider enables you to write a Luminesce query that reads XML from one or more cells within a table.

Note: The LUSID user running the query must have sufficient access control permissions to use this provider. This should automatically be the case if you are the domain owner.

See also: Tools.Parse.Csv

Basic usage

@data = 
select 
  '<filename>' as Filename,
  '<?<XML-declaration>?>
  <rows>
    <row>
      <<column-name>><some-data></<column-name>>
      [<<column-name>>...]
    </row>
  </rows>' as Content;

@parsed = 
use Tools.Parse.Xml with @input
--<optional-arguments>  
enduse;

select * from @parsed;

Input tables

Tools.Parse.Xml takes in one input table and outputs a table of data, see example 1.

Options

Tools.Parse.Xml has options that enable you to refine a query.

An option takes the form --<option>=<value>, for example --fileFilter=myFile.xml. Note no spaces are allowed either side of the = operator. If an option takes a boolean value, then specifying that option (for example --addFileName) sets it to True; omitting the option specifies False.

The table below provides information on the most commonly used options for Tools.Parse.Xml:

Option	Value	Status	Information
`--fileFilter`	Regex string, for example `MyFile.xml`, `MyFile.*` or `File[1-3]`	Optional	Based on the `Filename` column, selects the rows which should be processed. If the `Filename` column is omitted, names files sequentially as `File1`, `File2` and so on.
`--addFileName`	Boolean	Optional	Adds a column to the result set containing the file the row came from.
`--nodePath`	String	Optional	XML Path query that selects the nodes to map to rows. Defaults to `//*`.

To see a help screen of all available options, their data types, default values, and an explanation for each, run the following query using a suitable tool:

@x = use Tools.Parse.Xml 
--help
enduse;
select * from @x;

Examples

Note: For more example Luminesce queries, visit our GitHub repo.

Example 1: Reading a single XML file from a table

@data = 
select 
  'MyFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text</TextColumn1>
      <TextColumn2>more text</TextColumn2>
      <NumberColumn1>1</NumberColumn1>
      <NumberColumn2>2</NumberColumn2>
    </row>
  </rows>' as Content
;

@parsed = 
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 2: Reading multiple XMLs at the same time

In this example, the --addFileName option is specified to ensure the first column in the table of results contains the filename corresponding to each row of data.

@data = 
select
  'MyFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text</TextColumn1>
      <TextColumn2>more text</TextColumn2>
      <NumberColumn1>1</NumberColumn1>
      <NumberColumn2>2</NumberColumn2>
    </row>
  </rows>' as Content
union all
select
  'MyOtherFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>text from my other file</TextColumn1>
      <TextColumn2>more text from my other file</TextColumn2>
      <NumberColumn1>10</NumberColumn1>
      <NumberColumn2>20</NumberColumn2>
    </row>
  </rows>' as Content
union all
select
  'MyThirdFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text from my third file</TextColumn1>
      <TextColumn2>more text from my third file</TextColumn2>
      <NumberColumn1>100</NumberColumn1>
      <NumberColumn2>200</NumberColumn2>
    </row>
  </rows>' as Content
;

@parsed = 
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 3: Returning only rows with a particular filename

In this example, the --fileFilter option is specified to only select rows with a Filename value that contains ReportFile.

@data = 
select
  'ReportFile_1.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Bloggs</EmployeeName>
      <Id>26</Id>
    </row>
  </rows>' as Content
union all
select
  'ReportFile_2.xml',
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Joe Bloggs</EmployeeName>
      <Id>58</Id>
    </row>
  </rows>'
union all
select
  'UnrelatedFile.xml',
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <UnrelatedColumn1>some unrelated text</UnrelatedColumn1>
      <UnrelatedColumn2>some more text</UnrelatedColumn2>
    </row>
  </rows>'
;

@parsed = 
use Tools.Parse.Xml with @data
--fileFilter=ReportFile.*
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this, with UnrelatedFile.Xml being ignored for not matching the specified --fileFilter value:

Example 4: Returning only particular rows without any filenames

In this example, the --fileFilter option is specified to only select particular rows to process. As no Filename column is specified in @data, Tools.Parse.Xml gives each row a sequential name which can then be referred to in the --fileFilter option.

@data = 
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Bloggs</EmployeeName>
      <Id>26</Id>
    </row>
  </rows>' as Content
union all
select
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Joe Bloggs</EmployeeName>
      <Id>58</Id>
    </row>
  </rows>'
union all
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Doe</EmployeeName>
      <Id>73</Id>
    </row>
  </rows>'
union all 
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>John Doe</EmployeeName>
      <Id>107</Id>
    </row>
  </rows>'
;

@parsed = 
use Tools.Parse.Xml with @data
--fileFilter=File[2,4]
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this, containing only the second and fourth rows from @data: