Tools.Parse.Xml

Type	Read/write	Author	Availability
Direct provider	Read	Finbourne	Provided with LUSID

The Tools.Parse.Xml provider enables you to write a Luminesce query that reads XML from one or more cells within a table.

See also: Tools.Parse.Csv

Basic usage

@data = select 
  '<filename>' as Filename,
  '<?<XML-declaration>?>
  <rows>
    <row>
      <<column-name>><some-data></<column-name>>
      [<<column-name>>...]
    </row>
  </rows>' as Content;

@parsed = 
use Tools.Parse.Xml with @input
--<optional-arguments>  
enduse;

select * from @parsed;

Input tables

Tools.Parse.Xml takes in one input table and outputs a table of data, see example 1.

Options

Tools.Parse.Xml has options that enable you to refine a query.

An option takes the form --<option>=<value>, for example --fileFilter=myFile.xml. Note no spaces are allowed either side of the = operator. If an option:

takes a boolean value, then specifying that option (for example --noHeader) sets it to True; omitting the option specifies False
takes multiple string values, then specify a comma-separated list, for example --names=My,Column,Names

Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:

@x = use Tools.Parse.Xml 
--help
enduse;
select * from @x;

Current options	Explanation
`fileFilter` (`-f`)	A regex for the filenames to process. [String]
`addFileName`	Adds a column (the first column) to the result set which contains the file the row came from. [Boolean]
`types`	Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified). [String]
`trim`	Trim both leading and trailing whitespace when reading values. [Boolean]
`inferTypeRowCount`	If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified. [Int32]
`valuesToMakeNull`	Regex of values to map to 'null' in the returned data. [String, Default: (^[nN][uU][lL]{1,2}$)\|((\r\n\|\n\|\r)$)\|(^(\r\n\|\n\|\r))\|(^\s*$)]
`nodePath`	XPath query that selects the nodes to map to rows. [String, Default: //*]
`namespaces`	Selected prefix(es) and namespace(s): prefix1=namespace1-uri1,prefix2=namespace2-uri2,...prefixN=namespaceN-uriN [String]
`columns`	Column names and XPath to create them, of the form: Name=XPath (one pair per line). If omitted this will be made up of all the direct children of the first selected node in the --nodePath. If any column doesn't supply XPath (so of the form: abc or abc=) then the XPath will be inferred in the same way as omitting this entirely, but only requested columns will be returned. [String]

Examples

Note: For more example Luminesce queries, visit our GitHub repo.

Example 1: Reading a single XML file from a table

@data = 
select 
  'MyFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text</TextColumn1>
      <TextColumn2>more text</TextColumn2>
      <NumberColumn1>1</NumberColumn1>
      <NumberColumn2>2</NumberColumn2>
    </row>
  </rows>' as Content
;

@parsed = 
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 2: Reading multiple XMLs at the same time

In this example, the --addFileName option is specified to ensure the first column in the table of results contains the filename corresponding to each row of data.

@data = 
select
  'MyFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text</TextColumn1>
      <TextColumn2>more text</TextColumn2>
      <NumberColumn1>1</NumberColumn1>
      <NumberColumn2>2</NumberColumn2>
    </row>
  </rows>' as Content
union all
select
  'MyOtherFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>text from my other file</TextColumn1>
      <TextColumn2>more text from my other file</TextColumn2>
      <NumberColumn1>10</NumberColumn1>
      <NumberColumn2>20</NumberColumn2>
    </row>
  </rows>' as Content
union all
select
  'MyThirdFile.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <TextColumn1>some text from my third file</TextColumn1>
      <TextColumn2>more text from my third file</TextColumn2>
      <NumberColumn1>100</NumberColumn1>
      <NumberColumn2>200</NumberColumn2>
    </row>
  </rows>' as Content
;

@parsed = 
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this:

Example 3: Returning only rows with a particular filename

In this example, the --fileFilter option is specified to only select rows with a Filename value that contains ReportFile.

@data = 
select
  'ReportFile_1.xml' as Filename,
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Bloggs</EmployeeName>
      <Id>26</Id>
    </row>
  </rows>' as Content
union all
select
  'ReportFile_2.xml',
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Joe Bloggs</EmployeeName>
      <Id>58</Id>
    </row>
  </rows>'
union all
select
  'UnrelatedFile.xml',
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <UnrelatedColumn1>some unrelated text</UnrelatedColumn1>
      <UnrelatedColumn2>some more text</UnrelatedColumn2>
    </row>
  </rows>'
;

@parsed = 
use Tools.Parse.Xml with @data
--fileFilter=ReportFile.*
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this, with UnrelatedFile.Xml being ignored for not matching the specified --fileFilter value:

Example 4: Returning only particular rows without any filenames

In this example, the --fileFilter option is specified to only select particular rows to process. As no Filename column is specified in @data, Tools.Parse.Xml gives each row a sequential name which can then be referred to in the --fileFilter option.

@data = 
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Bloggs</EmployeeName>
      <Id>26</Id>
    </row>
  </rows>' as Content
union all
select
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Joe Bloggs</EmployeeName>
      <Id>58</Id>
    </row>
  </rows>'
union all
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>Jane Doe</EmployeeName>
      <Id>73</Id>
    </row>
  </rows>'
union all 
select 
  '<?xml version="1.0" encoding="utf-8"?>
  <rows>
    <row>
      <EmployeeName>John Doe</EmployeeName>
      <Id>107</Id>
    </row>
  </rows>'
;

@parsed = 
use Tools.Parse.Xml with @data
--fileFilter=File[2,4]
--addFileName
--nodePath=/rows/row 
enduse;

select * from @parsed;

The table of data returned looks like this, containing only the second and fourth rows from @data: