Type | Read/write | Author | Availability |
|---|---|---|---|
Read | Finbourne | Provided with LUSID |
The Tools.Parse.Xml provider enables you to write a Luminesce query that reads XML from one or more cells within a table.
See also: Tools.Parse.Csv
Basic usage
@data = select
'<filename>' as Filename,
'<?<XML-declaration>?>
<rows>
<row>
<<column-name>><some-data></<column-name>>
[<<column-name>>...]
</row>
</rows>' as Content;
@parsed =
use Tools.Parse.Xml with @input
--<optional-arguments>
enduse;
select * from @parsed;Input tables
Tools.Parse.Xml takes in one input table and outputs a table of data, see example 1.
Options
Tools.Parse.Xml has options that enable you to refine a query.
An option takes the form --<option>=<value>, for example --fileFilter=myFile.xml. Note no spaces are allowed either side of the = operator. If an option:
takes a boolean value, then specifying that option (for example
--noHeader) sets it to True; omitting the option specifies Falsetakes multiple string values, then specify a comma-separated list, for example
--names=My,Column,Names
Current options at article update time are listed in the table below. For the very latest information, run the following query using a suitable tool and examine the online help:
@x = use Tools.Parse.Xml
--help
enduse;
select * from @x;Current options | Explanation |
|---|---|
| A regex for the filenames to process. [String] |
| Adds a column (the first column) to the result set which contains the file the row came from. [Boolean] |
| Column types (comma delimited list of: 'Boolean, Date, DateTime, Decimal, Double, Int, BigInt, Text, Table', some columns may be left blank while others are specified). [String] |
| Trim both leading and trailing whitespace when reading values. [Boolean] |
| If non-zero and 'types' is not specified (or not specified for some columns) this will look through N rows to attempt to work out the column types for columns not pre-specified. [Int32] |
| Regex of values to map to 'null' in the returned data. [String, Default: (^[nN][uU][lL]{1,2}$)|((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|(^\s*$)] |
| XPath query that selects the nodes to map to rows. [String, Default: //*] |
| Selected prefix(es) and namespace(s): prefix1=namespace1-uri1,prefix2=namespace2-uri2,...prefixN=namespaceN-uriN [String] |
| Column names and XPath to create them, of the form: Name=XPath (one pair per line). If omitted this will be made up of all the direct children of the first selected node in the --nodePath. If any column doesn't supply XPath (so of the form: abc or abc=) then the XPath will be inferred in the same way as omitting this entirely, but only requested columns will be returned. [String] |
Examples
Note: For more example Luminesce queries, visit our GitHub repo.
Example 1: Reading a single XML file from a table
@data =
select
'MyFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text</TextColumn1>
<TextColumn2>more text</TextColumn2>
<NumberColumn1>1</NumberColumn1>
<NumberColumn2>2</NumberColumn2>
</row>
</rows>' as Content
;
@parsed =
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;The table of data returned looks like this:
.png?sv=2022-11-02&spr=https&st=2026-02-12T08%3A27%3A12Z&se=2026-02-12T08%3A41%3A12Z&sr=c&sp=r&sig=TP%2B59SNJPG4BcAfnjaPtQ5PIA31jP1frmN4zjF%2Fyets%3D)
Example 2: Reading multiple XMLs at the same time
In this example, the --addFileName option is specified to ensure the first column in the table of results contains the filename corresponding to each row of data.
@data =
select
'MyFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text</TextColumn1>
<TextColumn2>more text</TextColumn2>
<NumberColumn1>1</NumberColumn1>
<NumberColumn2>2</NumberColumn2>
</row>
</rows>' as Content
union all
select
'MyOtherFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>text from my other file</TextColumn1>
<TextColumn2>more text from my other file</TextColumn2>
<NumberColumn1>10</NumberColumn1>
<NumberColumn2>20</NumberColumn2>
</row>
</rows>' as Content
union all
select
'MyThirdFile.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<TextColumn1>some text from my third file</TextColumn1>
<TextColumn2>more text from my third file</TextColumn2>
<NumberColumn1>100</NumberColumn1>
<NumberColumn2>200</NumberColumn2>
</row>
</rows>' as Content
;
@parsed =
use Tools.Parse.Xml with @data
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;The table of data returned looks like this:
.png?sv=2022-11-02&spr=https&st=2026-02-12T08%3A27%3A12Z&se=2026-02-12T08%3A41%3A12Z&sr=c&sp=r&sig=TP%2B59SNJPG4BcAfnjaPtQ5PIA31jP1frmN4zjF%2Fyets%3D)
Example 3: Returning only rows with a particular filename
In this example, the --fileFilter option is specified to only select rows with a Filename value that contains ReportFile.
@data =
select
'ReportFile_1.xml' as Filename,
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Bloggs</EmployeeName>
<Id>26</Id>
</row>
</rows>' as Content
union all
select
'ReportFile_2.xml',
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Joe Bloggs</EmployeeName>
<Id>58</Id>
</row>
</rows>'
union all
select
'UnrelatedFile.xml',
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<UnrelatedColumn1>some unrelated text</UnrelatedColumn1>
<UnrelatedColumn2>some more text</UnrelatedColumn2>
</row>
</rows>'
;
@parsed =
use Tools.Parse.Xml with @data
--fileFilter=ReportFile.*
--nodePath=/rows/row
enduse;
select * from @parsed;The table of data returned looks like this, with UnrelatedFile.Xml being ignored for not matching the specified --fileFilter value:
.png?sv=2022-11-02&spr=https&st=2026-02-12T08%3A27%3A12Z&se=2026-02-12T08%3A41%3A12Z&sr=c&sp=r&sig=TP%2B59SNJPG4BcAfnjaPtQ5PIA31jP1frmN4zjF%2Fyets%3D)
Example 4: Returning only particular rows without any filenames
In this example, the --fileFilter option is specified to only select particular rows to process. As no Filename column is specified in @data, Tools.Parse.Xml gives each row a sequential name which can then be referred to in the --fileFilter option.
@data =
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Bloggs</EmployeeName>
<Id>26</Id>
</row>
</rows>' as Content
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Joe Bloggs</EmployeeName>
<Id>58</Id>
</row>
</rows>'
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>Jane Doe</EmployeeName>
<Id>73</Id>
</row>
</rows>'
union all
select
'<?xml version="1.0" encoding="utf-8"?>
<rows>
<row>
<EmployeeName>John Doe</EmployeeName>
<Id>107</Id>
</row>
</rows>'
;
@parsed =
use Tools.Parse.Xml with @data
--fileFilter=File[2,4]
--addFileName
--nodePath=/rows/row
enduse;
select * from @parsed;The table of data returned looks like this, containing only the second and fourth rows from @data:
.png?sv=2022-11-02&spr=https&st=2026-02-12T08%3A27%3A12Z&se=2026-02-12T08%3A41%3A12Z&sr=c&sp=r&sig=TP%2B59SNJPG4BcAfnjaPtQ5PIA31jP1frmN4zjF%2Fyets%3D)