XML import

Modified on Wed, 12 Dec, 2018 at 3:03 PM

Importing data from XML involves configuring a File block to point to the XML file, then defining rules that map from the hierarchical structure of an XML document to the tabular structure (fields and records) that Omniscope expects. This is necessarily a little complex, but we're on hand to help.

The first step is to familiarise yourself with the XML structure. This is easily done by opening the XML file in Chrome. Alternatively some text editor may help, but you need to be able to reformat the document to show the hierarchy clearly.

Summary

For example we have a French XML file containing data about gas prices:

The end result we're after is as follows:

The configuration needed was:

In detail

Record path

The first step is to identify the Record path. This determines which nodes in the document become a record (row) in the output data table - i.e. what is the granularity of the output with respect to the XML.

Here we've chosen pdv_liste/pdv/prix which means all prix nodes within a pdv node within the pdv_liste root node. For every prix node found in the document and in the correct position in the hierarchy, a record will be emitted.

The first such node is <prix nom="Gazole" id="1" maj="2018-01-06T12:20:27" valeur="1376"/> and there are many more such nodes. There are also further pdv nodes in the document containing more prix nodes, and these will also be included.

(If instead we chose pdv_liste/pdv as the Record path, we would get a record for each pdv node in the document - in this small attached XML document there are two such nodes.)

Fields

Now we must identify what data we want to emit as fields (columns / attributes) in the table. For each record node (prix), what data values do we want?

For each field, you choose a Field Name (whatever you like, as a column name, which must be unique), and provide a Field Path, which is absolute (starts from the root of the document) and is typically similar to the record path.

There are 3 kinds of values relative to a given record node you might want to include:

Data attached directly to the record node:
- XML attributes of the record node are identified using node@attribute so the full field path for an attribute of the prix node might be pdv_liste/pdv/prix@id.
- Content of the record node is identified just using the node so with the full field path it might be pdv_liste/pdv/prix. In this case there isn't any content inside the node, but let's say you had ...<prix>Some content</prix> then pdv_liste/pdv/prix would result in a field cell value Some content.
Data at a higher level in the hierarchy (this will be replicated for every child node)
- XML attributes of record node ancestors. For example pdv_liste/pdv@id would yield values such as 1. This value will be repeated for every prix record corresponding to that pdv node's children.
  You could also write pdv_liste@xx if there was an xx attribute of the root node; this value would be repeated in every record of the output.
- XML attributes and content of other descendants of record node ancestors. For example pdv_liste/pdv/ville is like a complex attribute of the pdv node. Providing the node comes before the prix node, you can import it as shown in this example. If there were multiple ville nodes per pdv, you would get a comma-separated list in the cell of the output data.
  For example pdv_liste/pdv/horaires/jour@nom would result in a cell value "Lundi,Mardi,Mercredi..." in the output data.
  Likewise you could write pdv_liste/yy if there was a yy node child of the root node; this value would be repeated in every record of the output.
Data at a lower level in the hierarchy
- Our example doesn't use this, but if the record path was pdv_liste/pdv (in this case producing only 2 records), we could use the field path pdv_liste/pdv/ville to pull out the contents of one of pdv's children, or pdv_liste/pdv/prix@nom to pull out a comma-separated list of fuel names.

Visual illustration

This picture attempts to use colour coding to show how the XML document, block options, and table output relate. In particular see how the 3 elements of the Record path (pdv_liste, pdv, prix) coded grey, purple, red, are used in the Field Paths and correspond to the XML document:

Need more help?

Post your problem in our Forums, attach a small representative example XML file, and include an illustration of what data you expect to get. We'll post back right away.

Attachments (1)

xml

PrixCarburan....xml
48.6 KB