DeltaXML.CoreS9Api.Config LexicalPreservationConfig
Namespace: DeltaXML.CoreS9Api.Config
Assembly: DeltaXML.CoreS9Api (in DeltaXML.CoreS9Api.dll) Version: 10.4.0.1000 (10.4.0.1000)
Configures how lexical preservation is applied during the document loading, preservation processing and output/serialisation phases of a pipelined comparison. The three phases are:
- The document loading phase converts/encodes the 'lexical' aspects of the document into a form that can be retained and processed by the underpinning comparator engine.
- The preservation processing phase conceptually gathers together the output filters that are responsible for transforming any differences contained in the preserved items into a form that can be handled by the final output/serialisation phase. This phase may require custom filters, for example to satisfy the constraints of a specific output format.
- The output/serialisation phase typically converts/decodes the 'lexical' aspects of the document back into their original forms. However, the encoded forms can be retained if desired.
Normally an XML parser or 'XML processor' (a term defined in the XML specification) disregards 'doctype', 'ignorable whitespace', 'CDATA sections' and other 'lexical' aspects of the XML input during processing. Both the PipelinedComparatorS9 and the DocumentComparator can be configured to convert the 'lexical' items into markup that can be processed by the underpinning comparator (i.e. element, attribute, and text nodes). Comments and processing instructions are also treated as 'lexical' aspects of the input, because the underpinning comparator ignores them.
Some aspects of XML are not reported by an XML parser, so complete preservation of all lexical aspects of an input file cannot be guaranteed. These aspects include:
- whether single or double quotes are used for attribute values
- attribute order within a start tag
- any whitespace within a start tag or end tag, for example whitespace or line breaks between attributes
- any whitespace outside of the root element, including whitespace in the DTD internal subset
- whether or not an XML Declaration was present in the input
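The limitation above can be observed with any standard parser. The following is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` purely as an illustration; it is not part of the DeltaXML API, and the sample documents are invented. Two inputs that differ only in quote style, attribute order and whitespace inside the start tag yield the same parsed data:

```python
import xml.etree.ElementTree as ET

# Two invented inputs that differ only in lexical details:
# quote style, attribute order, and whitespace inside the start tag.
first = ET.fromstring("<item id='1' name='x'/>")
second = ET.fromstring('<item   name="x"\n      id="1" />')

# The parser reports the same element name and attribute values for both,
# so the original quoting, ordering and spacing cannot be recovered.
same_tag = first.tag == second.tag
same_attributes = first.attrib == second.attrib
print(same_tag, same_attributes)
```

Because the parser never exposes these details, no downstream component can restore them.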
Some of the things that can be preserved include:
- comments
- processing instructions
- doctype declarations
- information about the file encoding and XML version (whether from the XML Declaration or otherwise)
- entity reference information (although the parser expands entity references, information about the original references is retained)
- subset declarations for elements, attributes and entities
- use of CDATA sections
Usage
An instance of this configuration class should be set on the LexicalPreservationConfig property of either a PipelinedComparatorS9 or a DocumentComparator.
Data relocation
Some marked-up items cannot be placed at their original locations whilst maintaining a well-formed result. This primarily relates to information outside the root element. For these areas the markup is moved inside the root element, where it is held in the first few children of the root element or in its last child. Generally only comments and processing instructions can appear outside the root element; however, the internal subset and the XML declaration contain other items. When all types of information are present, the output has this structure:
    <root>
      <preserve:xmldecl xml-version="1.0" encoding="UTF-8" standalone="no"/>
      <preserve:comments-and-pis region="BEFORE_DTD"> ... </preserve:comments-and-pis>
      <preserve:doctype> ... </preserve:doctype>
      <preserve:comments-and-pis region="AFTER_DTD"> ... </preserve:comments-and-pis>
      <child> first child element of original root element ... </child>
      ...
      <child> last child element of original root element ... </child>
      <preserve:comments-and-pis region="AFTER_BODY"> ... </preserve:comments-and-pis>
    </root>
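A consumer of such output may want to separate the relocated markup from the original children of the root element. The sketch below shows one way to do that with Python's standard-library `xml.etree.ElementTree`. It is an illustration only, not DeltaXML code: the sample document is hand-written to mirror the structure above, the elided contents are left as empty placeholder elements, and the `preserve` prefix is assumed to be bound to the `http://www.deltaxml.com/ns/preserve` namespace.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the 'preserve' prefix.
PRESERVE_NS = "http://www.deltaxml.com/ns/preserve"

# Hand-written sample mirroring the relocated structure; contents are placeholders.
sample = f"""<root xmlns:preserve="{PRESERVE_NS}">
  <preserve:xmldecl xml-version="1.0" encoding="UTF-8" standalone="no"/>
  <preserve:comments-and-pis region="BEFORE_DTD"/>
  <preserve:doctype/>
  <preserve:comments-and-pis region="AFTER_DTD"/>
  <child>first</child>
  <child>last</child>
  <preserve:comments-and-pis region="AFTER_BODY"/>
</root>"""

root = ET.fromstring(sample)
prefix = "{" + PRESERVE_NS + "}"  # ElementTree writes qualified names as {uri}local

# Direct children in the preserve namespace hold the relocated information.
relocated = [child for child in root if child.tag.startswith(prefix)]
# Everything else is a child of the original root element.
original = [child for child in root if not child.tag.startswith(prefix)]
# The region attribute records where each group of comments and PIs came from.
regions = [e.get("region") for e in root.iter(prefix + "comments-and-pis")]

print(len(relocated), len(original), regions)
```

Filtering on the namespace rather than on child position avoids assumptions about how many preserved items are present in a given document.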
Entity Handling
Three of the settings provided for handling entities interact in various ways. Some observations to note include:
- Setting PreserveNestedEntityReferences to true only makes sense when both PreserveEntityReferences and PreserveEntityReplacementText are also true.
- Setting both PreserveEntityReferences and PreserveEntityReplacementText to false means that entity information is lost completely; this is not recommended.
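The two observations above can be expressed as a simple consistency check. The following Python sketch is illustrative only: `check_entity_settings` is a hypothetical helper, not part of the DeltaXML API, and its three parameters simply stand in for the settings named above.

```python
from typing import List


def check_entity_settings(preserve_references: bool,
                          preserve_replacement_text: bool,
                          preserve_nested_references: bool) -> List[str]:
    """Hypothetical helper: return warnings for the combinations noted above."""
    warnings = []
    # Nested references are only meaningful when both other settings are enabled.
    if preserve_nested_references and not (preserve_references
                                           and preserve_replacement_text):
        warnings.append("PreserveNestedEntityReferences requires both "
                        "PreserveEntityReferences and "
                        "PreserveEntityReplacementText to be true")
    # Disabling both settings discards entity information entirely.
    if not preserve_references and not preserve_replacement_text:
        warnings.append("entity information is lost completely "
                        "(not recommended)")
    return warnings


for combination in [(True, True, True), (True, False, True), (False, False, False)]:
    print(combination, check_entity_settings(*combination))
```

A check of this kind could run before a comparison is configured, so that a questionable combination is reported early.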
Namespaces
Lexical preservation creates elements in several namespaces; the following table provides a summary:
Usual prefix | Namespace URI | Description |
---|---|---|
preserve | http://www.deltaxml.com/ns/preserve | All generated markup uses this namespace unless it uses one of the namespaces listed below |
er | http://www.deltaxml.com/ns/entity-references | Entity references are represented as elements using this namespace and a local name based on the entity name |
pi | http://www.deltaxml.com/ns/processing-instructions | Processing instructions are represented as elements using this namespace and a local name based on the PI target |
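When post-processing preserved markup it can help to map each element back to the row of the table it belongs to. The Python sketch below is a minimal illustration, not DeltaXML code: `describe` is a hypothetical helper, and the sample element (an `er:nbsp` element standing for an entity named `nbsp`) is invented, with its content left empty because the real content is not described here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs and usual prefixes, copied from the table above.
NAMESPACE_PREFIXES = {
    "http://www.deltaxml.com/ns/preserve": "preserve",
    "http://www.deltaxml.com/ns/entity-references": "er",
    "http://www.deltaxml.com/ns/processing-instructions": "pi",
}


def describe(tag):
    """Split an ElementTree '{uri}local' tag into (usual prefix, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        # Unknown namespaces map to None rather than raising an error.
        return NAMESPACE_PREFIXES.get(uri), local
    return None, tag


# Invented sample: an entity reference represented as an element.
sample = ET.fromstring(
    '<p xmlns:er="http://www.deltaxml.com/ns/entity-references">a<er:nbsp/>b</p>'
)
described = [describe(element.tag) for element in sample.iter()]
print(described)
```

Keying the lookup on the namespace URI rather than the prefix matters because a document is free to bind a different prefix to the same namespace.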
Compatibility with Previous Releases
Lexical preservation is now a feature setting on a PipelinedComparatorS9, rather than an XMLFilter that is added at the start of the input pipelines. This method of preserving items replaces the previous LexicalPreservation filter, which has been removed.