public abstract class LexicalPreservationBase extends XMLFilterImpl3
The LexicalPreservationBase filter provides a means of retaining, for round-trip processing, information that is usually
discarded. For example, it can extract and store DOCTYPE declarations, element declarations, attribute
declarations, entity declarations, default attribute expansions, and entity references. It can also extract and store some of
the XML declaration data (i.e. the XML version and character encoding), so long as an input stream is used to read the
incoming data. If a reader (character stream) is used, then the original character encoding is lost (at least by a Xerces SAX
parser).
The XML declaration data is stored in a preserve:xmldecl element, which is a child of the root element. The
internal subset data is stored in a preserve:doctype element, which is a child of the root element. It is possible
to turn off both the XML declaration and DOCTYPE (including its internal subset) storage by removing the
LexicalPreservationBase.PreserveItem.XML_DECL and LexicalPreservationBase.PreserveItem.DOCTYPE items from the set of items to preserve respectively.
The LexicalPreservationBase.PreserveItem.COMMENT and LexicalPreservationBase.PreserveItem.PROCESS_INST items indicate whether XML comments and processing
instructions (PIs) should be encoded into preserve:comment and pi:tag elements, where
tag is replaced by the name of the PI. If they are not encoded, then the XML comments and PIs are left in situ.
Note that normal (unencoded) comments and PIs are lost during the comparison process, so you may wish to encode them with an
alternative filter, such as the xml2pi.xsl filter. Encoded XML comments and PIs that appear
outside the root element are contained in a preserve:pi-and-comment element, which is a child of the root element.
There can be up to three preserve:pi-and-comment elements, which are distinguished by their region
attribute value:
LexicalPreservationBase.Region.BEFORE_DTD - PIs and comments before the DOCTYPE/Internal-Subset declaration.
LexicalPreservationBase.Region.AFTER_DTD - PIs and comments after the DOCTYPE/Internal-Subset declaration, but before the root element.
LexicalPreservationBase.Region.AFTER_BODY - PIs and comments after the root element (XML body has been completed).
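As a rough illustration of the comment encoding described above, the following sketch uses only standard SAX classes to turn comment events inside the root element into preserve:comment elements. It is not the implementation of this class: the name CommentEncodingSketch and the encode helper are hypothetical, no prefix mapping is emitted for the preserve namespace, and comments outside the root element (which the real filter moves into preserve:pi-and-comment blocks) are simply dropped here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: encodes in-body comments as preserve:comment elements.
class CommentEncodingSketch extends XMLFilterImpl implements LexicalHandler {
    static final String PRESERVE_NS = "http://www.deltaxml.com/ns/preserve";
    private int depth = 0; // greater than zero while inside the root element

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        // Comment events are delivered through the lexical-handler property.
        getParent().setProperty("http://xml.org/sax/properties/lexical-handler", this);
        super.parse(input);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        depth--;
    }

    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            return; // outside the root element: not handled in this sketch
        }
        super.startElement(PRESERVE_NS, "comment", "preserve:comment", new AttributesImpl());
        super.characters(ch, start, length);
        super.endElement(PRESERVE_NS, "comment", "preserve:comment");
    }

    // The remaining LexicalHandler events are ignored in this sketch.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }
    @Override public void startCDATA() { }
    @Override public void endCDATA() { }

    // Runs the filter over a string and records the element and text events it emits.
    static List<String> encode(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CommentEncodingSketch filter = new CommentEncodingSketch();
        filter.setParent(reader);
        final List<String> events = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                events.add("<" + q + ">");
            }
            @Override public void endElement(String u, String l, String q) {
                events.add("</" + q + ">");
            }
            @Override public void characters(char[] ch, int s, int len) {
                events.add(new String(ch, s, len));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encode("<a><!-- note --><b/></a>"));
    }
}
```

Tracking the element depth is what lets the sketch tell in-body comments from those outside the root element; a fuller version would buffer the latter and tag them with a region.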
The preserve:defaultAttributes attribute is added to any element that contains defaulted attributes. Its job is to
record the attribute names, so that they can be stripped out later. It is possible to turn off the default attribute
identification by removing the LexicalPreservationBase.PreserveItem.ATTRIBUTES from the set of items to preserve.
Entity references can be marked for later round trip preservation by adding the LexicalPreservationBase.PreserveItem.ENTITY_REF to the set of
items to preserve. Further, it is also possible to control the amount of information that is retained when preserving an entity
application, via the use of the LexicalPreservationBase.PreserveItem.ENTITY_REPLACEMENT_TEXT and LexicalPreservationBase.PreserveItem.INNER_ENTITY_REF items.
If LexicalPreservationBase.PreserveItem.ENTITY_REPLACEMENT_TEXT is in the set of preserved items, then the content of the entity application is
retained. If LexicalPreservationBase.PreserveItem.INNER_ENTITY_REF is in the set of preserved items then inner entity applications are also
marked. Note that it is possible, though not recommended, to omit both the LexicalPreservationBase.PreserveItem.ENTITY_REF and
LexicalPreservationBase.PreserveItem.ENTITY_REPLACEMENT_TEXT from the set of items to be preserved; this will result in entity references being
omitted from the output altogether.
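The set-based configuration described above can be pictured with an EnumSet. This page does not show how the set of items is handed to the filter, so the sketch below uses a stand-in enum (containing only the constants named in this description) rather than the real LexicalPreservationBase.PreserveItem, and its helper names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of working with a set of items to preserve.
class PreserveItemSetSketch {
    // Stand-in for LexicalPreservationBase.PreserveItem; the real enum may contain more constants.
    enum PreserveItem {
        XML_DECL, DOCTYPE, COMMENT, PROCESS_INST, ATTRIBUTES,
        ENTITY_REF, ENTITY_REPLACEMENT_TEXT, INNER_ENTITY_REF
    }

    // Preserve everything except the XML declaration and the DOCTYPE.
    static Set<PreserveItem> withoutProlog() {
        Set<PreserveItem> items = EnumSet.allOf(PreserveItem.class);
        items.remove(PreserveItem.XML_DECL);
        items.remove(PreserveItem.DOCTYPE);
        return items;
    }

    // Per the description, omitting both ENTITY_REF and ENTITY_REPLACEMENT_TEXT
    // results in entity references being omitted from the output altogether.
    static boolean entityReferencesDropped(Set<PreserveItem> items) {
        return !items.contains(PreserveItem.ENTITY_REF)
            && !items.contains(PreserveItem.ENTITY_REPLACEMENT_TEXT);
    }

    public static void main(String[] args) {
        Set<PreserveItem> items = withoutProlog();
        System.out.println(items);
        System.out.println(entityReferencesDropped(items));
    }
}
```

A check such as entityReferencesDropped can be useful before running a comparison, since that combination is possible but not recommended.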
Note that the underpinning XML parser must provide Locator2 and Attributes2 objects in order for character
encodings and default attribute expansion to be detected. Further it needs to support the
"http://xml.org/sax/features/resolve-dtd-uris" feature, which is set to false, to ensure that relative references are
retained. This feature is implemented by the Xerces_2 parser.
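The parser requirements above can be checked with standard SAX calls. The sketch below is an assumption-laden illustration, not part of this class: ParserCapabilitySketch and inspect are hypothetical names, and it uses whichever parser the JAXP SAXParserFactory supplies. It requests the resolve-dtd-uris feature, reads the version and encoding from Locator2 when available, and uses Attributes2.isSpecified to spot defaulted attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: probes the SAX2 extensions that the description relies on.
class ParserCapabilitySketch extends DefaultHandler {
    final List<String> defaulted = new ArrayList<>(); // names of defaulted attributes
    String xmlVersion;            // stays null if the parser offers no Locator2
    String encoding;              // stays null if the parser offers no Locator2
    boolean resolveDtdUrisOff;    // true if the feature could be set to false
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Read the declaration data once the prolog has been parsed.
        if (xmlVersion == null && locator instanceof Locator2) {
            Locator2 l2 = (Locator2) locator;
            xmlVersion = l2.getXMLVersion();
            encoding = l2.getEncoding();
        }
        // An attribute that was not specified in the document came from a DTD default.
        if (atts instanceof Attributes2) {
            Attributes2 a2 = (Attributes2) atts;
            for (int i = 0; i < a2.getLength(); i++) {
                if (!a2.isSpecified(i)) {
                    defaulted.add(a2.getQName(i));
                }
            }
        }
    }

    static ParserCapabilitySketch inspect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        ParserCapabilitySketch handler = new ParserCapabilitySketch();
        try {
            reader.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false);
            handler.resolveDtdUrisOff = true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            handler.resolveDtdUrisOff = false; // this parser cannot retain relative references
        }
        reader.setContentHandler(handler);
        // A byte stream (not a Reader) is used so that the encoding can be detected.
        reader.parse(new InputSource(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ParserCapabilitySketch h = inspect(
            "<!DOCTYPE doc [<!ATTLIST doc level CDATA 'unknown'>]><doc/>");
        System.out.println(h.defaulted + " " + h.xmlVersion + " " + h.encoding);
    }
}
```

The instanceof checks matter because Locator2 and Attributes2 are optional SAX2 extensions; a parser that supplies only Locator and Attributes gives the filter nothing to detect encodings or defaults with.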
Example File
(01) <?xml version="1.0" encoding="UTF-8"?>
(02) <!-- A pre DOCTYPE comment -->
(03) <!DOCTYPE article SYSTEM "http://www.docbook.org/xml/4.5/docbookx.dtd"
(04) [ <!ENTITY % paramEnt "
(05)     <!ATTLIST simpara level (unknown|novice|trainee|practioner|expert) 'unknown'>
(06)   ">
(07)   <!ELEMENT exampleElement (#PCDATA)>
(08)   <!ATTLIST exampleElement yesNo (yes|no) 'no'>
(09)   %paramEnt;
(10)   <!ENTITY genEnt "an <emphasis role='bold'>internal (parsed) general</emphasis> entity.">
(11) ]>
(12) <?myPI Content of the processing instruction.?>
(13) <article>
(14)   <title>Lexical Preservation Filter Demo</title>
(15)   <!-- In the following paragraph we reference the entity &genEnt; -->
(16)   <para>This paragraph references &genEnt;</para>
(17)   <para><![CDATA[Content of the CDATA Section text]]></para>
(18)   <simpara>An overridden simpara with a defaulted level attribute.</simpara>
(19) </article>
(20) <!-- A post XML body comment -->
The results of applying this filter to the above file are discussed in stages: the lines under consideration are reproduced, followed by the output produced when everything is being preserved. For clarity, the whitespace aspect of the preservation is not maintained.
XML Declaration and Root Element
(01) <?xml version="1.0" encoding="UTF-8"?>
(13) <article>
In order to compare and preserve the XML declaration it is added to the body of the document, as illustrated by the line labelled (01b) below. Further, the namespaces that are used by this lexical preservation filter (13b to 13e) are attached to the root element (13a).
(01a) <?xml version="1.0" encoding="UTF-8"?>
(13a) <article
(13b)    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
(13c)    xmlns:preserve="http://www.deltaxml.com/ns/preserve"
(13d)    xmlns:er="http://www.deltaxml.com/ns/entity-references"
(13e)    xmlns:pi="http://www.deltaxml.com/ns/processing-instructions">
(01b)   <preserve:xmldecl xml-version="1.0" encoding="UTF-8"/>
Processing Instructions and Comments before the DOCTYPE declaration
(02) <!-- A pre DOCTYPE comment -->
In order to compare and preserve comments and processing instructions that occur before the DOCTYPE declaration, a
preserve:pi-and-comment block is introduced, with its region attribute set to BEFORE_DTD.
(02a) <preserve:pi-and-comment region="BEFORE_DTD">
(02b)   <preserve:comment> A pre DOCTYPE comment </preserve:comment>
(02c) </preserve:pi-and-comment>
DOCTYPE and Internal Subset Declaration
(03) <!DOCTYPE article SYSTEM "http://www.docbook.org/xml/4.5/docbookx.dtd"
(04) [ <!ENTITY % paramEnt "
(05)     <!ATTLIST simpara level (unknown|novice|trainee|practioner|expert) 'unknown'>
(06)   ">
(07)   <!ELEMENT exampleElement (#PCDATA)>
(08)   <!ATTLIST exampleElement yesNo (yes|no) 'no'>
(09)   %paramEnt;
(10)   <!ENTITY genEnt "an <emphasis role='bold'>internal (parsed) general</emphasis> entity.">
(11) ]>
In order to compare and preserve the DOCTYPE and internal subset declaration it is added to the preserve:doctype block.
The content of the entity declarations is escaped using the ASCII exclamation mark (!) characters, where '!!' represents the
exclamation mark character. This special form of escaping ensures that it does not interfere with standard XML entity encoding
mechanisms. Note that the entity reference, in line (09), is transformed into four lines (09a), (05b), (05c), and (09b); the
key point is that the definition of the entity has been expanded, and so can be compared.
(03a) <preserve:doctype name="article" systemId="http://www.docbook.org/xml/4.5/docbookx.dtd">
(04a) <preserve:internalParsedParameterEntityDecl name="paramEnt" deltaxml:key="entity_par_paramEnt"
(05a) value="
!(*lt!)!!ATTLIST simpara level (unknown|novice|trainee|practioner|expert) !(*apos!)unknown!(*apos!)!(*gt!)
"
(06a) />
(07a) <preserve:elementDecl name="exampleElement" deltaxml:key="element_exampleElement" model="(#PCDATA)"/>
(08a) <preserve:attributeDecl name="yesNo" deltaxml:key="attribute(exampleElement,yesNo)"
(08b) eName="exampleElement" type="(yes|no)" value="no"/>
(09a) <er:paramEnt parameter="yes">
(05b) <preserve:attributeDecl name="level" deltaxml:key="attribute(simpara,level)"
(05c) eName="simpara" type="(unknown|novice|trainee|practioner|expert)" value="unknown"/>
(09b) </er:paramEnt>
(10a) <preserve:internalParsedGeneralEntityDecl name="genEnt" deltaxml:key="entity_gen_genEnt"
(10b) value="an !(*lt!)emphasis role=!(*apos!)bold!(*apos!)!(*gt!)internal (parsed) general!(*lt!)/emphasis!(*gt!) entity."/>
(03b) </preserve:doctype>
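The exclamation-mark escaping visible in lines (05a) and (10b) can be approximated as follows. The mapping is inferred solely from the sample output above ('!' becomes '!!', and '<', '>' and the apostrophe become !(*lt!), !(*gt!) and !(*apos!)); how other characters such as '&' or '"' are treated is not shown on this page, so this sketch leaves them untouched. BangEscapeSketch is a hypothetical name, not a class of this library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the '!'-based escaping seen in the sample output.
class BangEscapeSketch {
    // Only the three named escapes that appear in the example are covered.
    private static final Map<Character, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put('<', "lt");
        NAMES.put('>', "gt");
        NAMES.put('\'', "apos");
    }

    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '!') {
                out.append("!!"); // a literal exclamation mark is doubled
            } else if (NAMES.containsKey(c)) {
                out.append("!(*").append(NAMES.get(c)).append("!)");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("!!", i)) {
                out.append('!');
                i += 2;
            } else if (escaped.startsWith("!(*", i)) {
                int end = escaped.indexOf("!)", i + 3);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated escape at index " + i);
                }
                out.append(charFor(escaped.substring(i + 3, end)));
                i = end + 2;
            } else {
                out.append(escaped.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Reverse lookup from an escape name to the character it stands for.
    private static char charFor(String name) {
        for (Map.Entry<Character, String> e : NAMES.entrySet()) {
            if (e.getValue().equals(name)) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("Unknown escape name: " + name);
    }

    public static void main(String[] args) {
        String decl = "<!ATTLIST simpara level (unknown|novice|trainee|practioner|expert) 'unknown'>";
        String escaped = escape(decl);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(decl));
    }
}
```

Because every '!' in the raw text is doubled, an escape sequence can never be confused with literal content, and the escaped value contains no characters that XML's own entity encoding would need to touch.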
Processing instructions and comments between DOCTYPE and body
(12) <?myPI Content of the processing instruction.?>
In order to compare and preserve comments and processing instructions that occur between the DOCTYPE declaration and the body,
a preserve:pi-and-comment block is introduced, with its region attribute set to AFTER_DTD.
(12a) <preserve:pi-and-comment region="AFTER_DTD">
(12b)   <pi:myPI>Content of the processing instruction.</pi:myPI>
(12c) </preserve:pi-and-comment>
The body of the document
(14) <title>Lexical Preservation Filter Demo</title>
(15) <!-- In the following paragraph we reference the entity &genEnt; -->
(16) <para>This paragraph references &genEnt;</para>
(17) <para><![CDATA[Content of the CDATA Section text]]></para>
(18) <simpara>An overridden simpara with a defaulted level attribute.</simpara>
The following 5 lines illustrate how comments, entity references, CDATA sections, and defaulted attributes are encoded so that they can be compared and preserved within the body of the document.
(14a) <title>Lexical Preservation Filter Demo</title>
(15a) <preserve:comment> In the following paragraph we reference the entity &genEnt; </preserve:comment>
(16a) <para>This paragraph references <er:genEnt>an <emphasis role="bold">internal (parsed) general</emphasis> entity.</er:genEnt></para>
(17a) <para><preserve:cdata>Content of the CDATA Section text</preserve:cdata></para>
(18a) <simpara level="unknown" preserve:defaultAttributes="{}level">An overridden simpara with a defaulted level attribute.</simpara>
Ending the body of the document and the post body Processing Instructions and Comments.
(19) </article>
(20) <!-- A post XML body comment -->
In order to compare and preserve comments and processing instructions that occur after the body of the document, a
preserve:pi-and-comment block is introduced, with its region attribute set to AFTER_BODY.
(20a) <preserve:pi-and-comment region="AFTER_BODY">
(20b)   <preserve:comment> A post XML body comment </preserve:comment>
(20c) </preserve:pi-and-comment>
(19a) </article>
| Modifier and Type | Class and Description |
|---|---|
| static class | LexicalPreservationBase.PreserveItem - An enum used to specify which items should be preserved. |
| static class | LexicalPreservationBase.Region - An enumeration that marks the region of the document that is being parsed. |
XMLFilterImpl3.SaxEventItem
PROCESS_ADDITIONAL_INFO, PROCESS_ALL, PROCESS_BODY, PROCESS_DATA, PROCESS_DECLS, PROCESS_ELEM_AND_ATTRIB_DECLS, PROCESS_ENTITY_AND_NOTATION_DECLS, PROCESS_ENTITY_DECLS, PROCESS_ENTITY_REFS, PROCESS_EXCEPTIONS, PROCESS_INTERNAL_SUBSET, PROCESS_NORMAL_BODY
| Constructor and Description |
|---|
| LexicalPreservationBase() - Constructs a new LexicalPreservationBase. |
| Modifier and Type | Method and Description |
|---|---|
| void | attributeDecl(java.lang.String eName, java.lang.String aName, java.lang.String type, java.lang.String mode, java.lang.String value) - Implementation of the attributeDecl SAX event handler. |
| void | characters(char[] ch, int start, int length) - Implementation of the characters SAX event handler. |
| void | comment(char[] ch, int start, int length) - Implementation of the comment SAX event handler. |
| void | elementDecl(java.lang.String name, java.lang.String model) - Implementation of the elementDecl SAX event handler. |
| void | endCDATA() - Implementation of the endCDATA SAX event handler. |
| void | endDocument() - Implementation of the SAX endDocument event handler. |
| void | endDTD() - Implementation of the endDTD SAX event handler. |
| void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String name) - Implementation of the SAX endElement event handler. |
| void | endEntity(java.lang.String name) - Implementation of the endEntity SAX event handler. |
| void | externalEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) - Implementation of the externalEntityDecl SAX event handler. |
| void | ignorableWhitespace(char[] ch, int start, int length) - Implementation of the ignorableWhitespace SAX event handler. |
| void | internalEntityDecl(java.lang.String name, java.lang.String value) - Implementation of the internalEntityDecl SAX event handler. |
| void | notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) - Implementation of the notationDecl SAX event handler. |
| void | processingInstruction(java.lang.String target, java.lang.String data) - Implementation of the processingInstruction SAX event handler. |
| void | setDocumentLocator(org.xml.sax.Locator locator) - Implementation of the SAX setDocumentLocator event handler. |
| void | setParent(org.xml.sax.XMLReader parent) |
| void | skippedEntity(java.lang.String name) - Implementation of the skippedEntity SAX event handler. |
| void | startCDATA() - Implementation of the startCDATA SAX event handler. |
| void | startDocument() - Implementation of the SAX startDocument event handler. |
| void | startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId) - Implementation of the startDTD SAX event handler. |
| void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String name, org.xml.sax.Attributes atts) - Implementation of the SAX startElement event handler. |
| void | startEntity(java.lang.String name) - Implementation of the startEntity SAX event handler. |
| void | unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName) - Implementation of the unparsedEntityDecl SAX event handler. |
endPrefixMapping, error, fatalError, getProcessFilter, processEvent, setProcessFilter, setProcessFilter, setProcessFilter, startPrefixMapping, warning
getProperty, parse, parse, setProperty

public LexicalPreservationBase()

public void setParent(org.xml.sax.XMLReader parent)
Specified by: setParent in interface org.xml.sax.XMLFilter
Overrides: setParent in class org.xml.sax.helpers.XMLFilterImpl

public void setDocumentLocator(org.xml.sax.Locator locator)
Specified by: setDocumentLocator in interface org.xml.sax.ContentHandler
Overrides: setDocumentLocator in class XMLFilterImpl3
Parameters:
locator - The SAX locator data.

public void startDocument()
throws org.xml.sax.SAXException
Specified by: startDocument in interface org.xml.sax.ContentHandler
Overrides: startDocument in class XMLFilterImpl3
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void endDocument()
throws org.xml.sax.SAXException
Specified by: endDocument in interface org.xml.sax.ContentHandler
Overrides: endDocument in class XMLFilterImpl3
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String name,
org.xml.sax.Attributes atts)
throws org.xml.sax.SAXException
Specified by: startElement in interface org.xml.sax.ContentHandler
Overrides: startElement in class XMLFilterImpl3
Parameters:
uri - The namespace of the element.
localName - The element's local name.
name - The element's qualified name.
atts - The element's attributes.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void endElement(java.lang.String uri,
java.lang.String localName,
java.lang.String name)
throws org.xml.sax.SAXException
Specified by: endElement in interface org.xml.sax.ContentHandler
Overrides: endElement in class XMLFilterImpl3
Parameters:
uri - The namespace of the element.
localName - The element's local name.
name - The element's qualified name.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void startDTD(java.lang.String name,
java.lang.String publicId,
java.lang.String systemId)
throws org.xml.sax.SAXException
Specified by: startDTD in interface org.xml.sax.ext.LexicalHandler
Overrides: startDTD in class XMLFilterImpl3
Parameters:
name - The name of the root element.
publicId - The doctype's public identifier.
systemId - The doctype's system identifier.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.startDTD(String, String, String)

public void endDTD()
throws org.xml.sax.SAXException
Specified by: endDTD in interface org.xml.sax.ext.LexicalHandler
Overrides: endDTD in class XMLFilterImpl3
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.startDTD(String, String, String)

public void startEntity(java.lang.String name)
throws org.xml.sax.SAXException
Specified by: startEntity in interface org.xml.sax.ext.LexicalHandler
Overrides: startEntity in class XMLFilterImpl3
Parameters:
name - The entity's name.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.startEntity(String)

public void endEntity(java.lang.String name)
throws org.xml.sax.SAXException
Specified by: endEntity in interface org.xml.sax.ext.LexicalHandler
Overrides: endEntity in class XMLFilterImpl3
Parameters:
name - The entity's name.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.endEntity(String)

public void skippedEntity(java.lang.String name)
throws org.xml.sax.SAXException
Specified by: skippedEntity in interface org.xml.sax.ContentHandler
Overrides: skippedEntity in class XMLFilterImpl3
Parameters:
name - The entity's name.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void elementDecl(java.lang.String name,
java.lang.String model)
throws org.xml.sax.SAXException
Specified by: elementDecl in interface org.xml.sax.ext.DeclHandler
Overrides: elementDecl in class XMLFilterImpl3
Parameters:
name - The name of the element being declared.
model - The model used to define what content the element is allowed to contain.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: DeclHandler.elementDecl(String, String)

public void attributeDecl(java.lang.String eName,
java.lang.String aName,
java.lang.String type,
java.lang.String mode,
java.lang.String value)
throws org.xml.sax.SAXException
Specified by: attributeDecl in interface org.xml.sax.ext.DeclHandler
Overrides: attributeDecl in class XMLFilterImpl3
Parameters:
eName - The name of the element that the attribute being declared belongs to.
aName - The name of the attribute that is being declared.
type - The attribute's type.
mode - The attribute's mode (e.g. required, optional, fixed).
value - The attribute's default value.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: DeclHandler.attributeDecl(String, String, String, String, String)

public void internalEntityDecl(java.lang.String name,
java.lang.String value)
throws org.xml.sax.SAXException
Specified by: internalEntityDecl in interface org.xml.sax.ext.DeclHandler
Overrides: internalEntityDecl in class XMLFilterImpl3
Parameters:
name - The name of the internal parsed entity being declared.
value - The definition of the entity's content (replacement text).
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: DeclHandler.internalEntityDecl(String, String)

public void externalEntityDecl(java.lang.String name,
java.lang.String publicId,
java.lang.String systemId)
throws org.xml.sax.SAXException
Specified by: externalEntityDecl in interface org.xml.sax.ext.DeclHandler
Overrides: externalEntityDecl in class XMLFilterImpl3
Parameters:
name - The name of the external parsed entity being declared.
publicId - The public identity of the entity.
systemId - The system identity of the entity.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: DeclHandler.externalEntityDecl(String, String, String)

public void unparsedEntityDecl(java.lang.String name,
java.lang.String publicId,
java.lang.String systemId,
java.lang.String notationName)
throws org.xml.sax.SAXException
Specified by: unparsedEntityDecl in interface org.xml.sax.DTDHandler
Overrides: unparsedEntityDecl in class XMLFilterImpl3
Parameters:
name - The name of the unparsed entity being declared.
publicId - The public identity of the entity.
systemId - The system identity of the entity.
notationName - The entity's notation.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void notationDecl(java.lang.String name,
java.lang.String publicId,
java.lang.String systemId)
throws org.xml.sax.SAXException
Specified by: notationDecl in interface org.xml.sax.DTDHandler
Overrides: notationDecl in class XMLFilterImpl3
Parameters:
name - The name of the notation entity being declared.
publicId - The public identity of the notation.
systemId - The system identity of the notation.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void comment(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Specified by: comment in interface org.xml.sax.ext.LexicalHandler
Overrides: comment in class XMLFilterImpl3
Parameters:
ch - The text.
start - The start position of the text to be extracted.
length - The number of characters to extract.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.comment(char[], int, int)

public void processingInstruction(java.lang.String target,
java.lang.String data)
throws org.xml.sax.SAXException
Specified by: processingInstruction in interface org.xml.sax.ContentHandler
Overrides: processingInstruction in class XMLFilterImpl3
Parameters:
target - The name of the processing instruction.
data - The content of the processing instruction.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void startCDATA()
throws org.xml.sax.SAXException
Specified by: startCDATA in interface org.xml.sax.ext.LexicalHandler
Overrides: startCDATA in class XMLFilterImpl3
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.startCDATA(), ContentHandler.characters(char[], int, int)

public void endCDATA()
throws org.xml.sax.SAXException
Specified by: endCDATA in interface org.xml.sax.ext.LexicalHandler
Overrides: endCDATA in class XMLFilterImpl3
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.
See Also: LexicalHandler.endCDATA(), ContentHandler.characters(char[], int, int)

public void characters(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Specified by: characters in interface org.xml.sax.ContentHandler
Overrides: characters in class XMLFilterImpl3
Parameters:
ch - The text.
start - The start position of the text to be extracted.
length - The number of characters to extract.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.

public void ignorableWhitespace(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Specified by: ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides: ignorableWhitespace in class XMLFilterImpl3
Parameters:
ch - The text.
start - The start position of the text to be extracted.
length - The number of characters to extract.
Throws: org.xml.sax.SAXException - when there is a problem with the SAX event stream.