public final class NormalizeSpace extends XMLFilterImpl2
Removes whitespace in PCDATA and attributes in an XML file. Whitespace is defined as:
#x20 (hex)
#x9 (hex) or \t (escaped Java character)
#xA (hex) or \n (escaped Java character)
#xD (hex) or \r (escaped Java character)
To preserve whitespace in a specific section of the XML file, the xml:space attribute should be included in an element and set to preserve. This attribute applies to the complete subtree beneath that element unless overridden by another xml:space attribute at a lower level.
WARNING: We have found this attribute to successfully preserve space with the Saxon processor, but not with Xalan-J. Therefore we strongly recommend the use of Saxon when any input data uses xml:space attributes.
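
The scoping rule for xml:space can be sketched with a stack that holds one entry per open element. The following is a minimal illustration using only the standard SAX API, not the implementation of NormalizeSpace; the class name XmlSpaceScopeSketch and its helper keptText are hypothetical, and the sketch only drops whitespace-only text outside a preserved subtree.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: illustrates xml:space scoping only.
class XmlSpaceScopeSketch extends DefaultHandler {
    // One entry per open element: true while xml:space="preserve" is in effect.
    private final Deque<Boolean> preserve = new ArrayDeque<>();
    final StringBuilder kept = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        boolean inherited = !preserve.isEmpty() && preserve.peek();
        String value = atts.getValue("xml:space");
        if ("preserve".equals(value)) {
            preserve.push(true);        // switch preservation on for this subtree
        } else if ("default".equals(value)) {
            preserve.push(false);       // a lower-level attribute overrides the parent
        } else {
            preserve.push(inherited);   // no attribute: inherit from the parent element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        preserve.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        boolean preserving = !preserve.isEmpty() && preserve.peek();
        if (preserving || !isWhitespaceOnly(ch, start, length)) {
            kept.append(ch, start, length);
        }
    }

    private static boolean isWhitespaceOnly(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c != 0x20 && c != 0x9 && c != 0xA && c != 0xD) {
                return false;
            }
        }
        return true;
    }

    // Parses the given XML and returns the text that survived the sketch.
    static String keptText(String xml) {
        try {
            XmlSpaceScopeSketch handler = new XmlSpaceScopeSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.kept.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc> <a xml:space=\"preserve\"> x <b xml:space=\"default\"> </b></a> </doc>";
        System.out.println("[" + keptText(xml) + "]");
    }
}
```

In this input only the text directly inside the a element is kept with its surrounding spaces; the whitespace in doc and in the overriding b element is dropped.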
By default, whitespace between elements is removed. In the following example, the space between the bold and italic words would
be deleted:
<para>some text <bold>bold words</bold> <italic>italic words</italic></para>
To change this behaviour, add the attribute deltaxml:mixed-content="true" to the paragraph element. This causes the whitespace to be normalised rather than removed. N.B. This attribute must be added at every level where it is required; it is not inherited from parent elements.
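
The difference between removing and normalising whitespace can be illustrated on a single text node. This is a sketch under assumptions: the exact normalisation that NormalizeSpace applies is not specified here, so collapsing each run of whitespace to one #x20 character is assumed, and the class and method names are hypothetical.

```java
// Hypothetical helpers: not part of the NormalizeSpace API.
class WhitespaceModesSketch {

    // The four characters the class description defines as whitespace.
    static boolean isXmlWhitespace(char c) {
        return c == 0x20 || c == 0x9 || c == 0xA || c == 0xD;
    }

    static boolean isWhitespaceOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumed default behaviour: whitespace-only text between elements is dropped.
    static String removed(String text) {
        return isWhitespaceOnly(text) ? "" : text;
    }

    // Assumed mixed-content behaviour: each run of whitespace becomes one space.
    static String normalised(String text) {
        StringBuilder out = new StringBuilder();
        boolean inRun = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isXmlWhitespace(c)) {
                if (!inRun) {
                    out.append(' ');
                }
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String between = " \n\t "; // e.g. the text between </bold> and <italic>
        System.out.println("[" + removed(between) + "]");    // prints "[]"
        System.out.println("[" + normalised(between) + "]"); // prints "[ ]"
    }
}
```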
NormalizeSpace should be used as the first filter in a pipeline to ensure that subsequent filters do not regard whitespace as significant. It should be applied to either both input files or neither of them. Applying it to a single input file may have an undesired effect.
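
The placement described above can be sketched with the standard SAX filter classes. Because NormalizeSpace itself is not available outside the DeltaXML library, the example uses a hypothetical stand-in, DropBlankText, that only drops whitespace-only text events; EventLog and run are likewise illustrative names. Each input gets its own filter instance, and the stand-in sits directly after the parser so that later filters never see the whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterPipelineSketch {

    // Hypothetical stand-in for NormalizeSpace: drops whitespace-only text events.
    static class DropBlankText extends XMLFilterImpl {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (int i = start; i < start + length; i++) {
                char c = ch[i];
                if (c != 0x20 && c != 0x9 && c != 0xA && c != 0xD) {
                    super.characters(ch, start, length); // real text: pass it on unchanged
                    return;
                }
            }
            // Whitespace-only chunk: swallowed, so later filters never see it.
        }
    }

    // Terminal handler that records the events reaching the end of the pipeline.
    static class EventLog extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }
    }

    // Builds parser -> DropBlankText -> (later filter) -> EventLog for one input.
    static String run(String xml) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DropBlankText first = new DropBlankText();      // a fresh instance per input
            first.setParent(parser);
            XMLFilterImpl later = new XMLFilterImpl(first); // placeholder for subsequent filters
            EventLog end = new EventLog();
            later.setContentHandler(end);
            later.parse(new InputSource(new StringReader(xml)));
            return end.log.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><p>one</p><p>two</p></doc>"));
        System.out.println(run("<doc>\n  <p>one</p>\n  <p>two</p>\n</doc>"));
    }
}
```

With the stand-in applied to both inputs, the compact and the indented document produce the same event stream. Applying it to only one of them would leave the indentation visible in the other.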
Note: This class has not been designed to be extended; to err on the side of caution, it has been declared final.
Constructor and Description |
---|
NormalizeSpace() Creates a new instance of NormalizeSpace. |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] ch, int start, int length) Overrides the default characters method. |
void | elementDecl(java.lang.String name, java.lang.String model) Overrides the default elementDecl method. |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) Overrides the default endElement method. |
void | ignorableWhitespace(char[] ch, int start, int length) Overrides the default ignorableWhitespace method. |
void | setnormalizeAttValues(java.lang.String value) Specifies whether to normalize attribute values. |
void | startDocument() Overrides the default startDocument method. |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) Overrides the default startElement method. |
void | startPrefixMapping(java.lang.String prefix, java.lang.String uri) Overrides the default startPrefixMapping method. |
attributeDecl, comment, endCDATA, endDTD, endEntity, externalEntityDecl, getProperty, internalEntityDecl, parse, parse, setProperty, startCDATA, startDTD, startEntity
endDocument, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, notationDecl, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, unparsedEntityDecl, warning
public NormalizeSpace()
Creates a new instance of NormalizeSpace. An instance cannot be shared amongst pipelines: it can only receive SAX event inputs from a single source and send events to a single output, so two instances of this class are typically required when used in conjunction with the DeltaXML XMLComparator.
public void setnormalizeAttValues(java.lang.String value)
Specifies whether to normalize attribute values.
Parameters:
value - whether to normalize attribute values
public void elementDecl(java.lang.String name, java.lang.String model) throws org.xml.sax.SAXException
Overrides the default elementDecl method.
Specified by:
elementDecl in interface org.xml.sax.ext.DeclHandler
Overrides:
elementDecl in class XMLFilterImpl2
Parameters:
name - the element type name
model - the content model as a normalized string
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
DeclHandler.elementDecl(String, String)
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
Overrides the default startPrefixMapping method.
Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
startPrefixMapping in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
prefix - the namespace prefix
uri - the namespace URI
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
See Also:
XMLFilterImpl.startPrefixMapping(String, String)
public void startDocument() throws org.xml.sax.SAXException
Overrides the default startDocument method. This method performs internal operations.
Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.XMLFilterImpl
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Overrides the default startElement method. This method performs internal operations.
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - the element's namespace URI
localName - the element's local name
qName - the element's qualified name
atts - the element's attributes
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
See Also:
XMLFilterImpl.startElement(String, String, String, Attributes)
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
Overrides the default endElement method. This method performs internal operations.
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - the element's namespace URI
localName - the element's local name
qName - the element's qualified name
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
See Also:
XMLFilterImpl.endElement(String, String, String)
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Overrides the default characters method. This version of the method removes whitespace from PCDATA within the XML file unless the xml:space attribute is set to preserve.
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
See Also:
XMLFilterImpl.characters(char[], int, int)
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
Overrides the default ignorableWhitespace method. When a DTD is present in the XML file, inter-element whitespace causes an ignorableWhitespace SAX event to occur rather than the normal characters SAX event. This method ensures that such space is removed should a DTD be present.
Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing.
See Also:
XMLFilterImpl.ignorableWhitespace(char[], int, int)
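
The reason both callbacks need handling can be observed with a small counting handler. This is an illustrative sketch using only the standard SAX API; the class name IgnorableWhitespaceSketch and its fields are hypothetical. Which callback receives inter-element whitespace when a DTD is present depends on the parser, so the example counts both rather than assuming one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts whitespace characters per SAX callback.
class IgnorableWhitespaceSketch extends DefaultHandler {
    int viaCharacters; // whitespace characters delivered through characters()
    int viaIgnorable;  // whitespace characters delivered through ignorableWhitespace()

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c == 0x20 || c == 0x9 || c == 0xA || c == 0xD) {
                viaCharacters++;
            }
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        viaIgnorable += length;
    }

    // Parses the given XML and returns the handler holding the counts.
    static IgnorableWhitespaceSketch count(String xml) {
        try {
            IgnorableWhitespaceSketch handler = new IgnorableWhitespaceSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<doc>\n<p>one</p>\n</doc>";
        String dtd = "<!DOCTYPE doc [<!ELEMENT doc (p)*><!ELEMENT p (#PCDATA)>]>\n";
        IgnorableWhitespaceSketch plain = count(body);
        IgnorableWhitespaceSketch withDtd = count(dtd + body);
        System.out.println("no DTD:   characters=" + plain.viaCharacters
                + " ignorable=" + plain.viaIgnorable);
        System.out.println("with DTD: characters=" + withDtd.viaCharacters
                + " ignorable=" + withDtd.viaIgnorable);
    }
}
```

Without a DTD the two newlines inside doc can only arrive through characters. With the DTD, a parser that uses the content model may deliver them through ignorableWhitespace instead, which is why a whitespace filter has to cover both methods.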