public final class WordInfilter62 extends XMLFilterImpl2
This is the infilter for the Word by Word pipeline, which is discussed in the Word by Word Text Comparison guide.
Wraps words in a <deltaxml:word> element, punctuation in a <deltaxml:punctuation> element and whitespace in a <deltaxml:space> element.
Example: the XML snippet
<p>sample of word by word text!</p>
would be converted to (pretty printed):
<p>
  <deltaxml:word>sample</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>of</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>word</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>by</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>word</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>text</deltaxml:word>
  <deltaxml:punctuation>!</deltaxml:punctuation>
</p>
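The wrapping shown in the example can be imitated on a plain string with a small standalone sketch. This is not the WordInfilter62 implementation: the class name WordWrapSketch and the method wrap are hypothetical, whitespace detection via Character.isWhitespace is an assumption, and so is the choice to emit each punctuation character as its own element. No XML escaping is performed.

```java
import java.util.Set;

/** Illustrative only: mimics word/space/punctuation wrapping on a plain string. */
final class WordWrapSketch {

    /** Classifies a character as 's' (space), 'p' (punctuation) or 'w' (word). */
    private static char kind(char c, Set<Character> punctuation) {
        if (Character.isWhitespace(c)) {
            return 's';
        }
        return punctuation.contains(c) ? 'p' : 'w';
    }

    /** Wraps each run of word or space characters, and each punctuation character. */
    static String wrap(String text, Set<Character> punctuation) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char k = kind(text.charAt(i), punctuation);
            int j = i + 1;
            // Word and whitespace runs are grouped; punctuation characters stand alone
            // (an assumption of this sketch).
            while (k != 'p' && j < text.length() && kind(text.charAt(j), punctuation) == k) {
                j++;
            }
            String name = k == 's' ? "space" : k == 'p' ? "punctuation" : "word";
            out.append("<deltaxml:").append(name).append('>')
               .append(text, i, j)
               .append("</deltaxml:").append(name).append('>');
            i = j;
        }
        return out.toString();
    }
}
```

Calling wrap with an empty punctuation set leaves characters such as ! inside the neighbouring word, which matches the documented default of an empty punctuation set.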
To prevent detailed comparison of a specific section of the XML file, the deltaxml:word-by-word attribute should be included in an element and set to false. This attribute applies to the complete subtree beneath that element unless overridden by another deltaxml:word-by-word attribute at a lower level.
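The subtree scoping described above can be modelled with a stack that is pushed on each start tag and popped on each end tag. The following is a minimal sketch under stated assumptions, not the filter's actual code: WordByWordScopeSketch and effectiveSettings are hypothetical names, the attribute is matched by its qualified name using a parser that is not namespace-aware, and word-by-word processing is assumed to be on at the root.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: tracks the effective deltaxml:word-by-word value per element. */
final class WordByWordScopeSketch extends DefaultHandler {
    private final Deque<Boolean> scope = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Inherit from the parent element; assume "true" when there is no parent.
        boolean inherited = scope.isEmpty() || scope.peek();
        String own = atts.getValue("deltaxml:word-by-word");
        boolean effective = own == null ? inherited : Boolean.parseBoolean(own);
        scope.push(effective);
        log.add(qName + "=" + effective);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Leaving the element restores the parent's setting.
        scope.pop();
    }

    /** Parses the XML and returns "element=effectiveValue" entries in document order. */
    static List<String> effectiveSettings(String xml) {
        WordByWordScopeSketch handler = new WordByWordScopeSketch();
        try {
            // The default factory is not namespace-aware, so qName is the raw tag name.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.log;
    }
}
```

A lower-level attribute simply pushes a different value, so it overrides the inherited one only until its own end tag is reached.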
The use of xml:space="preserve" on an element will also have the effect of preventing detailed comparison unless WordInfilter62.setProcessPreserveSpaceText(String) has been set to true.
To change the characters that should be treated as punctuation, the deltaxml:punctuation attribute should be included in an element with the punctuation characters in a space-separated list. This attribute applies to the complete subtree beneath that element unless overridden by another deltaxml:punctuation attribute at a lower level. If no deltaxml:punctuation attribute is found in the input then punctuation will not be wrapped in elements; i.e. the default set of punctuation characters is the empty set unless overridden. A suggested set of characters could be included with this attribute: deltaxml:punctuation=". , ; : ! ? ( ) ' " \\ /".
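To make the space-separated format concrete, here is a minimal sketch of turning such an attribute value into a set of characters. The class PunctuationAttributeSketch and its parse method are hypothetical, and how the real filter treats tokens longer than one character is not documented here; this sketch simply ignores them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: parses a space-separated punctuation list into characters. */
final class PunctuationAttributeSketch {

    static Set<Character> parse(String attributeValue) {
        Set<Character> chars = new LinkedHashSet<>();
        if (attributeValue == null) {
            // No attribute present: the punctuation set stays empty.
            return chars;
        }
        for (String token : attributeValue.trim().split("\\s+")) {
            // Assumption of this sketch: only single-character tokens are accepted.
            if (token.length() == 1) {
                chars.add(token.charAt(0));
            }
        }
        return chars;
    }
}
```

An absent attribute yields an empty set, mirroring the documented default in which no punctuation is wrapped.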
WordInfilter should always be used in conjunction with WordSpaceFixup62 and WordOutfilter62. It is designed to be used as a pre-filter and should be placed on both XMLComparator input pipelines.
Note: As of XML Compare 5.0, elements marked as formatting with @deltaxml:format='true' are handled by separate XSLT in/outfilters: dx2-format-infilter.xsl (which should be included in the pipeline before this filter) and dx2-format-outfilter.xsl.
Note: This class has not been designed to be extended, therefore to err on the side of caution, it has been declared final.
See Also: WordSpaceFixup62, WordOutfilter62
Constructor and Description |
---|
WordInfilter62(): Creates a new WordInfilter. |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] ch, int start, int length): Overrides the default characters method. |
void | endDocument(): Overrides the default endDocument method. |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName): Overrides the default endElement method. |
boolean | getisCharacterByCharacter(): Returns the current value of isCharacterByCharacter. |
void | ignorableWhitespace(char[] ch, int start, int length): Overrides the default ignorableWhitespace method. |
boolean | isProcessPreserveSpaceText(): Returns the current value of processPreserveSpaceText. |
void | setisCharacterByCharacter(boolean cbc): Used to control whether <deltaxml:word> elements contain a single character or a word. |
void | setProcessPreserveSpaceText(java.lang.String value): Used to control whether text under an element with xml:space="preserve" is split into words. |
void | startDocument(): Overrides the default startDocument method. |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts): Overrides the default startElement method. |
void | startPrefixMapping(java.lang.String prefix, java.lang.String uri): Overrides the default startPrefixMapping method. |
Methods inherited from class XMLFilterImpl2: attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, getProperty, internalEntityDecl, parse, parse, setProperty, startCDATA, startDTD, startEntity
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl: endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, notationDecl, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, unparsedEntityDecl, warning
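public void setisCharacterByCharacter(boolean cbc)
Used to control whether <deltaxml:word> elements contain a single character or a word.
Parameters: cbc - sets character-by-character mode when true
public boolean getisCharacterByCharacter()
Returns the current value of isCharacterByCharacter.
See Also: WordInfilter62.setisCharacterByCharacter(boolean)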
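The difference between the two modes can be pictured with a short sketch that lists the text content each <deltaxml:word> element would carry for a single word. CharacterModeSketch and wordContents are hypothetical names, and iterating by Unicode code point (so that a surrogate pair stays together) is an assumption of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: contrasts word mode with character-by-character mode. */
final class CharacterModeSketch {

    /** Returns the text content of each word element produced for one word. */
    static List<String> wordContents(String word, boolean characterByCharacter) {
        List<String> contents = new ArrayList<>();
        if (!characterByCharacter) {
            // Word mode: the whole word goes into a single element.
            contents.add(word);
            return contents;
        }
        // Character mode: one element per code point (assumption of this sketch).
        word.codePoints().forEach(cp -> contents.add(new String(Character.toChars(cp))));
        return contents;
    }
}
```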
public void setProcessPreserveSpaceText(java.lang.String value)
Used to control whether text under an element with xml:space="preserve" is split into words.
For historical reasons, any text in the subtree of an element that carries the xml:space="preserve" attribute is not split into words by default. To enable word-by-word processing of such text, pass a value of true to this method (as a string, since the parameter is a java.lang.String).
Parameters: value - whether to split preserve-space text into words or not
public boolean isProcessPreserveSpaceText()
Returns the current value of processPreserveSpaceText.
See Also: WordInfilter62.setProcessPreserveSpaceText(String)
public void endDocument() throws org.xml.sax.SAXException
Overrides the default endDocument method.
Specified by: endDocument in interface org.xml.sax.ContentHandler
Overrides: endDocument in class org.xml.sax.helpers.XMLFilterImpl
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.endDocument()
public void startDocument() throws org.xml.sax.SAXException
Overrides the default startDocument method.
Specified by: startDocument in interface org.xml.sax.ContentHandler
Overrides: startDocument in class org.xml.sax.helpers.XMLFilterImpl
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.startDocument()
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
Overrides the default startPrefixMapping method. This version of the method performs internal operations.
Specified by: startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides: startPrefixMapping in class org.xml.sax.helpers.XMLFilterImpl
Parameters: prefix - the namespace prefix
uri - the namespace URI
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.startPrefixMapping(String, String)
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Overrides the default startElement method. This method processes xml:space and deltaxml:word-by-word attributes.
Specified by: startElement in interface org.xml.sax.ContentHandler
Overrides: startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters: uri - the URI of the element's namespace
localName - the element's local name
qName - the element's qualified name
atts - the element's attributes
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.startElement(String, String, String, Attributes)
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
Overrides the default endElement method. This version of the method performs internal operations.
Specified by: endElement in interface org.xml.sax.ContentHandler
Overrides: endElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters: uri - the URI of the element's namespace
localName - the element's local name
qName - the element's qualified name
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.endElement(String, String, String)
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Overrides the default characters method. This version of the method stores the characters so that they can be output at a later stage.
Specified by: characters in interface org.xml.sax.ContentHandler
Overrides: characters in class org.xml.sax.helpers.XMLFilterImpl
Parameters: ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.characters(char[], int, int)
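Storing characters for later output is a common SAX pattern, because a parser may deliver one run of text in several characters calls. The sketch below shows the general idea with the standard org.xml.sax.helpers.XMLFilterImpl; it is not the source of WordInfilter62. BufferingFilterSketch, flush and demo are hypothetical names, and flushing at element boundaries is an assumption about when the stored text might be emitted.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Illustrative only: buffers text and forwards it at the next element boundary. */
final class BufferingFilterSketch extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Store the text instead of passing it downstream immediately.
        pending.append(ch, start, length);
    }

    /** Forwards any stored text as a single characters event. */
    private void flush() throws SAXException {
        if (pending.length() > 0) {
            char[] text = pending.toString().toCharArray();
            pending.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    /** Feeds two text chunks through the filter and returns what arrives downstream. */
    static List<String> demo() {
        List<String> chunks = new ArrayList<>();
        BufferingFilterSketch filter = new BufferingFilterSketch();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chunks.add(new String(ch, start, length));
            }
        });
        char[] first = "word by ".toCharArray();
        char[] second = "word".toCharArray();
        try {
            filter.startElement("", "p", "p", new AttributesImpl());
            filter.characters(first, 0, first.length);
            filter.characters(second, 0, second.length);
            filter.endElement("", "p", "p");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return chunks;
    }
}
```

Once the full text run is available in one buffer, it can be split into words, spaces and punctuation without a word being cut at a chunk boundary.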
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
Overrides the default ignorableWhitespace method. This version of the method performs internal operations.
Specified by: ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides: ignorableWhitespace in class org.xml.sax.helpers.XMLFilterImpl
Parameters: ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws: org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also: XMLFilterImpl.ignorableWhitespace(char[], int, int)