public final class WordInfilter62 extends XMLFilterImpl2
This is the infilter for the Word by Word pipeline, which is discussed in the Word by Word Text Comparison guide.
Wraps words in a <deltaxml:word> element, punctuation in a <deltaxml:punctuation> element
and whitespace in a <deltaxml:space> element.
Example: the XML snippet

<p>sample of word by word text!</p>

would be converted to the following (pretty-printed, and assuming "!" has been declared as punctuation):

<p>
  <deltaxml:word>sample</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>of</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>word</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>by</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>word</deltaxml:word>
  <deltaxml:space> </deltaxml:space>
  <deltaxml:word>text</deltaxml:word>
  <deltaxml:punctuation>!</deltaxml:punctuation>
</p>
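The conversion in the example can be pictured as a tokenizer that splits text into runs of word characters, runs of whitespace, and single punctuation characters. The Java sketch below is illustrative only: it is not WordInfilter62's implementation, it works on a plain String rather than on SAX events, the class and method names are hypothetical, and it assumes Character.isWhitespace is a fair stand-in for what counts as a space. As in the example, "!" is passed in as a declared punctuation character; with the default empty set, "text!" would stay a single word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only, NOT WordInfilter62's implementation: it splits a plain
// String (not SAX events) into word, space and punctuation tokens and builds the
// wrapper markup. No XML escaping is done.
class WordTokenizerSketch {

    enum Kind { WORD, SPACE, PUNCTUATION }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Whitespace runs and word runs are each kept together; every character in the
    // punctuation set becomes a token of its own.
    static List<Token> tokenize(String text, Set<Character> punctuation) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Kind currentKind = null;
        for (char c : text.toCharArray()) {
            Kind kind = Character.isWhitespace(c) ? Kind.SPACE
                    : punctuation.contains(c) ? Kind.PUNCTUATION
                    : Kind.WORD;
            boolean boundary = currentKind != null
                    && (kind != currentKind || kind == Kind.PUNCTUATION);
            if (boundary) {
                tokens.add(new Token(currentKind, current.toString()));
                current.setLength(0);
            }
            current.append(c);
            currentKind = kind;
        }
        if (currentKind != null) {
            tokens.add(new Token(currentKind, current.toString()));
        }
        return tokens;
    }

    static String elementName(Kind kind) {
        switch (kind) {
            case SPACE: return "deltaxml:space";
            case PUNCTUATION: return "deltaxml:punctuation";
            default: return "deltaxml:word";
        }
    }

    // Wraps each token in the element matching its kind.
    static String toMarkup(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            String name = elementName(t.kind);
            out.append('<').append(name).append('>').append(t.text)
               .append("</").append(name).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumes '!' has been declared as punctuation, as in the example.
        System.out.println(toMarkup(tokenize("sample of word by word text!", Set.of('!'))));
    }
}
```

Keeping punctuation as one token per character, while merging whitespace and word characters into runs, mirrors the shape of the example output.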
To prevent detailed comparison of a specific section of the XML file, add the deltaxml:word-by-word attribute to an
element and set it to false. This attribute applies to the complete subtree beneath that element unless overridden by
another deltaxml:word-by-word attribute at a lower level.
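Subtree-scoped settings such as deltaxml:word-by-word are commonly tracked in a SAX filter with a stack holding one entry per open element. The sketch below shows that pattern with the JDK's org.xml.sax.helpers.XMLFilterImpl. It only logs the effective setting for each element and does not split any text. The class name is hypothetical, the attribute value is read with Boolean.parseBoolean (an assumption), and the namespace URI bound to the deltaxml prefix is an assumption to check against the product documentation.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of subtree-scoped deltaxml:word-by-word handling.
// NOT WordInfilter62's implementation; it only logs the effective setting.
class WordByWordScopeSketch extends XMLFilterImpl {
    // ASSUMPTION: namespace URI bound to the deltaxml prefix; check the product documentation.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // One entry per open element: is word-by-word processing enabled in this subtree?
    private final Deque<Boolean> scope = new ArrayDeque<>();
    // "localName=flag" for every element, so the inherited value can be inspected.
    final List<String> log = new ArrayList<>();

    @Override
    public void startDocument() throws SAXException {
        scope.clear();
        scope.push(Boolean.TRUE); // enabled unless an ancestor switches it off
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String value = atts.getValue(DELTAXML_NS, "word-by-word");
        // A local attribute overrides the inherited setting for this element's subtree.
        boolean enabled = (value != null) ? Boolean.parseBoolean(value.trim()) : scope.peek();
        scope.push(enabled);
        log.add(localName + "=" + enabled);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        scope.pop(); // leaving the subtree restores the parent's setting
        super.endElement(uri, localName, qName);
    }

    // Parses the XML through a fresh filter and returns its log.
    static List<String> run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            WordByWordScopeSketch filter = new WordByWordScopeSketch();
            filter.setParent(reader);
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc xmlns:deltaxml='" + DELTAXML_NS + "'>"
                + "<p>compared word by word</p>"
                + "<note deltaxml:word-by-word='false'><p>compared as a whole</p>"
                + "<p deltaxml:word-by-word='true'>re-enabled</p></note>"
                + "</doc>";
        System.out.println(run(xml));
    }
}
```

Popping the stack in endElement is what makes the setting end with the subtree, so a lower-level attribute can override it without affecting siblings.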
Using xml:space="preserve" on an element also prevents detailed comparison of text in its subtree, unless
WordInfilter62.setProcessPreserveSpaceText(String) has been called with the value "true".
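The interaction between xml:space="preserve" and setProcessPreserveSpaceText(String) reduces to one rule: text is split into words unless it lies under xml:space="preserve" and the setting has not been switched on. A minimal sketch of that rule follows. It assumes the String argument is interpreted with Boolean.parseBoolean and that xml:space="default" on a descendant ends the preserve scope, as the XML 1.0 specification defines for xml:space. The class is hypothetical and ignores deltaxml:word-by-word.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of how xml:space="preserve" and the processPreserveSpaceText
// setting could combine; NOT WordInfilter62's implementation.
class PreserveSpaceDecisionSketch {
    private final boolean processPreserveSpaceText;
    // One entry per open element: is xml:space="preserve" in effect?
    private final Deque<Boolean> preserveScope = new ArrayDeque<>();

    // ASSUMPTION: the String passed to setProcessPreserveSpaceText is read as a boolean.
    PreserveSpaceDecisionSketch(String processPreserveSpaceTextValue) {
        this.processPreserveSpaceText = Boolean.parseBoolean(processPreserveSpaceTextValue);
        preserveScope.push(Boolean.FALSE); // document level: no preserve
    }

    // Call on each start tag with the element's xml:space value, or null if it has none.
    void enter(String xmlSpace) {
        boolean preserve;
        if ("preserve".equals(xmlSpace)) {
            preserve = true;
        } else if ("default".equals(xmlSpace)) {
            preserve = false;
        } else {
            preserve = preserveScope.peek(); // inherited from the parent element
        }
        preserveScope.push(preserve);
    }

    // Call on each end tag.
    void exit() {
        preserveScope.pop();
    }

    // Split text into words unless it is under xml:space="preserve" and
    // processing of such text has not been switched on.
    boolean splitCurrentText() {
        return !preserveScope.peek() || processPreserveSpaceText;
    }

    public static void main(String[] args) {
        PreserveSpaceDecisionSketch d = new PreserveSpaceDecisionSketch("false");
        d.enter(null);
        System.out.println("plain element: " + d.splitCurrentText());
        d.enter("preserve");
        System.out.println("under preserve: " + d.splitCurrentText());
    }
}
```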
To change the characters that are treated as punctuation, include the deltaxml:punctuation attribute on an element,
listing the punctuation characters separated by spaces. This attribute applies to the complete subtree beneath that
element unless overridden by another deltaxml:punctuation attribute at a lower level. If no deltaxml:punctuation
attribute is found in the input, punctuation is not wrapped in elements; that is, the default set of punctuation
characters is empty unless overridden. One possible set of characters for this attribute is
deltaxml:punctuation=". , ; : ! ? ( ) ' &quot; \\ /" (inside a double-quoted attribute value, the double quote must be written as &quot;).
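The scoping rule for deltaxml:punctuation (an empty default, a space-separated list on an element, and overriding at lower levels) can be sketched with a stack of character sets. This is a hypothetical helper, not part of the WordInfilter62 API. It assumes that a lower-level attribute replaces the inherited set rather than adding to it, which is one reading of "overridden".

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of deltaxml:punctuation scoping; NOT WordInfilter62's code.
class PunctuationScopeSketch {
    private final Deque<Set<Character>> scope = new ArrayDeque<>();

    PunctuationScopeSketch() {
        // Default: no characters are treated as punctuation.
        scope.push(Collections.emptySet());
    }

    // Parses a space-separated list such as ". , ; : ! ?" into a set of characters.
    static Set<Character> parse(String attributeValue) {
        Set<Character> chars = new HashSet<>();
        for (String token : attributeValue.trim().split("\\s+")) {
            for (char c : token.toCharArray()) {
                chars.add(c);
            }
        }
        return Collections.unmodifiableSet(chars);
    }

    // Call on each start tag with the element's deltaxml:punctuation value, or null.
    void enter(String punctuationAttr) {
        // ASSUMPTION: a local attribute replaces (rather than extends) the inherited set.
        scope.push(punctuationAttr != null ? parse(punctuationAttr) : scope.peek());
    }

    // Call on each end tag; restores the parent's set.
    void exit() {
        scope.pop();
    }

    boolean isPunctuation(char c) {
        return scope.peek().contains(c);
    }

    public static void main(String[] args) {
        PunctuationScopeSketch s = new PunctuationScopeSketch();
        s.enter(". , ; : ! ?");
        System.out.println("'!' is punctuation: " + s.isPunctuation('!'));
    }
}
```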
WordInfilter62 should always be used together with WordSpaceFixup62 and WordOutfilter62. It is designed to be used
as a pre-filter and should be placed on both XMLComparator input pipelines.
Note: As of XML Compare 5.0, elements marked as formatting with @deltaxml:format='true' are handled by the separate
XSLT input and output filters dx2-format-infilter.xsl (which should be placed in the pipeline before this filter) and
dx2-format-outfilter.xsl.
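The ordering constraints above can be illustrated with plain JAXP/SAX chaining: an XSLT step obtained from SAXTransformerFactory.newXMLFilter reads directly from the parser, and a SAX filter reads the XSLT output. In this sketch an inline stylesheet stands in for dx2-format-infilter.xsl and a bare XMLFilterImpl stands in for WordInfilter62; neither is the real component, and how XMLComparator actually accepts its input filters is not shown. A fresh chain is built for each input because a filter that buffers text (as WordInfilter62's characters method does) holds per-document state, so sharing one instance between the two inputs seems unwise.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of one input pipeline: parser -> XSLT stage -> SAX filter -> handler.
// The stylesheet and the plain XMLFilterImpl are stand-ins, not the DeltaXML components.
class InputPipelineSketch {
    // Stand-in for an XSLT step such as dx2-format-infilter.xsl: copies the input, renaming <b> to <bold>.
    static final String STAND_IN_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='b'><bold><xsl:apply-templates/></bold></xsl:template>"
        + "</xsl:stylesheet>";

    // Builds a fresh chain for one input and returns the element names reaching its end.
    static List<String> runInput(String xml) {
        try {
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader parser = parserFactory.newSAXParser().getXMLReader();

            // XSLT stage first, reading directly from the parser.
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            XMLFilter xsltStage = tf.newXMLFilter(new StreamSource(new StringReader(STAND_IN_XSL)));
            xsltStage.setParent(parser);

            // SAX filter second (stand-in for the word infilter), reading the XSLT output.
            XMLFilterImpl wordStage = new XMLFilterImpl();
            wordStage.setParent(xsltStage);

            List<String> seen = new ArrayList<>();
            wordStage.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add(localName == null || localName.isEmpty() ? qName : localName);
                }
            });
            wordStage.parse(new InputSource(new StringReader(xml)));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One independent chain per input, mirroring "both input pipelines".
        System.out.println(runInput("<p>one <b>two</b></p>"));
        System.out.println(runInput("<p>three</p>"));
    }
}
```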
Note: This class has not been designed to be extended, so, to err on the side of caution, it has been declared final.
See Also:
WordSpaceFixup62, WordOutfilter62

| Constructor | Description |
|---|---|
| WordInfilter62() | Creates a new WordInfilter. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | characters(char[] ch, int start, int length) | Overrides the default characters method. |
| void | endDocument() | Overrides the default endDocument method. |
| void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) | Overrides the default endElement method. |
| boolean | getisCharacterByCharacter() | Returns the current value of isCharacterByCharacter. |
| void | ignorableWhitespace(char[] ch, int start, int length) | Overrides the default ignorableWhitespace method. |
| boolean | isProcessPreserveSpaceText() | Returns the current value of processPreserveSpaceText. |
| void | setisCharacterByCharacter(boolean cbc) | Used to control whether <deltaxml:word> elements contain a single character or a word. |
| void | setProcessPreserveSpaceText(java.lang.String value) | Used to control whether text under an element with xml:space="preserve" is split into words. |
| void | startDocument() | Overrides the default startDocument method. |
| void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) | Overrides the default startElement method. |
| void | startPrefixMapping(java.lang.String prefix, java.lang.String uri) | Overrides the default startPrefixMapping method. |
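The setisCharacterByCharacter(boolean) entry in the table above switches the content of <deltaxml:word> elements between whole words and single characters. A hypothetical sketch of that difference on plain text follows. It ignores space and punctuation elements, and it assumes "character" means a Unicode code point rather than a Java char.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of word vs character granularity; NOT WordInfilter62's code.
// Returns only the contents of would-be <deltaxml:word> elements; spaces and
// punctuation handling are deliberately left out.
class CharacterModeSketch {
    static List<String> wordUnits(String text, boolean characterByCharacter) {
        List<String> units = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            if (characterByCharacter) {
                // One unit per Unicode code point, so surrogate pairs stay together.
                word.codePoints().forEach(cp -> units.add(new String(Character.toChars(cp))));
            } else {
                units.add(word);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(wordUnits("word by word", false));
        System.out.println(wordUnits("word by word", true));
    }
}
```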
Methods inherited from class XMLFilterImpl2:
attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, getProperty, internalEntityDecl, parse, parse, setProperty, startCDATA, startDTD, startEntity

Methods inherited from class org.xml.sax.helpers.XMLFilterImpl:
endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, notationDecl, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, unparsedEntityDecl, warning

public void setisCharacterByCharacter(boolean cbc)
Used to control whether <deltaxml:word> elements contain a single character or a word.
Parameters:
cbc - sets character-by-character mode when true

public boolean getisCharacterByCharacter()
Returns the current value of isCharacterByCharacter.
See Also:
WordInfilter62.setisCharacterByCharacter(boolean)

public void setProcessPreserveSpaceText(java.lang.String value)
Used to control whether text under an element with xml:space="preserve" is split into words.
For historical reasons, text anywhere in the subtree of an element carrying the xml:space="preserve" attribute
is not split into words by default. To enable word-by-word processing of such text, pass the value "true" to this
method.
Parameters:
value - whether to split preserve-space text into words ("true") or not

public boolean isProcessPreserveSpaceText()
Returns the current value of processPreserveSpaceText.
See Also:
WordInfilter62.setProcessPreserveSpaceText(String)

public void endDocument()
throws org.xml.sax.SAXException
Overrides the default endDocument method.
Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.XMLFilterImpl
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.endDocument()

public void startDocument()
throws org.xml.sax.SAXException
Overrides the default startDocument method.
Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.XMLFilterImpl
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.startDocument()

public void startPrefixMapping(java.lang.String prefix,
java.lang.String uri)
throws org.xml.sax.SAXException
Overrides the default startPrefixMapping method. This version of the method performs internal operations.
Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
startPrefixMapping in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
prefix - the namespace prefix
uri - the namespace URI
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.startPrefixMapping(String, String)

public void startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName,
org.xml.sax.Attributes atts)
throws org.xml.sax.SAXException
Overrides the default startElement method. This method processes the xml:space and
deltaxml:word-by-word attributes.
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - the URI of the element's namespace
localName - the element's local name
qName - the element's qualified name
atts - the element's attributes
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.startElement(String, String, String, Attributes)

public void endElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName)
throws org.xml.sax.SAXException
Overrides the default endElement method. This version of the method performs internal operations.
Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - the URI of the element's namespace
localName - the element's local name
qName - the element's qualified name
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.endElement(String, String, String)

public void characters(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Overrides the default characters method. This version of the method stores the characters so that they can be
output at a later stage.
Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.characters(char[], int, int)

public void ignorableWhitespace(char[] ch,
int start,
int length)
throws org.xml.sax.SAXException
Overrides the default ignorableWhitespace method. This version of the method performs internal operations.
Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - an array of characters
start - the starting position in the array
length - the number of characters to use from the array
Throws:
org.xml.sax.SAXException - the superclass may throw an exception during processing
See Also:
XMLFilterImpl.ignorableWhitespace(char[], int, int)
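The characters method is documented only as storing text so that it can be output later. One common reason for this in SAX filters is that a parser may report a single text node through several characters() calls (for example around an entity reference such as &amp;), so splitting each chunk into words separately could cut a word in two. A minimal sketch of that buffering pattern follows, assuming text is flushed at element boundaries and at the end of the document. It is not WordInfilter62's code, the class name is hypothetical, and it does no word splitting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of deferring text until an element boundary; NOT WordInfilter62's code.
class TextBufferingFilterSketch extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks; collect them first.
        pending.append(ch, start, length);
    }

    // Forwards the collected text downstream as a single characters() call.
    private void flush() throws SAXException {
        if (pending.length() > 0) {
            char[] text = pending.toString().toCharArray();
            pending.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }

    // Parses xml through the filter and returns the text chunks seen downstream.
    static List<String> chunksSeenDownstream(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            TextBufferingFilterSketch filter = new TextBufferingFilterSketch();
            filter.setParent(reader);
            List<String> chunks = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    chunks.add(new String(ch, start, length));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return chunks;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(chunksSeenDownstream("<p>a &amp; b<i>c</i></p>"));
    }
}
```

Whatever chunking the parser chooses, the downstream handler sees each text run between element boundaries exactly once.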