public class WordSpaceFixup62 extends WordBufferingFilter
This fixup outfilter is part of the Word by Word pipeline, which is discussed in the Word by Word Text Comparison guide.
This SAX filter changes the delta attribute of unchanged space elements that lie between modified word or punctuation elements, so that each such space is marked as modified. Use this filter before WordOutfilter62 to produce a less verbose result.
Example
If comparing "<p>hello world</p>" to "<p>goodbye universe</p>" your delta will look like:
<p deltaxml:deltaV2='A!=B'>
  <deltaxml:word deltaxml:deltaV2='A!=B'>
    <deltaxml:textGroup deltaxml:deltaV2='A!=B'>
      <deltaxml:text deltaxml:deltaV2='A'>hello</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2='B'>goodbye</deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:word>
  <deltaxml:space deltaxml:deltaV2='A=B'> </deltaxml:space>
  <deltaxml:word deltaxml:deltaV2='A!=B'>
    <deltaxml:textGroup deltaxml:deltaV2='A!=B'>
      <deltaxml:text deltaxml:deltaV2='A'>world</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2='B'>universe</deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:word>
</p>
If you were to run WordOutfilter62 on this delta, the result would be two deltaxml:textGroup elements with a space in the middle. If the delta is first processed so that the space is marked as modified, WordOutfilter62 will produce a single deltaxml:textGroup. After running through WordSpaceFixup62 the above delta will become:
<p deltaxml:deltaV2='A!=B'>
  <deltaxml:word deltaxml:deltaV2='A!=B'>
    <deltaxml:textGroup deltaxml:deltaV2='A!=B'>
      <deltaxml:text deltaxml:deltaV2='A'>hello</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2='B'>goodbye</deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:word>
  <deltaxml:space deltaxml:deltaV2='A!=B'>
    <deltaxml:textGroup deltaxml:deltaV2='A!=B'>
      <deltaxml:text deltaxml:deltaV2='A'> </deltaxml:text>
      <deltaxml:text deltaxml:deltaV2='B'> </deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:space>
  <deltaxml:word deltaxml:deltaV2='A!=B'>
    <deltaxml:textGroup deltaxml:deltaV2='A!=B'>
      <deltaxml:text deltaxml:deltaV2='A'>world</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2='B'>universe</deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:word>
</p>
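The rewrite shown in this example can be approximated with a small DOM-based sketch. This is only an illustration of the rule described above and not the filter's actual implementation, which is a streaming SAX filter. The class name SpaceFixupSketch, its helper methods, and the placeholder namespace URI are all hypothetical. As a simplification, the sketch treats any modified sibling element as a modified word or punctuation element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of the DeltaXML API.
class SpaceFixupSketch {
    static final String DELTA = "deltaxml:deltaV2";

    // The delta from the example; the namespace URI is a placeholder (assumption).
    static final String SAMPLE =
        "<p xmlns:deltaxml='urn:example:placeholder' deltaxml:deltaV2='A!=B'>"
        + "<deltaxml:word deltaxml:deltaV2='A!=B'><deltaxml:textGroup deltaxml:deltaV2='A!=B'>"
        + "<deltaxml:text deltaxml:deltaV2='A'>hello</deltaxml:text>"
        + "<deltaxml:text deltaxml:deltaV2='B'>goodbye</deltaxml:text>"
        + "</deltaxml:textGroup></deltaxml:word>"
        + "<deltaxml:space deltaxml:deltaV2='A=B'> </deltaxml:space>"
        + "<deltaxml:word deltaxml:deltaV2='A!=B'><deltaxml:textGroup deltaxml:deltaV2='A!=B'>"
        + "<deltaxml:text deltaxml:deltaV2='A'>world</deltaxml:text>"
        + "<deltaxml:text deltaxml:deltaV2='B'>universe</deltaxml:text>"
        + "</deltaxml:textGroup></deltaxml:word></p>";

    // Parses with the default (non-namespace-aware) builder, so elements are
    // matched by their qualified names such as "deltaxml:space".
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the nearest element sibling in the given direction, or null.
    static Element siblingElement(Node n, boolean forward) {
        Node s = forward ? n.getNextSibling() : n.getPreviousSibling();
        while (s != null && s.getNodeType() != Node.ELEMENT_NODE) {
            s = forward ? s.getNextSibling() : s.getPreviousSibling();
        }
        return (Element) s;
    }

    static boolean isModified(Element e) {
        return e != null && "A!=B".equals(e.getAttribute(DELTA));
    }

    static Element text(Document doc, String version, String value) {
        Element t = doc.createElement("deltaxml:text");
        t.setAttribute(DELTA, version);
        t.setTextContent(value);
        return t;
    }

    // Marks each unchanged space that sits between two modified siblings as
    // modified, wrapping its content in a textGroup with A and B versions.
    // Returns the number of spaces rewritten.
    static int fixSpaces(Document doc) {
        NodeList spaces = doc.getElementsByTagName("deltaxml:space");
        int count = 0;
        for (int i = 0; i < spaces.getLength(); i++) {
            Element space = (Element) spaces.item(i);
            if (!"A=B".equals(space.getAttribute(DELTA))) continue;
            if (!isModified(siblingElement(space, false))
                    || !isModified(siblingElement(space, true))) continue;
            String ws = space.getTextContent();
            while (space.hasChildNodes()) space.removeChild(space.getFirstChild());
            Element group = doc.createElement("deltaxml:textGroup");
            group.setAttribute(DELTA, "A!=B");
            group.appendChild(text(doc, "A", ws));
            group.appendChild(text(doc, "B", ws));
            space.appendChild(group);
            space.setAttribute(DELTA, "A!=B");
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("spaces fixed: " + fixSpaces(doc));
    }
}
```

Running main on the sample delta rewrites the single space element. A space whose neighbours are unchanged would be left as it is.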
Filter Requirements
WordSpaceFixup62 is an outfilter that should always be run in a Word by Word pipeline before WordOutfilter62.
See Also: WordInfilter62, WordOutfilter62
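The ordering requirement can be pictured with plain SAX filter chaining. The sketch below uses only standard JDK classes, with two stand-in filters in place of WordSpaceFixup62 and WordOutfilter62. The class FilterOrderSketch and the NamedFilter helper are hypothetical, and the sketch does not show how a Word by Word pipeline is actually configured in the product. It shows only that a filter sees events before any filter that names it as parent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical illustration only; the filters here are stand-ins.
class FilterOrderSketch {

    // A pass-through filter that records when it first receives events.
    static class NamedFilter extends XMLFilterImpl {
        private final String name;
        private final List<String> log;

        NamedFilter(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void startDocument() throws SAXException {
            log.add(name);
            super.startDocument(); // forward the event downstream
        }
    }

    // Chains reader -> fixup stand-in -> outfilter stand-in and returns the
    // order in which the filters received the document.
    static List<String> run(String xml) {
        List<String> log = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            NamedFilter fixup = new NamedFilter("space-fixup", log);
            fixup.setParent(reader);      // the fixup reads from the source
            NamedFilter outfilter = new NamedFilter("word-outfilter", log);
            outfilter.setParent(fixup);   // the outfilter reads from the fixup
            outfilter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("<p>hello world</p>")); // prints [space-fixup, word-outfilter]
    }
}
```

Because each filter's parent is the stage upstream of it, placing the fixup stage as the parent of the outfilter stage is what makes the space elements already marked as modified by the time the outfilter sees them.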
| Constructor and Description |
|---|
| WordSpaceFixup62(): Creates a new WordSpaceFixup62. |
characters, endElement, isBufferModifiedOnly, setwrapUnchangedText, startElement, startPrefixMapping
endDocument, getClosestAttributeValueFromAncestor, getClosestAttributeValueFromAnyAncestor, getGrandParentLocalName, getParentLocalName, getParentQName, getProperty, hasAncestor, hasAncestor, hasAncestorWithAttr, hasAncestorWithAttrValue, hasAncestorWithAttrValues, outputCharacters, outputEndElement, outputStartElement, popAncestorStack, pushAncestorStack, setProperty, stackDepth, startDocument
attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, internalEntityDecl, parse, parse, startCDATA, startDTD, startEntity
endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, unparsedEntityDecl, warning