Class HtmlDocumentBuilder
By default, when constructed without arguments, this parser coerces
XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This
corresponds to ALTER_INFOSET as the general XML violation policy.
To make the parser support non-conforming HTML fully per the HTML 5 spec,
at the cost of potentially violating the SAX2 API contract, set the
general XML violation policy to ALLOW.
The ALLOW policy does not work with a standard DOM implementation.
It is possible to treat XML 1.0 infoset violations as fatal by setting
the general XML violation policy to FATAL.
The doctype is not represented in the tree.
The document mode is represented as a DocumentMode object stored as
user data on the document node with the key
nu.validator.document-mode.
The form pointer is also stored as user data with the key
nu.validator.form-pointer.
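The user data mentioned above is read through the DOM Level 3 `Node.getUserData` method. The following is a minimal sketch of the lookup side only. It uses the JDK's default DOM implementation instead of this parser, and it stores a plain string as a stand-in for the real DocumentMode object; both are assumptions made so that the sketch runs without the library. Only the two key names come from the description above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Sketch only: simulates the user data that the builder is described as
// attaching, so that the lookup can be shown without the parser itself.
class UserDataSketch {
    static final String MODE_KEY = "nu.validator.document-mode";
    static final String FORM_KEY = "nu.validator.form-pointer";

    // Creates an empty document with the JDK's default DOM implementation.
    static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the user data stored under the key, or null if there is none.
    static Object userData(Document doc, String key) {
        return doc.getUserData(key);
    }

    public static void main(String[] args) {
        Document doc = newEmptyDocument();
        // Stand-in value (assumption): the real builder stores a DocumentMode object.
        doc.setUserData(MODE_KEY, "QUIRKS_MODE", null);
        System.out.println(MODE_KEY + " = " + userData(doc, MODE_KEY));
        System.out.println(FORM_KEY + " = " + userData(doc, FORM_KEY));
    }
}
```

With a document produced by this builder, the same `getUserData` call would presumably return the objects described above instead of the stand-in string.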
- Author:
- hsivonen
-
Constructor Summary
Constructors
- HtmlDocumentBuilder(): Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy.
- HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy): Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
- HtmlDocumentBuilder(DOMImplementation implementation): Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy.
- HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy): Instantiates the document builder with a specific DOM implementation and XML violation policy.
-
Method Summary
Modifier and Type, Method, Description
- void addCharacterHandler(CharacterHandler characterHandler)
- XmlViolationPolicy getBogusXmlnsPolicy(): Deprecated.
- XmlViolationPolicy getCommentPolicy(): Returns the commentPolicy.
- XmlViolationPolicy getContentNonXmlCharPolicy(): Returns the contentNonXmlCharPolicy.
- XmlViolationPolicy getContentSpacePolicy(): Returns the contentSpacePolicy.
- Locator getDocumentLocator(): Returns the Locator during parse.
- DocumentModeHandler getDocumentModeHandler(): Returns the document mode handler.
- DOMImplementation getDOMImplementation(): Returns the DOM implementation.
- Heuristics getHeuristics()
- XmlViolationPolicy getNamePolicy(): The policy for non-NCName element and attribute names.
- XmlViolationPolicy getStreamabilityViolationPolicy(): Returns the streamabilityViolationPolicy.
- XmlViolationPolicy getXmlnsPolicy(): Returns the xmlnsPolicy.
- boolean isCheckingNormalization(): Indicates whether NFC normalization of source is being checked.
- boolean isMappingLangToXmlLang(): Whether lang is mapped to xml:lang.
- boolean isNamespaceAware(): Returns true.
- boolean isReportingDoctype(): Returns the reportingDoctype.
- boolean isScriptingEnabled(): Whether the parser considers scripting to be enabled for noscript treatment.
- boolean isValidating(): Returns false.
- Document newDocument(): For API compatibility.
- Document parse(InputSource is): Parses a document from a SAX InputSource.
- DocumentFragment parseFragment(InputSource is, String context): Parses a document fragment from a SAX InputSource with an HTML element as the fragment context.
- DocumentFragment parseFragment(InputSource is, String contextLocal, String contextNamespace): Parses a document fragment from a SAX InputSource.
- void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy): Deprecated.
- void setCheckingNormalization(boolean enable): Toggles the checking of the NFC normalization of source.
- void setCommentPolicy(XmlViolationPolicy commentPolicy): Sets the policy for consecutive hyphens in comments.
- void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy): Sets the policy for non-XML characters except white space.
- void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy): Sets the policy for non-XML white space.
- void setDocumentModeHandler(DocumentModeHandler documentModeHandler): Sets the document mode handler.
- void setEntityResolver(EntityResolver resolver): Sets the entity resolver for URI-only inputs.
- void setErrorHandler(ErrorHandler errorHandler): Sets the error handler.
- void setHeuristics(Heuristics heuristics): Sets the encoding sniffing heuristics.
- void setIgnoringComments(boolean ignoreComments): Sets whether comment nodes appear in the tree.
- void setMappingLangToXmlLang(boolean mappingLangToXmlLang): Whether lang is mapped to xml:lang.
- void setNamePolicy(XmlViolationPolicy namePolicy): The policy for non-NCName element and attribute names.
- void setReportingDoctype(boolean reportingDoctype)
- void setScriptingEnabled(boolean scriptingEnabled): Sets whether the parser considers scripting to be enabled for noscript treatment.
- void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy): Sets the streamabilityViolationPolicy.
- void setTransitionHander(TransitionHandler handler)
- void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy): Whether the xmlns attribute on the root element is passed through.
- void setXmlPolicy(XmlViolationPolicy xmlPolicy): This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go.
Methods inherited from class javax.xml.parsers.DocumentBuilder
getSchema, isXIncludeAware, parse, parse, parse, parse, reset
-
Constructor Details
-
HtmlDocumentBuilder
public HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy)
Instantiates the document builder with a specific DOM implementation and XML violation policy.
- Parameters:
implementation - the DOM implementation
xmlPolicy - the policy
-
HtmlDocumentBuilder
public HtmlDocumentBuilder(DOMImplementation implementation)
Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy.
- Parameters:
implementation - the DOM implementation
-
HtmlDocumentBuilder
public HtmlDocumentBuilder()
Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy.
-
HtmlDocumentBuilder
public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
- Parameters:
xmlPolicy - the policy
-
-
Method Details
-
getDOMImplementation
public DOMImplementation getDOMImplementation()
Returns the DOM implementation.
- Specified by:
getDOMImplementation in class DocumentBuilder
- Returns:
the DOM implementation
-
isNamespaceAware
public boolean isNamespaceAware()
Returns true.
- Specified by:
isNamespaceAware in class DocumentBuilder
- Returns:
true
-
isValidating
public boolean isValidating()
Returns false.
- Specified by:
isValidating in class DocumentBuilder
- Returns:
false
-
newDocument
public Document newDocument()
For API compatibility.
- Specified by:
newDocument in class DocumentBuilder
-
parse
public Document parse(InputSource is) throws SAXException, IOException
Parses a document from a SAX InputSource.
- Specified by:
parse in class DocumentBuilder
- Parameters:
is - the source
- Returns:
the parsed document
- Throws:
SAXException - if parsing fails
IOException - if an I/O error occurs
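Since `parse` takes a SAX `InputSource`, callers usually have to wrap their input first. The sketch below shows two common ways to build one with JDK classes only: from in-memory text and from a bare URI. The class and method names are hypothetical helpers for this illustration, and the parser itself is not invoked.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Sketch only: hypothetical helpers for preparing an InputSource.
class InputSourceSketch {
    // Wraps in-memory markup in a character-stream InputSource.
    // The text is already decoded, so no byte encoding is involved.
    static InputSource fromString(String markup) {
        return new InputSource(new StringReader(markup));
    }

    // Builds a URI-only InputSource: only the system identifier is set.
    static InputSource fromSystemId(String uri) {
        return new InputSource(uri);
    }

    // Drains the character stream to show what a parser would read.
    static String readAll(InputSource is) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = is.getCharacterStream()) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

A URI-only source carries no stream at all, which is the case the entity resolver described under setEntityResolver is documented to handle.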
-
parseFragment
public DocumentFragment parseFragment(InputSource is, String context) throws IOException, SAXException
Parses a document fragment from a SAX InputSource with an HTML element as the fragment context.
- Parameters:
is - the source
context - the context element name (HTML namespace assumed)
- Returns:
the document fragment
- Throws:
SAXException - if parsing fails
IOException - if an I/O error occurs
-
parseFragment
public DocumentFragment parseFragment(InputSource is, String contextLocal, String contextNamespace) throws IOException, SAXException
Parses a document fragment from a SAX InputSource.
- Parameters:
is - the source
contextLocal - the local name of the context element
contextNamespace - the namespace of the context element
- Returns:
the document fragment
- Throws:
SAXException - if parsing fails
IOException - if an I/O error occurs
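A returned DocumentFragment typically has to be inserted into some target document. The sketch below shows one way to do that with standard DOM calls. It assumes the fragment may be owned by a different document than the target, so it imports a copy first. The fragment is built by hand as a stand-in for a parsed one, and all class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Sketch only: inserts the contents of a fragment into another document.
class FragmentInsertSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Creates an empty document with the JDK's default DOM implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a two-child fragment by hand as a stand-in for a parsed fragment.
    static DocumentFragment sampleFragment(Document owner) {
        DocumentFragment fragment = owner.createDocumentFragment();
        for (String text : new String[] {"one", "two"}) {
            Element li = owner.createElementNS(XHTML_NS, "li");
            li.setTextContent(text);
            fragment.appendChild(li);
        }
        return fragment;
    }

    // Imports a deep copy into the target's document and appends it.
    // Appending a fragment appends its children, not the fragment node.
    // Returns the number of children the target has afterwards.
    static int insertCopy(Element target, DocumentFragment fragment) {
        Node copy = target.getOwnerDocument().importNode(fragment, true);
        target.appendChild(copy);
        return target.getChildNodes().getLength();
    }
}
```

Importing leaves the original fragment untouched. When the fragment and the target share an owner document, appending the fragment directly would move its children instead of copying them.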
-
setEntityResolver
public void setEntityResolver(EntityResolver resolver)
Sets the entity resolver for URI-only inputs.
- Specified by:
setEntityResolver in class DocumentBuilder
- Parameters:
resolver - the resolver
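An entity resolver turns a system identifier into an actual stream. As an illustration, the sketch below implements the standard SAX `EntityResolver` interface with an in-memory lookup table. The class name, the helper method, and the idea of serving pages from a map are assumptions of this example, not something this class provides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch only: a resolver that serves known system identifiers from memory.
class InMemoryResolverSketch {
    // Under the SAX contract, returning null asks the caller to fall back
    // to its default handling of the system identifier.
    static EntityResolver of(Map<String, String> pages) {
        return (publicId, systemId) -> {
            String markup = pages.get(systemId);
            if (markup == null) {
                return null;
            }
            InputSource is = new InputSource(new StringReader(markup));
            is.setSystemId(systemId);
            return is;
        };
    }

    // Convenience wrapper for callers that do not want checked exceptions.
    static boolean canResolve(EntityResolver resolver, String systemId) {
        try {
            return resolver.resolveEntity(null, systemId) != null;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A resolver of this kind can be handy in tests, where URI-only inputs should not trigger network access.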
-
setErrorHandler
public void setErrorHandler(ErrorHandler errorHandler)
Sets the error handler.
- Specified by:
setErrorHandler in class DocumentBuilder
- Parameters:
errorHandler - the handler
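The handler is a standard SAX `ErrorHandler`. One possible implementation, sketched below, collects every reported problem into a list so that callers can inspect them after parsing. The class name and the message format are hypothetical choices for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Sketch only: records problems instead of printing or rethrowing them.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record("fatal", e);
    }

    // Formats one entry as "severity line:column message".
    private void record(String severity, SAXParseException e) {
        messages.add(severity + " " + e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
    }

    // Read-only view of everything recorded so far.
    List<String> messages() {
        return Collections.unmodifiableList(messages);
    }
}
```

An instance would be passed to `setErrorHandler` before parsing and queried through `messages()` afterwards.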
-
setTransitionHander
public void setTransitionHander(TransitionHandler handler)
-
isCheckingNormalization
public boolean isCheckingNormalization()
Indicates whether NFC normalization of source is being checked.
- Returns:
true if NFC normalization of source is being checked.
-
setCheckingNormalization
public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
- Parameters:
enable - true to check normalization
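For readers unfamiliar with NFC, the sketch below shows what an NFC check means at the string level, using the JDK's `java.text.Normalizer`. It is an illustration of the concept only; how this parser performs and reports the check internally is not specified here.

```java
import java.text.Normalizer;

// Sketch only: tests whether text is in Unicode Normalization Form C.
class NfcCheckSketch {
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Precomposed e-acute (U+00E9) is already in NFC.
        System.out.println(isNfc("\u00E9"));
        // "e" followed by a combining acute accent (U+0301) is not.
        System.out.println(isNfc("e\u0301"));
    }
}
```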
-
setCommentPolicy
public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
- Parameters:
commentPolicy - the policy
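XML 1.0 does not allow two consecutive hyphens inside a comment, while HTML comments may contain them. The sketch below illustrates how the three policy values could treat such a comment. The `Policy` enum is a hypothetical stand-in for XmlViolationPolicy, and the alteration strategy (inserting a space between adjacent hyphens) is an assumption of this example rather than documented behavior of the parser.

```java
// Sketch only: applies a violation policy to the text of a comment.
class CommentPolicySketch {
    // Hypothetical stand-in for XmlViolationPolicy.
    enum Policy { ALLOW, ALTER_INFOSET, FATAL }

    static String apply(String comment, Policy policy) {
        if (policy == Policy.ALLOW || !comment.contains("--")) {
            return comment; // nothing to do, or violations are passed through
        }
        if (policy == Policy.FATAL) {
            throw new IllegalArgumentException("Consecutive hyphens in comment");
        }
        // ALTER_INFOSET (assumed strategy): separate adjacent hyphens with a space.
        StringBuilder out = new StringBuilder(comment.length() + 4);
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '-' && out.length() > 0 && out.charAt(out.length() - 1) == '-') {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

The sketch does not handle a comment that ends in a single hyphen, which XML also forbids.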
-
setContentNonXmlCharPolicy
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
- Parameters:
contentNonXmlCharPolicy - the policy
-
setContentSpacePolicy
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
- Parameters:
contentSpacePolicy - the policy
-
isScriptingEnabled
public boolean isScriptingEnabled()
Whether the parser considers scripting to be enabled for noscript treatment.
- Returns:
true if enabled
-
setScriptingEnabled
public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
- Parameters:
scriptingEnabled - true to enable
-
getDocumentModeHandler
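public DocumentModeHandler getDocumentModeHandler()
Returns the document mode handler.
- Returns:
the documentModeHandler
-
setDocumentModeHandler
public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
- Parameters:
documentModeHandler - the documentModeHandler to set
-
getStreamabilityViolationPolicy
public XmlViolationPolicy getStreamabilityViolationPolicy()
Returns the streamabilityViolationPolicy.
- Returns:
the streamabilityViolationPolicy
-
setStreamabilityViolationPolicy
public void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy)
Sets the streamabilityViolationPolicy.
- Parameters:
streamabilityViolationPolicy - the streamabilityViolationPolicy to set
-
getDocumentLocator
public Locator getDocumentLocator()
Returns the Locator during parse.
- Returns:
the Locator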
-
setMappingLangToXmlLang
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets whether lang is mapped to xml:lang.
- Parameters:
mappingLangToXmlLang - true to map lang to xml:lang
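To make the mapping concrete, the sketch below shows what copying lang to xml:lang looks like on a DOM element, using only standard DOM calls. Whether the parser keeps the original lang attribute alongside xml:lang is not stated above; the sketch keeps both, which is an assumption. The class and method names are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: mirrors a no-namespace lang attribute into xml:lang.
class LangMappingSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Copies lang to xml:lang when lang is present; leaves lang in place.
    static void mapLangToXmlLang(Element el) {
        if (el.hasAttribute("lang")) {
            el.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", el.getAttribute("lang"));
        }
    }

    // Builds a <p lang="..."> element in a fresh document for demonstration.
    static Element sampleParagraph(String lang) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElementNS(XHTML_NS, "p");
            p.setAttribute("lang", lang);
            doc.appendChild(p);
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the mapping, XML-aware consumers can read the language through the `http://www.w3.org/XML/1998/namespace` namespace.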
-
isMappingLangToXmlLang
public boolean isMappingLangToXmlLang()
Whether lang is mapped to xml:lang.
- Returns:
the mappingLangToXmlLang
-
setXmlnsPolicy
public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Whether the xmlns attribute on the root element is passed through. (FATAL not allowed.)
- Parameters:
xmlnsPolicy - the policy
-
getXmlnsPolicy
public XmlViolationPolicy getXmlnsPolicy()
Returns the xmlnsPolicy.
- Returns:
the xmlnsPolicy
-
getCommentPolicy
public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.
- Returns:
the commentPolicy
-
getContentNonXmlCharPolicy
public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.
- Returns:
the contentNonXmlCharPolicy
-
getContentSpacePolicy
public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.
- Returns:
the contentSpacePolicy
-
setReportingDoctype
public void setReportingDoctype(boolean reportingDoctype)
- Parameters:
reportingDoctype - the reportingDoctype to set
-
isReportingDoctype
public boolean isReportingDoctype()
Returns the reportingDoctype.
- Returns:
the reportingDoctype
-
setNamePolicy
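public void setNamePolicy(XmlViolationPolicy namePolicy)
Sets the policy for non-NCName element and attribute names.
- Parameters:
namePolicy - the policy
-
setHeuristics
public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
- Parameters:
heuristics - the heuristics to set
-
getHeuristics
public Heuristics getHeuristics()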
-
setXmlPolicy
public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go. This does not affect the streamability policy or doctype reporting.
- Parameters:
xmlPolicy - the policy
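The fan-out described above can be pictured with the following sketch. It is not the class's actual implementation: the `Policy` enum and the fields are hypothetical stand-ins, the initial values are assumed, and the handling of FATAL for the xmlns policy (downgrading to ALTER_INFOSET) is an assumption prompted by the note that setXmlnsPolicy does not allow FATAL.

```java
// Sketch only: one catch-all setter that updates five individual policies.
class PolicyFanOutSketch {
    // Hypothetical stand-in for XmlViolationPolicy.
    enum Policy { ALLOW, ALTER_INFOSET, FATAL }

    Policy name = Policy.ALTER_INFOSET;
    Policy xmlns = Policy.ALTER_INFOSET;
    Policy contentSpace = Policy.ALTER_INFOSET;
    Policy contentNonXmlChar = Policy.ALTER_INFOSET;
    Policy comment = Policy.ALTER_INFOSET;
    // Assumed initial value; the catch-all setter never touches this field.
    Policy streamability = Policy.ALLOW;

    void setXmlPolicy(Policy policy) {
        name = policy;
        // Assumption: FATAL is not allowed for xmlns, so fall back to altering.
        xmlns = (policy == Policy.FATAL) ? Policy.ALTER_INFOSET : policy;
        contentSpace = policy;
        contentNonXmlChar = policy;
        comment = policy;
    }
}
```

Callers who need different values per concern would use the individual setters after, or instead of, the catch-all method.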
-
getNamePolicy
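public XmlViolationPolicy getNamePolicy()
Returns the policy for non-NCName element and attribute names.
- Returns:
the namePolicy
-
setBogusXmlnsPolicy
public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.
Does nothing.
-
getBogusXmlnsPolicy
public XmlViolationPolicy getBogusXmlnsPolicy()
Deprecated.
Returns XmlViolationPolicy.ALTER_INFOSET.
- Returns:
XmlViolationPolicy.ALTER_INFOSET
-
addCharacterHandler
public void addCharacterHandler(CharacterHandler characterHandler)
-
setIgnoringComments
public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes appear in the tree.
- Parameters:
ignoreComments - true to ignore comments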
-