HTML Tidy  5.8.0
The HTACG Tidy HTML Project
HTML and XML Parsing

Detailed Description

These functions and structures form the internal API for document parsing.

Functions

TY_PRIVATE Bool TY_❪CheckNodeIntegrity❫ (Node *node)
 Recursively checks node integrity after parsing an HTML or XML document.
 
TY_PRIVATE void TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected)
 Transforms a given node to another element, for example, from a p to a br.
 
TY_PRIVATE Node * TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element)
 Remove node from markup tree and discard it.
 
TY_PRIVATE Node * TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node)
 Trims a tree of empty elements recursively, returning the next node.
 
TY_PRIVATE void TY_❪InsertNodeAfterElement❫ (Node *element, Node *node)
 Insert node into markup tree after element.
 
TY_PRIVATE void TY_❪InsertNodeAtEnd❫ (Node *element, Node *node)
 Insert node into markup tree as the last element of content of element.
 
TY_PRIVATE void TY_❪InsertNodeAtStart❫ (Node *element, Node *node)
 Insert node into markup tree as the first element of content of element.
 
TY_PRIVATE void TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node)
 Insert node into markup tree before element.
 
TY_PRIVATE Bool TY_❪IsBlank❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node is blank, meaning that it is empty or consists of a single space.
 
TY_PRIVATE Bool TY_❪IsJavaScript❫ (Node *node)
 Indicates whether or not a node is declared as containing JavaScript code.
 
TY_PRIVATE Bool TY_❪IsNewNode❫ (Node *node)
 Checks whether a node uses CM_NEW, which determines how attributes without values should be printed.
 
TY_PRIVATE void TY_❪ParseDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the HTML parser.
 
TY_PRIVATE void TY_❪ParseXMLDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the XML parser.
 
TY_PRIVATE Node * TY_❪RemoveNode❫ (Node *node)
 Extract a node and its children from a markup tree.
 
TY_PRIVATE Bool TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node ends with a space or newline.
 
TY_PRIVATE Node * TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element)
 Trims a single, empty element, returning the next node.
 
TY_PRIVATE Bool TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element)
 Indicates whether or not whitespace is to be preserved in XHTML/XML documents.
 

Function Documentation

◆ TY_❪CheckNodeIntegrity❫()

TY_PRIVATE Bool TY_❪CheckNodeIntegrity❫ (Node *node)

Recursively checks node integrity after parsing an HTML or XML document.

Note
This check can be disabled by defining the macro NO_NODE_INTEGRITY_CHECK.
Parameters
node: The root node for the integrity check.
Returns
Returns yes or no indicating integrity of the node structure.
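
To illustrate what a recursive integrity check of this kind might verify, here is a minimal sketch. It assumes a simplified node with parent, sibling, and child links; `SketchNode` and `sketch_check_integrity` are hypothetical names for illustration and are not Tidy's actual definitions. It also assumes that a build with NO_NODE_INTEGRITY_CHECK defined simply reports success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node; not Tidy's actual Node definition. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;
    struct SketchNode *next;
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Returns true when every link below `node` is mutually consistent. */
static bool sketch_check_integrity(const SketchNode *node)
{
#ifdef NO_NODE_INTEGRITY_CHECK
    (void)node;
    return true;  /* assumption: a disabled check always reports success */
#else
    const SketchNode *child;
    const SketchNode *seen_last = NULL;

    if (node == NULL)
        return true;

    for (child = node->content; child != NULL; child = child->next) {
        if (child->parent != node)      /* child must point back to its parent */
            return false;
        if (child->prev != seen_last)   /* prev must mirror the sibling order */
            return false;
        if (!sketch_check_integrity(child))
            return false;
        seen_last = child;
    }
    /* The cached last-child pointer must match the end of the sibling list. */
    return node->last == seen_last;
#endif
}
```

Compiling the sketch with `-DNO_NODE_INTEGRITY_CHECK` removes the traversal entirely, which is the usual reason for guarding a debugging check behind a macro.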

◆ TY_❪CoerceNode❫()

TY_PRIVATE void TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected)

Transforms a given node to another element, for example, from a p to a br.

Parameters
doc: The document to which the node belongs.
node: The node to coerce.
tid: The tag type to coerce the node into.
obsolete: If the old node was obsolete, a report will be generated.
expected: If the old node was not expected to be found in this particular location, a report will be generated.

◆ TY_❪DiscardElement❫()

TY_PRIVATE Node * TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element)

Remove node from markup tree and discard it.

Parameters
doc: The Tidy document from which to discard the node.
element: The node to discard.
Returns
Returns the next node.

◆ TY_❪DropEmptyElements❫()

TY_PRIVATE Node * TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node)

Trims a tree of empty elements recursively, returning the next node.

Parameters
doc: The Tidy document.
node: The element to trim.
Returns
Returns the next node.
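
The "trim, then return the next node" pattern lets a caller keep walking a sibling list while nodes are being removed from it. The sketch below shows that pattern under stated assumptions: `SketchNode`, the `is_element` flag, and the emptiness rule (an element with no children) are hypothetical simplifications, not Tidy's actual pruning criteria, and nothing is freed or reported.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node; not Tidy's actual Node definition. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
    bool is_element;             /* false for text and other leaf nodes */
} SketchNode;

/* Unlink `node` from its parent and siblings. */
static void sketch_unlink(SketchNode *node)
{
    if (node->prev != NULL) node->prev->next = node->next;
    if (node->next != NULL) node->next->prev = node->prev;
    if (node->parent != NULL) {
        if (node->parent->content == node) node->parent->content = node->next;
        if (node->parent->last == node) node->parent->last = node->prev;
    }
    node->parent = node->prev = node->next = NULL;
}

/* Assumed emptiness rule for this sketch: an element with no children. */
static bool sketch_is_empty(const SketchNode *node)
{
    return node->is_element && node->content == NULL;
}

/* Trim one element if it is empty; return the node to continue from. */
static SketchNode *sketch_trim_empty(SketchNode *element)
{
    SketchNode *next = element->next;  /* saved before unlinking */
    if (sketch_is_empty(element))
        sketch_unlink(element);
    return next;
}

/* Walk a sibling list, trimming children first so that elements which
   become empty after their children are removed are trimmed as well. */
static void sketch_drop_empty(SketchNode *node)
{
    while (node != NULL) {
        if (node->content != NULL)
            sketch_drop_empty(node->content);
        node = sketch_trim_empty(node);
    }
}
```

Saving the next pointer before unlinking is what keeps the loop valid: once a node is detached, its own sibling links can no longer be trusted.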

◆ TY_❪InsertNodeAfterElement❫()

TY_PRIVATE void TY_❪InsertNodeAfterElement❫ (Node *element, Node *node)

Insert node into markup tree after element.

Parameters
element: The node after which the node is inserted.
node: The node to insert.

◆ TY_❪InsertNodeAtEnd❫()

TY_PRIVATE void TY_❪InsertNodeAtEnd❫ (Node *element, Node *node)

Insert node into markup tree as the last element of content of element.

Parameters
element: The new destination node.
node: The node to insert.

◆ TY_❪InsertNodeAtStart❫()

TY_PRIVATE void TY_❪InsertNodeAtStart❫ (Node *element, Node *node)

Insert node into markup tree as the first element of content of element.

Parameters
element: The new destination node.
node: The node to insert.

◆ TY_❪InsertNodeBeforeElement❫()

TY_PRIVATE void TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node)

Insert node into markup tree before element.

Parameters
element: The node before which the node is inserted.
node: The node to insert.
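
The four insertion functions differ only in which links they rewrite. The sketch below shows one way each operation could be expressed on a doubly linked tree. `SketchNode` and the `sketch_insert_*` functions are hypothetical stand-ins for illustration, not Tidy's actual types or code, and they assume `node` is not currently attached anywhere.

```c
#include <stddef.h>

/* Hypothetical, simplified node; not Tidy's actual Node definition. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Append `node` as the last child of `element`. */
static void sketch_insert_at_end(SketchNode *element, SketchNode *node)
{
    node->parent = element;
    node->prev = element->last;
    node->next = NULL;
    if (element->last != NULL)
        element->last->next = node;
    else
        element->content = node;   /* element had no children */
    element->last = node;
}

/* Prepend `node` as the first child of `element`. */
static void sketch_insert_at_start(SketchNode *element, SketchNode *node)
{
    node->parent = element;
    node->prev = NULL;
    node->next = element->content;
    if (element->content != NULL)
        element->content->prev = node;
    else
        element->last = node;      /* element had no children */
    element->content = node;
}

/* Insert `node` as the sibling immediately before `element`. */
static void sketch_insert_before(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->next = element;
    node->prev = element->prev;
    if (element->prev != NULL)
        element->prev->next = node;
    else if (parent != NULL)
        parent->content = node;    /* node becomes the first child */
    element->prev = node;
}

/* Insert `node` as the sibling immediately after `element`. */
static void sketch_insert_after(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->prev = element;
    node->next = element->next;
    if (element->next != NULL)
        element->next->prev = node;
    else if (parent != NULL)
        parent->last = node;       /* node becomes the last child */
    element->next = node;
}
```

Note the difference in what `element` means: for the at-start and at-end variants it is the new parent, while for the before and after variants it is the new sibling.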

◆ TY_❪IsBlank❫()

TY_PRIVATE Bool TY_❪IsBlank❫ (Lexer *lexer, Node *node)

Indicates whether or not a text node is blank, meaning that it is empty or consists of a single space.

Parameters
lexer: The lexer used to lex the document.
node: The node to test.
Returns
Returns the result of the test.
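
The blank test takes the lexer as well as the node, which suggests that the node's text lives in a buffer owned by the lexer. The following sketch works under that assumption: a text node is modelled as a pair of offsets into a lexer buffer. `SketchLexer`, `SketchTextNode`, and their fields are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexer holding the text buffer (an assumption of this sketch). */
typedef struct {
    const char *buf;
} SketchLexer;

/* Hypothetical text node: the half-open range [start, end) in the buffer. */
typedef struct {
    bool is_text;
    size_t start;
    size_t end;
} SketchTextNode;

/* True for a text node that is empty or holds exactly one space. */
static bool sketch_is_blank(const SketchLexer *lexer, const SketchTextNode *node)
{
    if (!node->is_text)
        return false;
    if (node->end == node->start)
        return true;                       /* no characters at all */
    return node->end == node->start + 1    /* exactly one character... */
        && lexer->buf[node->start] == ' '; /* ...and it is a space */
}
```

Under this definition a node holding two spaces, a tab, or a newline is not blank, which matches the narrow wording of the description.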

◆ TY_❪IsJavaScript❫()

TY_PRIVATE Bool TY_❪IsJavaScript❫ (Node *node)

Indicates whether or not a node is declared as containing JavaScript code.

Parameters
node: The node to test.
Returns
Returns the result of the test.

◆ TY_❪IsNewNode❫()

TY_PRIVATE Bool TY_❪IsNewNode❫ (Node *node)

Checks whether a node uses CM_NEW, which determines how attributes without values should be printed.

This was introduced to deal with user-defined tags, e.g., ColdFusion.

Parameters
node: The node to check.
Returns
The result of the check.

◆ TY_❪ParseDocument❫()

TY_PRIVATE void TY_❪ParseDocument❫ (TidyDocImpl *doc)

Parses a document after lexing using the HTML parser.

It first sets up the overall HTML structure, then processes all remaining nodes. HTML is the root node.

Parameters
doc: The Tidy document.

◆ TY_❪ParseXMLDocument❫()

TY_PRIVATE void TY_❪ParseXMLDocument❫ (TidyDocImpl *doc)

Parses a document after lexing using the XML parser.

Parameters
doc: The Tidy document.

◆ TY_❪RemoveNode❫()

TY_PRIVATE Node * TY_❪RemoveNode❫ (Node *node)

Extract a node and its children from a markup tree.

Parameters
node: The node to remove.
Returns
Returns the removed node.
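
Extracting a node together with its children amounts to rewriting the links around it while leaving its own child list alone. A minimal sketch of that idea follows. `SketchNode` and `sketch_remove_node` are hypothetical names and do not reproduce Tidy's actual implementation.

```c
#include <stddef.h>

/* Hypothetical, simplified node; not Tidy's actual Node definition. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Detach `node` from its parent and siblings. Its children stay attached,
   so the whole subtree is extracted. Returns the detached node. */
static SketchNode *sketch_remove_node(SketchNode *node)
{
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;

    if (node->parent != NULL) {
        if (node->parent->content == node)
            node->parent->content = node->next;  /* was the first child */
        if (node->parent->last == node)
            node->parent->last = node->prev;     /* was the last child */
    }

    node->parent = node->prev = node->next = NULL;
    return node;
}
```

Because the node is returned rather than freed, the caller can reinsert the subtree elsewhere in the tree.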

◆ TY_❪TextNodeEndWithSpace❫()

TY_PRIVATE Bool TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node)

Indicates whether or not a text node ends with a space or newline.

Note
This function is implemented in pprint.c.
Parameters
lexer: A reference to the lexer used to lex the document.
node: The node to check.
Returns
The result of the check.
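
As a rough illustration of the trailing-whitespace test, the sketch below inspects the final byte of a text range. It assumes the node's text is a range of offsets into a lexer buffer and treats the text byte by byte, ignoring multi-byte encodings; `SketchLexer` and `SketchTextNode` are hypothetical names, not Tidy's types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexer holding the text buffer (an assumption of this sketch). */
typedef struct {
    const char *buf;
} SketchLexer;

/* Hypothetical text node: the half-open range [start, end) in the buffer. */
typedef struct {
    bool is_text;
    size_t start;
    size_t end;
} SketchTextNode;

/* True when a non-empty text node ends in a space or a newline. */
static bool sketch_ends_with_space(const SketchLexer *lexer,
                                   const SketchTextNode *node)
{
    char last;

    if (!node->is_text || node->end <= node->start)
        return false;            /* not text, or nothing to inspect */

    last = lexer->buf[node->end - 1];
    return last == ' ' || last == '\n';
}
```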

◆ TY_❪TrimEmptyElement❫()

TY_PRIVATE Node * TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element)

Trims a single, empty element, returning the next node.

Parameters
doc: The Tidy document.
element: The element to trim.
Returns
Returns the next node.

◆ TY_❪XMLPreserveWhiteSpace❫()

TY_PRIVATE Bool TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element)

Indicates whether or not whitespace is to be preserved in XHTML/XML documents.

Parameters
doc: The Tidy document.
element: The node to test.
Returns
Returns the result of the test.
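
In XML, whitespace preservation is normally signalled by the `xml:space` attribute, whose value `preserve` applies to an element and its descendants until overridden. The sketch below shows that rule only. Whether Tidy resolves it by walking ancestors, and which additional elements it treats as whitespace-preserving, is not stated here, so treat `SketchAttr`, `SketchElement`, and the lookup strategy as assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute list entry; not Tidy's actual attribute type. */
typedef struct SketchAttr {
    const char *name;
    const char *value;
    struct SketchAttr *next;
} SketchAttr;

/* Hypothetical element with a parent link and an attribute list. */
typedef struct SketchElement {
    struct SketchElement *parent;
    SketchAttr *attributes;
} SketchElement;

/* The nearest xml:space declaration wins; only "preserve" keeps whitespace. */
static bool sketch_xml_preserve_space(const SketchElement *element)
{
    for (; element != NULL; element = element->parent) {
        const SketchAttr *attr;
        for (attr = element->attributes; attr != NULL; attr = attr->next) {
            if (strcmp(attr->name, "xml:space") == 0)
                return attr->value != NULL
                    && strcmp(attr->value, "preserve") == 0;
        }
    }
    return false;  /* no declaration found: default handling */
}
```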