HTML Tidy  5.6.0
The HTACG Tidy HTML Project
HTML and XML Parsing

Detailed Description

These functions and structures form the internal API for document parsing.

Functions

Bool TY_❪CheckNodeIntegrity❫ (Node *node)
 Recursively checks node integrity after parsing an HTML or XML document.
 
void TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected)
 Transforms a given node to another element, for example, from a p to a br.
 
Node * TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element)
 Remove node from markup tree and discard it.
 
Node * TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node)
 Trims a tree of empty elements recursively, returning the next node.
 
void TY_❪InsertNodeAfterElement❫ (Node *element, Node *node)
 Insert node into markup tree after element.
 
void TY_❪InsertNodeAtEnd❫ (Node *element, Node *node)
 Insert node into markup tree as the last element of content of element.
 
void TY_❪InsertNodeAtStart❫ (Node *element, Node *node)
 Insert node into markup tree as the first element of content of element.
 
void TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node)
 Insert node into markup tree before element.
 
Bool TY_❪IsBlank❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.
 
Bool TY_❪IsJavaScript❫ (Node *node)
 Indicates whether or not a node is declared as containing JavaScript code.
 
Bool TY_❪IsNewNode❫ (Node *node)
 Checks whether a node uses CM_NEW, which determines how attributes without values should be printed.
 
void TY_❪ParseDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the HTML parser.
 
void TY_❪ParseXMLDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the XML parser.
 
Node * TY_❪RemoveNode❫ (Node *node)
 Extract a node and its children from a markup tree.
 
Bool TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node ends with a space or newline.
 
Node * TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element)
 Trims a single, empty element, returning the next node.
 
Bool TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element)
 Indicates whether or not whitespace is to be preserved in XHTML/XML documents.
 

Function Documentation

Bool TY_❪CheckNodeIntegrity❫ ( Node *  node)

Recursively checks node integrity after parsing an HTML or XML document.

Note
The check can be disabled by defining the macro NO_NODE_INTEGRITY_CHECK.
Parameters
node: The root node for the integrity check.
Returns
Returns yes or no indicating integrity of the node structure.
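
To make the idea of a recursive integrity check concrete, here is a minimal sketch. It is not Tidy's implementation: the `SketchNode` type, its fields, and `sketch_check_integrity` are hypothetical names invented for illustration, and the real check may test different invariants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; an assumption, not Tidy's Node. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;
    struct SketchNode *next;
    struct SketchNode *content; /* first child */
    struct SketchNode *last;    /* last child */
} SketchNode;

/* Returns true when every link at and below `node` is mutually consistent. */
static bool sketch_check_integrity(const SketchNode *node)
{
    const SketchNode *child;
    const SketchNode *seen_last = NULL;

    assert(node != NULL);

    /* Sibling links must point back at this node. */
    if (node->prev != NULL && node->prev->next != node) return false;
    if (node->next != NULL && node->next->prev != node) return false;

    /* The first child must not have a previous sibling. */
    if (node->content != NULL && node->content->prev != NULL) return false;

    for (child = node->content; child != NULL; child = child->next) {
        if (child->parent != node) return false;          /* wrong parent */
        if (!sketch_check_integrity(child)) return false; /* recurse */
        seen_last = child;
    }

    /* The cached last-child pointer must match the real end of the list. */
    return node->last == seen_last;
}
```

A caller would pass the root of the tree and treat a false result as a sign that an earlier tree edit left a dangling link.
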
void TY_❪CoerceNode❫ ( TidyDocImpl *  doc,
Node *  node,
TidyTagId  tid,
Bool  obsolete,
Bool  expected 
)

Transforms a given node to another element, for example, from a p to a br.

Parameters
doc: The document which the node belongs to.
node: The node to coerce.
tid: The tag type to coerce the node into.
obsolete: If the old node was obsolete, a report will be generated.
expected: If the old node was not expected to be found in this particular location, a report will be generated.
Node* TY_❪DiscardElement❫ ( TidyDocImpl *  doc,
Node *  element 
)

Remove node from markup tree and discard it.

Parameters
doc: The Tidy document from which to discard the node.
element: The node to discard.
Returns
Returns the next node.
Node* TY_❪DropEmptyElements❫ ( TidyDocImpl *  doc,
Node *  node 
)

Trims a tree of empty elements recursively, returning the next node.

Parameters
doc: The Tidy document.
node: The element to trim.
Returns
Returns the next node.
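
The "returns the next node" convention lets a caller keep walking a sibling list while nodes are being removed from it. The sketch below illustrates that pattern only. `SketchNode`, `is_text`, `sketch_discard`, and `sketch_drop_empty` are hypothetical names, and the sketch treats every childless non-text node as empty, which is a deliberate simplification and not Tidy's actual pruning rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; an assumption, not Tidy's Node. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next, *content, *last;
    bool is_text; /* hypothetical flag: text nodes are never trimmed here */
} SketchNode;

/* Unlink `node` and return the sibling that followed it (or NULL). */
static SketchNode *sketch_discard(SketchNode *node)
{
    SketchNode *next;

    assert(node != NULL);
    next = node->next;
    if (node->prev != NULL) node->prev->next = node->next;
    if (node->next != NULL) node->next->prev = node->prev;
    if (node->parent != NULL) {
        if (node->parent->content == node) node->parent->content = node->next;
        if (node->parent->last == node) node->parent->last = node->prev;
    }
    node->parent = node->prev = node->next = NULL;
    return next;
}

/* Children are trimmed first, so a parent emptied by trimming is trimmed too. */
static void sketch_drop_empty(SketchNode *node)
{
    while (node != NULL) {
        if (node->content != NULL)
            sketch_drop_empty(node->content);
        if (!node->is_text && node->content == NULL)
            node = sketch_discard(node); /* continue from the returned node */
        else
            node = node->next;
    }
}
```

Because the discard step hands back the following sibling, the loop never has to read a link from a node that has already been unlinked.
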
void TY_❪InsertNodeAfterElement❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree after element.

Parameters
element: The node after which the node is inserted.
node: The node to insert.
void TY_❪InsertNodeAtEnd❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree as the last element of content of element.

Parameters
element: The new destination node.
node: The node to insert.
void TY_❪InsertNodeAtStart❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree as the first element of content of element.

Parameters
element: The new destination node.
node: The node to insert.
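
The two child-insertion functions above (at the end and at the start of an element's content) can be pictured with a small doubly linked tree. This is a sketch under stated assumptions: `SketchNode` and the `sketch_insert_*` helpers are hypothetical, and the real functions may handle additional cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; an assumption, not Tidy's Node. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content; /* first child */
    struct SketchNode *last;    /* last child */
} SketchNode;

/* Append `node` as the last child of `element`. */
static void sketch_insert_at_end(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element;
    node->next = NULL;
    node->prev = element->last;
    if (element->last != NULL)
        element->last->next = node;
    else
        element->content = node; /* element had no children */
    element->last = node;
}

/* Prepend `node` as the first child of `element`. */
static void sketch_insert_at_start(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element;
    node->prev = NULL;
    node->next = element->content;
    if (element->content != NULL)
        element->content->prev = node;
    else
        element->last = node; /* element had no children */
    element->content = node;
}
```

Both helpers update the parent's first-child and last-child pointers, so either end of the child list stays reachable in constant time.
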
void TY_❪InsertNodeBeforeElement❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree before element.

Parameters
element: The node before which the node is inserted.
node: The node to insert.
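
Sibling insertion (before or after an existing element) differs from child insertion in that the parent's first-child or last-child pointer changes only when the new node lands at an edge of the list. A minimal sketch, assuming a hypothetical `SketchNode` type and `sketch_insert_*` helpers that are not part of Tidy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; an assumption, not Tidy's Node. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content; /* first child */
    struct SketchNode *last;    /* last child */
} SketchNode;

/* Link `node` into the sibling list immediately after `element`. */
static void sketch_insert_after(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element->parent;
    node->prev = element;
    node->next = element->next;
    if (element->next != NULL)
        element->next->prev = node;
    else if (element->parent != NULL)
        element->parent->last = node; /* element was the last child */
    element->next = node;
}

/* Link `node` into the sibling list immediately before `element`. */
static void sketch_insert_before(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element->parent;
    node->next = element;
    node->prev = element->prev;
    if (element->prev != NULL)
        element->prev->next = node;
    else if (element->parent != NULL)
        element->parent->content = node; /* element was the first child */
    element->prev = node;
}
```
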
Bool TY_❪IsBlank❫ ( Lexer *  lexer,
Node *  node 
)

Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.

Parameters
lexer: The lexer used to lex the document.
node: The node to test.
Returns
Returns the result of the test.
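
The definition of "blank" above (nothing, or a single space) is easy to state in code. The sketch below assumes, purely for illustration, that a text node stores a half-open range of offsets into a character buffer owned by the lexer; `SketchLexer`, `SketchText`, and `sketch_is_blank` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types: a lexer buffer and a text node covering [start, end). */
typedef struct { const char *buf; } SketchLexer;
typedef struct { size_t start; size_t end; } SketchText;

/* True when the text is empty or is exactly one space character. */
static bool sketch_is_blank(const SketchLexer *lexer, const SketchText *node)
{
    assert(lexer != NULL && node != NULL && node->start <= node->end);
    if (node->end == node->start)
        return true; /* consists of nothing */
    return node->end == node->start + 1 && lexer->buf[node->start] == ' ';
}
```

Note that, under this definition, two spaces or a lone newline do not count as blank.
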
Bool TY_❪IsJavaScript❫ ( Node *  node)

Indicates whether or not a node is declared as containing JavaScript code.

Parameters
node: The node to test.
Returns
Returns the result of the test.
Bool TY_❪IsNewNode❫ ( Node *  node)

Checks whether a node uses CM_NEW, which determines how attributes without values should be printed.

This was introduced to deal with user-defined tags, e.g. those used by ColdFusion.

Parameters
node: The node to check.
Returns
The result of the check.
void TY_❪ParseDocument❫ ( TidyDocImpl *  doc)

Parses a document after lexing using the HTML parser.

It first sets up the overall HTML structure, then processes all remaining nodes. The html element is the root node.

Parameters
doc: The Tidy document.
void TY_❪ParseXMLDocument❫ ( TidyDocImpl *  doc)

Parses a document after lexing using the XML parser.

Parameters
doc: The Tidy document.
Node* TY_❪RemoveNode❫ ( Node *  node)

Extract a node and its children from a markup tree.

Parameters
node: The node to remove.
Returns
Returns the removed node.
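
Extracting a node "and its children" means only the links to the parent and siblings are cut, while the subtree below the node stays attached to it. A minimal sketch, assuming a hypothetical `SketchNode` type and `sketch_remove` helper that are not part of Tidy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tree node; an assumption, not Tidy's Node. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next;
    struct SketchNode *content; /* first child */
    struct SketchNode *last;    /* last child */
} SketchNode;

/* Detach `node` from its parent and siblings; its own children are kept. */
static SketchNode *sketch_remove(SketchNode *node)
{
    assert(node != NULL);
    if (node->prev != NULL) node->prev->next = node->next;
    if (node->next != NULL) node->next->prev = node->prev;
    if (node->parent != NULL) {
        if (node->parent->content == node) node->parent->content = node->next;
        if (node->parent->last == node) node->parent->last = node->prev;
    }
    node->parent = node->prev = node->next = NULL;
    return node; /* the detached subtree can be reinserted elsewhere */
}
```

Returning the detached node makes it convenient to move a subtree: remove it in one place, then pass the result to an insertion routine.
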
Bool TY_❪TextNodeEndWithSpace❫ ( Lexer *  lexer,
Node *  node 
)

Indicates whether or not a text node ends with a space or newline.

Note
Implementation of this method is found in pprint.c for some reason.
Parameters
lexer: A reference to the lexer used to lex the document.
node: The node to check.
Returns
The result of the check.
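
The trailing-whitespace test can be sketched in the same spirit. The types and the function name below are hypothetical, and the sketch assumes a text node is a half-open range of byte offsets into the lexer's buffer; the real function may also verify the node type or handle multi-byte characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types: a lexer buffer and a text node covering [start, end). */
typedef struct { const char *buf; } SketchLexer;
typedef struct { size_t start; size_t end; } SketchText;

/* True when the last character of a non-empty text is a space or newline. */
static bool sketch_ends_with_space(const SketchLexer *lexer,
                                   const SketchText *node)
{
    char c;

    assert(lexer != NULL && node != NULL);
    if (node->end <= node->start)
        return false; /* empty text has no last character */
    c = lexer->buf[node->end - 1];
    return c == ' ' || c == '\n';
}
```
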
Node* TY_❪TrimEmptyElement❫ ( TidyDocImpl *  doc,
Node *  element 
)

Trims a single, empty element, returning the next node.

Parameters
doc: The Tidy document.
element: The element to trim.
Returns
Returns the next node.
Bool TY_❪XMLPreserveWhiteSpace❫ ( TidyDocImpl *  doc,
Node *  element 
)

Indicates whether or not whitespace is to be preserved in XHTML/XML documents.

Parameters
doc: The Tidy document.
element: The node to test.
Returns
Returns the result of the test.