HTML Tidy 5.9.15
The HTACG Tidy HTML Project
HTML and XML Parsing

Detailed Description

These functions and structures form the internal API for document parsing.

Data Structures

struct  TidyParserMemory
 This typedef represents the state of a parser when it enters and exits.
 
struct  TidyParserStack
 This typedef represents a stack of parser state records (TidyParserMemory).
 

Functions

TY_PRIVATE Bool TY_❪CheckNodeIntegrity❫ (Node *node)
 Recursively checks node integrity after parsing an HTML or XML document.
 
TY_PRIVATE void TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected)
 Transforms a given node to another element, for example, from a p to a br.
 
TY_PRIVATE Node * TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element)
 Remove node from markup tree and discard it.
 
TY_PRIVATE Node * TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node)
 Trims a tree of empty elements recursively, returning the next node.
 
void TY_❪FreeParserStack❫ (TidyDocImpl *doc)
 Frees the parser's stack when done.
 
void TY_❪InitParserStack❫ (TidyDocImpl *doc)
 Allocates and initializes the parser's stack.
 
TY_PRIVATE void TY_❪InsertNodeAfterElement❫ (Node *element, Node *node)
 Insert node into markup tree after element.
 
TY_PRIVATE void TY_❪InsertNodeAtEnd❫ (Node *element, Node *node)
 Insert node into markup tree as the last element of content of element.
 
TY_PRIVATE void TY_❪InsertNodeAtStart❫ (Node *element, Node *node)
 Insert node into markup tree as the first element of content of element.
 
TY_PRIVATE void TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node)
 Insert node into markup tree before element.
 
TY_PRIVATE Bool TY_❪IsBlank❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.
 
Bool TY_❪isEmptyParserStack❫ (TidyDocImpl *doc)
 Indicates whether or not the stack is empty.
 
TY_PRIVATE Bool TY_❪IsJavaScript❫ (Node *node)
 Indicates whether or not a node is declared as containing JavaScript code.
 
TY_PRIVATE Bool TY_❪IsNewNode❫ (Node *node)
 Used to check if a node uses CM_NEW, which determines how attributes without values should be printed.
 
TY_PRIVATE void TY_❪ParseDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the HTML parser.
 
TY_PRIVATE void TY_❪ParseXMLDocument❫ (TidyDocImpl *doc)
 Parses a document after lexing using the XML parser.
 
Parser * TY_❪peekMemoryIdentity❫ (TidyDocImpl *doc)
 Peek at the parser memory "identity" field.
 
GetTokenMode TY_❪peekMemoryMode❫ (TidyDocImpl *doc)
 Peek at the parser memory "mode" field.
 
TidyParserMemory TY_❪peekMemory❫ (TidyDocImpl *doc)
 Peek at the parser memory record at the top of the stack without removing it.
 
TidyParserMemory TY_❪popMemory❫ (TidyDocImpl *doc)
 Pop a parser memory record off the stack.
 
void TY_❪pushMemory❫ (TidyDocImpl *doc, TidyParserMemory data)
 Push a parser memory record onto the stack.
 
TY_PRIVATE Node * TY_❪RemoveNode❫ (Node *node)
 Extract a node and its children from a markup tree.
 
TY_PRIVATE Bool TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node)
 Indicates whether or not a text node ends with a space or newline.
 
TY_PRIVATE Node * TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element)
 Trims a single, empty element, returning the next node.
 
TY_PRIVATE Bool TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element)
 Indicates whether or not whitespace is to be preserved in XHTML/XML documents.
 

Data Structure Documentation

◆ TidyParserMemory

struct TidyParserMemory

This typedef represents the state of a parser when it enters and exits.

When the parser needs to finish work on the way back up the stack, it will push one of these records to the stack, and it will pop a record from the stack upon re-entry.

Data Fields
Parser *      identity       Which parser pushed this record?
GetTokenMode  mode           The caller will peek at this value to get the correct mode.
Node *        original_node  Originally provided node at entry.
GetTokenMode  reentry_mode   The token mode to use when re-entering.
Node *        reentry_node   The node with which to re-enter.
int           reentry_state  State to set during re-entry. Defined locally in each parser.
int           register_1     Local variable storage.
int           register_2     Local variable storage.

◆ TidyParserStack

struct TidyParserStack

This typedef represents a stack of parser state records (TidyParserMemory).

The Tidy document has its own instance of this.

Data Fields
TidyParserMemory *  content  A state record.
uint                size     Current size of the stack.
int                 top      Top of the stack.
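
As a rough illustration of how a record type and its stack can fit together, the following is a minimal, self-contained C sketch of a growable stack of state records. Every name in it (SketchMemory, SketchStack, sketch_push, and so on) is a hypothetical stand-in, the record is reduced to three of the fields listed above, and the initial capacity and doubling policy are assumptions. It is not Tidy's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for TidyParserMemory. */
typedef struct {
    int reentry_state;   /* state to set during re-entry */
    int register_1;      /* local variable storage */
    int register_2;      /* local variable storage */
} SketchMemory;

/* Hypothetical stand-in for TidyParserStack. */
typedef struct {
    SketchMemory *content;  /* array of state records */
    unsigned size;          /* allocated capacity, in records */
    int top;                /* index of the top record, -1 when empty */
} SketchStack;

/* Allocate the stack with a small initial capacity (assumed value). */
static int sketch_init(SketchStack *s)
{
    s->size = 4;
    s->top = -1;
    s->content = malloc(s->size * sizeof *s->content);
    return s->content != NULL;
}

/* Release the record array and reset the stack to the empty state. */
static void sketch_free(SketchStack *s)
{
    free(s->content);
    s->content = NULL;
    s->size = 0;
    s->top = -1;
}

static int sketch_is_empty(const SketchStack *s)
{
    return s->top < 0;
}

/* Push a record by value, doubling the capacity when the array is full.
   Returns 0 if the array could not be grown. */
static int sketch_push(SketchStack *s, SketchMemory data)
{
    if ((unsigned)(s->top + 1) == s->size) {
        unsigned new_size = s->size * 2;
        SketchMemory *grown = realloc(s->content, new_size * sizeof *grown);
        if (grown == NULL)
            return 0;
        s->content = grown;
        s->size = new_size;
    }
    s->content[++s->top] = data;
    return 1;
}

/* Return a copy of the top record without removing it. */
static SketchMemory sketch_peek(const SketchStack *s)
{
    assert(!sketch_is_empty(s));
    return s->content[s->top];
}

/* Remove the top record and return a copy of it. */
static SketchMemory sketch_pop(SketchStack *s)
{
    assert(!sketch_is_empty(s));
    return s->content[s->top--];
}
```

Records are passed and returned by value in this sketch, which mirrors the by-value signatures of the push, pop, and peek functions documented below.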

Function Documentation

◆ TY_❪CheckNodeIntegrity❫()

TY_PRIVATE Bool TY_❪CheckNodeIntegrity❫ ( Node *  node)

Recursively checks node integrity after parsing an HTML or XML document.

Note
Actual performance of this check can be disabled by defining the macro NO_NODE_INTEGRITY_CHECK.
Parameters
node  The root node for the integrity check.
Returns
Returns yes or no indicating integrity of the node structure.
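
To make "integrity" concrete, here is a minimal sketch of one way such a recursive check could be written. It assumes a node carries parent, previous sibling, next sibling, first child, and last child links; SketchNode and sketch_check_integrity are hypothetical names, and the specific conditions tested are an illustration rather than a statement of what Tidy verifies. The macro guard follows the note above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a markup node: only the tree links are modeled. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;     /* previous sibling */
    struct SketchNode *next;     /* next sibling */
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Returns 1 when every link at and below node is mirrored by its counterpart. */
static int sketch_check_integrity(const SketchNode *node)
{
    assert(node != NULL);
#ifndef NO_NODE_INTEGRITY_CHECK
    {
        const SketchNode *child;

        /* Sibling links must point back at this node. */
        if (node->prev && node->prev->next != node)
            return 0;
        if (node->next && (node->next == node || node->next->prev != node))
            return 0;

        /* A first or last sibling must be recorded as such by its parent. */
        if (node->parent) {
            if (node->prev == NULL && node->parent->content != node)
                return 0;
            if (node->next == NULL && node->parent->last != node)
                return 0;
        }

        /* Every child must name this node as parent and be sound itself. */
        for (child = node->content; child; child = child->next)
            if (child->parent != node || !sketch_check_integrity(child))
                return 0;
    }
#endif
    return 1;
}
```

When NO_NODE_INTEGRITY_CHECK is defined, the sketch compiles down to a function that always reports success, so callers need no conditional code of their own.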

◆ TY_❪CoerceNode❫()

TY_PRIVATE void TY_❪CoerceNode❫ ( TidyDocImpl *  doc,
Node *  node,
TidyTagId  tid,
Bool  obsolete,
Bool  expected 
)

Transforms a given node to another element, for example, from a p to a br.

Parameters
doc       The document which the node belongs to.
node      The node to coerce.
tid       The tag type to coerce the node into.
obsolete  If the old node was obsolete, a report will be generated.
expected  If the old node was not expected to be found in this particular location, a report will be generated.

◆ TY_❪DiscardElement❫()

TY_PRIVATE Node* TY_❪DiscardElement❫ ( TidyDocImpl *  doc,
Node *  element 
)

Remove node from markup tree and discard it.

Parameters
doc      The Tidy document from which to discard the node.
element  The node to discard.
Returns
Returns the next node.

◆ TY_❪DropEmptyElements❫()

TY_PRIVATE Node* TY_❪DropEmptyElements❫ ( TidyDocImpl *  doc,
Node *  node 
)

Trims a tree of empty elements recursively, returning the next node.

Parameters
doc   The Tidy document.
node  The element to trim.
Returns
Returns the next node.

◆ TY_❪FreeParserStack❫()

void TY_❪FreeParserStack❫ ( TidyDocImpl *  doc)

Frees the parser's stack when done.

TidyRelease will perform this automatically.

◆ TY_❪InitParserStack❫()

void TY_❪InitParserStack❫ ( TidyDocImpl *  doc)

Allocates and initializes the parser's stack.

TidyCreate will perform this automatically.

◆ TY_❪InsertNodeAfterElement❫()

TY_PRIVATE void TY_❪InsertNodeAfterElement❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree after element.

Parameters
element  The node after which the node is inserted.
node     The node to insert.

◆ TY_❪InsertNodeAtEnd❫()

TY_PRIVATE void TY_❪InsertNodeAtEnd❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree as the last element of content of element.

Parameters
element  The new destination node.
node     The node to insert.

◆ TY_❪InsertNodeAtStart❫()

TY_PRIVATE void TY_❪InsertNodeAtStart❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree as the first element of content of element.

Parameters
element  The new destination node.
node     The node to insert.

◆ TY_❪InsertNodeBeforeElement❫()

TY_PRIVATE void TY_❪InsertNodeBeforeElement❫ ( Node *  element,
Node *  node 
)

Insert node into markup tree before element.

Parameters
element  The node before which the node is inserted.
node     The node to insert.
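
The four insertion functions above differ only in which links they rewire. The sketch below shows one plausible way to implement them on a tree whose nodes carry parent, sibling, first-child, and last-child links. SketchNode and the sketch_insert_* names are hypothetical, and the sketch assumes the node being inserted is currently detached from any tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a markup node: only the tree links are modeled. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;     /* previous sibling */
    struct SketchNode *next;     /* next sibling */
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Link node as the last child of element. */
static void sketch_insert_at_end(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element;
    node->prev = element->last;
    node->next = NULL;
    if (element->last)
        element->last->next = node;
    else
        element->content = node;   /* element had no children */
    element->last = node;
}

/* Link node as the first child of element. */
static void sketch_insert_at_start(SketchNode *element, SketchNode *node)
{
    assert(element != NULL && node != NULL);
    node->parent = element;
    node->prev = NULL;
    node->next = element->content;
    if (element->content)
        element->content->prev = node;
    else
        element->last = node;      /* element had no children */
    element->content = node;
}

/* Link node as the sibling immediately before element. */
static void sketch_insert_before(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->next = element;
    node->prev = element->prev;
    element->prev = node;
    if (node->prev)
        node->prev->next = node;
    else if (parent)
        parent->content = node;    /* node is the new first child */
}

/* Link node as the sibling immediately after element. */
static void sketch_insert_after(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->prev = element;
    node->next = element->next;
    element->next = node;
    if (node->next)
        node->next->prev = node;
    else if (parent)
        parent->last = node;       /* node is the new last child */
}
```

Note the different meaning of element: for the "at start" and "at end" variants it is the new parent, while for the "before" and "after" variants it is the new sibling.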

◆ TY_❪IsBlank❫()

TY_PRIVATE Bool TY_❪IsBlank❫ ( Lexer *  lexer,
Node *  node 
)

Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.

Parameters
lexer  The lexer used to lex the document.
node   The node to test.
Returns
Returns the result of the test.
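
The definition of "blank" above can be sketched as follows. The sketch assumes that a text node is a half-open [start, end) slice of a character buffer owned by the lexer. That layout, and the names SketchLexer, SketchNode, and sketch_is_blank, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a text node is a [start, end) slice of lexbuf. */
typedef struct { const char *lexbuf; } SketchLexer;
typedef struct { int is_text; size_t start; size_t end; } SketchNode;

/* Returns 1 for a text node that is empty or exactly one space, else 0. */
static int sketch_is_blank(const SketchLexer *lexer, const SketchNode *node)
{
    assert(node->start <= node->end);
    if (!node->is_text)
        return 0;                  /* only text nodes can be blank */
    if (node->end == node->start)
        return 1;                  /* consists of nothing */
    return node->end == node->start + 1
        && lexer->lexbuf[node->start] == ' ';   /* a single space */
}
```

Under this definition a run of two or more spaces is not blank, and neither is a lone newline or tab.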

◆ TY_❪isEmptyParserStack❫()

Bool TY_❪isEmptyParserStack❫ ( TidyDocImpl *  doc)

Indicates whether or not the stack is empty.

◆ TY_❪IsJavaScript❫()

TY_PRIVATE Bool TY_❪IsJavaScript❫ ( Node *  node)

Indicates whether or not a node is declared as containing JavaScript code.

Parameters
node  The node to test.
Returns
Returns the result of the test.

◆ TY_❪IsNewNode❫()

TY_PRIVATE Bool TY_❪IsNewNode❫ ( Node *  node)

Used to check if a node uses CM_NEW, which determines how attributes without values should be printed.

This was introduced to deal with user-defined tags, e.g., ColdFusion.

Parameters
node  The node to check.
Returns
The result of the check.

◆ TY_❪ParseDocument❫()

TY_PRIVATE void TY_❪ParseDocument❫ ( TidyDocImpl *  doc)

Parses a document after lexing using the HTML parser.

It begins by properly configuring the overall HTML structure, and subsequently processes all remaining nodes. HTML is the root node.

Parameters
doc  The Tidy document.

◆ TY_❪ParseXMLDocument❫()

TY_PRIVATE void TY_❪ParseXMLDocument❫ ( TidyDocImpl *  doc)

Parses a document after lexing using the XML parser.

Parameters
doc  The Tidy document.

◆ TY_❪peekMemoryIdentity❫()

Parser* TY_❪peekMemoryIdentity❫ ( TidyDocImpl *  doc)

Peek at the parser memory "identity" field.

This is just a convenience to avoid having to create a new struct instance in the caller.

◆ TY_❪peekMemoryMode❫()

GetTokenMode TY_❪peekMemoryMode❫ ( TidyDocImpl *  doc)

Peek at the parser memory "mode" field.

This is just a convenience to avoid having to create a new struct instance in the caller.

◆ TY_❪peekMemory❫()

TidyParserMemory TY_❪peekMemory❫ ( TidyDocImpl *  doc)

Peek at the parser memory record at the top of the stack without removing it.

◆ TY_❪popMemory❫()

TidyParserMemory TY_❪popMemory❫ ( TidyDocImpl *  doc)

Pop a parser memory record off the stack.

◆ TY_❪pushMemory❫()

void TY_❪pushMemory❫ ( TidyDocImpl *  doc,
TidyParserMemory  data 
)

Push a parser memory record onto the stack.
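
Pushing and popping records lets a parser defer its finishing work without recursing. The sketch below illustrates that pattern on a toy problem: it serializes a small tree, emitting the opening tag on entry, pushing a record, and emitting the closing tag when the record is popped on re-entry. The node type, the one-field record, the fixed-depth array, and all sketch_* names are hypothetical. Tidy's parsers are considerably more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a markup node. */
typedef struct SketchNode {
    const char *name;
    struct SketchNode *content;  /* first child */
    struct SketchNode *next;     /* next sibling */
} SketchNode;

/* Hypothetical, reduced stand-in for a parser memory record. */
typedef struct {
    SketchNode *reentry_node;    /* the node with which to re-enter */
} SketchMemory;

enum { SKETCH_MAX_DEPTH = 32 };  /* fixed depth limit, assumed for brevity */

/* Append prefix, name, and ">" to the NUL-terminated buffer out. */
static void sketch_emit(char *out, size_t cap, const char *prefix, const char *name)
{
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s%s>", prefix, name);
}

/* Serialize root, its descendants, and any following siblings into out,
   using an explicit stack of records instead of recursion. */
static void sketch_walk(SketchNode *root, char *out, size_t cap)
{
    SketchMemory stack[SKETCH_MAX_DEPTH];
    int top = -1;
    SketchNode *node = root;

    assert(cap > 0);
    out[0] = '\0';
    while (node != NULL || top >= 0) {
        if (node != NULL) {
            /* Entry: do the opening work, then record that closing work is pending. */
            sketch_emit(out, cap, "<", node->name);
            assert(top + 1 < SKETCH_MAX_DEPTH);
            stack[++top].reentry_node = node;
            node = node->content;
        } else {
            /* Re-entry: pop the record and finish the work for that node. */
            SketchMemory memory = stack[top--];
            sketch_emit(out, cap, "</", memory.reentry_node->name);
            node = memory.reentry_node->next;
        }
    }
}
```

Because the pending work lives in heap or array storage rather than in call frames, the depth of the document does not translate into depth of the C call stack.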

◆ TY_❪RemoveNode❫()

TY_PRIVATE Node* TY_❪RemoveNode❫ ( Node *  node)

Extract a node and its children from a markup tree.

Parameters
node  The node to remove.
Returns
Returns the removed node.
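
Extracting a node "and its children" means unlinking it from its parent and siblings while leaving its own subtree attached. A minimal sketch of that operation follows, assuming nodes with parent, sibling, first-child, and last-child links. SketchNode and sketch_remove are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a markup node: only the tree links are modeled. */
typedef struct SketchNode {
    struct SketchNode *parent;
    struct SketchNode *prev;     /* previous sibling */
    struct SketchNode *next;     /* next sibling */
    struct SketchNode *content;  /* first child */
    struct SketchNode *last;     /* last child */
} SketchNode;

/* Unlink node from its parent and siblings; its children stay attached to it. */
static SketchNode *sketch_remove(SketchNode *node)
{
    assert(node != NULL);

    /* Close the gap in the sibling chain. */
    if (node->prev)
        node->prev->next = node->next;
    if (node->next)
        node->next->prev = node->prev;

    /* Fix the parent's first-child and last-child links if they pointed here. */
    if (node->parent) {
        if (node->parent->content == node)
            node->parent->content = node->next;
        if (node->parent->last == node)
            node->parent->last = node->prev;
    }

    node->parent = node->prev = node->next = NULL;
    return node;
}
```

The returned node is detached but intact, so it can be handed straight to an insertion function to move the whole subtree elsewhere.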

◆ TY_❪TextNodeEndWithSpace❫()

TY_PRIVATE Bool TY_❪TextNodeEndWithSpace❫ ( Lexer *  lexer,
Node *  node 
)

Indicates whether or not a text node ends with a space or newline.

Note
Implementation of this method is found in pprint.c for some reason.
Parameters
lexer  A reference to the lexer used to lex the document.
node   The node to check.
Returns
The result of the check.
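
A simplified sketch of this test is shown below. It assumes that a text node is a half-open [start, end) slice of a byte buffer owned by the lexer and that the buffer holds valid UTF-8. Under that assumption, inspecting the final byte is enough, because the trailing bytes of a multi-byte sequence are never equal to a space or newline. The names SketchLexer, SketchNode, and sketch_ends_with_space are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a text node is a [start, end) slice of lexbuf. */
typedef struct { const char *lexbuf; } SketchLexer;
typedef struct { int is_text; size_t start; size_t end; } SketchNode;

/* Returns 1 when a non-empty text node ends in a space or newline, else 0. */
static int sketch_ends_with_space(const SketchLexer *lexer, const SketchNode *node)
{
    char last;

    assert(node->start <= node->end);
    if (!node->is_text || node->end == node->start)
        return 0;                  /* not text, or nothing to inspect */
    last = lexer->lexbuf[node->end - 1];
    return last == ' ' || last == '\n';
}
```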

◆ TY_❪TrimEmptyElement❫()

TY_PRIVATE Node* TY_❪TrimEmptyElement❫ ( TidyDocImpl *  doc,
Node *  element 
)

Trims a single, empty element, returning the next node.

Parameters
doc      The Tidy document.
element  The element to trim.
Returns
Returns the next node.

◆ TY_❪XMLPreserveWhiteSpace❫()

TY_PRIVATE Bool TY_❪XMLPreserveWhiteSpace❫ ( TidyDocImpl *  doc,
Node *  element 
)

Indicates whether or not whitespace is to be preserved in XHTML/XML documents.

Parameters
doc      The Tidy document.
element  The node to test.
Returns
Returns the result of the test.