These functions and structures form the internal API for document parsing.
Data Structures
struct | TidyParserMemory |
This typedef represents the state of a parser when it enters and exits.
struct | TidyParserStack |
This typedef represents a stack of TidyParserMemory records.
Functions
TY_PRIVATE Bool | TY_❪CheckNodeIntegrity❫ (Node *node) |
Recursively checks node integrity after parsing an HTML or XML document.
TY_PRIVATE void | TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected) |
Transforms a given node to another element, for example, from a p to a br.
TY_PRIVATE Node * | TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element) |
Remove node from markup tree and discard it.
TY_PRIVATE Node * | TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node) |
Trims a tree of empty elements recursively, returning the next node.
void | TY_❪FreeParserStack❫ (TidyDocImpl *doc) |
Frees the parser's stack when done.
void | TY_❪InitParserStack❫ (TidyDocImpl *doc) |
Allocates and initializes the parser's stack.
TY_PRIVATE void | TY_❪InsertNodeAfterElement❫ (Node *element, Node *node) |
Insert node into markup tree after element.
TY_PRIVATE void | TY_❪InsertNodeAtEnd❫ (Node *element, Node *node) |
Insert node into markup tree as the last element of content of element.
TY_PRIVATE void | TY_❪InsertNodeAtStart❫ (Node *element, Node *node) |
Insert node into markup tree as the first element of content of element.
TY_PRIVATE void | TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node) |
Insert node into markup tree before element.
TY_PRIVATE Bool | TY_❪IsBlank❫ (Lexer *lexer, Node *node) |
Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.
Bool | TY_❪isEmptyParserStack❫ (TidyDocImpl *doc) |
Indicates whether or not the stack is empty.
TY_PRIVATE Bool | TY_❪IsJavaScript❫ (Node *node) |
Indicates whether or not a node is declared as containing JavaScript code.
TY_PRIVATE Bool | TY_❪IsNewNode❫ (Node *node) |
Used to check if a node uses CM_NEW, which determines how attributes without values should be printed.
TY_PRIVATE void | TY_❪ParseDocument❫ (TidyDocImpl *doc) |
Parses a document after lexing using the HTML parser.
TY_PRIVATE void | TY_❪ParseXMLDocument❫ (TidyDocImpl *doc) |
Parses a document after lexing using the XML parser.
Parser * | TY_❪peekMemoryIdentity❫ (TidyDocImpl *doc) |
Peek at the parser memory "identity" field.
GetTokenMode | TY_❪peekMemoryMode❫ (TidyDocImpl *doc) |
Peek at the parser memory "mode" field.
TidyParserMemory | TY_❪peekMemory❫ (TidyDocImpl *doc) |
Peek at the parser memory.
TidyParserMemory | TY_❪popMemory❫ (TidyDocImpl *doc) |
Pop out a parser memory.
void | TY_❪pushMemory❫ (TidyDocImpl *doc, TidyParserMemory data) |
Push the parser memory to the stack.
TY_PRIVATE Node * | TY_❪RemoveNode❫ (Node *node) |
Extract a node and its children from a markup tree.
TY_PRIVATE Bool | TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node) |
Indicates whether or not a text node ends with a space or newline.
TY_PRIVATE Node * | TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element) |
Trims a single, empty element, returning the next node.
TY_PRIVATE Bool | TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element) |
Indicates whether or not whitespace is to be preserved in XHTML/XML documents.
struct TidyParserMemory |
This typedef represents the state of a parser when it enters and exits.
When the parser needs to finish work on the way back up the stack, it will push one of these records to the stack, and it will pop a record from the stack upon re-entry.
Data Fields | ||
---|---|---|
Parser * | identity | Which parser pushed this record? |
GetTokenMode | mode | The caller will peek at this value to get the correct mode. |
Node * | original_node | Originally provided node at entry. |
GetTokenMode | reentry_mode | The token mode to use when re-entering. |
Node * | reentry_node | The node with which to re-enter. |
int | reentry_state | State to set during re-entry. Defined locally in each parser. |
int | register_1 | Local variable storage. |
int | register_2 | Local variable storage. |
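The push-on-exit, pop-on-re-entry pattern described above can be illustrated with a small, self-contained sketch. It is not Tidy's parser: `SketchNode`, `SketchMemory`, `STATE_AFTER_CHILDREN`, and `sketch_walk` are hypothetical names, and the record keeps only a re-entry node and a re-entry state. The walk writes a node's lowercase name on entry, pushes a record, descends, and on re-entry pops the record to do the finishing work (writing the uppercase name).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node: first child and next sibling only. */
typedef struct SketchNode {
    char name;
    struct SketchNode *child;
    struct SketchNode *next;
} SketchNode;

/* Hypothetical, cut-down analogue of a parser memory record. */
typedef struct SketchMemory {
    SketchNode *reentry_node;  /* node with which to resume */
    int reentry_state;         /* where to resume; local to this walker */
} SketchMemory;

enum { SKETCH_MAX_DEPTH = 32 };
enum { STATE_AFTER_CHILDREN = 1 };

/* Walks the tree without recursion. Output is truncated if out is too small. */
static void sketch_walk(SketchNode *root, char *out, size_t outsize)
{
    SketchMemory stack[SKETCH_MAX_DEPTH];
    int top = -1;
    size_t n = 0;
    SketchNode *node = root;

    assert(outsize > 0);
    while (node != NULL || top >= 0) {
        if (node != NULL) {
            /* Entry: do the opening work, then remember to come back. */
            if (n + 1 < outsize)
                out[n++] = node->name;
            assert(top + 1 < SKETCH_MAX_DEPTH);
            stack[++top] = (SketchMemory){ node, STATE_AFTER_CHILDREN };
            node = node->child;
        } else {
            /* Re-entry: pop the record and finish the postponed work. */
            SketchMemory mem = stack[top--];
            assert(mem.reentry_state == STATE_AFTER_CHILDREN);
            if (n + 1 < outsize)
                out[n++] = (char)toupper((unsigned char)mem.reentry_node->name);
            node = mem.reentry_node->next;
        }
    }
    out[n] = '\0';
}
```

The explicit stack replaces the call stack, so nesting depth is bounded by the stack's capacity rather than by recursion depth.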
struct TidyParserStack |
This typedef represents a stack of TidyParserMemory records.
The Tidy document has its own instance of this.
Data Fields | ||
---|---|---|
TidyParserMemory * | content | A state record. |
uint | size | Current size of the stack. |
int | top | Top of the stack. |
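A minimal sketch of a growable stack with the same three fields follows. It assumes that `size` means allocated capacity and that `top` is the index of the top record (-1 when empty); neither assumption is confirmed by the table above. `SketchRecord`, `SketchStack`, and the `sketch_*` functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record; a real record would carry more fields. */
typedef struct SketchRecord { int reentry_state; } SketchRecord;

typedef struct SketchStack {
    SketchRecord *content;  /* heap array of records */
    unsigned size;          /* assumed: allocated capacity, in records */
    int top;                /* assumed: index of the top record, -1 if empty */
} SketchStack;

/* Allocates the stack; capacity must be greater than zero. Returns 1 on success. */
static int sketch_init(SketchStack *s, unsigned capacity)
{
    assert(capacity > 0);
    s->content = malloc(capacity * sizeof *s->content);
    s->size = s->content ? capacity : 0;
    s->top = -1;
    return s->content != NULL;
}

static int sketch_is_empty(const SketchStack *s) { return s->top < 0; }

/* Pushes a record, doubling the capacity when full. Returns 1 on success. */
static int sketch_push(SketchStack *s, SketchRecord r)
{
    if ((unsigned)(s->top + 1) == s->size) {
        unsigned new_size = s->size * 2;
        SketchRecord *grown = realloc(s->content, new_size * sizeof *grown);
        if (grown == NULL)
            return 0;
        s->content = grown;
        s->size = new_size;
    }
    s->content[++s->top] = r;
    return 1;
}

/* Returns a copy of the top record without removing it. */
static SketchRecord sketch_peek(const SketchStack *s)
{
    assert(!sketch_is_empty(s));
    return s->content[s->top];
}

/* Removes and returns the top record. */
static SketchRecord sketch_pop(SketchStack *s)
{
    assert(!sketch_is_empty(s));
    return s->content[s->top--];
}

static void sketch_free(SketchStack *s)
{
    free(s->content);
    s->content = NULL;
    s->size = 0;
    s->top = -1;
}
```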
TY_PRIVATE Bool TY_❪CheckNodeIntegrity❫ (Node *node)
Recursively checks node integrity after parsing an HTML or XML document.
node | The root node for the integrity check. |
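As an illustration of what a recursive integrity check can verify, here is a minimal sketch that assumes each node carries parent, prev, next, content (first child), and last (last child) links. `SketchNode` and `sketch_check_integrity` are hypothetical names; this is not Tidy's implementation, and it assumes the sibling lists contain no cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with the usual tree links. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next, *content, *last;
} SketchNode;

/* Returns 1 if every link under node agrees with its counterpart, else 0. */
static int sketch_check_integrity(const SketchNode *node)
{
    const SketchNode *child;
    const SketchNode *seen_last = NULL;

    /* Sibling links must be symmetric. */
    if (node->prev && node->prev->next != node)
        return 0;
    if (node->next && node->next->prev != node)
        return 0;

    /* The first child has no previous sibling. */
    if (node->content && node->content->prev != NULL)
        return 0;

    /* Every child must point back to this node and be consistent itself. */
    for (child = node->content; child != NULL; child = child->next) {
        if (child->parent != node)
            return 0;
        if (!sketch_check_integrity(child))
            return 0;
        seen_last = child;
    }

    /* The last child reached by walking must match the stored last link. */
    return seen_last == node->last;
}
```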
TY_PRIVATE void TY_❪CoerceNode❫ (TidyDocImpl *doc, Node *node, TidyTagId tid, Bool obsolete, Bool expected)
Transforms a given node to another element, for example, from a p to a br.
doc | The document which the node belongs to. |
node | The node to coerce. |
tid | The tag type to coerce the node into. |
obsolete | If the old node was obsolete, a report will be generated. |
expected | If the old node was not expected to be found in this particular location, a report will be generated. |
TY_PRIVATE Node* TY_❪DiscardElement❫ (TidyDocImpl *doc, Node *element)
Remove node from markup tree and discard it.
doc | The Tidy document from which to discard the node. |
element | The node to discard. |
TY_PRIVATE Node* TY_❪DropEmptyElements❫ (TidyDocImpl *doc, Node *node)
Trims a tree of empty elements recursively, returning the next node.
doc | The Tidy document. |
node | The element to trim. |
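The recursive trimming idea can be sketched as follows. This is a simplified illustration, not Tidy's logic: it treats any non-text node without children as empty, it only unlinks nodes instead of freeing them, and `sketch_drop_empty` returns nothing. `SketchNode`, `sketch_discard`, `sketch_trim_empty`, and `sketch_drop_empty` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with tree links and a text flag. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next, *content, *last;
    int is_text;
} SketchNode;

/* Unlinks element from its parent and siblings; returns the next sibling. */
static SketchNode *sketch_discard(SketchNode *element)
{
    SketchNode *next = element->next;

    if (element->prev)
        element->prev->next = element->next;
    else if (element->parent)
        element->parent->content = element->next;

    if (element->next)
        element->next->prev = element->prev;
    else if (element->parent)
        element->parent->last = element->prev;

    element->parent = element->prev = element->next = NULL;
    return next;
}

/* Trims one element if it is empty; returns the next node either way. */
static SketchNode *sketch_trim_empty(SketchNode *element)
{
    if (!element->is_text && element->content == NULL)
        return sketch_discard(element);
    return element->next;
}

/* Trims a sibling list and everything below it, children first. */
static void sketch_drop_empty(SketchNode *node)
{
    while (node != NULL) {
        if (node->content != NULL)
            sketch_drop_empty(node->content);
        node = sketch_trim_empty(node);
    }
}
```

Processing children before the node itself means that an element whose only content was empty is trimmed in the same pass.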
void TY_❪FreeParserStack❫ (TidyDocImpl *doc)
Frees the parser's stack when done.
TidyRelease will perform this automatically.
void TY_❪InitParserStack❫ (TidyDocImpl *doc)
Allocates and initializes the parser's stack.
TidyCreate will perform this automatically.
TY_PRIVATE void TY_❪InsertNodeAfterElement❫ (Node *element, Node *node)
Insert node into markup tree after element.
element | The node after which the node is inserted. |
node | The node to insert. |
TY_PRIVATE void TY_❪InsertNodeAtEnd❫ (Node *element, Node *node)
Insert node into markup tree as the last element of content of element.
element | The new destination node. |
node | The node to insert. |
TY_PRIVATE void TY_❪InsertNodeAtStart❫ (Node *element, Node *node)
Insert node into markup tree as the first element of content of element.
element | The new destination node. |
node | The node to insert. |
TY_PRIVATE void TY_❪InsertNodeBeforeElement❫ (Node *element, Node *node)
Insert node into markup tree before element.
element | The node before which the node is inserted. |
node | The node to insert. |
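The four insertion operations differ only in which links they rewire. The sketch below shows one plausible way to do that on a tree with parent, prev, next, content, and last links. `SketchNode` and the `sketch_insert_*` functions are hypothetical names, and the code assumes the node being inserted is not currently linked anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with the usual tree links. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next, *content, *last;
} SketchNode;

/* Makes node the last child of element. */
static void sketch_insert_at_end(SketchNode *element, SketchNode *node)
{
    node->parent = element;
    node->prev = element->last;
    node->next = NULL;
    if (element->last)
        element->last->next = node;
    else
        element->content = node;
    element->last = node;
}

/* Makes node the first child of element. */
static void sketch_insert_at_start(SketchNode *element, SketchNode *node)
{
    node->parent = element;
    node->prev = NULL;
    node->next = element->content;
    if (element->content)
        element->content->prev = node;
    else
        element->last = node;
    element->content = node;
}

/* Makes node the sibling immediately before element. */
static void sketch_insert_before(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->next = element;
    node->prev = element->prev;
    if (element->prev)
        element->prev->next = node;
    else if (parent)
        parent->content = node;
    element->prev = node;
}

/* Makes node the sibling immediately after element. */
static void sketch_insert_after(SketchNode *element, SketchNode *node)
{
    SketchNode *parent = element->parent;
    node->parent = parent;
    node->prev = element;
    node->next = element->next;
    if (element->next)
        element->next->prev = node;
    else if (parent)
        parent->last = node;
    element->next = node;
}
```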
TY_PRIVATE Bool TY_❪IsBlank❫ (Lexer *lexer, Node *node)
Indicates whether or not a text node is blank, meaning that it consists of nothing, or a single space.
lexer | The lexer used to lex the document. |
node | The node to test. |
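The definition of blank given above (no text at all, or exactly one space) can be expressed in a few lines. The sketch assumes that a text node stores start and end offsets into a lexer-owned character buffer; `SketchLexer`, `SketchNode`, and `sketch_is_blank` are hypothetical names, not Tidy's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer: owns the character buffer that nodes refer to. */
typedef struct SketchLexer {
    const char *lexbuf;
} SketchLexer;

/* Hypothetical node: text spans [start, end) in the lexer buffer. */
typedef struct SketchNode {
    int is_text;
    unsigned start;
    unsigned end;
} SketchNode;

/* Returns 1 if node is a text node holding nothing or a single space. */
static int sketch_is_blank(const SketchLexer *lexer, const SketchNode *node)
{
    if (!node->is_text)
        return 0;
    if (node->end == node->start)
        return 1;
    return node->end == node->start + 1 && lexer->lexbuf[node->start] == ' ';
}
```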
Bool TY_❪isEmptyParserStack❫ (TidyDocImpl *doc)
Indicates whether or not the stack is empty.
TY_PRIVATE Bool TY_❪IsJavaScript❫ (Node *node)
Indicates whether or not a node is declared as containing JavaScript code.
node | The node to test. |
TY_PRIVATE Bool TY_❪IsNewNode❫ (Node *node)
Used to check if a node uses CM_NEW, which determines how attributes without values should be printed.
This was introduced to deal with user-defined tags e.g. ColdFusion.
node | The node to check. |
TY_PRIVATE void TY_❪ParseDocument❫ (TidyDocImpl *doc)
Parses a document after lexing using the HTML parser.
It begins by properly configuring the overall HTML structure, and subsequently processes all remaining nodes. HTML is the root node.
doc | The Tidy document. |
TY_PRIVATE void TY_❪ParseXMLDocument❫ (TidyDocImpl *doc)
Parses a document after lexing using the XML parser.
doc | The Tidy document. |
Parser* TY_❪peekMemoryIdentity❫ (TidyDocImpl *doc)
Peek at the parser memory "identity" field.
This is just a convenience to avoid having to create a new struct instance in the caller.
GetTokenMode TY_❪peekMemoryMode❫ (TidyDocImpl *doc)
Peek at the parser memory "mode" field.
This is just a convenience to avoid having to create a new struct instance in the caller.
TidyParserMemory TY_❪peekMemory❫ (TidyDocImpl *doc)
Peek at the parser memory.
TidyParserMemory TY_❪popMemory❫ (TidyDocImpl *doc)
Pop out a parser memory.
void TY_❪pushMemory❫ (TidyDocImpl *doc, TidyParserMemory data)
Push the parser memory to the stack.
TY_PRIVATE Node* TY_❪RemoveNode❫ (Node *node)
Extract a node and its children from a markup tree.
node | The node to remove. |
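Extracting a node together with its children amounts to detaching it from its parent and siblings while leaving its own content links alone. A minimal sketch, with the hypothetical names `SketchNode` and `sketch_remove_node`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with the usual tree links. */
typedef struct SketchNode {
    struct SketchNode *parent, *prev, *next, *content, *last;
} SketchNode;

/* Detaches node from the tree; its children stay attached to it. */
static SketchNode *sketch_remove_node(SketchNode *node)
{
    if (node->prev)
        node->prev->next = node->next;
    if (node->next)
        node->next->prev = node->prev;

    if (node->parent) {
        if (node->parent->content == node)
            node->parent->content = node->next;
        if (node->parent->last == node)
            node->parent->last = node->prev;
    }

    node->parent = node->prev = node->next = NULL;
    return node;
}
```

Returning the detached node lets a caller re-insert the whole subtree somewhere else.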
TY_PRIVATE Bool TY_❪TextNodeEndWithSpace❫ (Lexer *lexer, Node *node)
Indicates whether or not a text node ends with a space or newline.
Note: the implementation of this function is found in pprint.c for some reason.
lexer | A reference to the lexer used to lex the document. |
node | The node to check. |
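A trailing-whitespace test of this kind only needs to look at the final character of the node's text. The sketch below assumes that a text node stores start and end offsets into a lexer-owned buffer; `SketchLexer`, `SketchNode`, and `sketch_ends_with_space` are hypothetical names and do not reflect the code in pprint.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer: owns the character buffer that nodes refer to. */
typedef struct SketchLexer {
    const char *lexbuf;
} SketchLexer;

/* Hypothetical node: text spans [start, end) in the lexer buffer. */
typedef struct SketchNode {
    int is_text;
    unsigned start;
    unsigned end;
} SketchNode;

/* Returns 1 if node is non-empty text whose last character is ' ' or '\n'. */
static int sketch_ends_with_space(const SketchLexer *lexer, const SketchNode *node)
{
    char last;

    if (!node->is_text || node->end <= node->start)
        return 0;
    last = lexer->lexbuf[node->end - 1];
    return last == ' ' || last == '\n';
}
```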
TY_PRIVATE Node* TY_❪TrimEmptyElement❫ (TidyDocImpl *doc, Node *element)
Trims a single, empty element, returning the next node.
doc | The Tidy document. |
element | The element to trim. |
TY_PRIVATE Bool TY_❪XMLPreserveWhiteSpace❫ (TidyDocImpl *doc, Node *element)
Indicates whether or not whitespace is to be preserved in XHTML/XML documents.
doc | The Tidy document. |
element | The node to test. |