An XML parser can check a document at two levels. A document can be a:
- Well-formed XML Document
- Valid XML Document
An XML document is said to be well-formed if it follows the XML syntax rules, that is, its markup has no errors such as unclosed tags, overlapping elements or unquoted attribute values. If the markup contains such errors, the XML parser cannot parse the document.
Additionally, the XML document must satisfy the conditions below:
- Some markup constructs, such as parameter entity references, are allowed only in particular places. If one is used elsewhere, the document is not well-formed, even if it is well-formed in every other way.
- The replacement text for all parameter entities referenced inside a markup declaration consists of zero or more complete markup declarations.
- An attribute name may appear only once in a single tag.
- Attribute values must not contain direct or indirect references to external entities.
- All elements must be properly nested: an element opened inside another element must also be closed inside it.
- Parameter entities must be declared before they are used.
- An unparsed (binary) entity may be named only in an attribute declared as ENTITY or ENTITIES; it must not be used in an entity reference.
- Neither text entities nor parameter entities may be recursive, directly or indirectly.
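Some of these conditions can be seen in action with any XML parser. The sketch below is a minimal illustration, assuming Python's standard `xml.etree.ElementTree` module as the parser; the helper name `is_well_formed` and the sample strings are made up for this example. It shows that a properly nested document is accepted, while overlapping tags and a repeated attribute are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error it finds.
        return False, str(err)


# Hypothetical sample inputs, one per rule being illustrated.
samples = {
    "properly nested": "<contact><name>Michael Phelps</name></contact>",
    "overlapping tags": "<contact><name>Michael Phelps</contact></name>",
    "duplicate attribute": '<phone type="home" type="work">123</phone>',
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    status = "well-formed" if ok else "rejected (" + error + ")"
    print(label + ": " + status)
```

The exact error text depends on the parser, so the example only relies on whether parsing succeeds or fails.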
Example: Below is a well-formed XML document. It includes an internal DTD with contact as the root element, and the elements name, company and phone are declared before they are used.
[xml]
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE contact
[
<!ELEMENT contact (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<contact>
<name>Michael Phelps</name>
<company>SPLessons</company>
<phone>(011) 123-4567</phone>
</contact>
[/xml]
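To confirm that a parser accepts a document like this, it can be loaded and its elements read back. The following is a small sketch, assuming Python's standard `xml.etree.ElementTree` module; the constant name `CONTACT_XML` is chosen for this example only. Note that this parser is non-validating: it reads the DTD declarations but checks only well-formedness.

```python
import xml.etree.ElementTree as ET

# The contact document, passed as bytes so the encoding declaration is honoured.
CONTACT_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE contact
[
<!ELEMENT contact (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<contact>
<name>Michael Phelps</name>
<company>SPLessons</company>
<phone>(011) 123-4567</phone>
</contact>
"""

# fromstring raises ET.ParseError if the document is not well-formed.
root = ET.fromstring(CONTACT_XML)

print(root.tag)  # prints "contact"
for child in root:
    print(child.tag, "=", child.text)
```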
To check that the content of an XML document is structured as intended, a set of rules is applied during parsing in addition to the well-formedness checks. If the XML parser parses the document correctly and the document content satisfies those rules, the document is called a valid document.
There are two ways to check whether a document is valid:
- Document Type Definition (DTD): Simple to use but not very powerful. DTDs are written in a syntax that is different from XML.
- XML Schema Definition (XSD): More powerful than a DTD and written using XML syntax rules.
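The difference between a well-formed document and a valid one can be illustrated with a short sketch. It assumes Python's standard `xml.etree.ElementTree` module, which checks well-formedness only, so the structural rule is hand-written here as a simplified stand-in for the content model `(name,company,phone)`. The function `check_contact` and the constant `EXPECTED_CHILDREN` are hypothetical names; a real application would use a validating parser with a DTD or XSD instead.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the rule <!ELEMENT contact (name,company,phone)>.
# This is an illustration only, not a real DTD or XSD validator.
EXPECTED_CHILDREN = ["name", "company", "phone"]


def check_contact(xml_text):
    """Classify xml_text as 'not well-formed', 'well-formed only' or 'valid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    children = [child.tag for child in root]
    # Each child is declared as #PCDATA, so it must not contain elements.
    text_only = all(len(child) == 0 for child in root)
    if root.tag == "contact" and children == EXPECTED_CHILDREN and text_only:
        return "valid"
    return "well-formed only"


complete = (
    "<contact><name>Michael Phelps</name>"
    "<company>SPLessons</company>"
    "<phone>(011) 123-4567</phone></contact>"
)
missing_phone = (
    "<contact><name>Michael Phelps</name>"
    "<company>SPLessons</company></contact>"
)
broken = "<contact><name>Michael Phelps</contact>"

print(check_contact(complete))       # prints "valid"
print(check_contact(missing_phone))  # prints "well-formed only"
print(check_contact(broken))         # prints "not well-formed"
```

The second sample parses without errors but omits the phone element, so it is well-formed without being valid.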
Both DTDs and XSDs are discussed in detail in later chapters.