Chapter-4
XML
 What is XML?
• The eXtensible Markup Language (XML) is a text-based
 markup language used mainly for exchanging data
 on the internet between different applications.
• An XML document is a text file saved with the extension .xml
• It is a format for storing and transporting
 data.
• It is a markup language similar to HTML.
• In XML, users can define their own tags, and these
 tags are used to describe the data.
• XML is not a scripting language: it only describes data,
 so it can be used together with most languages and tools.
 Advantages
• XML documents are easy to create.
• XML is self-describing: the tags name the data they hold.
• XML works well with application languages such as Java.
• It is portable.
• It is platform independent.
*Difference between XML and HTML
XML syntax:
XML declaration:
The XML declaration indicates that the
document is written in XML and specifies
which version of XML is used.
The XML declaration can also specify the character
encoding of the document.
Ex: <?xml version="1.0" encoding="UTF-8"?>
• Comments: text that is ignored by the parser and is not
 part of the data.
• XML comments begin with <!-- and end with -->
• XML comments allow us to write notes
 within the document
• Ex: <!--This file is related to book information-->
• Root element:
• The first element in the XML document is
 called root element, which is the parent of all
 other elements in the document.
• Ex: <books>
 ----------
 -----------
 </books>
• Child elements:
 The elements that are contained within
the root element are called child elements.
• Empty elements:
 An empty element is one that does not hold any
 content. It is written as a single self-closing tag
instead of a separate start tag and end tag.
• Ex: <br/>, <hr/>, <img/>
• Closing tag: the end tag of the root element
 closes the document.
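The parts described above (declaration, comment, root element, child elements, empty element) can be seen together in one small document. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the book data and the <cover/> element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declaration, comment, root, children and an empty element.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<!--This file is related to book information-->
<books>
  <book>
    <title>Web programming</title>
    <cover/>
  </book>
</books>"""

root = ET.fromstring(doc)                     # the root element <books>
book = root[0]                                # first child of the root
child_names = [child.tag for child in book]   # children of <book>
cover = book.find("cover")                    # empty element: no text, no children

print(root.tag)
print(child_names)
print(cover.text, len(cover))
```

The comment is skipped by the parser, which is why it never shows up among the elements.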
 Elements
• An XML document consists of 3 main building blocks
 – Elements
 – Attributes
 – Entities
 Elements
• Element:
 The content between the start tag <..> and
end tag </..>, including the tags, is called an element.
Ex: <title>Web programming</title>
 <title>System programming</title>
Here each <title>...</title> is an element; Web programming
and System programming are the contents of the elements.
 *Attributes
• An attribute is a name/value pair that we place
 within an opening tag, which allows us to provide
 extra information about an element.
• The property that describes an element is called
 an attribute.
• Ex: <img src="myimage.gif"/>
• <input type="text"/>
• Here src & type are the attributes.
• An element can contain one or more attributes.
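As a small illustration of name/value pairs, the sketch below reads the attributes of one element with Python's standard xml.etree.ElementTree module. The alt attribute is an assumption added only to show an element with more than one attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes placed in its opening tag.
img = ET.fromstring('<img src="myimage.gif" alt="logo"/>')

print(img.attrib)          # all attributes as a dictionary
print(img.get("src"))      # value of a single attribute
print(img.get("width"))    # an attribute that is not present gives None
```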
 Entities
• An entity is a named piece of replacement text that
 is referred to in the document as &name;
• Eg: &lt; stands for <
• &amp; stands for &
 *XML syntax Rules:
• All XML documents must have a root element
• XML is Case sensitive.
• All XML elements must have closing tags.
• All XML elements must be properly nested.
• Attribute values must be quoted.
 eg: <input type="text"/>
• The first character of each tag name must be a letter
 or the “_ “character, but not numbers or other
 punctuation.
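The rules above can be checked by handing small fragments to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One fragment per rule; only the first one follows all the rules.
samples = {
    "follows the rules": "<note><to>Ravi</to></note>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Note></note>",
    "missing closing tag": "<note><to>Ravi</note>",
    "badly nested": "<b><i>text</b></i>",
    "unquoted attribute": "<input type=text/>",
    "name starts with a digit": "<1st/>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(label, "->", ok)
```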
 *XML CDATA:
• CDATA stands for character data.
• The term CDATA is used for text that the parser
 should not treat as markup.
• Characters like < and & are
 illegal inside the text of XML elements.
• The parser reports an error if we use them directly.
• So in order to avoid such errors in script code, the
 code can be placed inside a CDATA section.
 syntax
 <![CDATA[ contents ]]> where ]]> marks the end of the
section.
• "<" will generate an error because the parser
 interprets it as the start of a new element.
• "&" will generate an error because the parser
 interprets it as the start of an entity reference.
• To avoid these errors, script code can be defined as CDATA as
 follows:
 Example
 <script type="text/javascript">
 <![CDATA[
 function greatest(a, b)
 {
   if (a > b)
     return a;
   else
     return b;
 }
 ]]>
 </script>
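The effect of a CDATA section can be observed with a parser. In this sketch (Python's standard xml.etree.ElementTree module, with a shortened one-line script body made up for the example) the same characters are rejected when written directly and accepted inside CDATA.

```python
import xml.etree.ElementTree as ET

# Without CDATA, the raw "<" in the script text is rejected by the parser.
plain = "<script>if (a<b) return a;</script>"
try:
    ET.fromstring(plain)
    plain_ok = True
except ET.ParseError:
    plain_ok = False

# Inside a CDATA section the same characters are kept as ordinary text.
wrapped = "<script><![CDATA[if (a<b && b>0) return a;]]></script>"
script = ET.fromstring(wrapped)

print(plain_ok)
print(script.text)
```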
 *Types of XML Documents
• There are two types
  Well Formed document
  Valid document
 Well Formed document
• An XML document with correct syntax is called "Well
 Formed". Well-formedness refers to syntax.
• A Well Formed document is an XML document that conforms
 to all the syntax rules of XML.
• A well-formed XML document must have a corresponding end
 tag for all of its start tags.
• Elements in an XML document must be properly nested
 within each other.
• Eg:- <?xml version="1.0" encoding="UTF-8"?>
 <!-- Sample xml document -->
 <person>
 <name> Manoj</name>
 <age> 34</age>
 <address> Hebbal</address>
 </person>
 Valid document
• An XML document is said to be valid when it is not only well-formed, but
 also conforms to a DTD that specifies which tags it uses and
 what attributes those tags can contain.
• Well-formedness refers to syntax; validity refers to semantics, that is,
 whether the document follows the structure its DTD declares.
• Syntax defines the rules and regulations that help write any statement
 in a programming language, while semantics refers to the meaning of
 the associated line of code.
• Eg: <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE person SYSTEM "strict.dtd">
 <!-- Sample xml document -->
 <person>
 <name> Manoj</name>
 <age> 34</age>
 <address> Hebbal</address>
 </person>
**DTD- Document Type Definition
• A DTD (Document Type Definition) consists of
 a list of syntax definitions and rules for each
 element in the XML document.
• The purpose of a DTD is to define the
 structure and the legal elements and
 attributes of an XML document:
• DTD specifies which element names can be
 included in the document, the attributes that
 each element can have, whether or not these
 are required or optional and more.
 DTD
• DTD <!DOCTYPE>
• A <!DOCTYPE> declaration appears near the top of
 an XML document that uses a DTD.
• This is how a DTD is declared in XML.
• To use a DTD within an XML document, we need
 to declare it.
• Syntax:
 <!DOCTYPE rootname [DTD]>
• Eg (DTD kept in the file note.dtd): <!DOCTYPE books SYSTEM "note.dtd">
 Rules for DTD
• The DTD type declaration must be written in
 between the xml declaration and the root
 element.(ie, second line should be DOCTYPE)
• Keyword DOCTYPE must be followed by the
 name of the root element.
• Keyword DOCTYPE must be in uppercase.
 **Types of DTD
• Internal DTD
• External DTD
 Internal DTD
• A DTD is referred to as an internal DTD if
 elements are declared within the XML files.
• If the DTD is declared inside the XML file, it
 must be wrapped inside the <!DOCTYPE>
 definition
• An internal DTD is defined between the square
 brackets within the XML document.
 Syntax
<!DOCTYPE root-element [element-declarations]>
 Example
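A minimal sketch of an internal DTD, embedded in a Python string so that it can be parsed with the standard xml.etree.ElementTree module. The note document, its element names and the college entity are assumptions made up for illustration. This parser reads the DOCTYPE but does not validate the document against it; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with its DTD between the square brackets.
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY college "ABC College">
]>
<note>
  <to>Manoj</to>
  <from>&college;</from>
  <body>Welcome</body>
</note>"""

note = ET.fromstring(doc)
children = [child.tag for child in note]
sender = note.find("from").text   # the internal entity &college; is expanded

print(note.tag, children)
print(sender)
```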
 External DTD
• In an external DTD, elements are declared outside the
 XML file.
• If the DTD is declared in an external file, the
 <!DOCTYPE> definition must contain a reference to the
 DTD file.
• It is the same as an internal DTD except that the declarations
 are kept in an external file.
• An external DTD can
 be used with more than one XML document.
<?xml version="1.0"?>
 <!DOCTYPE note SYSTEM "note.dtd">
 Syntax
• <!DOCTYPE root-element SYSTEM "file-name">
• where file-name is the file with .dtd extension.
 **XML NAMESPACE:
• In XML, a namespace is used to prevent conflicts
 between element names.
• Because XML allows us to create our own tag names,
 there’s always the possibility of naming a tag exactly
 the same as one in another XML document.
• The XML namespace identifies the set of tags used
 by the XML document.
• It ensures that names used by one DTD
 don’t conflict with user-defined tags or with tags defined elsewhere.
Eg. for name conflicts
• If these XML fragments were added together,
 there would be a name conflict.
• Both contain a <table> element, but the
 elements have different content and meaning.
 Solving the Name Conflict Using a Prefix
In the example above, there will be no conflict because the
two <table> elements have different names.
 XML Namespaces - The xmlns Attribute
• When using prefixes in XML, a namespace for the prefix
 must be defined.
• The namespace can be defined by an xmlns attribute in
 the start tag of an element.
• The namespace declaration has the following syntax.
 xmlns:prefix="URI".
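A minimal sketch of two <table> elements kept apart by prefixes, parsed with Python's standard xml.etree.ElementTree module. The prefixes h and f, the example.com URIs and the element contents are assumptions for illustration; a namespace URI only has to be a unique name and is not fetched by the parser.

```python
import xml.etree.ElementTree as ET

# Two <table> elements with different meanings, told apart by their prefixes.
doc = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(doc)
tags = [child.tag for child in root]   # the parser replaces each prefix by its URI
print(tags)

# Searching needs a prefix-to-URI mapping as well.
ns = {"f": "http://example.com/furniture"}
furniture_name = root.find("f:table/f:name", ns).text
print(furniture_name)
```

Because each tag is stored together with its URI, the two tables no longer share a name.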
 **XML SCHEMAS
• An XML Schema describes the structure of an XML
 document, and it can be used in place of a DTD.
• – XML schema is based on XML.
• – XML Schema language is known as XML Schema
 Definition (XSD).
• – The purpose of an XML Schema is to define the
 legal building blocks of an XML document, just like a
 DTD.
• An XML Schema:
• – defines elements that can appear in a document.
• – defines attributes that can appear in a document
• – defines which elements are child elements.
• – defines the order of child elements.
• – defines the number of child elements.
• – defines whether an element is empty or can
 include text.
• – defines data types for elements and attributes.
• – defines default and fixed values for elements and
 attributes.
 (TYPES OF ELEMENTS IN XML)
• A simple Type
• A complex type
• “SIMPLE” TYPE ELEMENTS
• A simple element is an XML element that can contain
 only text. It cannot contain any other elements or
 attributes.
• Simple type elements have no children or attributes.
• Eg: <xs:element name="hai" type="xs:string"/>
• “COMPLEX” TYPE ELEMENTS
• A complex element is an XML element that contains
 other elements and/or attributes.
• – A complex element may have attributes.
• – A complex element may be empty, or it may
 contain text, other elements, or both text and other
 elements.
• Eg: <product pid="1345"/>
• There are four kinds of complex elements:
• Empty elements
 <product pid="1345"/>
This element has no child elements and no text, only an attribute.
• Elements that contain only other (child) elements
 Ex: A complex XML element, "employee", which
contains only other elements:
 <employee>
 <firstname>John</firstname>
 <lastname>Smith</lastname>
 </employee>
• Elements that contain only text.
 Ex: A complex XML element, "food", which
contains only text:
 <food type="dessert">Ice cream</food>
• Elements that contain both other elements and
 text
 Ex:A complex XML element, "description",
which contains both elements and text:
 <description>
 It happened on <date>03.03.99</date>
 ....
 </description>
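The four kinds of complex elements can be told apart by looking at what an element contains. The sketch below uses Python's standard xml.etree.ElementTree module on the four examples above; the helper name kind is made up for illustration, and it inspects instance elements only, not a schema.

```python
import xml.etree.ElementTree as ET

def kind(elem):
    """Classify an element by its content (attributes are ignored here)."""
    has_children = len(elem) > 0
    # Text can sit before the first child (text) or after any child (tail).
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "elements and text"
    if has_children:
        return "elements only"
    if has_text:
        return "text only"
    return "empty"

samples = [
    '<product pid="1345"/>',
    "<employee><firstname>John</firstname><lastname>Smith</lastname></employee>",
    '<food type="dessert">Ice cream</food>',
    "<description>It happened on <date>03.03.99</date> ....</description>",
]
kinds = [kind(ET.fromstring(s)) for s in samples]
print(kinds)
```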
 **XSL( Extensible Style sheet Language)
• It is a styling language for XML just like CSS is a
 styling language for HTML.
• XSL is a language to format xml documents.
• XSL has two parts
 - XSLT
 - XSL- FO
XSLT
 • XSLT stands for XSL Transformations.
 • XSLT is a language for transforming XML
 documents into other XML documents or into
 other formats such as HTML for web pages
 or plain text.
 • Formats such as PDF are produced by first
 transforming the document to XSL-FO.
XSLT Transformation Process
• The process of transforming an XML
 document into another format is called XSL
 transformation.
• XSLT Processor is responsible for
 transforming the xml document.
• XSLT processor reads XML and XSLT
 document and produces the output in the
 form of HTML or XHTML or XML or PDF etc.
Advantages
• XSLT provides an easy way to merge XML data
 to produce output.
• By using XML and XSLT, the application will
 look clean and will be easier to maintain.
• XSLT can be used as a validation language .
 XSL-FO
• XSL-FO (XSL- Formatting Objects) is a markup
 language for XML document formatting , that
 is most often used to generate PDF files.
• A markup language is a text-encoding system
• XSL-FO is part of XSL (Extensible Stylesheet
 Language), a set of W3C technologies
 designed for the transformation and
 formatting of XML data.
 Parser
• A parser is a compiler or interpreter
 component that breaks data into smaller
 elements for easy translation into another
 language. A parser takes input in the form of a
 sequence of tokens or program instructions.
 *XML PARSER or Processors
• An XML parser is a software library or package that provides
 interfaces for client applications to work with an XML
 document. The XML Parser is designed to read the XML and
 create a way for programs to use XML.
• An XML parser checks that the
 document is well-formed; a validating parser also checks it
 against its DTD or schema.
• It reads in XML data and checks the syntactic constraints.
• There are two types of parser APIs (an API is a set of functions and
 procedures allowing the creation of applications)
 – SAX Simple API for XML (event-based)
 – DOM Document Object Model (object/tree based)
 SAX(Simple API for XML)
• – An event-based parsing technique.
(the flow of the program is determined by events such
as user actions like mouse clicks.)
• – The parser generates an application event
 whenever it encounters an element or data in the
 document being parsed.
• It is an event based parser, it works like an event
 handler in Java.
• – Programmer attaches “event handlers” to handle
 the event. Eg: click -onclick
• Advantages
• 1) It is simple and memory efficient.
• 2) It is very fast and works for huge
 documents.
• Disadvantages
• 1) It is event-based, so its API is less intuitive.
• 2) Clients never see the whole document at once
 because the data is delivered in pieces.
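Event-based parsing can be sketched with Python's standard xml.sax module. The handler class EventLogger and the one-line person document are assumptions for illustration; the handler only records the events it is given, in the order the parser reports them.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each event reported by the SAX parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Skip chunks that contain only whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

doc = b"<person><name>Manoj</name><age>34</age></person>"
handler = EventLogger()
xml.sax.parseString(doc, handler)   # the parser calls the handler as it reads

for event in handler.events:
    print(event)
```

The document is never held in memory as a tree; the program sees it only as this stream of events, which is why SAX suits huge documents.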
 DOM
• Refer from Chapter 2