This document provides an overview of XML, including:
- XML is not a replacement for HTML, a presentation format, a programming language, or a network transfer protocol, but it can be used with all of these.
- XML examples demonstrating tags, elements, attributes, and how XML documents form ordered trees.
- Key aspects of XML such as namespaces, DTDs, and schemas, and how XML documents are linked to external definitions.
XML is not…
• A replacement for HTML (but HTML can be generated from XML)
• A presentation format (but XML can be converted into one)
• A programming language (but it can be used with almost any language)
• A network transfer protocol (but XML may be transferred over a network)
• A database (but XML may be stored in a database)
XML by Example
<article>
  <author>Gerhard Weikum</author>
  <title>The Web in 10 Years</title>
</article>
• Easy to understand for human users
• Very expressive (semantics along with the data)
• Well structured, easy to read and write from programs
This looks nice, but…
XML by Example
… this is XML, too:
<t108>
  <x87>Gerhard Weikum</x87>
  <g10>The Web in 10 Years</g10>
</t108>
• Hard to understand for human users
• Not expressive (no semantics along with the data)
• Well structured, easy to read and write from programs
XML by Example
… and what about this XML document:
<data>
  ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
• Impossible to understand for human users
• Not expressive (no semantics along with the data)
• Unstructured, read and write only with special programs
The actual benefit of using XML highly depends on the design of the application.
Possible Advantages of Using XML
• Truly portable data
• Easily readable by human users
• Very expressive (semantics near data)
• Very flexible and customizable (no finite tag set)
• Easy to use from programs (libraries available)
• Easy to convert into other representations (XML transformation languages)
• Many additional standards and tools
• Widely used and supported
A Simple XML Document
<article>
  <author>Gerhard Weikum</author>
  <title>The Web in Ten Years</title>
  <text>
    <abstract>In order to evolve...</abstract>
    <section number="1" title="Introduction">
      The <index>Web</index> provides the universal...
    </section>
  </text>
</article>
Tags are freely definable.
Each element of this document, for example <title>The Web in Ten Years</title>, consists of a start tag, the content of the element (subelements and/or text), and an end tag.
Attributes have a name and a value, for example number="1" and title="Introduction" in the start tag of the section element.
Elements in XML Documents
• (Freely definable) tags: article, title, author
  – with start tag: <article> etc.
  – and end tag: </article> etc.
• Elements: <article> ... </article>
• Elements have a name (article) and a content (...)
• Elements may be nested.
• Elements may be empty: <this_is_empty/>
• Element content is typically parsed character data (PCDATA), i.e., strings with special characters, and/or nested elements (mixed content if both).
• Each XML document has exactly one root element and forms a tree.
• Elements with a common parent are ordered.
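The rules above can be observed from a program. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python and all variable names are assumptions made for illustration and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <author>Gerhard Weikum</author>
  <title>The Web in Ten Years</title>
  <text>
    <abstract>In order to evolve...</abstract>
    <section number="1" title="Introduction">The <index>Web</index> provides the universal...</section>
  </text>
</article>"""

# A document has exactly one root element.
root = ET.fromstring(DOC)

# Children of a common parent are kept in document order.
child_names = [child.tag for child in root]

# Mixed content: text before the subelement, the subelement itself,
# and the text that follows it (ElementTree stores that in .tail).
section = root.find("text/section")
index = section[0]
mixed = (section.text, index.tag, index.text, index.tail)

# An empty element has neither text nor subelements.
empty = ET.fromstring("<this_is_empty/>")
is_empty = len(empty) == 0 and empty.text is None

print(root.tag, child_names)
print(mixed)
print(is_empty)
```

ElementTree is only one possible API; a DOM or SAX parser exposes the same tree in a different shape.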
Elements vs. Attributes
Elements may have attributes (in the start tag) that have a name and a value, e.g. <section number="1">.
What is the difference between elements and attributes?
• Only one attribute with a given name per element (but an arbitrary number of subelements)
• Attributes have no structure, they are simply strings (while elements can have subelements)
As a rule of thumb:
• Content into elements
• Metadata into attributes
Example:
<person born="1912-06-23" died="1954-06-07">Alan Turing</person> proved that…
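A short sketch of how the content/metadata split looks when the person example is read with Python's xml.etree.ElementTree (the language choice and variable names are assumptions for illustration only):

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    '<person born="1912-06-23" died="1954-06-07">Alan Turing</person>'
)

content = person.text            # element content
metadata = dict(person.attrib)   # attributes: plain name/value strings
born_year = int(person.get("born").split("-")[0])

# Two attributes with the same name on one element are rejected by the parser.
try:
    ET.fromstring('<person born="1912" born="1913"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(content, metadata, born_year, duplicate_rejected)
```

Because attribute values are unstructured strings, the date has to be split by hand here; a value that needs internal structure is usually better modelled as subelements.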
XML Documents as Ordered Trees
article
  author
    "Gerhard Weikum"
  title
    "The Web in 10 years"
  text
    abstract
      "In order …"
    section (number="1", title="…")
      "The"
      index
        "Web"
      "provides …"
More on XML Syntax
• <root> <child> <subchild>.....</subchild> </child> </root>
• <?xml version="1.0" encoding="UTF-8"?>
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
• <!-- This is a comment -->
• Some special characters must be escaped using entities (they are converted back when the XML document is read):
  < → &lt;
  & → &amp;
• Some other characters may be escaped, too:
  > → &gt;
  " → &quot;
  ' → &apos;
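Escaping and the conversion back on reading can be tried out with Python's standard library. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "AT&T says 1 < 2"

# escape() replaces &, < and > with their entities.
escaped = escape(raw)

# The parser converts the entities back when reading the document.
round_trip = ET.fromstring(f"<note>{escaped}</note>").text

# Quotes inside attribute values can be escaped by passing extra entities.
attr_value = escape('say "hi"', {'"': "&quot;"})
element = ET.fromstring(f'<note title="{attr_value}"/>')

print(escaped)
print(round_trip == raw, unescape(escaped) == raw)
print(attr_value, element.get("title"))
```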
Well-Formed XML Documents
A well-formed document must adhere to, among others, the following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have two attributes with the same name.
• Comments and processing instructions may not appear inside tags.
• No unescaped < or & signs may occur inside character data.
Only well-formed documents can be processed by XML parsers.
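Several of the well-formedness rules can be checked by handing small documents to a parser and seeing which ones it rejects. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample documents are hypothetical, chosen only to trigger one rule each.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "overlapping elements": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>R&D</a>",
    "case mismatch in tags": "<a></A>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

This checks well-formedness only; it says nothing about validity against a DTD.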
3.1 Document Type Definitions
Sometimes XML is too flexible:
• Most programs can only process a subset of all possible XML applications
• For exchanging data, the format (i.e., elements, attributes and their semantics) must be fixed
⇒ Document Type Definitions (DTDs) establish the vocabulary for one XML application (in some sense comparable to schemas in databases)
A document is valid with respect to a DTD if it conforms to the rules specified in that DTD. Most XML parsers can be configured to validate.
<!DOCTYPE element DTD identifier [
  declaration1
  declaration2
  ........
]>
DTD Example: Elements
<!ELEMENT article (title,author+,text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT text (abstract,section*,literature?)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT section (#PCDATA|index)*>
<!ELEMENT literature (#PCDATA)>
<!ELEMENT index (#PCDATA)>
• Content of the title element is parsed character data
• Content of the article element is a title element, followed by one or more author elements, followed by a text element
• Content of the text element may contain zero or more section elements in this position
Element Declarations in DTDs
One element declaration for each element type:
<!ELEMENT element_name content_specification>
where content_specification can be
• (#PCDATA): parsed character data
• (child): one child element
• (c1,…,cn): a sequence of child elements c1…cn
• (c1|…|cn): one of the elements c1…cn
For each component c, possible counts can be specified:
  – c: exactly one such element
  – c+: one or more
  – c*: zero or more
  – c?: zero or one
Plus arbitrary combinations using parentheses:
<!ELEMENT f ((a|b)*,c+,(d|e))*>
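A content specification behaves like a regular expression over the sequence of child element names. The sketch below hand-translates the content model of f into a Python regular expression and tests child sequences against it. It is an illustration of the idea under that assumption, not a DTD validator, and the names CONTENT_MODEL_F and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of ((a|b)*,c+,(d|e))*.
# Each child name is encoded as the name followed by one space.
CONTENT_MODEL_F = re.compile(r"(?:(?:a |b )*(?:c )+(?:d |e ))*")

def children_match(child_names):
    """Check whether a sequence of child names satisfies the content model of f."""
    encoded = "".join(name + " " for name in child_names)
    return CONTENT_MODEL_F.fullmatch(encoded) is not None

def element_matches(xml_text):
    """Parse an f element and check the names of its direct children."""
    root = ET.fromstring(xml_text)
    return children_match([child.tag for child in root])

print(element_matches("<f><a/><b/><c/><d/></f>"))       # one full group
print(element_matches("<f><c/><e/><c/><c/><d/></f>"))   # two groups
print(element_matches("<f/>"))                          # outer * allows zero groups
print(element_matches("<f><a/><c/></f>"))               # (d|e) is missing
```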
More on Element Declarations
• Elements with mixed content:
  <!ELEMENT text (#PCDATA|index|cite|glossary)*>
• Elements with empty content:
  <!ELEMENT image EMPTY>
• Elements with arbitrary content (not suitable for production-level DTDs):
  <!ELEMENT thesis ANY>
Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
                  title  CDATA #REQUIRED>
declares two required attributes for element section.
The declaration names the element (section) and, for each attribute, its name (number, title), its type (CDATA), and its default (#REQUIRED).
Possible attribute defaults:
• #REQUIRED: the attribute is required in each element instance
• #IMPLIED: the attribute is optional
• #FIXED default: the attribute always has this default value
• default: the attribute has this default value if it is omitted from the element instance
Attribute Types in DTDs
• CDATA: string data
• (A1|…|An): enumeration of all possible values of the attribute (each an XML name token)
• ID: unique XML name to identify the element
• IDREF: refers to the ID attribute of some other element ("intra-document link")
• IDREFS: list of IDREF values, separated by white space
• plus some more
Attribute Examples
<!ATTLIST publication type (journal|inproceedings) #REQUIRED
                      pubid ID #REQUIRED>
<!ATTLIST cite cid IDREF #REQUIRED>
<!ATTLIST citation ref IDREF #IMPLIED
                   cid ID #REQUIRED>

<publications>
  <publication type="journal" pubid="Weikum01">
    <author>Gerhard Weikum</author>
    <text>In the Web of 2010, XML <cite cid="c12"/>...</text>
    <citation cid="c12" ref="XML98"/>
    <citation cid="c15">...</citation>
  </publication>
  <publication type="inproceedings" pubid="XML98">
    <text>XML, the Extensible Markup Language, ...</text>
  </publication>
</publications>
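A validating parser checks that every ID value is unique and that every IDREF points to an existing ID. Python's standard library does not validate against DTDs, so the sketch below re-implements just that one check by hand. The tables ID_ATTRS and IDREF_ATTRS are copied manually from the ATTLIST declarations, the function name check_links is hypothetical, and the sample is a trimmed version of the publications example with ID values written as XML names plus one deliberately dangling reference (Missing99) added for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<publications>
  <publication type="journal" pubid="Weikum01">
    <text>In the Web of 2010, XML <cite cid="c12"/>...</text>
    <citation cid="c12" ref="XML98"/>
    <citation cid="c15" ref="Missing99"/>
  </publication>
  <publication type="inproceedings" pubid="XML98">
    <text>XML ...</text>
  </publication>
</publications>"""

# (element name, attribute name) pairs, taken by hand from the DTD.
ID_ATTRS = {("publication", "pubid"), ("citation", "cid")}
IDREF_ATTRS = {("cite", "cid"), ("citation", "ref")}

def check_links(root):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    ids, duplicate_ids, dangling_refs = set(), [], []
    # First pass: collect all ID values and note duplicates.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in ID_ATTRS:
                if value in ids:
                    duplicate_ids.append(value)
                ids.add(value)
    # Second pass: every IDREF must match a collected ID.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if (elem.tag, name) in IDREF_ATTRS and value not in ids:
                dangling_refs.append(value)
    return duplicate_ids, dangling_refs

duplicate_ids, dangling_refs = check_links(ET.fromstring(DOC))
print("duplicate IDs:", duplicate_ids)
print("dangling IDREFs:", dangling_refs)
```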
Linking DTD and XML Docs
• Document Type Declaration in the XML document:
  <!DOCTYPE article SYSTEM "http://www-dbs/article.dtd">
  Here DOCTYPE and SYSTEM are keywords, article is the root element, and the quoted string is the URI for the DTD.
Linking DTD and XML Docs
• Internal DTD:
  <?xml version="1.0"?>
  <!DOCTYPE article [
    <!ELEMENT article (title,author+,text)>
    ...
    <!ELEMENT index (#PCDATA)>
  ]>
  <article>
    ...
  </article>
• Both ways can be mixed; the internal DTD overrides external entity information:
  <!DOCTYPE article SYSTEM "article.dtd" [
    <!ENTITY % pub_content "(title+,author*,text)">
  ]>
Internal & External DTD
• XML document referring to an external DTD:
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE note SYSTEM "Note.dtd">
• Note.dtd:
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
• XML document with an internal DTD:
  <?xml version="1.0"?>
  <!DOCTYPE note [
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
  ]>
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend</body>
  </note>
XML file
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
</breakfast_menu>
Example
<?xml version="1.0" encoding="UTF-8"?>
<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">
  <book title="Harry Potter">
    <description xlink:type="simple" xlink:href="/images/HPotter.gif" xlink:show="new">
      As his fifth year at Hogwarts School of Witchcraft and Wizardry approaches, 15-year-old Harry Potter is.......
    </description>
  </book>
</bookstore>
XML with IDs
<?xml version="1.0" encoding="UTF-8"?>
<dogbreeds>
  <dog breed="Rottweiler" id="Rottweiler">
    <picture url="http://dog.com/rottweiler.gif" />
    <history>The Rottweiler's ancestors were probably Roman drover dogs.....</history>
    <temperament>Confident, bold, alert and imposing, the Rottweiler is a popular choice for its ability to protect....</temperament>
  </dog>
</dogbreeds>
XML with XPointer
<?xml version="1.0" encoding="UTF-8"?>
<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">
  <mydog>
    <description>
      Anton is my favorite dog. He has won a lot of.....
    </description>
    <fact xlink:type="simple" xlink:href="http://dog.com/dogbreeds.xml#Rottweiler">
      Fact about Rottweiler
    </fact>
  </mydog>
</mydogs>
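The link in the fact element combines a document URL with a fragment (#Rottweiler) that names an element inside the target document. The sketch below shows one way an application might resolve it in Python. It makes several assumptions: the target document is supplied as a local string instead of being fetched from dog.com, any attribute called id is treated as the identifier (no DTD declares it as type ID), and both documents are trimmed for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# ElementTree expands the xlink: prefix to its namespace URI.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOGBREEDS = """<dogbreeds>
  <dog breed="Rottweiler" id="Rottweiler">
    <history>The Rottweiler's ancestors were probably Roman drover dogs.....</history>
  </dog>
</dogbreeds>"""

MYDOGS = """<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">
  <mydog>
    <fact xlink:type="simple"
          xlink:href="http://dog.com/dogbreeds.xml#Rottweiler">Fact about Rottweiler</fact>
  </mydog>
</mydogs>"""

fact = ET.fromstring(MYDOGS).find("mydog/fact")

# Split the link into the document URL and the fragment after '#'.
url, fragment = urldefrag(fact.get(XLINK_HREF))

# A real application would fetch `url`; here the document is the string above.
target = ET.fromstring(DOGBREEDS).find(f".//*[@id='{fragment}']")

print(url, fragment)
print(target.tag, target.get("breed"))
```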
XML on the Server
<%
'Load XML
set xml = Server.CreateObject("Microsoft.XMLDOM")
xml.async = false
xml.load(Server.MapPath("simple.xml"))

'Load XSL
set xsl = Server.CreateObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(Server.MapPath("simple.xsl"))

'Transform file
Response.Write(xml.transformNode(xsl))
%>