Lecture: course info, XML and DTD

Time: Tuesday 5 November 2013, 10:00 - 12:00

Activity: Lecture

Teacher: Björn Hedin

Student groups: CMETE3INMT, CMETE3TRK, TMETM2-IM, TMETM2-TK

Info:

The lecture starts at 10:15.

Introduction to the course and general course information during the first hour. The second hour is devoted to introducing XML in general, and how to write DTDs, which is required to complete the first lab assignment.

Prepare by reading this summary of the lecture

XML is a W3C recommendation for structuring different types of data as trees, in a way that is clear to both humans and computers. Examples of XML-based languages are XHTML for coding web pages, SMIL (the language used to encode MMS messages), DocBook (a highly structured document description language) and JDF (a language for transmitting information between different computer systems in a printing process). The uses can be roughly divided into document formats, such as XHTML and SMIL, and data formats, such as JDF.

There are a number of major benefits of using an XML-based format over other types of formats, such as:

Data saved in XML-based formats is easily portable between different environments, and is easy to convert if new data formats emerge in the future.
The "X" in XML stands for "extensible", which in this context means that it is easy to extend existing markup languages to fit specific needs.
There are many high-quality tools that handle XML, e.g. editors and parsers. Several of the best are free.
XML is very familiar to most programmers, reducing startup time.
XML documents are easy to read, both by humans and machines.
Many powerful languages have been developed that operate on XML, e.g. XSLT, CSS and XQuery.
Plain text is used, so the format is platform independent.

Data is described by means of a hierarchy of "elements". At the top is a single "top element" (also called the root element), such as the element "html" in an XHTML document. An element may contain zero, one or more other elements. An element is encoded in the form <element-name> element content </element-name>.

So-called "attributes" can be associated with elements. For example, the attribute "href" is associated with the element "a" in the following way in XHTML:

<a href="http://www.kth.se">Link to KTH</a>
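To make the element hierarchy and the role of attributes concrete, here is a minimal sketch of a complete XML document. The vocabulary (the names "course", "title", "lecture", "topic", "code" and "number") is made up for illustration and is not part of any real standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary, invented only for this illustration -->
<course code="XX0000">                 <!-- top element with one attribute -->
  <title>Example course</title>        <!-- element containing only text -->
  <lecture number="1">                 <!-- element with two child elements -->
    <topic>XML</topic>
    <topic>DTD</topic>
  </lecture>
  <lecture number="2"/>                <!-- empty element: zero children -->
</course>
```

Every start tag has a matching end tag (or uses the empty-element form), and everything is nested inside the single top element, which is what gives the document its tree structure.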

A set of elements and attributes, together with the rules for how they may be combined, is called an "XML vocabulary". Such a set of rules and restrictions ("constraints") can be defined by a DTD. If an instance document (i.e. an XML document containing data) is associated with a DTD, it is possible to "validate" the document against the DTD, which means checking whether the instance document follows the rules set in the DTD. This gives a program the opportunity to check that the document contains no structural errors.
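As an illustration of how an instance document is associated with a DTD, the sketch below uses a document type declaration (the DOCTYPE line) that points to the published XHTML 1.0 Strict DTD. The page content itself is only an assumed example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE names the top element (html) and the DTD to validate against -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example page</title>
  </head>
  <body>
    <p><a href="http://www.kth.se">Link to KTH</a></p>
  </body>
</html>
```

A validating parser reads the DTD named in the DOCTYPE and reports an error if, for example, the "title" element is missing from "head".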

Besides making it possible to validate instance documents, a DTD is an easy way to concisely define how the creator of the vocabulary meant it to be used, which reduces the risk of misunderstandings. Unfortunately, a DTD cannot always express every constraint the vocabulary should have. Such constraints have to be written as comments, and they can naturally not be checked by a validator. XML Schema, which is covered in the next session, fills the same role as a DTD but is a more powerful way to express the constraints the vocabulary creator wants.
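A small sketch of such a limitation, using a hypothetical element "year" that is not part of the course material: the DTD can state that the element contains text, but not that the text must be a four-digit number, so that rule can only be recorded as a comment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE year [
  <!-- Intended constraint, which the DTD cannot express:
       the content must be a four-digit number, e.g. 2013. -->
  <!ELEMENT year (#PCDATA)>
]>
<!-- Valid according to the DTD, yet it breaks the intended constraint -->
<year>twenty-thirteen</year>
```

A validator accepts this document, because as far as the DTD is concerned any text is allowed inside "year".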

A DTD describes a "content model" for each element, i.e. which elements in the hierarchy may, or must, be "children" of another element. That an element "book" must contain exactly one element "title", one or more "author" elements, zero or one "introduction" and zero or more "chapter" elements, in that order, is expressed by the line

<!ELEMENT book (title, author+, introduction?, chapter*)>

That an author has one or more first names, zero or more middle names and exactly one surname (here the element "name") is expressed by the line

<!ELEMENT author (first-name+, middle-name*, name)>

That the element "book" may have an optional attribute named "isbn" is expressed by

<!ATTLIST book isbn CDATA #IMPLIED>
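The declarations above can be put together into one self-contained document, sketched below with the DTD written as an internal subset inside the DOCTYPE. The declarations for the leaf elements (plain text content, #PCDATA), the sample data and the placeholder ISBN value are assumptions added to make the example complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!-- Declarations taken from the text -->
  <!ELEMENT book (title, author+, introduction?, chapter*)>
  <!ELEMENT author (first-name+, middle-name*, name)>
  <!ATTLIST book isbn CDATA #IMPLIED>

  <!-- Assumed declarations: the remaining elements contain only text -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT introduction (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
  <!ELEMENT first-name (#PCDATA)>
  <!ELEMENT middle-name (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
]>
<book isbn="0-000-00000-0">               <!-- placeholder ISBN -->
  <title>An Example Book</title>
  <author>
    <first-name>Anna</first-name>
    <middle-name>Maria</middle-name>
    <name>Svensson</name>
  </author>
  <author>                                <!-- second author, no middle name -->
    <first-name>Erik</first-name>
    <name>Berg</name>
  </author>
  <!-- "introduction" is optional and left out here -->
  <chapter>First chapter text.</chapter>
  <chapter>Second chapter text.</chapter>
</book>
```

If the file is saved as, say, book.xml, one way to check it is with a validating parser such as xmllint (xmllint --valid --noout book.xml). Removing the "title" element or placing a "chapter" before an "author" should then produce a validation error.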

Links

For a pre-recorded slidecast (slides + audio) of the second part, see http://www.slideshare.net/bjornh/xml-och-dtd

Literature: XML in a Nutshell

(3) = Essential for the course
(2) = Important for the course
(1) = Relevant to the course

(2) Chapter 1, "Introducing XML" provides an overview and helps the reader understand the advantages of XML.
(3) Chapter 2, "XML fundamentals" provides an introduction to how to structure your own data and create your own markup language in XML. It also explains key concepts such as attributes, processing instructions, well-formedness and more.
(3) Chapter 3, "Document Type Definitions" explains the concept of validation and reviews how to define a markup language using a DTD. To complete lab 1, you need to understand this chapter well.
(1) Chapter 21, "XML Reference". Reference chapter. NOTE! The examples on pages 369 and 370 are good for making the basic concepts concrete and can be recommended to everyone.