Chapter 30: XML: Database System Concepts, 7 Ed
Database System Concepts - 7th Edition 30.2 ©Silberschatz, Korth and Sudarshan
Introduction
XML Introduction (Cont.)
The ability to specify new tags, and to create nested tag structures, makes
XML a great way to exchange data, not just documents.
Much of the use of XML has been in data exchange applications, not
as a replacement for HTML
Tags make data (relatively) self-documenting
E.g.,
<university>
<department>
<dept_name> Comp. Sci. </dept_name>
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<course>
<course_id> CS-101 </course_id>
<title> Intro. to Computer Science </title>
<dept_name> Comp. Sci. </dept_name>
<credits> 4 </credits>
</course>
</university>
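Because each value is wrapped in a tag that names it, a program can pick fields out by name rather than by position. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and the university document above (the variable names are illustrative, not part of the original slides):

```python
import xml.etree.ElementTree as ET

# The university document from the example above.
doc = """
<university>
  <department>
    <dept_name> Comp. Sci. </dept_name>
    <building> Taylor </building>
    <budget> 100000 </budget>
  </department>
  <course>
    <course_id> CS-101 </course_id>
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci. </dept_name>
    <credits> 4 </credits>
  </course>
</university>
"""

root = ET.fromstring(doc)

# Each child of <department> becomes a (tag, value) pair; the tag says what the value means.
department = {child.tag: child.text.strip() for child in root.find("department")}

# A nested field is reached by following tag names from the root.
course_title = root.findtext("course/title").strip()

print(department)
print(course_title)
```

The receiving program needs no knowledge of field order, only of the tag names.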
XML: Motivation
XML Motivation (Cont.)
Earlier generation formats were based on plain text with line headers
indicating the meaning of fields
Similar in concept to email headers
Do not allow for nested structures, and have no standard “type” language
Tied too closely to low level document structure (lines, spaces, etc)
Each XML based standard defines what are valid elements, using
XML type specification languages to specify the syntax
DTD (Document Type Definition)
XML Schema
Plus textual descriptions of the semantics
XML allows new tags to be defined as required
However, this may be constrained by DTDs
A wide variety of tools is available for parsing, browsing and querying
XML documents/data
Comparison with Relational Data
Structure of XML Data
Example of Nested Elements
<purchase_order>
<identifier> P-101 </identifier>
<purchaser> …. </purchaser>
<itemlist>
<item>
<identifier> RS1 </identifier>
<description> Atom powered rocket sled </description>
<quantity> 2 </quantity>
<price> 199.95 </price>
</item>
<item>
<identifier> SG2 </identifier>
<description> Superb glue </description>
<quantity> 1 </quantity>
<unit-of-measure> liter </unit-of-measure>
<price> 29.95 </price>
</item>
</itemlist>
</purchase_order>
Motivation for Nesting
Structure of XML Data (Cont.)
Attributes
Attributes vs. Subelements
Namespaces
XML data has to be exchanged between organizations
Same tag name may have different meaning in different organizations,
causing confusion on exchanged documents
Specifying a unique string as an element name avoids confusion
Better solution: use unique-name:element-name
Avoid using long unique names all over document by using XML
Namespaces
<university xmlns:yale="http://www.yale.edu">
…
<yale:course>
<yale:course_id> CS-101 </yale:course_id>
<yale:title> Intro. to Computer Science</yale:title>
<yale:dept_name> Comp. Sci. </yale:dept_name>
<yale:credits> 4 </yale:credits>
</yale:course>
…
</university>
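A parser expands each prefixed name into a (namespace URI, local name) pair, so the prefix itself is only shorthand. A minimal sketch, assuming Python's standard xml.etree.ElementTree and the Yale document above with the elided parts left out; the query prefix "y" is an arbitrary choice made for this example:

```python
import xml.etree.ElementTree as ET

doc = """
<university xmlns:yale="http://www.yale.edu">
  <yale:course>
    <yale:course_id> CS-101 </yale:course_id>
    <yale:title> Intro. to Computer Science</yale:title>
    <yale:dept_name> Comp. Sci. </yale:dept_name>
    <yale:credits> 4 </yale:credits>
  </yale:course>
</university>
"""

root = ET.fromstring(doc)

# The prefix used in queries is our own choice; only the URI has to match.
ns = {"y": "http://www.yale.edu"}

course = root.find("y:course", ns)
title = course.findtext("y:title", namespaces=ns).strip()

# ElementTree stores the expanded name as "{uri}local-name".
expanded_tag = course.tag

print(expanded_tag)
print(title)
```

Two organizations can both use a tag called course without clashing, as long as their namespace URIs differ.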
More on XML Syntax
XML Document Schema
Database schemas constrain what information can be stored, and the data
types of stored values
XML documents are not required to have an associated schema
However, schemas are very important for XML data exchange
Otherwise, a site cannot automatically interpret data received from
another site
Two mechanisms for specifying XML schema
Document Type Definition (DTD)
Widely used
XML Schema
Newer, increasing use
Document Type Definition (DTD)
Element Specification in DTD
University DTD
<!DOCTYPE university [
<!ELEMENT university ( (department|course|instructor|teaches)+)>
<!ELEMENT department ( dept_name, building, budget)>
<!ELEMENT course ( course_id, title, dept_name, credits)>
<!ELEMENT instructor (IID, name, dept_name, salary)>
<!ELEMENT teaches (IID, course_id)>
<!ELEMENT dept_name ( #PCDATA )>
<!ELEMENT building ( #PCDATA )>
<!ELEMENT budget ( #PCDATA )>
<!ELEMENT course_id ( #PCDATA )>
<!ELEMENT title ( #PCDATA )>
<!ELEMENT credits ( #PCDATA )>
<!ELEMENT IID ( #PCDATA )>
<!ELEMENT name ( #PCDATA )>
<!ELEMENT salary ( #PCDATA )>
]>
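To make the content models above concrete, the sketch below checks a parsed document against them by hand. It is not a DTD validator: the models are copied into a Python dictionary rather than parsed from the DTD, stray text between elements is not checked, and the names CONTENT_MODELS and conforms are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Sequence content models copied by hand from the DTD above (not parsed from it).
CONTENT_MODELS = {
    "department": ["dept_name", "building", "budget"],
    "course": ["course_id", "title", "dept_name", "credits"],
    "instructor": ["IID", "name", "dept_name", "salary"],
    "teaches": ["IID", "course_id"],
}

def conforms(root):
    """Return True if the tree matches the element declarations of the university DTD."""
    # (department|course|instructor|teaches)+ requires at least one child.
    if root.tag != "university" or len(root) == 0:
        return False
    for child in root:
        expected = CONTENT_MODELS.get(child.tag)
        if expected is None:
            return False  # not one of the four allowed alternatives
        if [leaf.tag for leaf in child] != expected:
            return False  # subelements must appear exactly in the declared order
        if any(len(leaf) > 0 for leaf in child):
            return False  # #PCDATA elements may hold text only
    return True

good = ET.fromstring(
    "<university><department><dept_name>Comp. Sci.</dept_name>"
    "<building>Taylor</building><budget>100000</budget></department></university>"
)
bad = ET.fromstring(
    "<university><department><building>Taylor</building>"
    "<dept_name>Comp. Sci.</dept_name><budget>100000</budget></department></university>"
)

print(conforms(good), conforms(bad))
```

The second document fails only because building precedes dept_name: a comma in a DTD content model fixes the order of subelements.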
Attribute Specification in DTD
Attribute specification : for each attribute
• Name
• Type of attribute
CDATA
ID (identifier) or IDREF (ID reference) or IDREFS (multiple
IDREFs)
– more on this later
• Whether
mandatory (#REQUIRED)
has a default value (value),
or neither (#IMPLIED)
Examples
• <!ATTLIST course course_id CDATA #REQUIRED>, or
• <!ATTLIST course
course_id ID #REQUIRED
dept_name IDREF #REQUIRED
instructors IDREFS #IMPLIED >
IDs and IDREFs
University DTD with Attributes
XML data with ID and IDREF attributes
<university-3>
<department dept_name="Comp. Sci.">
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<department dept_name="Biology">
<building> Watson </building>
<budget> 90000 </budget>
</department>
<course course_id="CS-101" dept_name="Comp. Sci."
instructors="10101 83821">
<title> Intro. to Computer Science </title>
<credits> 4 </credits>
</course>
….
<instructor IID="10101" dept_name="Comp. Sci.">
<name> Srinivasan </name>
<salary> 65000 </salary>
</instructor>
….
</university-3>
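An IDREF is only a string until something looks it up. The sketch below builds an index from ID values to elements and then follows the IDREF and IDREFS attributes of a course. It assumes Python's standard xml.etree.ElementTree and a shortened copy of the data above; the mapping of element types to their ID attribute (ID_ATTRIBUTE) is an assumption written by hand, since ElementTree does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

doc = """
<university-3>
  <department dept_name="Comp. Sci.">
    <building> Taylor </building>
    <budget> 100000 </budget>
  </department>
  <course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
    <title> Intro. to Computer Science </title>
    <credits> 4 </credits>
  </course>
  <instructor IID="10101" dept_name="Comp. Sci.">
    <name> Srinivasan </name>
    <salary> 65000 </salary>
  </instructor>
</university-3>
"""

# Assumed here: which attribute plays the role of the ID for each element type.
ID_ATTRIBUTE = {"department": "dept_name", "course": "course_id", "instructor": "IID"}

root = ET.fromstring(doc)

# Index every element by its ID value.
by_id = {}
for elem in root:
    id_attr = ID_ATTRIBUTE.get(elem.tag)
    if id_attr is not None:
        by_id[elem.get(id_attr)] = elem

course = root.find("course")

# IDREF: a single reference, looked up as one string.
dept = by_id[course.get("dept_name")]

# IDREFS: several references separated by blanks; 83821 is not in this shortened copy.
instructors = [by_id[ref] for ref in course.get("instructors").split() if ref in by_id]

print(dept.findtext("building").strip())
print([i.findtext("name").strip() for i in instructors])
```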
Limitations of DTDs
XML Schema
XML Schema Version of Univ. DTD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="university" type="UniversityType" />
<xs:element name="department">
<xs:complexType>
<xs:sequence>
<xs:element name="dept_name" type="xs:string"/>
<xs:element name="building" type="xs:string"/>
<xs:element name="budget" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
….
<xs:element name="instructor">
<xs:complexType>
<xs:sequence>
<xs:element name="IID" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="dept_name" type="xs:string"/>
<xs:element name="salary" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
… Contd.
XML Schema Version of Univ. DTD (Cont.)
….
<xs:complexType name="UniversityType">
<xs:sequence>
<xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="teaches" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Choice of “xs:” was ours -- any other namespace prefix could be chosen
Element “university” has type “UniversityType”, which is defined separately
• xs:complexType is used later to create the named complex type
“UniversityType”
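Unlike a DTD, the schema above gives values a type (xs:string, xs:decimal). The sketch below shows what that buys a receiving program: it converts the children of a department to their declared types and rejects a budget that is not a number. It is not an XML Schema validator; the type table is copied by hand, the name typed_department is invented here, and Python's Decimal accepts a few forms (such as exponents) that xs:decimal does not.

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

# Declared types for the children of <department>, copied by hand from the schema above.
DEPARTMENT_TYPES = {"dept_name": str, "building": str, "budget": Decimal}

def typed_department(elem):
    """Convert each child of a <department> to its declared type; raise ValueError on a mismatch."""
    record = {}
    for child in elem:
        convert = DEPARTMENT_TYPES[child.tag]
        raw = (child.text or "").strip()
        try:
            record[child.tag] = convert(raw)
        except InvalidOperation:
            raise ValueError(f"{child.tag} is not a valid decimal: {raw!r}")
    return record

record = typed_department(ET.fromstring(
    "<department><dept_name> Comp. Sci. </dept_name>"
    "<building> Taylor </building><budget> 100000 </budget></department>"
))
print(record)
```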
More features of XML Schema
Querying and Transforming XML Data
Tree Model of XML Data
XPath
XPath (Cont.)
The initial “/” denotes root of the document (above the top-level tag)
Path expressions are evaluated left to right
• Each step operates on the set of instances produced by the previous
step
Selection predicates may follow any step in a path, in [ ]
• E.g., /university-3/course[credits >= 4]
returns course elements with a credits value greater than or equal to 4
/university-3/course[credits] returns course elements containing
a credits subelement
Attributes are accessed using “@”
• E.g., /university-3/course[credits >= 4]/@course_id
returns the course identifiers of courses with credits >= 4
• IDREF attributes are not dereferenced automatically (more on this
later)
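The path expressions above can be tried out with Python's standard xml.etree.ElementTree, which implements a subset of XPath. In this sketch the existence predicate [credits] is evaluated by the library, while the numeric test is written in Python because that subset has no >= operator. The BIO-101 and SEM-001 courses are made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """
<university-3>
  <course course_id="CS-101" dept_name="Comp. Sci.">
    <title> Intro. to Computer Science </title>
    <credits> 4 </credits>
  </course>
  <course course_id="BIO-101" dept_name="Biology">
    <title> Intro. to Biology </title>
    <credits> 3 </credits>
  </course>
  <course course_id="SEM-001" dept_name="Biology">
    <title> Seminar </title>
  </course>
</university-3>
"""

root = ET.fromstring(doc)

# /university-3/course[credits]: courses that contain a credits subelement.
with_credits = root.findall("./course[credits]")

# /university-3/course[credits >= 4]/@course_id: the comparison is done in Python,
# then the attribute is read with get(), the counterpart of "@".
ids = [c.get("course_id") for c in with_credits if float(c.findtext("credits")) >= 4]

print([c.get("course_id") for c in with_credits])
print(ids)
```

Each step narrows the set produced by the previous one, mirroring the left-to-right evaluation of a path.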
Functions in XPath
XPath provides several functions
The function count() at the end of a path counts the number of
elements in the set generated by the path
E.g., /university-2/instructor[count(./teaches/course)> 2]
– Returns instructors teaching more than 2 courses (on
university-2 schema)
Also function for testing position (1, 2, ..) of node w.r.t. siblings
Boolean connectives and and or and function not() can be used in
predicates
IDREFs can be referenced using function id()
id() can also be applied to sets of references such as IDREFS and
even to strings containing multiple references separated by blanks
E.g., /university-3/course/id(@dept_name)
returns all department elements referred to from the dept_name
attribute of course elements.
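The count() predicate above can be mimicked by counting the matches of the inner path for each instructor. A sketch using Python's standard xml.etree.ElementTree; the university-2 data below is invented for the example, assuming only that course elements are nested under teaches inside each instructor, as the path ./teaches/course implies:

```python
import xml.etree.ElementTree as ET

# Hypothetical university-2 data: courses nested under each instructor.
doc = """
<university-2>
  <instructor>
    <name> Srinivasan </name>
    <teaches><course/><course/><course/></teaches>
  </instructor>
  <instructor>
    <name> Brandt </name>
    <teaches><course/></teaches>
  </instructor>
</university-2>
"""

root = ET.fromstring(doc)

# /university-2/instructor[count(./teaches/course) > 2]
busy = [
    i.findtext("name").strip()
    for i in root.findall("instructor")
    if len(i.findall("./teaches/course")) > 2
]

print(busy)
```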
More XPath Features
XQuery
XQuery is a general purpose query language for XML data
Currently being standardized by the World Wide Web Consortium (W3C)
The textbook description is based on a January 2005 draft of the
standard. The final version may differ, but major features likely to stay
unchanged.
XQuery is derived from the Quilt query language, which itself borrows from
SQL, XQL and XML-QL
XQuery uses a
for … let … where … order by … return …
syntax
for ⇔ SQL from
where ⇔ SQL where
order by ⇔ SQL order by
return ⇔ SQL select
let allows temporary variables, and has no equivalent in SQL
FLWOR Syntax in XQuery
For clause uses XPath expressions, and variable in for clause ranges over
values in the set returned by XPath
Simple FLWOR expression in XQuery
• find all courses with credits > 3, with each result enclosed in a
<course_id> .. </course_id> tag
for $x in /university-3/course
let $courseId := $x/@course_id
where $x/credits > 3
return <course_id> { $courseId } </course_id>
• Items in the return clause are XML text unless enclosed in {}, in which
case they are evaluated
Let clause not really needed in this query, and selection can be done in
XPath. Query can be written as:
for $x in /university-3/course[credits > 3]
return <course_id> { $x/@course_id } </course_id>
Alternative notation for constructing elements:
return element course_id { $x/@course_id }
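For readers more used to imperative code, the FLWOR query above corresponds roughly to a loop with a filter. A sketch using Python's standard xml.etree.ElementTree, with each clause marked in a comment; the CS-190 course is made up for the example, and placing the identifier as text inside the new element is an assumption about the intended output (an XQuery processor may instead attach it as an attribute):

```python
import xml.etree.ElementTree as ET

doc = """<university-3>
  <course course_id="CS-101"><title>Intro. to Computer Science</title><credits>4</credits></course>
  <course course_id="CS-190"><title>Game Design</title><credits>3</credits></course>
</university-3>"""

root = ET.fromstring(doc)

results = []
for x in root.findall("course"):             # for $x in /university-3/course
    course_id = x.get("course_id")           # let $courseId := $x/@course_id
    if float(x.findtext("credits")) > 3:     # where $x/credits > 3
        out = ET.Element("course_id")        # return <course_id> { $courseId } </course_id>
        out.text = course_id
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
print(serialized)
```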
Joins
Nested Queries
The following query converts data from the flat structure for university
information into the nested structure used in university-1
<university-1>
{ for $d in /university/department
return <department>
{ $d/* }
{ for $c in /university/course[dept_name = $d/dept_name]
return $c }
</department>
}
{ for $i in /university/instructor
return <instructor>
{ $i/* }
{ for $c in /university/teaches[IID = $i/IID]
return $c/course_id }
</instructor>
}
</university-1>
$c/* denotes all the children of the node to which $c is bound, without the
enclosing top-level tag
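The same restructuring can be sketched as nested loops over a parsed document, which makes the two joins (course to department on dept_name, teaches to instructor on IID) explicit. This assumes Python's standard xml.etree.ElementTree; the teaches row is an assumed sample, and elements are deep-copied so the flat input is left untouched.

```python
import copy
import xml.etree.ElementTree as ET

flat = ET.fromstring("""<university>
  <department><dept_name>Comp. Sci.</dept_name><building>Taylor</building><budget>100000</budget></department>
  <course><course_id>CS-101</course_id><title>Intro. to Computer Science</title><dept_name>Comp. Sci.</dept_name><credits>4</credits></course>
  <instructor><IID>10101</IID><name>Srinivasan</name><dept_name>Comp. Sci.</dept_name><salary>65000</salary></instructor>
  <teaches><IID>10101</IID><course_id>CS-101</course_id></teaches>
</university>""")

nested = ET.Element("university-1")

for d in flat.findall("department"):
    dept = ET.SubElement(nested, "department")
    dept.extend([copy.deepcopy(child) for child in d])            # { $d/* }
    for c in flat.findall("course"):
        if c.findtext("dept_name") == d.findtext("dept_name"):    # [dept_name = $d/dept_name]
            dept.append(copy.deepcopy(c))                         # return $c

for i in flat.findall("instructor"):
    inst = ET.SubElement(nested, "instructor")
    inst.extend([copy.deepcopy(child) for child in i])            # { $i/* }
    for t in flat.findall("teaches"):
        if t.findtext("IID") == i.findtext("IID"):                # [IID = $i/IID]
            inst.append(copy.deepcopy(t.find("course_id")))       # return $c/course_id

print([child.tag for child in nested.find("department")])
print([child.tag for child in nested.find("instructor")])
```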
Grouping and Aggregation
for $d in /university/department
return
<department-total-salary>
<dept_name> { $d/dept_name } </dept_name>
<total_salary> { fn:sum(
for $i in /university/instructor[dept_name = $d/dept_name]
return $i/salary
)}
</total_salary>
</department-total-salary>
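The query above groups by running one inner query per department and summing its result. A sketch of the same shape in Python with the standard xml.etree.ElementTree; the instructor Katz and both salaries' pairing with departments are sample data assumed for the example:

```python
import xml.etree.ElementTree as ET

flat = ET.fromstring("""<university>
  <department><dept_name>Comp. Sci.</dept_name></department>
  <department><dept_name>Biology</dept_name></department>
  <instructor><name>Srinivasan</name><dept_name>Comp. Sci.</dept_name><salary>65000</salary></instructor>
  <instructor><name>Katz</name><dept_name>Comp. Sci.</dept_name><salary>75000</salary></instructor>
</university>""")

totals = {}
for d in flat.findall("department"):                 # for $d in /university/department
    name = d.findtext("dept_name")
    totals[name] = sum(                              # fn:sum( ... )
        int(i.findtext("salary"))
        for i in flat.findall("instructor")
        if i.findtext("dept_name") == name           # [dept_name = $d/dept_name]
    )

print(totals)
```

A department with no instructors still appears, with a total of 0, because the outer loop ranges over departments rather than over instructors.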
Sorting in XQuery
Functions and Other XQuery Features
XSLT
Application Program Interface
Storage of XML Data
Storage of XML in Relational Databases
Alternatives:
String Representation
Tree Representation
Map to relations
String Representation
String Representation (Cont.)
Benefits:
Can store any XML data even without DTD
As long as there are many top-level elements in a document, strings
are small compared to full document
Allows fast access to individual elements.
Drawback: Need to parse strings to access values inside the elements
Parsing is slow.
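The trade-off above can be seen in a small sketch that stores each top-level element as one string in a relational table. It assumes Python's standard sqlite3 and xml.etree.ElementTree modules; the table name elements and its single column data are illustrative choices made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<university>"
    "<department><dept_name>Comp. Sci.</dept_name><building>Taylor</building>"
    "<budget>100000</budget></department>"
    "<course><course_id>CS-101</course_id><title>Intro. to Computer Science</title>"
    "<dept_name>Comp. Sci.</dept_name><credits>4</credits></course>"
    "</university>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (data TEXT)")

# One tuple per top-level element; no DTD or schema is needed to store it.
for child in doc:
    conn.execute("INSERT INTO elements VALUES (?)", (ET.tostring(child, encoding="unicode"),))

# Drawback: the database sees only text, so every stored string must be
# parsed again before a value inside it can be read.
budgets = []
for (data,) in conn.execute("SELECT data FROM elements"):
    elem = ET.fromstring(data)
    if elem.tag == "department":
        budgets.append(int(elem.findtext("budget")))

print(budgets)
```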
Tree Representation
Tree representation: model XML data as tree and store using relations
nodes(id, parent_id, type, label, value)
[Figure: example tree with nodes university (id: 1), course_id (id: 3), dept_name (id: 7)]
Each element/attribute is given a unique identifier
Type indicates element/attribute
Label specifies the tag name of the element/name of attribute
Value is the text value of the element/attribute
Can add an extra attribute position to record ordering of children
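A sketch of how a document could be flattened into rows of the nodes relation, assuming Python's standard xml.etree.ElementTree. The function name to_nodes and the numbering scheme (identifiers handed out in document order, attributes before child elements) are choices made for this example, not something fixed by the representation.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_nodes(root):
    """Flatten an XML tree into rows (id, parent_id, type, label, value)."""
    ids = count(1)
    rows = []

    def visit(elem, parent_id):
        node_id = next(ids)
        text = (elem.text or "").strip() or None   # elements without text get a NULL value
        rows.append((node_id, parent_id, "element", elem.tag, text))
        for name, value in elem.attrib.items():
            rows.append((next(ids), node_id, "attribute", name, value))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows

rows = to_nodes(ET.fromstring(
    '<university><course course_id="CS-101">'
    "<title>Intro. to Computer Science</title></course></university>"
))
for row in rows:
    print(row)
```

Every element and attribute becomes one tuple, so reassembling an element requires joining nodes with itself on parent_id once per level.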
Tree Representation (Cont.)
Mapping XML Data to Relations
Storing XML Data in Relational Systems
SQL/XML
New standard SQL extension that allows creation of nested XML output
Each output tuple is mapped to an XML element row
<university>
<department>
<row>
<dept_name> Comp. Sci. </dept_name>
<building> Taylor </building>
<budget> 100000 </budget>
</row>
…. more rows if there are more output tuples …
</department>
… other relations ..
</university>
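The tuple-to-row mapping above can be imitated outside the database. The sketch below does not use the SQL/XML functions themselves; it reads a query result with Python's standard sqlite3 module and builds the same nested output with xml.etree.ElementTree, wrapping each output tuple in a row element and each column value in an element named after the column.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT, budget INTEGER)")
conn.execute("INSERT INTO department VALUES ('Comp. Sci.', 'Taylor', 100000)")

university = ET.Element("university")
relation = ET.SubElement(university, "department")   # one element per relation

cursor = conn.execute("SELECT dept_name, building, budget FROM department")
columns = [col[0] for col in cursor.description]     # column names become tag names

for tup in cursor:
    row = ET.SubElement(relation, "row")             # one <row> per output tuple
    for column, value in zip(columns, tup):
        ET.SubElement(row, column).text = str(value)

xml_text = ET.tostring(university, encoding="unicode")
print(xml_text)
```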
SQL Extensions
XML Applications
Web Services
End of Chapter 30