<?xml version='1.0'?>
<!DOCTYPE slideshow SYSTEM "slide.dtd">
<slideshow>
 
<showtitleslide>
  <title>I. XML Basics</title>
  <occasion>University of California Extension</occasion>
  <location>Sunnyvale, June 10, 1999</location>
  <speaker>Jon Bosak</speaker>
  <org>Sun Microsystems</org>
</showtitleslide>
 
<slidemodule>
 
  <moduletitleslide>
	 <title>The Fundamentals</title>
  </moduletitleslide>
 
  <slide>
	 <head>What is XML?</head>
	 <bulletlist spacing="wide">
		<item>
		  <p>Extensible Markup Language</p>
		</item>
		<item>
		  <p>An activity of the World Wide Web Consortium (W3C) organized
			 and led by Sun Microsystems</p>
		</item>
		<item>
		  <p>Objective: move the Web to its next stage of evolution by
			 adapting existing ISO standards for markup, linking, and formatting</p>
		</item>
	 </bulletlist>
	 <p>Primary effects:</p>
	 <numberedlist spacing="wide">
		<item>
		  <p>Will create new data-centric Web applications</p>
		</item>
		<item>
		  <p>Will fundamentally change publishing on the Web and
			 publishing in general</p>
		</item>
	 </numberedlist>
  </slide>

  <slide>
	 <head>What made XML necessary?</head>
	 <p>Two aspects of Web evolution demanded a technology beyond HTML. 
	 </p>
	 <bulletlist>
		<item>
		  <p>Internationalized electronic publishing</p>
		  <bulletlist>
			 <item>
				<p>Platform-independent</p>
			 </item>
			 <item>
				<p>Language-independent</p>
			 </item>
			 <item>
				<p>Media-independent</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>New data-centric Web applications</p>
		  <bulletlist>
			 <item>
				<p>Database exchange</p>
			 </item>
			 <item>
				<p>Distribution of processing to clients</p>
			 </item>
			 <item>
				<p>Client-side manipulation of views into the data</p>
			 </item>
			 <item>
				<p>Customization of information by intelligent agents</p>
			 </item>
			 <item>
				<p>Management of document collections</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>What's wrong with HTML?</head>
	 <bulletlist>
		<item>
		  <p>HTML was optimized for easy learning</p>
		  <bulletlist>
			 <item>
				<p>One tag set for all applications</p>
			 </item>
			 <item>
				<p>Predefined semantics for each tag</p>
			 </item>
			 <item>
				<p>Predefined data structures</p>
			 </item>
			 <item>
				<p>No formal validation</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>HTML trades power for ease of use</p>
		</item>
		<item>
		  <p>HTML is well suited to simple applications, but poorly
			 suited to more demanding applications</p>
		  <bulletlist>
			 <item>
				<p>Large or complex collections of data</p>
			 </item>
			 <item>
				<p>Data that must be used in different ways</p>
			 </item>
			 <item>
				<p>Data with a long life cycle</p>
			 </item>
			 <item>
				<p>Data intended to drive scripts or Java applets</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>What does XML provide?</head>
	 <p>XML provides key features needed for a new generation of Web
		applications:</p>
	 <bulletlist spacing="wide">
		<item>
		  <p><b>Extensibility: </b>Users can define new tags as needed 
		  </p>
		</item>
		<item>
		  <p><b>Structure: </b>Hierarchical data can be modeled to any
			 level of complexity</p>
		</item>
		<item>
		  <p><b>Validation: </b>Data can be checked for structural
			 correctness</p>
		</item>
		<item>
		  <p><b>Media independence: </b>The same content can be published
			 in multiple media</p>
		</item>
	 </bulletlist>
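	 <p>A minimal sketch of extensibility and structure, using
		hypothetical tag names (order, customer, item, description) that are
		not part of any standard:</p>
	 <codesample>&lt;order id="A-1001"&gt;
  &lt;customer&gt;Example Books Ltd.&lt;/customer&gt;
  &lt;item sku="X-42" quantity="3"&gt;
    &lt;description&gt;Paperback, 320 pages&lt;/description&gt;
  &lt;/item&gt;
&lt;/order&gt;</codesample>
	 <p>Nothing here is predefined by XML; an application or a DTD gives
		these names their structure and meaning.</p>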
  </slide>

  <slide>
	 <head>Why did Sun invest in XML?</head>
	 <numberedlist>
		<item>
		  <p>In industry, we knew from electronic publishing experience
			 that HTML would not work for publishing in the general case.</p>
		</item>
		<item>
		  <p>We also knew that future Web applications would require a
			 method of encoding that could drive arbitrarily complex distributed
			 processes.</p>
		</item>
		<item>
		  <p>It was clear that if an open standard like XML were not
			 created, HTML would be replaced by a more powerful <emph>binary
			 proprietary format.</emph></p>
		</item>
	 </numberedlist>
	 <p>Strategically, we had to have XML in order to keep Web data open
		and portable. We needed XML to do for data what Java does for programs.
		</p>
  </slide>

  <slide>
	 <head>Current status</head>
	 <bulletlist>
		<item>
		  <p>The XML 1.0 Recommendation is being widely deployed</p>
		</item>
		<item>
		  <p>XML is being widely adopted as a framework for the
			 definition of domain-specific languages</p>
		</item>
		<item>
		  <p>It is now generally agreed that Web content will be managed
			 using standards based on XML</p>
		</item>
	 </bulletlist>
	 <p>Key predictions:</p>
	 <numberedlist>
		<item>
		  <p>XML will be the basis for future Web standards. </p>
		</item>
		<item>
		  <p>XML will become the universal format for data exchange in
			 heterogeneous environments.</p>
		</item>
		<item>
		  <p>XML will almost certainly become the basis for international
			 publishing.</p>
		</item>
		<item>
		  <p>The combination of XML and XSL may replace all existing word
			 processing and desktop publishing formats.</p>
		</item>
	 </numberedlist>
  </slide>

  <slide>
	 <head>Key sources of information about XML</head>
	 <bulletlist spacing="wide">
		<item>
		  <p><b>The W3C activity:</b></p>
		  <p><hlink
			 href="http://www.w3.org/XML/">http://www.w3.org/XML/</hlink></p>
		</item>
		<item>
		  <p><b>Standards and drafts:</b></p>
		  <p><hlink
			 href="http://www.w3.org/TR/">http://www.w3.org/TR/</hlink></p>
		</item>
		<item>
		  <p><b>Markup technology in general:</b></p>
		  <p><hlink
			 href="http://www.oasis-open.org/cover/">http://www.oasis-open.org/cover/</hlink>
		  </p>
		</item>
	 </bulletlist>
  </slide>

</slidemodule>

<slidemodule>
 
  <moduletitleslide>
	 <title>The XML Family of Standards</title>
  </moduletitleslide>
 
  <slide>
	 <head>Meet the family</head>
	 <p>The XML family of languages moves the Web to a new level of
		evolution suitable for electronic commerce and other
		industrial-strength applications.</p>
	 <bulletlist>
		<item>
		  <p><b>XML </b>(Extensible Markup Language): A subset of SGML
			 (ISO 8879) designed for easy implementation</p>
		  <bulletlist>
			 <item>
				<p>Will replace HTML markup in industrial contexts</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p><b>XLink/XPointer</b>: A set of standard hypertext
			 mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding
			 Initiative (TEI)</p>
		  <bulletlist>
			 <item>
				<p>Will replace HTML linking in industrial contexts</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p><b>XSL </b>(Extensible Stylesheet Language): A standard
			 stylesheet language for structured information based on DSSSL (ISO/IEC
			 10179) and CSS</p>
		  <bulletlist>
			 <item>
				<p>Will replace CSS in industrial contexts</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>XML itself</head>
	 <bulletlist spacing="wide">
		<item>
		  <p>A simplified subset of SGML (ISO 8879)</p>
		  <bulletlist spacing="tight">
			 <item>
				<p>Very powerful -- no limits on namespace or structural
				  depth</p>
			 </item>
			 <item>
				<p>But easy to implement and small enough for Web browsers 
				</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>Not a language but a metalanguage</p>
		  <bulletlist spacing="tight">
			 <item>
				<p>Designed to support the definition of an unlimited
				  number of vertical-market languages for specific industries</p>
			 </item>
			 <item>
				<p>All XML languages can be processed by a single
				  lightweight parser built into every Web browser</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>XML tag languages</head>
	 <p>XML allows industries to design specific tag languages to solve
		specific problems.</p>
	 <p>Examples featured on Robin Cover's SGML/XML News page in one
		recent month (3/15 to 4/15, 1999):</p>
	 <bulletlist>
		<item>
		  <p>SVG (Scalable Vector Graphics)</p>
		</item>
		<item>
		  <p>XMLNews (for the news industry)</p>
		</item>
		<item>
		  <p>XCI (XML Court Interface)</p>
		</item>
		<item>
		  <p>DocBk XML (for software documentation)</p>
		</item>
		<item>
		  <p>XMI (XML Metadata Interchange Format -- OMG)</p>
		</item>
		<item>
		  <p>WAP (Wireless Application Protocol)</p>
		</item>
		<item>
		  <p>SIF (Schools Interoperability Framework)</p>
		</item>
	 </bulletlist>
	 <p>Key: An unlimited number of domain-specific tag languages can
		all be processed by a single parser.</p>
  </slide>

  <slide>
	 <head>XML in isolation</head>
	 <bulletlist>
		<item>
		  <p>"Syntax, not semantics"</p>
		  <bulletlist>
			 <item>
				<p>Tags have no predefined meaning</p>
			 </item>
			 <item>
				<p>XML by itself conveys only content and structure, not
				  presentation or behavior (unlike HTML)</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>There are important applications for XML alone: interprocess
			 communication, object serialization, metadata, database exchange</p>
		</item>
		<item>
		  <p>But associating <emph>presentation or behavior </emph>with
			 XML requires additional mechanisms</p>
		  <bulletlist>
			 <item>
				<p>Downloadable programs, applets, or scripts designed for
				  a specific tag set (grammar)</p>
			 </item>
			 <item>
				<p>Tag-sensitive components (e.g., Java beans)</p>
			 </item>
			 <item>
				<p>Industry agreements on the processing of specific
				  grammars (example: HTML)</p>
			 </item>
			 <item>
				<p>Stylesheets (XSL or CSS)</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
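	 <p>One way to associate a stylesheet with a document is a
		processing instruction placed before the root element. A minimal sketch,
		assuming a hypothetical CSS file named greeting.css:</p>
	 <codesample>&lt;?xml version="1.0"?&gt;
&lt;?xml-stylesheet type="text/css" href="greeting.css"?&gt;
&lt;greeting&gt;Hello, world!&lt;/greeting&gt;</codesample>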
  </slide>

</slidemodule>

<slidemodule>
 
  <moduletitleslide>
	 <title>Classical XML</title>
  </moduletitleslide>
 
  <slide>
	 <head>What's a document?</head>
	 <p>A document is data that you can <emph>read.</emph></p>
	 <p>Documents are a <emph>superset </emph>of data.</p>
	 <p>The basic problem with documents is that we need to display them
		in lots of <emph>different forms</emph>. This is the problem that XML
		and SGML were originally designed to solve.</p>
  </slide>

  <slide>
	 <head>Basic document analysis</head>
	 <image src="aspects.gif">[Content, structure, presentation]</image>
  </slide>

  <slide>
	 <head>Structured publishing</head>
	 <image src="concept.gif">[Presentation generated from
		content+structure] </image>
	 <p>XML allows you to specify the content and structure of a
		document in a way that lets you generate particular presentations as
		needed.</p>
  </slide>

  <slide>
	 <head>XML in one slide</head>
	 <bulletlist>
		<item>
		  <p>Legal XML documents are called <emph>well-formed</emph></p>
		</item>
		<item>
		  <p>A well-formed document describes a <emph>logical tree</emph>
		  </p>
		</item>
		<item>
		  <p>If a well-formed document conforms to an optional set of
			 constraints (a DTD), it is also <emph>valid</emph></p>
		</item>
	 </bulletlist>
	 <p>A well-formed XML document:</p>
	 <codesample>&lt;greeting type="friendly"&gt;Hello, world!&lt;/greeting&gt;</codesample>
	 <p>A valid XML document:</p>
	 <codesample>&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;!DOCTYPE greeting [
  &lt;!ELEMENT greeting (#PCDATA)&gt;
  &lt;!ATTLIST greeting type (friendly | unfriendly)
                                      "friendly" &gt;
]&gt;
&lt;greeting&gt;Hello, world!&lt;/greeting&gt;</codesample>
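	 <p>For contrast, a sketch of a document that is well-formed but
		<emph>not </emph>valid: the value "hostile" (chosen here for
		illustration) is not among the values the DTD allows for the type
		attribute:</p>
	 <codesample>&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;!DOCTYPE greeting [
  &lt;!ELEMENT greeting (#PCDATA)&gt;
  &lt;!ATTLIST greeting type (friendly | unfriendly)
                                      "friendly" &gt;
]&gt;
&lt;greeting type="hostile"&gt;Hello, world!&lt;/greeting&gt;</codesample>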


  </slide>

  <slide>
	 <head>Proof of concept: this presentation</head>
	 <p><emph>(These are links in the online version.)</emph></p>
	 <bulletlist>
		<item>
		  <p>The <hlink href="show.xml">XML source </hlink>from which
			 this presentation was produced</p>
		</item>
		<item>
		  <p>The optional <hlink href="slide.dtd">XML DTD </hlink>used to
			 validate the XML source</p>
		</item>
		<item>
		  <p>The <hlink href="hslide.dsl">DSSSL style sheet </hlink>for
			 the HTML used in the online version</p>
		</item>
		<item>
		  <p>The <hlink href="pslide.dsl">DSSSL style sheet </hlink>for
			 the RTF used in the printed version</p>
		</item>
		<item>
		  <p>The <hlink href="http://www.jclark.com/jade">Jade DSSSL
			 engine </hlink>used to produce both the HTML and RTF files</p>
		</item>
		<item>
		  <p>An <hlink href="show.rtf">RTF version </hlink>of this
			 presentation produced by Jade</p>
		</item>
		<item>
		  <p>A <hlink href="show.ps">PostScript version </hlink>of this
			 presentation made from the RTF file</p>
		</item>
		<item>
		  <p>A <hlink href="show.pdf">PDF version </hlink>of this
			 presentation made from the PS file</p>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>Lessons from the proof of concept</head>
	 <bulletlist spacing="wide">
		<item>
		  <p>Media-independent publishing works!</p>
		</item>
		<item>
		  <p>HTML can handle the online version (for the moment), but not
			 the print version</p>
		</item>
		<item>
		  <p>The language for formatting specifications (stylesheets)
			 must support <emph>structural transformation </emph>as well as
			 formatting</p>
		</item>
	 </bulletlist>
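	 <p>To illustrate structural transformation, here is a minimal
		sketch written in XSLT 1.0 rather than in the DSSSL actually used for
		this presentation. It assumes a source document whose root element is
		greeting and maps it onto a small HTML page; the output layout is an
		arbitrary choice:</p>
	 <codesample>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="html"/&gt;
  &lt;!-- Map the source element onto a different output structure --&gt;
  &lt;xsl:template match="/greeting"&gt;
    &lt;html&gt;
      &lt;body&gt;
        &lt;h1&gt;&lt;xsl:value-of select="."/&gt;&lt;/h1&gt;
      &lt;/body&gt;
    &lt;/html&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</codesample>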
  </slide>

  <slide>
	 <head>Summary of classical XML</head>
	 <p>Separating content and structure from presentation and behavior
		makes possible:</p>
	 <bulletlist spacing="wide">
		<item>
		  <p>Reusable information</p>
		</item>
		<item>
		  <p>Media-independent publishing</p>
		</item>
		<item>
		  <p>One-to-one marketing</p>
		</item>
		<item>
		  <p>Intelligent downstream document processing</p>
		</item>
		<item>
		  <p>Large-scale information management</p>
		</item>
	 </bulletlist>
  </slide>

</slidemodule>

<slidemodule>
 
  <moduletitleslide>
	 <title>Internationalization</title>
  </moduletitleslide>
 
  <slide>
	 <head>XML and Unicode</head>
	 <bulletlist>
		<item>
		  <p>XML has been based on Unicode from Day One</p>
		  <bulletlist>
			 <item>
				<p>There is nothing in an XML file but Unicode characters 
				</p>
			 </item>
			 <item>
				<p>Unicode is used for both content and markup (so you can
				  mix languages, even in tag names)</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>XML tools <emph>must </emph>support both the UTF-8 and
			 UTF-16 encodings of Unicode</p>
		  <bulletlist>
			 <item>
				<p>UTF-8: 1-4 bytes per character; ASCII is upward-compatible</p>
			 </item>
			 <item>
				<p>UTF-16: 2 bytes for most characters; 4 bytes (a surrogate
				  pair) for characters outside the Basic Multilingual Plane</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>The widespread adoption of XML for data management and
			 electronic commerce will probably make Unicode support universal</p>
		</item>
	 </bulletlist>
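	 <p>A small sketch of these points, using a hypothetical book
		element: the encoding declaration names UTF-8, and the title is written
		with numeric character references for the three characters of the
		Japanese word for "Japanese language" (U+65E5, U+672C, U+8A9E):</p>
	 <codesample>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;book xml:lang="ja"&gt;
  &lt;title&gt;&amp;#x65E5;&amp;#x672C;&amp;#x8A9E;&lt;/title&gt;
&lt;/book&gt;</codesample>
	 <p>Written directly rather than as references, each of these
		characters would occupy 3 bytes in UTF-8 and 2 bytes in UTF-16.</p>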
  </slide>

  <slide>
	 <head>Example: an international bookstore</head>
	 <image src="bcat.gif">[Japanese book catalog]</image>
  </slide>

  <slide>
	 <head>With stylesheet for Japanese</head>
	 <image src="bcat-j.gif">[Catalog rendered for reader of Japanese] 
	 </image>
  </slide>

  <slide>
	 <head>With stylesheet for English</head>
	 <image src="bcat-e.gif">[Catalog rendered for reader of English] 
	 </image>
  </slide>

  <slide>
	 <head>Source files for the bookstore example</head>
	 <p><emph>(These are links in the online version.)</emph></p>
	 <bulletlist>
		<item>
		  <p>The <hlink href="bcat.uc.txt">UTF-16 XML source </hlink>from
			 which the different versions were produced</p>
		</item>
		<item>
		  <p>The <hlink href="ssj-dsl.uc">UTF-16 DSSSL style sheet
			 </hlink>used to produce the version for the reader of Japanese</p>
		</item>
		<item>
		  <p>The <hlink href="sse-dsl.uc">UTF-16 DSSSL style sheet
			 </hlink>used to produce the version for the reader of English</p>
		</item>
		<item>
		  <p>The <hlink href="http://www.jclark.com/jade">Jade DSSSL
			 engine </hlink>used to produce RTF files from the source and the style
			 sheets</p>
		</item>
		<item>
		  <p>The <hlink href="bcat-j.rtf">UTF-16 RTF file </hlink>for the
			 reader of Japanese (font association done in Word 97)</p>
		</item>
		<item>
		  <p>The <hlink href="bcat-e.rtf">UTF-16 RTF file </hlink>for the
			 reader of English (font association done in Word 97)</p>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>Lessons from the example</head>
	 <bulletlist spacing="normal">
		<item>
		  <p>The catalog example shows that the distinction between data
			 exchange and publishing is ultimately an artificial one (the same
			 source would also be used to create the printed catalog)</p>
		</item>
		<item>
		  <p>The rendition in each case occurs <emph>on the web
			 client</emph></p>
		</item>
		<item>
		  <p>The database owner can publish <emph>a single data stream
			 </emph>to the entire world</p>
		</item>
		<item>
		  <p>Consider the alternative:</p>
		  <bulletlist spacing="normal">
			 <item>
				<p>Generation of a different HTML output stream for
				  <emph>every possible </emph>user and target platform</p>
			 </item>
			 <item>
				<p>Much greater load on the server</p>
			 </item>
			 <item>
				<p>No user autonomy</p>
			 </item>
		  </bulletlist>
		</item>
	 </bulletlist>
  </slide>

</slidemodule>

<slidemodule>
 
  <moduletitleslide>
	 <title>Namespaces</title>
  </moduletitleslide>
 
  <slide>
	 <head>The naming of names</head>
	 <bulletlist>
		<item>
		  <p>In electronic commerce, XML documents will be assembled on
			 the fly from a wide variety of sources using different tag vocabularies
			 (DTDs)</p>
		</item>
		<item>
		  <p>Must prevent collisions between elements (or attributes)
			 with the same name but different meanings</p>
		  <bulletlist>
			 <item>
				<p>For example, the element &lt;RING&gt; would have very
				  different meanings in a jewelry catalogue, a chemistry textbook, and a
				  mathematical journal</p>
			 </item>
		  </bulletlist>
		</item>
		<item>
		  <p>Must also allow re-use of common data elements (dates,
			 currencies, measurements) across different XML tag languages</p>
		</item>
		<item>
		  <p>Ultimately, we will need a system for associating
			 <emph>meanings </emph>with XML components</p>
		</item>
		<item>
		  <p>XML Namespaces (http://www.w3.org/TR/) is a small first step
			 toward solving this problem</p>
		</item>
	 </bulletlist>
  </slide>

  <slide>
	 <head>The concept of the XML namespace</head>
	 <bulletlist>
		<item>
		  <p>An XML <i>namespace </i>is a collection of XML element
			 and/or attribute names that are guaranteed to be <i>unique</i></p>
		</item>
		<item>
		  <p>Basic trick: use DNS (Domain Name System) to ensure
			 uniqueness</p>
		</item>
	 </bulletlist>
	 <p>DNS is the service that controls the ownership of domain names.
		It also provides the mechanism whereby names are resolved to actual
		resources, <i>but DNS resolution is not necessary to make XML
		namespaces work.</i></p>
  </slide>

  <slide>
	 <head>URI + name=unique name</head>
	 <p>Here the element name "price" is not unique:</p>
	 <codesample>&lt;x&gt;
  &lt;price units='Euro'&gt;
    32.18
  &lt;/price&gt;
&lt;/x&gt;</codesample>
	 <p>Prefix the element name with a URI such as
		"http://ecommerce.org/schema"; now the name is unique (although verbose
		and syntactically illegal):</p>
	 <codesample>&lt;x&gt;
  &lt;{http://ecommerce.org/schema}price units='Euro'&gt;
    32.18
  &lt;/{http://ecommerce.org/schema}price&gt;
&lt;/x&gt;</codesample>


  </slide>

  <slide>
	 <head>The namespace prefix</head>
	 <p>By substituting a <i>namespace prefix </i>for the URI we get a
		structure that is both elegant and legal:</p>
	 <codesample>&lt;x xmlns:edi='http://ecommerce.org/schema'&gt;
  &lt;edi:price units='Euro'&gt;
    32.18
  &lt;/edi:price&gt;
&lt;/x&gt;</codesample>
	 <p>Namespace scoping ensures that "edi:" means the same as
		"{http://ecommerce.org/schema}" only upon and within the element
		&lt;x&gt; on which it is declared.</p>
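	 <p>A sketch of scoping, using a second, hypothetical URI
		(http://example.org/other): the inner declaration rebinds "edi:" for
		the element &lt;y&gt; and its contents only, so the two price elements
		below have different names:</p>
	 <codesample>&lt;x xmlns:edi='http://ecommerce.org/schema'&gt;
  &lt;edi:price units='Euro'&gt;32.18&lt;/edi:price&gt;
  &lt;y xmlns:edi='http://example.org/other'&gt;
    &lt;edi:price units='Euro'&gt;32.18&lt;/edi:price&gt;
  &lt;/y&gt;
&lt;/x&gt;</codesample>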

  </slide>

  <slide>
	 <head>Important things to remember about namespaces</head>
	 <numberedlist>
		<item>
		  <p>Namespace prefixes are just <emph>temporary placeholders
			 </emph>for the current namespace URI. </p>
		  <p><emph>There are no standard prefixes!</emph></p>
		</item>
		<item>
		  <p>A namespace URI does not necessarily point to a web resource
			 (although it may).</p>
		</item>
		<item>
		  <p>If there is a resource, it is as likely to be a prose
			 description as a machine-processable schema.</p>
		</item>
		<item>
		  <p>Namespace scoping is cool but complicated.</p>
		</item>
		<item>
		  <p>Namespaces make traditional DTD validation highly
			 problematic if not downright useless. The solution to this lies in the
			 XML schema work.</p>
		</item>
		<item>
		  <p>We need much more namespace implementation experience before
			 this technique can be considered fully cooked.</p>
		</item>
	 </numberedlist>
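	 <p>To illustrate the first point, a sketch of two separate
		fragments: the arbitrary prefix "p" in the first, and a default
		namespace declaration in the second, both give the price element the
		same name, {http://ecommerce.org/schema}price:</p>
	 <codesample>&lt;x xmlns:p='http://ecommerce.org/schema'&gt;
  &lt;p:price units='Euro'&gt;32.18&lt;/p:price&gt;
&lt;/x&gt;

&lt;x&gt;
  &lt;price xmlns='http://ecommerce.org/schema' units='Euro'&gt;32.18&lt;/price&gt;
&lt;/x&gt;</codesample>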
  </slide>

</slidemodule>

</slideshow>
