Data Descriptors by Example (DDbE) is a package of components and a simple command-line application that facilitates the automatic generation of data descriptors (DTDs, Schemas, etc.) from a set of well-formed XML example documents. The example documents are valid under the resulting descriptor. A pre-release version can be downloaded from http://www.alphaworks.ibm.com/tech/DDbE. (Version 1.0, which includes documentation, will be available before the meeting.)
The functionality supplied by the DDbE core:
Supports incremental operation. For example, the component interfaces permit additional examples of element content to be added, and descriptors to be accessed and/or modified under program control.
Permits the user to control the content model construction process. For example, the maximum depth and the looseness of the model (i.e., the relative desirability of sequence and list constructs) may be specified either under program control or through command-line arguments.
Performs type inference and accepts user requests regarding attribute type and default declarations. DDbE determines which declarations are consistent with the given example documents, reports this information, and fulfills user requests when they are consistent with the examples.
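To make the idea of incremental generation concrete, the following is a minimal sketch in Python, not the DDbE implementation or its interfaces. The class name DescriptorBuilder and its methods add_example and to_dtd are hypothetical. The sketch always emits the loosest "list" form of content model, (a|b)*, declares every attribute as CDATA, and marks an attribute #REQUIRED only when every example occurrence of the element carries it. Namespaces, the depth and looseness controls, and user requests are not modelled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class DescriptorBuilder:
    """Hypothetical incremental DTD builder (illustration only, not the DDbE API)."""

    def __init__(self):
        self.count = defaultdict(int)        # element name -> occurrences seen
        self.children = defaultdict(list)    # element name -> child names, first-seen order
        self.has_text = defaultdict(bool)    # element name -> character data observed
        self.attr_count = defaultdict(lambda: defaultdict(int))

    def add_example(self, xml_text):
        """Fold one well-formed example document into the accumulated statistics."""
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            name = elem.tag
            self.count[name] += 1
            texts = [elem.text] + [child.tail for child in elem]
            # Non-blank text anywhere, or any text at all in a childless element,
            # rules out EMPTY and pure element content.
            if any(t and t.strip() for t in texts) or (len(elem) == 0 and elem.text):
                self.has_text[name] = True
            for child in elem:
                if child.tag not in self.children[name]:
                    self.children[name].append(child.tag)
            for attr in elem.attrib:
                self.attr_count[name][attr] += 1

    def to_dtd(self):
        """Emit a loose DTD under which every example seen so far is valid."""
        lines = []
        for name in self.count:
            kids = self.children.get(name, [])
            if self.has_text.get(name, False):
                model = "(#PCDATA" + "".join("|" + k for k in kids) + ")"
                if kids:
                    model += "*"             # mixed content must be starred
            elif kids:
                model = "(" + "|".join(kids) + ")*"
            else:
                model = "EMPTY"
            lines.append(f"<!ELEMENT {name} {model}>")
            for attr, seen in self.attr_count.get(name, {}).items():
                default = "#REQUIRED" if seen == self.count[name] else "#IMPLIED"
                lines.append(f"<!ATTLIST {name} {attr} CDATA {default}>")
        return "\n".join(lines)


builder = DescriptorBuilder()
builder.add_example('<order id="1"><item sku="a">Pen</item>'
                    '<item sku="b" qty="2">Ink</item></order>')
builder.add_example('<order id="2"><note/><item sku="c">Pad</item></order>')
print(builder.to_dtd())
```

Because the builder only accumulates counts, further examples can be added at any time and the descriptor regenerated, which is the sense of "incremental" assumed here. A tighter model (for instance, a sequence such as (note?, item+)) would require recording child order per occurrence, which this sketch deliberately omits.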
Extensions to the DDbE core are concentrated in two areas:
Expanded type inferencing to utilize the data typing capabilities of XML Schemas. Initially, extensions will focus on the primitive and built-in data types specified in the Schema Data Types draft. We expect an initial implementation capable of inferring basic types and facets to be in place at the time of the meeting. For more complex facets, such as inference of lexical representation, we expect to be in the planning stage.
Utilization of repositories of pre-existing data descriptor fragments. Problems include searching repositories for relevant documents and adapting external declarations for local use. This phase of the work is in the planning stage.
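The first extension area, inferring basic types and facets from example values, can be illustrated with a small sketch. This is an assumption about how such inference might work, not a description of the DDbE extension. The function name infer_type and the candidate ordering are hypothetical, and the type and facet names (xsd:integer, xsd:decimal, xsd:boolean, xsd:date, minInclusive, maxInclusive) follow the later XML Schema Part 2 Recommendation, which may differ from the draft referred to above. The date check is simplified: it accepts only four-digit years and no time zone.

```python
import re
from datetime import date
from decimal import Decimal

INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _is_date(value):
    """Accept YYYY-MM-DD strings that name a real calendar day."""
    if DATE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Candidates are tried from narrowest to widest; xsd:string is the fallback.
# Trying integer before boolean means the values "0" and "1" infer as integers.
CANDIDATES = [
    ("xsd:integer", lambda v: INTEGER.fullmatch(v) is not None),
    ("xsd:decimal", lambda v: DECIMAL.fullmatch(v) is not None),
    ("xsd:boolean", lambda v: v in {"true", "false", "0", "1"}),
    ("xsd:date", _is_date),
]


def infer_type(values):
    """Return the first candidate type consistent with every example value."""
    values = [v.strip() for v in values]
    for name, accepts in CANDIDATES:
        if values and all(accepts(v) for v in values):
            result = {"type": name}
            if name in ("xsd:integer", "xsd:decimal"):
                # Range facets taken directly from the observed examples.
                numbers = [Decimal(v) for v in values]
                result["minInclusive"] = str(min(numbers))
                result["maxInclusive"] = str(max(numbers))
            return result
    return {"type": "xsd:string"}


print(infer_type(["3", "12", "-4"]))
print(infer_type(["1.5", "2"]))
print(infer_type(["2000-02-29", "1999-12-31"]))
print(infer_type(["n/a", "7"]))
```

Facets derived this way are only as general as the examples: a range inferred from three values is a report of what is consistent with the data, not a claim about the intended domain, so a tool would presumably let the user accept, widen, or discard it.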
In our presentation we discuss DDbE with an emphasis on the set of interfaces which model the processing required to incrementally construct and manipulate data descriptors. The examination of the interfaces will follow their evolution from DTDs to Schemas as well as the extensions necessary to integrate repositories.