Session D-XML

Introduction to XML

Markus Egger
EPS Software Corp.


Introduction to XML Development

XML (Extensible Markup Language) has received a lot of attention recently, yet most people don’t really know what XML is or why it will revolutionize the way we do things, just as HTML did.

XML is a language (standard) that may be used to define markup languages and file formats. It is not a new markup language or an extension to HTML. XML could be used to define HTML (with some special rules and exceptions), but this is only one example. One of the main problems we have with HTML is that we are stuck with the tags defined in the HTML specification. Those tags are mainly geared towards specifying page layout and linking content. With the growing acceptance of the Internet and its integration into our daily lives, this is no longer sufficient. What if you want to render pages with content that’s beyond the capabilities of HTML? What if you want to render mathematical formulas, chemical formulas or musical notes? HTML does not support those capabilities, but you could easily define a new markup language that does.

Defining that markup language is only half of the deal, of course. Just because you define a new language doesn’t mean a browser will be able to render it. If your new language is used for things as radical as musical notes, you will also have to provide a browser that is able to render your language. For most uses this is not required, though, especially in the world of databases, where XML files usually contain data that can easily be converted to HTML.

A simple example

Let’s look at a little XML example. Above I mentioned that you could create a markup language for musical notes. I created this markup language especially as an example for this session. I call this language MusML (Musical Markup Language).

As you may have noticed, this code is very similar to HTML, yet essentially different. We have lots of <…> tags, but the names of those tags aren’t available in HTML. That’s fine. After all, we are looking at MusML here and not at HTML. This markup language is designed to publish songs on the Internet.

Let’s have a look at how this document is organized: The very first tag (<?xml version="1.0"?>) identifies the code as XML. It should be there in order to let XML parsers know that this is XML version 1.0. Everything beyond that tag is specific to MusML. The first MusML tag (<Song…>) identifies an entire song and specifies the title and the beat. The sub-tags identify the individual notes with their frequency, octave, duration, and lyrics.
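The structure just described can be sketched in a few lines of Python. This is only an illustration, not the session's original MusML listing: the tag names (SONG, NOTE), the attribute names (Title, Beat, Frequency, Octave, Duration) and the sample song are assumptions derived from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical MusML document. Tag and attribute names are assumptions
# based on the prose description, not the original listing.
MUSML = """<?xml version="1.0"?>
<SONG Title="Scale Exercise" Beat="4/4">
  <NOTE Frequency="C" Octave="4" Duration="1/4">Do</NOTE>
  <NOTE Frequency="D" Octave="4" Duration="1/4">Re</NOTE>
  <NOTE Frequency="E" Octave="4" Duration="1/2">Mi</NOTE>
</SONG>"""

song = ET.fromstring(MUSML)
print(song.get("Title"), song.get("Beat"))

# Each NOTE keeps its meaning: pitch, octave, duration and the lyrics syllable.
for note in song.findall("NOTE"):
    print(note.get("Frequency"), note.get("Octave"),
          note.get("Duration"), note.text)
```

Because every note is still a note (and not a picture of one), a program reading this file can do more than display it.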

So far so simple and so good. The question is how we can use this file now. For obvious reasons, a regular web browser such as Internet Explorer will not be able to render this file. We have the option of converting the XML code into HTML. We could convert each note into a certain <IMG…> tag so it would be rendered as a simple bitmap that represents the note. This would defeat the purpose though, since we would lose all the meaning behind our original file. What used to be a clearly defined note (including frequency, octave and more) would be turned into an almost meaningless image.

The second option would be to provide a special browser that was designed to render MusML files. I created such a browser using only Visual FoxPro code. You can download it from my web site (www.eps-software.com). Here’s what this browser looks like:

 

Again: The entire rendering engine was created in Visual FoxPro. No other browser (such as IE) is involved here. The browser reads and parses the MusML file and renders the individual notes without converting the file first, as would be necessary for other browsers.

“But Markus…”, you may ask, “…what’s the difference between this browser and converting MusML into HTML?” Good question! Well, the major difference is that we still know what each note stands for, what frequency it has and what the lyrics are. Using HTML, we would have lost all that information. To see what this means for us, click on the play button, lean back, and enjoy the show!

But this isn’t all we can do with MusML. Imagine a keyboard or a synthesizer that is connected to the Internet. It could download our MusML file and play it as it can play MIDI files today.

A Visual FoxPro example

Together with Rick Strahl, I developed a tool called Visual WebBuilder. It can be used for web development, and it natively supports XML output. When we created that XML output, we also discovered that XML is extremely useful in regular Windows applications.

Visual WebBuilder has a “toolbox” that allows dropping items in a web component or HTML code in HTML pages. Here’s what this looks like:

 

Originally, this toolbox was hard-coded, since we only had a small number of items, but eventually we faced the challenge of making the toolbox customizable. At first, we experimented with INI files, but those were very hard to parse, so we decided to define our toolboxes in XML. Here’s how such an XML file is organized:

This isn’t the entire toolbox definition file (TBX) since it would have been too long, but I think you get the idea. The meaning of the tags is rather obvious. There are various groups of tools. Each tool in a group has an ID that identifies what kind of tool it is. Additional tags then specify the caption and the icon for the tool.

The toolbox reads the TBX file and displays its contents. Essentially, the toolbox is a very simple XML browser.
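To make the idea of the toolbox as a “very simple XML browser” concrete, here is a rough Python sketch. The real TBX format is not reproduced here: the tag names (TOOLBOX, GROUP, TOOL, CAPTION, ICON), the attribute names and the sample tools are all hypothetical, chosen only to match the description of groups, tool IDs, captions and icons.

```python
import xml.etree.ElementTree as ET

# Hypothetical toolbox definition. Every name in it is an assumption.
TBX = """<?xml version="1.0"?>
<TOOLBOX>
  <GROUP name="Web Controls">
    <TOOL id="textbox">
      <CAPTION>Text Box</CAPTION>
      <ICON>textbox.bmp</ICON>
    </TOOL>
    <TOOL id="button">
      <CAPTION>Button</CAPTION>
      <ICON>button.bmp</ICON>
    </TOOL>
  </GROUP>
  <GROUP name="HTML">
    <TOOL id="snippet">
      <CAPTION>Horizontal Line</CAPTION>
      <ICON>line.bmp</ICON>
    </TOOL>
  </GROUP>
</TOOLBOX>"""

def read_toolbox(xml_text):
    """Return {group name: [(tool id, caption, icon), ...]}."""
    toolbox = {}
    for group in ET.fromstring(xml_text).findall("GROUP"):
        toolbox[group.get("name")] = [
            (tool.get("id"), tool.findtext("CAPTION"), tool.findtext("ICON"))
            for tool in group.findall("TOOL")
        ]
    return toolbox

for group_name, tools in read_toolbox(TBX).items():
    print(group_name, tools)
```

Adding a tool then means editing the XML file, not the program.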

As mentioned before, the toolbox can be customized. Here’s an excerpt from another TBX file that inserts a standard HTML phrase into an HTML document.

This specifies the HTML text that provides a link to the EPS homepage. The “target” attribute in the <TOOL…> tag specifies that this tool is meant to go into the current HTML file (rather than building a new component). The <HTML> tag specifies the HTML code that will be inserted in the text. In this case it’s an <A> tag and some text. We also need to make sure that the XML parser understands that this part of the document is regular text and not actual XML that should be interpreted. We do this using a CDATA section, which starts with <![CDATA[. Everything between <![CDATA[ and ]]> will be interpreted as regular text.
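The effect of a CDATA section can be seen in a small Python sketch. The tag layout and the “target” value are assumptions; only the idea (an <HTML> tag whose content is protected by a CDATA section) comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical TBX excerpt. The attribute values are assumptions.
TOOL = """<TOOL id="snippet" target="html">
  <CAPTION>EPS link</CAPTION>
  <HTML><![CDATA[<A HREF="http://www.eps-software.com">EPS Software</A>]]></HTML>
</TOOL>"""

tool = ET.fromstring(TOOL)
html_tag = tool.find("HTML")

# The <A> tag inside the CDATA section arrives as plain text,
# not as a child element of <HTML>.
html_snippet = html_tag.text
print(html_snippet)
print(len(html_tag))  # number of child elements
```

Without the CDATA section, the parser would treat <A> as just another XML tag inside the <HTML> tag.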

A data example

XML is not only useful for defining new markup languages. It may also be used to define file formats. Microsoft uses XML as the file format for Office 2000. For us Visual FoxPro and database programmers, XML is important as a data interchange format. We can easily convert a table into XML. Here’s an example of a simple table:

This table has two records with three fields each. A file like this can easily be created in Visual FoxPro and published on the web. It can also be downloaded and parsed by Visual FoxPro (or other tools and browsers) extremely easily (see below).
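As a rough illustration of the table-to-XML conversion, the following Python sketch turns two records with three fields each into an XML document. The table name, field names and sample values are made up, and the one-tag-per-field layout is only one of several reasonable choices.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: two records with three fields each.
records = [
    {"id": "1", "name": "Alice", "city": "Vienna"},
    {"id": "2", "name": "Bob", "city": "Frankfurt"},
]

def table_to_xml(table_name, rows):
    """Build one root tag per table, one <record> per row, one tag per field."""
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = table_to_xml("customers", records)
print(xml_text)
```

The same structure can be read back by any XML parser, which is what makes it attractive as an interchange format.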

XML Applications

XML applications are applications of the XML standard. Our MusML language would be an XML application. XML applications are to be seen as the definition of standards. They should not be confused with Windows applications (programs).

The most familiar example would be HTML, which, as noted earlier, could be defined in XML with some special rules and exceptions. Here are a couple of other examples:

XSL – Extensible Stylesheet Language

XSL is an application that was defined to convert XML into HTML. In theory, it may be used with every XML file. However, it doesn’t always make sense. It is great for converting our database example into HTML, but converting our MusML example would make little sense.

XLL or XLink – XML Linking Language

XLL is a standard for linking documents and content. Its capabilities go way beyond the linking capabilities of regular HTML.

MathML – Mathematical Markup Language

MathML is designed to publish documents that contain mathematical formulas that cannot be rendered in HTML.

CDF – Channel Definition Format

CDF is the format Microsoft uses for their web channels. This was one of the first applications of XML.

Building a valid document

XML documents have to follow certain rules in order to be considered “well-formed” (often loosely called “valid”; strictly speaking, a valid document additionally conforms to its DTD, see below). Unless a document is well-formed, XML parsers and browsers will refuse to load it. XML documents are relatively easy to write, even though XML is not quite as forgiving as HTML. One important difference is that XML is case sensitive.

Content

Content is, you guessed it, the actual document content. So far, all the examples I used were content only. The content consists of the actual tags the document is composed of. The counterpart to the content would be the DTD (see below).

There are a handful of basic rules regarding content. The content has to have one root tag. The root tag is the very first tag in the content. All the other tags have to be child tags. This means that they have to be inside the root tag, like so:

The name of the root tag doesn’t matter. You can create as many child tags as you want.
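The root-tag rule is easy to try out with Python's standard XML parser. The tag names ROOT and CHILD are placeholders, since the name of the root tag doesn’t matter.

```python
import xml.etree.ElementTree as ET

# One root tag with any number of child tags is accepted.
root = ET.fromstring("<ROOT><CHILD>one</CHILD><CHILD>two</CHILD></ROOT>")
print(root.tag, [child.text for child in root])

# Two top-level tags (no single root) are rejected by the parser.
try:
    ET.fromstring("<CHILD>one</CHILD><CHILD>two</CHILD>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False
print(two_roots_accepted)
```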

Every tag needs to have an end tag like so:

In some special cases, you may want to use a single tag. In HTML, this is quite common. Here are some examples:

In XML, this isn’t allowed. Empty tags (that’s what single tags are called in XML) need a special end character, a slash before the closing bracket as in <BR/>, to indicate to the parser that there isn’t an end tag:

As mentioned above, you can specify as many child tags as you want. However, you have to make sure a proper hierarchy is maintained. The following structure is tolerated by most HTML browsers, but not allowed in XML:

As you can see, the hierarchy is broken. Here’s how it would have to look in XML:
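The end-tag, empty-tag and nesting rules above can be checked with a small Python helper. The function name is_well_formed and the HTML-like sample strings are illustrative choices, not taken from the original listings.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "missing end tag":  "<P>first line<BR>second line</P>",
    "empty tag":        "<P>first line<BR/>second line</P>",
    "overlapping tags": "<P><B>bold <I>both</B> italic</I></P>",
    "nested tags":      "<P><B>bold <I>both</I></B><I> italic</I></P>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", ok)
```

Only the “empty tag” and “nested tags” samples are accepted; the other two break the rules described above.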

White space in between tags is insignificant. So this would be the same as the example above:

However, white space in the tag-text is significant, since it may be important data. So the following two lines are essentially different (unlike in HTML where they are the same):
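A short Python sketch shows both halves of the white space rule. Note that the parser used here still reports the white space between tags (as text surrounding the child tags); it is insignificant in the sense that the tags and their text are the same. The sample tags are placeholders.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<ROOT><A>1</A><B>2</B></ROOT>")
indented = ET.fromstring("""<ROOT>
    <A>1</A>
    <B>2</B>
</ROOT>""")

def children(element):
    """List the child tags with their text, ignoring layout between tags."""
    return [(child.tag, child.text) for child in element]

# White space between tags does not change the tags or their text.
same_structure = children(compact) == children(indented)

# White space inside the tag text is data and is preserved as written.
one_space = ET.fromstring("<NAME>Markus Egger</NAME>").text
many_spaces = ET.fromstring("<NAME>Markus     Egger</NAME>").text
same_text = one_space == many_spaces

print(same_structure, same_text)
```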

You can adjust the way XML treats white space by redefining the defaults in the DTD, but this would be beyond the scope of these session notes.

Every tag may have any number of attributes. An attribute is an additional setting, almost like a parameter. Here’s an example:

Unlike in HTML, the attribute value always needs to be enclosed in quotes (double quotes are the usual choice; single quotes are also allowed).

You can use attributes for everything you want, but there are some guidelines you may want to follow. When working with data, attributes should only be used for metadata (that is, data about data). Here’s an example:
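The quoting rule can be verified quickly in Python. The NOTE tag and its Octave attribute are just sample names.

```python
import xml.etree.ElementTree as ET

# Quoted attribute values are accepted.
double_quoted = ET.fromstring('<NOTE Octave="4"/>').get("Octave")
single_quoted = ET.fromstring("<NOTE Octave='4'/>").get("Octave")

# An unquoted value, common in HTML, is rejected by an XML parser.
try:
    ET.fromstring("<NOTE Octave=4/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(double_quoted, single_quoted, unquoted_accepted)
```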
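One way to read this guideline: the tag text carries the data itself, while the attributes describe it. The following Python sketch uses a hypothetical customer record; the field names and the metadata attributes (type, length, decimals) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: values are tag text, metadata lives in attributes.
RECORD = """<customer>
  <name type="character" length="30">Alice Example</name>
  <balance type="numeric" decimals="2">125.50</balance>
</customer>"""

record = ET.fromstring(RECORD)
for field in record:
    print(field.tag, "=", field.text, "| metadata:", dict(field.attrib))

# The metadata tells a consumer how to interpret the data.
balance = record.find("balance")
value = round(float(balance.text), int(balance.get("decimals")))
print(value)
```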

Working with XML files

Using XML files is relatively simple. What you need is an “XML parser”, a piece of software that loads XML documents and lets you analyze their content. There are different parsers available. If you have IE4 or IE5, you also have MSXML, a simple yet effective XML parser. A newer version of MSXML is Microsoft.XMLDOM. This parser is more powerful than MSXML and compliant with the XMLDOM standard. It is perfect for use in Visual FoxPro. It can also be used in web pages, but there are other parsers that are more efficient for web pages.

Microsoft.XMLDOM

Here’s how you instantiate Microsoft.XMLDOM in Visual FoxPro:

 

Now, we can navigate to a certain document:

If the referenced file exists and is well-formed, it is loaded now, so we can start analyzing it. Here’s how a MusML file can be analyzed:

As you can see, parsing an XML document is extremely simple and straightforward, even if you have to load a document across the web. Here’s a little routine that loads a MusML file over the web and puts it in a VFP database:
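The same kind of analysis can be sketched with any DOM-style parser. The example below uses Python's xml.dom.minidom instead of Microsoft.XMLDOM, so it is an analogy rather than the Visual FoxPro code from the session, and the MusML tag and attribute names are again assumptions.

```python
from xml.dom import minidom

# Hypothetical MusML document. Names are assumptions.
MUSML = """<?xml version="1.0"?>
<SONG Title="Scale Exercise" Beat="4/4">
  <NOTE Frequency="C" Octave="4" Duration="1/4">Do</NOTE>
  <NOTE Frequency="D" Octave="4" Duration="1/4">Re</NOTE>
</SONG>"""

doc = minidom.parseString(MUSML)
song = doc.documentElement          # the root tag of the document
print(song.tagName, song.getAttribute("Title"), song.getAttribute("Beat"))

lyrics = []
for note in song.getElementsByTagName("NOTE"):
    # The first child of each NOTE is the text node holding the lyrics.
    lyrics.append(note.firstChild.data)
    print(note.getAttribute("Frequency"), note.getAttribute("Octave"),
          note.getAttribute("Duration"), note.firstChild.data)
```

The pattern is always the same: get the root tag, walk its children, read attributes and text.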
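The idea of such a routine can be sketched in Python with an SQLite table standing in for the VFP database. Everything here is an assumption made for illustration: the table layout, the function name store_song and the MusML names. To keep the sketch runnable without a network connection, the document is passed in as text; in practice it could come from urllib.request.urlopen(url).read().

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical MusML document standing in for a downloaded file.
SAMPLE = """<SONG Title="Scale Exercise" Beat="4/4">
  <NOTE Frequency="C" Octave="4" Duration="1/4">Do</NOTE>
  <NOTE Frequency="D" Octave="4" Duration="1/4">Re</NOTE>
</SONG>"""

def store_song(conn, xml_text):
    """Parse a MusML document and insert one row per note; return the row count."""
    song = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "title TEXT, position INTEGER, frequency TEXT, "
        "octave INTEGER, duration TEXT, lyrics TEXT)"
    )
    rows = [
        (song.get("Title"), position, note.get("Frequency"),
         int(note.get("Octave")), note.get("Duration"), note.text or "")
        for position, note in enumerate(song.findall("NOTE"), start=1)
    ]
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

connection = sqlite3.connect(":memory:")
print(store_song(connection, SAMPLE))
print(connection.execute(
    "SELECT position, frequency, lyrics FROM notes ORDER BY position").fetchall())
```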

Who would have thought loading data over the web from within Visual FoxPro would be that simple?

Java Parsers

XML data is well suited for loading into existing HTML documents. You can load an HTML page that doesn’t contain any data. Then, when you need it, you can load the required data and paste it into the page using DHTML features. On the web, Java applets are the preferred way of doing this. Here’s how those applets can be referenced in an HTML page:

This downloads the rudolph.musml file from my web page. When it is completely downloaded, the script function PlaySong() is invoked. This function can then go ahead and evaluate the downloaded data and paste it into the current page (or play the song, or whatever…).

Advanced XML Development

The Power of XML

The XML technology is very powerful thanks to its flexibility and simplicity as well as the lack of limitations. There are a number of advanced techniques that come along with the basic definition of the XML standard. In addition, there are a number of technologies that evolved around the XML standard.

This document discusses fundamental techniques specified in the XML standard as well as additional tools and languages.

The Document Type Definition (DTD)

The DTD is the tool that allows designing new markup languages. In the examples I used in the XML Introduction, we simply wrote a document and used whatever tags and attributes we needed. We could have added additional tags or sub-tags, and it would still have been a well-formed XML file, but it wouldn’t have been valid for our use. A MusML file with tags other than <SONG> or <NOTE> wouldn’t be rendered properly by the MusML browser.

With the DTD, we can specify the valid tags, the valid attributes, valid structures and valid values. Here’s a DTD example (the MusML DTD):

This DTD defines that there is a tag called “SONG”. It may have any number of sub-tags called “NOTE” (including none), but it cannot have any other sub-tags. The “NOTE” tag is defined in the next line. We also specify the attributes the tags may have. The “SONG” tag has the attribute “Title”, which is a required text attribute. If it isn’t specified in the content, the whole file is invalid. There is also a second attribute called “Beat”. It defaults to “4/4”, so if this attribute isn’t specified in the content, it looks to a parser as if it were there and set to “4/4”. The attribute only has three possible values: “4/4”, “3/4” and “6/8”. The attributes for the “NOTE” tags are defined in a similar fashion.
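To make these rules tangible, the following Python sketch hand-codes the SONG rules just described. It is not a DTD and not a general validator: Python's standard library does not validate against DTDs, so the checks are written out explicitly, and the function name check_song is a made-up helper.

```python
import xml.etree.ElementTree as ET

ALLOWED_BEATS = {"4/4", "3/4", "6/8"}
DEFAULT_BEAT = "4/4"

def check_song(xml_text):
    """Return (beat, problems) for a MusML document, following the SONG rules."""
    song = ET.fromstring(xml_text)
    problems = []
    if song.tag != "SONG":
        problems.append("root tag must be SONG")
    if song.get("Title") is None:
        problems.append("SONG requires a Title attribute")
    # A missing Beat behaves as if it were present and set to the default.
    beat = song.get("Beat", DEFAULT_BEAT)
    if beat not in ALLOWED_BEATS:
        problems.append("Beat must be one of 4/4, 3/4, 6/8")
    for child in song:
        if child.tag != "NOTE":
            problems.append("unexpected sub-tag: " + child.tag)
    return beat, problems

print(check_song('<SONG Title="Scale Exercise"><NOTE>Do</NOTE></SONG>'))
print(check_song('<SONG Beat="5/4"><CHORD/></SONG>'))
```

A validating parser performs checks of this kind automatically, driven by the DTD instead of hand-written code.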

Here’s the entire MusML file one more time. This time, it includes the DTD:

 

The DTD may be internal as in this example, or it may be external in a separate file. It wouldn’t make a lot of sense to include the DTD in all our MusML documents. We should rather create an external DTD that’s available on the web, and reference it like so:
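A reference to an external DTD is written as a DOCTYPE declaration at the top of the document. The Python sketch below shows such a declaration and how a parser exposes it. The URL http://www.example.com/musml.dtd is a placeholder, not the real location of the MusML DTD, and the parser used here only records the reference; it does not download the DTD or validate against it.

```python
from xml.dom import minidom

# Hypothetical document. The DTD URL is a placeholder.
DOC = """<?xml version="1.0"?>
<!DOCTYPE SONG SYSTEM "http://www.example.com/musml.dtd">
<SONG Title="Scale Exercise">
  <NOTE Frequency="C" Octave="4" Duration="1/4">Do</NOTE>
</SONG>"""

doc = minidom.parseString(DOC)

# The DOCTYPE names the root tag and tells the parser where the DTD lives.
print(doc.doctype.name)
print(doc.doctype.systemId)
```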

This way, it is much easier to maintain the DTD. When we decide to update it, all the existing files will be able to benefit from the changes. Also, internal DTDs make files unnecessarily large and complex. DTDs are sometimes very long. Just imagine the HTML DTD!

BizTalk – XML Standards/ XML Framework

BizTalk is an industry initiative started by Microsoft and supported by a wide range of organizations, from technology vendors like SAP and CommerceOne to technology users like Boeing and BP/Amoco. BizTalk is not a standards body. Instead, it is a community of standards users, with the goal of driving the rapid, consistent adoption of XML to enable electronic commerce and application integration.

The BizTalk Community defines the BizTalk Framework™, a set of guidelines for how to publish schemas in XML and how to use XML messages to easily integrate software programs together in order to build rich new solutions. Their design emphasis is to leverage what you have today - your existing data models, solutions, and application infrastructure - and adapt it for electronic commerce through the use of XML.

Through the Http://www.BizTalk.org Web site you can locate, manage, learn about, share information about and publish XML, XSL and information models and business processes supported by applications that support the BizTalk Framework. There's a library of XML schemas for you to review and download for use in your own applications. We even encourage you to publish your own schemas there for others to use! Membership is free, so why not sign up and start taking advantage of a public resource that is sure to keep you ahead of the game?


© Text, graphics, and content: ISYS GmbH