Handling documents (parsing)
You can use the DOM parser described above to handle the documents you receive from partners in XML exchanges. Using XPATH query expressions, you can easily grab one or two crucial values from the document, and discard the rest.
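A minimal sketch of that approach, using the Microsoft parser via COM (the file name and the XPATH expression are hypothetical examples, not part of any standard):

```foxpro
* Sketch: pull one crucial value out of a received document with the DOM.
loDOM = CREATEOBJECT("MSXML2.DOMDocument")
loDOM.async = .F.              && load synchronously
IF loDOM.Load("c:\exchange\order.xml")
   * An XPATH query expression grabs just the node we care about
   loNode = loDOM.selectSingleNode("//order/total")
   IF NOT ISNULL(loNode)
      ? loNode.text            && the value; the rest is discarded
   ENDIF
ELSE
   ? "Parse error: " + loDOM.parseError.reason
ENDIF
```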
Be aware, however, that DOM parsers load documents into memory, so this may not be the best choice for large result sets. This is not a necessary limit of the DOM; it is just a limitation of the DOM parsers currently released. (It is easy to imagine a DOM parser swapping parts of a document to and from memory, just as VFP swaps cursors to and from disk depending on their size and available resources.)
Normal VFP string-handling techniques might be more appropriate here, given our ability to handle long strings, and given that there is less concern about unwittingly doing something “non-standard” when you are parsing, rather than creating, a document. You can grab single values with an AT() search, or use low-level file functions, MLINE() (don’t forget the performance-enhancing _MLINE offset!), or ALINES() to scan through an entire document efficiently, inserting rows into a cursor as you go.
VFP 7 makes parsing easier than ever, with enhancements to ALINES() and ASCAN() that may be useful to XML development, plus a new STREXTRACT(cString, cBeginDelim [, cEndDelim [, nOccurrence [, nFlags]]]) function targeted at XML string handling.
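A sketch of this plain string-handling style, VFP 7 flavor; the file name and tag names are made-up examples:

```foxpro
* Sketch: parse a document with ordinary VFP string functions.
lcXML = FILETOSTR("c:\exchange\order.xml")

* Grab a single value between its delimiting tags
lcTotal = STREXTRACT(lcXML, "<total>", "</total>")

* Or walk the whole document with ALINES(), inserting rows as you go
LOCAL laLines[1], lnI
CREATE CURSOR curItems (item C(40))
FOR lnI = 1 TO ALINES(laLines, lcXML)
   IF "<item>" $ laLines[lnI]
      INSERT INTO curItems VALUES ;
         (STREXTRACT(laLines[lnI], "<item>", "</item>"))
   ENDIF
ENDFOR
```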
In VFP 7, you might also decide to XSL-transform the document so the result was “cursor-shaped”, matching one of the formats understood by the new XMLToCursor(cExpression | cFile [, cCursorName [, nCursorType | lStructureOnly]]) function. After you ran XMLToCursor on your transformed document, it would be easy to manipulate and store the data using normal VFP methods. This could be a very efficient approach, especially for documents containing multiple rows of results.
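A sketch of that two-step approach (the file names, and a stylesheet that maps the partner format to a cursor-shaped format, are assumed for illustration):

```foxpro
* Sketch: transform a partner document to a "cursor-shaped" format,
* then let XMLTOCURSOR() do the rest.
loSource = CREATEOBJECT("MSXML2.DOMDocument")
loStyle  = CREATEOBJECT("MSXML2.DOMDocument")
loSource.async = .F.
loStyle.async  = .F.
loSource.Load("partner.xml")
loStyle.Load("tocursor.xsl")   && maps their format to a cursor shape
lcCursorXML = loSource.transformNode(loStyle)
XMLTOCURSOR(lcCursorXML, "curResults")

* From here on, it's normal VFP data handling
SELECT curResults
BROWSE
```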
There is a second standard way of approaching XML documents, however, beyond the DOM: SAX (the Simple API for XML). You should consider it for document parsing because it is efficient for large documents.
Until VFP 7, we didn’t have a way to use SAX within VFP. SAX is an event-driven model: your program must instantiate a class that implements the SAX interface – something we couldn’t do until VFP 7. When you tell this “document handler” to parse your document, it triggers the standard SAX events, and the code in your class runs as these events are triggered.
The code you’d write to instantiate the parser and parse the document would look something like this:
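(A sketch of the shape of that code, following the approach in MISC\VFPSAX.PRG; the handler class names match that example, while the reader ProgID and document name are assumptions that depend on your MSXML version:)

```foxpro
* Sketch: wire up a SAX reader with VFP handler classes.
loReader  = CREATEOBJECT("MSXML2.SAXXMLReader")
loContent = CREATEOBJECT("FoxSAXContentHandler")
loErrors  = CREATEOBJECT("FoxSAXErrorHandler")
loReader.contentHandler = loContent
loReader.errorHandler   = loErrors

* Parsing fires the standard SAX events; the code in the
* handler classes runs as each event is triggered.
loReader.parseURL("c:\exchange\order.xml")
```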
In the code above, FoxSAXContentHandler and FoxSAXErrorHandler are my two classes implementing the standard interfaces. You’ll find my complete example in your source code for this session, as the file MISC\VFPSAX.PRG. As the comments there will tell you, I haven’t been entirely successful with SAX and VFP 7, and I can’t tell whether the problem lies with the VFP beta or the relatively new IVBSAX interfaces I’m trying to use. For the record, I wrote this code just before the September MSXML beta drop came out, and I think they have changed the SAX interfaces considerably in the latest version.
The class declaration declares its interface implementation and implements the required class members, in code that looks like this:
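(A sketch of the shape of such a class; the library name in the IMPLEMENTS clause, and the exact member list, depend on the MSXML version you have installed:)

```foxpro
* Sketch: a VFP 7 class implementing the SAX content handler interface.
DEFINE CLASS FoxSAXContentHandler AS session OLEPUBLIC
   IMPLEMENTS IVBSAXContentHandler IN "msxml3.dll"

   PROCEDURE IVBSAXContentHandler_startElement( ;
         strNamespaceURI, strLocalName, strQName, oAttributes)
      * Runs once for every opening tag the parser encounters
      ? "Element: " + strLocalName
   ENDPROC

   PROCEDURE IVBSAXContentHandler_characters(strChars)
      * Runs for text content; left empty when no code is needed
   ENDPROC

   * ...the remaining IVBSAXContentHandler members, each declared
   * as required, but left empty where no code should run
ENDDEFINE
```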
Note that all required events must be implemented and all required interface members declared, but you don’t have to actually use them all; you can leave a method empty if you don’t need to trigger any code at the event represented by that method.
If you need only a few bits from the center of the document, or if you have to do some calculations based on some parts of the document before you can decide how to treat other parts, SAX is not a good choice. For one thing, you usually can’t guarantee the order of the nodes in an XML document.
But, in other cases, you can see that this approach is quite exciting and has a lot of potential. It is best when your use of the document works well with a steady, one-pass “read” through the entire document, since that is what the SAX parser does.
Document transformations and exchanges through XSLT
Once you can create and parse XML documents, you get to the crux of the problem: how do people share these documents? If everybody creates their own internal documents to fit their own processes, as I’ve recommended, what happens then?
Throughout this paper I’ve been saying “you have a document and then you transform it using XSLT to meet somebody else’s needs” or “you receive a document and transform it so it matches your requirements”. What exactly does the transformation do, why do I think it’s a good idea, and how does it work?
The “why” part is probably easiest for me to answer:
The truth is that people will create their own formats, and they will not cooperate on one standard format. There are many reasons for this. But, whatever the reasons, even when you are dealing with simple row-and-column-shaped data, people will not agree on how that data should be represented.
Oracle will require one root tag and one row structure to do an INSERT into their tables, and their SELECT statement will likewise generate one sort of XML. SQL Server might do something similar, but they will not even use the same format as ADO, let alone Oracle! Siebel will create something called an XML representation of a “business object” (where the “business object” is what VFP programmers think of as an “updateable view”), and this document format will, likewise, be required, and nothing like the other two.
So… get used to it. You’re going to use XSLT to map between XML exchange partner formats, even when they are doing exactly the same kind of job (showing rows and columns in a table). When they are doing something more complicated and more specialized than showing rows and columns, the mapping gets a little more complicated to do but it is just as necessary, if not more.
The “what” deserves a paper as long as what I’ve already written (!)… and the “how” might take another paper that long.
I’m going to give you an overview by walking you through an XML exchange process, and then get down to specifics.
The case I’d direct your attention to is the recent partnership between various large airlines, to use XML to solve the problem of ticket transfers. (This story was reported in Computerworld, 25 September, 2000, “Airlines turn to XML to try to fix e-ticket transfer problems”, by Michael Meehan.) Here’s a statement of the problem and the use of XML to resolve the problem:
Currently, passengers who have electronic tickets have to wait in line to receive a paper ticket from their initial airline if a flight has been canceled and they want to try to switch to another carrier. In addition, airline employees must fill out a handwritten "flight interruption manifest" for each ticketholder who's looking to rebook elsewhere.
But with an industry-standard setup based on XML, Young said, a passenger's electronic ticket could automatically be transferred to another airline's system. The common XML technology would provide an easy-to-process format for all the airlines and could make electronic tickets more valuable than paper ones, he added.
I think all of us can sympathize with the airlines’ desire to better this process, to make the growing numbers of flight cancellations and overbookings easier to handle, for everybody concerned. Let’s walk through what happens now, and what will happen with XML, to see how things are going to work in the new, improved system.
As you see in the quotation above, currently a ticket agent reads an electronic ticket record and either manually or through his/her system translates that e-ticket into a paper ticket. The passenger then takes the paper ticket to another ticket agent in another airline, with whom the passenger hopes to get a seat. The second agent reads the paper ticket and fills out a new record based on the contents of the old one, and issues a new ticket booking the passenger on the second airline.
Here’s the critical bit: the old booking record and the new booking record do not have the same format. The two airlines don’t keep their records the same way. Luckily the agents have read each other’s tickets so many times that the experienced ones are very good at this. They easily transfer the information from the right boxes on one record to the equivalent boxes in the second record. Where necessary, they translate between currencies or timezones or languages, and they squeeze the contents of two boxes together into one field, or break up the contents of one box into two entries, until everything fits their system. (The inexperienced ones tear up three new tickets before they get it right… lines get longer and more flights get missed…)
When you change over this system to XML, the agents no longer have to write out the tickets, which is a good thing. But the two airlines still don’t share the same system (and have no intention of sharing the same record-keeping systems, for many reasons!).
That’s the “why” of XSLT, as you can easily see: it enables one system to be mapped to another. Leaving aside “what” XSLT does to accomplish the translation for a moment, here’s the “how”: a systems developer sits down with one or more experienced agents and learns how these agents convert the contents of one form to the next. Once this process is recorded, no agent ever has to do it again, with five people shouting at them, two new trainees, and somebody’s baggage all over the floor.
The process by which a developer learns a manual system and transfers it to an automatic one should be a familiar process to you. You are all experienced at observing manual procedures and putting them into a program.
The difference between placing a transformation into a program and into an XSLT document is that XSLT is a declarative syntax, not a procedural syntax. You specify what mappings you want to occur, and you can use logic to do so, but you don’t write any code that emits any text. In other words, you don’t write the code that tells the XSLT processor how to do the transformation. In fact, although XSLT processors are based on XML parsers, you aren’t even supposed to think about whether your XSLT processor uses a SAX model parser or a DOM model parser to do its job. The XSLT specifies only the results you want, including any conditional logic, but not how the results are created.
You could, indeed, write DOM handling code or SAX event code to handle the mapping problem instead of using XSLT. But you’d write a lot of code and if even one box on a form or one use of a column changed your logic might be incorrect. In addition, when your disgruntled airline passenger took his paper ticket to a third airline, your logic for handling the mapping between airline 1 and airline 3 would be entirely different.
With XSLT, you don’t change any logic in your program. You don’t recompile anything in your application when changes occur, or when you add another partner to the exchange. You make XSLT stylesheets available, specify which ones go with which translations, and you apply these translations with simple lines of code that do not change. For example, with the Microsoft parser, your code for the transformation might look like this:
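(A sketch of what those unchanging lines might look like; the ticket and stylesheet file names are placeholders invented for the airline example:)

```foxpro
* Sketch: two parser instances, one for the source document,
* one for the stylesheet that maps partner 1's format to partner 2's.
loSource = CREATEOBJECT("MSXML2.DOMDocument")
loStyle  = CREATEOBJECT("MSXML2.DOMDocument")
loSource.async = .F.
loStyle.async  = .F.
loSource.Load("airline1_ticket.xml")
loStyle.Load("airline1_to_airline2.xsl")

* Apply the stylesheet; the transformed document comes back as a string
lcResult = loSource.transformNode(loStyle)
STRTOFILE(lcResult, "airline2_ticket.xml")
```

To add a third partner to the exchange, these lines stay the same; only the stylesheet file you choose to load changes.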
As you can see above, you have two instances of the Microsoft parser loaded with two XML documents: one the source XML document you wish to transform, and the other the stylesheet. Yes, XSLT stylesheets are written in XML. You can manipulate them with the DOM – even change parameters in them at runtime – like any other XML document.
You may want a quick, command-line or Windows-shell method of testing transformations, especially if you’re writing your XML and XSL in an editor like Notepad and have no built-in way of associating the transformation and applying it. Here’s a VBS script (which you can also call from a .BAT file using CSCRIPT.EXE) I use to do this:
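(A sketch of such a script; the argument order is my own convention here, and error handling is omitted for brevity:)

```vbscript
' Sketch: TRANSFORM.VBS - apply a stylesheet to a document.
' Usage: cscript transform.vbs source.xml style.xsl result.html
Dim oSource, oStyle, oFSO, oOut
Set oSource = CreateObject("MSXML2.DOMDocument")
Set oStyle  = CreateObject("MSXML2.DOMDocument")
oSource.async = False
oStyle.async  = False
oSource.Load WScript.Arguments(0)
oStyle.Load WScript.Arguments(1)

' Write the transformed result to the output file
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oOut = oFSO.CreateTextFile(WScript.Arguments(2), True)
oOut.Write oSource.transformNode(oStyle)
oOut.Close
```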
I hesitate to write a lot of XSL examples here, for several reasons. First, the version of the XSLT processors that most of you have available is the one Microsoft published before the standard became available, and has a lot of deficiencies and non-standard syntax.
To make sure that you are writing and testing standard XSL, I suggest you test with at least one parser besides Microsoft’s. My choice is SAXON, written by Michael Kay, one of the authors of the XSLT standard. You can use SAXON on the command line to do transformations. You can download SAXON or “Instant SAXON for Windows”, which is just the interpreter without source, at http://users.iclway.co.uk/mhkay/saxon/ .
If you want to start using XSL with the Microsoft parser, you should download the MSXML Technical Preview from Microsoft’s MSDN site. Their more recent versions are far more standards-compliant than the one they released with IE. Be sure to install the MSXML files in replace mode, following the included instructions, or else remember to switch back and forth between the versions. If you don’t, IE and other default invocations of MSXML will keep using the old version.
You can load two separate XSLT transformation engines within the XML SPY interface (use the Edit Settings dialog, on the XSL tab). I usually keep IE loaded along with either SAXON or Oracle’s Java-based processor, and I try to make sure my transformations work in both. If I have any doubts, I go with SAXON’s results as indicating a definitive (standards) ruling.
Second, although XSLT is XML, it is extremely unusual-looking XML and tends to look quite alarming unless you have a particular goal in mind and can understand the XSLT you’re looking at in relation to that goal. There is no such thing as a “typical” translation document, in my experience. I’ve included one short and somewhat frivolous example of XSLT in the sample application I discuss in the next section (you’ll find it in the ASP\SUPPORT directory of the source for this session). Time permitting, we will go over several examples of non-trivial XSLT syntax in detail during the session.
Putting it all together
As part of your session notes, in the \ASP directories, I’ve included a tiny but complete ASP application, using a VFP COM component where some data manipulation takes place. This COM server, which you’ll find supplied with all source in the \COM directory, is basically an “all purpose” VFP server that exposes the Application property and its crucial methods. In this version, a subclass augments the base DataToClip method to be able to provide XML along with its standard data formats. I use an instance of FRX2XML to create the XML within this augmented method.
Don’t be too concerned about the fact that it is an ASP application, because it is the structure of this application rather than its environment that should drive home the point I wanted to make. The COM component represents “the stuff that VFP does really well” and that we want to pass in to VFP to do. Although my little all-purpose VFP server won’t be anything like your implementation of the real life component it represents, keep in mind that a minimum of COM calls is a good idea, whether the caller is a web server or not. Replace this VFP component with some VFP component that accepts and returns XML instructions, and you’ve got it.
The external part of this application, here written in VBScript-ASP, represents “the stuff that is required for an XML-enabled application”. Some of it might be done in VFP in your case, rather than externally as I show here, but I wanted to make sure you could see all the XML-related pieces spread out in script code, while the “standard” VFP processing parts remained hidden behind the COM object.
What are the external pieces?
That’s it. You have configuration options, you have localization message strings, you have some standard chores such as figuring out what transforms are appropriate to your current action and current options, and actually performing the transformation, you have a set of XSL files, and you have a “face” of the application, some “main” routine, which accepts requests from the outside and returns responses to the outside.
As far as I am concerned, most, if not all, of these exposed pieces can be done in VFP rather than externally as shown here. You might certainly decide to instantiate your DOM parser objects and do the transforms within the VFP component. You could evaluate which XSLT transforms are appropriate for the current action, either inside or outside the VFP component. In this case, the deciding factor probably will be: Where can you best cache objects that do the work, even if they would be the same objects (in this case, MSXML parser instances) in each implementation?
This example happens to be about ASP, and hence a web server handling the XML exchange, although once again you shouldn’t assume that this sort of exchange is only “important” over the Internet.
Within the Internet space, I just want to point out that there are additional differences of opinion about “where some of the work should be done”, beyond the server-side application partitioning we’re discussing here. The division of labor isn’t only “which application component on the server does what job”; it’s also “what does the client do versus what do the various serving tiers do”.
Microsoft, as usual in favor of relatively heavy clients, often indicates that you should hand over a reference to the XSL stylesheet within the body of the XML document and then send that document to the client, so that the client can do the transform. I disagree. They like this approach because it lessens the burden on the server, but it assumes a familiarity with XSL, and a capability on the part of the client, that is not a wise assumption (unless you want everybody in the world using IE as a client <g>!). It is much better, in my opinion, to do the transform on the server, where you have control over it and where you can serve all potential clients equally well. You can do the transform efficiently to maximize server performance – this is like anything else.
The important thing you should notice, when reviewing this application, is how little of it there is. It’s just a thin shell around the work you already know and do well (represented by the “VFPAllPurpose” server in this implementation).
Among these external pieces, you’ve also already seen that some work can be done by native VFP code, such as string manipulation, as well as by COM components designed expressly to work with XML. Your goal should be to use each tool where it performs best. For example, your VFP code can sort and calculate output before translating that output to its XML response format. You could also go to the XML response format first and then ask your XSLT transformation to handle the sorting and calculating chores – but VFP will do it faster. (XSLT even has indexing and lookup capabilities – but whose version of these features do you think you should use?) On the other hand, when you’re ready to prepare an HTML version of your XML response, this is something XSLT handles far more elegantly than you will in FoxPro code, in my opinion.
One other criterion you might want to use when deciding which components do the work: What do you know well, what do you not know well? The answer to this doesn’t always point in the same direction that you might think. Because you know FoxPro well, and you do not know the DOM well, you might expect me to recommend manual document creation using VFP code. However, as you’ve seen, I recommend you let the parser take care of document manipulation to avoid errors, even though it means you have to learn parser syntax.
The parser knows what a valid document and a well-formed document look like, better than your code, and will not make mistakes. The goal of delivering valid and well-formed XML as your application’s responses is so crucial to an XML-enabled application that this is my highest concern. It’s this goal, faithfully pursued, that makes everything else run smoothly.
Conclusion (why we will still be here)
I started this paper with a bow to my friend John Alden, and I’ll close it by fulfilling a promise to mention another friend, Kevin Jamieson. Kevin is a young, but happily shining IT professional. I’ve known him since he and my son Josh were 5 years old and, when he asks a question, it’s generally a good one.
Kevin has become something of a Luddite, even though he works with high tech equipment all day. He has bought a manual typewriter to record his important (trans: non-work) thoughts. He asked me to ask, and think about, “why anybody would ever type XML on a typewriter”. Kevin says if XML were really good we’d want it everywhere.
I promised I would record Kevin’s question in this paper. I don’t really have an answer for you, or for him, about why we’d type XML on a typewriter. Unlike Kevin’s physical journal pages, XML isn’t really a product or an end result, in itself. It just provides a conduit – both for data exchange and data presentation – to more products and end results than anything else I can imagine. Since it’s not an end product, since it requires some application or device to extract meaning from it and apply format to it… it’s hard to imagine XML existing outside the world of electronic devices and processing power.
But, within that world, it has so much to offer! I expect, for as long as I work with computers in the future, I’ll be working with XML. When I’m using VFP to extract the meaning and apply the format to XML, I know I’m working with two well-matched technologies, and one of the most creative partnerships that the world of computers can offer any developer today.