XML Convert 2.2
Overview of XML Convert and XFlat
Companies have started using XML to send application data to browsers and to business applications. XML is well suited for the interchange of data, since XML documents are self-describing, easily parsed and can represent complex data structures. Also, there is a wide variety of high-quality, inexpensive tools for parsing and transforming XML documents. When using XML for data interchange, ideally, the sending application will be able to export an XML document, and the receiving application will be able to import an XML document. Unfortunately, many legacy applications use flat files to import or export data. So, companies will need to convert flat files into XML documents when sending data to XML-capable applications. Likewise, companies will need to convert XML documents into flat files that can be imported into legacy applications.
Flat files contain machine-readable data that is typically encoded as printable characters. A flat file usually contains a series of records (or lines), where each record is a sequence of fields. A field contains an atomic piece of data (e.g., a postal code).
Let's look at a simple flat file containing employee data. The file contains one or more employee records. Each record contains three fields: the employee's SSN, name and salary.
The following are the contents of the employees flat file:
123456789,"Carr, Lisa",100000.00
444556666,"Barr, Clark",87000.00
777227878,"Parr, Jack",123000.00
998877665,"Charr, Lee",92000.00
Each record contains information about one employee. The format of the flat file is Comma Separated Value (CSV), which means that each record is terminated by the operating system's line separator and the fields within a record are separated by a comma. In addition, a field value may be enclosed in quotes, which escape any commas or line terminator characters that appear within the field value. Note that the quotes that surround the field value are not actually part of the field value. Also, if a field value contains a quote character, then the field value must be surrounded by quotes and the quote character in the field value must be escaped by prefixing it with an additional quote.
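The quoting rules described above are the ones that most CSV libraries implement. As a minimal sketch, the following uses Python's standard csv module (which is unrelated to XML Convert) to parse a few of the employee records. The dictionary keys ssn, name and salary are borrowed from the field names in the XFlat schema for this file; the function name parse_employees is hypothetical.

```python
import csv
import io

# Sample records in the employees CSV format, one record per line.
EMPLOYEES_CSV = (
    '123456789,"Carr, Lisa",100000.00\n'
    '444556666,"Barr, Clark",87000.00\n'
    '777227878,"Parr, Jack",123000.00\n'
)

def parse_employees(text):
    """Split CSV text into records, honoring quoted commas and doubled quotes."""
    # newline="" leaves line endings untouched so the csv module can handle
    # line terminators that appear inside quoted field values.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [
        {"ssn": ssn, "name": name, "salary": salary}
        for ssn, name, salary in reader
    ]

records = parse_employees(EMPLOYEES_CSV)
print(records[0]["name"])  # Carr, Lisa  (quotes removed, comma kept)

# A quote character inside a value is escaped by doubling it.
escaped = next(csv.reader(['1,"Lisa ""LC"" Carr",5']))
print(escaped[1])  # Lisa "LC" Carr
```

Note that the parser returns every field as a string; checking that the SSN and salary are valid numbers is a separate step.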
Let's look at a more complicated flat file, the structure of which is similar to the structure of a Windows Configuration Settings file (e.g., an INI file, such as win.ini). The flat file contains a list of contacts. The following are the contents of the contacts flat file:
[contact]
name=Nancy Magill
email=lil.magill@blackmountainhills.com
phone=(100) 555-9328
[contact]
email=molly.jones@oblada.com
name=Molly Jones
[contact]
phone=(200) 555-3249
name=Penny Lane
email=plane@bluesuburbanskies.com
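Each contact consists of a begin contact record followed by three optional records. The begin contact record consists of the string "[contact]". The three optional records, which can appear in any order, are a name record (which begins with "name="), an email record (which begins with "email=") and a phone record (which begins with "phone=").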
Each record in the flat file is a line that is terminated with the operating system's line separator.
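To make the structure of the contacts file concrete, here is a minimal hand-written parser in Python. It is only a sketch of the format as described above, not how XML Convert parses the file; the names parse_contacts and ALLOWED_LABELS are hypothetical. It does not check whether a label is repeated within one contact.

```python
CONTACTS_TEXT = """\
[contact]
name=Nancy Magill
email=lil.magill@blackmountainhills.com
phone=(100) 555-9328
[contact]
email=molly.jones@oblada.com
name=Molly Jones
[contact]
phone=(200) 555-3249
name=Penny Lane
email=plane@bluesuburbanskies.com
"""

ALLOWED_LABELS = ("name", "email", "phone")

def parse_contacts(text):
    """Return one list of (label, value) pairs per [contact] group."""
    contacts = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line == "[contact]":
            contacts.append([])  # the begin contact record opens a new group
            continue
        label, sep, value = line.partition("=")
        # Reject records outside a group, lines without "=", and unknown labels.
        if not contacts or sep != "=" or label not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected record {line!r}")
        contacts[-1].append((label, value))  # keep the records in file order
    return contacts

contacts = parse_contacts(CONTACTS_TEXT)
print(len(contacts))   # 3
print(contacts[1])     # [('email', 'molly.jones@oblada.com'), ('name', 'Molly Jones')]
```

Keeping the records as ordered pairs, rather than as a dictionary, preserves the order in which the optional records appeared in the file.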
You might be wondering why these two files are considered "flat". The term "flat" means that the file is not indexed. The term also implies that a flat file does not have a hierarchical structure; however, many flat files do have a hierarchical structure. Even simple flat files, such as the employees file above, contain a sequence of records where each record contains a sequence of fields. Many flat files, such as those used to exchange insurance claims, have more complicated data structures, such as multiple record types, groups of records, nested groups of records, repeating groups, etc.
Flat files are commonly used to transfer data between applications, since many business applications (e.g., CRM systems, ERP systems, EDI translators, legacy applications, etc.) use flat files to import and export data. For example, when a company receives an EDI invoice from a vendor, it will use an EDI translator to convert the invoice data from the EDI data format (e.g., X12) into the data format required by the accounts payable system. The EDI translator will typically produce a flat file containing the converted invoice data. The accounts payable system then imports this flat file.
In the future, many business applications will be able to import and export XML. Until then, there will be a need for conversion tools that can convert complex flat files into XML documents, and vice versa.
Companies will need to convert flat files to XML when transferring data from legacy applications to XML-capable applications (e.g., an ERP system, Microsoft's Internet Explorer, etc.). Companies will also convert flat files to XML when they need to display the flat file data on a non-XML-capable browser, since it is easy to convert XML to HTML using XSLT.
Companies will need to convert XML into flat files when transferring data from XML-capable systems to a legacy system.
Conversion between flat file data and XML can be done via generic conversion tools (e.g., XML Convert) or custom scripts (e.g., a Perl script). Generic conversion tools are schema-driven, so that they can handle a wide range of flat file formats. Such a conversion tool uses the schema of the flat file to parse the file and convert it to an XML document. The conversion tool also needs the flat file schema when converting an XML document into a flat file that conforms to the flat file schema.
XFlat is an XML language for defining flat file schemas. An XFlat schema is an XML document that conforms to the XFlat language and that describes the format of a class of flat files. An XFlat schema defines the structure and syntax of a class of flat files that contain non-XML data. An XFlat schema also defines the structure and syntax of a class of XFlat instances. An XFlat instance is an XML document whose structure is the same as a flat file and whose data is the same as the data in a flat file. In other words, an XFlat schema describes the structure of a class of flat files and the corresponding class of XFlat instances. XML Convert uses XFlat schemas to transform flat files into XFlat instances and vice versa.
The flat file that is described by an XFlat schema must consist of records, where each record is a sequence of fields. A field is an atomic piece of data (e.g., a postal code). Records and fields may be delimited. A record separator (i.e., delimiter) occurs at the end of a record and helps the parser determine where the record ends. Likewise, a field separator occurs at the end of a field and helps a parser to determine where a field ends. Fields that are not delimited must meet one or both of the following constraints:
The records may be grouped, and groups of records may be nested in a hierarchical structure (in other words, groups of records may contain subgroups). Note that XFlat supports nested data structures, but it does not support recursive data structures.
The XFlat element types are as follows:
An XFlat schema contains all the information needed to convert a flat file to XML (or vice versa). The MapToXml attribute in the XFlat language allows you to map each group, record and field to an XML element or to nothing. A field can also be mapped to an XML attribute.
Note that XFlat is a declarative language. A non-programmer who is familiar with flat files can create an XFlat schema.
For more information about XFlat (e.g., the definitions of the XML elements and attributes in the XFlat language), please refer to the XFlat Language page.
Let's look at the XFlat schema for the employees flat file. The contents of that flat file were as follows:
123456789,"Carr, Lisa",100000.00
444556666,"Barr, Clark",87000.00
777227878,"Parr, Jack",123000.00
998877665,"Charr, Lee",92000.00
The following XFlat schema describes the layout of the employees flat file:
<?xml version='1.0'?>
<XFlat Name="employees_schema" Description="Schema for CSV flat file">
  <SequenceDef Name="employees" Description="employees flat file">
    <RecordDef Name="employee" FieldSep="," RecSep="\N" MaxOccur="0">
      <FieldDef Name="ssn" NullAllowed="No" MinFieldLength="9" MaxFieldLength="11" DataType="Integer" MinValue="0" QuotedValue="Yes"/>
      <FieldDef Name="name" NullAllowed="No" QuotedValue="Yes"/>
      <FieldDef Name="salary" NullAllowed="No" DataType="Float" MinValue="0" QuotedValue="Yes"/>
    </RecordDef>
  </SequenceDef>
</XFlat>
Please note the following about this XFlat schema:
Now let's look at the XFlat schema for the contacts flat file. The following were the contents of that flat file:
[contact]
name=Nancy Magill
email=lil.magill@blackmountainhills.com
phone=(100) 555-9328
[contact]
email=molly.jones@oblada.com
name=Molly Jones
[contact]
phone=(200) 555-3249
name=Penny Lane
email=plane@bluesuburbanskies.com
The following XFlat schema describes the layout of the contacts flat file:
<?xml version='1.0'?>
<XFlat Name="contacts_schema" Description="Schema for contacts flat file">
  <SequenceDef Name="contacts" Description="Contacts flat file">
    <SequenceDef Name="contact" MinOccur="0" MaxOccur="0">
      <RecordDef Name="begin_contact" MapToXml="No" RecSep="\N">
        <FieldDef Name="label" ValidValue="[contact]" NullAllowed="No" MapToXml="No"/>
      </RecordDef>
      <ChoiceDef Name="choice_of_one" MapToXml="No" MinOccur="0" MaxOccur="3">
        <RecordDef Name="full_name" RecSep="\N" MapToXml="No">
          <FieldDef Name="label" ValidValue="name=" NullAllowed="No" MapToXml="No"/>
          <FieldDef Name="full_name"/>
        </RecordDef>
        <RecordDef Name="phone_num" RecSep="\N" MapToXml="No">
          <FieldDef Name="label" ValidValue="phone=" NullAllowed="No" MapToXml="No"/>
          <FieldDef Name="phone_number"/>
        </RecordDef>
        <RecordDef Name="email" RecSep="\N" MapToXml="No">
          <FieldDef Name="label" ValidValue="email=" NullAllowed="No" MapToXml="No"/>
          <FieldDef Name="email_address"/>
        </RecordDef>
      </ChoiceDef>
    </SequenceDef>
  </SequenceDef>
</XFlat>
Please note the following about this XFlat schema:
XML Convert 2.2 is a Java application that uses XFlat schemas to convert flat files into XML, and vice versa.
The key features of XML Convert 2.2 include:
When XML Convert transforms a flat file to an XML document (i.e., an XFlat instance), it will verify the structure of the flat file data and the data types of the fields using the XFlat schema. If the flat file does not pass this verification, then it is rejected. This verification minimizes the chance that an invalid XML document will be sent to the receiving application.
Likewise, when XML Convert transforms an XML document to a flat file, it will verify that the XML document conforms with the XFlat schema. This verification minimizes the chance that an invalid flat file will be imported into a business application.
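To illustrate the kind of field-level verification described above, the following Python sketch hard-codes checks that correspond to the constraints in the employees schema. The interpretation of the attributes is an assumption based on their names (NullAllowed="No" as "value required", MinFieldLength and MaxFieldLength as character counts, MinValue as a lower bound); the XFlat Language page has the actual definitions. The function check_employee is hypothetical and is not part of XML Convert.

```python
import math

def check_employee(ssn, name, salary):
    """Return a list of problems in one employee record; empty means valid."""
    problems = []
    # ssn: assumed to be 9 to 11 characters long and a non-negative integer.
    if not 9 <= len(ssn) <= 11:
        problems.append("ssn: length must be between 9 and 11")
    if not (ssn.isascii() and ssn.isdigit()):
        problems.append("ssn: must be a non-negative integer")
    # name: assumed to be required (no empty value).
    if name == "":
        problems.append("name: value is required")
    # salary: assumed to be a finite floating point number that is >= 0.
    try:
        amount = float(salary)
    except ValueError:
        problems.append("salary: must be a number")
    else:
        if not math.isfinite(amount) or amount < 0:
            problems.append("salary: must be a finite number >= 0")
    return problems

print(check_employee("123456789", "Carr, Lisa", "100000.00"))  # []
print(check_employee("12345", "", "-5"))  # three problems are reported
```

A converter built this way would reject the whole input file as soon as any record produced a non-empty list of problems.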
Using the XFlat schema for the employees flat file (see above), XML Convert would convert the employees flat file into the following XML document (i.e., XFlat instance):
<?xml version='1.0'?>
<employees>
  <employee>
    <ssn>123456789</ssn>
    <name>Carr, Lisa</name>
    <salary>100000.00</salary>
  </employee>
  <employee>
    <ssn>444556666</ssn>
    <name>Barr, Clark</name>
    <salary>87000.00</salary>
  </employee>
  <employee>
    <ssn>777227878</ssn>
    <name>Parr, Jack</name>
    <salary>123000.00</salary>
  </employee>
  <employee>
    <ssn>998877665</ssn>
    <name>Charr, Lee</name>
    <salary>92000.00</salary>
  </employee>
</employees>
In the reverse direction, using the same XFlat schema, XML Convert would convert this XFlat instance back into the original employees flat file.
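The round trip between the employees flat file and its XML form can be approximated with a short Python sketch. This is not XML Convert's implementation or API: the element names are hard-coded rather than read from an XFlat schema, no data type checks are performed, and the function names csv_to_xml and xml_to_csv are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ("ssn", "name", "salary")  # element names, in record order

SAMPLE = (
    '123456789,"Carr, Lisa",100000.00\n'
    '444556666,"Barr, Clark",87000.00\n'
)

def csv_to_xml(text):
    """Build an <employees> tree with one <employee> element per CSV record."""
    root = ET.Element("employees")
    for row in csv.reader(io.StringIO(text, newline="")):
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
        employee = ET.SubElement(root, "employee")
        for field, value in zip(FIELDS, row):
            ET.SubElement(employee, field).text = value
    return root

def xml_to_csv(root):
    """Write each <employee> element back out as one CSV record."""
    out = io.StringIO(newline="")
    # The default quoting mode quotes only values that need it (e.g. "Carr, Lisa").
    writer = csv.writer(out, lineterminator="\n")
    for employee in root.findall("employee"):
        writer.writerow([employee.findtext(f, default="") for f in FIELDS])
    return out.getvalue()

root = csv_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
print(xml_to_csv(root) == SAMPLE)  # True: the round trip reproduces the input
```

The round trip is exact here only because the sample quotes just the values that contain a comma, which matches the csv module's default quoting mode.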
Using the XFlat schema for the contacts flat file (see above), XML Convert would convert the contacts flat file into the following XML document:
<?xml version='1.0'?>
<contacts>
  <contact>
    <full_name>Nancy Magill</full_name>
    <email_address>lil.magill@blackmountainhills.com</email_address>
    <phone_number>(100) 555-9328</phone_number>
  </contact>
  <contact>
    <email_address>molly.jones@oblada.com</email_address>
    <full_name>Molly Jones</full_name>
  </contact>
  <contact>
    <phone_number>(200) 555-3249</phone_number>
    <full_name>Penny Lane</full_name>
    <email_address>plane@bluesuburbanskies.com</email_address>
  </contact>
</contacts>
In the reverse direction, using the same XFlat schema, XML Convert would convert this XFlat instance back into the original contacts flat file.
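The reverse conversion for the contacts example amounts to restoring the labels that were not mapped to XML. The Python sketch below assumes the element-to-label mapping implied by the schema (full_name to "name=", phone_number to "phone=", email_address to "email=") and rejects any document with unexpected elements. The function contacts_xml_to_flat is hypothetical and only illustrates the idea; it is not how XML Convert works internally.

```python
import xml.etree.ElementTree as ET

# Element name in the XML document -> label used in the flat file.
LABELS = {
    "full_name": "name=",
    "phone_number": "phone=",
    "email_address": "email=",
}

def contacts_xml_to_flat(xml_text):
    """Convert a <contacts> document into the contacts flat file format."""
    root = ET.fromstring(xml_text)
    if root.tag != "contacts":
        raise ValueError(f"unexpected root element <{root.tag}>")
    lines = []
    for contact in root:
        if contact.tag != "contact":
            raise ValueError(f"unexpected element <{contact.tag}>")
        lines.append("[contact]")  # restore the begin contact record
        for child in contact:
            if child.tag not in LABELS:
                raise ValueError(f"unexpected element <{child.tag}>")
            lines.append(LABELS[child.tag] + (child.text or ""))
    return "".join(line + "\n" for line in lines)

ONE_CONTACT = (
    "<contacts><contact>"
    "<email_address>molly.jones@oblada.com</email_address>"
    "<full_name>Molly Jones</full_name>"
    "</contact></contacts>"
)
flat = contacts_xml_to_flat(ONE_CONTACT)
print(flat, end="")
```

Because the child elements are written in document order, the flat file keeps the same record order as the XML document.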
After converting a flat file into XML using XML Convert, it may be necessary to transform the resulting XML document (i.e., the XFlat instance) before sending it to the receiving application. For example, if the resulting XFlat instance will be sent to a browser that does not support XML, then the XFlat instance should be converted from XML to HTML using an XSLT processor. If the output will be sent to an XML-capable application, then it may be necessary to use an XSLT processor to convert the XFlat instance into a new XML document whose structure meets the requirements of the receiving application. (Note that the output of the XSLT processor can be an XML document, an HTML document or text.)
If the resulting XFlat instance will be sent to an XML-capable browser, then the XFlat instance can specify a stylesheet, so that the browser renders the XML document as a nicely formatted web page.
When converting an XML document into a flat file, the XML document might not have the same structure as the target flat file. In this case, the user can use an XSLT processor to convert the XML document into an XFlat instance (i.e., an XML document that complies with the XFlat schema that describes the format of the target flat file). The user would then employ XML Convert to transform the XFlat instance into a flat file. XML Convert uses an XFlat schema to parse the XFlat instance and produce the target flat file.
If you plan to use an XSLT processor to transform the output of XML Convert into a new XML document, keep in mind that most XSLT processors read the entire source document into memory, so a very large XFlat instance may require a correspondingly large amount of memory.
Please note that XML Convert and the XFlat language do not provide any XML to XML transformation features, since XSLT can be used to do XML to XML transformation.
Also note that an XSLT processor can convert an XML document into non-XML text, without any help from XML Convert. Thus, you could use an XSLT processor without XML Convert to transform an XML document into a flat file. However, it would be tedious to write an XSLT stylesheet that rejects an input document that cannot be transformed into a valid flat file. It's important to reject a source document that cannot be transformed into a valid flat file, so that the receiving application does not import an invalid flat file. XML Convert rejects the input data file when it does not conform to the XFlat schema. Also, for most flat files, you can write a single XFlat schema that can be used in both directions (i.e., conversion from flat file to XML, and conversion from XML to flat file).
XML Convert 2.2 is a Java application that uses XFlat schemas to convert flat files into XML and vice versa. XML Convert can also convert a flat file from one format to another. XFlat is an XML language for defining flat file schemas. XML Convert uses an XFlat schema to parse and validate the input file (i.e., the flat file or the XFlat instance), and to produce the output file. XML Convert supports a wide variety of flat file formats, including CSV, semi-structured data (e.g., human readable reports), fixed length records and fields, multiple record types, groups of records, nested groups, etc.
Copyright © 1999 - 2007 Unidex, Inc.