Thursday, February 17, 2011

Where do you get XML file formats from

Looking at questions like http://stackoverflow.com/questions/77726/xml-or-sqlite-when-to-drop-xml-for-a-database and http://stackoverflow.com/questions/44207/what-are-good-alternative-data-formats-to-xml it is clear that XML is best used for exchanging data between systems, organizations, and software programs. For internal use, other options tend to be more compact and scalable, like home-made plain-text formats, regular source code, or a database.

Once the time comes to move data out to some other program/tool/system/organization in XML: where and how do you define the data format encapsulated in the XML file? You need strict rules for what goes where, and for the meaning of every node in the tree. XML is not self-documenting, that much seems clear.

Do you use somebody else's design, or roll your own? Is there some good guide to how to do the document designs?

/jakob

From stackoverflow
  • How I handle it:

    • One element wrapping the whole thing (XML standard)
    • One element within that for each "object".
    • Attributes on the element for each Property, when they are a reasonable length.
    • The InnerXml text of that element for a property with lots of text (provided there is just one)
    • Elements nested within the element if there are several properties with a lot of text.

    (sorry, the editor just will not let me post XML. The preview and the submitted versions don't even match)
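
    As a rough sketch of the layout rules listed above, the Python below builds such a document with the standard library's xml.etree.ElementTree. The "library"/"book" names, the sample properties, and the 40-character cut-off for "a reasonable length" are all assumptions made up for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

MAX_ATTR_LEN = 40  # assumed cut-off for "a reasonable length"


def object_to_element(tag, properties):
    """Turn one object's properties (a dict of strings) into an element."""
    elem = ET.Element(tag)
    long_props = {}
    for name, value in properties.items():
        if len(value) <= MAX_ATTR_LEN:
            elem.set(name, value)      # short property -> attribute
        else:
            long_props[name] = value   # long property -> handled below
    if len(long_props) == 1:
        # Exactly one long property: store it as the element's text.
        elem.text = next(iter(long_props.values()))
    else:
        # Several long properties: one nested element per property.
        for name, value in long_props.items():
            ET.SubElement(elem, name).text = value
    return elem


def build_document(root_tag, object_tag, objects):
    """Wrap all objects in a single root element."""
    root = ET.Element(root_tag)
    for props in objects:
        root.append(object_to_element(object_tag, props))
    return root


# Hypothetical sample data.
books = [
    {"id": "1", "title": "XML Basics", "summary": "S" * 60},
    {"id": "2", "title": "Schemas", "summary": "S" * 60, "review": "R" * 60},
]
doc = build_document("library", "book", books)
print(ET.tostring(doc, encoding="unicode"))
```

    The first book ends up with its summary as element text; the second has two long properties, so both become nested elements.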

    If you write a sample version of the XML file in Visual Studio, there's a toolbar button which will automatically create an XSD schema for that file. Add a "targetNamespace" attribute to the XSD, and copy it to C:\Program Files\Microsoft Visual Studio 8\Common7\Packages\schemas\xml (varying slightly by VS release).

    Then add an xmlns="{your namespace}" attribute to your root element, and VS will give you Intellisense for your tags.

  • You can define the structure and data types and restrictions on XML documents using XML Schema (http://www.w3.org/TR/xmlschema-0/)

    It's a good idea to write a proper schema for all XML documents you use, although it is a pain at times! Otherwise you have no guarantee that the XML you write out is the same as the XML that another (or the same!) program is expecting.

    Personally I find XML Schema makes my head hurt, but the principle is good and I wouldn't use an XML format that didn't have the schema properly defined if I could avoid it.

  • It might depend on what you are trying to represent. If a standard format (i.e. a schema) already exists, then there are advantages to using the standard. There are schemas for music representation, chemistry, chess, and the list is very long. If you are doing something that is not covered by any existing standard, then roll your own.

    I agree with John that you should always create a schema. You just never know how far your application is going to go.

  • Actually, XML is self-documenting, when used with a DTD or an XSD, and that is one of its advantages when compared to plain-text files.

    As you've guessed, though, using XML to transfer data properly does require some work. I found the XML tutorial and XSD tutorial at w3schools.com to be helpful. If you go down this road, you may be interested in their tutorial on XSLT as well.

    At first, as Vincent suggests, you may want to use an existing schema if one is available. On the other hand, you might find it easier to learn XSD if you build your own (probably using an existing one as a starting point).

  • Consider using RDF, the Resource Description Framework, which has a standard XML serialization. RDF models everything as a series of triples: subject, predicate, object. What you end up with is a fairly flat XML with references to handle hierarchies of arbitrary depth.

    RDF makes heavy use of namespaces. The downside to namespaces is that they make the XML look a little busier, so it is a little harder for humans to read. The upside is that they make your schema easier to blend with other schemas, and so more extensible.

    You can use a standard XML parser on RDF or use one of the many open source libraries that do RDF specific parsing and call into your code SAX style with that subject, predicate, and object.
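
    To illustrate the "standard XML parser" route, here is a minimal sketch that pulls subject, predicate, object triples out of a small RDF/XML document using only Python's xml.etree.ElementTree. The example.org URIs and the "ex" vocabulary are invented, and the function only understands flat rdf:Description elements with rdf:about; a dedicated RDF library would be needed for the full syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML; the URIs and property names are made up.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/album/1">
    <ex:title>Example Album</ex:title>
    <ex:artist rdf:resource="http://example.org/artist/7"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/artist/7">
    <ex:name>Example Artist</ex:name>
  </rdf:Description>
</rdf:RDF>"""


def iter_triples(rdf_xml):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace-uri}local";
            # joining the two parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a reference to another resource
            # or a literal value held in the element text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj


for triple in iter_triples(RDF_XML):
    print(triple)
```

    Note how the album refers to the artist by URI instead of nesting it, which is what keeps the XML flat.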

    If that sounds like a good idea to you, then this site is a directory of published RDF schemas that might prove to be useful. Even if you don't find a perfect match, you might find some schemas that are close and point out some issues with how you are modeling your own data. Even if you don't go with RDF, it's most probably worth spending a little time researching this site. If you don't find what you want and end up creating your own RDF Schema, then consider publishing it on this site.

  • Any XML document that's going to be shared with the outside world should have a schema. Using a schema gives you a simple and deterministic way of rejecting XML documents that your process can't understand.

    Since it's deterministic, it also gives people trying to interoperate with your systems a similarly simple way of knowing that the XML they're producing meets your basic requirements.
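
    Python's standard library has no XSD validator, so the sketch below is only a stand-in for the idea: it hand-codes, as two small rule tables, the kind of structural checks a schema would express declaratively, and returns the same list of problems for the same input every time. The "order"/"item" contract is hypothetical. With a validating parser, the rule tables would be replaced by the schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract for an "order" document.
REQUIRED_ATTRS = {"order": {"id"}, "item": {"sku", "quantity"}}
ALLOWED_CHILDREN = {"order": {"item"}, "item": set()}


def find_problems(xml_text):
    """Return a list of reasons to reject the document (empty if acceptable)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != "order":
        return [f"unexpected root element <{root.tag}>"]

    problems = []
    for elem in root.iter():  # document order, starting at the root
        missing = REQUIRED_ATTRS.get(elem.tag, set()) - set(elem.attrib)
        for name in sorted(missing):
            problems.append(f"<{elem.tag}> is missing attribute '{name}'")
        for child in elem:
            if child.tag not in ALLOWED_CHILDREN.get(elem.tag, set()):
                problems.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
    return problems


print(find_problems('<order id="1"><item sku="A" quantity="2"/></order>'))
print(find_problems('<order><item sku="A"/></order>'))
```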

    You can also use the annotation elements in your schema to document business rules that a schema can't represent. These rules aren't automatically enforced, of course, but the schema is generally a sensible place to put them. (It's also straightforward to write XSLT that produces human-readable HTML documentation off of your annotated schema.)
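
    The answer suggests XSLT for turning annotations into documentation. As an alternative sketch in Python, the code below collects the xs:documentation text attached to each element declaration, which could then be fed into whatever report format you like. The schema and its business rules are invented for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema with business rules recorded as annotations.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:annotation>
      <xs:documentation>Orders over 10 items need manager approval.</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:annotation>
            <xs:documentation>The sku must exist in the partner catalogue.</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def documented_rules(schema_text):
    """Return (element name, documentation text) pairs in document order."""
    root = ET.fromstring(schema_text)
    rules = []
    for elem in root.iter(f"{{{XS}}}element"):
        # Only look at the annotation directly under this declaration.
        doc = elem.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        if doc is not None and doc.text:
            rules.append((elem.get("name"), doc.text.strip()))
    return rules


for name, rule in documented_rules(SCHEMA):
    print(f"{name}: {rule}")
```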

    Authoring schemas is a lot easier if you have a good tool. If you write a lot of schemas, XML Spy pays for itself - and XML Spy is shockingly expensive.

    I've been doing systems integration work for 20 years, and XML Schema is a godsend. My rule of thumb is that every schema I write, even for the simplest interface, eliminates at least two meetings and five phone calls. It also eliminates an entire class of defects from the finished system.
