Using XML for Web Site Management: Lessons Learned Report



Chapter Two: Benefits

Reduced data duplication and content handling

Data duplication means that multiple source files exist for identical text, images, and other content that appear in multiple locations and formats across a Web site and other media. This requires that the “same” content be handled in several locations, probably by different people, and often at different times. The absence of a single source file:
  • creates greater risk of “version differences” between the duplicates,
  • requires manual tracking of all locations of the duplicated data, and
  • demands different technical skills for handling the data in its various guises (e.g., Word, HTML, database, etc.).
If content is modified in one instance, it needs to be modified in all its other occurrences, which can be an imposing task. One technical staff member characterized the process: “It’s sometimes difficult to get content up in a timely manner, again because of the multiple formats ... So there’s a big emphasis in the formatted Web pages. Then of course you have to do the full HTML document for accessibility standards. Then you have to do the PDF to actually get the full document for that. And as everybody knows, you need to make one change in one document while the other two things might not necessarily need a change. So you get multiple versions floating around [out] there all the time.” As Web sites grow, it becomes virtually impossible for any one person to remember where all of the entries are (see Figure 3).

Figure 3. Workflow in Non-XML Based Web Site
The diagram above shows how much effort goes into passing documents back and forth while trying to keep them all consistent and up to date. Many of the tasks consist of manually reformatting the content for the Web while checking that it is still accurate.

XML can eliminate the duplication of data because the XML file serves as the single source of the content. Its various manifestations throughout a Web site and beyond (HTML, Word, PDF, etc.) are produced by stylesheets written in the Extensible Stylesheet Language (XSL), which transform and present the XML content in the format and location desired without modifying or duplicating the original XML source. XSL “handles” the content and produces the output, which not only eliminates the duplication of data but also manages how that data is handled (see Figure 4). As one IT professional stated, “Right now we have different Web pages for different types of documents like PDFs and different print-friendly forms and things along those lines. We have to change them in two or three places. So what we’ll do now is we’ll have the one document that will be accessed and we’ll only need to change it in that one place. So that will make for a much better environment and less work.”
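
The single-source idea can be illustrated with a minimal sketch. It is not taken from the report: the `notice` document, its element names, and the two rendering functions are hypothetical, and plain Python functions stand in for the XSL stylesheets a real site would use. The point is only that one XML source feeds every output format, so a correction is made in one place.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source document; the element names are illustrative only.
SOURCE_XML = """\
<notice id="office-hours">
  <title>Office Hours</title>
  <para>The office is open Monday through Friday.</para>
  <para>Call ahead for holiday schedules &amp; closures.</para>
</notice>
"""


def to_html(root):
    """Render the source as an HTML fragment (stand-in for an HTML stylesheet)."""
    lines = ["<h1>{}</h1>".format(escape(root.findtext("title")))]
    for para in root.findall("para"):
        lines.append("<p>{}</p>".format(escape(para.text)))
    return "\n".join(lines)


def to_text(root):
    """Render the same source as plain text (stand-in for a print stylesheet)."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines.extend(para.text for para in root.findall("para"))
    return "\n".join(lines)


# Parse the one source file, then derive every output from it.
root = ET.fromstring(SOURCE_XML)
html_output = to_html(root)
text_output = to_text(root)

# Neither rendering modifies the source tree, so both can be regenerated
# at any time after the content is edited once.
```

Under these assumptions, changing the text of a `para` element and calling both functions again updates the HTML and plain-text versions together, which is the behavior the IT professional describes. An actual XSL-based setup would express `to_html` and `to_text` as separate stylesheets applied by an XSLT processor rather than as Python functions.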