Word documents to XML via upCast

I used upCast to convert Word documents to DocBook XML (on a Windows machine).  The Foundation has a license for the software.  These are the steps I took for the conversion:

  1. Once upCast is installed, go to Resources > templates > Word to DocBook and open the template.
    1. Documentation for upCast: http://upcast.de/iloop/assets/content/products/upcast/765b1744/doc/manual/html/index.html#N2004D
    2. Documentation is also in the Word to DocBook folder
  2. Change the Catalog (under Pipeline Settings) to ${pipeline:PipelineBase}/resources/schema/catalog
  3. Strip the title page and TOC from the Word documents. Large documents kept failing or timing out, so I had to break them in two.
  4. Choose the file to convert (even though it says RTF to DocBook v 5.0, it seemed to convert .docx files just fine).
  5. I didn't select any options
  6. Table Model: CALS
  7. DocBook structure: book > chapter > section
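
A quick way to confirm that a converted file came out in the shape chosen in steps 6 and 7 is sketched below. This is only a sanity check under assumptions: it assumes the output is namespaced DocBook 5.0, and the function name check_structure and the sample document are made up for illustration. It is not part of upCast.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"  # DocBook 5.0 namespace, in ElementTree notation


def check_structure(xml_text):
    """Return a list of problems found in a converted DocBook document.

    Checks the book > chapter > section layout and that tables use the
    CALS model (rows wrapped in a tgroup). Hypothetical helper.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != DB + "book":
        problems.append(f"root element is {root.tag}, expected book")
    chapters = root.findall(DB + "chapter")
    if not chapters:
        problems.append("no chapter elements directly under the root")
    for number, chapter in enumerate(chapters, 1):
        # A chapter without sections is legal DocBook, but worth a look here.
        if chapter.find(DB + "section") is None:
            problems.append(f"chapter {number} has no section child")
    for tag in ("table", "informaltable"):
        for table in root.iter(DB + tag):
            if table.find(DB + "tgroup") is None:
                problems.append(f"{tag} without tgroup (not a CALS table)")
    return problems


# Made-up sample standing in for a converted file.
SAMPLE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <chapter><title>One</title>
    <section><title>First</title>
      <informaltable><tgroup cols="1">
        <tbody><row><entry>x</entry></row></tbody>
      </tgroup></informaltable>
    </section>
  </chapter>
</book>"""

for problem in check_structure(SAMPLE):
    print(problem)
```

For a real file, read its contents and pass the text to check_structure; an empty list means none of these checks tripped.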

DocBook Notes

Once the documents were converted, these are the cleanup items I needed to do:

...

  • Removed TOC (table of contents) description tables under chapters
  • Removed extra TOC anchors (TOC_xxxxx...) that came from Word. Check anchors
  • Put "red arrow notes" (in KFS documents) in note elements; removed the red arrow images
  • Put tips into tip elements
  • Table titles need to have emphasis as bold
  • Changed column widths on tables to 1.0*
  • Changed <phrase role="strong"> to <emphasis role="bold">
  • Added the Kuali copyright comment, taken from the Rice XML files, to each book (kept the copyright from the 0.8 docs; still need to determine which one is needed)
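
Two of the items above are purely mechanical (the phrase-to-emphasis change and the 1.0* column widths), so they could be scripted instead of done by hand. The sketch below is one possible way to do that, not the process I actually used. It assumes namespaced DocBook 5.0 output and CALS colspec elements; the function name clean_docbook and the sample markup are made up.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"  # DocBook 5.0 namespace
ET.register_namespace("", DB_NS)         # serialize without ns0: prefixes

PHRASE = f"{{{DB_NS}}}phrase"
EMPHASIS = f"{{{DB_NS}}}emphasis"
COLSPEC = f"{{{DB_NS}}}colspec"


def clean_docbook(root):
    """Apply two mechanical cleanup steps in place (hypothetical helper)."""
    for el in root.iter():
        if el.tag == PHRASE and el.get("role") == "strong":
            # <phrase role="strong"> becomes <emphasis role="bold">
            el.tag = EMPHASIS
            el.set("role", "bold")
        elif el.tag == COLSPEC and el.get("colwidth") is not None:
            # Replace fixed widths from Word with equal proportional widths
            el.set("colwidth", "1.0*")
    return root


# Made-up fragment standing in for converter output.
SAMPLE = """<section xmlns="http://docbook.org/ns/docbook">
  <para><phrase role="strong">Note</phrase> text</para>
  <informaltable><tgroup cols="2">
    <colspec colname="c1" colwidth="2.35in"/>
    <colspec colname="c2" colwidth="4.1in"/>
    <tbody><row><entry>a</entry><entry>b</entry></row></tbody>
  </tgroup></informaltable>
</section>"""

root = clean_docbook(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

One caveat: ElementTree's default parser drops XML comments, so a script like this should run before the copyright comment is added, or use a parser that keeps comments.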

CSS Stylesheet

I started from the Rice CSS at https://svn.kuali.org/repos/rice/trunk/src/site/docbook/ and made some minor modifications (mostly removing elements that do not apply to the OLE books or that break them).

OLEdocbooks.css is my work in progress.

...

Availability

DocBook Environment Setup - Jeff Caddel has experience with checking in and working with DocBook XML in SVN.  He helped Peri hook the DocBook Maven plugin into OLE.