Converting Word to DocBook XML

Word documents to XML via upCast

I used upCast (on a Windows machine) to convert Word documents to DocBook XML.  The Kuali Foundation has a license for the software.  These are the steps I took for the conversion:

  1. Once upCast is installed, open the template: Resources > templates > Word to DocBook.
    1. Documentation for upCast: http://upcast.de/iloop/assets/content/products/upcast/765b1744/doc/manual/html/index.html#N2004D
    2. Documentation is also in the Word to DocBook folder
  2. Change the Catalog (under Pipeline Settings) to ${pipeline:PipelineBase}/resources/schema/catalog
  3. Strip the title page and TOC from the Word documents. Large documents kept failing or timing out, so I had to split them in two.
  4. Choose the file. Although the template is labeled "rtf to DocBook v 5.0", it converted .docx files just fine.
  5. I didn't select any options.
  6. Table Model: CALS
  7. DocBook structure: book > chapter > section
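
After a conversion, a quick way to see whether the output really follows the book > chapter > section structure is to print an outline of it. The following is a minimal sketch and not part of upCast; the helper names `local` and `outline` are hypothetical, and it assumes the converted file is well-formed XML with titles as direct children of each structural element.

```python
import xml.etree.ElementTree as ET

# Element names treated as structural levels (an assumption for this sketch).
STRUCTURAL = {"book", "chapter", "appendix", "section"}


def local(tag):
    """Return an element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def outline(elem, depth=0):
    """Yield (depth, element name, title text) for each structural element."""
    if local(elem.tag) in STRUCTURAL:
        title = next((c for c in elem if local(c.tag) == "title"), None)
        text = "".join(title.itertext()).strip() if title is not None else ""
        yield depth, local(elem.tag), text
        depth += 1
    for child in elem:
        yield from outline(child, depth)


def print_outline(path):
    """Print an indented outline of the DocBook file at `path`."""
    for depth, name, title in outline(ET.parse(path).getroot()):
        print("  " * depth + f"{name}: {title}")
```

Calling `print_outline("converted.xml")` (a placeholder file name) prints one indented line per book, chapter, appendix, and section, which makes misplaced levels easy to spot before fixing the hierarchy by hand.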

DocBook Notes

Once the documents were converted, I needed to make these changes:

  • Fix hierarchy:
    1. Chapters are Intro, Standard Docs (overview), Maintenance Docs (overview), Appendix
    2. Section X.X would be each individual standard doc or maintenance doc or item in the appendix.
    3. Section X.X.X are the subtitles for each document: Getting Started, Process Overview, Document Layout.
    4. Section X.X.X.X are lower-level titles for each: Tabs, Business Rules and Routing.
  • Removed the TOC (table of contents) description tables under chapters
  • Removed extra TOC anchors (TOC_xxxxx...) carried over from Word. Check the remaining anchors
  • Put the "red arrow" notes (in KFS documents) in note elements; remove the red arrow images
  • Put tips into tip elements
  • Table titles need bold emphasis
  • Removed column widths on tables
  • Changed <phrase role="strong"> to <emphasis role="bold">
  • Added the Kuali copyright comment to each book from Rice xml files

Availability

DocBook Environment Setup - Jeff Caddel has experience with checking in and working with DocBook XML in SVN.  He helped Peri hook the DocBook Maven plugin into OLE.
