Word documents to XML via upCast
I used upCast (on a Windows machine) to convert Word documents to DocBook XML. The Foundation has a license for the software. These are the steps I took for the conversion:
- Once upCast is installed: Resources > templates > Word to DocBook > open template.
- Documentation for upCast: http://upcast.de/iloop/assets/content/products/upcast/765b1744/doc/manual/html/index.html#N2004D
- Documentation is also in the Word to DocBook folder
- Change the Catalog (under Pipeline Settings) to
${pipeline:PipelineBase}/resources/schema/catalog
- Strip the title page and TOC from the Word documents. Large documents kept failing or timing out, so I had to break them in two.
- Choose the file. Even though the template says RTF to DocBook v5.0, it seemed to convert .docx files just fine.
- I didn't select any options
- Table Model: CALS
- DocBook structure: book > chapter > section
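A quick way to confirm that a converted file really came out as book > chapter > section is to walk the XML and report anything that does not match. The following is only a minimal sketch: the sample document is made up for illustration, check_structure and local_name are hypothetical helper names, and treating a chapter without sections as something to review is my own assumption, not a rule from upCast or DocBook.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def check_structure(root):
    """Return a list of deviations from a book > chapter > section layout."""
    problems = []
    if local_name(root.tag) != "book":
        problems.append(f"root element is <{local_name(root.tag)}>, expected <book>")
    chapters = [child for child in root if local_name(child.tag) == "chapter"]
    if not chapters:
        problems.append("no <chapter> elements directly under the root")
    for index, chapter in enumerate(chapters, start=1):
        # Assumption: every chapter should contain at least one section.
        if not any(local_name(child.tag) == "section" for child in chapter):
            problems.append(f"chapter {index} has no <section> children")
    return problems


# Made-up sample standing in for a converted file.
sample = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Sample</title>
  <chapter><title>One</title><section><title>A</title><para>Text</para></section></chapter>
  <chapter><title>Two</title><para>No sections here</para></chapter>
</book>"""

problems = check_structure(ET.fromstring(sample))
for problem in problems:
    print(problem)
```

For a real file, ET.parse(path).getroot() can be passed to check_structure in place of the sample string.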
DocBook Notes
Once the documents were converted, these are the cleanup items I needed to do:
...
- Removed TOC (table of contents) description tables under chapters
- Removed extra TOC anchors (TOC_xxxxx...) carried over from Word. Check the anchors afterwards.
- Put "red arrow" notes (in KFS documents) in note elements; removed the red arrow images
- Put tips into tip elements
- Table titles need bold emphasis
- Changed column widths on tables to 1.0*
- Changed <phrase role="strong"> to <emphasis role="bold">
- Added the Kuali copyright comment from the Rice XML files to each book (kept the copyright from the 0.8 docs; still need to determine which is needed)
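Some of the mechanical cleanup items above (the phrase-to-emphasis change, the column widths, and the leftover TOC anchors) could be scripted instead of done by hand. This is a rough sketch rather than the process I actually used. It assumes the converted files use the DocBook 5.0 namespace, that column widths live on colspec elements, and that the Word anchors appear as anchor elements whose xml:id starts with "TOC_"; clean_docbook and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"  # DocBook 5.0 namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ANCHOR = f"{{{DB}}}anchor"


def clean_docbook(root):
    """Apply three of the cleanup items in place on an ElementTree element."""
    # <phrase role="strong"> becomes <emphasis role="bold">.
    for phrase in list(root.iter(f"{{{DB}}}phrase")):
        if phrase.get("role") == "strong":
            phrase.tag = f"{{{DB}}}emphasis"
            phrase.set("role", "bold")

    # Every table column gets a proportional width of 1.0*.
    for colspec in root.iter(f"{{{DB}}}colspec"):
        colspec.set("colwidth", "1.0*")

    # Drop anchors whose id starts with "TOC_" (assumed form), keeping the
    # text that followed each anchor attached to its neighbour or parent.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == ANCHOR and (child.get(XML_ID) or "").startswith("TOC_"):
                tail = child.tail or ""
                if previous is not None:
                    previous.tail = (previous.tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
            else:
                previous = child


# Made-up fragment standing in for converted output.
sample = (
    '<section xmlns="http://docbook.org/ns/docbook">'
    '<para><anchor xml:id="TOC_12345"/>Use <phrase role="strong">Save</phrase> now.</para>'
    '<informaltable><tgroup cols="1"><colspec colwidth="2.5in"/>'
    '<tbody><row><entry>x</entry></row></tbody></tgroup></informaltable>'
    '</section>'
)
root = ET.fromstring(sample)
clean_docbook(root)
print(ET.tostring(root, encoding="unicode"))
```

Removing an anchor breaks any link that still points at it, so the anchors are worth checking by hand after a run. The notes, tips, and table titles still need manual editing because they depend on the content.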
CSS Stylesheet
Starting from the Rice CSS at https://svn.kuali.org/repos/rice/trunk/src/site/docbook/, I made some minor modifications (mostly removing elements that were not applicable or that broke the OLE books).
OLEdocbooks.css is my work in progress.
...
Availability
DocBook Environment Setup - Jeff Caddel has experience with checking in and working with DocBook XML in SVN. He helped Peri hook the DocBook Maven plugin into OLE.