Converting Word to DocBook XML
Word documents to XML via upCast
I used upCast (on a Windows machine) to convert Word documents to DocBook XML. The Kuali Foundation has a license for the software. These are the steps I took for the conversion:
- Once upCast is installed, open the template: Resources > templates > Word to DocBook.
- Documentation for upCast: http://upcast.de/iloop/assets/content/products/upcast/765b1744/doc/manual/html/index.html#N2004D
- Documentation is also in the Word to DocBook folder
- Change the Catalog (under Pipeline Settings) to
${pipeline:PipelineBase}/resources/schema/catalog
- Strip the title page and TOC from the Word documents. Large documents kept failing or timing out, so I had to break them in two.
- Choose the file. Even though it says RTF to DocBook v 5.0, it seemed to convert .docx files just fine.
- I didn't select any options
- Table Model: CALS
- DocBook structure: book > chapter > section
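A quick way to check that a converted file really follows the book > chapter > section structure is to print its outline. The sketch below is only an illustration, not part of upCast: the function name `outline` is made up, and it assumes the output uses the DocBook 5.0 namespace with each title as a direct child of its chapter or section.

```python
import xml.etree.ElementTree as ET

# Assumption: upCast's DocBook 5.0 output uses the standard DocBook namespace.
DB_NS = "http://docbook.org/ns/docbook"


def outline(elem, prefix=""):
    """Return (number, title) pairs for chapters and nested sections."""
    rows = []
    n = 0
    for child in elem:
        if child.tag in (f"{{{DB_NS}}}chapter", f"{{{DB_NS}}}section"):
            n += 1
            number = f"{prefix}{n}"
            title_el = child.find(f"{{{DB_NS}}}title")
            if title_el is not None:
                # itertext() also picks up text inside inline markup
                title = "".join(title_el.itertext()).strip()
            else:
                title = "(untitled)"
            rows.append((number, title))
            # Recurse so nested sections are numbered X.X, X.X.X, ...
            rows.extend(outline(child, prefix=number + "."))
    return rows


if __name__ == "__main__":
    import sys

    # Usage: python outline.py converted.xml
    for number, title in outline(ET.parse(sys.argv[1]).getroot()):
        print(number, title)
```

Each line of the printed outline shows the nesting depth as a dotted number, which makes misplaced sections easy to spot before fixing the hierarchy by hand.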
DocBook Notes
Once converted, these are the changes I needed to make:
- Fix hierarchy:
- Chapters are Intro, Standard Docs (overview), Maintenance Docs (overview), Appendix
- Section X.X would be each individual standard doc or maintenance doc or item in the appendix.
- Section X.X.X are sub-titles for each document: Getting Started, Process Overview, Document Layout.
- Section X.X.X.X are lower-level titles for each: Tabs, Business Rules and Routing.
- Removed TOC (table of contents) description tables under chapters
- Removed extra TOC anchors (TOC_xxxxx...) left over from Word. Check the remaining anchors.
- Put "red arrow" notes (in KFS documents) in note elements; remove the red arrow images
- Put tips into tip elements
- Table titles need to have emphasis as bold
- Removed column widths on tables
- Changed <phrase role="strong"> to <emphasis role="bold">
- Added the Kuali copyright comment to each book from Rice xml files
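Some of the mechanical fixes in this list could be scripted rather than done by hand. The following is a minimal sketch, not the process actually used here. It assumes the DocBook 5.0 namespace, that column widths are `colwidth` attributes on CALS `colspec` elements, and that the leftover Word anchors are `anchor` elements whose `xml:id` starts with `TOC_`. The function names and the `anchor_prefix` parameter are made up for illustration.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"  # assumed DocBook 5.0 namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded form of xml:id


def q(name):
    """Qualify a DocBook element name with the namespace."""
    return f"{{{DB_NS}}}{name}"


def _remove_keep_tail(parent, child):
    """Remove child but keep the text that followed it."""
    tail = child.tail or ""
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def clean_docbook(root, anchor_prefix="TOC_"):
    # 1. <phrase role="strong"> becomes <emphasis role="bold">
    for el in list(root.iter(q("phrase"))):
        if el.get("role") == "strong":
            el.tag = q("emphasis")
            el.set("role", "bold")
    # 2. Drop column widths from CALS tables
    for col in root.iter(q("colspec")):
        col.attrib.pop("colwidth", None)
    # 3. Remove leftover TOC anchors (prefix is an assumption; adjust to
    #    whatever the converted files actually contain)
    for parent in list(root.iter()):
        for child in list(parent):
            is_anchor = child.tag == q("anchor")
            if is_anchor and child.get(XML_ID, "").startswith(anchor_prefix):
                _remove_keep_tail(parent, child)
    return root
```

To write the result back out without `ns0:` prefixes, call `ET.register_namespace("", DB_NS)` before `tree.write(...)`. Note that ElementTree's default parser discards XML comments, so a script like this would need to run before the copyright comment is added.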
Availability
DocBook Environment Setup - Jeff Caddel has experience checking in and working with DocBook XML in SVN. He helped Peri hook the DocBook Maven plugin into OLE.
Operated as a Community Resource by the Open Library Foundation