Word documents to XML via upCast
Following Rice's example, I used upCast (on a Windows machine) to convert Word documents to DocBook XML. The Foundation has a license for the software. These are the steps I took for the conversion:
- Once upCast is installed: Resources > templates > Word to DocBook > open template.
- Documentation for upCast: http://upcast.de/iloop/assets/content/products/upcast/765b1744/doc/manual/html/index.html#N2004D
- Documentation is also in the Word to DocBook folder
- Change the Catalog (under Pipeline Settings) to
${pipeline:PipelineBase}/resources/schema/catalog
- Strip the title page and TOC from the Word documents. Large documents kept failing or timing out, so I had to break them in two.
- Choose the file (even though it says rtf to DocBook v 5.0 it seemed to convert .docx files just fine).
- I didn't select any options
- Table Model: CALS
- DocBook structure: book > chapter > section
DocBook Notes
Once the documents were converted, these are the items I needed to fix:
- Fix hierarchy:
- Chapters are Intro, Standard Docs (overview), Maintenance Docs (overview), Appendix
- Section X.X would be each individual standard doc or maintenance doc or item in the appendix.
- Section X.X.X are sub-titles for each document: Getting Started, Process Overview, Document Layout,
- Section X.X.X.X are lower-level titles for each: Tabs, Business Rules and Routing.
...
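As a rough illustration of the hierarchy described above, the nesting could look like the DocBook 5.0 skeleton below. This is only a sketch: the book title and the "Example Document" title are placeholders, and putting "Tabs" under "Document Layout" is an assumption made for the example, not something the converted files are guaranteed to contain.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only. Titles marked "placeholder" are hypothetical. -->
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Documentation</title> <!-- placeholder book title -->
  <chapter>
    <title>Standard Docs</title>
    <para>Overview text for the chapter.</para>
    <!-- Section X.X: one individual standard doc -->
    <section>
      <title>Example Document</title> <!-- placeholder document name -->
      <!-- Section X.X.X: sub-title within the document -->
      <section>
        <title>Document Layout</title>
        <!-- Section X.X.X.X: lower-level title (placement assumed) -->
        <section>
          <title>Tabs</title>
          <para>Content for this topic.</para>
        </section>
      </section>
    </section>
  </chapter>
</book>
```

The Intro, Maintenance Docs, and Appendix chapters would follow the same pattern as sibling chapter elements inside the book.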
UPDATE: We are able to use the Rice stylesheet. Jeff and Peri set up the documentation to be built through Maven, very similar to Rice.
Availability
Can we follow this format: DocBook Environment Setup? - Jeff Caddel has experience with checking in and working with DocBook XML in SVN. He helped Peri hook the DocBook output Maven plugin into OLE.