XML Workflow for Publishers, Part - 303, 2000
By Dr. Brijesh Kumar, Digital Media Initiatives

Publishers are confronted with three situations -
There are different approaches to moving backlist and frontlist titles to an XML-based workflow. We shall not deal with how content is extracted and converted into a processable document. Instead, we shall focus on the features of a document that a publisher must generate in the system in order to benefit from XML going forward.

One of the simplest ways to structure published content is XHTML (eXtensible HyperText Markup Language) [http://www.w3.org/TR/xhtml1/], which is a reformulation of HTML 4 in XML. XHTML has the same depth of expression as HTML, but also conforms to XML syntax. [1] We presume that a publisher will use in-house resources to convert print titles to image PDFs, save the print PDFs as HTML, and then tidy that HTML so that it can be processed through the XML workflow.

CONVERT FROM WORD TO XHTML

We shall now study the entire workflow of converting a manuscript available as an MS Word file into a processable XHTML file:
Once the above exercise is done, clean HTML markup is obtained. This markup can then be checked for well-formedness in order to convert the HTML into XHTML, which is characterised by the following properties [3]:
XHTML stands for EXtensible HyperText Markup Language
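The well-formedness check mentioned above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in this workflow: the helper name `is_well_formed` and both sample strings are assumptions made up for the example, and the check relies on the standard library XML parser, which tests XML well-formedness only and does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Markup a browser would accept but an XML parser rejects:
# unquoted attribute value, unclosed <p> and <br>.
loose_html = "<html><body><p align=center>First line<br>Second line</body></html>"

# The same content rewritten with quoted attributes and every element closed.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head>"
    '<body><p align="center">First line<br />Second line</p></body>'
    "</html>"
)

print(is_well_formed(loose_html))
print(is_well_formed(xhtml))
```

The first sample fails the check and the second passes. Note that a parser used this way does not load the XHTML DTD, so named character entities such as `&nbsp;` are reported as errors; numeric character references avoid that problem.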
Once we have obtained a clean and stricter version as XHTML, we are ready to progress to the XML pipeline, which we shall discuss in subsequent posts.

---------
[1] http://en.wikipedia.org/wiki/XHTML
[2] http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_re...
[3] http://www.w3schools.com/xhtml/xhtml_intro.asp