source: icXML/icXML-devel/src/xercesc/dom/DOMLSSerializer.hpp @ 2722

Last change on this file since 2722 was 2722, checked in by cameron, 6 years ago

Original Xerces files with import mods for icxercesc

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Id: DOMLSSerializer.hpp 883665 2009-11-24 11:41:38Z borisk $
 */

#if !defined(XERCESC_INCLUDE_GUARD_DOMLSSERIALIZER_HPP)
#define XERCESC_INCLUDE_GUARD_DOMLSSERIALIZER_HPP

/**
 *
 * DOMLSSerializer provides an API for serializing (writing) a DOM document out as
 * an XML document. The XML data is written to an output stream, the type of
 * which depends on the specific language bindings in use. During
 * serialization of XML data, namespace fixup is done when possible.
 * <p> <code>DOMLSSerializer</code> accepts any node type for serialization. For
 * nodes of type <code>Document</code> or <code>Entity</code>, well formed
 * XML will be created if possible. The serialized output for these node
 * types is either a Document or an External Entity, respectively, and is
 * acceptable input for an XML parser. For all other types of nodes the
 * serialized form is not specified, but should be something useful to a
 * human for debugging or diagnostic purposes. Note: rigorously designing an
 * external (source) form for stand-alone node types that don't already have
 * one defined seems a bit much to take on here.
 * <p>Within a Document or Entity being serialized, Nodes are processed as
 * follows: Documents are written including an XML declaration and a DTD
 * subset, if one exists in the DOM. Writing a document node serializes the
 * entire document.  Entity nodes, when written directly by
 * <code>write</code> defined in the <code>DOMLSSerializer</code> interface,
 * output the entity expansion but no namespace fixup is done. The resulting
 * output will be valid as an external entity.  Entity reference nodes are
 * serialized as an entity reference of the form
 * <code>"&amp;entityName;"</code> in the output. Child nodes (the
 * expansion) of the entity reference are ignored.  CDATA sections
 * containing content characters that can not be represented in the
 * specified output encoding are handled according to the
 * "split-cdata-sections" feature. If the feature is <code>true</code>, CDATA
 * sections are split, and the unrepresentable characters are serialized as
 * numeric character references in ordinary content. The exact position and
 * number of splits is not specified. If the feature is <code>false</code>,
 * unrepresentable characters in a CDATA section are reported as errors. The
 * error is not recoverable - there is no mechanism for supplying
 * alternative characters and continuing with the serialization. All other
 * node types (DOMElement, DOMText, etc.) are serialized to their corresponding
 * XML source form.
 * <p> Within the character data of a document (outside of markup), any
 * characters that cannot be represented directly are replaced with
 * character references. Occurrences of '&lt;' and '&amp;' are replaced by
 * the predefined entities &amp;lt; and &amp;amp;. The other predefined
 * entities (&amp;gt;, &amp;apos;, etc.) are not used; these characters can be
 * included directly. Any character that can not be represented directly in
 * the output character encoding is serialized as a numeric character
 * reference.
 * <p> Attributes not containing quotes are serialized in quotes. Attributes
 * containing quotes but no apostrophes are serialized in apostrophes
 * (single quotes). Attributes containing both forms of quotes are
 * serialized in quotes, with quotes within the value represented by the
 * predefined entity &amp;quot;. Any character that can not be represented
 * directly in the output character encoding is serialized as a numeric
 * character reference.
 * <p> Within markup, but outside of attributes, any occurrence of a character
 * that cannot be represented in the output character encoding is reported
 * as an error. An example would be serializing the element
 * &lt;LaCa&#xF1;ada/&gt; with encoding="us-ascii".
 * <p> When requested by setting the <code>normalize-characters</code> feature
 * on <code>DOMLSSerializer</code>, all data to be serialized, both markup and
 * character data, is W3C Text normalized. The W3C Text normalization process
 * affects only the data as it is being
 * written; it does not alter the DOM's view of the document after
 * serialization has completed.
 * <p>Namespaces are fixed up during serialization: the serialization process
 * will verify that namespace declarations, namespace prefixes and the
 * namespace URIs associated with Elements and Attributes are consistent. If
 * inconsistencies are found, the serialized form of the document will be
 * altered to remove them. The algorithm used for doing the namespace fixup
 * while serializing a document is a combination of the algorithms used for
 * lookupNamespaceURI and lookupPrefix. Previous paragraph to be
 * defined closer here.
 * <p>Any changes made affect only the namespace prefixes and declarations
 * appearing in the serialized data. The DOM's view of the document is not
 * altered by the serialization operation, and does not reflect any changes
 * made to namespace declarations or prefixes in the serialized output.
 * <p> While serializing a document the serializer will write out
 * non-specified values (such as attributes whose <code>specified</code> is
 * <code>false</code>) if the <code>output-default-values</code> feature is
 * set to <code>true</code>. If the <code>output-default-values</code> flag
 * is set to <code>false</code> and the <code>use-abstract-schema</code>
 * feature is set to <code>true</code> the abstract schema will be used to
 * determine if a value is specified or not; if
 * <code>use-abstract-schema</code> is not set the <code>specified</code>
 * flag on attribute nodes is used to determine if attribute values should
 * be written out.
 * <p> Ref to Core spec (1.1.9, XML namespaces, 5th paragraph) entity ref
 * description about warning about unbound entity refs. Entity refs are
 * always serialized as &amp;foo;, also mention this in the load part of
 * this spec.
 * <p> When serializing a document the DOMLSSerializer checks to see if the document
 * element in the document is a DOM Level 1 element or a DOM Level 2 (or
 * higher) element (this check is done by looking at the localName of the
 * root element). If the root element is a DOM Level 1 element then the
 * DOMLSSerializer will issue an error if a DOM Level 2 (or higher) element is
 * found while serializing. Likewise if the document element is a DOM Level
 * 2 (or higher) element and the DOMLSSerializer sees a DOM Level 1 element an
 * error is issued. Mixing DOM Level 1 elements with DOM Level 2 (or higher)
 * is not supported.
 * <p> <code>DOMLSSerializer</code>s have a number of named features that can be
 * queried or set. The name of <code>DOMLSSerializer</code> features must be valid
 * XML names. Implementation specific features (extensions) should choose an
 * implementation dependent prefix to avoid name collisions.
 * <p>Here is a list of properties that must be recognized by all
 * implementations.
 * <dl>
 * <dt><code>"normalize-characters"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[
 * optional] (default) Perform the W3C Text Normalization of the characters
 * in document as they are written out. Only the characters being written
 * are (potentially) altered. The DOM document itself is unchanged. </dd>
 * <dt>
 * <code>false</code></dt>
 * <dd>[required] do not perform character normalization. </dd>
 * </dl></dd>
 * <dt>
 * <code>"split-cdata-sections"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[required] (default)
 * Split CDATA sections containing the CDATA section termination marker
 * ']]&gt;' or characters that can not be represented in the output
 * encoding, and output the characters using numeric character references.
 * If a CDATA section is split a warning is issued. </dd>
 * <dt><code>false</code></dt>
 * <dd>[
 * required] Signal an error if a <code>CDATASection</code> contains an
 * unrepresentable character. </dd>
 * </dl></dd>
 * <dt><code>"validation"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[
 * optional] Use the abstract schema to validate the document as it is being
 * serialized. If validation errors are found the error handler is notified
 * about the error. Setting this state will also set the feature
 * <code>use-abstract-schema</code> to <code>true</code>. </dd>
 * <dt><code>false</code></dt>
 * <dd>[
 * required] (default) Don't validate the document as it is being
 * serialized. </dd>
 * </dl></dd>
 * <dt><code>"expand-entity-references"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[
 * optional] Expand <code>EntityReference</code> nodes when serializing. </dd>
 * <dt>
 * <code>false</code></dt>
 * <dd>[required] (default) Serialize all
 * <code>EntityReference</code> nodes as XML entity references. </dd>
 * </dl></dd>
 * <dt>
 * <code>"whitespace-in-element-content"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[required] (
 * default) Output all white spaces in the document. </dd>
 * <dt><code>false</code></dt>
 * <dd>[
 * optional] Only output white space that is not within element content. The
 * implementation is expected to use the
 * <code>isWhitespaceInElementContent</code> flag on <code>Text</code> nodes
 * to determine if a text node should be written out or not. </dd>
 * </dl></dd>
 * <dt>
 * <code>"discard-default-content"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[required] (default
 * ) Use whatever information is available to the implementation (e.g. XML
 * schema, DTD, the <code>specified</code> flag on <code>Attr</code> nodes,
 * and so on) to decide what attributes and content should be serialized or
 * not. Note that the <code>specified</code> flag on <code>Attr</code> nodes
 * in itself is not always reliable; it is only reliable when it is set to
 * <code>false</code> since the only case where it can be set to
 * <code>false</code> is if the attribute was created by a Level 1
 * implementation. </dd>
 * <dt><code>false</code></dt>
 * <dd>[required] Output all attributes and
 * all content. </dd>
 * </dl></dd>
 * <dt><code>"format-canonical"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[optional]
 * This formatting writes the document according to the rules specified in
 * [Canonical XML]. Setting this feature to true will set the feature
 * "format-pretty-print" to false. </dd>
 * <dt><code>false</code></dt>
 * <dd>[required] (default) Don't canonicalize the
 * output. </dd>
 * </dl></dd>
 * <dt><code>"format-pretty-print"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[optional]
 * Format the output by adding whitespace to produce a pretty-printed,
 * indented, human-readable form. The exact form of the transformations is
 * not specified by this specification. Setting this feature to true will
 * set the feature "format-canonical" to false. </dd>
 * <dt><code>false</code></dt>
 * <dd>[required]
 * (default) Don't pretty-print the result. </dd>
 * </dl></dd>
 * <dt><code>"http://apache.org/xml/features/dom/byte-order-mark"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[optional]
 * Output the correct BOM for the specified encoding. </dd>
 * <dt><code>false</code></dt>
 * <dd>[required]
 * (default) Don't generate a BOM. </dd>
 * </dl></dd>
 * <dt><code>"http://apache.org/xml/features/pretty-print/space-first-level-elements"</code></dt>
 * <dd>
 * <dl>
 * <dt><code>true</code></dt>
 * <dd>[optional]
 * (default) Add an extra line feed between the elements
 * that are children of the document root. </dd>
 * <dt><code>false</code></dt>
 * <dd>[required]
 * Don't add the extra line feed. </dd>
 * </dl></dd>
 * </dl>
 * <p>See also the <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>Document Object Model (DOM) Level 3 Load and Save Specification</a>.
 *
 * @since DOM Level 3
 */


#include <xercesc/dom/DOMNode.hpp>
#include <xercesc/dom/DOMLSSerializerFilter.hpp>
#include <xercesc/dom/DOMErrorHandler.hpp>
#include <xercesc/dom/DOMConfiguration.hpp>

XERCES_CPP_NAMESPACE_BEGIN

class DOMLSOutput;

class CDOM_EXPORT DOMLSSerializer
{
protected :
    // -----------------------------------------------------------------------
    //  Hidden constructors
    // -----------------------------------------------------------------------
    /** @name Hidden constructors */
    //@{
    DOMLSSerializer() {};
    //@}
private:
    // -----------------------------------------------------------------------
    // Unimplemented constructors and operators
    // -----------------------------------------------------------------------
    /** @name Unimplemented constructors and operators */
    //@{
    DOMLSSerializer(const DOMLSSerializer &);
    DOMLSSerializer & operator = (const DOMLSSerializer &);
    //@}


public:
    // -----------------------------------------------------------------------
    //  All constructors are hidden, just the destructor is available
    // -----------------------------------------------------------------------
    /** @name Destructor */
    //@{
    /**
     * Destructor
     *
     */
    virtual ~DOMLSSerializer() {};
    //@}

    // -----------------------------------------------------------------------
    //  Virtual DOMLSSerializer interface
    // -----------------------------------------------------------------------
    /** @name Functions introduced in DOM Level 3 */
    //@{
    // -----------------------------------------------------------------------
    //  Feature methods
    // -----------------------------------------------------------------------
    /**
      * The DOMConfiguration object used by the LSSerializer when serializing a DOM node.
      *
      * In addition to the parameters recognized on the <code>DOMConfiguration</code>
      * interface defined in [DOM Level 3 Core], the <code>DOMConfiguration</code> objects
      * for <code>DOMLSSerializer</code> add or modify the following parameters:
      *
      * "canonical-form"
      *     true [optional]
      *         Writes the document according to the rules specified in [Canonical XML]. In addition to
      *         the behavior described in "canonical-form" [DOM Level 3 Core], setting this parameter to
      *         true will set the parameters "format-pretty-print", "discard-default-content", and
      *         "xml-declaration", to false. Setting one of those parameters to true will set this
      *         parameter to false. Serializing an XML 1.1 document when "canonical-form" is true will
      *         generate a fatal error.
      *     false [required] (default)
      *         Do not canonicalize the output.
      *
      * "discard-default-content"
      *     true [required] (default)
      *         Use the DOMAttr::getSpecified attribute to decide what attributes should be discarded.
      *         Note that some implementations might use whatever information is available to the implementation
      *         (e.g. XML schema, DTD, the DOMAttr::getSpecified attribute, and so on) to determine what
      *         attributes and content to discard if this parameter is set to true.
      *     false [required]
      *         Keep all attributes and all content.
      *
      * "format-pretty-print"
      *     true [optional]
      *         Format the output by adding whitespace to produce a pretty-printed, indented,
      *         human-readable form. The exact form of the transformations is not specified by this specification.
      *         Pretty-printing changes the content of the document and may affect the validity of the document;
      *         validating implementations should preserve validity.
      *     false [required] (default)
      *         Don't pretty-print the result.
      *
      * "ignore-unknown-character-denormalizations"
      *     true [required] (default)
      *         If, while verifying full normalization when [XML 1.1] is supported, a character is encountered
      *         for which the normalization properties cannot be determined, then raise an "unknown-character-denormalization"
      *         warning (instead of raising an error, if this parameter is not set) and ignore any possible
      *         denormalizations caused by these characters.
      *     false [optional]
      *         Report a fatal error if a character is encountered for which the processor cannot determine the
      *         normalization properties.
      *
      * "normalize-characters"
      *     This parameter is equivalent to the one defined by <code>DOMConfiguration</code> in [DOM Level 3 Core].
      *     Unlike in the Core, the default value for this parameter is true. While DOM implementations are not
      *     required to support fully normalizing the characters in the document according to appendix E of [XML 1.1],
      *     this parameter must be activated by default if supported.
      *
      * "xml-declaration"
      *     true [required] (default)
      *         If a DOMDocument, DOMElement, or DOMEntity node is serialized, the XML declaration, or text declaration,
      *         should be included. The version (DOMDocument::xmlVersion if the document is a Level 3 document and the
      *         version is non-null, otherwise use the value "1.0"), and the output encoding (see DOMLSSerializer::write
      *         for details on how to find the output encoding) are specified in the serialized XML declaration.
      *     false [required]
      *         Do not serialize the XML and text declarations. Report an "xml-declaration-needed" warning if this will
      *         cause problems (i.e. the serialized data is of an XML version other than [XML 1.0], or an encoding would
      *         be needed to be able to re-parse the serialized data).
      *
      * "error-handler"
      *     Contains a DOMErrorHandler object. If an error is encountered in the document, the implementation will call back
      *     the DOMErrorHandler registered using this parameter. The implementation may provide a default DOMErrorHandler
      *     object. When called, DOMError::relatedData will contain the closest node to where the error occurred.
      *     If the implementation is unable to determine the node where the error occurs, DOMError::relatedData will contain
      *     the DOMDocument node. Mutations to the document from within an error handler will result in implementation
      *     dependent behavior.
      *
      * @return The pointer to the configuration object.
      * @since DOM Level 3
      */
    virtual DOMConfiguration* getDomConfig() = 0;

    // -----------------------------------------------------------------------
    //  Setter methods
    // -----------------------------------------------------------------------
    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. The only permitted values are these:
     * <dl>
     * <dt><code>null</code></dt>
     * <dd>
     * Use a default end-of-line sequence. DOM implementations should choose
     * the default to match the usual convention for text files in the
     * environment being used. Implementations must choose a default
     * sequence that matches one of those allowed by [XML 1.0], section 2.11
     * "End-of-Line Handling". However, Xerces-C++ always uses LF when this
     * property is set to <code>null</code> since otherwise automatic
     * translation of LF to CR-LF on Windows for text files would
     * result in such files containing CR-CR-LF. If you need Windows-style
     * end-of-line sequences in your output, consider writing to a file
     * opened in text mode or explicitly setting this property to CR-LF.</dd>
     * <dt>CR</dt>
     * <dd>The carriage-return character (\#xD).</dd>
     * <dt>CR-LF</dt>
     * <dd> The
     * carriage-return and line-feed characters (\#xD \#xA). </dd>
     * <dt>LF</dt>
     * <dd> The line-feed
     * character (\#xA). </dd>
     * </dl>
     * <br>The default value for this attribute is <code>null</code>.
     *
     * @param newLine      The end-of-line sequence of characters to be used.
     * @see   getNewLine
     * @since DOM Level 3
     */
    virtual void          setNewLine(const XMLCh* const newLine) = 0;

    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. Attribute nodes are never
     * passed to the filter. The filter implementation can choose to remove
     * the node from the stream or to terminate the serialization early.
     *
     * @param filter       The serializer filter to be used.
     * @see   getFilter
     * @since DOM Level 3
     */
    virtual void         setFilter(DOMLSSerializerFilter *filter) = 0;

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    /**
     * Return the end-of-line sequence of characters to be used in the XML being
     * written out.
     *
     * @return             The end-of-line sequence of characters to be used.
     * @see   setNewLine
     * @since DOM Level 3
     */
    virtual const XMLCh*       getNewLine() const = 0;

    /**
     * Return the serializer filter in use.
     *
     * @return             The serializer filter in use.
     * @see   setFilter
     * @since DOM Level 3
     */
    virtual DOMLSSerializerFilter*   getFilter() const = 0;

    // -----------------------------------------------------------------------
    //  Write methods
    // -----------------------------------------------------------------------
    /**
     * Write out the specified node as described above in the description of
     * <code>DOMLSSerializer</code>. Writing a Document or Entity node produces a
     * serialized form that is well formed XML. Writing other node types
     * produces a fragment of text in a form that is not fully defined by
     * this document, but that should be useful to a human for debugging or
     * diagnostic purposes.
     *
     * @param nodeToWrite The <code>Document</code> or <code>Entity</code> node to
     *   be written. For other node types, something sensible should be
     *   written, but the exact serialized form is not specified.
     * @param destination The destination for the data to be written.
     * @return  Returns <code>true</code> if <code>nodeToWrite</code> was
     *   successfully serialized and <code>false</code> in case a failure
     *   occurred and the failure wasn't canceled by the error handler.
     * @since DOM Level 3
     */
    virtual bool       write(const DOMNode*         nodeToWrite,
                             DOMLSOutput* const destination) = 0;

    /**
     * Write out the specified node as described above in the description of
     * <code>DOMLSSerializer</code>. Writing a Document or Entity node produces a
     * serialized form that is well formed XML. Writing other node types
     * produces a fragment of text in a form that is not fully defined by
     * this document, but that should be useful to a human for debugging or
     * diagnostic purposes.
     *
     * @param nodeToWrite The <code>Document</code> or <code>Entity</code> node to
     *   be written. For other node types, something sensible should be
     *   written, but the exact serialized form is not specified.
     * @param uri The URI to which the data is to be written.
     * @return  Returns <code>true</code> if <code>nodeToWrite</code> was
     *   successfully serialized and <code>false</code> in case a failure
     *   occurred and the failure wasn't canceled by the error handler.
     * @since DOM Level 3
     */
    virtual bool       writeToURI(const DOMNode*    nodeToWrite,
                                  const XMLCh*      uri) = 0;

    /**
     * Serialize the specified node as described above in the description of
     * <code>DOMLSSerializer</code>. The result of serializing the node is
     * returned as a string. Writing a Document or Entity node produces a
     * serialized form that is well formed XML. Writing other node types
     * produces a fragment of text in a form that is not fully defined by
     * this document, but that should be useful to a human for debugging or
     * diagnostic purposes.
     *
     * @param nodeToWrite  The node to be written.
     * @param manager  The memory manager to be used to allocate the result string.
     *   If NULL is used, the memory manager used to construct the serializer will
     *   be used.
     * @return  Returns the serialized data, or <code>null</code> in case a
     *   failure occurred and the failure wasn't canceled by the error
     *   handler.   The returned string is always in UTF-16.
     *   The encoding information available in DOMLSSerializer is ignored in writeToString().
     * @since DOM Level 3
     */
    virtual XMLCh*     writeToString(const DOMNode* nodeToWrite, MemoryManager* manager = NULL) = 0;

    //@}

    // -----------------------------------------------------------------------
    //  Non-standard Extension
    // -----------------------------------------------------------------------
    /** @name Non-standard Extension */
    //@{
    /**
     * Called to indicate that this serializer is no longer in use
     * and that the implementation may relinquish any resources associated with it.
     *
     * Access to a released object will lead to unexpected results.
     */
    virtual void              release() = 0;
    //@}


};

XERCES_CPP_NAMESPACE_END

#endif
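
The class comment above describes two escaping rules: in character data only '<' and '&' are replaced by predefined entities, and the delimiter of an attribute value is chosen from the quotes the value contains. The following is a minimal standalone sketch of those two rules, not the Xerces-C++ implementation. The helper names `escapeCharData` and `quoteAttrValue` are hypothetical, and the sketch deliberately ignores output encodings, numeric character references, and the escaping of '<' and '&' inside attribute values.

```cpp
#include <string>

// Hypothetical helper (not part of Xerces-C++): escape character data using
// only the two predefined entities that the class comment says are used.
std::string escapeCharData(const std::string& text)
{
    std::string out;
    for (char c : text) {
        if (c == '<')      out += "&lt;";
        else if (c == '&') out += "&amp;";
        else               out += c;   // '>', '\'' and '"' are written directly
    }
    return out;
}

// Hypothetical helper (not part of Xerces-C++): pick the delimiter for an
// attribute value following the three cases in the class comment.
std::string quoteAttrValue(const std::string& value)
{
    const bool hasQuote = value.find('"')  != std::string::npos;
    const bool hasApos  = value.find('\'') != std::string::npos;

    // Case 1: no double quotes in the value, so delimit with double quotes.
    if (!hasQuote) return "\"" + value + "\"";

    // Case 2: double quotes but no apostrophes, so delimit with apostrophes.
    if (!hasApos) return "'" + value + "'";

    // Case 3: both kinds present, so delimit with double quotes and
    // represent each embedded double quote as &quot;.
    std::string out = "\"";
    for (char c : value) {
        if (c == '"') out += "&quot;";
        else          out += c;
    }
    out += '"';
    return out;
}
```

Under these assumptions, `quoteAttrValue("say \"hi\"")` yields `'say "hi"'`, while a value containing both an apostrophe and a double quote is wrapped in double quotes with each inner double quote written as `&quot;`.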