v4.1.0

@rgrove rgrove released this 05 Feb 00:25
6629d4a

Added

  • Added a new includeOffsets parser option. #25

    When true, the starting and ending byte offsets of each node in the input string will be made available via start and end properties on the node. The default is false.

    This option is useful if you want to preserve the original source text of each node when later serializing a document back to XML. Previously, the parser kept no record of where each node came from in the input, so the original source text was lost once a document had been parsed.

    const { parseXml } = require('@rgrove/parse-xml');
    
    let xml = '<root><child /></root>';
    let doc = parseXml(xml, { includeOffsets: true });
    
    console.log(doc.root.toJSON());
    // => { type: 'element', name: 'root', start: 0, end: 22, ... }
    
    console.log(doc.root.children[0].toJSON());
    // => { type: 'element', name: 'child', start: 6, end: 15, ... }
  • Added a new preserveXmlDeclaration parser option. #31

    When true, an XmlDeclaration node representing the XML declaration (if there is one) will be included in the parsed document. When false, the XML declaration will be discarded. The default is false, which matches the behavior of previous versions.

    This option is useful if you want to preserve the XML declaration when later serializing a document back to XML. Previously, the XML declaration was always discarded, so parsing a document and then serializing it would lose the original declaration.

    const { parseXml } = require('@rgrove/parse-xml');
    
    let xml = '<?xml version="1.0" encoding="UTF-8"?><root />';
    let doc = parseXml(xml, { preserveXmlDeclaration: true });
    
    console.log(doc.children[0].toJSON());
    // => { type: 'xmldecl', version: '1.0', encoding: 'UTF-8' }
  • Added a new preserveDocumentType parser option. #32

    When true, an XmlDocumentType node representing a document type declaration (if there is one) will be included in the parsed document. When false, any document type declaration encountered will be discarded. The default is false, which matches the behavior of previous versions.

    Note that the parser only includes the document type declaration in the node tree; it doesn't actually validate the document against the DTD, load external DTDs, or resolve custom entity references.

    This option is useful if you want to preserve the document type declaration when later serializing a document back to XML. Previously, the document type declaration was always discarded, so parsing a document and then serializing it would lose the original declaration.

    const { parseXml } = require('@rgrove/parse-xml');
    
    let xml = '<!DOCTYPE root SYSTEM "root.dtd"><root />';
    let doc = parseXml(xml, { preserveDocumentType: true });
    
    console.log(doc.children[0].toJSON());
    // => { type: 'doctype', name: 'root', systemId: 'root.dtd' }
    
    xml = '<!DOCTYPE kittens [<!ELEMENT kittens (#PCDATA)>]><kittens />';
    doc = parseXml(xml, { preserveDocumentType: true });
    
    console.log(doc.children[0].toJSON());
    // => {
    //   type: 'doctype',
    //   name: 'kittens',
    //   internalSubset: '<!ELEMENT kittens (#PCDATA)>'
    // }
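
As a rough sketch of how the offsets from includeOffsets might be used to recover a node's original source text: the helper below, sourceTextOf, is hypothetical and not part of parse-xml. It assumes start and end can be used directly as string indices, which holds for the ASCII input in the includeOffsets example above. The node objects are hand-written stand-ins carrying the offset values shown in that example, so the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of parse-xml): returns the slice of the
// original input covered by a node that was parsed with `includeOffsets: true`.
// Assumes `start` and `end` are usable as string indices (true for ASCII input).
function sourceTextOf(xml, node) {
  if (typeof node.start !== 'number' || typeof node.end !== 'number') {
    throw new TypeError('node has no offsets; was includeOffsets enabled?');
  }
  return xml.slice(node.start, node.end);
}

const xml = '<root><child /></root>';

// Stand-ins for parsed nodes, using the offsets from the example above.
const rootNode = { type: 'element', name: 'root', start: 0, end: 22 };
const childNode = { type: 'element', name: 'child', start: 6, end: 15 };

console.log(sourceTextOf(xml, childNode));        // '<child />'
console.log(sourceTextOf(xml, rootNode) === xml); // true
```

With the real library, the same helper would presumably be called with nodes returned by parseXml(xml, { includeOffsets: true }) instead of the stand-in objects.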

Changed

  • Errors thrown by the parser are now instances of a new XmlError class, which extends Error. These errors still have all the same properties as before, but now with improved type definitions. #27
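
Because XmlError extends Error, calling code can tell parser failures apart from other exceptions with instanceof. The sketch below uses a stand-in XmlError class so that it runs on its own; it assumes the real class can be imported from the package, which these notes do not state. The classifyError helper is hypothetical.

```javascript
// Stand-in for the library's XmlError class so this snippet is self-contained.
// Assumption: in real code the class would be imported from '@rgrove/parse-xml'.
class XmlError extends Error {
  constructor(message) {
    super(message);
    this.name = 'XmlError';
  }
}

// Hypothetical helper: classify a value caught around a parse call.
function classifyError(err) {
  if (err instanceof XmlError) return 'parse-error';
  if (err instanceof Error) return 'other-error';
  return 'non-error';
}

console.log(classifyError(new XmlError('Unclosed tag'))); // 'parse-error'
console.log(classifyError(new TypeError('bad input')));   // 'other-error'
console.log(new XmlError('x') instanceof Error);          // true
```

Existing catch blocks that only check for Error keep working, since every XmlError is still an Error.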

Fixed

  • Leading and trailing whitespace in comment content is no longer trimmed. This issue only affected parsing when the preserveComments parser option was enabled. #28

  • Text content following a CDATA section is no longer appended to the preceding XmlCdata node. This issue only affected parsing when the preserveCdata parser option was enabled. #29