Saxe

Light-weight and efficient SAX-style XML parser for JavaScript.

Goals

  • Full XML 1.0 and Namespaces in XML 1.0 standard conformance
  • Simple and terse API
  • Reduced code footprint
  • Set a base for other standards built on XML (e.g. XHTML)

Non-Goals

  • XML DTD validation
  • Full DOM implementation
  • Syntax error tolerance
  • Source code analysis or LSP features

XML 1.1 and Namespaces in XML 1.1

XML 1.1 and Namespaces in XML 1.1 are not supported. Documents declaring version 1.1 are parsed as XML 1.0, so features exclusive to version 1.1 are not recognized.

Modern UTF-8 web content is exclusively XML 1.0, which makes XML 1.1 and its namespaces mostly irrelevant.

XML 1.1 is used almost exclusively in legacy or specialized contexts where its niche features and better EBCDIC support might be useful. See XML - Wikipedia § Versions 1.0 and 1.1.

Example

import {SaxParser} from "@federicocarboni/saxe";

const parser = new SaxParser({
  startTag(name, attributes) {
    // Start tag: example
    // Start tag: empty-tag [attr, value]
    console.log("Start tag:", name, ...attributes);
  },
  endTag(name) {
    // End tag: example
    // End tag: empty-tag
    console.log("End tag:", name);
  },
  text(content) {
    // Text: Hello, world!
    console.log("Text:", content);
  },
});
parser.parse("<example>Hello, world!", {stream: true});
parser.parse(`<empty-tag attr="value" />`, {stream: true});
parser.parse("</example>");
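Because the handler is a plain object, it can carry its own state. The following is a minimal sketch of a handler that builds a simple object tree from the events shown above. `createTreeBuilder` and the node shape are hypothetical names introduced here, not part of the package, and the sketch assumes `attributes` iterates as `[name, value]` pairs, as the spread in the example suggests. To keep the sketch self-contained, the events are replayed by hand instead of coming from `SaxParser`.

```javascript
// Hypothetical helper (not part of saxe): builds a plain object tree
// from startTag / endTag / text events.
function createTreeBuilder() {
  const root = {name: "#document", attributes: {}, children: []};
  const stack = [root];
  return {
    root,
    handler: {
      startTag(name, attributes) {
        // Assumption: `attributes` is an iterable of [name, value] pairs.
        const element = {
          name,
          attributes: Object.fromEntries(attributes),
          children: [],
        };
        stack[stack.length - 1].children.push(element);
        stack.push(element);
      },
      endTag() {
        stack.pop();
      },
      text(content) {
        stack[stack.length - 1].children.push(content);
      },
    },
  };
}

// Replay by hand the events that the example above logs.
const {root, handler} = createTreeBuilder();
handler.startTag("example", new Map());
handler.text("Hello, world!");
handler.startTag("empty-tag", new Map([["attr", "value"]]));
handler.endTag("empty-tag");
handler.endTag("example");

console.log(JSON.stringify(root.children[0], null, 2));
```

With the real parser, the same `handler` object would presumably be passed to `new SaxParser(handler)` in place of the manual calls.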

Runtime Support

  • Basic XML parsing: any ES2017 runtime. For older runtimes, transpiling and polyfilling should be enough.

Document Type Declaration

Many[1] JavaScript XML parsers simplify handling of the internal DTD subset, either by not checking it for well-formedness or by ignoring its declarations.

Internal DTD subset parsing is required even for non-validating[2] processors, so this parser implements the entire specification:

  • The internal DTD subset is parsed and checked for well-formedness.
  • ATTLIST declarations are recognized to apply normalization and default values to attributes.
  • ENTITY declarations are recognized to expand entity references.

Because this processing has security implications, DTD processing must be enabled explicitly by configuring SaxOptions.dtd.

External markup declarations and external entities are not required for non-validating[2] processors and are explicitly not supported.
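To make the ATTLIST point concrete: the declared type of an attribute changes how its value is normalized (XML 1.0 § 3.3.3). The sketch below is an illustration of that rule, not this parser's code. `normalizeAttributeValue` is a hypothetical name, and the sketch works on raw literal text only, ignoring character and entity references, which the specification treats separately.

```javascript
// Hypothetical helper (not part of saxe): attribute-value normalization
// as described in XML 1.0 § 3.3.3, for literal text without references.
function normalizeAttributeValue(value, type = "CDATA") {
  // Each literal whitespace character becomes one space; a "\r\n" pair
  // counts as a single line break and so yields a single space.
  const spaced = value.replace(/\r\n|[\t\n\r]/g, " ");
  if (type === "CDATA") {
    return spaced;
  }
  // Any other declared type (ID, NMTOKEN, NMTOKENS, ...): collapse runs
  // of spaces, then drop the leading and trailing space.
  return spaced.replace(/ +/g, " ").replace(/^ | $/g, "");
}

console.log(JSON.stringify(normalizeAttributeValue("a\tb\r\nc")));
console.log(JSON.stringify(normalizeAttributeValue("  x \n y ", "NMTOKENS")));
```

Without the ATTLIST declaration a parser has to treat every attribute as CDATA, which is why skipping the internal subset can change the attribute values an application sees.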

Security

XML parsers may be subject to a number of vulnerabilities; the most common attacks exploit external entity resolution and entity expansion.

This parser is strictly non-validating and never resolves external entities, so by design it should not be vulnerable to any XXE[3]-based attack. Additionally, the length of strings collected during parsing is capped to limit the efficacy of other denial-of-service attacks[4].

Following OWASP recommendations, DTD processing is prohibited by default.

new SaxParser(handler, {
  // Reject any DOCTYPE declaration
  dtd: "prohibit", // default
  // Alternatively, allow it but ignore any declarations
  // dtd: "ignore",

  // Enforce stricter limits over strings and values
  // collected during parsing.
  maxAttributesLength: 10000,
  maxElementDepth: 30,
  maxEntityDepth: 5,
  maxEntityLength: 1000,
  maxNameLength: 500,
  maxTextLength: 10000,
})

Known XML bombs are covered by the regular integration tests, and the parser is fuzz tested regularly. Even so, for very sensitive or security-oriented applications you may want to conduct your own security audit.
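To see why caps on entity depth and length matter, consider a scaled-down "billion laughs" document in which each entity references the previous one ten times. The sketch below is a standalone illustration and not this parser's implementation. `expandEntities` is a hypothetical function, and the way it interprets `maxEntityDepth` and `maxEntityLength` is an assumption based only on the option names.

```javascript
// Illustrative sketch only; this is not saxe's implementation.
// `entities` maps entity names to replacement text. Predefined entities
// and character references are deliberately left out.
function expandEntities(
  text,
  entities,
  limits = {maxEntityDepth: 5, maxEntityLength: 1000},
  depth = 0,
) {
  const expanded = text.replace(/&([A-Za-z_][\w.-]*);/g, (reference, name) => {
    if (!entities.has(name)) {
      throw new Error(`Undeclared entity: ${name}`);
    }
    // Assumed meaning of maxEntityDepth: how deeply references may nest.
    if (depth + 1 > limits.maxEntityDepth) {
      throw new Error(`Entity nesting deeper than ${limits.maxEntityDepth}`);
    }
    return expandEntities(entities.get(name), entities, limits, depth + 1);
  });
  // Assumed meaning of maxEntityLength: size of one fully expanded entity.
  if (depth > 0 && expanded.length > limits.maxEntityLength) {
    throw new Error(`Expanded entity longer than ${limits.maxEntityLength}`);
  }
  return expanded;
}

// Each level references the previous one ten times, so the expanded size
// grows by a factor of ten per level.
const bomb = new Map([["lol0", "lol"]]);
for (let level = 1; level <= 9; level++) {
  bomb.set(`lol${level}`, `&lol${level - 1};`.repeat(10));
}

try {
  expandEntities("&lol9;", bomb);
} catch (error) {
  console.log("Rejected:", error.message);
}
```

In this sketch either cap alone stops the expansion early: the length cap trips once a single entity grows past the limit, and the depth cap trips on long reference chains before much text has been produced.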

License

Licensed under the Apache License, Version 2.0. See the LICENSE file for details.

Footnotes

  1. Other JavaScript XML parsers inspected include isaacs/sax-js, NaturalIntelligence/fast-xml-parser and lddubeau/saxes.

  2. Non-validating XML processors (parsers) do not validate documents, but must still recognize and report well-formedness (syntax) errors. Non-validating processors are not required to fetch and parse external markup declarations and external entities. XML Standard § 5.1 Validating and Non-Validating Processors

  3. XML External Entity (XXE) Processing | OWASP Foundation

  4. XML Denial of Service Attacks and Defenses | Microsoft Learn
