CWE-91: XML Injection - JavaScript

Overview

XML Injection in JavaScript/Node.js applications occurs when untrusted user input is used to construct XML documents without proper validation or escaping. Attackers can manipulate XML structure by injecting special characters like <, >, &, ', and ", leading to data corruption, authentication bypass, or information disclosure.

Primary Defence: Build XML with a library API such as xmlbuilder2 instead of string concatenation. Validate all user input before including it in XML documents, escape XML special characters (<, >, &, ", ') with a dedicated escaping function wherever manual construction is unavoidable, and apply schema validation to confirm that the resulting document has the expected structure.

Common JavaScript XML Vulnerability Scenarios:

  • Building XML with template literals or string concatenation
  • Using user input directly in XML elements or attributes
  • SOAP/REST XML payloads with unsanitized data
  • XML configuration files with user data
  • Server-side rendering of XML responses

Popular Node.js XML Libraries:

  • xmlbuilder2: Modern XML builder
  • xml2js: XML to JavaScript object conversion
  • fast-xml-parser: High-performance XML parser
  • xmldom: W3C DOM implementation
  • jstoxml: JSON to XML conversion
  • he: HTML entity encoder/decoder (works for XML)

XML Special Characters Requiring Escaping:

  • < → &lt;
  • > → &gt;
  • & → &amp;
  • ' → &apos;
  • " → &quot;
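The mapping above can be sketched as a small helper. This is a minimal illustration, not a library API: the names escapeXml and XML_ENTITIES are hypothetical, and a builder library remains the preferred option.

```javascript
// Minimal XML escaper (illustrative helper, not taken from any library).
const XML_ENTITIES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    "'": '&apos;',
    '"': '&quot;'
};

function escapeXml(value) {
    // Coerce to string so numbers and booleans are handled predictably.
    // A single pass over the input avoids double-escaping the ampersands
    // that the replacements themselves introduce.
    return String(value).replace(/[&<>'"]/g, (ch) => XML_ENTITIES[ch]);
}

console.log(escapeXml('</username><admin>true</admin>'));
// &lt;/username&gt;&lt;admin&gt;true&lt;/admin&gt;
```

Escaping all five characters keeps the helper usable for both element content and quoted attribute values.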

Common Vulnerable Patterns

Template Literal Concatenation

// VULNERABLE - Direct template literal
function createUserXml(username, email) {
    // VULNERABLE - User input directly in XML template
    const xml = `<?xml version="1.0"?>
<user>
    <username>${username}</username>
    <email>${email}</email>
</user>`;

    return xml;
}

// Attack: username = "</username><admin>true</admin><username>"
// Result: <username></username><admin>true</admin><username></username>
// Creates unintended <admin> element

Why this is vulnerable:

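  • XML special characters in username and email are not escaped
  • Template literals interpolate values verbatim, so input can contain markup
  • Injected tags change the structure of the document
  • Consumers that trust the document structure (for example, a check for an <admin> element) can be bypassed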
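If template literals have to stay, one possible mitigation is a tagged template that escapes every interpolated value. The sketch below assumes two hypothetical helpers, xml and escapeXml. It only protects values placed in element content or quoted attribute values, not element or attribute names.

```javascript
// Hypothetical helper: escapes the five XML special characters.
function escapeXml(value) {
    return String(value).replace(/[&<>'"]/g, (ch) => ({
        '&': '&amp;', '<': '&lt;', '>': '&gt;', "'": '&apos;', '"': '&quot;'
    }[ch]));
}

// Hypothetical template tag: literal chunks are trusted, interpolations are not.
function xml(strings, ...values) {
    // strings: the literal pieces written by the developer
    // values: the interpolated expressions, escaped one by one
    return strings.reduce(
        (out, chunk, i) => out + chunk + (i < values.length ? escapeXml(values[i]) : ''),
        ''
    );
}

const username = '</username><admin>true</admin><username>';
const doc = xml`<user><username>${username}</username></user>`;
console.log(doc);
// <user><username>&lt;/username&gt;&lt;admin&gt;true&lt;/admin&gt;&lt;username&gt;</username></user>
```

The design choice here is that the safe path is also the shortest one to type: a developer cannot forget to escape an individual variable because the tag handles all of them.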

Express.js REST API

// VULNERABLE - Express endpoint returning XML
const express = require('express');
const app = express();

app.get('/api/user', (req, res) => {
    const { username, email } = req.query;

    // VULNERABLE - Query parameters in XML
    const xmlResponse = `<?xml version="1.0" encoding="UTF-8"?>
<response>
    <user>
        <name>${username}</name>
        <email>${email}</email>
    </user>
</response>`;

    res.set('Content-Type', 'application/xml');
    res.send(xmlResponse);
});

app.listen(3000);

// Attack: username = "<admin>true</admin>"
// Response contains: <name><admin>true</admin></name>

Why this is vulnerable:

  • Express doesn't auto-escape XML
  • Request parameters directly in XML
  • No validation or sanitization
  • Information disclosure possible

SOAP Request Construction

// VULNERABLE - Building SOAP XML manually
const axios = require('axios');

async function callSoapService(userId, action) {
    // VULNERABLE - User input in SOAP envelope
    const soapEnvelope = `<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <GetUserData>
            <UserId>${userId}</UserId>
            <Action>${action}</Action>
        </GetUserData>
    </soap:Body>
</soap:Envelope>`;

    const response = await axios.post('https://api.example.com/soap', soapEnvelope, {
        headers: { 'Content-Type': 'text/xml' }
    });

    return response.data;
}

// Attack: userId = "</UserId><Role>admin</Role><UserId>"
// Injects admin role into SOAP request

Why this is vulnerable:

  • SOAP envelope built with template literals
  • Allows element injection
  • Injected elements such as <Role> can escalate privileges if the service trusts them
  • An attacker can modify the request structure

XML Configuration Files

// VULNERABLE - Writing XML config with user data
const fs = require('fs');

function saveUserSettings(username, theme, language) {
    // VULNERABLE - User input in XML config
    const configXml = `<?xml version="1.0"?>
<config>
    <user>${username}</user>
    <preferences>
        <theme>${theme}</theme>
        <language>${language}</language>
    </preferences>
</config>`;

    fs.writeFileSync('config.xml', configXml);
}

// Attack: theme = "</theme><admin_access>true</admin_access><theme>"
// Modifies configuration structure

Why this is vulnerable:

  • Configuration files parsed by XML parser
  • Persistent injection
  • Can modify application behavior
  • Privilege escalation
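For settings with a small set of legitimate values, allowlist validation can reject the payload before any XML is written. The following is a sketch under assumptions: the theme names, the language pattern, and the function name validateSettings are illustrative and not part of the original example.

```javascript
// Illustrative allowlists; the specific values are assumptions for this sketch.
const ALLOWED_THEMES = new Set(['light', 'dark', 'high-contrast']);
const LANGUAGE_PATTERN = /^[a-z]{2,3}(-[A-Z]{2})?$/; // e.g. "en", "pt-BR"

function validateSettings({ theme, language }) {
    // Only exact matches from the fixed set are accepted
    if (typeof theme !== 'string' || !ALLOWED_THEMES.has(theme)) {
        throw new Error('Invalid theme');
    }
    // The pattern admits no XML metacharacters at all
    if (typeof language !== 'string' || !LANGUAGE_PATTERN.test(language)) {
        throw new Error('Invalid language');
    }
    return { theme, language };
}

try {
    validateSettings({ theme: '</theme><admin_access>true</admin_access><theme>', language: 'en' });
} catch (err) {
    console.log(err.message); // Invalid theme
}
```

Validation of this kind complements escaping rather than replacing it: the values that pass should still be written with a builder API or an escaping function.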

xml2js with String Building

// VULNERABLE - Building XML string before parsing
const xml2js = require('xml2js');

function createXmlResponse(data) {
    // VULNERABLE - Building XML string manually
    const xmlStr = `<response>
    <status>success</status>
    <data>${data}</data>
</response>`;

    // Parse the vulnerable XML
    xml2js.parseString(xmlStr, (err, result) => {
        console.log(result);
    });

    return xmlStr;
}

// Attack: data = "</data><malicious>payload</malicious><data>"
// Injects malicious elements

Why this is vulnerable:

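  • xml2js never sees the pieces separately, so it cannot escape a string that was concatenated beforehand
  • Injection happens before parsing
  • No validation
  • The injected markup is well-formed, so the parser accepts it as legitimate structure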

XPath Query Injection

// VULNERABLE - User input in XPath query
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;

function findUserByName(xmlDoc, username) {
    const doc = new dom().parseFromString(xmlDoc);

    // VULNERABLE - XPath injection
    const query = `//user[name='${username}']`;
    const nodes = xpath.select(query, doc);

    return nodes;
}

// Attack: username = "' or '1'='1"
// XPath: //user[name='' or '1'='1']
// Returns all users

Why this is vulnerable:

  • XPath query built with template literals
  • Boolean-based injection
  • Bypasses authentication checks
  • Information disclosure
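XPath 1.0 has no escape character inside string literals, so one common workaround is to choose the quote style the value does not contain and fall back to concat() when it contains both. The helper name toXPathLiteral is hypothetical; this is a sketch of that technique, not a feature of the xpath package.

```javascript
// Hypothetical helper: turns an arbitrary value into a safe XPath 1.0 string literal.
function toXPathLiteral(value) {
    const str = String(value);
    if (!str.includes("'")) return `'${str}'`;
    if (!str.includes('"')) return `"${str}"`;
    // Contains both quote types: split on single quotes and rejoin the
    // pieces with a double-quoted apostrophe inside concat().
    const parts = str.split("'").map((part) => `'${part}'`);
    return `concat(${parts.join(', "\'", ')})`;
}

const attack = "' or '1'='1";
const query = `//user[name=${toXPathLiteral(attack)}]`;
console.log(query);
// //user[name="' or '1'='1"]
```

With the literal quoted this way, the payload is compared as a single string value and no longer alters the predicate. Restricting usernames to an allowlist pattern before building the query adds a second layer.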

Next.js API Route

// VULNERABLE - Next.js API route with XML
// pages/api/user.js

export default function handler(req, res) {
    const { username, email } = req.query;

    // VULNERABLE - Query parameters in XML
    const xml = `<?xml version="1.0"?>
<user>
    <username>${username}</username>
    <email>${email}</email>
</user>`;

    res.setHeader('Content-Type', 'application/xml');
    res.status(200).send(xml);
}

Why this is vulnerable:

  • Next.js doesn't escape XML automatically
  • Query parameters directly in response
  • No validation
  • Framework doesn't prevent injection

jstoxml with Unsafe Data

// VULNERABLE - jstoxml with unvalidated input
const jstoxml = require('jstoxml');

function convertToXml(userData) {
    // VULNERABLE - User data may contain XML
    const obj = {
        user: {
            name: userData.name,  // Could contain XML tags
            bio: userData.bio     // Could contain injection
        }
    };

    // Escaping behavior depends on the jstoxml version and filter options;
    // confirm what your installed version emits before trusting it
    return jstoxml.toXML(obj);
}

// Attack: userData.bio = "</bio><admin>true</admin><bio>"

Why this is vulnerable:

  • Assumes clean input data
  • Some libraries don't escape by default
  • Database values can contain injections
  • Persistent XSS-like attacks

Secure Patterns

xmlbuilder2 Library

// SECURE - Using xmlbuilder2 (recommended)
const { create } = require('xmlbuilder2');

function validateUsername(username) {
    return /^[a-zA-Z0-9._-]{1,100}$/.test(username);
}

function validateEmail(email) {
    return /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
}

function createUserXml(username, email) {
    // SECURE - Validate inputs
    if (!validateUsername(username)) {
        throw new Error('Invalid username');
    }
    if (!validateEmail(email)) {
        throw new Error('Invalid email');
    }

    // SECURE - Use xmlbuilder2 API
    const doc = create({ version: '1.0' })
        .ele('user')
            .ele('username').txt(username).up()  // Automatically escaped
            .ele('email').txt(email).up()
        .end({ prettyPrint: true });

    return doc;
}

// Example usage:
// createUserXml("<script>alert('xss')</script>", "test@example.com")
// Throws 'Invalid username': validation rejects the markup first.
// If validation were skipped, .txt() would still escape the markup:
// <username>&lt;script&gt;alert('xss')&lt;/script&gt;</username>

Why this works: xmlbuilder2 (create(), .ele(), .txt()) escapes markup-significant characters (&, <, >) when setting element text content via .txt(), preventing attackers from injecting closing tags like </username><admin>true</admin><username>. Quotes do not need escaping in text content, which is why alert('xss') keeps its apostrophes in the output above; they matter in attribute values, which the library escapes when it serializes attributes. The .txt() method treats the parameter as character data, not markup, so even if username contains <admin>true</admin>, it becomes &lt;admin&gt;true&lt;/admin&gt; in the output.

Regex validation provides defense-in-depth: the username pattern (^[a-zA-Z0-9._-]{1,100}$) blocks XML metacharacters before they reach the API, and email validation prevents addresses like admin@example.com</email><role>admin</role><email>. .up() navigation moves back to the parent element, enabling fluent API chaining for nested structures. end({ prettyPrint: true }) serializes the XML tree with indentation, producing human-readable output.

This pattern resists injection through element content because xmlbuilder2 maintains a DOM-like tree internally and serializes it with escaping rather than concatenating raw strings. Element and attribute names are a separate matter: never derive them from user input. The chained API is also more readable than manual DOM manipulation, making security reviews easier.

Express.js with xmlbuilder2

// SECURE - Express with validation and escaping
const express = require('express');
const { create } = require('xmlbuilder2');

const app = express();

function validateSearchQuery(query) {
    return /^[a-zA-Z0-9\s]{1,50}$/.test(query);
}

app.get('/api/users/search', (req, res) => {
    const { q } = req.query;

    // SECURE - Validate input
    if (!q || !validateSearchQuery(q)) {
        const errorXml = create({ version: '1.0' })
            .ele('error')
                .txt('Invalid search query')
            .end();

        res.status(400)
            .set('Content-Type', 'application/xml')
            .send(errorXml);
        return;
    }

    try {
        // Simulate database search
        const users = [
            { name: 'Alice', email: 'alice@example.com' },
            { name: 'Bob', email: 'bob@example.com' }
        ].filter(u => u.name.toLowerCase().includes(q.toLowerCase()));

        // SECURE - Build XML with xmlbuilder2
        const doc = create({ version: '1.0' })
            .ele('response');

        const usersElem = doc.ele('users');

        users.forEach(user => {
            usersElem.ele('user')
                .ele('name').txt(user.name).up()  // Auto-escaped
                .ele('email').txt(user.email).up();
        });

        const xml = doc.end({ prettyPrint: true });

        res.set('Content-Type', 'application/xml')
            .send(xml);

    } catch (error) {
        // SECURE - Generic error message
        console.error('Search error:', error);

        const errorXml = create({ version: '1.0' })
            .ele('error')
                .txt('Search failed')
            .end();

        res.status(500)
            .set('Content-Type', 'application/xml')
            .send(errorXml);
    }
});

app.listen(3000);

Why this works: Regex validation (^[a-zA-Z0-9\s]{1,50}$) blocks XML metacharacters (<, >, &, quotes) before they reach the XML API, preventing injection attempts like q=</name><admin>true</admin><name>. xmlbuilder2's .txt() method automatically escapes any remaining content, providing layered defense - even if validation is bypassed, escaping prevents structural changes.

Generic error messages ("Invalid search query", "Search failed") in XML responses prevent information disclosure - attackers don't learn whether rejection was due to regex mismatch, length limits, or database errors. .forEach() iteration with .ele('user') for each result creates well-formed XML - the API enforces proper nesting and closing tags. res.set('Content-Type', 'application/xml') ensures browsers and clients parse the response as XML (not HTML), preventing MIME confusion attacks.

Try-catch with server-side logging (console.error) captures exceptions for debugging while sending generic errors to clients, supporting troubleshooting without information disclosure. The declarative xmlbuilder2 structure makes the code auditable - reviewers can see the XML schema directly in the API chaining.

This pattern is ideal for Express REST APIs returning XML responses to legacy clients.

Manual Escaping with 'he' Library

// SECURE - Using 'he' library for XML escaping
const he = require('he');

function validateInput(str, maxLength = 100) {
    return typeof str === 'string' && str.length <= maxLength;
}

function createUserXmlWithEscaping(username, email) {
    // SECURE - Validate inputs
    if (!validateInput(username, 100)) {
        throw new Error('Invalid username');
    }
    if (!validateInput(email, 255)) {
        throw new Error('Invalid email');
    }

    // SECURE - Use 'he' to escape XML entities
    const safeUsername = he.encode(username, {
        useNamedReferences: false,
        encodeEverything: false
    });
    const safeEmail = he.encode(email, {
        useNamedReferences: false,
        encodeEverything: false
    });

    const xml = `<?xml version="1.0"?>
<user>
    <username>${safeUsername}</username>
    <email>${safeEmail}</email>
</user>`;

    return xml;
}

// npm install he

Why this works: he.encode() replaces &, <, >, ", ', and ` (plus, by default, any character outside printable ASCII) with character references, so user input can no longer terminate an element or attribute. With useNamedReferences: false the output uses hexadecimal numeric references (for example, < becomes &#x3C; and ' becomes &#x27;), which XML parsers understand without a DTD. HTML named references such as &nbsp; are not defined in XML, so leave this option off. encodeEverything: false leaves ordinary printable ASCII characters as they are; setting it to true encodes every character, which is still valid but bloats the output.

Pre-validation (validateInput() with type and length checks) bounds the input size, but it is not an allowlist: the escaping is what prevents injection here. This pattern is useful when xmlbuilder2 or other DOM APIs are unavailable (e.g., legacy codebases, bundle size constraints). However, manual string concatenation is less preferred than xmlbuilder2 because developers might forget to escape a variable, or escape incorrectly.

he, by Mathias Bynens, is a widely used entity encoder/decoder. Note: he.encode() output is safe in element content and quoted attribute values, but it does not help with element names, CDATA sections, or XPath expressions; use a builder API for those. Use this pattern only when a builder is not an option and you understand the escaping rules.
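One edge case that entity escaping alone does not cover: XML 1.0 forbids most control characters outright, even when written as character references. A possible pre-processing step is to strip them before escaping. The helper name stripInvalidXmlChars is hypothetical; the ranges follow the Char production of the XML 1.0 specification.

```javascript
// Matches any code point OUTSIDE the XML 1.0 Char production.
// The character class lists the allowed ranges and is negated with ^.
const INVALID_XML_CHARS =
    /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

function stripInvalidXmlChars(value) {
    // Removes NUL, most C0 controls, lone surrogates, U+FFFE and U+FFFF
    return String(value).replace(INVALID_XML_CHARS, '');
}

console.log(JSON.stringify(stripInvalidXmlChars('a\u0000b\u0008c\td')));
// "abc\td"
```

Tab, line feed, and carriage return survive because XML 1.0 allows them. Whether to strip or reject such input is a policy decision; rejecting is the stricter choice.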

TypeScript with xmlbuilder2

// SECURE - TypeScript with xmlbuilder2 and validation
import { create } from 'xmlbuilder2';

interface UserData {
    username: string;
    email: string;
    bio?: string;
}

class SecureXmlBuilder {
    private readonly USERNAME_PATTERN = /^[a-zA-Z0-9._-]{3,64}$/;
    private readonly EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

    private validateUsername(username: string): boolean {
        return this.USERNAME_PATTERN.test(username);
    }

    private validateEmail(email: string): boolean {
        return this.EMAIL_PATTERN.test(email);
    }

    public createUserXml(userData: UserData): string {
        // SECURE - Validate inputs
        if (!this.validateUsername(userData.username)) {
            throw new Error('Invalid username format');
        }

        if (!this.validateEmail(userData.email)) {
            throw new Error('Invalid email format');
        }

        if (userData.bio && userData.bio.length > 500) {
            throw new Error('Bio too long');
        }

        // SECURE - xmlbuilder2 handles escaping
        const doc = create({ version: '1.0', encoding: 'UTF-8' })
            .ele('user')
                .ele('username').txt(userData.username).up()
                .ele('email').txt(userData.email).up();

        if (userData.bio) {
            doc.ele('bio').txt(userData.bio).up();
        }

        return doc.end({ prettyPrint: true });
    }
}

export default SecureXmlBuilder;

Why this works: TypeScript's type system (interface UserData, string parameters) catches type mismatches at compile time, but the annotations are erased at runtime, so data parsed from a request still depends on the runtime checks shown here. Class-based encapsulation (private methods, readonly patterns) centralizes validation logic, ensuring all XML generation paths enforce the same strict rules. Regex validation (^[a-zA-Z0-9._-]{3,64}$ for usernames, email pattern) blocks XML metacharacters before they reach xmlbuilder2. Length limits (bio.length > 500) prevent DoS via extremely long values.

Optional properties (bio?: string) with runtime checks (if (userData.bio)) handle optional fields safely - the XML only includes <bio> if provided. xmlbuilder2's .txt() method automatically escapes content, so even if bio contains <script>, it becomes &lt;script&gt;. Explicit encoding (encoding: 'UTF-8') ensures international characters are handled correctly.

The export default pattern enables easy integration with Express, Next.js, or other frameworks. The explicit return type on createUserXml documents that callers receive the serialized XML document as a string. This pattern suits TypeScript projects where both type safety and security matter.
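Because type annotations do not exist at runtime, values arriving from a query string or JSON body may not be strings at all (a repeated query parameter, for instance, typically arrives as an array). A runtime guard can be sketched as follows; parseUserData is a hypothetical helper, shown in plain JavaScript to make the point that the check must survive compilation.

```javascript
// Hypothetical guard for data arriving from req.query or a parsed JSON body.
function parseUserData(input) {
    if (input === null || typeof input !== 'object') {
        throw new TypeError('userData must be an object');
    }
    const { username, email, bio } = input;
    // Repeated query parameters arrive as arrays, so insist on plain strings
    if (typeof username !== 'string' || typeof email !== 'string') {
        throw new TypeError('username and email must be strings');
    }
    if (bio !== undefined && typeof bio !== 'string') {
        throw new TypeError('bio must be a string when provided');
    }
    return { username, email, bio };
}

try {
    parseUserData({ username: ['alice', 'bob'], email: 'a@example.com' });
} catch (err) {
    console.log(err.message); // username and email must be strings
}
```

The guard only establishes types. Pattern and length validation, as in the class above, still has to run on the values it returns.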

Next.js API Route with Security

// SECURE - Next.js API route with validation
// pages/api/user.js
import { create } from 'xmlbuilder2';

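function validateUsername(username) {
    // req.query values can be arrays when a parameter is repeated
    return typeof username === 'string' && /^[a-zA-Z0-9._-]{3,64}$/.test(username);
}

function validateEmail(email) {
    return typeof email === 'string' &&
        /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
}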

export default function handler(req, res) {
    if (req.method !== 'GET') {
        return res.status(405).json({ error: 'Method not allowed' });
    }

    const { username, email } = req.query;

    // SECURE - Validate inputs
    if (!validateUsername(username)) {
        const errorXml = create({ version: '1.0' })
            .ele('error').txt('Invalid username').end();

        return res.status(400)
            .setHeader('Content-Type', 'application/xml')
            .send(errorXml);
    }

    if (!validateEmail(email)) {
        const errorXml = create({ version: '1.0' })
            .ele('error').txt('Invalid email').end();

        return res.status(400)
            .setHeader('Content-Type', 'application/xml')
            .send(errorXml);
    }

    // SECURE - Build XML with xmlbuilder2
    const xml = create({ version: '1.0' })
        .ele('user')
            .ele('username').txt(username).up()
            .ele('email').txt(email).up()
        .end({ prettyPrint: true });

    res.setHeader('Content-Type', 'application/xml');
    res.status(200).send(xml);
}

Why this works: Next.js API routes (pages/api/user.js) provide serverless-style endpoints, so each handler has to perform its own validation and error handling. Method checking (req.method !== 'GET') limits the endpoint to the one method it is designed for and rejects everything else with 405. Regex validation (^[a-zA-Z0-9._-]{3,64}$) blocks XML metacharacters and enforces length limits. xmlbuilder2 (create(), .ele(), .txt()) automatically escapes content, preventing injection.

Early validation (checking inputs before XML construction) provides fail-fast behavior - invalid requests return 400 immediately without expensive XML processing. res.setHeader('Content-Type', 'application/xml') ensures the response is parsed as XML, not HTML, preventing MIME confusion. Generic error messages ("Invalid username", "Invalid email") prevent information disclosure - attackers don't learn whether rejection was due to regex, length, or other checks.

Separate error XML (create().ele('error').txt('...')) provides a consistent error format for clients. The (req, res) handler shown here is the Pages Router form; in the App Router the same validation and xmlbuilder2 code would live in a Route Handler that returns a Response object.

SOAP with strong-soap Library

// SECURE - Using strong-soap library for SOAP
const soap = require('strong-soap').soap;

// strong-soap documents a callback API; wrap it so callers can use async/await
function createClient(url, options) {
    return new Promise((resolve, reject) => {
        soap.createClient(url, options, (err, client) =>
            err ? reject(err) : resolve(client));
    });
}

async function callSoapServiceSecure(userId, action) {
    // SECURE - Validate inputs
    if (!userId || typeof userId !== 'string' || userId.length > 50) {
        throw new Error('Invalid userId');
    }
    if (!action || typeof action !== 'string' || action.length > 50) {
        throw new Error('Invalid action');
    }

    const url = 'https://api.example.com/service?wsdl';
    const clientOptions = {};

    // SECURE - strong-soap handles XML construction
    const client = await createClient(url, clientOptions);

    // SECURE - Library serializes and escapes parameters
    return new Promise((resolve, reject) => {
        client.GetUserData({ UserId: userId, Action: action }, (err, result) =>
            err ? reject(err) : resolve(result));
    });
}

// strong-soap library builds proper SOAP envelope with escaping

Why this works: strong-soap (a SOAP client and server library derived from node-soap) constructs SOAP envelopes from a parameter object and escapes values as it serializes them, eliminating manual template literal construction like <UserId>${userId}</UserId>. soap.createClient() fetches the WSDL, parses the service definition, and exposes each SOAP operation (here GetUserData) as a method on the client.

Parameter object ({ UserId: userId, Action: action }) is serialized to XML by the library, with automatic escaping of special characters - even if userId contains </UserId><Role>admin</Role>, it becomes &lt;/UserId&gt;&lt;Role&gt;.... Pre-validation (typeof userId !== 'string', length > 50) provides defense-in-depth and prevents DoS via extremely long strings. Wrapping the callbacks in Promises lets callers use async/await and handle errors with try-catch.

The library parses SOAP responses into JavaScript objects, avoiding manual XML parsing vulnerabilities. strong-soap supports SOAP 1.1/1.2 and WS-Security; node-soap (the soap package) follows the same parameter-object approach, so check the maintenance status of both before choosing. This pattern is ideal for enterprise SOAP integrations where manual envelope construction is error-prone and WSDL compliance is required.

Verification

After implementing the recommended secure patterns, verify the fix through multiple approaches:

  • Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
  • Code review: Confirm all instances use the secure pattern (builder APIs, library serializers, proper escaping) with no string concatenation of untrusted input into XML or XPath
  • Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
  • Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
  • Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
  • Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
  • Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
  • Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced

Additional Resources