CWE-91: XML Injection - JavaScript
Overview
XML Injection in JavaScript/Node.js applications occurs when untrusted user input is used to construct XML documents without proper validation or escaping. Attackers can manipulate XML structure by injecting special characters like <, >, &, ', and ", leading to data corruption, authentication bypass, or information disclosure.
Primary Defense: Build XML with a library API such as xmlbuilder2 instead of string concatenation. Validate all user input before including it in XML documents, escape XML special characters (<, >, &, ", ') with a dedicated escaping function wherever a string must be assembled by hand, and apply schema validation to confirm that the resulting document has the expected structure.
Common JavaScript XML Vulnerability Scenarios:
- Building XML with template literals or string concatenation
- Using user input directly in XML elements or attributes
- SOAP/REST XML payloads with unsanitized data
- XML configuration files with user data
- Server-side rendering of XML responses
Popular Node.js XML Libraries:
- xmlbuilder2: Modern XML builder
- xml2js: XML to JavaScript object conversion
- fast-xml-parser: High-performance XML parser
- xmldom: W3C DOM implementation (now published as @xmldom/xmldom)
- jstoxml: JSON to XML conversion
- he: HTML entity encoder/decoder (usable for XML when numeric character references are emitted)
XML Special Characters Requiring Escaping:
- < → &lt;
- > → &gt;
- & → &amp;
- ' → &apos;
- " → &quot;
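As a minimal sketch, the five special characters can be replaced in a single pass. The helper name escapeXml is hypothetical and not part of any library mentioned here, and the sketch does not filter control characters that XML 1.0 forbids.

```javascript
// Minimal sketch of an XML escaping helper (hypothetical name: escapeXml).
const XML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  "'": '&apos;',
  '"': '&quot;',
};

function escapeXml(value) {
  // Coerce to string so numbers and booleans can be passed as well.
  // A single pass avoids double-escaping the '&' of an entity just produced.
  return String(value).replace(/[&<>'"]/g, (ch) => XML_ESCAPES[ch]);
}

const payload = '</username><admin>true</admin><username>';
console.log(`<username>${escapeXml(payload)}</username>`);
// <username>&lt;/username&gt;&lt;admin&gt;true&lt;/admin&gt;&lt;username&gt;</username>
```

Because the payload's angle brackets are now character data, the document keeps a single username element.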
Common Vulnerable Patterns
Template Literal Concatenation
// VULNERABLE - Direct template literal
function createUserXml(username, email) {
// VULNERABLE - User input directly in XML template
const xml = `<?xml version="1.0"?>
<user>
<username>${username}</username>
<email>${email}</email>
</user>`;
return xml;
}
// Attack: username = "</username><admin>true</admin><username>"
// Result: <username></username><admin>true</admin><username></username>
// Creates unintended <admin> element
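Why this is vulnerable:
- No escaping of XML special characters in the interpolated values
- Template literals insert the raw string, so markup in the input becomes markup in the document
- An attacker can close the current element and add new ones
- Consumers that trust the document structure treat the injected elements as legitimate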
Express.js REST API
// VULNERABLE - Express endpoint returning XML
const express = require('express');
const app = express();
app.get('/api/user', (req, res) => {
const { username, email } = req.query;
// VULNERABLE - Query parameters in XML
const xmlResponse = `<?xml version="1.0" encoding="UTF-8"?>
<response>
<user>
<name>${username}</name>
<email>${email}</email>
</user>
</response>`;
res.set('Content-Type', 'application/xml');
res.send(xmlResponse);
});
app.listen(3000);
// Attack: username = "<admin>true</admin>"
// Response contains: <name><admin>true</admin></name>
Why this is vulnerable:
- Express doesn't auto-escape XML
- Request parameters directly in XML
- No validation or sanitization
- Information disclosure possible
SOAP Request Construction
// VULNERABLE - Building SOAP XML manually
const axios = require('axios');
async function callSoapService(userId, action) {
// VULNERABLE - User input in SOAP envelope
const soapEnvelope = `<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetUserData>
<UserId>${userId}</UserId>
<Action>${action}</Action>
</GetUserData>
</soap:Body>
</soap:Envelope>`;
const response = await axios.post('https://api.example.com/soap', soapEnvelope, {
headers: { 'Content-Type': 'text/xml' }
});
return response.data;
}
// Attack: userId = "</UserId><Role>admin</Role><UserId>"
// Injects admin role into SOAP request
Why this is vulnerable:
- SOAP envelope built with template literals
- Allows element injection
- Can escalate privileges
- Modify request structure
XML Configuration Files
// VULNERABLE - Writing XML config with user data
const fs = require('fs');
function saveUserSettings(username, theme, language) {
// VULNERABLE - User input in XML config
const configXml = `<?xml version="1.0"?>
<config>
<user>${username}</user>
<preferences>
<theme>${theme}</theme>
<language>${language}</language>
</preferences>
</config>`;
fs.writeFileSync('config.xml', configXml);
}
// Attack: theme = "</theme><admin_access>true</admin_access><theme>"
// Modifies configuration structure
Why this is vulnerable:
- Configuration files parsed by XML parser
- Persistent injection
- Can modify application behavior
- Privilege escalation
xml2js with String Building
// VULNERABLE - Building XML string before parsing
const xml2js = require('xml2js');
function createXmlResponse(data) {
// VULNERABLE - Building XML string manually
const xmlStr = `<response>
<status>success</status>
<data>${data}</data>
</response>`;
// Parse the vulnerable XML
xml2js.parseString(xmlStr, (err, result) => {
console.log(result);
});
return xmlStr;
}
// Attack: data = "</data><malicious>payload</malicious><data>"
// Injects malicious elements
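Why this is vulnerable:
- xml2js parses XML; it cannot escape values that were already concatenated into the string
- Injection happens before parsing
- No validation of data
- The injected payload is still well-formed XML, so the parser accepts it without error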
XPath Query Injection
// VULNERABLE - User input in XPath query
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
function findUserByName(xmlDoc, username) {
const doc = new dom().parseFromString(xmlDoc);
// VULNERABLE - XPath injection
const query = `//user[name='${username}']`;
const nodes = xpath.select(query, doc);
return nodes;
}
// Attack: username = "' or '1'='1"
// XPath: //user[name='' or '1'='1']
// Returns all users
Why this is vulnerable:
- XPath query built with template literals
- Boolean-based injection
- Bypasses authentication checks
- Information disclosure
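One way to close this hole is to quote the value as an XPath string literal before embedding it. XPath 1.0 has no escape sequence inside string literals, so a value containing both quote types has to be assembled with concat(). The following is a sketch under that assumption; xpathLiteral is a hypothetical helper name, and where an XPath library offers variable binding, that is usually the better option.

```javascript
// Sketch: turn an arbitrary string into a safe XPath 1.0 string literal.
// xpathLiteral is a hypothetical helper, not part of the xpath package.
function xpathLiteral(value) {
  const s = String(value);
  if (!s.includes("'")) return `'${s}'`; // no single quotes: wrap in single quotes
  if (!s.includes('"')) return `"${s}"`; // no double quotes: wrap in double quotes
  // Both quote types present: split on single quotes and rejoin with concat().
  const parts = s.split("'").map((part) => `'${part}'`);
  const separator = `, "'", `;
  return `concat(${parts.join(separator)})`;
}

const attack = "' or '1'='1";
console.log(`//user[name=${xpathLiteral(attack)}]`);
// //user[name="' or '1'='1"]
```

The attack string is now compared as a literal name instead of being parsed as a boolean expression.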
Next.js API Route
// VULNERABLE - Next.js API route with XML
// pages/api/user.js
export default function handler(req, res) {
const { username, email } = req.query;
// VULNERABLE - Query parameters in XML
const xml = `<?xml version="1.0"?>
<user>
<username>${username}</username>
<email>${email}</email>
</user>`;
res.setHeader('Content-Type', 'application/xml');
res.status(200).send(xml);
}
Why this is vulnerable:
- Next.js doesn't escape XML automatically
- Query parameters directly in response
- No validation
- Framework doesn't prevent injection
jstoxml with Unsafe Data
// VULNERABLE - jstoxml with unvalidated input
const jstoxml = require('jstoxml');
function convertToXml(userData) {
// VULNERABLE - User data may contain XML
const obj = {
user: {
name: userData.name, // Could contain XML tags
bio: userData.bio // Could contain injection
}
};
// Whether jstoxml escapes string values depends on its version and filter options; do not assume it does
return jstoxml.toXML(obj);
}
// Attack: userData.bio = "</bio><admin>true</admin><bio>"
Why this is vulnerable:
- Assumes clean input data
- Escaping behavior varies by library, version, and options
- Database values can contain injections
- Stored payloads behave like persistent XSS: they are replayed every time the XML is generated
Secure Patterns
xmlbuilder2 Library
// SECURE - Using xmlbuilder2 (recommended)
const { create } = require('xmlbuilder2');
function validateUsername(username) {
return /^[a-zA-Z0-9._-]{1,100}$/.test(username);
}
function validateEmail(email) {
return /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
}
function createUserXml(username, email) {
// SECURE - Validate inputs
if (!validateUsername(username)) {
throw new Error('Invalid username');
}
if (!validateEmail(email)) {
throw new Error('Invalid email');
}
// SECURE - Use xmlbuilder2 API
const doc = create({ version: '1.0' })
.ele('user')
.ele('username').txt(username).up() // Automatically escaped
.ele('email').txt(email).up()
.end({ prettyPrint: true });
return doc;
}
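// Example usage:
// createUserXml("<script>alert('xss')</script>", "test@example.com")
// Throws: Invalid username (validation rejects the markup characters)
// If validation were skipped, .txt() would still emit escaped character data:
// <username>&lt;script&gt;alert('xss')&lt;/script&gt;</username>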
Why this works: xmlbuilder2 (create(), .ele(), .txt()) escapes markup-significant characters when it serializes text content set via .txt(), preventing attackers from injecting closing tags like </username><admin>true</admin><username>. The .txt() method treats its argument as character data, not markup, so even if username contains <admin>true</admin>, the output contains &lt;admin&gt;true&lt;/admin&gt;.
Regex validation provides defense-in-depth: the username pattern (^[a-zA-Z0-9._-]{1,100}$) blocks XML metacharacters before they reach the API, and email validation prevents addresses like admin@example.com</email><role>admin</role><email>. .up() navigation moves back to the parent element, enabling fluent API chaining for nested structures. end({ prettyPrint: true }) serializes the XML tree with indentation, producing human-readable output.
This pattern resists injection because xmlbuilder2 maintains a DOM-like tree internally and serializes it, rather than concatenating raw strings. The protection covers values passed as text or attribute values; element and attribute names should still come from code, not from user input. The chained API is also more readable than manual DOM manipulation, which makes security reviews easier.
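The tree-then-serialize idea can be illustrated with a toy model. This is not xmlbuilder2's implementation; the element and serialize helpers and the name pattern are assumptions made for the sketch. Text is stored as data and only escaped at serialization time, so it cannot alter the structure.

```javascript
// Toy model of "build a tree, then serialize" (not xmlbuilder2's actual code).
function escapeText(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
  return String(s).replace(/[&<>]/g, (c) => map[c]);
}

// Element names are expected to come from code; reject anything unusual.
const NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9._-]*$/;

function element(name, children = []) {
  if (!NAME_PATTERN.test(name)) throw new Error(`Invalid element name: ${name}`);
  return { name, children };
}

function serialize(node) {
  if (typeof node === 'string') return escapeText(node); // text node
  const inner = node.children.map(serialize).join('');
  return `<${node.name}>${inner}</${node.name}>`;
}

const tree = element('user', [
  element('username', ['</username><admin>true</admin><username>']),
  element('email', ['a@example.com']),
]);

console.log(serialize(tree));
// <user><username>&lt;/username&gt;&lt;admin&gt;true&lt;/admin&gt;&lt;username&gt;</username><email>a@example.com</email></user>
```

The injected payload ends up as escaped text inside username; the tree still has exactly the two children that the code created.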
Express.js with xmlbuilder2
// SECURE - Express with validation and escaping
const express = require('express');
const { create } = require('xmlbuilder2');
const app = express();
function validateSearchQuery(query) {
return /^[a-zA-Z0-9\s]{1,50}$/.test(query);
}
app.get('/api/users/search', (req, res) => {
const { q } = req.query;
// SECURE - Validate input
if (!q || !validateSearchQuery(q)) {
const errorXml = create({ version: '1.0' })
.ele('error')
.txt('Invalid search query')
.end();
res.status(400)
.set('Content-Type', 'application/xml')
.send(errorXml);
return;
}
try {
// Simulate database search
const users = [
{ name: 'Alice', email: 'alice@example.com' },
{ name: 'Bob', email: 'bob@example.com' }
].filter(u => u.name.toLowerCase().includes(q.toLowerCase()));
// SECURE - Build XML with xmlbuilder2
const doc = create({ version: '1.0' })
.ele('response');
const usersElem = doc.ele('users');
users.forEach(user => {
usersElem.ele('user')
.ele('name').txt(user.name).up() // Auto-escaped
.ele('email').txt(user.email).up();
});
const xml = doc.end({ prettyPrint: true });
res.set('Content-Type', 'application/xml')
.send(xml);
} catch (error) {
// SECURE - Generic error message
console.error('Search error:', error);
const errorXml = create({ version: '1.0' })
.ele('error')
.txt('Search failed')
.end();
res.status(500)
.set('Content-Type', 'application/xml')
.send(errorXml);
}
});
app.listen(3000);
Why this works: Regex validation (^[a-zA-Z0-9\s]{1,50}$) blocks XML metacharacters (<, >, &, quotes) before they reach the XML API, preventing injection attempts like q=</name><admin>true</admin><name>. xmlbuilder2's .txt() method automatically escapes any remaining content, providing layered defense - even if validation is bypassed, escaping prevents structural changes.
Generic error messages ("Invalid search query", "Search failed") in XML responses prevent information disclosure - attackers don't learn whether rejection was due to regex mismatch, length limits, or database errors. .forEach() iteration with .ele('user') for each result creates well-formed XML - the API enforces proper nesting and closing tags. res.set('Content-Type', 'application/xml') ensures browsers and clients parse the response as XML (not HTML), preventing MIME confusion attacks.
Try-catch with server-side logging (console.error) captures exceptions for debugging while sending generic errors to clients, supporting troubleshooting without information disclosure. The declarative xmlbuilder2 structure makes the code auditable - reviewers can see the XML schema directly in the API chaining.
This pattern is ideal for Express REST APIs returning XML responses to legacy clients.
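One detail worth checking in handlers like this: query parsers can hand back arrays (?q=a&q=b) or objects instead of strings, and RegExp.prototype.test() coerces its argument to a string before matching. The sketch below, with the hypothetical helper readSearchQuery, accepts only plain strings before applying the same allowlist pattern.

```javascript
// Sketch: accept a query parameter only if it is a single, allowlisted string.
// readSearchQuery is a hypothetical helper name.
const SEARCH_PATTERN = /^[a-zA-Z0-9\s]{1,50}$/;

function readSearchQuery(raw) {
  // Arrays and objects are rejected outright rather than coerced.
  if (typeof raw !== 'string') return null;
  return SEARCH_PATTERN.test(raw) ? raw : null;
}

// The regex alone would accept an array because test() stringifies it:
console.log(SEARCH_PATTERN.test(['alice']));      // true
console.log(readSearchQuery(['alice']));          // null
console.log(readSearchQuery('alice'));            // alice
console.log(readSearchQuery('</name><admin>'));   // null
```

Returning null for anything unexpected lets the handler send its generic 400 response without touching the value again.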
Manual Escaping with 'he' Library
// SECURE - Using 'he' library for XML escaping
const he = require('he');
function validateInput(str, maxLength = 100) {
return typeof str === 'string' && str.length <= maxLength;
}
function createUserXmlWithEscaping(username, email) {
// SECURE - Validate inputs
if (!validateInput(username, 100)) {
throw new Error('Invalid username');
}
if (!validateInput(email, 255)) {
throw new Error('Invalid email');
}
// SECURE - Use 'he' to escape XML entities
const safeUsername = he.encode(username, {
useNamedReferences: false,
encodeEverything: false
});
const safeEmail = he.encode(email, {
useNamedReferences: false,
encodeEverything: false
});
const xml = `<?xml version="1.0"?>
<user>
<username>${safeUsername}</username>
<email>${safeEmail}</email>
</user>`;
return xml;
}
// npm install he
Why this works: he.encode() replaces the characters that are significant in XML markup with character references. With the options shown it emits hexadecimal numeric references (< → &#x3C;, > → &#x3E;, & → &#x26;, ' → &#x27;, " → &#x22;), making it safe to embed user input in manually constructed XML strings. The useNamedReferences: false option matters for XML: he's named references come from the HTML entity set, while XML predefines only five (&lt;, &gt;, &amp;, &apos;, &quot;), so a named reference such as &copy; would be rejected by a strict XML parser. Numeric references are valid in both. encodeEverything: false (the default) leaves printable ASCII that needs no escaping untouched; he.encode() still converts non-ASCII symbols to character references, and encodeEverything: true goes further and encodes every character, which is rarely needed.
Pre-validation (validateInput() with type and length checks) provides defense-in-depth by rejecting non-string and oversized values. It is not an allowlist, so the escaping step is what actually neutralizes markup characters. This pattern is useful when xmlbuilder2 or other DOM APIs are unavailable (e.g., legacy codebases, bundle size constraints). However, manual string concatenation is less preferred than xmlbuilder2 because developers might forget to escape a variable, or escape incorrectly.
he is a small, widely used package by Mathias Bynens. Note: he.encode() output is safe in element content and in quoted attribute values, but it does not help with CDATA sections, comments, element names, or XPath expressions; for those cases xmlbuilder2 is safer. Use this pattern only when bundle size matters and you understand the escaping rules.
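To make the numeric-reference form concrete, here is a hand-rolled sketch that encodes only the five markup characters as hexadecimal references. The function toNumericReferences is hypothetical and much narrower than he.encode(); it only illustrates what the output looks like.

```javascript
// Sketch: encode the five XML markup characters as hexadecimal numeric references.
// toNumericReferences is a hypothetical helper, not he's implementation.
function toNumericReferences(value) {
  return String(value).replace(/[&<>"']/g, (ch) =>
    `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  );
}

console.log(toNumericReferences(`Tom & "Jerry" <b>`));
// Tom &#x26; &#x22;Jerry&#x22; &#x3C;b&#x3E;
```

Any conforming XML parser resolves these references without needing a DTD or entity declarations.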
TypeScript with xmlbuilder2
// SECURE - TypeScript with xmlbuilder2 and validation
import { create } from 'xmlbuilder2';
interface UserData {
username: string;
email: string;
bio?: string;
}
class SecureXmlBuilder {
private readonly USERNAME_PATTERN = /^[a-zA-Z0-9._-]{3,64}$/;
private readonly EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
private validateUsername(username: string): boolean {
return this.USERNAME_PATTERN.test(username);
}
private validateEmail(email: string): boolean {
return this.EMAIL_PATTERN.test(email);
}
public createUserXml(userData: UserData): string {
// SECURE - Validate inputs
if (!this.validateUsername(userData.username)) {
throw new Error('Invalid username format');
}
if (!this.validateEmail(userData.email)) {
throw new Error('Invalid email format');
}
if (userData.bio && userData.bio.length > 500) {
throw new Error('Bio too long');
}
// SECURE - xmlbuilder2 handles escaping
const doc = create({ version: '1.0', encoding: 'UTF-8' })
.ele('user')
.ele('username').txt(userData.username).up()
.ele('email').txt(userData.email).up();
if (userData.bio) {
doc.ele('bio').txt(userData.bio).up();
}
return doc.end({ prettyPrint: true });
}
}
export default SecureXmlBuilder;
Why this works: TypeScript's type system (interface UserData, string parameters) provides compile-time safety: the method signature documents that username and email are strings and catches accidental number or object arguments in typed callers. Types are erased at runtime, so data arriving from requests or parsed JSON still needs the runtime validation shown here. Class-based encapsulation (private methods, readonly patterns) centralizes validation logic, ensuring all XML generation paths enforce the same strict rules. Regex validation (^[a-zA-Z0-9._-]{3,64}$ for usernames, email pattern) blocks XML metacharacters before they reach xmlbuilder2. Length limits (bio.length > 500) prevent DoS via extremely long values.
Optional properties (bio?: string) with runtime checks (if (userData.bio)) handle optional fields safely: the XML only includes <bio> if provided. xmlbuilder2's .txt() method automatically escapes content, so even if bio contains <script>, it becomes &lt;script&gt;. Explicit encoding (encoding: 'UTF-8') declares the encoding in the XML prolog so that consumers decode international characters correctly.
The export default pattern enables easy integration with Express, Next.js, or other frameworks. The explicit return type makes the code self-documenting: reviewers see that createUserXml returns a string (the XML document). This pattern is ideal for TypeScript projects where both type safety and security are critical.
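Because type annotations do not exist at runtime, untyped input (for example a parsed JSON body) benefits from an explicit shape check before it reaches the builder. A minimal sketch in plain JavaScript follows; assertUserData is a hypothetical helper whose field names mirror the UserData interface.

```javascript
// Sketch: runtime shape check for data that claims to be UserData.
// assertUserData is a hypothetical helper; it returns a copy with known fields only.
function assertUserData(input) {
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    throw new TypeError('userData must be an object');
  }
  const { username, email, bio } = input;
  if (typeof username !== 'string') throw new TypeError('username must be a string');
  if (typeof email !== 'string') throw new TypeError('email must be a string');
  if (bio !== undefined && typeof bio !== 'string') {
    throw new TypeError('bio must be a string');
  }
  return { username, email, bio };
}

const parsed = JSON.parse('{"username":"alice","email":"alice@example.com"}');
console.log(assertUserData(parsed).username); // alice
```

The returned object can then be passed to createUserXml, which applies the pattern and length checks.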
Next.js API Route with Security
// SECURE - Next.js API route with validation
// pages/api/user.js
import { create } from 'xmlbuilder2';
function validateUsername(username) {
return /^[a-zA-Z0-9._-]{3,64}$/.test(username);
}
function validateEmail(email) {
return /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
}
export default function handler(req, res) {
if (req.method !== 'GET') {
return res.status(405).json({ error: 'Method not allowed' });
}
const { username, email } = req.query;
// SECURE - Validate inputs
if (!validateUsername(username)) {
const errorXml = create({ version: '1.0' })
.ele('error').txt('Invalid username').end();
return res.status(400)
.setHeader('Content-Type', 'application/xml')
.send(errorXml);
}
if (!validateEmail(email)) {
const errorXml = create({ version: '1.0' })
.ele('error').txt('Invalid email').end();
return res.status(400)
.setHeader('Content-Type', 'application/xml')
.send(errorXml);
}
// SECURE - Build XML with xmlbuilder2
const xml = create({ version: '1.0' })
.ele('user')
.ele('username').txt(username).up()
.ele('email').txt(email).up()
.end({ prettyPrint: true });
res.setHeader('Content-Type', 'application/xml');
res.status(200).send(xml);
}
Why this works: Next.js API routes (pages/api/user.js) run as isolated request handlers, so each one must perform its own validation and error handling. Method checking (req.method !== 'GET') rejects HTTP verbs the endpoint was not designed for, which narrows the attack surface. Regex validation (^[a-zA-Z0-9._-]{3,64}$) blocks XML metacharacters and enforces length limits. xmlbuilder2 (create(), .ele(), .txt()) automatically escapes content, preventing injection.
Early validation (checking inputs before XML construction) provides fail-fast behavior: invalid requests return 400 immediately without further XML processing. res.setHeader('Content-Type', 'application/xml') ensures the response is parsed as XML, not HTML, preventing MIME confusion. Short error messages ("Invalid username", "Invalid email") name the rejected field without revealing whether the rejection was due to the regex, the length limit, or another check.
Separate error XML (create().ele('error').txt('...')) provides a consistent error format for clients. The handler signature shown here is the Pages Router form; App Router route handlers export per-method functions and return a Response object, but the same validation and xmlbuilder2 calls apply.
SOAP with strong-soap Library
// SECURE - Using strong-soap library for SOAP
const { soap } = require('strong-soap');

// strong-soap documents a callback API; wrap it in a Promise for async/await
function createClient(url, options) {
  return new Promise((resolve, reject) => {
    soap.createClient(url, options, (err, client) => (err ? reject(err) : resolve(client)));
  });
}

async function callSoapServiceSecure(userId, action) {
  // SECURE - Validate inputs
  if (!userId || typeof userId !== 'string' || userId.length > 50) {
    throw new Error('Invalid userId');
  }
  if (!action || typeof action !== 'string' || action.length > 50) {
    throw new Error('Invalid action');
  }
  const url = 'https://api.example.com/service?wsdl';
  // SECURE - strong-soap handles XML construction
  const client = await createClient(url, {});
  // SECURE - Library serializes and escapes parameters
  return new Promise((resolve, reject) => {
    client.GetUserData({ UserId: userId, Action: action }, (err, result) =>
      (err ? reject(err) : resolve(result)));
  });
}
// strong-soap library builds proper SOAP envelope with escaping
Why this works: strong-soap (a SOAP client and server library derived from node-soap) constructs SOAP envelopes with proper XML escaping, eliminating manual template literal construction like <UserId>${userId}</UserId>. createClient() fetches the WSDL, parses the service definition, and exposes the SOAP operations as methods on the client (GetUserData here, assuming the WSDL defines that operation).
The parameter object ({ UserId: userId, Action: action }) is serialized to XML by the library, with automatic escaping of special characters: even if userId contains </UserId><Role>admin</Role>, it becomes &lt;/UserId&gt;&lt;Role&gt;.... Pre-validation (typeof userId !== 'string', length > 50) provides defense-in-depth and prevents DoS via extremely long strings. Wrapping the callbacks in Promises lets callers use async/await with try-catch for error handling.
The library parses SOAP responses into JavaScript objects, avoiding manual XML parsing vulnerabilities. Before adopting strong-soap or the node-soap package it derives from, check the library's current maintenance status and its support for the features you need, such as SOAP 1.1/1.2 and WS-Security. This pattern is ideal for enterprise SOAP integrations where manual envelope construction is error-prone and WSDL compliance is required.
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
- Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
- Code review: Confirm all instances use the secure pattern (parameterized queries, safe APIs, proper encoding) with no string concatenation or unsafe operations
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
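The manual and edge-case checks above can be partly automated. The sketch below is a rough heuristic under stated assumptions: it compares the number of "<" characters in the output for a benign value against the output for each payload, so any payload that adds markup is flagged. The payload list and both builder functions are hypothetical; a fuller test would parse the output with a real XML parser and compare the element structure.

```javascript
// Sketch: flag payloads that change the number of tags a builder emits.
const PAYLOADS = [
  '</username><admin>true</admin><username>',
  '<![CDATA[<admin>true</admin>]]>',
  '<!-- comment -->',
  '&lt;already-escaped&gt; & more',
];

function countTagOpeners(xml) {
  return (xml.match(/</g) || []).length;
}

function findInjectionFailures(buildXml) {
  const baseline = countTagOpeners(buildXml('benign'));
  return PAYLOADS.filter((p) => countTagOpeners(buildXml(p)) !== baseline);
}

// Hypothetical builders under test.
const vulnerable = (v) => `<user><username>${v}</username></user>`;
const escaped = (v) => {
  const safe = v.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
  return `<user><username>${safe}</username></user>`;
};

console.log(findInjectionFailures(vulnerable).length); // 3
console.log(findInjectionFailures(escaped).length);    // 0
```

The fourth payload contains no raw "<", so even the vulnerable builder passes it; that is why the heuristic complements, rather than replaces, parser-based checks.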