CWE-80: Improper Neutralization of Script-Related HTML Tags (Basic XSS)

Overview

CWE-80 is a specific type of Cross-Site Scripting (XSS) that occurs when an application fails to properly neutralize or incorrectly neutralizes script-related HTML tags (like <script>, <img>, <iframe>) in web page output. This allows attackers to inject malicious scripts that execute in victims' browsers.

Note: CWE-80 is a more specific subset of CWE-79 (general XSS). While CWE-79 covers all XSS contexts (HTML, JavaScript, CSS, URLs), CWE-80 specifically focuses on basic HTML tag injection. The remediation strategies are nearly identical.

OWASP Classification

A05:2025 - Injection

Risk

High to Critical: Attackers can execute arbitrary JavaScript in the victim's browser, leading to:

Session hijacking (cookie theft)
Credential theft via fake login forms
Page defacement
Malware distribution
Performing actions on behalf of the victim
Exposure of all user data accessible to the application

Remediation Steps

Core principle: Never include untrusted input in HTML output without context-appropriate output encoding so it cannot be interpreted as executable markup or script.

Trace the Data Path

Analyze how untrusted data reaches web page output:

Source: Identify where untrusted data enters (user input, external files, databases, network requests, cookies, headers)
Data Flow: Trace transformations between source and output
Sink: Locate where data is rendered (response writing, template rendering, DOM manipulation)
Output Context: Determine where data appears (HTML body, attribute, JavaScript, CSS, URL)
Missing Encoding: Check for encoding/escaping functions (or their absence)

Apply Context-Aware Output Encoding (Primary Defense)

Always encode untrusted data based on output context:

HTML Body Context:

Encode <, >, &, ", ' as HTML entities
< → &lt;, > → &gt;, & → &amp;, " → &quot;, ' → &#39;
Use framework-provided encoding functions
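Where no framework encoder is available, the mapping above can be sketched as a small function. This is a minimal illustration, not a replacement for a maintained library; the name escapeHtml and the entity table are assumptions made for this example.

```javascript
// Entity table for the five characters that are significant in HTML text
// and in quoted attribute values. (Hypothetical helper for illustration.)
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Encode untrusted text for the HTML body or a quoted attribute value.
// A single-pass replace avoids re-encoding the '&' of an entity just emitted.
function escapeHtml(untrusted) {
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const greeting = `<div>Welcome, ${escapeHtml("<script>alert('XSS')</script>")}</div>`;
console.log(greeting);
```

The same function covers quoted attribute values because it encodes both quote characters; it does not make a value safe inside a script block, an event handler, or a URL attribute.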

HTML Attribute Context:

Always quote attributes (<div class="value">, not <div class=value>)
Encode quotes and HTML special characters
Avoid placing untrusted data in event handler attributes (onclick, onerror, etc.)

JavaScript Context:

Avoid placing untrusted data directly in <script> tags
If unavoidable, use JavaScript encoding (escape quotes, backslashes, newlines)
Better: Pass data via data attributes and read with JavaScript
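When a value really must be embedded in an inline script, one common approach is to serialize it as JSON and additionally escape the characters that could end the script block. The sketch below assumes the value is JSON-serializable; serializeForScript is a hypothetical name.

```javascript
// Hypothetical helper: serialize a JSON-compatible value for an inline <script> block.
// JSON.stringify handles quotes, backslashes, and control characters; the extra
// replacements stop "</script>" or "<!--" from terminating or confusing the block.
function serializeForScript(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const userInput = "'; alert('XSS'); //</script>";
const serialized = serializeForScript(userInput);
const inlineScript = `<script>var username = ${serialized};</script>`;
console.log(inlineScript);
```

The escaped form is still a valid JavaScript string literal, so the page script receives the original value unchanged.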

URL Context:

URL-encode special characters (percent-encoding)
Validate URL scheme (only allow http:// and https://)
Never allow javascript:, data:, or vbscript: URLs
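As a rough illustration of percent-encoding, the standard URL and encodeURIComponent APIs can place an untrusted value into a query string or path segment. The function names and the example.com address are placeholders chosen for this sketch.

```javascript
// Hypothetical helper: append an untrusted search term to a fixed, trusted endpoint.
// URLSearchParams percent-encodes the value, so it cannot add extra parameters.
function buildSearchLink(baseUrl, query) {
  const url = new URL(baseUrl);
  url.searchParams.set('q', query);
  return url.toString();
}

// For a single path segment, encodeURIComponent is the usual choice.
function buildProfilePath(username) {
  return `/users/${encodeURIComponent(username)}`;
}

const link = buildSearchLink('https://example.com/search', 'a&b=c <script>');
console.log(link);
console.log(buildProfilePath('../admin?x=1'));
```

Percent-encoding only protects the structure of the URL; the resulting string still needs HTML attribute encoding when written into href.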

CSS Context:

Avoid untrusted data in CSS contexts
If unavoidable, use CSS encoding
Never allow expression() or @import directives
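If a user-chosen value such as a theme color has to reach a style, an allowlist is often simpler than CSS encoding. The following sketch assumes only hex colors are needed; safeColor and its fallback value are hypothetical.

```javascript
// Hypothetical allowlist: accept only 3- or 6-digit hex colors such as #fff or #1a2b3c.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

// Return the value if it matches the allowlist, otherwise a fixed fallback.
function safeColor(untrusted, fallback = '#000000') {
  return typeof untrusted === 'string' && HEX_COLOR.test(untrusted)
    ? untrusted
    : fallback;
}

console.log(safeColor('#1a2b3c'));
console.log(safeColor('red; background: url(javascript:alert(1))'));
```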

Use Safe APIs and Avoid Dangerous Functions

Leverage framework protections:

Safe DOM Manipulation:

// SAFE - Sets text content only, no HTML parsing
element.textContent = userInput;
element.innerText = userInput;
element.setAttribute('data-value', userInput);

// DANGEROUS - Avoid with untrusted data
element.innerHTML = userInput;  // NO
document.write(userInput);      // NO

Template Engines with Auto-Escaping:

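Server-side: Thymeleaf (th:text) and Razor escape by default; Jinja2 escapes only when autoescaping is enabled (it is off in a bare Jinja2 Environment, but Flask turns it on for HTML templates)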
Client-side: React, Vue, Angular (auto-escape by default)
Verify auto-escaping is enabled and not bypassed

Never Use Framework Security Bypasses with Untrusted Data

Modern frameworks have "escape hatches" that bypass XSS protection. Never use these with untrusted data:

React: dangerouslySetInnerHTML
Angular: bypassSecurityTrustHtml(), bypassSecurityTrustScript(), bypassSecurityTrustUrl()
Vue.js: v-html directive
Jinja2: {{ data | safe }}
Thymeleaf: th:utext
Razor: @Html.Raw()

Key Principle: Treat any API with "unsafe", "raw", "bypass", "dangerously", or "trust" in its name as a potential XSS sink, and review every use.

Add Input Validation and CSP (Defense in Depth)

Input Validation (supplementary):

Validate expected data format (email, phone, numeric, alphanumeric)
Use allowlists for enumerated values
Rejecting input that contains <script>, event handlers, or javascript: URLs can stop crude attacks, but such denylists are easy to bypass and must not be relied on
Warning: Input validation alone is insufficient - encoding is still required
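Allowlist validation can be sketched as follows. The field names, the username rule, and the function names are assumptions chosen for the example; the validated value still has to be encoded on output.

```javascript
// Hypothetical allowlist of enumerated values for a "sort" parameter.
const ALLOWED_SORT_FIELDS = new Set(['name', 'date', 'price']);

// Hypothetical format rule: 3 to 32 letters, digits, or underscores.
const USERNAME_PATTERN = /^[A-Za-z0-9_]{3,32}$/;

// Return the value if it is an allowed field, otherwise null.
function validateSortField(value) {
  return ALLOWED_SORT_FIELDS.has(value) ? value : null;
}

// Accept only strings that match the expected format in full.
function isValidUsername(value) {
  return typeof value === 'string' && USERNAME_PATTERN.test(value);
}

console.log(validateSortField('price'));
console.log(isValidUsername('<script>alert(1)</script>'));
```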

Content Security Policy (CSP):

Implement a strict CSP header to prevent inline scripts from running
Disallow unsafe-inline and unsafe-eval
Use nonces or hashes for legitimate inline scripts
Restrict script sources to trusted domains
CSP is defense-in-depth, not a replacement for encoding
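A nonce-based policy might be assembled as in the sketch below, which assumes a Node.js server. The directive set is one reasonable starting point rather than a recommended policy for every application, and the function names are hypothetical.

```javascript
const nodeCrypto = require('crypto');

// Generate a fresh, unpredictable nonce for each HTTP response.
function generateNonce() {
  return nodeCrypto.randomBytes(16).toString('base64');
}

// Build a Content-Security-Policy value with no 'unsafe-inline' or 'unsafe-eval'.
// Inline scripts run only if they carry the matching nonce attribute.
function buildCspHeader(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
    "base-uri 'none'",
  ].join('; ');
}

const nonce = generateNonce();
console.log(buildCspHeader(nonce));
```

The same nonce value would be written into each legitimate inline tag, for example <script nonce="...">, and must never be reused across responses.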

Test with XSS Payloads

Verify your encoding:

Basic XSS:

<script>alert(1)</script>
<img src=x onerror='alert(1)'>
<svg onload=alert(1)>

Context-specific:

Attribute injection: " onclick="alert(1)"
JavaScript injection: '; alert(1); //
URL injection: javascript:alert(1)

Verification:

Verify payloads are displayed as text (not executed)
Check browser console for errors
Inspect encoded output in DevTools
Ensure legitimate functionality works
Run automated scanners (OWASP ZAP, Burp Suite)
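The manual checks above can be complemented by a small regression check that feeds the sample payloads through the encoder under test. In this sketch encodeForHtml is a stand-in; in practice the application's real rendering path would be substituted.

```javascript
// Payloads taken from the lists above.
const PAYLOADS = [
  '<script>alert(1)</script>',
  "<img src=x onerror='alert(1)'>",
  '<svg onload=alert(1)>',
  '" onclick="alert(1)"',
];

// Stand-in encoder for the sketch (assumption: replace with the real output path).
function encodeForHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Return every payload whose encoded form still contains a raw markup character.
function findUnencoded(payloads, encode) {
  return payloads.filter((payload) => /[<>"']/.test(encode(payload)));
}

const failures = findUnencoded(PAYLOADS, encodeForHtml);
console.log(failures.length === 0 ? 'all payloads neutralized' : failures);
```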

Common Vulnerable Patterns

Direct HTML Injection

<!-- VULNERABLE - User input directly in HTML body (template auto-escaping disabled or unavailable) -->
<div>Welcome, {{ username }}</div>

<!-- If username = <script>alert('XSS')</script> -->
<!-- Renders as: Welcome, <script>alert('XSS')</script> -->
<!-- Script executes! -->

Attribute Injection

<!-- VULNERABLE - Unquoted attribute -->
<input type="text" value={{ userValue }}>

<!-- If userValue = "x onfocus=alert('XSS') autofocus" -->
<!-- Renders as: <input type="text" value=x onfocus=alert('XSS') autofocus> -->

<!-- VULNERABLE - Even quoted attributes can be exploited -->
<div id="{{ userId }}" onclick="loadUser('{{ userId }}')"></div>
<!-- If userId = x'); alert('XSS'); // -->

JavaScript Context Injection

<!-- VULNERABLE - User input in JavaScript -->
<script>
  var username = '{{ userInput }}';
</script>

<!-- If userInput = '; alert('XSS'); // -->
<!-- Becomes: var username = ''; alert('XSS'); //'; -->

URL Injection

<!-- VULNERABLE - Unvalidated URL -->
<a href="{{ userUrl }}">Click here</a>

<!-- If userUrl = javascript:alert('XSS') -->
<!-- Clicking executes JavaScript -->

innerHTML and document.write

// VULNERABLE - Using innerHTML with user data
document.getElementById('output').innerHTML = userInput;

// VULNERABLE - document.write executes scripts
document.write(userInput);

// VULNERABLE - eval executes arbitrary code
eval(userInput);

Secure Patterns

Use Framework Auto-Escaping

Modern frameworks provide automatic HTML escaping by default:

React: <div>{userInput}</div> - auto-escapes
Angular: {{ userInput }} - auto-escapes
Vue: {{ userInput }} - auto-escapes
Jinja2 (Python): {{ userInput }} - auto-escapes when autoescaping is enabled (Flask enables it for HTML templates)
Thymeleaf (Java): <div th:text="${userInput}"></div> - auto-escapes
Razor (C#): @userInput - auto-escapes

Always verify that auto-escaping is enabled and not bypassed with dangerous APIs.

Why this works: Template engines with auto-escaping HTML-encode every variable interpolation at render time. When you write {{ userInput }} or {userInput} in a template, the engine converts <, >, &, ", and ' into their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, &#39;) before emitting the output, so the browser treats the value as text rather than markup. Even if the variable contains <script>alert(1)</script>, it is displayed as plain text. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly opt out (using dangerouslySetInnerHTML, v-html, @Html.Raw(), etc.) to render raw HTML, which makes these dangerous usages easy to spot in code reviews. Note that most engines apply HTML escaping only. That is sufficient for HTML body text and quoted attribute values, but not for values placed inside <script> blocks, event handler attributes, CSS, or URL attributes such as href. A few engines (for example Go's html/template) select the encoding from the surrounding context; with the others, apply the context-specific rules described under Remediation Steps.

Manual Encoding When Needed

When frameworks are unavailable or for specific contexts, use language-appropriate encoding:

HTML context: HTML entity encoding (< → &lt;, > → &gt;)
JavaScript context: JavaScript string escaping
URL context: URL/percent encoding
Attribute context: Quote attributes and encode special characters

See the Language-Specific Guidance section below.

Why this works: Manual encoding functions apply context-specific transformations that neutralize special characters before they reach the browser. HTML encoding (like htmlspecialchars() in PHP, html.escape() in Python, or Encode.forHtml() from the OWASP Java Encoder) converts characters with special meaning in HTML (<>"'&) into HTML entities. For example, <script> becomes &lt;script&gt;, which browsers render as plain text rather than an executable script tag. Different contexts require different encoding: JavaScript string contexts need backslash, hex, or Unicode escape sequences (\xHH, \uHHHH) for quotes, backslashes, and control characters; URL contexts need percent-encoding (%20 for a space) to prevent parameter injection; and HTML attributes require quoting plus entity encoding. Using established encoding libraries (rather than writing custom regex replacements) gives more complete character coverage, handles edge cases such as Unicode input, and benefits from security fixes as new attack vectors are discovered. The key principle is to encode at the output boundary, at the point where data moves from application code into an HTML, JavaScript, URL, or CSS context.

Safe DOM Manipulation

Use text-only APIs that don't parse HTML:

// SAFE - Sets text only, no HTML parsing
element.textContent = userInput;
element.innerText = userInput;
element.setAttribute('data-value', userInput);

// DANGEROUS - Avoid these
element.innerHTML = userInput;  // NO
document.write(userInput);      // NO

Why this works: Text-only DOM APIs treat content as plain text rather than parsing it as HTML. When you set textContent, the browser stores the value as the content of a text node; there is no parsing step in which injected markup could become elements or handlers. innerHTML, by contrast, parses the string as HTML and creates DOM elements from it. Script elements inserted this way do not run, but event handler attributes such as onerror on an injected <img> do, so innerHTML with untrusted data remains exploitable. Using textContent, createTextNode(), or createElement() + appendChild() gives you programmatic control over the DOM structure, so untrusted strings never pass through the HTML parser. setAttribute() is safe for data attributes because the value is stored as text, but validate attribute names, never pass untrusted values to event handler attributes such as onclick, and validate the scheme before setting URL attributes such as href. These APIs also avoid the cost of HTML parsing.
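The createElement() + appendChild() approach described above might look like the following sketch. renderComment is a hypothetical helper; it takes the Document as a parameter so the same code can run against a browser document or a test double.

```javascript
// Hypothetical helper: build a comment list item from untrusted text.
// No string is ever handed to the HTML parser; all values go in as text.
// `doc` is the Document to create nodes in (pass `document` in a browser).
function renderComment(doc, author, body) {
  const item = doc.createElement('li');

  const name = doc.createElement('strong');
  name.textContent = author; // stored as text, never parsed as markup

  const text = doc.createElement('span');
  text.textContent = body;

  item.appendChild(name);
  item.appendChild(text);
  return item;
}

// Browser usage (assumes an element with id "comments" exists on the page):
// document.getElementById('comments').appendChild(renderComment(document, author, body));
```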

URL Validation

Always validate URL schemes before rendering links:

// Validate URL scheme
function isSafeUrl(url) {
  try {
    const parsed = new URL(url, window.location.origin);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}

const href = isSafeUrl(userUrl) ? userUrl : '#';
link.setAttribute('href', href);

Why this works: URL validation prevents javascript: URL attacks, in which attackers inject executable code through link URLs. The URL constructor parses the string and exposes its components (protocol, host, path, query), so the protocol can be checked against an allowlist of safe schemes. The example allows only http: and https:; other schemes such as mailto: can be added if the application needs them. If an attacker provides javascript:alert('XSS'), the parsed protocol is javascript:, the allowlist check fails, and the code falls back to a safe # anchor. The second constructor argument supplies a base URL, so relative URLs such as /profile resolve against the current origin and pass the check, while javascript: and data: URLs keep their own scheme and are rejected. Combining protocol validation with HTML encoding of the href attribute value provides defense-in-depth: even if validation is bypassed, encoding prevents the value from breaking out of the attribute. This approach matters for user-generated links, redirect URLs, and any other scenario where URLs come from untrusted sources.

Common Mistakes to Avoid

Encoding at the wrong time: Encode at output time, not storage time
- Storing encoded data breaks search and sorting and ties the data to a single output context
- Encode when rendering to HTML
Wrong encoding for context: HTML encoding in JavaScript won't prevent XSS
- HTML encoding doesn't escape JavaScript special characters
- Use context-appropriate encoding (JavaScript encoding for <script> tags)
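- Example: in onclick="loadUser('{{ userId }}')", the browser decodes HTML entities in the attribute before running the handler, so an HTML-encoded x'); alert('XSS'); // still executes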
Trusting client-side validation: Always validate/encode on server
- Client-side validation can be bypassed via browser DevTools or HTTP clients
- Server-side encoding is mandatory
Filtering instead of encoding: Denylist filters are incomplete
- Attackers bypass filters with tricks like: <scr<script>ipt>, <img src=x onerror=alert(1)>
- Use allowlist validation + proper encoding
Double encoding: Don't encode already-encoded data
- Encoding twice results in visible HTML entities: &lt;script&gt; instead of <script>
- Track whether data has already been encoded
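The filtering mistake can be demonstrated in a few lines. The naive filter below is a deliberately flawed illustration, not a recommended approach, and both function names are hypothetical.

```javascript
// Deliberately flawed denylist filter: removes literal script tags in a single pass.
function naiveStripScriptTags(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Output encoding, for comparison.
function encodeForHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

const payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';

// Removing the inner tags reassembles the outer ones.
const filtered = naiveStripScriptTags(payload);
console.log(filtered); // <script>alert(1)</script>

// Encoding leaves no raw markup, whatever the nesting.
const encoded = encodeForHtml(payload);
console.log(encoded.includes('<')); // false
```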

Language-Specific Guidance

C# - HtmlEncoder.Default.Encode (System.Text.Encodings.Web), WebUtility.HtmlEncode
Java - OWASP Java Encoder (Encode.forHtml, Encode.forHtmlAttribute, Encode.forJavaScript)
JavaScript/Node.js - DOMPurify, validator.js, escape-html
Perl - HTML::Entities (encode_entities)
PHP - htmlspecialchars with ENT_QUOTES
Python - html.escape, MarkupSafe

Dynamic Scan Guidance

For guidance on remediating this CWE when detected by dynamic (DAST) scanners:

Dynamic Scan Guidance - Analyzing DAST findings and mapping to source code