CWE-80: Improper Neutralization of Script-Related HTML Tags (Basic XSS)
Overview
CWE-80 is a specific type of Cross-Site Scripting (XSS) that occurs when an application fails to properly neutralize or incorrectly neutralizes script-related HTML tags (like <script>, <img>, <iframe>) in web page output. This allows attackers to inject malicious scripts that execute in victims' browsers.
Note: CWE-80 is a more specific subset of CWE-79 (general XSS). While CWE-79 covers all XSS contexts (HTML, JavaScript, CSS, URLs), CWE-80 specifically focuses on basic HTML tag injection. The remediation strategies are nearly identical.
OWASP Classification
A05:2025 - Injection
Risk
High to Critical: Attackers can execute arbitrary JavaScript in the victim's browser, leading to:
- Session hijacking (cookie theft)
- Credential theft via fake login forms
- Page defacement
- Malware distribution
- Performing actions on behalf of the victim
- All user data accessible to the application is at risk
Remediation Steps
Core principle: Never include untrusted input in HTML output without context-appropriate output encoding so it cannot be interpreted as executable markup or script.
Trace the Data Path
Analyze how untrusted data reaches web page output:
- Source: Identify where untrusted data enters (user input, external files, databases, network requests, cookies, headers)
- Data Flow: Trace transformations between source and output
- Sink: Locate where data is rendered (response writing, template rendering, DOM manipulation)
- Output Context: Determine where data appears (HTML body, attribute, JavaScript, CSS, URL)
- Missing Encoding: Check for encoding/escaping functions (or their absence)
Apply Context-Aware Output Encoding (Primary Defense)
Always encode untrusted data based on output context:
HTML Body Context:
- Encode <, >, &, ", and ' as HTML entities (< → &lt;, > → &gt;, & → &amp;)
- Use framework-provided encoding functions
HTML Attribute Context:
- Always quote attributes (<div class="value">, not <div class=value>)
- Encode quotes and HTML special characters
- Avoid placing untrusted data in event handler attributes (onclick, onerror, etc.)
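The body and attribute rules above can be sketched with a small encoder. This is a minimal illustration, assuming hand-written encoding is needed at all; `escapeHtml` and `renderGreeting` are hypothetical names, and a maintained framework or library encoder is preferable in real code.

```javascript
// Hypothetical helper: encodes the five characters that matter in the
// HTML body and quoted-attribute contexts.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(value) {
  // Coerce to string so numbers, null, and undefined cannot throw.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Hypothetical usage: the attribute is always quoted, and the value is
// encoded before it is placed in either context.
function renderGreeting(username) {
  const safe = escapeHtml(username);
  return `<div title="${safe}">Welcome, ${safe}</div>`;
}

console.log(renderGreeting('<script>alert(1)</script>'));
console.log(renderGreeting('" onclick="alert(1)'));
```

The encoder only protects the HTML body and quoted attributes. It does not make a value safe inside a script block, a URL, or an event handler attribute.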
JavaScript Context:
- Avoid placing untrusted data directly in <script> tags
- If unavoidable, use JavaScript encoding (escape quotes, backslashes, newlines)
- Better: Pass data via data attributes and read with JavaScript
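When a value has to be embedded in a script block, one common approach is to serialize it as JSON and additionally escape the characters that could end the script element. The sketch below assumes server-side rendering in Node.js; `serializeForScript` and `embedUsername` are hypothetical names, not part of any framework.

```javascript
// Hypothetical helper: serializes a value for embedding inside a <script>
// block. JSON.stringify handles quotes, backslashes, and newlines; the
// extra replacements stop sequences like </script> from closing the block.
function serializeForScript(value) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    // JSON.stringify returns undefined for undefined and for functions.
    throw new TypeError('value cannot be serialized as JSON');
  }
  return json
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Hypothetical usage: the serialized value is a complete JavaScript string
// literal, so no surrounding quotes are added in the template.
function embedUsername(userInput) {
  return `<script>var username = ${serializeForScript(userInput)};</script>`;
}

console.log(embedUsername("'; alert('XSS'); //"));
console.log(embedUsername('</script><script>alert(1)</script>'));
```

Because the escapes are valid JSON and valid JavaScript, the browser reconstructs the original string unchanged while the HTML parser never sees a raw angle bracket inside the block.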
URL Context:
- URL-encode special characters (percent-encoding)
- Validate URL scheme (only allow http:// and https://)
- Never allow javascript:, data:, or vbscript: URLs
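Percent-encoding applies to untrusted values placed inside a URL, such as a query parameter. A minimal sketch using the standard `encodeURIComponent` function follows; `buildSearchHref` and the `q` parameter name are assumptions made for illustration.

```javascript
// Hypothetical helper: places an untrusted value in a query parameter.
// encodeURIComponent percent-encodes characters such as &, =, ", <, >,
// and spaces, so the value cannot add parameters or break the URL.
function buildSearchHref(basePath, query) {
  return `${basePath}?q=${encodeURIComponent(query)}`;
}

console.log(buildSearchHref('/search', 'a&b=c d'));
console.log(buildSearchHref('/search', '"><script>alert(1)</script>'));
```

Note that `encodeURIComponent` leaves a few characters, including the single quote, unencoded. The resulting URL should therefore still be HTML-encoded when it is written into an attribute.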
CSS Context:
- Avoid untrusted data in CSS contexts
- If unavoidable, use CSS encoding
- Never allow expression() or @import directives
Use Safe APIs and Avoid Dangerous Functions
Leverage framework protections:
Safe DOM Manipulation:
// SAFE - Sets text content only, no HTML parsing
element.textContent = userInput;
element.innerText = userInput;
element.setAttribute('data-value', userInput);
// DANGEROUS - Avoid with untrusted data
element.innerHTML = userInput; // NO
document.write(userInput); // NO
Template Engines with Auto-Escaping:
- Server-side: Thymeleaf and Razor auto-escape by default; Jinja2 auto-escapes only when autoescape is enabled (Flask enables it for HTML templates)
- Client-side: React, Vue, Angular (auto-escape by default)
- Verify auto-escaping is enabled and not bypassed
Never Use Framework Security Bypasses with Untrusted Data
Modern frameworks have "escape hatches" that bypass XSS protection. Never use these with untrusted data:
- React: dangerouslySetInnerHTML
- Angular: bypassSecurityTrustHtml(), bypassSecurityTrustScript(), bypassSecurityTrustUrl()
- Vue.js: v-html directive
- Jinja2: {{ data | safe }}
- Thymeleaf: th:utext
- Razor: @Html.Raw()
Key Principle: Any API with "unsafe", "raw", "bypass", "dangerously", or "trust" in the name is a security risk.
Add Input Validation and CSP (Defense in Depth)
Input Validation (supplementary):
- Validate expected data format (email, phone, numeric, alphanumeric)
- Use allowlists for enumerated values
- Reject input containing <script>, event handlers, or javascript: URLs
- Warning: Input validation alone is insufficient - encoding is still required
Content Security Policy (CSP):
- Implement strict CSP header to prevent inline scripts
- Disallow unsafe-inline and unsafe-eval
- Use nonces or hashes for legitimate inline scripts
- Restrict script sources to trusted domains
- CSP is defense-in-depth, not a replacement for encoding
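A nonce-based policy can be sketched as follows, assuming a Node.js server. The directive set shown is an illustrative assumption rather than a policy prescribed by this guidance, and `generateNonce` and `buildCspHeader` are hypothetical names.

```javascript
const nodeCrypto = require('node:crypto');

// Generates a fresh, unpredictable nonce. A new one is needed for every
// response; reusing a nonce defeats its purpose.
function generateNonce() {
  return nodeCrypto.randomBytes(16).toString('base64');
}

// Builds a Content-Security-Policy header value. The directives here are
// an assumed starting point and should be adapted to the application.
function buildCspHeader(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
    "base-uri 'none'",
  ].join('; ');
}

const nonce = generateNonce();
console.log('Content-Security-Policy:', buildCspHeader(nonce));
// The same nonce is placed on each legitimate inline script.
console.log(`<script nonce="${nonce}">/* trusted inline script */</script>`);
```

Inline scripts without the matching nonce, including injected ones, are blocked by browsers that enforce the policy, while the absence of unsafe-inline and unsafe-eval keeps the policy strict.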
Test with XSS Payloads
Verify your encoding:
Basic XSS:
<script>alert(1)</script>
<img src=x onerror='alert(1)'>
<svg onload=alert(1)>
Context-specific:
- Attribute injection: " onclick="alert(1)"
- JavaScript injection: '; alert(1); //
- URL injection: javascript:alert(1)
Verification:
- Verify payloads displayed as text (not executed)
- Check browser console for errors
- Inspect encoded output in DevTools
- Ensure legitimate functionality works
- Run automated scanners (OWASP ZAP, Burp Suite)
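Part of this verification can be automated with a small regression check. The sketch below runs the HTML payloads listed above through an encoder and reports any whose output still contains raw markup characters. `encodeForHtml` is a hypothetical stand-in for whatever encoder the application really uses, and `findUnencoded` is likewise an assumed name.

```javascript
// HTML-context payloads from the list above.
const PAYLOADS = [
  '<script>alert(1)</script>',
  "<img src=x onerror='alert(1)'>",
  '<svg onload=alert(1)>',
  '" onclick="alert(1)"',
];

// Hypothetical encoder under test; replace it with the real one.
function encodeForHtml(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Returns the payloads whose encoded form still contains <, >, ", or '.
function findUnencoded(encode, payloads) {
  return payloads.filter((payload) => /[<>"']/.test(encode(payload)));
}

console.log('Failures with encoder:', findUnencoded(encodeForHtml, PAYLOADS));
console.log('Failures without encoding:', findUnencoded((v) => v, PAYLOADS).length);
```

This only covers the HTML body and quoted-attribute contexts. The JavaScript and URL payloads need checks specific to those contexts, and a check like this complements rather than replaces testing in a real browser.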
Common Vulnerable Patterns
Direct HTML Injection
<!-- VULNERABLE - User input directly in HTML body -->
<div>Welcome, {{ username }}</div>
<!-- If username = <script>alert('XSS')</script> -->
<!-- Renders as: Welcome, <script>alert('XSS')</script> -->
<!-- Script executes! -->
Attribute Injection
<!-- VULNERABLE - Unquoted attribute -->
<input type="text" value={{ userValue }}>
<!-- If userValue = x autofocus onfocus=alert('XSS') -->
<!-- Renders as: <input type="text" value=x autofocus onfocus=alert('XSS')> -->
<!-- VULNERABLE - Even quoted attributes can be exploited -->
<div id="{{ userId }}" onclick="loadUser('{{ userId }}')"></div>
<!-- If userId = x'); alert('XSS'); // -->
JavaScript Context Injection
// VULNERABLE - User input in JavaScript
<script>
var username = '{{ userInput }}';
</script>
// If userInput = '; alert('XSS'); //
// Becomes: var username = ''; alert('XSS'); //';
URL Injection
<!-- VULNERABLE - Unvalidated URL -->
<a href="{{ userUrl }}">Click here</a>
<!-- If userUrl = javascript:alert('XSS') -->
<!-- Clicking executes JavaScript -->
innerHTML and document.write
// VULNERABLE - Using innerHTML with user data
document.getElementById('output').innerHTML = userInput;
// VULNERABLE - document.write executes scripts
document.write(userInput);
// VULNERABLE - eval executes arbitrary code
eval(userInput);
Secure Patterns
Use Framework Auto-Escaping
Modern frameworks provide automatic HTML escaping by default:
- React: <div>{userInput}</div> - auto-escapes
- Angular: {{ userInput }} - auto-escapes
- Vue: {{ userInput }} - auto-escapes
- Jinja2 (Python): {{ userInput }} - auto-escapes when autoescape is enabled (Flask enables it for HTML templates)
- Thymeleaf (Java): <div th:text="${userInput}"></div> - auto-escapes
- Razor (C#): @userInput - auto-escapes
Always verify that auto-escaping is enabled and not bypassed with dangerous APIs.
Why this works: These frameworks HTML-encode variable interpolations by default. When you write {{ userInput }} or {userInput} in a template, the rendering layer neutralizes the value before it reaches the page. Dangerous characters such as <, >, &, ", and ' are converted into their HTML entity equivalents (&lt;, &gt;, &amp;, etc.), so the browser cannot interpret user input as executable markup. Because the encoding happens at render time, a value like <script>alert(1)</script> is displayed as plain text. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly opt out (using dangerouslySetInnerHTML, v-html, @Html.Raw(), etc.) to render raw HTML, which makes these dangerous usages easy to spot in code reviews. Default auto-escaping usually targets the HTML body and quoted-attribute contexts. Only some engines adjust their encoding for JavaScript, CSS, or URL contexts, so untrusted data placed there still needs the context-specific handling described earlier.
Manual Encoding When Needed
When frameworks are unavailable or for specific contexts, use language-appropriate encoding:
- HTML context: HTML entity encoding (< → &lt;, > → &gt;)
- JavaScript context: JavaScript string escaping
- URL context: URL/percent encoding
- Attribute context: Quote attributes and encode special characters
See the Language-Specific Guidance section below.
Why this works: Manual encoding functions apply context-specific transformations that neutralize special characters before they reach the browser. HTML encoding (like htmlspecialchars() in PHP, html.escape() in Python, or Encode.forHtml() from the OWASP Java Encoder) converts characters with special meaning in HTML (<, >, ", ', &) into HTML entities. For example, <script> becomes &lt;script&gt;, which browsers render as plain text rather than as an executable script tag. Different contexts require different encoding: JavaScript contexts need backslash escaping or hex/Unicode escape sequences (\xHH, \uHHHH) to handle quotes and control characters, URL contexts need percent-encoding (%20 for spaces) to prevent parameter injection, and HTML attributes require both quoting and entity encoding. Using language-standard encoding libraries (rather than writing custom regex replacements) ensures comprehensive character coverage, handles edge cases like Unicode characters, and receives security updates as new attack vectors are discovered. The key principle is encoding at the output boundary: the point where data transitions from application code to an HTML, JavaScript, URL, or CSS context.
Safe DOM Manipulation
Use text-only APIs that don't parse HTML:
// SAFE - Sets text only, no HTML parsing
element.textContent = userInput;
element.innerText = userInput;
element.setAttribute('data-value', userInput);
// DANGEROUS - Avoid these
element.innerHTML = userInput; // NO
document.write(userInput); // NO
Why this works: Text-only DOM APIs treat content as plain text rather than parsing it as HTML. When you set textContent, the browser stores the value as text node content without any HTML interpretation, so there is no parsing step in which injected markup could become live. This is fundamentally different from innerHTML, which parses the string as HTML and creates DOM elements. A <script> element inserted through innerHTML does not run, but event handler attributes such as the onerror in <img src=x onerror=alert(1)> do, so innerHTML with untrusted data remains exploitable. Using textContent, createTextNode(), or createElement() + appendChild() gives you direct, programmatic control over the DOM structure, so untrusted strings are never interpreted as markup. The setAttribute() method is also safe for data attributes because attribute values are treated as text. However, never let untrusted data choose the attribute name, validate values for URL attributes such as href, and avoid event handler attributes such as onclick. This approach has two practical benefits: you work with DOM objects rather than concatenated strings, and the browser skips HTML parsing entirely.
URL Validation
Always validate URL schemes before rendering links:
// Validate URL scheme
function isSafeUrl(url) {
  try {
    const parsed = new URL(url, window.location.origin);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}
const href = isSafeUrl(userUrl) ? userUrl : '#';
link.setAttribute('href', href);
Why this works: URL validation prevents javascript: URL attacks, where attackers inject executable code into link URLs. The URL constructor parses the URL string and exposes its components (protocol, host, path, parameters), allowing you to validate the protocol against an allowlist of safe schemes (http: and https: in the example above; add others such as mailto: only when the application needs them). If an attacker provides javascript:alert('XSS') as a URL, the parsed protocol will be javascript:, failing the allowlist check. The pattern then falls back to a safe # anchor. The URL constructor also resolves relative URLs against the base URL given as its second parameter, so relative paths and protocol-relative URLs are checked in their fully resolved form. Combining protocol validation with proper HTML encoding of the entire href attribute provides defense-in-depth: even if validation is bypassed, HTML encoding prevents the URL from breaking out of the attribute context. This approach is critical for user-generated links, redirect URLs, and any scenario where URLs come from untrusted sources.
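The same check can be written with a configurable allowlist. The sketch below is a variation on the function above, not a replacement for it: `sanitizeHref`, `ALLOWED_PROTOCOLS`, and the `https://example.com` base are assumptions chosen so the code also runs outside a browser, where `window.location.origin` is unavailable.

```javascript
// Assumed allowlist; include only the schemes the application needs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Returns a normalized URL when the scheme is allowed, otherwise '#'.
// baseUrl stands in for window.location.origin (hypothetical default).
function sanitizeHref(url, baseUrl = 'https://example.com') {
  try {
    const parsed = new URL(url, baseUrl);
    // Return the parser's normalized form so the browser sees exactly
    // the URL that was validated.
    return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : '#';
  } catch {
    return '#';
  }
}

console.log(sanitizeHref("javascript:alert('XSS')"));
console.log(sanitizeHref('JaVaScRiPt:alert(1)'));
console.log(sanitizeHref('/profile/42'));
console.log(sanitizeHref('mailto:user@example.com'));
```

Returning `parsed.href` instead of the original string is a deliberate choice in this sketch: it avoids a situation where the validator and the browser interpret an oddly formatted input differently.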
Common Mistakes to Avoid
- Encoding at the wrong stage: Encode at output time, not storage time
  - Storing encoded data breaks search and sorting and ties the data to a single output context
  - Encode when rendering to HTML
- Wrong encoding for context: HTML encoding in JavaScript won't prevent XSS
  - HTML encoding doesn't escape JavaScript special characters
  - Use context-appropriate encoding (JavaScript encoding for <script> tags)
  - Example: var name = '{{ userInput }}'; can be broken with '; alert('XSS'); //
- Trusting client-side validation: Always validate/encode on server
  - Client-side validation can be bypassed via browser DevTools or HTTP clients
  - Server-side encoding is mandatory
- Filtering instead of encoding: Denylist filters are incomplete
  - Attackers bypass filters with tricks like: <scr<script>ipt>, <img src=x onerror=alert(1)>
  - Use allowlist validation + proper encoding
- Double encoding: Don't encode already-encoded data
  - Encoding twice results in visible HTML entities: &lt;script&gt; instead of <script>
  - Track whether data has already been encoded
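The filter bypass mentioned above is easy to reproduce. The sketch below uses a deliberately flawed denylist filter, `naiveStripScriptTags` (a hypothetical name), to show why removing script tags is not a defense.

```javascript
// Deliberately flawed: removes <script> and </script> in a single pass.
function naiveStripScriptTags(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Removing the inner tags reassembles the outer fragments into a new tag.
const nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
// This payload contains no script tag at all, so the filter never matches.
const noScriptTag = '<img src=x onerror=alert(1)>';

console.log(naiveStripScriptTags(nested));
console.log(naiveStripScriptTags(noScriptTag));
```

Both results are still executable markup. Output encoding avoids this class of problem because it neutralizes every markup character instead of trying to recognize dangerous patterns.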
Language-Specific Guidance
- C# - Regex.Escape, XmlDocument validation
- Java - Pattern.quote, DocumentBuilder with secure features
- JavaScript/Node.js - DOMPurify, validator.js, escape-html
- Perl - quotemeta, XML::LibXML with DTD disabled
- PHP - preg_quote, htmlspecialchars, XML validation
- Python - re.escape, html.escape, defusedxml for safe parsing
Dynamic Scan Guidance
For guidance on remediating this CWE when detected by dynamic (DAST) scanners:
- Dynamic Scan Guidance - Analyzing DAST findings and mapping to source code