CWE-79: Cross-Site Scripting (XSS) - PHP
Overview
XSS occurs when untrusted data is included in web output without proper encoding. PHP provides multiple encoding functions, and frameworks like Laravel and Symfony offer additional protection.
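Primary Defence: Encode untrusted data for the exact context in which it is output, and add browser-side controls as a second layer:
- HTML content and quoted attributes: htmlspecialchars() with ENT_QUOTES | ENT_HTML5 and the UTF-8 charset, or framework auto-escaping (Laravel Blade's {{ }}, Twig's {{ }})
- JavaScript, URL, and CSS contexts: encoding designed for that context, such as json_encode() with the JSON_HEX_* flags for JavaScript and urlencode() for URL parameters
- Response headers: an explicit Content-Type with a charset, plus X-Content-Type-Options: nosniff
- Content Security Policy (CSP) headers, which can stop injected scripts from running if encoding is missed somewhere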
Common Vulnerable Patterns
Direct Echo of User Input
<?php
// VULNERABLE - Raw input interpolated into an HTML string
$name = $_GET['name'];
echo "<h1>Welcome, $name</h1>";
// VULNERABLE - Raw echo inside a template
$comment = $_POST['comment'];
?>
<div class="comment"><?php echo $comment; ?></div>
Why this is vulnerable: Directly echoing user input without htmlspecialchars() allows attackers to inject HTML tags and JavaScript, such as <script>alert(document.cookie)</script>, which executes in victims' browsers to steal sessions or perform actions.
Building HTML Without Escaping
<?php
// VULNERABLE - String concatenation
function displayUser($userId) {
$user = getUser($userId);
$html = '<div class="profile">';
$html .= '<h2>' . $user['name'] . '</h2>';
$html .= '<p>' . $user['bio'] . '</p>';
$html .= '</div>';
return $html; // No escaping!
}
Why this is vulnerable: String concatenation to build HTML without escaping user data allows XSS attacks, as the raw name and bio can contain tags like <img src=x onerror=alert(1)> that execute JavaScript when the HTML is rendered.
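A minimal sketch of the same concatenation pattern with each user-controlled field escaped before it is joined into markup. renderProfile() is a hypothetical name, and it takes the user record as an array so the example does not depend on the undefined getUser() helper.

```php
<?php
// Sketch: the displayUser() pattern with every user-controlled field escaped.
// renderProfile() is hypothetical; it receives the record instead of calling getUser().
function renderProfile(array $user): string
{
    // Escape for HTML body content (ENT_QUOTES also covers quoted attributes)
    $esc = static fn (string $value): string =>
        htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return '<div class="profile">'
        . '<h2>' . $esc($user['name']) . '</h2>'
        . '<p>' . $esc($user['bio']) . '</p>'
        . '</div>';
}

echo renderProfile(['name' => '<img src=x onerror=alert(1)>', 'bio' => 'Hello']), PHP_EOL;
// <div class="profile"><h2>&lt;img src=x onerror=alert(1)&gt;</h2><p>Hello</p></div>
```

The structure of the function is unchanged; only the point where data meets markup gains an encoding call, which is usually the smallest safe fix for legacy string-building code.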
JavaScript Context Without Escaping
<script>
// VULNERABLE - Can break out with quotes
var message = '<?php echo $_GET['msg']; ?>';
alert(message);
</script>
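Why this is vulnerable: Unescaped data inside a JavaScript string lets attackers close the string with a quote and inject code such as '; alert(document.cookie); //, which then runs in the user's session. Escaping quotes alone is not enough: a value containing </script> ends the script block in the HTML parser and lets the attacker open a new <script> element.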
Using print_r or var_dump on User Data
Why this is vulnerable: print_r() and var_dump() write string values to the output verbatim, with no HTML encoding. If a dumped variable holds user input containing <script> tags or other markup, that markup is rendered as part of the page, so dumping $_GET, $_POST, or any structure built from request data creates an XSS vector.
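A safer way to inspect request data during development is to capture the dump as a string and encode it before output. The sketch assumes a hypothetical helper named debugDump(); it relies on print_r() returning the dump instead of printing it when the second argument is true. For var_dump(), which has no return mode, the same effect can be had by wrapping the call in ob_start() and ob_get_clean().

```php
<?php
// Sketch: debugging output that is safe to show in an HTML page.
// debugDump() is a hypothetical helper name.
function debugDump($value): string
{
    // print_r(..., true) returns the dump as a string instead of echoing it
    $dump = print_r($value, true);
    // Encode the whole dump so any markup inside the data is shown as text
    return '<pre>' . htmlspecialchars($dump, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '</pre>';
}

// Instead of print_r($_GET); in a page:
echo debugDump($_GET), PHP_EOL;
```

Even with encoding, dumps of request data are best kept out of production responses entirely.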
Secure Patterns
htmlspecialchars() with Correct Flags
<?php
// SECURE - Proper HTML encoding
$name = $_GET['name'] ?? 'Guest';
$safeName = htmlspecialchars($name, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo "<h1>Welcome, $safeName</h1>";
// SECURE - Function wrapper
function h($string) {
return htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
$userBio = $_POST['bio'] ?? '';
?>
<div class="bio"><?= h($userBio) ?></div>
Why this works:
PHP's htmlspecialchars() converts the HTML-significant characters <, >, &, " and ' into the entities &lt;, &gt;, &amp;, &quot; and &apos; (with ENT_HTML5; the default HTML 4.01 table uses &#039; for the single quote). The conversion happens when the script builds its output, so any injected tags reach the browser as text to display rather than markup to execute.
ENT_QUOTES is critical because it encodes both single and double quotes, which prevents attackers from breaking out of a quoted attribute such as <input value="..."> with a quote character. Since PHP 8.1 the default flags already include ENT_QUOTES, but passing the flags explicitly keeps the behavior identical on older versions. ENT_HTML5 selects the HTML5 entity table: it changes which entity names are produced, not which characters are encoded.
Specifying UTF-8 makes the function interpret the input in the same charset the page is served in. If the input contains an invalid UTF-8 sequence, htmlspecialchars() returns an empty string (unless ENT_SUBSTITUTE or ENT_IGNORE is set), which blocks attacks that rely on malformed multi-byte sequences.
The h() wrapper makes the secure call short enough to use everywhere: developers wrap each value in h() instead of remembering the full signature and flags. This is safe for HTML content and quoted attribute values; JavaScript, CSS, and URL contexts still need different encoding.
Important Flags:
- ENT_QUOTES: Encode both single and double quotes
- ENT_HTML5: Use HTML5 encoding rules
- UTF-8: Specify character encoding
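To make the flag behavior concrete, the sketch below repeats the h() wrapper with the exact strings it produces. The nullable parameter and the ?? '' fallback are an addition for this sketch, to avoid the PHP 8.1 deprecation notice for passing null.

```php
<?php
// The h() wrapper from above, shown with its actual output.
function h(?string $string): string
{
    // ?? '' avoids the PHP 8.1+ deprecation for passing null to htmlspecialchars()
    return htmlspecialchars($string ?? '', ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo h('<script>alert("x")</script>'), PHP_EOL; // &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
echo h("O'Reilly & Sons"), PHP_EOL;              // O&apos;Reilly &amp; Sons
var_dump(h("\xC3\x28"));                         // invalid UTF-8: returns an empty string
```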
htmlentities() for More Comprehensive Encoding
<?php
// SECURE - Encodes all applicable characters
function escape($string) {
return htmlentities($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
$userComment = $_POST['comment'] ?? '';
?>
<p><?= escape($userComment) ?></p>
Why this works:
The htmlentities() function encodes every character that has a named HTML entity, not just <, >, &, " and '. That includes accented letters (é becomes &eacute;) and currency symbols (€ becomes &euro;) in addition to the characters that matter for XSS. It accepts the same flags as htmlspecialchars(): ENT_QUOTES encodes both quote types, ENT_HTML5 selects the HTML5 entity table, and the UTF-8 argument tells it how to read the input. The extra coverage is mainly a charset-compatibility feature, useful when output may pass through a system that cannot carry UTF-8 intact; on a page served as UTF-8 it adds no XSS protection beyond htmlspecialchars(). The trade-off is readability: the raw HTML shows &eacute; instead of é, although browsers render both identically. For XSS prevention, htmlspecialchars() with the correct flags is sufficient and produces more readable markup; use htmlentities() when you specifically need non-ASCII characters represented as ASCII entities.
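A short comparison of the two functions on the same input. The sample string is illustrative; the non-ASCII characters are written as \u{...} escapes so the example does not depend on the source file's encoding.

```php
<?php
// Sketch: htmlspecialchars() vs htmlentities() on the same input.
$input = "Caf\u{E9} <b>\u{20AC}5</b>"; // "Café <b>€5</b>"
$flags = ENT_QUOTES | ENT_HTML5;

// Only the HTML-significant characters are encoded; é and € pass through
$special = htmlspecialchars($input, $flags, 'UTF-8');

// é and € are also replaced by named entities
$entities = htmlentities($input, $flags, 'UTF-8');

echo $special, PHP_EOL;
echo $entities, PHP_EOL;

// Both decode back to the original text, which is why browsers render them identically
var_dump(html_entity_decode($entities, $flags, 'UTF-8') === $input); // bool(true)
```

Both outputs neutralize the <b> tag; they differ only in how the non-ASCII characters are represented.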
Context-Specific Encoding
HTML Context
<?php
// SECURE - HTML body content
function htmlEncode($text) {
return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
$message = $_GET['msg'] ?? '';
echo '<div>' . htmlEncode($message) . '</div>';
Why this works:
HTML context encoding with htmlspecialchars() converts the characters that have special meaning in HTML (<, >, &, ", ') into entities, so the browser treats user input in body content or attribute values as text rather than markup. For example, <script>alert('xss')</script> becomes &lt;script&gt;alert(&apos;xss&apos;)&lt;/script&gt;, which the browser displays literally instead of executing. ENT_QUOTES encodes both quote types, which prevents breaking out of quoted attributes such as <div title="<?= htmlEncode($input) ?>">. Attribute values must be quoted: in an unquoted attribute a space ends the value, and spaces are not encoded. The explicit UTF-8 charset ensures the input is interpreted the same way the page is served. This encoding is designed for HTML contexts only; JavaScript strings need JSON encoding, CSS values need CSS escaping, and URL components need percent-encoding. Always match the encoding method to the context where the data appears.
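The quoting requirement is easy to demonstrate. This sketch reuses the htmlEncode() helper from the example above (with a string type hint added) and feeds it two illustrative attribute-injection payloads.

```php
<?php
// Sketch: the same encoder in a quoted and an unquoted attribute.
function htmlEncode(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Quoted attribute: the payload's quotes become &quot;, so it stays inside title="..."
$safe = '<div title="' . htmlEncode('" onmouseover="alert(1)') . '">Hover me</div>';

// Unquoted attribute: this payload has no characters to encode, so the space
// ends the title value and onmouseover becomes a separate, live attribute
$unsafe = '<div title=' . htmlEncode('x onmouseover=alert(1)') . '>Hover me</div>';

echo $safe, PHP_EOL;   // <div title="&quot; onmouseover=&quot;alert(1)">Hover me</div>
echo $unsafe, PHP_EOL; // <div title=x onmouseover=alert(1)>Hover me</div>
```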
JavaScript Context
<?php
// SECURE - JavaScript string context
function jsEncode($string) {
// JSON encode provides proper JavaScript escaping
return json_encode($string, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}
$userName = $_SESSION['username'];
?>
<script>
var currentUser = <?= jsEncode($userName) ?>;
console.log(currentUser);
</script>
Why this works:
JavaScript contexts need different encoding from HTML. HTML entities are the wrong tool inside a <script> block: the HTML parser does not decode them there, and htmlspecialchars() leaves characters such as backslashes and line breaks untouched. json_encode() instead produces a valid JavaScript literal: it wraps strings in double quotes and escapes backslashes, quotes, and control characters. The hex flags additionally convert HTML-significant characters into \uXXXX escapes: < becomes \u003C, > becomes \u003E, & becomes \u0026, ' becomes \u0027, and " becomes \u0022. This matters because the HTML parser ends a script block at the first </script> it sees, regardless of JavaScript string syntax; with JSON_HEX_TAG a payload such as </script><script>alert('xss')</script> never contains a literal <. Because json_encode() already adds the surrounding quotes, do not add your own: var x = "<?= jsEncode($data) ?>"; would produce ""value"", which is a syntax error. The output is a complete JavaScript expression, so the same helper works for strings, numbers, arrays, and objects passed from PHP to client-side code.
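A sketch of jsEncode() with one addition that is not in the original: JSON_THROW_ON_ERROR (PHP 7.3+). Without it, json_encode() returns false on invalid UTF-8, and echoing false prints nothing, leaving var currentUser = ; in the page.

```php
<?php
// Sketch: jsEncode() with errors turned into exceptions instead of silent false.
function jsEncode($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = "</script><script>alert('x')</script>";
echo 'var currentUser = ' . jsEncode($name) . ';', PHP_EOL;
// var currentUser = "\u003C\/script\u003E\u003Cscript\u003Ealert(\u0027x\u0027)\u003C\/script\u003E";

// Arrays and nested data become object/array literals
echo 'var profile = ' . jsEncode(['id' => 7, 'tags' => ['a<b']]) . ';', PHP_EOL;
// var profile = {"id":7,"tags":["a\u003Cb"]};
```

The escaped string decodes back to the original value in JavaScript, so the data is preserved while the HTML parser never sees a tag.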
URL Context
<?php
// SECURE - URL parameter encoding
$query = $_GET['search'] ?? '';
// urlencode() percent-encodes the value for use in the query string
$searchUrl = '/search?q=' . urlencode($query);
?>
<a href="<?= htmlspecialchars($searchUrl, ENT_QUOTES, 'UTF-8') ?>">Search</a>
Why this works:
URL contexts require percent-encoding (also called URL encoding), which converts special characters into %XX hexadecimal form. PHP's urlencode() encodes spaces as + and percent-encodes characters with special meaning in URLs, such as &, =, ?, / and #. This stops user input from adding parameters or changing the URL's structure: a search for admin&delete=all becomes admin%26delete%3Dall, so &delete=all is not read as a separate parameter. The example layers two encodings because the URL sits inside an HTML attribute: urlencode() first makes the value a valid query-string component, then htmlspecialchars() encodes the complete URL for the href attribute (for example, & between parameters becomes &amp;). Use urlencode() for query-string values and rawurlencode() for path segments (it follows RFC 3986 and encodes spaces as %20); both encode /, so encode each path segment separately. Percent-encoding a parameter does not help when the attacker controls the whole URL: a value such as javascript:alert(1) placed in an href still executes, so validate the scheme (for example, allow only http and https) before output. When JavaScript builds URLs on the client side, use encodeURIComponent() for each component.
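A sketch of the scheme check for fully user-supplied URLs, together with the urlencode()/rawurlencode() difference. safeHref() is a hypothetical helper; as a deliberate simplification it accepts only absolute http(s) URLs and maps everything else, including relative URLs, to "#". An application that needs relative links would have to allow them explicitly.

```php
<?php
// Sketch: validate a whole URL before placing it in href. safeHref() is hypothetical.
function safeHref(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME); // null if absent, false if malformed
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#'; // reject javascript:, data:, relative, and malformed URLs
    }
    // The URL still goes into an HTML attribute, so HTML-encode it as well
    return htmlspecialchars($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Query-string value: urlencode() (space becomes +)
$searchUrl = '/search?q=' . urlencode('admin&delete=all'); // /search?q=admin%26delete%3Dall
// Path segment: rawurlencode() (space becomes %20; / is encoded too)
$fileUrl = '/files/' . rawurlencode('a b/c.txt');          // /files/a%20b%2Fc.txt

echo '<a href="' . safeHref('javascript:alert(1)') . '">link</a>', PHP_EOL; // href="#"
```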
JSON Responses
<?php
// SECURE - JSON encoding handles escaping
header('Content-Type: application/json; charset=utf-8');
$user = [
'name' => $_GET['name'],
'bio' => $_POST['bio']
];
echo json_encode($user, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
Why this works:
When responding with JSON, json_encode() with the hex flags produces output that stays inert even if the response is mistakenly embedded in HTML or mishandled by the client. The Content-Type: application/json header tells the browser to treat the response as data rather than renderable HTML; always set it, and pair it with X-Content-Type-Options: nosniff so the browser does not second-guess the type. The hex flags (JSON_HEX_TAG, JSON_HEX_AMP, JSON_HEX_APOS, JSON_HEX_QUOT) convert HTML-significant characters into Unicode escape sequences (for example, < becomes \u003C), which adds defense in depth if the JSON is later inserted into an HTML page. JSON encoding protects the transport, not the rendering: the decoded values are the original strings, so a stored payload such as <img src=x onerror=alert(1)> comes back intact after JSON.parse(). Client code that renders the data must still insert it with textContent or a framework's escaped bindings rather than innerHTML.
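A minimal sketch of a JSON response helper that keeps the encoding flags and headers in one place. jsonBody() and sendJson() are hypothetical names, and JSON_THROW_ON_ERROR (PHP 7.3+) is an addition so that invalid UTF-8 raises an exception rather than sending an empty body.

```php
<?php
// Sketch: one place for JSON flags and response headers (hypothetical helper names).
function jsonBody(array $payload): string
{
    return json_encode(
        $payload,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

function sendJson(array $payload): void
{
    // Headers must be sent before any body output
    header('Content-Type: application/json; charset=utf-8');
    header('X-Content-Type-Options: nosniff'); // do not let the browser sniff the body as HTML
    echo jsonBody($payload);
}
```

Keeping the body builder separate from sendJson() makes the encoding testable without sending headers.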
Framework-Specific Guidance
Laravel Blade Templates
{{-- SECURE - Blade auto-escapes {{ }} --}}
<div class="user-profile">
<h1>{{ $user->name }}</h1>
<p>{{ $user->bio }}</p>
<small>Joined: {{ $user->created_at }}</small>
</div>
{{-- DANGEROUS - Unescaped output --}}
<div>{!! $user->bio !!}</div>
{{-- SECURE - Sanitize before using {!! !!} --}}
@php
$sanitized = Purifier::clean($user->richBio);
@endphp
<div class="rich-content">{!! $sanitized !!}</div>
{{-- Controller --}}
<?php
namespace App\Http\Controllers;
use App\Models\User;
use Illuminate\Http\Request;
class ProfileController extends Controller
{
public function show(Request $request, $id)
{
$user = User::findOrFail($id);
// Blade automatically escapes these in {{ }}
return view('profile', [
'user' => $user,
'message' => $request->query('msg', '')
]);
}
}
Why this works:
Laravel's Blade templating engine HTML-encodes all output written with the {{ }} syntax. When a template is compiled, {{ $user->name }} becomes <?php echo e($user->name); ?>, where e() is Laravel's helper that calls htmlspecialchars() with ENT_QUOTES and UTF-8; the encoding itself runs each time the view is rendered, after the controller passes data to the view and before the response is sent. This secure-by-default design turns <, >, &, " and ' into HTML entities. The {!! !!} syntax explicitly bypasses encoding and should be used only for trusted or sanitized HTML, such as content cleaned with HTMLPurifier through Purifier::clean() as in the example; never pass unsanitized user input to it. Blade's {{ }} escaping targets HTML content only: use the @json directive for JavaScript contexts, and URL-encode values (and validate the scheme of full URLs) before placing them in href attributes. Because the safe behavior is the default and the unsafe one ({!! !!}) requires an explicit opt-out, risky output is easy to spot during code review.
Laravel HTMLPurifier Package:
<?php
use Mews\Purifier\Facades\Purifier;
// Clean HTML before storing
$cleanHtml = Purifier::clean($request->input('content'));
$article = Article::create([
'title' => $request->input('title'),
'content' => $cleanHtml
]);
// config/purifier.php
return [
'encoding' => 'UTF-8',
'finalize' => true,
'cachePath' => storage_path('app/purifier'),
'settings' => [
'default' => [
'HTML.Doctype' => 'HTML 4.01 Transitional',
'HTML.Allowed' => 'p,br,strong,em,ul,ol,li,a[href|title]',
'AutoFormat.AutoParagraph' => true,
'AutoFormat.RemoveEmpty' => true,
],
],
];
Symfony Twig Templates
{# SECURE - Twig auto-escapes {{ }} #}
<div class="profile">
<h1>{{ user.name }}</h1>
<p>{{ user.bio }}</p>
</div>
{# DANGEROUS - Raw output #}
<div>{{ user.content|raw }}</div>
{# SECURE - Sanitize first #}
<div>{{ user.content|sanitize_html|raw }}</div>
{# Controller #}
<?php
namespace App\Controller;
use App\Entity\User;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
class ProfileController extends AbstractController
{
public function show(Request $request, int $id, EntityManagerInterface $em): Response
{
$user = $em->getRepository(User::class)->find($id);
if ($user === null) {
throw $this->createNotFoundException();
}
// Twig auto-escapes in templates
return $this->render('profile.html.twig', [
'user' => $user,
'message' => $request->query->get('msg', '')
]);
}
}
Why this works:
Symfony's Twig templating engine HTML-escapes all output written with the {{ }} syntax, much like Blade. Escaping is on by default, and for .html.twig templates the strategy is 'html', so {{ user.name }} is rendered with <, >, &, " and ' converted to entities unless a developer explicitly opts out. The |raw filter disables escaping for a single expression; use it only for trusted HTML or content that has already been sanitized. The example passes user content through the sanitize_html filter, provided by Symfony's HTML Sanitizer component (Symfony 6.1+), before |raw. That component performs allowlist-based cleaning: it keeps the configured elements and attributes and removes dangerous ones such as <script> tags and javascript: URLs. The default strategy covers HTML content only; for other contexts, choose a strategy explicitly, for example |escape('js') for JavaScript strings, |escape('url') for URL components, and |escape('html_attr') for unquoted attributes. As with Blade, the unsafe operation (|raw) is an explicit opt-in, which makes risky code patterns easy to find during security reviews.
HTML Sanitizer Component (Symfony 6.1+):
<?php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;
$config = (new HtmlSanitizerConfig())
->allowSafeElements()
->allowElement('a', ['href', 'title']);
$sanitizer = new HtmlSanitizer($config);
$cleanHtml = $sanitizer->sanitize($userInput);
Rich HTML Sanitization
For allowing safe HTML (e.g., WYSIWYG editors):
<?php
// Use HTML Purifier library
require_once 'vendor/autoload.php';
function sanitizeHtml($dirtyHtml) {
$config = HTMLPurifier_Config::createDefault();
// Set cache path
$config->set('Cache.SerializerPath', '/tmp');
// Define allowed elements and attributes
$config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href|title]');
// Encoding
$config->set('Core.Encoding', 'UTF-8');
// Remove empty paragraphs
$config->set('AutoFormat.RemoveEmpty', true);
$purifier = new HTMLPurifier($config);
return $purifier->purify($dirtyHtml);
}
// Usage:
$userContent = $_POST['article_content'];
$cleanContent = sanitizeHtml($userContent);
// Purified HTML is output as-is; escaping it again would display the allowed tags as text
echo $cleanContent;
Installation: composer require ezyang/htmlpurifier
Input Validation (Defense in Depth)
<?php
// Validation before storage
class CommentValidator {
public static function validate($data) {
$errors = [];
// Validate author name
if (!isset($data['author']) || !is_string($data['author']) || trim($data['author']) === '') {
$errors[] = 'Author name is required';
} elseif (strlen($data['author']) > 100) {
$errors[] = 'Author name too long';
} elseif (!preg_match('/^[a-zA-Z0-9\s]+$/', $data['author'])) {
$errors[] = 'Author name contains invalid characters';
}
// Validate comment text
if (!isset($data['text']) || !is_string($data['text']) || trim($data['text']) === '') {
$errors[] = 'Comment text is required';
} elseif (strlen($data['text']) > 1000) {
$errors[] = 'Comment too long';
}
return $errors;
}
}
// Controller:
$errors = CommentValidator::validate($_POST);
if (empty($errors)) {
// Store the validated raw values; encode at output time, not before storage
$comment = [
'author' => $_POST['author'],
'text' => $_POST['text']
];
saveComment($comment);
// Still encode output even after validation, e.g. when rendering:
// echo htmlspecialchars($comment['text'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
Content Security Policy
<?php
// Set CSP headers
header("Content-Security-Policy: " .
"default-src 'self'; " .
"script-src 'self' https://trusted-cdn.com; " .
"style-src 'self' 'unsafe-inline'; " .
"img-src 'self' data: https:; " .
"frame-ancestors 'none';"
);
header("X-Content-Type-Options: nosniff");
header("X-Frame-Options: DENY");
header("X-XSS-Protection: 0"); // legacy header: modern browsers removed the XSS auditor it controlled, so disable it and rely on CSP
// Or in a middleware/bootstrap file:
class SecurityHeaders {
public static function apply() {
if (!headers_sent()) {
header("Content-Security-Policy: default-src 'self'");
header("X-Content-Type-Options: nosniff");
header("X-Frame-Options: SAMEORIGIN");
}
}
}
// Call early in application bootstrap
SecurityHeaders::apply();
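One common way to allow the page's own inline scripts while still blocking injected ones is a per-response nonce. The sketch below is one option, not the policy used above; generateCspNonce(), buildCsp(), and scriptTag() are hypothetical helpers, and object-src 'none' and base-uri 'self' are additions often recommended alongside script-src.

```php
<?php
// Sketch of a nonce-based CSP (hypothetical helper names).
function generateCspNonce(): string
{
    return base64_encode(random_bytes(16)); // 16 random bytes -> 24 base64 characters
}

function buildCsp(string $nonce): string
{
    return "default-src 'self'; "
        . "script-src 'self' 'nonce-{$nonce}'; "
        . "object-src 'none'; "
        . "base-uri 'self'; "
        . "frame-ancestors 'none'";
}

// Only for trusted, developer-written script bodies
function scriptTag(string $nonce, string $js): string
{
    return '<script nonce="' . htmlspecialchars($nonce, ENT_QUOTES, 'UTF-8') . '">' . $js . '</script>';
}

// Per request, before any output is sent:
$nonce = generateCspNonce();
header('Content-Security-Policy: ' . buildCsp($nonce));
echo scriptTag($nonce, 'console.log("allowed by nonce");'), PHP_EOL;
```

An injected <script> element lacks the current nonce, so a browser enforcing this policy refuses to run it; the nonce must be regenerated for every response to keep that property.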
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
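- Manual testing: Submit XSS payloads such as <script>alert(1)</script>, "><img src=x onerror=alert(1)>, and (in URL fields) javascript:alert(1) to every input, and confirm they render as inert text rather than executing
- Code review: Confirm every output point uses context-appropriate encoding (htmlspecialchars(), json_encode() with the JSON_HEX_* flags, urlencode()) or template auto-escaping, and that raw output ({!! !!}, |raw, plain echo) is used only on sanitized HTML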
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Session testing: Confirm session cookies carry the HttpOnly flag, so that an XSS flaw that slips through cannot read them via document.cookie
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
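Part of the manual and regression testing above can be automated. This is a coarse sketch: escapeForHtml() stands in for the application's real encoder (h(), e(), Twig escaping), the payload list is a small illustrative sample rather than a complete corpus, and the check only detects payloads reflected verbatim, not every way a transformed payload could still be dangerous.

```php
<?php
// Sketch of an automated reflection check (hypothetical helper names).
function escapeForHtml(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

const XSS_PAYLOADS = [
    '<script>alert(1)</script>',
    '"><img src=x onerror=alert(1)>',
    "'><svg onload=alert(1)>",
    '</textarea><script>alert(1)</script>',
];

// Returns the payloads that appear verbatim in the rendered HTML
function findReflectedPayloads(callable $render): array
{
    $reflected = [];
    foreach (XSS_PAYLOADS as $payload) {
        if (strpos($render($payload), $payload) !== false) {
            $reflected[] = $payload;
        }
    }
    return $reflected;
}

$safeTemplate = fn (string $input): string => '<p>' . escapeForHtml($input) . '</p>';
$rawTemplate  = fn (string $input): string => '<p>' . $input . '</p>';

echo count(findReflectedPayloads($safeTemplate)), PHP_EOL; // 0
echo count(findReflectedPayloads($rawTemplate)), PHP_EOL;  // 4
```

In a real suite, the render callback would invoke an actual view or HTTP endpoint, and a non-empty result would fail the test.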
PHP Configuration
; php.ini security settings
; Don't expose PHP version
expose_php = Off
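; Disable command-execution and source-disclosure functions (general hardening, not an XSS control; disabling curl_exec can break cURL-based HTTP clients)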
disable_functions = exec,passthru,shell_exec,system,proc_open,popen,curl_exec,curl_multi_exec,parse_ini_file,show_source
; Session security
session.cookie_httponly = 1
session.cookie_secure = 1
session.cookie_samesite = "Strict"