CWE-401: Missing Release of Memory After Effective Lifetime - Python

Overview

Memory leaks in Python typically involve unclosed resources (files, connections, sockets) or circular references that delay garbage collection, rather than manual memory management errors, since allocation and deallocation are handled by Python's reference counting and garbage collector. File handle exhaustion, connection pool depletion, and unbounded caches are common sources of resource leaks.

Primary Defence: Use context managers (the with statement) for all resources; implement custom context managers with the @contextmanager decorator or __enter__/__exit__ methods; avoid circular references or break them with weakref; implement bounded caches with TTL or LRU eviction; and explicitly close resources in finally blocks when context managers aren't available.
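
When a library hands back a resource that does not support the context-manager protocol, the same guarantee can be approximated with try/finally. A minimal sketch, using an ordinary file purely for illustration:

def read_file_explicit(path):
    # Manual equivalent of the with statement: close() runs in the finally
    # block even if read() raises, so the descriptor is never leaked
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()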

Common Vulnerable Patterns

Unclosed Files

def read_file(path):
    f = open(path, 'r')
    content = f.read()
    return content
    # No f.close() - file handle leaked!

# Called repeatedly
for i in range(10000):
    read_file(f'data_{i}.txt')
    # Each call leaks a file descriptor
    # Eventually: OSError: [Errno 24] Too many open files

Why this is vulnerable: Each open() call consumes a file descriptor from the operating system's limited per-process pool (a soft limit of typically 1024 on Linux, 256 on macOS by default). When read_file() returns without calling close(), the handle is only released when the file object is finalized. CPython's reference counting usually does this promptly and emits a ResourceWarning (hidden by default), but that is an implementation detail: on PyPy, which has no reference counting, or whenever a reference survives - in an exception traceback, a cache, a global - the handle stays open until a garbage collection pass runs, and the collector is triggered by allocation pressure, not by descriptor exhaustion. A web application handling hundreds of requests per second can exhaust all file descriptors in seconds, causing "Too many open files" errors that crash the entire process. Open file handles also prevent file deletion on Windows and waste kernel resources.
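
To see the per-process limit referred to above, the standard-library resource module (Unix only) reports the soft and hard descriptor limits, and on Linux the descriptors currently held by the process can be counted under /proc/self/fd; a quick sketch:

import os
import resource

# Soft/hard limit on open file descriptors for this process (Unix only)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit: soft={soft}, hard={hard}")

# Linux only: count descriptors currently open by this process
if os.path.isdir('/proc/self/fd'):
    print(f"open fds: {len(os.listdir('/proc/self/fd'))}")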

Database Connection Leaks

import psycopg2

def get_users():
    conn = psycopg2.connect(dbname="mydb", user="user", password="pass")
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    users = cursor.fetchall()
    return users
    # No cursor.close() or conn.close() - both leaked!

# After 10-20 calls, connection pool exhausted
# New requests block waiting for available connection

Why this is vulnerable: Database connections are backed by network sockets, server-side resources, and connection pool entries. When the connection and cursor aren't closed, they remain allocated in the connection pool (if one is used) or hold server resources (if not), preventing reuse. Connection pools typically limit concurrent connections to 10-50; after enough calls without closing, the pool is exhausted and new requests block indefinitely, causing application-wide denial of service. Unlike file handles, database connections hold significant server-side state (transactions, locks, temp tables), so leaked connections waste resources on both the client and the database server. The garbage collector provides no help here - connection objects may not even have finalizers, and even if they do, finalization is too slow for high-throughput applications.
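
One way to confirm this kind of leak is to watch the server-side session count grow while the application runs; PostgreSQL exposes it in the pg_stat_activity view. A sketch, reusing the illustrative credentials above and closing its own connection explicitly:

import psycopg2

def count_connections():
    # Short-lived connection used only to read the session count
    conn = psycopg2.connect(dbname="mydb", user="user", password="pass")
    try:
        with conn.cursor() as cursor:
            cursor.execute(
                "SELECT count(*) FROM pg_stat_activity WHERE datname = %s",
                ("mydb",),
            )
            return cursor.fetchone()[0]
    finally:
        conn.close()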

Unbounded Cache Without Eviction

class DataCache:
    def __init__(self):
        self._cache = {}  # Unbounded dictionary

    def get_data(self, key):
        if key not in self._cache:
            # Expensive operation: fetch from database or API
            data = fetch_from_database(key)
            self._cache[key] = data  # Cache forever
        return self._cache[key]

# Web application caching user data by user ID
cache = DataCache()

# After millions of users access the system
for user_id in range(1, 10_000_000):
    cache.get_data(user_id)
    # Cache grows to 10 million entries - gigabytes of RAM

Why this is vulnerable: The cache dictionary grows without bounds, storing every unique key ever requested. In a long-running application this accumulates potentially millions of entries, consuming gigabytes of memory. Unlike a least-recently-used (LRU) cache that evicts old entries, this cache never removes anything - it assumes infinite memory. After enough time, the application exhausts available RAM and crashes or is killed by the OS (the OOM killer on Linux). Even before crashing, the oversized dictionary degrades performance - Python dictionaries resize as they grow, and rehashing millions of entries causes noticeable pauses. The cache may also hold outdated data indefinitely (stale user records, expired API responses), wasting memory on useless information. This pattern is particularly dangerous in microservices or serverless functions that handle diverse requests - each unique request parameter becomes a cache key, and unique keys grow linearly with traffic volume.
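
The growth is easy to measure with the standard-library tracemalloc module (also recommended in the checklist below); a rough sketch that simulates the unbounded cache with a plain dictionary:

import tracemalloc

tracemalloc.start()

cache = {}
for user_id in range(1, 500_000):
    # Simulates DataCache.get_data: every unique key is stored forever
    cache[user_id] = {"id": user_id, "name": f"user-{user_id}"}

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
tracemalloc.stop()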

Circular References Preventing Garbage Collection

class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        self.children.append(child)

class Child:
    def __init__(self, parent):
        self.parent = parent
        parent.add_child(self)

# Create circular reference
parent = Parent()
child = Child(parent)

# Delete references
del parent
del child

# Not freed by reference counting:
# parent.children holds a reference to child
# child.parent holds a reference to parent
# The cycle keeps both objects alive until CPython's cycle collector runs

Why this is vulnerable: Circular references create reference cycles where objects reference each other, so simple reference counting can never free them. CPython's cycle collector will eventually reclaim them, but it runs periodically and non-deterministically, and if it has been disabled (gc.disable()) the cycle leaks permanently. Before Python 3.4, cycles containing objects with __del__ finalizers could not be broken at all and were parked in gc.garbage forever; since PEP 442 they are collected, but relying on finalizers for cleanup remains fragile. Even when collection does happen, the cycles tie up memory longer than necessary, and if the objects hold other resources (file handles, sockets) those resources remain open until the collector runs, causing resource exhaustion. This pattern is common in tree/graph data structures, parent-child relationships, and callback systems.
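
The collector's behaviour can be made visible with the gc and weakref modules; a small self-contained sketch with a hypothetical Node class:

import gc
import weakref

class Node:
    pass

gc.disable()             # make the demonstration deterministic

a = Node()
b = Node()
a.other = b
b.other = a              # reference cycle: a -> b -> a

probe = weakref.ref(a)   # lets us check whether a was actually freed
del a, b

print(probe() is None)   # False: reference counting alone cannot free the cycle
print(gc.collect())      # the cycle collector finds and frees the unreachable pair
print(probe() is None)   # True: both objects have been reclaimed
gc.enable()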

Secure Patterns

Context Managers

# Using built-in context manager
def read_file(path):
    with open(path, 'r') as f:
        return f.read()
    # f.close() called automatically

# Multiple resources
def copy_file(src, dst):
    with open(src, 'r') as source:
        with open(dst, 'w') as dest:
            dest.write(source.read())
    # Both files closed automatically

# Python 3.1+ supports multiple context managers
def copy_file_v2(src, dst):
    with open(src, 'r') as source, open(dst, 'w') as dest:
        dest.write(source.read())

# Database connections
import psycopg2

def get_users():
    conn = psycopg2.connect(dbname="mydb", user="user", password="pass")
    try:
        with conn:  # manages the transaction: commit on success, rollback on error
            with conn.cursor() as cursor:
                cursor.execute("SELECT * FROM users")
                return cursor.fetchall()
    finally:
        conn.close()  # psycopg2's "with conn" does NOT close the connection

Why this works: Python's with statement ensures the context manager's __exit__ method is called when the block exits, whether it completes normally or via an exception. For files, this closes the file handle; for psycopg2 connections, it commits or rolls back the transaction (the connection itself still needs an explicit close(), as shown above). This provides deterministic cleanup in a language with otherwise non-deterministic finalization - without with, a file might stay open until its object happens to be finalized. The with statement handles exceptions gracefully: if an exception occurs in the block, __exit__ still runs (with the exception info) before the exception propagates. This eliminates the forgotten close() in error paths, the most common source of resource leaks.
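
When the number of resources is only known at runtime, contextlib.ExitStack extends the same guarantee to a dynamic set of context managers; a short sketch with a hypothetical concat_files helper:

from contextlib import ExitStack

def concat_files(paths, dst):
    # Every file registered on the stack is closed when the with block exits,
    # even if a later open() or write() raises
    with ExitStack() as stack:
        sources = [stack.enter_context(open(p, 'r')) for p in paths]
        dest = stack.enter_context(open(dst, 'w'))
        for source in sources:
            dest.write(source.read())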

Custom Context Managers

from contextlib import contextmanager
import socket

@contextmanager
def managed_socket(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect((host, port))
        yield sock
    finally:
        sock.close()

# Usage
with managed_socket('example.com', 80) as sock:
    sock.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n')
    response = sock.recv(4096)
# Socket closed automatically

# Class-based context manager
class DatabaseConnection:
    def __init__(self, db_config):
        self.config = db_config
        self.conn = None

    def __enter__(self):
        self.conn = psycopg2.connect(**self.config)
        return self.conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()
        return False  # Don't suppress exceptions

with DatabaseConnection(db_config) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO users VALUES (%s)", (username,))
# Transaction committed and connection closed

Why this works: The @contextmanager decorator makes it trivial to create context managers - the code before yield runs on entry, the yielded value is returned to the with block, and the code after yield (in the finally block) runs on exit, guaranteeing cleanup. Class-based context managers provide more control by implementing __enter__ and __exit__ methods. The __exit__ method receives exception information if an exception occurred, allowing custom error handling (commit on success, rollback on failure for databases). Context managers can return True from __exit__ to suppress exceptions, or False (default) to propagate them. This pattern extends automatic resource management to any custom resource (network connections, locks, temporary files, API clients).
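
The suppression behaviour mentioned above looks like this in practice (the standard library's contextlib.suppress offers the same thing ready-made); a minimal sketch with a hypothetical IgnoreMissing manager:

class IgnoreMissing:
    """Context manager that swallows FileNotFoundError and nothing else."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Returning True suppresses the exception; anything else propagates
        return exc_type is not None and issubclass(exc_type, FileNotFoundError)

with IgnoreMissing():
    with open('maybe-missing.txt', 'r') as f:
        print(f.read())
# Execution continues here even if the file does not exist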

LRU Cache with functools

from functools import lru_cache

@lru_cache(maxsize=1000)
def fetch_user(user_id):
    # Expensive database query
    return database.get_user(user_id)

# Cache automatically evicts least recently used entries
# when size exceeds 1000
for i in range(10_000):
    user = fetch_user(i)
    # Only 1000 most recent entries kept in memory

# Manual cache control
fetch_user.cache_info()  # hits, misses, size, maxsize
fetch_user.cache_clear()  # Clear entire cache

Why this works: functools.lru_cache provides a decorator that caches function results with automatic eviction of least recently used entries when the cache reaches maxsize. This prevents unbounded growth while maintaining high hit rates for frequently accessed data. The cache is implemented efficiently using a hash table with a doubly-linked list to track access order. Setting maxsize=None creates an unbounded cache (dangerous for long-running apps), while maxsize=128 (or any positive integer) creates a bounded cache that won't exhaust memory. The cache is thread-safe (uses locks internally). This is the simplest way to add caching to pure functions in Python without worrying about memory leaks. For more complex scenarios (TTL-based expiration, size-based eviction, cache invalidation), use libraries like cachetools or implement custom caching with collections.OrderedDict.
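
For the TTL-based expiration mentioned above, the third-party cachetools package provides drop-in decorators; a sketch assuming cachetools is installed and reusing the hypothetical database.get_user lookup:

from cachetools import TTLCache, cached

# At most 1000 entries, each evicted 300 seconds after it was inserted
user_cache = TTLCache(maxsize=1000, ttl=300)

@cached(user_cache)
def fetch_user_ttl(user_id):
    return database.get_user(user_id)  # hypothetical expensive lookup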

Breaking Circular References with weakref

import weakref

class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        self.children.append(child)

class Child:
    def __init__(self, parent):
        # Use weak reference to parent - doesn't prevent GC
        self.parent = weakref.ref(parent)
        parent.add_child(self)

    def get_parent(self):
        # Dereference weak reference
        parent = self.parent()
        if parent is None:
            raise ValueError("Parent has been garbage collected")
        return parent

# Create objects
parent = Parent()
child = Child(parent)

# Delete parent reference
del parent

# parent can now be garbage collected
# child.parent() will return None
# No circular reference leak

Why this works: weakref.ref creates a weak reference that doesn't prevent garbage collection. In a tree structure, parents hold strong references to children (preventing child GC), while children hold weak references to parents (not preventing parent GC). When the parent is no longer strongly referenced elsewhere, it can be garbage collected even though children still hold weak references to it. Those weak references become dead (dereferencing returns None), detectable via the callable interface. This breaks circular reference cycles that would otherwise prevent collection. Weak references are essential for callback systems, caches, observer patterns, and parent-child relationships where you want navigation in both directions without preventing cleanup.
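
The same idea powers weakref's container types: a WeakValueDictionary acts like a cache whose entries vanish as soon as the cached objects lose their last strong reference. A brief sketch with a hypothetical Session class:

import weakref

class Session:
    def __init__(self, session_id):
        self.session_id = session_id

sessions = weakref.WeakValueDictionary()

s = Session("abc123")
sessions["abc123"] = s       # does not keep the session alive on its own

print("abc123" in sessions)  # True while a strong reference to s exists
del s                        # last strong reference gone
print("abc123" in sessions)  # False: the entry was removed automatically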

Connection Pooling with Context Managers

import psycopg2.pool
from contextlib import contextmanager

# Create connection pool (once at app startup)
db_pool = psycopg2.pool.SimpleConnectionPool(
    minconn=1,
    maxconn=20,
    dbname="mydb",
    user="user",
    password="pass"
)

@contextmanager
def get_db_connection():
    conn = db_pool.getconn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        db_pool.putconn(conn)  # Return to pool

def get_users():
    with get_db_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM users")
            return cursor.fetchall()
    # Connection returned to pool automatically

# At app shutdown
db_pool.closeall()

Why this works: Connection pooling reuses a fixed number of database connections instead of creating new ones for each request, preventing connection exhaustion. The context manager pattern ensures connections are always returned to the pool after use, even if exceptions occur. getconn() borrows a connection from the pool (blocks if all connections busy), and putconn() returns it (making it available for other requests). By combining pooling with context managers, we get both efficiency (connection reuse) and safety (guaranteed return to pool). Without the context manager, forgetting to call putconn() would permanently remove a connection from the pool, eventually exhausting it. This pattern is essential for database-heavy applications - it caps resource usage while maintaining high throughput.
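
The checklist below also mentions SQLAlchemy, whose engine builds in the same pool-and-return behaviour; a hedged sketch (connection URL and pool sizes are illustrative):

from sqlalchemy import create_engine, text

# QueuePool is the default: at most pool_size + max_overflow connections
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",
    pool_size=20,
    max_overflow=5,
)

def get_users_sqlalchemy():
    # The connection is returned to the pool when the with block exits
    with engine.connect() as conn:
        return conn.execute(text("SELECT * FROM users")).fetchall()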

Security Checklist

  • Use with statements for all resources (files, connections, sockets, locks)
  • Implement __enter__ and __exit__ for custom resource classes
  • Use @contextmanager decorator to create context managers from generators
  • Implement bounded caches with functools.lru_cache or cachetools
  • Break circular references with weakref in bidirectional relationships
  • Use connection pooling for database connections (psycopg2.pool, SQLAlchemy)
  • Close resources in finally blocks if context managers aren't available
  • Avoid global unbounded collections - implement TTL expiration or size limits
  • Monitor memory usage with memory_profiler, tracemalloc, or pympler
  • Test with load tests running for hours to detect gradual memory leaks
  • Use weak references for callbacks, observers, and cache-like structures
  • Explicitly close connections in exception handlers if not using with
  • Implement cache eviction based on size, TTL, or LRU for long-running apps
  • Profile with gc.get_objects() or objgraph to find leaked objects - see the sketch after this list
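
As a starting point for the gc.get_objects() item above, a snapshot of live objects grouped by type can be compared over time; a minimal sketch:

import gc
from collections import Counter

def top_object_types(n=10):
    # Counts live objects tracked by the collector, grouped by type name;
    # comparing two snapshots taken minutes apart highlights leaking types
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(n)

print(top_object_types())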

Additional Resources