ORA-00020: Maximum Processes Exceeded — How Three Different Errors Led to One Root Cause

ORA-00020 maximum processes exceeded — Oracle CDB PDB connection leak production incident

If you’ve ever hit ORA-00020 maximum processes exceeded in production, you know it rarely announces itself cleanly. Instead, it hides behind a cascade of unrelated-looking errors that send you down completely the wrong diagnostic path.

Here’s a real example. You try to reset a password on the ddvpdb pluggable database. Routine task. And Oracle fires back with this:

ORA-65048: error encountered when processing the current DDL statement
in pluggable database
ORA-12850: Could not allocate slaves on all specified instances:
needed, allocated

Meanwhile, your application team is calling because nobody can connect. They’re hitting this:

Enter password:
ERROR:
ORA-12537: TNS:connection closed

Three different errors. Zero obvious connection between them. Production is down.

Sound familiar?

This is exactly the scenario we hit recently on a single-node CDB/PDB environment running on ASM. And every single one of those errors was a red herring.


What Causes ORA-00020 Maximum Processes Exceeded

Let’s talk about the false trails first — because in a real incident, this is where you lose 20 minutes.

ORA-12537 immediately sends you toward the network stack. tnsping working? Yes. Listener healthy? Yes. sqlnet.ora encryption mismatch? No. Firewall blocking the loopback on a VM connecting to itself? No. Dead end.

ORA-12850 on a single-node CDB/PDB makes no sense at all — this is a RAC-era error about parallel slaves across instances. You start checking parallel_instance_group, instance_groups, leftover RAC parameters in SPFILE. All clean. Another dead end.

ORA-65048 is just a DDL propagation wrapper. It tells you something failed downstream, not what.

The real story was sitting in the alert log the entire time, quietly suppressing itself after the first minute:

ORA-00020: maximum number of processes (320) exceeded
ORA-20 errors will not be written to the alert log for
the next minute. Please look at trace files to see all
the ORA-20 errors.

Process table full. That’s it. When Oracle can’t spin up a new OS process, the connection is terminated before a usable authenticated session can be established — ORA-12537. When DDL tries to allocate parallel worker processes and there are none available — ORA-12850. Everything cascades from this single point of failure.

Lesson one: always check the alert log first. Cascading, seemingly unrelated errors almost always have one root cause buried in there.


Finding the Culprit Behind the Process Limit Breach

Once you’ve confirmed ORA-00020 maximum processes exceeded is the root cause, the next question is: what’s eating all 320 process slots?

SELECT status, COUNT(*)
FROM v$session
GROUP BY status
ORDER BY 2 DESC;
STATUS       COUNT(*)
--------   ----------
INACTIVE          228
ACTIVE             71

228 inactive sessions on a 320-process limit. That’s 71% of your process budget sitting completely idle.

Drill in on the inactive sessions:

SELECT sid, serial#, username, status,
       machine, program,
       ROUND(last_call_et/60) idle_minutes
FROM v$session
WHERE status = 'INACTIVE'
  AND last_call_et > 300
  AND username IS NOT NULL
ORDER BY last_call_et DESC;

Every single row came back identical:

USERNAME   STATUS     PROGRAM              IDLE_MINUTES
--------   --------   ------------------   ------------
DTVS       INACTIVE   lslist_wpf.exe             366
DTVS       INACTIVE   lslist_wpf.exe             365
DTVS       INACTIVE   lslist_wpf.exe             365
... (220+ rows)

There it is. lslist_wpf.exe — a Windows application connecting to ddvpdb via the DTVS schema — had been opening database sessions and simply never closing them. No connection.Close(). No connection pooling. No timeout configured anywhere on either side.

From Oracle’s perspective these connections were perfectly alive — the TCP socket was still up. From Oracle’s perspective these sessions were still valid and connected, so no automatic cleanup was triggered. So they just accumulated, one per application invocation, until the process table was full and the database couldn’t accept a single new connection from anywhere — not the app, not Zabbix, not the DBA trying to reset a password.


Getting Back In When You’re Locked Out

Here’s the thing about ORA-00020 — you can’t connect normally to fix it because connecting requires a new process. The entry point is local SYSDBA authentication via bequeath, which bypasses the listener and doesn’t compete for the same slots:

export ORACLE_SID=dataverse
sqlplus / as sysdba

You’re in. Now kill all the leaked sessions with a PL/SQL block rather than running 220 individual statements:

BEGIN
  FOR s IN (
    SELECT sid, serial#
    FROM v$session
    WHERE username = 'DTVS'
    AND status = 'INACTIVE'
    AND program LIKE 'lslist_wpf.exe%'
    AND last_call_et > 300
  ) LOOP
    BEGIN
      EXECUTE IMMEDIATE
        'ALTER SYSTEM KILL SESSION '''
        || s.sid || ',' || s.serial#
        || ''' IMMEDIATE';
    EXCEPTION
      WHEN OTHERS THEN NULL;
    END;
  END LOOP;
END;
/

This syntax is correct for a single-instance database. On RAC — or if you’re ever troubleshooting a CDB where ORA-12850 makes you doubt whether you’re truly on a single node — use the three-part form to explicitly target the instance:

-- RAC-safe syntax
ALTER SYSTEM KILL SESSION 'sid,serial#,@inst_id' IMMEDIATE;

The EXCEPTION WHEN OTHERS THEN NULL is intentional. Some sessions may already be dying when the loop hits them — you don’t want the block aborting halfway through on a stale session. Let it run clean across all rows.

Verify immediately after:

SELECT status, COUNT(*)
FROM v$session
GROUP BY status;

Process slots freed. DB accepting connections. Crisis resolved — in under five minutes from identifying the root cause.


Making Sure It Doesn’t Come Back

The kill block buys you time. It doesn’t fix anything. Those lslist_wpf.exe sessions will start accumulating again the moment the application restarts. You need guardrails at multiple layers.

Layer 1 — Dead Connection Detection

Add to your server-side sqlnet.ora:

SQLNET.EXPIRE_TIME = 10

Oracle Net checks whether client/server connections are still alive at the configured interval (in minutes). If the client-side process is dead or the socket is gone, Oracle cleans up the session automatically. No listener restart needed — takes effect on new connections immediately.

This handles dead connections — crashed app servers, VM migrations, network resets. For live-but-idle connections (the app process is still running, it just isn’t using the DB session), you need the next layer.

Layer 2 — Profile-Based Idle Session Timeout

First — and this is the step most people skip — resource limits must be explicitly enabled or the profile does nothing:

ALTER SYSTEM SET resource_limit = TRUE SCOPE=BOTH;

Then create the profile and assign it:

CREATE PROFILE ddvpdb_app_profile LIMIT
  IDLE_TIME         30    -- mark sessions SNIPED after 30 min idle
  CONNECT_TIME     480    -- max 8-hour session lifetime
  SESSIONS_PER_USER 50;   -- hard ceiling per schema, acts as circuit breaker

ALTER USER DTVS PROFILE ddvpdb_app_profile;

IDLE_TIME is the primary weapon here. Once a session crosses the idle threshold, Oracle marks it as SNIPED — the next call from that client receives an error, and the session is then cleaned up. Important distinction: Oracle does not immediately terminate the OS process the moment the timer expires. The session sits in SNIPED state until the client makes its next attempt. For genuinely abandoned connections where the client never calls back, pair this with SQLNET.EXPIRE_TIME (Layer 1 above) to handle the OS-level cleanup. Together, the two layers cover both live-idle and dead connections.

SESSIONS_PER_USER 50 is the circuit breaker. If the leak is aggressive, it can never blow past 50 sessions for this schema. Beyond that, the application gets an error — which is actually useful because it makes the leak visible to the development team instead of silently degrading the database for hours.

Layer 3 — Increase the Process Limit

320 is too tight for a production CDB. Background processes, PDB overhead, monitoring agents, connection pools — they all add up. Size it with headroom:

ALTER SYSTEM SET processes=500 SCOPE=SPFILE;
ALTER SYSTEM SET sessions=555 SCOPE=SPFILE;
-- Oracle default derivation: processes * 1.1 + 5

Schedule the bounce for your next maintenance window. The leak fix reduces the frequency of hitting the ceiling — but the ceiling itself needs to be sane.


The Conversation With Your Application Team

This is fundamentally an application-side bug, not a database problem. The message the development team needs:

lslist_wpf.exe is not closing database connections after use. Every execution opens a session against ddvpdb via the DTVS schema and abandons it. At a 320-process limit, ~220 leaked connections accumulate within a single hour of the application running. Implement ODP.NET connection pooling. Ensure connection.Close() and connection.Dispose() are called in all code paths — including exception handlers and finally blocks. Connection leaks in exception paths are the most common source of this pattern.

The SESSIONS_PER_USER profile limit enforces a hard ceiling in the meantime. But that’s a guardrail, not a solution.


Key Takeaways

Chase the alert log before anything else. ORA-12537 and ORA-12850 sent the initial triage completely in the wrong direction. The alert log had ORA-00020 the entire time.

SQLNET.EXPIRE_TIME should be non-negotiable in every environment. Zombie sessions from crashed app servers, VM migrations, and network hiccups are a universal problem. A 10-minute probe interval costs nothing.

Profile limits are operational guardrails, not just security controls. IDLE_TIME marks sessions SNIPED — not immediately killed — and SESSIONS_PER_USER caps concurrent connections hard. Set both proactively on every application schema. And don’t forget RESOURCE_LIMIT=TRUE or the profile does absolutely nothing.

sqlplus / as sysdba is your emergency lifeline. When ORA-00020 locks you out, local bequeath authentication bypasses the process slot competition entirely. Know this before you need it.

The database was back up and accepting connections within minutes of identifying the root cause. The real work — the profile, sqlnet.ora tuning, the process limit bump, and the application team conversation — that’s what turns a recurring incident into a one-time war story.


Seen this pattern in your environment? What application-side connection leak have you had to track down? Drop it in the comments — always curious about the variety of lslist_ equivalents out there.


Other Oracle Topics worth reading:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.