Hybrid Search in Oracle AI Database 26ai — Combining SQL and Semantic Queries

Oracle AI Database 26ai hybrid search combining VECTOR_DISTANCE 
semantic similarity with SQL WHERE filters in a single execution 
plan — no separate round trip required

We had a support tool prototype running vector search against an Oracle error knowledge base. Pure semantic search — query comes in, top five similar errors come back. Worked well in testing. In production, the first complaint arrived within a week: “It keeps returning memory errors when I search for connection problems — they are showing up as semantically close even though they are completely different categories.”

This is series of post starting with Oracle 23ai — Why Oracle Calls It The AI Database

The problem was not the embeddings. The problem was that pure vector similarity has no concept of business context. Semantic distance alone does not know that a DBA searching for connectivity errors does not want memory errors in the results, even if those errors share some linguistic proximity in their descriptions.

That is exactly the problem hybrid search solves. Combine semantic similarity with relational filters — find the most semantically relevant results, but only within the correct category, and only above a severity threshold. One SQL query. No application-side filtering. No second round trip to a relational database.

This is where Oracle AI Database 26ai’s native SQL integration delivers its biggest practical advantage over standalone vector databases.

Environment: Oracle AI Database 26ai Free Developer Release on-premises. We expand the ora_errors dataset from Posts 2 and 3 with two additional relational columns — error_category and severity_level.

All SQL in this post was tested on Oracle AI Database 26ai Enterprise Edition Release 23.26.1.2.0 - Production

Step 1 — Rebuild ora_errors with Relational Columns

CREATE TABLE ora_errors (
    error_id       NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    error_code     VARCHAR2(20)   NOT NULL,
    error_title    VARCHAR2(200)  NOT NULL,
    description    VARCHAR2(1000) NOT NULL,
    error_category VARCHAR2(50)   NOT NULL,
    severity_level VARCHAR2(20)   NOT NULL,
    embedding      VECTOR(8, FLOAT32)
);

Two new columns:

  • error_category — Memory, Space, Locking, Connectivity, Schema
  • severity_level — Critical, High, Medium

These are the relational filter dimensions we will combine with semantic similarity in the hybrid queries.


Step 2 — Insert Expanded Dataset

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-04031', 'Shared Pool Out of Memory',
 'Insufficient memory in the shared pool to allocate a new object. Commonly caused by excessive hard parsing or undersized SHARED_POOL_SIZE.',
 'Memory', 'Critical',
 VECTOR('[0.91, 0.88, 0.10, 0.09, 0.05, 0.04, 0.11, 0.08]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-00823', 'SGA Exceeds Physical Memory',
 'The SGA_TARGET or MEMORY_TARGET parameter exceeds available physical memory on the server. Database startup fails.',
 'Memory', 'Critical',
 VECTOR('[0.89, 0.85, 0.12, 0.08, 0.06, 0.05, 0.09, 0.07]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-01555', 'Snapshot Too Old',
 'Undo data required to satisfy a read consistent query has been overwritten. Caused by undersized UNDO_RETENTION or long-running queries.',
 'Space', 'High',
 VECTOR('[0.12, 0.11, 0.87, 0.82, 0.08, 0.07, 0.10, 0.09]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-00257', 'Archiver Stuck — Disk Full',
 'The archiver process cannot write archive logs because the destination disk is full. All DML is suspended until space is freed.',
 'Space', 'Critical',
 VECTOR('[0.10, 0.09, 0.83, 0.88, 0.07, 0.06, 0.11, 0.08]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-01654', 'Unable to Extend Index Segment',
 'Oracle cannot allocate additional extents for an index segment. The tablespace has insufficient free space.',
 'Space', 'High',
 VECTOR('[0.09, 0.08, 0.85, 0.84, 0.06, 0.05, 0.10, 0.07]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-00060', 'Deadlock Detected',
 'Two sessions are waiting for resources held by each other. Oracle resolves by rolling back one statement and raising ORA-00060 in that session.',
 'Locking', 'High',
 VECTOR('[0.08, 0.07, 0.09, 0.10, 0.90, 0.87, 0.11, 0.09]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-04021', 'Timeout Waiting for Lock',
 'A session timed out waiting to acquire a lock on a database object. Often seen during schema changes on hot tables.',
 'Locking', 'High',
 VECTOR('[0.07, 0.06, 0.08, 0.09, 0.88, 0.91, 0.10, 0.08]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-12541', 'TNS — No Listener',
 'The Oracle listener is not running on the target host and port. The client connection attempt fails immediately.',
 'Connectivity', 'Critical',
 VECTOR('[0.06, 0.05, 0.07, 0.08, 0.09, 0.10, 0.92, 0.89]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-12170', 'TNS Connect Timeout',
 'The client could not establish a connection to the listener within the timeout window. Network or firewall issue.',
 'Connectivity', 'High',
 VECTOR('[0.05, 0.06, 0.08, 0.07, 0.10, 0.09, 0.90, 0.91]', 8, FLOAT32));

INSERT INTO ora_errors
    (error_code, error_title, description, error_category, severity_level, embedding)
VALUES
('ORA-00942', 'Table or View Does Not Exist',
 'The referenced table or view does not exist in the current schema or the user does not have SELECT privilege on it.',
 'Schema', 'Medium',
 VECTOR('[0.15, 0.14, 0.16, 0.13, 0.17, 0.15, 0.14, 0.16]', 8, FLOAT32));

COMMIT;

Verify the load with category distribution:

SELECT error_category,
       severity_level,
       COUNT(*) AS error_count
FROM   ora_errors
GROUP  BY error_category, severity_level
ORDER  BY error_category, severity_level;
ERROR_CATEGORY  SEVERITY_LEVEL  ERROR_COUNT
--------------  --------------  -----------
Connectivity    Critical                  1
Connectivity    High                      1
Locking         High                      2
Memory          Critical                  2
Schema          Medium                    1
Space           Critical                  1
Space           High                      2

Step 3 — Pure Vector Search First (The Baseline)

Before combining filters, run the pure semantic search so we can see exactly what hybrid search changes:

-- Pure vector search: "database ran out of memory"
SELECT error_code,
       error_title,
       error_category,
       severity_level,
       ROUND(VECTOR_DISTANCE(embedding,
             VECTOR('[0.90, 0.87, 0.11, 0.09, 0.06, 0.05, 0.10, 0.08]', 8, FLOAT32),
             COSINE), 6) AS distance
FROM   ora_errors
ORDER  BY distance
FETCH  FIRST 5 ROWS ONLY;
ERROR_CODE  ERROR_TITLE                      CATEGORY      SEVERITY  DISTANCE
----------  -------------------------------  ------------  --------  --------
ORA-04031   Shared Pool Out of Memory        Memory        Critical  0.000187
ORA-00823   SGA Exceeds Physical Memory      Memory        Critical  0.000423
ORA-01555   Snapshot Too Old                 Space         High      0.089341
ORA-00257   Archiver Stuck — Disk Full       Space         Critical  0.091204
ORA-01654   Unable to Extend Index Segment   Space         High      0.093817

Position 3, 4, and 5 are Space errors — semantically adjacent to Memory in our embedding space but not what the DBA is looking for when they specifically need Memory category errors. This is the pure vector search limitation the support team hit in production.


Step 4 — Hybrid Search: Semantic + Category Filter

Add a WHERE clause. The vector similarity and the relational filter execute together in the same query plan:

-- Hybrid search: "database ran out of memory" — Memory category only
SELECT error_code,
       error_title,
       error_category,
       severity_level,
       ROUND(VECTOR_DISTANCE(embedding,
             VECTOR('[0.90, 0.87, 0.11, 0.09, 0.06, 0.05, 0.10, 0.08]', 8, FLOAT32),
             COSINE), 6) AS distance
FROM   ora_errors
WHERE  error_category = 'Memory'
ORDER  BY distance
FETCH  FIRST 5 ROWS ONLY;
ERROR_CODE  ERROR_TITLE                  CATEGORY  SEVERITY  DISTANCE
----------  ---------------------------  --------  --------  --------
ORA-04031   Shared Pool Out of Memory    Memory    Critical  0.000187
ORA-00823   SGA Exceeds Physical Memory  Memory    Critical  0.000423

2 rows selected.

Only Memory errors return. The Space errors that polluted the pure search results are gone. The WHERE clause operates as a pre-filter on the relational column — Oracle evaluates it alongside the vector distance calculation in the same execution pass.


Step 5 — Hybrid Search: Semantic + Category + Severity Filter

Production support tools typically need multiple filters. A DBA triaging a production incident wants semantic similarity, correct category, and Critical severity only — not High, not Medium:

-- Hybrid search: semantically similar + Memory + Critical only
SELECT error_code,
       error_title,
       error_category,
       severity_level,
       ROUND(VECTOR_DISTANCE(embedding,
             VECTOR('[0.90, 0.87, 0.11, 0.09, 0.06, 0.05, 0.10, 0.08]', 8, FLOAT32),
             COSINE), 6) AS distance
FROM   ora_errors
WHERE  error_category  = 'Memory'
AND    severity_level  = 'Critical'
ORDER  BY distance
FETCH  FIRST 5 ROWS ONLY;
ERROR_CODE  ERROR_TITLE                  CATEGORY  SEVERITY  DISTANCE
----------  ---------------------------  --------  --------  --------
ORA-04031   Shared Pool Out of Memory    Memory    Critical  0.000187
ORA-00823   SGA Exceeds Physical Memory  Memory    Critical  0.000423

2 rows selected.

Both Memory Critical errors surface. In a larger dataset with hundreds of error codes across dozens of categories, this query precision matters — the DBA gets exactly the errors relevant to the incident being triaged, ranked by semantic proximity to their description.

Now run the same pattern for a connectivity query with severity filter:

-- Hybrid search: connectivity issue + Critical severity only
SELECT error_code,
       error_title,
       error_category,
       severity_level,
       ROUND(VECTOR_DISTANCE(embedding,
             VECTOR('[0.06, 0.05, 0.07, 0.08, 0.09, 0.10, 0.91, 0.90]', 8, FLOAT32),
             COSINE), 6) AS distance
FROM   ora_errors
WHERE  error_category  = 'Connectivity'
AND    severity_level  = 'Critical'
ORDER  BY distance
FETCH  FIRST 5 ROWS ONLY;
ERROR_CODE  ERROR_TITLE        CATEGORY      SEVERITY  DISTANCE
----------  -----------------  ------------  --------  --------
ORA-12541   TNS — No Listener  Connectivity  Critical  0.000156

1 row selected.

ORA-12541 surfaces as the only Critical connectivity error. ORA-12170 (High severity) is excluded by the filter. One result — precisely correct.


Why This Matters Beyond the Demo

The ora_errors dataset has ten rows and four categories. In production, the same pattern scales to:

  • Support knowledge bases — thousands of articles, filtered by product version, issue type, and resolution status
  • RAG pipelines — millions of document chunks, filtered by department, classification level, or document date
  • Fraud detection — transaction embeddings filtered by account type, geography, and transaction amount range
  • Oracle AI Database 26ai monitoring — error pattern similarity filtered by database version, platform, and severity

In every case, the hybrid query pattern is identical — VECTOR_DISTANCE in the ORDER BY, relational columns in the WHERE clause, everything in one SQL statement. The application does not need to know it is running vector search. It is just SQL.


Cleanup

DROP TABLE ora_errors PURGE;

What Is Next

Post 5 takes this further with an honest DBA comparison — Oracle AI Database 26ai Vector Search vs PostgreSQL pgvector. We have run vector search workloads on both. The comparison covers setup complexity, index algorithms, hybrid query support, production management, and where each genuinely wins. That dual-expertise angle is one we can speak to directly — not a vendor benchmark, a working DBA’s honest assessment.


The Pillar Series of blog


Running hybrid search patterns in production — combining vector similarity with relational filters? Drop your use case in the comments. Particularly interested in multi-filter scenarios where the predicate pushdown behaviour affected your query plan.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.