Database and Migration Insights

Why Ora2Pg Should Be Your First Stop for PostgreSQL Conversion

Deepak Mahto Mar 31, 2026 Updated Apr 1, 2026

The author emphasizes the importance of using Ora2pg for Oracle-to-PostgreSQL migrations, particularly for table DDL conversion. With over a decade of experience, they highlight Ora2pg's robust configuration options that provide precise control. The post outlines specific features and transformation directives that enhance migration efficiency and accuracy, making complex migrations manageable. Continue reading →

Show full content

I have been doing Oracle-to-PostgreSQL migrations for over last decades across enterprises, cloud platforms, and everything in between. I have used commercial tools, cloud-native services, and custom scripts. And when it comes to table DDL conversion, I keep coming back to the same tool: Ora2pg. Not because it is the flashiest or the easiest to set up, but because once you understand its configuration model, nothing else comes close to the control it gives you. This post is my attempt to convince you of the same and to walk you through the specific features that I use on every single engagement.

Let me be direct if you are doing an Oracle-to-PostgreSQL migration and you have not tried Ora2pg for your DDL conversion, you are making your life harder than it needs to be. This is a 20+ year battle-tested open-source project that has seen more Oracle schemas than most of us ever will.

This post is specifically about Table DDL conversion and the transformation knobs Ora2pg gives you. Not PL/SQL. Not data export. Just the DDL side because that alone deserves a dedicated conversation.

What Ora2Pg Actually Is (Quick Primer)

Ora2Pg connects to your Oracle database, scans it automatically, extracts its structure or data or code, and generates PostgreSQL-compatible SQL scripts. One config file ora2pg.conf controls everything. Set your DSN, set your schema, set the export type, and you are off.

ora2pg -t TABLE -c ora2pg.conf -o tables.sql

That one command gets you tables with constraints – primary keys, foreign keys, unique constraints, check constraints all translated and ready to load. It handles sequences, indexes, partitions, grants, views, the works. But let us focus on what makes it genuinely powerful for DDL work: the transformation section.

The Transformation Section: Where the Real Power Lives

The ora2pg.conf transformation directives are what separate Ora2pg from a simple schema dumper. You are not just getting a mechanical type mapping, you are getting a configuration-driven transformation engine.

MODIFY_TYPE — Override Column Types on Your Terms

Oracle schemas are full of NUMBER(1) columns that are logically boolean but typed as numbers because Oracle did not have a native boolean until very recently. You know which ones they are, the application knows which ones they are, but a dumb type mapper does not.

MODIFY_TYPE lets you explicitly override the target type for specific table-column combinations:

MODIFY_TYPE ORDERS:IS_ACTIVE:booleanMODIFY_TYPE CUSTOMERS:STATUS_FLAG:smallint

This gives you surgical control. The column that truly represents a boolean flips to boolean. The one that happens to be NUMBER(1) but actually stores an enum value stays as smallint. No global assumptions, no surprises downstream.

REPLACE_AS_BOOLEAN + BOOLEAN_VALUES — Boolean Transformation at Scale

If you have a whole class of columns say, every NUMBER(1) in the schema that need to become boolean, doing them one by one in MODIFY_TYPE is tedious. That is where REPLACE_AS_BOOLEAN comes in:

REPLACE_AS_BOOLEAN NUMBER:1

This tells ora2pg to treat every NUMBER(1) as a candidate for boolean conversion. Combined with BOOLEAN_VALUES, which defines what the true/false pairs look like in your data:

BOOLEAN_VALUES yes:no y:n 1:0 true:false enabled:disabled

Ora2Pg ships with a sensible default set. You extend it for your domain-specific values. This is the kind of feature you only build after seeing hundreds of real Oracle schemas.

ORA_RESERVED_WORDS – Escaping Oracle-isms That Break PostgreSQL

Oracle has its own set of reserved words that are perfectly valid as column or table names in Oracle but will blow up in PostgreSQL. ORA_RESERVED_WORDS tells Ora2pg to double-quote those specific identifiers during export:

ORA_RESERVED_WORDS audit,comment,references

Real-world Oracle schemas use COMMENT, AUDIT, LEVEL, REFERENCES as column names all the time particularly in legacy financial and ERP systems. Without this directive, your generated DDL silently produces broken SQL. With it, Ora2pg wraps those identifiers and your PostgreSQL DDL loads cleanly.

I have seen this one issue alone cause post-migration failures that took a team two days to trace back to a column name. One directive. Two days saved.

PRESERVE_CASE – Case Sensitivity Control

By default, ora2pg lowercases everything which is the right call for PostgreSQL. But some shops have mixed-case Oracle schemas they need to preserve exactly as-is (usually because the application references identifiers with specific casing). Flip PRESERVE_CASE to 1 and ora2pg double-quotes all object names to preserve their original case.

Use this carefully. If you enable it, every SQL statement touching those objects in PostgreSQL will need double-quoted identifiers. But when you need it, you need it.

USE_RESERVED_WORDS – PostgreSQL Reserved Word Protection

The flip side of ORA_RESERVED_WORDS. If your Oracle table or column names happen to collide with PostgreSQL reserved words, USE_RESERVED_WORDS 1 tells Ora2pg to automatically quote them in the output DDL. One flag, no manual grep-and-replace exercise. But be cautious of identifiers that start with $, those will fail in PostgreSQL.

DATA_TYPE – Custom Type Mapping

The default Oracle-to-PostgreSQL type mapping in Ora2pg is solid, but every schema has quirks. DATA_TYPE lets you override the global type mapping rules:

DATA_TYPE NUMBER(1)    booleanDATA_TYPE NUMBER(*)    bigintDATA_TYPE VARCHAR2     varchar

This is coarser than MODIFY_TYPE (which is column-specific) but useful when you have organisation-wide conventions for example, “we always used NUMBER(1) for booleans across every schema.”

The discipline is knowing when to use DATA_TYPE (global convention) versus MODIFY_TYPE (surgical override). Apply DATA_TYPE too broadly and you will quietly corrupt edge cases. Apply MODIFY_TYPE too narrowly and you miss a class of columns. The right answer depends on the schema.

FORCE_IDENTITY_BIGINT – Enforce BIGINT for all Identity Columns.

Enable migrating all Identity column with BIGINT data type that serve as best practise as per PostgreSQL for long term stability.

The Testing Framework: This Is What Closes the Loop

Transformation is half the story. How do you know the conversion is correct? Ora2Pg ships a built-in testing framework that most people never fully use, and that is a mistake.

TEST – Checks that all objects exported from Oracle actually exist in PostgreSQL. Tables, indexes, constraints, sequences — it compares both sides and reports gaps.

TEST_VIEW – Validates that views have been created correctly on the PostgreSQL side.

TEST_COUNT – Compares row counts across all tables between Oracle and PostgreSQL. Fast sanity check after data load.

TEST_DATA – Runs a deeper comparison, sampling actual data values across tables to verify the content, not just the counts.

# Run full validationora2pg -t TEST -c ora2pg.confora2pg -t TEST_COUNT -c ora2pg.confora2pg -t TEST_DATA -c ora2pg.conf

You run the DDL export, load it, migrate your data, and then run TEST + TEST_COUNT + TEST_DATA as a pipeline. That is a migration validation loop built into the same tool you used to convert. No external tooling needed to get started.

The Assessment Report: Before You Touch a Line of DDL

Before any of the above, run SHOW_REPORT. It generates an HTML report that breaks down every Oracle object in your schema, estimates migration complexity (A/B/C tiers), calculates effort in person-days, and flags what needs manual rewriting.

ora2pg -t SHOW_REPORT --estimate_cost -c ora2pg.conf --dump_as_html > report.html

This is your pre-migration brief. It tells you what ora2pg can handle automatically versus what will need human intervention. On a real enterprise schema, this report saves weeks of scoping conversations.

Sample Glimpse of Ora2pg

Where Ora2pg Gets Tricky

Ora2Pg is excellent for table DDL, sequences, indexes, and constraints. Straightforward schemas even large ones come out clean with the right transformation flags set.

But real Oracle schemas are not always clean, and the complexity compounds at scale.

Configuration accuracy under scale.
A 50-table schema is manageable to tune by hand. A 500-table schema across 10 business schemas is not. The number of MODIFY_TYPE decisions, the boolean column candidates, the reserved word conflicts, the partition strategies it all multiplies. Miss a flag in one schema and you get DDL that loads without errors but behaves wrong. That is the hardest class of migration bug to catch.

Directive interaction.
DATA_TYPE, MODIFY_TYPE, REPLACE_AS_BOOLEAN, and BOOLEAN_VALUES interact. Setting them in the wrong order or with conflicting scopes produces unexpected output. You need to understand how ora2pg resolves precedence before you write a production config.

Non-standard Oracle patterns.
Column defaults using Oracle sequences, function-based index definitions, interval partitions with custom rules, domain indexes these often need post-processing that ora2pg.conf alone cannot handle. Ora2Pg gets you 90–95% of the way. The remaining 5–10% is where migration engineers earn their keep.

None of this is a criticism of Ora2pg. These are the inherent complexities of Oracle-to-PostgreSQL migration at enterprise scale. Ora2Pg gives you the tools to handle all of it. The discipline is in knowing how to apply them.

One More Thing: DCGMigrator

I will keep this short because it is not the point of this post.

The challenge with Ora2pg at scale is not the tool it is the configuration. When you are dealing with 10, 20, 50 Oracle schemas or large set of tables, figuring out the right combination of MODIFY_TYPE overrides, boolean column candidates, reserved word conflicts, and type mapping rules for each schema is a repetitive analysis job. Miss something and you get DDL that loads clean but behaves wrong.

DCGMigrator – the tool we built at DataCloudGaze scans your Oracle database and generates a best-fit ora2pg.conf automatically. It does the discovery work so you do not have to. You still run ora2pg. You still own the output. You just start from a configuration grounded in what is actually in your schema, not guesswork.

Use ora2pg. Let DCGMigrator get your configuration right faster. Connect to Know More.

Bottom Line

Ora2Pg has been around since 2001. It is free, open-source, actively maintained, and has more Oracle-to-PostgreSQL institutional knowledge baked into it than any commercial tool I have used.

For table DDL conversion, there is no better starting point. Learn the transformation directives MODIFY_TYPE, REPLACE_AS_BOOLEAN, BOOLEAN_VALUES, ORA_RESERVED_WORDS, USE_RESERVED_WORDS, PRESERVE_CASE, DATA_TYPE — understand how they interact, get comfortable with the assessment and testing pipeline, and you will be converting Oracle schemas to PostgreSQL with real confidence.

The configuration is the craft. Everything else builds on top of that foundation.

http://databaserookies.wordpress.com/?p=3727

Extensions

Oracle & SQL Server to PostgreSQL – Migration Tooling Gotchas No One Warns You About!

Deepak Mahto Mar 26, 2026 Updated Mar 30, 2026

Migrations from Oracle and SQL Server to PostgreSQL often encounter overlooked challenges. The tools, while capable, can produce flawed code that compromises performance and functionality. Issues like improper data type mapping, silent errors from implicit casting, incomplete code, and extension lock-in require careful attention. Understanding these pitfalls is crucial for a successful migration. Continue reading →

Show full content

Every migration unfolds a story. Here are the chapters most teams miss.

Years of guiding customers and partners through Oracle and SQL Server migrations to PostgreSQL taught me one thing above everything else: the tool was never the whole answer.

Legacy Oracle and SQL Server code wasn’t written yesterday. It was built over three decades, under different constraints, by people solving real problems with what they had. That logic has history. It deserves empathy not just a conversion pass.

Ora2pg, AWS SCT, Google Cloud DMS, Ispirer all are solid tools. All do a decent job translating DDL and procedural logic. But default tooling output becomes your new core first PostgreSQL foundation. If that foundation has cracks, your customer’s first experience of PostgreSQL will be frustration not its true capability.

Every migration I’ve worked on unfolded a story. I learned something new in each one and carried that into every PostgreSQL implementation I built after.
Here are four gotchas that will bite you if you’re not watching.

1. Oracle NUMBER Without Precision – A Silent Performance Trap

Oracle lets you declare NUMBER without precision or scale. That’s a 30-year-old design choice. Most tools map it to NUMERIC or DOUBLE PRECISION depending on context but without deliberate fine-tuning, you get a blanket mapping that introduces implicit casting inside procedural blocks and hurts performance long term.

It compounds further when NUMBER columns carry primary key or foreign key constraints wrong type mapping directly impacts index access patterns, storage, and query performance over time.

If your conversion tool is producing this mapping without any precision inference — please run. Fix it before you go further.

Oracle Number DeclarationConstraintPostgreSQLNUMBERPrimary KeyNumeric or Double PrecisionNUMBERForeign KeyNumeric or Double PrecisionNUMBERNDefault – Double Precision

It also quietly introduces implicit casting inside procedural blocks, a performance issue that compounds over time and is hard to trace back to the root cause. Spoke about this in detail at PGConf India 2026.

2. Implicit Type Casting – Errors Hidden for a Decade

Oracle silently casts mismatched types. Comparing a NUMBER column to a VARCHAR? No error, Oracle just handles it and hides the problem for years. PostgreSQL is strict by design. It throws an operator error the moment argument types don’t match. That’s not a bug that’s PostgreSQL doing the right thing.

Most conversion tools can’t detect this during translation. It doesn’t surface until the first functional run or worse, when the customer is already in testing. Code that ran in Oracle for a decade starts failing. To the end customer it feels like a PostgreSQL problem. It isn’t it’s a decade-old data quality issue that Oracle never surfaced.

Two mitigation paths: explicit casting wherever feasible, or custom operators in a dedicated schema with casting functions, prioritized via search_path. Neither is a silver bullet, but understanding the behavioral gap is half the battle before you even touch the fix.

→ Deep dive: Implicit Conversion in Oracle to PostgreSQL Migration
→ Related: Same SQL, Different Results — A Subtle Migration Bug

3. Incomplete Code Marked as Completed

This one is specific to AWS Schema Conversion Tool and it’s dangerous.

For certain patterns AWS SCT can’t convert, it doesn’t fail loudly. It creates a version that compiles but doesn’t work. If you’re using metadata-driven schema validation, the object shows up as present. Functionally, it’s garbage.

Two real examples:

Oracle Code
COUNT(DISTINCT col) inside analytical functions – SCT outputs a view that returns the error message as a column value. It compiles. It does nothing useful.

CREATE OR REPLACE FORCE VIEW MOCKSCHEMA.VW_DISTINCT_CLAUSE_ANALYTICAL(COL1, COL2, DISTINCT_COUNT_COL1) AS 
WITH
......................
COUNT(DISTINCT COL1) OVER (PARTITION BY COL2) AS DISTINCT_COUNT_COL1 FROM ALIAS1
/

AWS Schema Conversion Tool Output

CREATE OR REPLACE  VIEW mockschema.vw_distinct_clause_analytical (text, error_msg) AS
SELECT '.........'5340 - Severity CRITICAL - PostgreSQL doesn''t support the COUNT(<DYNAMIC_TYPE>) function. Revise your code to use another function or create a user-defined function.
5578 - Severity CRITICAL........
::varchar AS error_msg;

One more examples from AWS Schema Conversion Tool

BULK COLLECT + FORALL patterns

CREATE OR REPLACE PROCEDURE "MOCKSCHEMA"."PROC_EMP_BULK_LOAD" as
type emp_info_type is table of employees % rowtype;emp_info_type_tbl emp_info_type;
begin
select
	e. * bulk collect into emp_info_type_tbl
from
	employees e;forall I in 1 ..emp_info_type_tbl.count
insert into
	employees_bkp (EMPLOYEE_ID)
VALUES
	(
		emp_info_type_tbl(i).EMPLOYEE_ID
	);
exception
	when others then rollback;raise;
end proc_emp_bulk_load;/

SCT wraps the untranslated logic in comments inside a valid PL/pgSQL shell. The procedure compiles. The body does nothing.

CREATE OR REPLACE PROCEDURE mockschema.proc_emp_bulk_load()
AS 
$BODY$
/* create table employees_bkp(EMPLOYEE_ID NUMBER(6)); */
DECLARE
    emp_info_type_tbl mockschema.proc_emp_bulk_load$emp_info_type;
BEGIN
    BEGIN
    /*
    [9996 - Severity CRITICAL - Transformer error occurred in plSqlStatement. Please submit report to developers.]
    select e.*
            		bulk collect into emp_info_type_tbl
           			from employees e
    */
    /*
    [9996 - Severity CRITICAL - Transformer error occurred in plSqlStatement. Please submit report to developers.]
    forall I in 1 ..emp_info_type_tbl.count
    
      insert into employees_bkp
               (
                EMPLOYEE_ID
    
                )
                VALUES
                (
                emp_info_type_tbl(i).EMPLOYEE_ID
    
                )
    */
    END;
    EXCEPTION
        WHEN others THEN
            /*
            [5035 - Severity CRITICAL - Your code ends a transaction inside a block with exception handlers. Revise your code to move transaction control to the application side and try again.]
            rollback
            */
            RAISE;
END;
$BODY$
LANGUAGE plpgsql;

The lesson: Never validate migration completeness from metadata alone. Run functional or sanity tests on Every object!

4. Hidden Lock-in via AWS SCT Extension Packs

AWS SCT converts Oracle/MSSQL functions by wrapping them in proprietary extension schemas aws_oracle_ext, aws_sqlserver_ext. It accelerates initial conversion but embeds undocumented, unmaintained, AWS-specific code into what should be an open PostgreSQL database.

Move to another cloud or self-host later? That extension dependency goes with you or blocks you.

The fix is replacing extension pack references with native PostgreSQL functionality or orafce equivalents.
We built two tools at DataCloudGaze specifically for this:

Extension Assessment Tool — scans your migrated code, maps extension dependencies by function, categorizes effort as Simple/Medium/Complex.

SCTMigrator — automates the replacement, giving you PostgreSQL that’s genuinely portable across any cloud or postgres platform.

Final Thoughts

Across dozens of migrations, Ora2pg, AWS SCT, Google Cloud DMS, and Ispirer have converted enormous chunks of Oracle PL/SQL and SQL Server T-SQL into working PL/pgSQL. Core translation – DDL, procedural logic, data type mapping — they handle well. That’s months of manual effort saved and that matters.

But enablers are not finishers. Compatible and production-ready are two very different things. The NUMBER precision decisions, implicit cast failures, silent compilation errors, extension lock-in none of that gets resolved by the single tool. That requires judgment, empathy for legacy code, and a systematic approach to validation.

A good reference on this — PostgreSQL Migration from Oracle (Converting 1 millions lines of code).

Every migration I’ve worked on taught me something new. These four gotchas are the ones that show up most consistently across tools, across customers, across source databases. They’re not edge cases. They’re the job.

If you’re in the middle of an Oracle or SQL Server migration to PostgreSQL and hitting walls, let’s talk.

contact@datacloudgaze.com
Book a 30-min call
DCGMigrator — end-to-end migration intelligence platform

http://databaserookies.wordpress.com/?p=3666

Extensions

PostgreSQL Sequence Reset: START WITH vs RESTART WITH vs SETVAL Explained

Deepak Mahto Mar 21, 2026 Updated Mar 22, 2026

During an Oracle to PostgreSQL migration, a critical issue arose regarding sequence value resets. The tool ora2pg generated inadequate commands causing sequence values to mismatch post-cutover. A fix was implemented to replace the incorrect ALTER SEQUENCE START WITH command with RESTART WITH or setval() for accurate sequence resetting, crucial for successful migrations. Continue reading →

Show full content

Recently during one of the Oracle to PostgreSQL migration with enterprise customer while designing cutover runbook, we were evaluating steps to perform Sequence value reset to match it as per Source so that every new value request using NextVal is an new one and does not incur transactional failure.

It is one of the critical steps of any database migrations, as in most cases sequence LAST_VALUE is not migrated at target implicitly and sequence values need to be matching as per source so that new transaction never fails and application work post cutover.

We used ora2pg’s SEQUENCE_VALUES export to generate DDL command to set the last values of sequences.

Sample Ora2pg Command for exporting Sequence DDL.

ora2pg -t SEQUENCE_VALUES -c ora2pg_conf.conf -o sequence_values.sql

The Bug That Started This

Ora2pg generates this for sequence migration

ALTER SEQUENCE seq_sample START WITH 1494601;

ora2pg’s SEQUENCE_VALUES export generates ALTER SEQUENCE ... START WITH. That command does not reset the sequence.

Expected: next nextval() = 1494601. Actual: sequence continues from wherever it was.
Silently wrong. The kind that surfaces as a “balance mismatch” as part of post-cutover steps.

Understand Different Sequence Reset Options.

Any sequence in PostgreSQL has three internal fields:
start_value → reference point for bare RESTART
last_value → where the counter actually is (nextval reads this)
is_called → has last_value been consumed yet?

We have couple of options to reset last_value for an sequences,

Following table summarize with all options in comparison with START WITH.

Command start_valuelast_valueis_calledNext nextval()START WITH n✓ updated✗ unchanged✗ unchangedwherever it wasRESTART WITH n✗✓ n-1✓ falseexactly nsetval(seq, n, false)✗✓ n✓ falseexactly nsetval(seq, n, true)✗✓ n✓ truen + incrementsetval(seq, n)✗✓ n✓ truen + increment

Sequence Reset option comparison

START WITH only touches start_value and nextval() still uses last_value. That’s the entire issue or bug with initial Ora2pg approach.

START WITH sets the initial value of a sequence when creating it. If the sequence already exists, START WITH defines the value used for any future “restart with” statement that lacks an explicit value but no impact on current last value of the sequence.

Let’s see it action with an sample sequence.

CREATE SEQUENCE seq_test;
SELECT nextval('seq_test');  -- 1
SELECT nextval('seq_test');  -- 2
SELECT nextval('seq_test');  -- 3

-- What ora2pg does
ALTER SEQUENCE seq_test START WITH 1494601;
SELECT nextval('seq_test');  -- 4 ← not 1494601

SELECT last_value, start_value, is_called FROM seq_test;
-- last_value=4  start_value=1494601  is_called=t

-- What actually works
ALTER SEQUENCE seq_test RESTART WITH 1494601;
SELECT nextval('seq_test');  -- 1494601 ✓

DROP SEQUENCE seq_test;

RESTART WITH vs SETVAL() — Which One to Use

RESTART WITH is simpler and readable. Use it in ad-hoc scripts. setval(seq, n, false) is better for migration pipelines for three reasons:

1. Usage within SQL — We can alter sequence values based on SQL that fetched max of column internally using sequences as nextval

select setval('seq_test', max(id)+1, false) from generate_series(1,100) as id;

2. Logical replication — DDL is not replicated by default. setval() gets replicated as a function call. RESTART WITH on the subscriber has to run separately.

3. Works inside PL/pgSQL and application code — you can call setval() conditionally inside a procedure or from JDBC/psycopg2 without DDL privileges. ALTER SEQUENCE requires ownership.

The Fix

In Ora2pg for SEQUENCE_VALUES patch was committed by core contributor to change START WITH to RESTART WITH.

(Existing Issue #1937, Core Contributor, committed fix on Dec 22, 2025):

-- ora2pg generates (wrong):
ALTER SEQUENCE seq_test START WITH 1494601;

-- Use this instead:
ALTER SEQUENCE seq_test RESTART WITH 1494601; --new patch in Ora2pg use these.
OR
select setval('seq_test', max(id)+1, false) from sample_table ; 
OR
select setval('seq_test', max(id), true) from sample_table ;

Couple of thing’s to remember

–> with setval(): the default (no third argument) behaves like is_called=true next value is n+1, not n. Always pass false explicitly when you want exactly n.

–> for cutover runbooks, use setval(seq, max(id), true) derived from actual table data rather than Oracle’s sequence value safer if the data load and sequence state diverged.

Thanking Note
Big Shoutout to Ora2pg Committers, It is one of the great tool if you are starting your Oracle to PostgreSQL Migrations.

http://databaserookies.wordpress.com/?p=3604

Extensions

PGConf India 2026: PostgreSQL Query Tuning: A Foundation Every Database Developer Should Build

Deepak Mahto Mar 13, 2026 Updated Mar 13, 2026

PostgreSQL tuning should not rely on quick fixes like adding indexes or rewriting queries but instead focus on understanding the root issues affecting query performance. A slow query highlights underlying problems such as inaccurate statistics or schema inefficiencies. A five-question framework guides effective SQL tuning, as discussed at PGConf India 2026. Continue reading →

Show full content

Most PostgreSQL tuning advice that folks chase is quick fixes but not on understanding what made planners choose an path or join over others optimal path. !

Tuning should not start with Analyze on tables involved in the Query but with intend what is causing the issue and why planner is not self sufficient to choose the optimal path.

Most fixes we search for SQL tuning are around,

Add an index. 
Rewrite the query. 
Bump work_mem. 
Done.

Except it’s not done. The same problem comes back, different query, different table, same confusion.

The Real Problem

A slow query is a symptom. Statistics, DDL, query style, and PG version are the actual culprit’s.

Before you touch anything, you need to answer five questions — in order:

Find it — which query actually hurts the most right now?
Read the plan — what is the planner doing and where is it wrong?
Check statistics — is the planner even working with accurate data?
Check the DDL — is your schema helping or hiding the answer?
Check GUCs & version — are the defaults silently working against you?

Most developers skip straight to question two. Many skip to indexes without asking any question at all.

What I Covered at PGConf India 2026

I presented this framework at PGConf India yesterday, a room full of developers and DBA , sharp questions, and a lot of “I’ve hit exactly this” moments.

The slides cover core foundations for approaching Query Tuning and production gotchas including partition pruning, SARGability, CTE fences, and correlated column statistics.

Slide – PostgreSQL Query Tuning: A Foundation Every Database Developer Should Build

http://databaserookies.wordpress.com/?p=3576

Extensions

Same SQL, Different Results: A Subtle Oracle vs PostgreSQL Migration Bug

Deepak Mahto Jan 30, 2026 Updated Jan 30, 2026

The article examines issues encountered when migrating an Oracle application to PostgreSQL, particularly differences in operator precedence and implicit casting. A case study reveals how similar SQL expressions yield different results due to inherent database behaviors. The piece emphasizes the need for explicit intent in SQL to avoid silent data corruption and bugs post-migration. Continue reading →

Show full content

Read time: ~6 minutes

A real-world deep dive into operator precedence, implicit casting, and why database engines “don’t think the same way”.

The Database Migration Mystery That Started It All

You migrate a perfectly stable Oracle application to PostgreSQL.

The SQL runs
The tests pass
The syntax looks correct
Nothing crashes

And yet… the numbers or query calculations are wrong.

Not obviously wrong. Not broken. Just different.
Those are the worst bugs the ones that quietly ship to production. This is a story about one such bug, hiding behind familiar operators, clean-looking conversions, and false confidence.

The Original Business Logic (Oracle)

Here’s a simplified piece of real production logic used to compute a varhour value from timestamp data:

CASE     WHEN TO_CHAR(varmonth,'MI') + 1 = 60     THEN varhr - 1 || TO_CHAR(varmonth,'MI') + 1 + 40     ELSE varhr - 1 || TO_CHAR(varmonth,'MI') + 1 END AS varhour

At first glance, this feels routine:

Extract minutes
Perform arithmetic
Concatenate values

Nothing here screams “migration risk”.

The Migration Illusion: “Looks Correct, Right?”

During migration, teams don’t blindly copy Oracle SQL. They do the right thing make types explicit and clean up the logic.

Here’s the PostgreSQL converted version, already “fixed” with necessary casts:

SELECTCASE WHEN TO_CHAR(varmonth, 'MI') :: integer + 1 = 60   THEN   (end_hr -1) :: text || TO_CHAR(varmonth, 'MI')::integer + 1 + 40  ELSE   (end_hr -1)::text || TO_CHAR(varmonth, 'MI') ::integer + 1  END varhourFROM sample_loadsORDER BY id;

No syntax errors. Explicit casting. Clean and readable. At this point, most migrations move on.

Side-by-Side: Oracle vs PostgreSQL (At First Glance)

Let’s compare the two versions:

Everything appears aligned.
Same operators. Same order. Same intent. So naturally, we expect the same result.

Let’s test with a real value:

end_hr  = 15minutes = 59

Output:

DatabasevarhourOracle1500PostgreSQL14100

Same logic. Same data. Different result. Now the real question appears:

How can two “explicit” queries still behave differently?

What Your Brain Thinks Is Happening

When most of us read this expression:

(end_hr - 1)::text || TO_CHAR(varmonth,'MI')::integer + 1 + 40

Our brain assumes:

Arithmetic happens first (+, -)
Concatenation happens last (||)

That assumption is correct in PostgreSQL. It is not correct in Oracle.

Oracle’s Behavior: “Let Me Help You”

Oracle aggressively applies implicit type conversion. Internally, Oracle rewrites the expression to something closer to:

TO_NUMBER  (   TO_CHAR(varhr - 1) || TO_CHAR(loadmonth,'MI')  ) + 1 + 40

Concatenation happens before arithmetic.

Step by step:

varhr - 1 → 14
TO_CHAR(14) → '14'
TO_CHAR(varmonth,'MI') → '59'
'14' || '59' → '1459'
TO_NUMBER('1459') → 1459
1459 + 1 + 40 → 1500

Oracle silently guessed your intent.

PostgreSQL’s Behavior: “Be Explicit”

PostgreSQL does no guessing. It follows strict operator precedence:

TO_CHAR(loadmonth,'MI')::integer → 59
59 + 1 + 40 → 100
(end_hr - 1)::text → '14'
'14' || '100' → 14100

Different grouping. Different result. No error.

Proof: Oracle’s Execution Plan

Oracle doesn’t hide this, it just doesn’t advertise it.

EXPLAIN PLAN FORSELECT CASE         WHEN TO_CHAR(varmonth,'MI')+1=60         THEN varhr-1||TO_CHAR(varmonth,'MI')+1+40         ELSE varhr-1||TO_CHAR(varmonth,'MI')+1     ENDFROM sample_loads;SELECT *FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

The projection shows:

TO_NUMBER(    TO_CHAR("VARHR"-1)||TO_CHAR(INTERNAL_FUNCTION("VARMONTH"),'MI')    )

That TO_NUMBER() wrapping the concatenation is the smoking gun.

Why This Bug Is So Hard to Catch

It never throws an error
The SQL looks correct
Early test data rarely hits edge cases
Automated migration tools miss it
The behavior difference is undocumented in most migration guides

This is not a syntax problem. It’s a behavioral difference.

The Real Issue Isn’t concat operator(||) or implicit casting

This comes down to philosophy:

AspectOraclePostgreSQLType handlingImplicit type coercionExplicit castingOperator behaviorFlexible, context-drivenStrict and deterministicOperator precedenceMay group expressions implicitlyFixed, well-defined precedenceDeveloper experienceConvenience-orientedPrecision-orientedError toleranceTries to “make it work”Forces you to be explicitCore philosophy“Make it work”“Say what you mean”

Neither is wrong. But assuming they behave the same is dangerous.

The Fix: Make Intent Explicit

SELECTCASE WHEN TO_CHAR(varmonth, 'MI') :: integer + 1 = 60   THEN ((end_hr -1) :: text || TO_CHAR(varmonth, 'MI'))::integer + 1 + 40ELSE ((end_hr -1)::text || TO_CHAR(varmonth, 'MI')) ::integer + 1   END varhourFROM sample_loadsORDER BY id;

Output:

This version:

Produces identical results
Documents intent
Survives migrations
Prevents silent data corruption

Real-World Impact

I’ve seen this exact pattern cause:

Financial miscalculations
Audit timestamp mismatches
Reconciliation failures weeks after go-live
“The numbers don’t add up” production escalations

The worst part? These bugs surface after trust is already established.

Key Takeaways

Execution plans reveal truth, not source code
|| mixed with + is a migration red flag
Explicit casting doesn’t guarantee identical behavior
Migration is about semantics, not syntax

The Bottom Line

Database migration isn’t translation. It’s interpretation.

When Oracle silently rewrites logic and PostgreSQL refuses to guess, you must be explicit. And once you start writing SQL that works the same everywhere, you don’t just migrate safely you migrate confidently.

Try it Yourself

-- OracleDROP TABLE sample_loads;CREATE TABLE sample_loads (    id INTEGER,    varmonth TIMESTAMP,    varhr INTEGER);INSERT INTO sample_loads VALUES (1, TIMESTAMP '2024-01-16 23:59:59', 15);INSERT INTO sample_loads VALUES (2, TIMESTAMP '2024-01-15 23:59:59', 24);SELECT varhr,        TO_CHAR(varmonth,'MI') as minutes,       varhr-1||TO_CHAR(varmonth,'MI')+1+40 as loadhourFROM sample_loads;-- Check the execution planEXPLAIN PLAN FORSELECT varhr-1||TO_CHAR(varmonth,'MI')+1+40 FROM sample_loads;SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'ALL'));

http://databaserookies.wordpress.com/?p=3538

Extensions

PostgreSQL Table Rename and Views – An OID Story

Deepak Mahto Jan 5, 2026 Updated Jan 6, 2026

In a recent post-migration activity, a large table required a UUID column addition. Instead of using ALTER TABLE, which locks the table, a new table was created to improve performance. However, renaming can create dependency issues with views, as PostgreSQL uses OIDs for tracking. Properly managing dependencies is crucial to avoid operational risks. Continue reading →

Show full content

Recently during a post-migration activity, we had to populate a very large table with a new UUID column (NOT NULL with a default) and backfill it for all existing rows.

Instead of doing a straight:

ALTER TABLE ... ADD COLUMN ... DEFAULT ... NOT NULL;

we chose the commonly recommended performance approach:

Create a new table (optionally UNLOGGED),
Copy the data,
Rename/swap the tables.

This approach is widely used to avoid long-running locks and table rewrites but it comes with hidden gotchas. This post is about one such gotcha: object dependencies, especially views, and how PostgreSQL tracks them internally using OIDs.

A quick but important note

On PostgreSQL 11+, adding a column with a constant default is a metadata-only operation and does not rewrite the table. However:

This is still relevant when the default is volatile (like uuidv7()),
Or when you must immediately enforce NOT NULL,
Or when working on older PostgreSQL versions,
Or when rewriting is unavoidable for other reasons.

So the rename approach is still valid but only when truly needed.

The Scenario: A Common Performance Optimization

Picture this: You’ve got a massive table with millions of rows, and you need to add a column with unique UUID default value and not null constraint values. The naive approach? Just run ALTER TABLE ADD COLUMN. But wait for large tables, this can lock your table while PostgreSQL rewrites every single row and can incur considerable time.

So what do we do? We get clever. We use the intermediate table(we can also use unlogged table) with rename trick, below is an sample created to show the scenario’s.

drop table test1;
create table test1
(col1 integer, col2 text, col3 timestamp(0));

insert into test1 
select col1, col1::text , (now() - (col1||' hour')::interval) 
from generate_series(1,1000000) as col1;

create view vw_test1 as 
select * from test1;


CREATE TABLE test1_new 
(like test1 including all);
alter table test1_new 
add column col_uuid uuid default uuidv7() not null;
insert into test1_new 
select * , uuidv7() from test1;

BEGIN;
ALTER TABLE test1 RENAME TO test1_old;
ALTER TABLE test1_new RENAME TO test1;
COMMIT;

Looks perfect, right? Fast, efficient, minimal downtime.
But do take a note on what happen to when we want to drop old table.

demo=# drop table test1_old;
ERROR:  cannot drop table test1_old because other objects depend on it
DETAIL:  view vw_test1 depends on table test1_old
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The Silent Killer: Views Still Point to the Old Table

Even though test1 was renamed, the view still references test1_old:

demo=# \dv vw_test1
             List of views
┌────────┬──────────┬──────┬──────────┐
│ Schema │   Name   │ Type │  Owner   │
├────────┼──────────┼──────┼──────────┤
│ public │ vw_test1 │ view │ postgres │
└────────┴──────────┴──────┴──────────┘
(1 row)

demo=# \sv vw_test1
CREATE OR REPLACE VIEW public.vw_test1 AS
 SELECT col1,
    col2,
    col3
   FROM test1_old

Why?

Understanding PostgreSQL’s OID and its usage in dependency.

PostgreSQL does not track dependencies by object names. It uses internal object identifiers (OIDs).

When a view is created:

The SQL is parsed,
* is expanded,
Table and column references are stored by OID and attribute number in pg_rewrite.

Renaming a table only changes the human-readable name in pg_class. The OID remains the same.

But when you swap tables, you are not renaming — you are introducing a new object with a new OID. The view still points to the old one.

-- Check what your view is actually referencing
SELECT DISTINCT
    source_ns.nspname AS source_schema,
    source_table.relname AS source_table,
    source_table.relkind AS object_type
FROM pg_depend d
JOIN pg_rewrite r ON r.oid = d.objid
JOIN pg_class view_class ON view_class.oid = r.ev_class
JOIN pg_class source_table ON source_table.oid = d.refobjid
JOIN pg_namespace source_ns ON source_ns.oid = source_table.relnamespace
WHERE view_class.relname = 'vw_test1'
    AND d.deptype = 'n'  -- normal dependency
    AND source_table.relkind IN ('r', 'v', 'm', 'f')  -- tables, views, mat views, foreign tables
ORDER BY source_schema, source_table;

And It’s Not Just Views, it’s just the visible symptom. This approach also affects:

Grants and privileges
RLS policies
Triggers
Foreign keys
Logical replication publications
Statistics objects
Atomic Functions that reference the table

All of these are bound to the old table’s OID.

Ready to enhance your PostgreSQL development skills? My course on PL/pgSQL will help you excel as Database Developer. Click here to start your journey!!

The Right Way to Handle It

So how do we fix this? We have a few options:

Option 1: Recreate the Views (Safest)

-- After the rename, drop and recreate all dependent views
demo=# drop view vw_test1;create view vw_test1 as select * from test1;
DROP VIEW
CREATE VIEW

Option 2: Avoid table swapping if you can, Batch updates are often safer:

Sharing an sample just for reference.

-- Add column without default (fast)
ALTER TABLE users ADD COLUMN status TEXT;

-- Update in batches (no long lock)
DO $$
DECLARE
    batch_size INT := 10000;
    last_id BIGINT := 0;
BEGIN
    LOOP
        UPDATE users 
        SET status = 'active'
        WHERE id > last_id 
        AND status IS NULL
        AND id <= last_id + batch_size;
        
        EXIT WHEN NOT FOUND;
        last_id := last_id + batch_size;
        COMMIT;
    END LOOP;
END $$;

-- Then add the default for future rows
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';

Always Check Dependencies First

SELECT 
    dependent_view.relname as view_name
FROM pg_depend 
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid 
JOIN pg_class as dependent_view ON pg_rewrite.ev_class = dependent_view.oid 
JOIN pg_class as source_table ON pg_depend.refobjid = source_table.oid 
WHERE source_table.relname = '<<table_name>>'
AND dependent_view.relkind = 'v';

If you see views, you know you’ll need to recreate them.

The Takeaway

PostgreSQL’s OID-based dependency system is elegant and efficient — but it also means:

Names are for humans. OIDs are what PostgreSQL actually uses.

The rename/swap trick can be a powerful optimization, but it trades execution-time safety for operational complexity. If you use it:

Audit dependencies,
Script object recreation,
Test in staging,
And document everything.

Otherwise, the optimization meant to avoid a lock may quietly introduce correctness, maintenance, and operational risks that cost far more later.

http://databaserookies.wordpress.com/?p=3513

Extensions

PostgreSQL Partition Pruning: The Role of Function Volatility

Deepak Mahto Nov 4, 2025 Updated Nov 5, 2025

Improper volatility settings in PostgreSQL functions can significantly impact query performance, especially with partitioned tables. Functions like now() and statement_timestamp() (STABLE) allow efficient partition pruning, while volatile functions like clock_timestamp() hinder performance by necessitating full partition scans. Therefore, understanding function volatility is crucial for optimized SQL queries. Continue reading →

Show full content

In one of our earlier blogs, we explored how improper volatility settings in PL/pgSQL functions — namely using IMMUTABLE, STABLE, or VOLATILE — can lead to unexpected behavior and performance issues during migrations.

Today, let’s revisit that topic from a slightly different lens. This time, we’re not talking about your user-defined functions, but the ones PostgreSQL itself provides — and how their volatility can quietly shape your query performance, especially when you’re using partitioned tables.

When Partitioning Doesn’t Perform

When designing partitioning strategies, performance is usually the first thing that comes to mind.
But what if despite all your tuning the queries still fail to leverage partition pruning effectively?

The answer sometimes lies not in your schema or indexes, but in how your filters are evaluated.

Every function you use in a filter has its own volatility behavior that determines how it’s planned and executed. You might pick the right filter logically — but if the function is volatile, it can completely change how PostgreSQL builds and optimizes the plan.

Let’s say you’ve designed a table as daily partitions and use Date Functions as partitions filters and check its underlying executions plan.

CREATE TABLE IF NOT EXISTS events_daily (
    id bigint generated always as identity ,
    created_at timestamptz NOT NULL,
    created_date date NOT NULL,      
    user_id int NOT NULL,
    amount numeric(10,2) NOT NULL,
    status text,
    payload text
) PARTITION BY RANGE (created_date);

--create a year of partitions as daily
DO
$$
DECLARE
    start_date date := current_date - INTERVAL '364 days';
    d date;
    part_name text;
BEGIN
    d := start_date;
    WHILE d <= current_date LOOP
        part_name := format('events_p_%s', to_char(d, 'YYYYMMDD'));
        EXECUTE format($sql$
            CREATE TABLE IF NOT EXISTS %I PARTITION OF events_daily
            FOR VALUES FROM (%L) TO (%L);
        $sql$, part_name, d::text, (d + INTERVAL '1 day')::date::text);
        d := d + INTERVAL '1 day';
    END LOOP;
    EXECUTE 'CREATE TABLE IF NOT EXISTS events_default PARTITION OF events_daily DEFAULT';
END;
$$;

--Populate Partitions
WITH params AS (
  SELECT
    (current_date - INTERVAL '364 days')::date AS start_date,
    365 AS ndays
)
INSERT INTO events_daily (created_at, created_date, user_id, amount, status, payload)
SELECT
  (params.start_date + ((s.i - 1) % params.ndays) * INTERVAL '1 day')
    + ( (random() * 86399)::int * INTERVAL '1 second' ) AS created_at,
  (params.start_date + ((s.i - 1) % params.ndays) * INTERVAL '1 day')::date AS created_date,
  (1 + floor(random() * 50000))::int AS user_id,
  round(random() * 10000::numeric) AS amount,
  (array['new','processing','done','failed'])[1 + (floor(random() * 4))::int]::text AS status,
  md5((random()::text || clock_timestamp()::text || s.i::text)) AS payload
FROM
  generate_series(1, 1000000) AS s(i),params
;

analyze events_daily;

When working with partitioned tables, always ensure filters are applied on the partition key.Otherwise, PostgreSQL will scan every partition — even if indexes exist.

In most date-based partitioning strategies, filters are based on the system date. Let’s assume we’re querying data for the last two days.

Sharing some of the samples across data functions to get last 3 days of partitions data.

--Filtering using clock_timestamp(), now(), and statement_timestamp()
select count(1) from events_daily 
where 
created_date between clock_timestamp() - interval '2 days' 
and clock_timestamp();

select count(1) from events_daily 
where created_date between now() - interval '2 days' 
and now();

select count(1) from events_daily where 
created_date between statement_timestamp() - interval '2 days' 
and statement_timestamp();

Same Logic, Different Performance

At first glance, all these queries look functionally identical — they fetch data for the last few days.
But there’s one subtle, yet crucial difference: the volatility of the functions used in the filter against the partition column.

Below is the volatility characteristics of the date functions used in each queries.

From PostgreSQL Docs on Volatile functions.

The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed.

Why Volatility Matters for Partition Pruning

Partition pruning depends on whether PostgreSQL can evaluate the filter expression at plan time.
If it can, only relevant partitions are scanned.
If not, PostgreSQL must assume any partition could match — and ends up scanning all of them.

With statement_timestamp() or now() (both STABLE), PostgreSQL can safely compute their values during planning. It knows exactly which partitions fall within range — so it prunes the rest efficiently.

But with clock_timestamp() (VOLATILE), the optimizer can’t predict its value in advance. So PostgreSQL plays it safe — scanning every partition because pruning can’t be guaranteed.

Putting It All Together: Volatility and Query Planning

Relying on volatile functions in partition filters doesn’t break correctness — but it quietly breaks performance.
What should have been a quick lookup across a few recent partitions can turn into a full partition scan, wasting CPU and I/O.

The fix isn’t complex:
Prefer STABLE functions like now() and statement_timestamp() when filtering partitions
Avoid VOLATILE ones like clock_timestamp() in partition predicates

In performance-sensitive environments, understanding function volatility isn’t optional — it’s a key part of writing partition-friendly SQL.

http://databaserookies.wordpress.com/?p=3470

Extensions

Exploring PostgreSQL 18: A Developer’s Guide to New Features – Part 1: The COPY Command Gets More User-Friendly

Deepak Mahto Sep 28, 2025 Updated Sep 28, 2025

PostgreSQL 18, released on September 25, 2024, enhances the COPY command with improved error handling through the REJECT_LIMIT option, allowing data loading to be controlled by limiting errors. This version also introduces additional logging options, aiding database developers in maintaining data integrity during file imports and improving overall data quality. Continue reading →

Show full content

PostgreSQL 18 was released on September 25, 2024, introducing a host of exciting new features across all major areas. In this blog series, we’ll delve into these features and explore how they can benefit database developers and migration engineers transitioning to PostgreSQL.

Part 1 of the series focuses on enhancements to the COPY command, particularly for use cases involving loading external files into the database.

PostgreSQL 18 – Enhancements with the COPY Command

The COPY command is the default native option for loading flat files, primarily in text or CSV format, into PostgreSQL. If you are a data engineer or database developer looking for ways to load flat files into PostgreSQL, you must take care of quality checks and encapsulate a mechanism to report or discard failed records.

Until PostgreSQL 16, if quality checks were planned to be done on the database side, one option was to load flat files into staging tables with all columns as text or a generic data type and later move legitimate and discarded data to the concerned tables.With PostgreSQL 17 onward, the COPY command provides options for handling data type incompatibilities and logging failed rows or records using on_error options.

In PostgreSQL 18 these features is further enhances with REJECT_LIMIT options. If overall discarded or error row data is beyond the reject limit the copy command is discarded as whole.

PostgreSQL 18 – New REJECT_LIMIT option.

Let’s start by creating sample target tables and a CSV file with intentional data type mismatch issues. The sample table also has check and NOT NULL constraints defined on it.

CREATE TABLE copy18(
    col1 integer,
    col2 text DEFAULT 'x'::text,
    col3 text NOT NULL,
    col4 text,
    col5 boolean,
    col6 date,
    col7 timestamp without time zone
    CONSTRAINT copy18_col4_check CHECK (length(col4) > 2)
);

The CSV file contains data type incompatibilities in the following lines:

Timestamp (4th line)
Integer (5th line)
Boolean type (7th line)
Date (8th line)

Here is the content of the test.csv file:

--test.csv file.
1,a,a,aaaa,yes,20240101,2024-01-01 12:00:00
2,a,a,aaaa,no,20240102,2024-01-01 12:00:00
3,a,a,aaaa,Y,20240103,2024-01-01 12:00:00
4,a,a,aaaa,N,20240104,2024-01-01 12:00:61
a,a,a,aaaa,1,20240103,20240101010101
999999999999999999999999,1,1,1,0,20241201,20240101010101
5,,a,aaaa,adhoc,20241201,2024-01-01 12:00:51
6,,a,aaaa,false,20241301,2024-01-01 12:00:51

In our sample, we will be using the client-side copy command available as part of the psql command line. It is one of the preferred options when working with managed PostgreSQL on cloud platforms like RDS or Cloud SQL. If you are wondering about the difference between COPY and copy, do check out here.

Controlling Data Quality During Loads with Reject Limits in PostgreSQL 18

If we try to load the same CSV into PostgreSQL 17 with on_error options, it will processed data and discard all error without any limit on overall no.of errors.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, format csv, log_verbosity verbose);
NOTICE:  skipping row due to data type incompatibility at line 4 for column "col7": "2024-01-01 12:00:61"
NOTICE:  skipping row due to data type incompatibility at line 5 for column "col1": "a"
NOTICE:  skipping row due to data type incompatibility at line 6 for column "col1": "999999999999999999999999"
NOTICE:  skipping row due to data type incompatibility at line 7 for column "col5": "adhoc"
NOTICE:  skipping row due to data type incompatibility at line 8 for column "col6": "20241301"
NOTICE:  5 rows were skipped due to data type incompatibility
COPY 3
postgres=# table copy18;
┌──────┬──────┬──────┬──────┬──────┬────────────┬─────────────────────┐
│ col1 │ col2 │ col3 │ col4 │ col5 │    col6    │        col7         │
├──────┼──────┼──────┼──────┼──────┼────────────┼─────────────────────┤
│    1 │ a    │ a    │ aaaa │ t    │ 2024-01-01 │ 2024-01-01 12:00:00 │
│    2 │ a    │ a    │ aaaa │ f    │ 2024-01-02 │ 2024-01-01 12:00:00 │
│    3 │ a    │ a    │ aaaa │ t    │ 2024-01-03 │ 2024-01-01 12:00:00 │
└──────┴──────┴──────┴──────┴──────┴────────────┴─────────────────────┘
(3 rows)

If we need to control error’s rows and accept it only when it is within a limit, REJECT_LIMIT helps to achieve it. it’s applicable with on_error option.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, reject_limit 3, format csv, log_verbosity verbose);
NOTICE:  skipping row due to data type incompatibility at line 4 for column "col7": "2024-01-01 12:00:61"
NOTICE:  skipping row due to data type incompatibility at line 5 for column "col1": "a"
NOTICE:  skipping row due to data type incompatibility at line 6 for column "col1": "999999999999999999999999"
NOTICE:  skipping row due to data type incompatibility at line 7 for column "col5": "adhoc"
ERROR:  skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT:  COPY copy18, line 7, column col5: "adhoc"

With REJECT_LIMIT set to 3, for 4th error it failed the COPY command.

Additional LOG_VERBOSITY silent option

LOG_VERBOSITY also provides an additional option silent to limits information printed or logged.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, reject_limit 5, format csv, log_verbosity silent);
COPY 3

Conclusion

Database developer working on PostgreSQL 18 can take benefit of newly introduced features on COPY command,:

Reject data load based on data type incompatibilities on ON_ERROR options based on REJECT_LIMIT
Improved logging with LOG_VERBOSITY with silent options.

These features streamline data loading and ensure higher data quality.

http://databaserookies.wordpress.com/?p=3443

Extensions

PostgreSQL Case-Insensitive Search: Handling LIKE with Nondeterministic Collations

Deepak Mahto Sep 6, 2025 Updated Sep 6, 2025

Migrating from Oracle or SQL Server to PostgreSQL often brings challenges with case-insensitive searches, since PostgreSQL doesn’t support them natively. Workarounds include nondeterministic collations or custom operators, though the recommended approach is to use the built-in "C" collation with ILIKE. Notably, PostgreSQL 18 is set to improve case-insensitive matching, making migrations smoother. Continue reading →

Show full content

Case-insensitive search is one of the most common issues I encounter when helping customers migrate from Oracle or SQL Server to PostgreSQL. Unlike Oracle (with NLS_SORT) or SQL Server (with case-insensitive collations), PostgreSQL does not natively support case-insensitive search.

Developers typically rely on workarounds such as:

Wrapping comparisons with LOWER() or UPPER()
leverage citext data type (not recommended)
Leveraging ICU-based nondeterministic collations

Why citext is not recommended.
Reference – https://www.postgresql.org/docs/current/citext.html

In this post, I’ll focus on nondeterministic collations and their behavior with LIKE or ILIKE operators. Understanding Nondeterministic Collations

In PostgreSQL, a nondeterministic collation lets text comparisons ignore case/accents, enabling flexible searches like case-insensitive equality.

creating sample collation to support case insensitive search.

CREATE COLLATION collate_ci (provider = icu, deterministic = false, locale = 'en-US-u-ks-level2');

Let’s create a sample table with columns using nondeterministic collation:

create table testcollation(col1 text collate collate_ci); 
insert into testcollation values ('a');
insert into testcollation values ('A');

select * from testcollation where col1 = 'a';
 col1 
------
 a
 A
(2 rows)

Using nondeterministic collations helps us implement case-insensitive comparison with equality operators. However, let’s examine how this works with LIKE or ILIKE operators for wildcard-based searches.
The Problem: Wildcard Operator Limitations While nondeterministic collations enable case-insensitive searches, they fail when used with wildcard operators (LIKE or ILIKE) because this functionality is not supported in PostgreSQL versions prior to 18:

postgres=> select * from testcollation where col1 like 'a%';
ERROR:  nondeterministic collations are not supported for LIKE

PostgreSQL 17 does not support LIKE/ILIKE with nondeterministic collations.
Let’s explore two workarounds.

Solution Approaches

We will examine two alternative approaches to mitigate the limitation of nondeterministic collations when using wildcard operators.

Option 1 :- Using buildin “C” Collation with ilike

PostgreSQL offers multiple built-in collations for different use cases. You can view available collations using the following psql command:

postgres=# \dOS+

List of collations
┌────────────┬────────────────────────┬──────────┬─────────────┬─────────────┐
│   Schema   │  Name │ Provider │  Collate  │ Ctype  │  Locale │ ICU Rules │ Deterministic? │
├────────────┼────────────────────────┼──────────┼───────────┤
│ pg_catalog │ C   │ libc     │ C     │ C   │    │    │ yes            │
│ pg_catalog │ POSIX  │ libc  │ POSIX │ POSIX    │   │   │ yes            │
........

In PostgreSQL, the C collation is one of the built-in collations that comes from the underlying C library (libc).

The Workaround: Override the collation with “C” and use it in SQL statements that require the LIKE operator. Such occurrences can be identified in views or procedural code and fixed proactively as part of the migration process.

To support case-insensitive searches, map the LIKE operator to ILIKE while applying the “C” collation:

postgres=# select * from testcollation where col1 collate "C" ilike 'a%';
┌──────┐
│ col1 │
├──────┤
│ a    │
│ A    │
└──────┘
(2 rows)

Option 2: Using Custom Operators with Internal “C” Collation

PostgreSQL’s extensibility allows us to create custom operators. By defining our own, we can extend the behavior of LIKE and ILIKE to overcome this limitation.

Understanding Internal Operators: When we use the LIKE operator, it is internally transformed to ~~ as shown in the execution plan. Similarly, the ILIKE operator uses ~~* internally.

Similarly, we can identify operators for ILIKE operators as well, i..e ~~*

Creating Custom Functions: To create a custom operator in PostgreSQL, we first need supporting functions. PostgreSQL provides built-in functions that we can leverage to implement our own comparison logic.

Sample function samples using texticlike to exhibit case insensitive search.

To avoid code changes while preserving case-insensitive functionality, we can create a custom schema with operators that override ~~, the internal operator used by LIKE:

Sample code on creating custom operator, for more information check out here.

create schema customoperator;

CREATE OR REPLACE FUNCTION customoperator.custom_texticlike(text, text) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS $_$SELECT pg_catalog.texticlike($1 collate "C",$2 collate "C");$_$;

CREATE OR REPLACE FUNCTION customoperator.custom_texticnotlike(text, text) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS $_$SELECT pg_catalog.texticnlike($1 collate "C",$2 collate "C");$_$;

CREATE OPERATOR customoperator.~~ (
    FUNCTION = customoperator.custom_texticlike,
    LEFTARG = text,
    RIGHTARG = text,
    COMMUTATOR = OPERATOR(customoperator.~~),
    NEGATOR = OPERATOR(customoperator.!~~)
);

CREATE OPERATOR customoperator.!~~ (
    FUNCTION = customoperator.custom_texticnotlike,
    LEFTARG = text,
    RIGHTARG = text,
    COMMUTATOR = OPERATOR(customoperator.!~~),
    NEGATOR = OPERATOR(customoperator.~~)
);

Important Consideration:
Search Path Precedence When overriding built-in operators, we must consider the precedence of pg_catalog within the search_path. The search_path tells PostgreSQL which schemas to search when you reference an object without a schema name. By default, the system schema pg_catalog—which contains all built-in functions and operators—is always included.

To make our custom operators take priority over the built-in ones, we must place our custom schema before pg_catalog in the search_path:

Setting custom search_path with customoperator schema precede pg_catalog.

postgres=# set search_path to customoperator,pg_catalog, "$user", public;

Once the search_path is updated, any LIKE operator in SQL will automatically use the custom operator and its function. In effect, every LIKE clause is transparently handled as a case-insensitive match with “C” collation:

postgres=# explain select * from testcollation where col1 like 'a%';
┌──────────────────────────────────────────────────────────────────┐
│                            QUERY PLAN                            │
├──────────────────────────────────────────────────────────────────┤
│ Seq Scan on testcollation  (cost=0.00..27.00 rows=1360 width=32) │
│   Filter: texticlike((col1)::text, 'a%'::text COLLATE "C")       │
└──────────────────────────────────────────────────────────────────┘
(2 rows)

Comparison of Solutions:-

Nondeterministic collations fixesPro’sCon’sUsing buildin “C” Collation with ilikeUse inbuild “C” collation for all like comparisons.

Changes are limited to each occurrence.Require manual code changes Using Custom operators that internally use collate “C”Requires no manual code changes.

App code can function without any changes.Change default PostgreSQL behaviour(overriding pg_catalog search_path)

Impact all sessions of PostgreSQL Database. Final Recommendations

Use Option 1 (Collate “C” + ILIKE) for production environments — it’s safer and more explicit Consider
Option 2 (custom operators) only if code changes are impractical — due to the global impact on search_path
Plan to re-test with PostgreSQL 18 once it becomes generally available — for native support

Note: Overwriting search_path to change pg_catalog precedence can have minor impacts on all sessions and is not generally recommended as best practice.

Looking Ahead: PostgreSQL 18 Enhancements

PostgreSQL 18 introduces significant enhancements related to collations, particularly around nondeterministic collations:

New CASEFOLD collation feature to further assist with case-insensitive matches

Native support for LIKE comparisons on text columns that use nondeterministic (case-insensitive) ICU collations

Simplified pattern matching on case-insensitive text, making it more consistent across operations

These improvements will make pattern matching on case-insensitive text much simpler and eliminate the need for the workarounds described in this post.

Conclusion

While PostgreSQL’s current limitations with nondeterministic collations and LIKE operators present challenges for case-insensitive searches, both workarounds provide viable solutions. Choose the approach that best fits your migration strategy and maintenance requirements, keeping in mind the upcoming native support in PostgreSQL 18.

Reference

Case-insensitive pattern matching in PostgreSQL

https://aws.amazon.com/blogs/database/manage-case-insensitive-data-in-postgresql/(PostgreSQL 15)

http://databaserookies.wordpress.com/?p=3377

Extensions

PostgreSQL 18 Beta Preview – Export or Amend Statistics with Ease

Deepak Mahto May 10, 2025 Updated May 11, 2025

PostgreSQL 18 Beta introduces powerful enhancements to statistics management. For the first time, users can export, import, and modify planner statistics—making it easier to replicate production behavior in lower environments and fine-tune query performance without relying on actual data loads. Continue reading →

Show full content

PostgreSQL 18 beta has been released, and it’s the perfect time to start exploring the new features we can expect in the General Availability (GA) release. One feature that particularly caught my attention relates to improvements in statistics export and amendment.

Here’s an excerpt from the official PostgreSQL release notes:

Add functions to modify per-relation and per-column optimizer statistics (Corey Huinker) 
Add pg_dump, pg_dumpall, and pg_restore options --statistics-only, --no-statistics, --no-data, and --no-schema (Corey Huinker, Jeff Davis)

One of the key ingredients in performance analysis is understanding the underlying statistics of tables, columns, and indexes. Often, we encounter cases where queries behave differently in production compared to Pre-Prod or UAT environments. A common reason for this discrepancy is the difference in data distribution and statistics. How many times have we wished for a way to replicate production-like statistics in lower environments—without the overhead of copying data and running manual ANALYZE?

With PostgreSQL 18, we can look forward to features that allow us to export, import, or even modify statistics—making it easier to simulate production behavior in non-production environments without needing actual data loads.

Dump Statistics.
pg_dump --statistics-only --table=public.skewed_data_int
Reset Table, Index or Materialize View Stats
pg_restore_relation_stats
Reset Attributes level statistics
pg_restore_attribute_stats

The following is a sample output from pg_dump that includes statistics. This can also be used within the same instance—or in another environment—to overwrite existing statistics and influence execution plans.

--table or materialize views
SELECT * FROM pg_catalog.pg_restore_relation_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int',
	'relpages', '84'::integer,
	'reltuples', '10000'::real,
	'relallvisible', '84'::integer,
	'relallfrozen', '0'::integer
);
--columns or attributes
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int',
	'attname', 'categorical_col',
	'inherited', 'f'::boolean,
	'null_frac', '0'::real,
	'avg_width', '4'::integer,
	'n_distinct', '4'::real,
	'most_common_vals', '{1,2,3,4}'::text,
	'most_common_freqs', '{0.5999,0.3154,0.0809,0.0038}'::real[],
	'correlation', '0.4595874'::real
);
--Index
SELECT * FROM pg_catalog.pg_restore_relation_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int_pkey',
	'relpages', '30'::integer,
	'reltuples', '10000'::real,
	'relallvisible', '0'::integer,
	'relallfrozen', '0'::integer
);

Conclusion

Statistics are at the heart of PostgreSQL’s query planner, and having accurate or production-like stats can significantly impact performance tuning and execution plan accuracy. With PostgreSQL 18, the ability to export, import, and tweak statistics opens up powerful new possibilities for DBAs and developers—especially in testing and pre-production environments.

This feature reduces the guesswork in performance investigations and offers a safer, data-free way to simulate real-world workloads. If you’re serious about performance, this is a feature worth exploring in the PostgreSQL 18 Beta.

http://databaserookies.wordpress.com/?p=3362

Extensions