Foreign Data Wrappers (FDWs) are one of PostgreSQL’s most powerful features, enabling seamless access to remote databases as if they were local.
Among the available FDWs, postgres_fdw stands out for connecting PostgreSQL instances across servers. But with great power comes complexity, especially when dealing with large datasets or distributed queries.
In this post, we’ll cover how to set up postgres_fdw, best practices for performance, and troubleshooting tips for common issues like I/O errors, connection failures, and query timeouts.
Why Use postgres_fdw?
postgres_fdw allows you to query remote PostgreSQL databases as if they existed locally, without the overhead of ETL processes, making cross-database joins and reporting seamless. It simplifies data access across multiple systems and significantly reduces operational complexity in distributed environments.
Let’s move on to setup and walk through configuring postgres_fdw step by step, from enabling the extension to tuning key parameters for performance and stability.
Step 1: Install and Configure postgres_fdw
Add postgres_fdw to the azure.extensions server parameter (the allow-list of extensions on Azure Database for PostgreSQL).
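Once the parameter is saved, you should be able to confirm the allow-list from a client session; a quick check (assuming an Azure Database for PostgreSQL flexible server, where azure.extensions is exposed as a readable server parameter):

SHOW azure.extensions;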
Install the extension on the server:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

Create a foreign server and user mapping:
CREATE SERVER foreign_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'remote-db', port '5432');

CREATE USER MAPPING FOR local_user
    SERVER foreign_server
    OPTIONS (user 'remote_user', password 'remote_password');

Import the remote schema to avoid manual definition errors:
IMPORT FOREIGN SCHEMA remote_schema FROM SERVER foreign_server INTO local_schema;
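As a quick sanity check that the import worked, the standard information_schema catalog lists the foreign tables now defined locally; a sketch (the server name matches the example above):

-- List foreign tables created by the import
SELECT foreign_table_schema, foreign_table_name
FROM information_schema.foreign_tables
WHERE foreign_server_name = 'foreign_server';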
Step 2: Optimize Performance with Key Parameters

Step a: Enable Remote Cost Estimates
use_remote_estimate: When set to true, PostgreSQL retrieves cost estimates from the remote server instead of relying on local statistics, which improves query-planning accuracy for joins and filters on foreign tables. By default, PostgreSQL plans with local stats only. Below is how you can set this parameter.
ALTER SERVER foreign_server OPTIONS (add use_remote_estimate 'true');

Step b: Tune Fetch and Batch Sizes
fetch_size: Controls the number of rows fetched per network round trip. A larger fetch_size reduces network overhead but increases memory usage. For large datasets (>100 GB), start with fetch_size = 10000 and adjust based on memory usage and latency. Below is the command to update the parameter.
ALTER SERVER foreign_server OPTIONS (add fetch_size '10000');

batch_size: Determines how many rows are inserted in a single batch when using INSERT ... SELECT into foreign tables. Larger batches reduce transaction overhead but might increase lock contention and memory usage, so be conservative when setting this parameter and monitor memory usage.
ALTER FOREIGN TABLE orders OPTIONS (ADD batch_size '2000');
ALTER FOREIGN TABLE products OPTIONS (ADD batch_size '2000');

These settings reduce the number of network round trips and improve throughput.
Step c: Analyze Foreign Tables
When you create foreign tables using postgres_fdw, PostgreSQL treats them like local tables for query planning, but here’s the catch:
- The local planner needs statistics (row counts, column distribution, etc.) to estimate costs and choose the best plan.
- Foreign tables don’t automatically have these stats because the data lives on a remote server.
Without stats, the planner guesses, which can lead to bad plans, like nested loops instead of hash joins or fetching far more rows than needed.
Running ANALYZE on foreign tables populates the local statistics catalog with sampled data from the remote source. This gives the planner a realistic picture of:
- Row count in the foreign table
- Column value distribution for selectivity estimates
This enables the query planner to do the following accurately instead of guessing:
- Join Order Decisions: With accurate row counts, the planner can decide whether to push joins remotely or do them locally.
- Filter Selectivity: If a column has skewed data, ANALYZE helps the planner avoid scanning millions of rows unnecessarily.
Hence, running ANALYZE after importing foreign tables, or after major data changes on the remote side, helps improve query performance; a sketch follows below.
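A minimal sketch, assuming a foreign table named remote_table imported into local_schema as in the setup step (ANALYZE on a postgres_fdw table samples rows from the remote server to build local statistics):

-- remote_table is a hypothetical foreign table imported into local_schema
ANALYZE local_schema.remote_table;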
Step 3: Network-Related Parameters to Keep in Check
Long-running queries often fail because connections go idle. That’s where TCP keepalive settings come in: they keep the connection alive during heavy lifting. You can find the server parameters below in the Azure portal.
tcp_keepalives_idle = 120
tcp_keepalives_interval = 30
tcp_keepalives_count = 9
You can also check these server parameters from the client using the commands below.
SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;
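Because postgres_fdw passes connection options through to libpq, you can also set keepalives per foreign server instead of globally; a sketch, reusing the foreign_server from earlier (keepalives_idle, keepalives_interval, and keepalives_count are standard libpq connection parameters):

ALTER SERVER foreign_server OPTIONS (
    ADD keepalives_idle '120',
    ADD keepalives_interval '30',
    ADD keepalives_count '9'
);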
Finally, there is search_path: if misconfigured, it can cause trouble accessing the remote tables.
SHOW search_path; -- Recommended: public

A wrong search_path can lead to “relation does not exist” errors.
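If the foreign tables were imported into a schema that is not on the path, extend search_path for the session or persist it per role; a sketch, reusing the example names from the setup step (local_schema, local_user):

SET search_path TO public, local_schema;                        -- current session only
ALTER ROLE local_user SET search_path TO public, local_schema;  -- persists for future sessions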
Example: Performance Improvement by Adjusting the Parameters
Creating tables and loading data
On database1, create the tables below and load the data:
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

INSERT INTO customers (name)
SELECT 'Customer_' || generate_series(1, 10000);

CREATE TABLE customer_orders (
    name TEXT,
    order_id INT,
    product_name TEXT,
    quantity INT
);

On database2, create the tables below and load the data:
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    price NUMERIC(10,2)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    product_id INT NOT NULL REFERENCES products(product_id),
    quantity INT NOT NULL
);

INSERT INTO products (product_name, price)
SELECT 'Product_' || gs AS product_name,
       round((random() * 900 + 100)::numeric, 2)
FROM generate_series(1, 1000000) AS gs;

INSERT INTO orders (customer_id, product_id, quantity)
SELECT (random() * 999 + 1)::int,
       (random() * 9999 + 1)::int,
       (random() * 9 + 1)::int
FROM generate_series(1, 1000000);

On database1, now create the postgres_fdw extension and set up the foreign server:
CREATE EXTENSION postgres_fdw;

CREATE SERVER foreign_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'pg17testremove_server.postgres.database.azure.com', dbname 'database2', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER foreign_server
    OPTIONS (user 'postgres', password 'xxxxxx');

IMPORT FOREIGN SCHEMA public LIMIT TO (products, orders)
    FROM SERVER foreign_server INTO public;

The query below populates the customer_orders table: it fetches data from the foreign tables, joins it with the local customers table, and inserts the result into customer_orders.
INSERT INTO customer_orders (name, order_id, product_name, quantity)
SELECT c.name, o.order_id, p.product_name, o.quantity
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id;

Query plans under the different settings are shown below, demonstrating the progressive improvement in query performance.
Before setting use_remote_estimate:
postgres=> EXPLAIN (ANALYZE, BUFFERS) INSERT INTO customer_orders (name, order_id, product_name, quantity)
postgres-> SELECT c.name, o.order_id, p.product_name, o.quantity
postgres-> FROM customers c
postgres-> JOIN orders o ON c.customer_id = o.customer_id
postgres-> JOIN products p ON o.product_id = p.product_id;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Insert on customer_orders  (cost=2590.00..5725.96 rows=0 width=0) (actual time=12296.455..12296.459 rows=0 loops=1)
   Buffers: shared hit=1934684 dirtied=15843 written=15838, temp read=17542 written=17542
   ->  Hash Join  (cost=2590.00..5725.96 rows=10003 width=33) (actual time=3101.650..10238.213 rows=1900006 loops=1)
         Hash Cond: (o.product_id = p.product_id)
         Buffers: shared hit=3003, temp read=17542 written=17542
         ->  Nested Loop  (cost=100.30..3098.72 rows=10003 width=25) (actual time=0.911..6239.952 rows=1900006 loops=1)
               Buffers: shared hit=3000
               ->  Foreign Scan on orders o  (cost=100.00..2355.66 rows=10003 width=16) (actual time=0.888..5622.590 rows=1900006 loops=1)
               ->  Memoize  (cost=0.30..0.50 rows=1 width=17) (actual time=0.000..0.000 rows=1 loops=1900006)
                     Cache Key: o.customer_id
                     Cache Mode: logical
                     Hits: 1899006  Misses: 1000  Evictions: 0  Overflows: 0  Memory Usage: 117kB
                     Buffers: shared hit=3000
                     ->  Index Scan using customers_pkey on customers c  (cost=0.29..0.49 rows=1 width=17) (actual time=0.001..0.001 rows=1 loops=1000)
                           Index Cond: (customer_id = o.customer_id)
                           Buffers: shared hit=3000
         ->  Hash  (cost=2364.66..2364.66 rows=10003 width=16) (actual time=3100.622..3100.623 rows=1000005 loops=1)
               Buckets: 262144 (originally 16384)  Batches: 16 (originally 1)  Memory Usage: 6653kB
               Buffers: shared hit=3, temp written=4570
               ->  Foreign Scan on products p  (cost=100.00..2364.66 rows=10003 width=16) (actual time=0.709..2891.655 rows=1000005 loops=1)
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.166 ms
 Execution Time: 12298.305 ms
(24 rows)

Time: 12509.725 ms (00:12.510)

After setting use_remote_estimate to TRUE:
postgres=> ALTER SERVER foreign_server OPTIONS (set use_remote_estimate 'true');
ALTER SERVER
Time: 58.475 ms
postgres=> EXPLAIN (ANALYZE, BUFFERS) INSERT INTO customer_orders (name, order_id, product_name, quantity)
postgres-> SELECT c.name, o.order_id, p.product_name, o.quantity
postgres-> FROM customers c
postgres-> JOIN orders o ON c.customer_id = o.customer_id
postgres-> JOIN products p ON o.product_id = p.product_id;
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on customer_orders  (cost=2987.54..488274.85 rows=0 width=0) (actual time=9198.901..9198.903 rows=0 loops=1)
   Buffers: shared hit=1932316 dirtied=15837 written=15837
   ->  Hash Join  (cost=2987.54..488274.85 rows=1588542 width=34) (actual time=23.026..7874.434 rows=1900006 loops=1)
         Hash Cond: (o.customer_id = c.customer_id)
         Buffers: shared hit=637
         ->  Foreign Scan  (cost=100.43..480400.01 rows=1900006 width=26) (actual time=1.193..7338.653 rows=1900006 loops=1)
               Relations: (orders o) INNER JOIN (products p)
         ->  Hash  (cost=1637.05..1637.05 rows=100005 width=18) (actual time=21.674..21.674 rows=100005 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6102kB
               Buffers: shared hit=637
               ->  Seq Scan on customers c  (cost=0.00..1637.05 rows=100005 width=18) (actual time=0.007..6.670 rows=100005 loops=1)
                     Buffers: shared hit=637
 Planning:
   Buffers: shared hit=25
 Planning Time: 41.459 ms
 Execution Time: 9200.142 ms
(16 rows)

Time: 9334.478 ms (00:09.334)

The query now runs about 25% faster than the initial run without the parameter (execution time down from 12.3 s to 9.2 s).
Query performance after running ANALYZE on the foreign tables:
postgres=> ANALYZE products;
ANALYZE
Time: 247.826 ms
postgres=> ANALYZE orders;
ANALYZE
Time: 143.998 ms
postgres=> EXPLAIN (ANALYZE, BUFFERS) INSERT INTO customer_orders (name, order_id, product_name, quantity)
postgres-> SELECT c.name, o.order_id, p.product_name, o.quantity
postgres-> FROM customers c
postgres-> JOIN orders o ON c.customer_id = o.customer_id
postgres-> JOIN products p ON o.product_id = p.product_id;
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on customer_orders  (cost=2987.54..488274.85 rows=0 width=0) (actual time=5696.542..5696.545 rows=0 loops=1)
   Buffers: shared hit=1932308 dirtied=15837 written=15837
   ->  Hash Join  (cost=2987.54..488274.85 rows=1900006 width=36) (actual time=24.049..4460.791 rows=1900006 loops=1)
         Hash Cond: (o.customer_id = c.customer_id)
         Buffers: shared hit=637
         ->  Foreign Scan  (cost=100.43..480400.01 rows=1900006 width=26) (actual time=1.005..3987.181 rows=1900006 loops=1)
               Relations: (orders o) INNER JOIN (products p)
         ->  Hash  (cost=1637.05..1637.05 rows=100005 width=18) (actual time=22.842..22.843 rows=100005 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6102kB
               Buffers: shared hit=637
               ->  Seq Scan on customers c  (cost=0.00..1637.05 rows=100005 width=18) (actual time=0.009..6.931 rows=100005 loops=1)
                     Buffers: shared hit=637
 Planning:
   Buffers: shared hit=40
 Planning Time: 4.444 ms
 Execution Time: 5697.377 ms
(16 rows)

Time: 5847.650 ms (00:05.848)

Running ANALYZE on the foreign tables reduced the execution time further: the query is now about 53% faster than the initial run (12.3 s down to 5.7 s).
Query performance after setting fetch_size and batch_size to 10000:
postgres=> ALTER SERVER foreign_server OPTIONS (set fetch_size '10000');
ALTER SERVER
Time: 58.789 ms
postgres=> ALTER FOREIGN TABLE orders OPTIONS (set batch_size '10000');
ALTER FOREIGN TABLE
Time: 58.612 ms
postgres=> ALTER FOREIGN TABLE products OPTIONS (set batch_size '10000');
ALTER FOREIGN TABLE
Time: 73.730 ms
postgres=> EXPLAIN (ANALYZE, BUFFERS) INSERT INTO customer_orders (name, order_id, product_name, quantity)
postgres-> SELECT c.name, o.order_id, p.product_name, o.quantity
postgres-> FROM customers c
postgres-> JOIN orders o ON c.customer_id = o.customer_id
postgres-> JOIN products p ON o.product_id = p.product_id;
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on customer_orders  (cost=2987.54..488274.85 rows=0 width=0) (actual time=3907.570..3907.572 rows=0 loops=1)
   Buffers: shared hit=1932308 dirtied=15838 written=15838
   ->  Hash Join  (cost=2987.54..488274.85 rows=1900006 width=36) (actual time=43.116..2545.099 rows=1900006 loops=1)
         Hash Cond: (o.customer_id = c.customer_id)
         Buffers: shared hit=637
         ->  Foreign Scan  (cost=100.43..480400.01 rows=1900006 width=26) (actual time=20.743..2136.200 rows=1900006 loops=1)
               Relations: (orders o) INNER JOIN (products p)
         ->  Hash  (cost=1637.05..1637.05 rows=100005 width=18) (actual time=21.984..21.984 rows=100005 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 6102kB
               Buffers: shared hit=637
               ->  Seq Scan on customers c  (cost=0.00..1637.05 rows=100005 width=18) (actual time=0.014..6.853 rows=100005 loops=1)
                     Buffers: shared hit=637
 Planning:
   Buffers: shared hit=14
 Planning Time: 41.627 ms
 Execution Time: 3909.023 ms
(16 rows)

Time: 4049.642 ms (00:04.050)

Raising fetch_size and batch_size improved the query further: it now runs about 67% faster than the initial run (12.3 s down to 3.9 s).
Common Issues and Fixes
- Connection Failures / I/O Errors: Verify your network, firewall configuration, and pg_hba.conf settings (see the sketch after this list for inspecting cached FDW connections).
- Query Timeouts: Optimize query plans and enable remote estimates.
- Out of Memory: Tune memory parameters and fetch_size.
- Slow Performance: Use batch inserts and analyze foreign tables.
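For connection failures in particular, postgres_fdw on PostgreSQL 14 and later ships helper functions for inspecting and resetting the connections a session has cached; a small diagnostic sketch:

-- List the foreign-server connections cached by this session
SELECT * FROM postgres_fdw_get_connections();

-- Drop the cached connection to one server so the next query reconnects
SELECT postgres_fdw_disconnect('foreign_server');

-- Or drop every cached connection
SELECT postgres_fdw_disconnect_all();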
postgres_fdw can transform how you manage distributed PostgreSQL environments, but only if you respect its parameters. Each setting plays a role in this orchestration: ANALYZE on the foreign tables for realistic statistics, use_remote_estimate for smarter planning, fetch_size and batch_size for efficient data transfer, and TCP keepalives for stability. Configure them thoughtfully, and you will unlock the full potential of FDWs without falling into the common traps.