
How to Tune Advanced PostgreSQL Performance: A Hands-on Guide

PostgreSQL supports some of the highest-performance applications worldwide—from financial services platforms processing tens of millions of transactions daily to analytics engines crunching terabytes of data every hour.

Even on a well-run system, there is substantial additional speed to be had from your existing infrastructure simply by making a few changes.


Here are nine proven steps you can follow in sequence to push your server into overdrive.

PostgreSQL can perform extremely well when configured correctly.

This tutorial will help you do just that.

By baselining your current setup in the first step and then tweaking it one change at a time, you will understand exactly which adjustments improve performance.


This guide introduces query profiling, storage management, index creation, configuration tuning, and other PostgreSQL-specific issues that many underperforming instances overlook.

Many deployments run on default installations without any tuning, leading to severely underperforming configurations and slower-than-necessary response times.

Keep in mind that PostgreSQL offers great capabilities, yet its default configuration uses conservative settings to maintain safety on low-powered hardware.

In most cases, you will see a significant performance improvement immediately after applying the changes discussed below.

Once you have completed all the steps, your long-term goal should be to keep monitoring your workload for changes and to keep making adjustments—slowly and carefully—in response.

Step 1: Establish Your Current PostgreSQL Performance Baseline:

The first thing you need to do is find out your current PostgreSQL configuration.

Tuning the configuration spontaneously, without establishing what you are tuning, often means dialing changes back after you have gone too far in the wrong direction.

In other words: Always profile an existing system before making drastic modifications.

View pg_settings information:

SELECT name, setting, unit, context FROM pg_settings WHERE category LIKE '%Memory%' OR category LIKE '%Query%' ORDER BY name;

This query will show you a comprehensive overview of your overall memory and query planning settings.

Measure your system resources:

Before diving into modifications, determine your available RAM, CPU cores, and disk type (SSD versus HDD).

The available resources directly influence which configuration values you should use.

An optimal PostgreSQL instance on a server with 16 GB of RAM will differ completely from one on a 4 GB machine.

– Use `free -h` for memory
– Use `lscpu` for CPU cores
– Use `lsblk -d -o name,rota` for disk type (rota=0 indicates an SSD)

Collect baseline query execution statistics:

Install the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

This extension will let you compare your old and new configurations using the execution statistics it records for every query.

Having hard numerical proof of prior query performance will determine whether your changes helped.
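One caveat: pg_stat_statements collects nothing until its library is preloaded at server start, so the CREATE EXTENSION step above usually needs to be paired with a configuration change and a restart:

```ini
# postgresql.conf — required before pg_stat_statements records anything
shared_preload_libraries = 'pg_stat_statements'
```

After restarting, run CREATE EXTENSION in each database you want to profile.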

Step 2: Adjust PostgreSQL Memory Settings:

Postgres defaults assume very limited available RAM.

Increasing shared buffers, query work memory, and effective cache size all provide much better server performance than defaults—especially when used together.

shared_buffers:

Use 25% of available RAM as a rule of thumb:
shared_buffers = 4GB if RAM=16GB

work_mem:

Similarly, increase the per-query sort and hash-join buffer, but don't forget that it applies per operation, so a single complex query can consume several multiples of it:
work_mem = 64MB is a safe starting point on a dedicated server. Increase cautiously if sorts still spill to disk.

effective_cache_size:

Make the query planner aware of available OS memory (and queries will be more index-friendly):
effective_cache_size=12GB if RAM=16GB

This makes the planner far more willing to choose index scans, which usually speeds up read-heavy queries noticeably.
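Putting the three settings together, a plausible starting point for a dedicated 16 GB server might look like this (maintenance_work_mem is an extra suggestion beyond the settings above; treat all values as starting points, not prescriptions):

```ini
# postgresql.conf — memory settings for a dedicated 16 GB server
shared_buffers = 4GB            # ~25% of RAM
work_mem = 64MB                 # per sort/hash operation, not per connection
effective_cache_size = 12GB     # ~75% of RAM: what the OS can cache
maintenance_work_mem = 1GB      # speeds up VACUUM and CREATE INDEX (optional)
```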

Step 3: Optimize PostgreSQL Index Strategy:

Too many indexes slow write speed, not enough leave you with huge table scans.

Choose carefully; create targeted indexes for your most commonly run queries.

Then remove any that the application never uses.

Fine-tuned indexes will reap immediate rewards.

Find your slowest performing tables:

Query pg_stat_user_tables directly:
SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10;

Tables with many sequential scans and few index scans are your best candidates for new indexes; sorting the other way around highlights indexes doing little useful work.

Create composite and partial indexes:

A composite index on (product_id, creation_date) will beat two indexes on the two individual columns, especially if creation_date isn’t highly selective.
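As a sketch (assuming a hypothetical orders table with these columns), the composite version looks like:

```sql
-- One composite index serves equality on product_id plus a range on creation_date
CREATE INDEX idx_orders_product_date ON orders (product_id, creation_date);

-- This query can now be answered by a single index scan:
SELECT * FROM orders
WHERE product_id = 42
  AND creation_date >= now() - interval '30 days';
```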

If the orders table is frequently updated and your queries filter on status = 'pending' much of the time, a partial index may be more efficient:
CREATE INDEX ON orders(id) WHERE status = 'pending';

Reduce index bloat by removing any indexes NOT used by your application.

– Query the index usage statistics view `pg_stat_user_indexes` to find dead indexes:
SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0;
– After removing unused indexes, run `VACUUM` on the affected tables
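Dropping an index found this way can be done without blocking reads or writes (index name hypothetical):

```sql
-- CONCURRENTLY avoids an exclusive table lock
-- (note: cannot run inside a transaction block)
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;
```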

This builds an optimization picture: indexes used heavily by your slowest queries stay, unused ones go.

– Remember to preserve indexing on columns that are used consistently in JOINs or as constraints so the query optimizer behaves correctly.

Step 4: Profile and refine your slowest running queries:

Good indexes won’t help you if your queries are poorly written.

PostgreSQL provides EXPLAIN ANALYZE and EXPLAIN options for this purpose.

Always use EXPLAIN (ANALYZE, BUFFERS) rather than a plain EXPLAIN, so you see actual run times and buffer activity, not just estimates:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id WHERE orders.status = 'shipped';

Look for:
– Expensive sequential scans on multi-gigabyte tables
– Nested loop joins repeated over large row counts
– Huge, high-cost operations that produce empty or tiny results
– Nodes whose actual time is much larger than the planner's estimate

Step 5: Fix your plans by updating your statistics:

To fix bad estimates that lead to poor join decisions, continually analyze your tables:
-- re-analyze the three largest tables (PL/pgSQL sketch)
DO $$
DECLARE t text;
BEGIN
  FOR t IN SELECT relname FROM pg_stat_user_tables ORDER BY n_live_tup DESC LIMIT 3 LOOP
    EXECUTE format('ANALYZE %I', t);
  END LOOP;
END $$;

Rule out operating-system bottlenecks:

Bad plans aren't the only cause of sluggish queries; a server starved for CPU, RAM, disk, or network will be slow no matter what the planner does. A few standard Linux tools give a quick picture:

– CPU: `sar -u 1 900` prints one sample per second for 15 minutes, with utilization split into %user, %system, %iowait, and %idle. An idle figure of only 5% means the CPU is roughly 95% busy; a high %iowait points at the disks rather than the processor.
– Per-core detail: `mpstat -P ALL 1 3` takes three one-second samples per CPU and helps spot a single saturated core.
– Memory: `sar -r 1 900` (or simply `free -h`) shows free RAM and page-cache usage; ample free-plus-cached memory means the server is not swapping.
– Disk: `iostat -x -m 1 60` reports per-device throughput and average wait times. A high number of I/O operations combined with high latency means the disks are saturated and the I/O load on them should be reduced.
– Network: `sar -n DEV 1 3` shows per-interface packet rates, and `netstat -s` shows cumulative counters; a rising count of dropped packets suggests saturation. A live monitor such as `iptraf` gives the same picture interactively.
– Connections: `netstat -a | wc -l` counts open sockets; compare a snapshot taken while idle with one taken under load. A very high session count is not necessarily the bottleneck, but keep it in mind as a candidate.
– Overall: `top` summarizes everything at a glance. On a heavily loaded database server, high %iowait alongside modest %user time usually means the system spends most of its time waiting on storage rather than computing.

Force a statistics refresh:
ANALYZE orders;
ANALYZE customers;

Replace N+1 Query Patterns:

A common antipattern in applications: pull a list of records, then issue a new query for each row in the list in a loop.

Almost always, a single JOIN or a WHERE IN would suffice.

This simple change alone can collapse dozens of database round trips into a single call, cutting latency from seconds to milliseconds.
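As a sketch (table and column names hypothetical), the N+1 pattern and its replacement look like:

```sql
-- Antipattern: one query per customer, run in an application loop (N+1 round trips)
--   SELECT * FROM orders WHERE customer_id = $1;

-- Better: one round trip with a join over the whole list of ids
SELECT c.id, c.name, o.id AS order_id, o.status
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = ANY($1);   -- $1 is an array of customer ids
```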

Step 6: Manage PostgreSQL Table Bloat with VACUUM:

PostgreSQL relies on multi-version concurrency control (MVCC), which requires that deleted or updated rows remain on disk for a while—they become "dead tuples" that accumulate, sometimes dramatically degrading performance.

Learn how autovacuum functions:

Autovacuum runs automatically to clear out dead tuple space.

But default thresholds are very conservative and geared toward small tables.

Large, constantly updated tables need much more aggressive autovacuum settings.

Adjust autovacuum parameters for each table:

You can change autovacuum parameters for each table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005
);

This tells autovacuum to run on the table after about 1% of rows change, rather than the default 20%.

For a 10 million row table, that’s a big difference in responsiveness.
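Concretely, autovacuum fires when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (the threshold defaults to 50):

```
default: 50 + 0.20 × 10,000,000 ≈ 2,000,050 dead rows before a vacuum
tuned:   50 + 0.01 × 10,000,000 ≈   100,050 dead rows before a vacuum
```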

Monitor the number of dead tuples that have accumulated:

SELECT relname, n_dead_tup, n_live_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;

When dead tuples outnumber live ones, either your autovacuum settings need tightening or your write pattern needs rethinking.

Step 7: Tune Connection Management with Pooling:

Every PostgreSQL connection consumes about 5-10MB of RAM and causes process spawning overhead.

A system with hundreds of simultaneous connections can easily spend more time managing connections than executing queries.

Use PgBouncer to Pool Connections:

PgBouncer is a lightweight connection pooler that sits between your app and the database.

Instead of each thread holding a connection, they share a small pool of pre-made connections.

Installation on Ubuntu is a single package: apt-get install pgbouncer (builds for other platforms are linked from the PgBouncer home page).

Configure Pool Mode:

PgBouncer has three pool modes:
– Transaction (a server connection is assigned only for the duration of a transaction, then returned to the pool);
– Session (a server connection is held for the whole client session);
– Statement (the connection is returned after every single statement.)

Transaction mode usually has the biggest impact, collapsing hundreds of client connections down to a few dozen server connections while remaining transparent to most applications.
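A minimal pgbouncer.ini sketch for transaction pooling (database name, host, and pool sizes are placeholders to adapt):

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000   ; app-side connections PgBouncer will accept
default_pool_size = 20   ; real server connections per database/user pair
```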

Set max_connections appropriately:

Set max_connections to roughly the total number of pooled server connections you expect across all your application processes, plus headroom for administrative sessions. With a pooler in front, a few hundred is usually plenty:
max_connections = 200

With PgBouncer holding around 100 server connections, it can serve thousands of app threads.

Otherwise you waste memory maintaining connections that are never needed and force the database server through constant context switching.

Step 8: Adjust Write Ahead Log Settings for Your Use Case:

The Write-Ahead Log (WAL) safeguards data on its way to disk, but the defaults aren't tuned for maximum throughput.

Tweaking the below balances safety versus throughput.

Change checkpoint_completion_target:

This controls how gradually PostgreSQL spreads checkpoint writes over the checkpoint interval.

Raising it from 0.5 toward 1.0 yields fewer I/O spikes:
checkpoint_completion_target = 0.9

Set wal_buffers:

The buffers hold data for the log before writing to disk.

The default is sized automatically from shared_buffers and capped at 16MB; for high-throughput write workloads it is common to pin it explicitly:
wal_buffers = 16MB

Select the appropriate level of synchronous_commit:

Many apps can get away with a little bit of data loss, so you can relax the durability constraint for a speed boost:
synchronous_commit = off

This won’t leave you with a corrupt database but you could lose roughly the last 600 ms of committed transactions in a crash.

Where financial data is concerned, stick to default on. For high volume event streams, the throughput enhancements can be extraordinary.
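Collecting Step 8 into one sketch (values are the starting points discussed above, not universal recommendations):

```ini
# postgresql.conf — WAL tuning for write-heavy workloads
checkpoint_completion_target = 0.9   # spread checkpoint I/O over the interval
wal_buffers = 16MB                   # WAL held in memory before flushing
synchronous_commit = off             # trade ~600 ms of durability for speed;
                                     # leave 'on' for financial/critical data
```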

Step 9: Implement Continuous PostgreSQL Monitoring and Alerting:

Optimization is an iterative process.

If your workload changes, your data volume grows, or your users' access patterns shift, settings that worked well six months ago may perform poorly now.

Keeping a monitoring system in place allows you to stay ahead of problems before they become outages.

Enable pg_stat_statements Reporting:

Gather data on your slow queries by running:
SELECT query, calls, total_exec_time, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20;

This surfaces the queries with the worst average performance—the biggest offenders to tune first.
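After each tuning change, it can help to reset the counters so before/after comparisons are clean (pg_stat_statements_reset() ships with the extension):

```sql
-- Clear accumulated statistics; subsequent numbers reflect only the new config
SELECT pg_stat_statements_reset();
```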

Utilize pgBadger for Log Files:

PostgreSQL log files are information-dense, and pgBadger can sift through them quickly, producing a comprehensive report in minutes covering:
– Top SQL queries by total duration
– Top call counts
– Large lock time spreads
– Number of active connections over time

A few outlier queries often dominate total duration, and that is where you should prioritize your tuning.

Establish Alerts for Key Metrics:

Whether you use a hosted solution like DataDog or the open-source postgres_exporter for Prometheus, alert on a core set of metrics:

– Cache hit ratio (target 99%+);
– Replication lag (if applicable);
– Dead tuple ratio;
– Active connection count relative to max_connections;
– Query duration percentiles (P95, P99).

You'll catch regressions quickly, right from the dashboards your team already works in.
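The first metric on that list can be computed directly in SQL (a common formulation using pg_statio_user_tables; the counters need time to accumulate before the ratio is meaningful):

```sql
-- Fraction of table reads served from shared buffers rather than disk
SELECT sum(heap_blks_hit)::float
       / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
```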

Conclusion:

This guide has taken you from a stock, out-of-the-box database to a fairly well-optimized set of best practices.

Starting with a solid initial audit and then moving through memory sizing, index strategy, query tuning, bloat prevention, connection pooling, WAL tuning, and monitoring, each step builds on the last.

You won't need any special tools for any of these techniques, just time and attention. Most teams can manage optimization without drilling into the internals of the database. The key is that optimization is never a single event; it's an ongoing process.

Check your key performance metrics after applying these tips. If you're still not hitting your targets, compare against your Step 1 baseline to see which changes helped and where to dig deeper. And if you are, we want to hear from you.

For most applications, that translates into configuring shared_buffers properly, activating connection pooling, and analyzing the five slowest queries. These three measures alone often lead to dramatic, quantifiable gains within a few hours. Then continue working methodically through the other items on the list.

Do your homework: properly tuning PostgreSQL truly is worthwhile, and the consistently faster day-to-day experience is worth every bit of the investment.
