Vacuuming is used for periodic maintenance of the PostgreSQL database.
Generally, it is enough to let vacuuming be performed by the autovacum process.
Autovacuum is on by default in PostgreSQL. To confirm, you can check by running the command SHOW autovacuum;
. It should output on
Why do we need vacuuming?
PostgreSQL follows a concurrency control mechanism called MVCC (Multi-Version Concurrency Control). This means that when there is an update
query and a select
query on the same row, PostgreSQL creates a snapshot of the current version of the data and returns it to the select
query. The update
query then operates on a separate version of the data, which is ultimately stored.
During this process, the previous snapshot remains in the system and is treated as a dead tuple. These dead tuples can also be generated when performing a delete
operation. They represent previous versions of data that are no longer relevant.
Vacuum reclaims the storage occupied by the dead tuples.
To get the number of live tuples and dead tuples, we can use the below query:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables;
If you see any dead tuples for any table, simply run VACUUM
command manually and should see n_dead_tup
count would become 0
Other than reclaiming storage vacuuming also helps,
updating statistics used by the query planner
updating the visibility map, which helps in
index only
scans
VACUUM ANALYZE
VACUUM ANALYZE
is a combination of two PostgreSQL database maintenance operations: VACUUM
and ANALYZE
. VACUUM
is used to reclaim space by removing dead tuples from the database. Dead tuples are created when a row is updated or deleted ANALYZE
is used to update the statistics of the database for better query planning
Types of VACUUM
VACUUM OR standard vacuum
VACUUM FULL
The main difference between the two is that VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This can reclaim more space, but takes much longer and exclusively locks the table. It also requires extra disk space and can have adverse side-effects like increased I/O, index bloat, and high loads on replication servers.
On the other hand, VACUUM (without FULL) simply reclaims space and makes it available for re-use within the same table. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However, extra space is not returned to the operating system (in most cases).
VACCUM FULL can reclaim more space but it runs slowly. The standard vacuum runs in parallel with other database normal operations. Which means select, update, insert and delete can continue to work. While vaccum full requires a exculsive lock on the table so the table can not be used during the process.
VACUUM and VACUUM FULL in action
Create a table
CREATE TABLE test(id serial, data int);
Insert some large data
INSERT INTO test (data) SELECT * FROM pg_catalog.generate_series(1, 100000);
Let's update a row
UPDATE test SET data = 234234 where data = 23;
Now let's see if there is any dead tuple
SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables where relname = 'test'; -- Output -- |relname|n_live_tup|n_dead_tup| -- |-------|----------|----------| -- |test |300 |1 |
Let's run vacuum on this table
VACUUM test;
Let's see if there any dead tuple
SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables where relname = 'test'; -- Output -- |relname|n_live_tup|n_dead_tup| -- |-------|----------|----------| -- |test |300 |0 |
Now, let's do vacuum full
VACUUM FULL test;
Let's verify if there is any lock on the table
SELECT locktype, relation::regclass, mode FROM pg_locks WHERE relation = 'test'::regclass; -- Output -- |locktype|relation|mode | -- |--------|--------|-------------------| -- |relation|test |AccessExclusiveLock| -- you can also use below query to get the actual query which caused the lock SELECT pg_stat_activity.query, pg_locks.mode, pg_locks.granted, locktype, relation::regclass FROM pg_locks JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid WHERE pg_locks.relation = 'test'::regclass;
pg_repack
pg_repack
is a PostgreSQL extension that allows you to remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike VACUUM FULL
, it works online, without holding an exclusive lock on the processed tables during processing.
Fore more info on installation and overview please their official page
Thanks for reading ๐