Mastering PostgreSQL Vacuuming

Photo by Tarek Kunze on Unsplash

Mastering PostgreSQL Vacuuming

ยท

4 min read

Vacuuming is used for periodic maintenance of the PostgreSQL database.

Generally, it is enough to let vacuuming be performed by the autovacum process.

Autovacuum is on by default in PostgreSQL. To confirm, you can check by running the command SHOW autovacuum; . It should output on

Why do we need vacuuming?

PostgreSQL follows a concurrency control mechanism called MVCC (Multi-Version Concurrency Control). This means that when there is an update query and a select query on the same row, PostgreSQL creates a snapshot of the current version of the data and returns it to the select query. The update query then operates on a separate version of the data, which is ultimately stored.

During this process, the previous snapshot remains in the system and is treated as a dead tuple. These dead tuples can also be generated when performing a delete operation. They represent previous versions of data that are no longer relevant.

Vacuum reclaims the storage occupied by the dead tuples.

To get the number of live tuples and dead tuples, we can use the below query:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables;

If you see any dead tuples for any table, simply run VACUUM command manually and should see n_dead_tup count would become 0

Other than reclaiming storage vacuuming also helps,

  • updating statistics used by the query planner

  • updating the visibility map, which helps in index only scans

VACUUM ANALYZE

VACUUM ANALYZE is a combination of two PostgreSQL database maintenance operations: VACUUM and ANALYZE. VACUUM is used to reclaim space by removing dead tuples from the database. Dead tuples are created when a row is updated or deleted ANALYZE is used to update the statistics of the database for better query planning

Types of VACUUM

  • VACUUM OR standard vacuum

  • VACUUM FULL

The main difference between the two is that VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This can reclaim more space, but takes much longer and exclusively locks the table. It also requires extra disk space and can have adverse side-effects like increased I/O, index bloat, and high loads on replication servers.

On the other hand, VACUUM (without FULL) simply reclaims space and makes it available for re-use within the same table. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However, extra space is not returned to the operating system (in most cases).

VACCUM FULL can reclaim more space but it runs slowly. The standard vacuum runs in parallel with other database normal operations. Which means select, update, insert and delete can continue to work. While vaccum full requires a exculsive lock on the table so the table can not be used during the process.

VACUUM and VACUUM FULL in action

  1. Create a table

     CREATE TABLE test(id serial, data int);
    
  2. Insert some large data

     INSERT INTO test (data) 
       SELECT * FROM pg_catalog.generate_series(1, 100000);
    
  3. Let's update a row

     UPDATE test SET data = 234234 where data = 23;
    
  4. Now let's see if there is any dead tuple

     SELECT relname, n_live_tup, n_dead_tup
     FROM pg_stat_user_tables where relname = 'test';
     -- Output
     -- |relname|n_live_tup|n_dead_tup|
     -- |-------|----------|----------|
     -- |test   |300       |1         |
    
  5. Let's run vacuum on this table

     VACUUM test;
    
  6. Let's see if there any dead tuple

     SELECT relname, n_live_tup, n_dead_tup
     FROM pg_stat_user_tables where relname = 'test';
     -- Output
     -- |relname|n_live_tup|n_dead_tup|
     -- |-------|----------|----------|
     -- |test   |300       |0         |
    
  7. Now, let's do vacuum full

     VACUUM FULL test;
    
  8. Let's verify if there is any lock on the table

     SELECT locktype, relation::regclass, mode 
       FROM pg_locks WHERE relation = 'test'::regclass;
     -- Output
     -- |locktype|relation|mode               |
     -- |--------|--------|-------------------|
     -- |relation|test    |AccessExclusiveLock|
    
     -- you can also use below query to get the actual query which caused the lock
     SELECT
         pg_stat_activity.query,
         pg_locks.mode,
         pg_locks.granted,
         locktype, relation::regclass
     FROM
         pg_locks
     JOIN
         pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
     WHERE
         pg_locks.relation = 'test'::regclass;
    

pg_repack

pg_repack is a PostgreSQL extension that allows you to remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike VACUUM FULL, it works online, without holding an exclusive lock on the processed tables during processing.

Fore more info on installation and overview please their official page

Thanks for reading ๐Ÿ™‚

Did you find this article valuable?

Support Sujeet Agrahari by becoming a sponsor. Any amount is appreciated!