Mastering JSON in PostgreSQL: Efficient Storage, Querying, and Mapping
Over the past years, the importance of JSON data has grown significantly. JSON has emerged as the preferred choice for web developers, replacing other technologies like SOAP, plain XML, and custom APIs. It has become the de facto standard for data exchange on the internet. Recognizing this trend, the PostgreSQL community has taken action by implementing robust JSON support within PostgreSQL, enabling efficient storage and management of JSON data.
PostgreSQL introduced two data types for handling JSON data: json and jsonb. Let's take a closer look at how they work.
The json data type validates the JSON document but stores it as plain text, without any binary formatting. While this might provide a small benefit during insertion, it can be costly when accessing the document later on. Imagine you have a large JSON document stored as plain text and you need to extract specific values or perform operations on it. The database has to parse the entire document every time, resulting in slower performance.
To address this issue, PostgreSQL offers the jsonb data type. With jsonb, the JSON document is parsed and stored in a binary format, optimized for efficient access. This means that querying and manipulating the data becomes much faster, as the database can directly work with the binary representation.
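The difference is easy to observe: casting the same literal to each type shows that json preserves the input text verbatim, while jsonb normalizes it on input (insignificant whitespace is discarded, keys are reordered, and duplicate keys are collapsed, keeping the last value). A quick sketch:

```sql
SELECT '{"b": 2, "a": 1, "a": 3}'::json;
-- {"b": 2, "a": 1, "a": 3}
SELECT '{"b": 2, "a": 1, "a": 3}'::jsonb;
-- {"a": 3, "b": 2}
```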
Moreover, jsonb comes with additional advantages. Many functions and operators in PostgreSQL are specifically designed for the binary representation of JSON data. For example, the jsonb_pretty function formats the JSON data in a more readable way. These functions leverage the binary structure of jsonb, allowing for optimized and convenient data operations.
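For instance, jsonb_pretty re-indents a compact document for human consumption (output from a standard psql session; note that jsonb's internal key ordering, shortest key first, shows through):

```sql
SELECT jsonb_pretty('{"name": "John", "address": {"city": "New York"}}'::jsonb);
-- {
--     "name": "John",
--     "address": {
--         "city": "New York"
--     }
-- }
```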
Creating JSON documents
row_to_json()
To create some data, we can use VALUES, which is a SQL instruction to simply return a dataset of your choice (usually some constants). Here is how it works:
VALUES (1, 2, 3), (4, 5, 6);
column1 | column2 | column3
---------+---------+---------
1 | 2 | 3
4 | 5 | 6
What we have here is two lines featuring three columns each. We can turn each row into a JSON document easily:
SELECT row_to_json(x) FROM (VALUES (1, 2, 3), (4, 5, 6)) AS x;
row_to_json
-------------------------------------
{"column1": 1, "column2": 2, "column3": 3}
{"column1": 4, "column2": 5, "column3": 6}
(2 rows)
The important part is that we can turn a generic data structure (in our case, x) into a JSON document. Anything can be passed to the row_to_json function. The second important observation is that each row will be turned into one JSON document.
json_agg()
Often, we want the entire result set to be a single document. To achieve that, the json_agg function is what we need:
SELECT json_agg(x) FROM (VALUES (1, 2, 3), (4, 5, 6)) AS x;
json_agg
-------------------------------------
[{"column1":1,"column2":2,"column3":3},{"column1":4,"column2":5,"column3":6}]
(1 row)
We can achieve the above using json_build_object() as well.
json_build_object()
json_build_object is a PostgreSQL function that constructs a JSON object from key-value pairs. It takes a variable number of arguments, interpreted as alternating keys and values, and returns a JSON object.
SELECT json_build_object('column1', x.column1, 'column2', x.column2, 'column3', x.column3)
FROM (VALUES (1, 2, 3), (4, 5, 6)) AS x;
json_build_object
-------------------------------------
{"column1": 1, "column2": 2, "column3": 3}
{"column1": 4, "column2": 5, "column3": 6}
(2 rows)
SELECT json_agg(json_build_object('column1', column1, 'column2', column2, 'column3', column3))
FROM (VALUES (1, 2, 3), (4, 5, 6)) AS x;
json_agg
-------------------------------------
[{"column1": 1, "column2": 2, "column3": 3}, {"column1": 4, "column2": 5, "column3": 6}]
(1 row)
Accessing a JSON document
After learning how to create a JSON document, you will see how to access this data type, extract subtrees, and a lot more. Let us first create a table, insert a document, and dissect it:
-- Create a table to store JSON documents
CREATE TABLE json_table (
id SERIAL PRIMARY KEY,
data JSONB,
tags TEXT[]
);
-- Insert a JSON document with nested fields into the table
INSERT INTO json_table (data) VALUES ('{
"name": "John",
"age": 30,
"address": {
"street": "123 Main St",
"city": "New York",
"country": "USA"
},
"hobbies": ["reading", "painting", "gardening"]
}');
-- Query the table and dissect the nested JSON document
select data->'name' from json_table; -- "John"
select data->>'name' from json_table; -- John
-- we can nest it
select data->'address'->'street' from json_table; -- "123 Main St"
-- below will return NULL
select data->'name'->'first_name' from json_table;
-- below will throw an error because address will be cast to text
select data->>'address'->'street' from json_table;
-- reading arrays
SELECT data -> 'hobbies' FROM json_table;  -- ["reading", "painting", "gardening"]
SELECT data ->> 'hobbies' FROM json_table; -- same content, but returned as text
-- validating types
select pg_typeof(data -> 'hobbies') FROM json_table; -- jsonb
select pg_typeof(data ->> 'hobbies') FROM json_table; -- text
-- fetch rows where a key exists in the json data
select * from json_table where id = 2 AND data ? 'address';
select * from json_table jt where id=1 and data -> 'address' ? 'city';
data->'name' returns a JSON value, while data->>'name' returns the extracted value as text.
jsonb_each()
The jsonb_each function will loop over the subtree and return all elements as a composite type (the record data type).
SELECT (jsonb_each(data -> 'address')) FROM json_table;
-- Output
-- (city,"""New York""")
-- (street,"""123 Main St""")
-- (country,"""USA""")
However, we can expand on this type and return those elements as separate fields:
SELECT (jsonb_each(data -> 'address')).* FROM json_table;
-- output
-- |key |value |
-- |-------|-----------------|
-- |city |"New York" |
-- |street |"123 Main St"    |
-- |country|"USA" |
jsonb_array_elements()
The jsonb_array_elements() function is used to unnest an array and create a row for each element in the array.
-- Insert a second document that contains an array of addresses
INSERT INTO json_table (data) VALUES ('{
"name": "John", "age": 30,
"addresses": [{"city": "New York", "street": "123 Main St"},
{"city": "London", "street": "456 Elm St"}]
}');
-- Query the table and loop over the addresses array within the JSON document
SELECT
data -> 'name' AS name,
data -> 'age' AS age,
(address ->> 'city') AS city,
(address ->> 'street') AS street
FROM json_table,
jsonb_array_elements(data -> 'addresses') AS address;
-- Output:
-- name | age | city | street
---------------------------------
-- John | 30 | New York| 123 Main St
-- John | 30 | London | 456 Elm St
jsonb_object_keys()
Just extract the keys in the document or subtree.
-- Retrieve the top-level keys from the JSONB column
SELECT jsonb_object_keys(data) AS keys
FROM json_table WHERE id = 1;
-- Output:
-- keys
--------
-- age
-- name
-- address
-- hobbies
Turning JSON documents into rows
JSON does not end up in a database by itself – we have to put it there. Inserting a document directly into a JSON column is easy. However, sometimes we have to map a document to an existing table. Consider the following example:
-- Create a table to store user information
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
age INTEGER,
city VARCHAR(100)
);
-- Insert a JSON document into the table using JSONB_POPULATE_RECORDSET
INSERT INTO users (name, age, city)
SELECT data->>'name', (data->>'age')::INTEGER, data->>'city'
FROM jsonb_populate_recordset(null::users, '[{"name": "John", "age": 30, "city": "New York"},
{"name": "Jane", "age": 25, "city": "London"}]');
-- Retrieve the rows from the users table
SELECT * FROM users;
-- Output:
-- id | name | age | city
-- ---+------+-----+------------
-- 1 | John | 30 | New York
-- 2 | Jane | 25 | London
jsonb_populate_recordset is used when you have a JSON array and want to insert multiple JSON documents into a table, while jsonb_populate_record is used when you have a single JSON object and want to map its keys to columns in a record or row.
By passing NULL as the first parameter and casting it to the table name, jsonb_populate_recordset dynamically creates a composite type that matches the table's structure, allowing JSON data to be mapped to the table's columns.
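For completeness, here is a sketch of the single-object variant against the users table from above (the values are made up for illustration). Keys missing from the JSON, such as id here, simply come back as NULL:

```sql
-- Map one JSON object onto the users table's row type
SELECT * FROM jsonb_populate_record(
    null::users,
    '{"name": "Alice", "age": 28, "city": "Paris"}'
);
-- id | name  | age | city
-- ---+-------+-----+-------
--    | Alice |  28 | Paris
```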