skip to content
Pradeep Chhetri

Software Engineer. Writing about databases, infrastructure, and distributed systems.

Main navigation

  • Home
  • Blog
  • TIL
GitHub LinkedIn RSS

Using Swap with ClickHouse

September 6, 2025

Tags:
  • clickhouse
  • linux
  • swap
  • zram

Introduction

Engineers have a common misunderstanding that swap slows down the system. Hence they prefer not to use swap at all. ClickHouse being the 3I/ATLAS of databases world, definitely can’t afford to slow down. Little do we know that over time, swap code has improved a lot and newer kernel modules like zram and zswap are developed, which can help to make swap lot faster. Hence I thought why not to try to prove that ClickHouse can benefit from swap without sacrificing speed.

Experiment

I wanted to design a simple experiment which leverages zram as swap. We first run a query without swap by giving enough memory required by the query, run the same query with swap by giving lesser than the max memory required by the query so that swap is utilized, measure the query duration between the two instances of the query and come up with the conclusion.

ClickHouse has a feature “spill to disk”, which automatically starts using disk in case ClickHouse is short of memory for GROUP BY and ORDER BY queries. Since I wanted the whole query to run in-memory, I came up with the following memory intensive query:

SELECT count()
FROM
(
    SELECT number
    FROM numbers(1, 400000000)
    INTERSECT
    SELECT number
    FROM numbers(1, 400000000)
)

Running query with no memory limit

My machine has 58G RAM so I ran the query giving the full memory to ClickHouse.

nixos :) select count() from (select number from numbers(1,400000000) intersect select number from numbers(1,400000000))

SELECT count()
FROM
(
    SELECT number
    FROM numbers(1, 400000000)
    INTERSECT
    SELECT number
    FROM numbers(1, 400000000)
)

Query id: d2e2008a-97ca-41f7-a390-2b71715cca56

   ┌───count()─┐
1. │ 400000000 │ -- 400.00 million
   └───────────┘

1 row in set. Elapsed: 22.545 sec. Processed 800.00 million rows, 6.40 GB (35.49 million rows/s., 283.88 MB/s.)
Peak memory usage: 12.00 GiB.

You can see that the query took 12 GiB of RAM to finish and ran in around 22.5 seconds.

[pradeep@nixos:~]$ cat /sys/fs/cgroup/system.slice/clickhouse.service/memory.max
max

You can see the clickhouse-server process is given full memory at its disposal (58G).

Running query with 11 GB of memory limit

In order to restrict the memory available for clickhouse-server process, we used the following systemd config with enforces the max memory available at cgroup-level.

[Service]
MemoryMax=11G

We were able to confirm that only 11 GB is available to clickhouse-server process.

[pradeep@nixos:~]$ cat /sys/fs/cgroup/system.slice/clickhouse.service/memory.max
11811160064

If we now run the same query, obviously clickhouse built-in memory-tracker will come in between and prevent the query from running. This is a safety feature to prevent whole server getting oom-killed.

nixos :) select count() from (select number from numbers(1,400000000) intersect select number from numbers(1,400000000))

SELECT count()
FROM
(
    SELECT number
    FROM numbers(1, 400000000)
    INTERSECT
    SELECT number
    FROM numbers(1, 400000000)
)

Query id: 6e24ad4e-944a-41e6-9518-4909d2f258b2


Elapsed: 5.314 sec. Processed 268.57 million rows, 2.15 GB (50.54 million rows/s., 404.31 MB/s.)
Peak memory usage: 12.00 GiB.

Received exception from server (version 25.7.5):
Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: (total) memory limit exceeded: would use 12.38 GiB (attempt to allocate chunk of 8.00 GiB bytes), current RSS: 4.46 GiB, maximum: 9.90 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker: While executing IntersectOrExcept. (MEMORY_LIMIT_EXCEEDED)

In order to bypass the memory-tracker to interfere with our test, I just gave a fake perspective to clickhouse-server process that it has more memory event though it was still limited to 11 GB on cgroup-level by increasing the value of max_server_memory_usage_to_ram_ratio settings.

nixos :) select getServerSetting('max_server_memory_usage_to_ram_ratio')

SELECT getServerSetting('max_server_memory_usage_to_ram_ratio')

Query id: b49c8b8e-9f9b-49ba-9ab7-61f17270cc52

   ┌─getServerSet⋯ram_ratio')─┐
1. │                      1.5 │
   └──────────────────────────┘

With this clickhouse-server thinks it has 1.5 * 11 = 16.5 GB of memory and hence it should be sufficient to run the query.

nixos :) select count() from (select number from numbers(1,400000000) intersect select number from numbers(1,400000000))

SELECT count()
FROM
(
    SELECT number
    FROM numbers(1, 400000000)
    INTERSECT
    SELECT number
    FROM numbers(1, 400000000)
)
  
Query id: 619fbd7f-6a39-42ae-9501-329ab1c3e96d

   ┌───count()─┐
1. │ 400000000 │ -- 400.00 million
   └───────────┘

1 row in set. Elapsed: 22.639 sec. Processed 800.00 million rows, 6.40 GB (35.65 million rows/s., 285.21 MB/s.)
Peak memory usage: 12.00 GiB.

Surprisingly the query ran in same time as before.

In order to validate the swap usage, i ran the zramctl command in loop during the lifetime of the query. Swap usage went upto 700M (uncompressed) or 190M (compressed) hence gave 3.5x compression.

zramctl output
[pradeep@nixos:~]$ while true;do zramctl; sleep 1; done
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G   4K   64B   20K         [SWAP]
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G   4K   64B   20K         [SWAP]
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 46.1M 11.6M 13.1M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 260.3M 45.7M 48.1M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 475.7M 116.6M 122.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA  COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 695.2M 188.5M  198M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 241.2M 41.9M 47.2M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 237.8M 41.4M 47.2M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 234.1M 40.9M 46.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 232.9M 40.8M 46.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 231.2M 40.5M 46.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 229.8M 40.4M 46.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 226.3M 39.8M 46.9M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 217.1M   38M 46.5M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 215.6M 37.8M 46.5M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 214.8M 37.7M 46.4M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 214.1M 37.6M 46.4M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 213.4M 37.5M 46.4M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 212.7M 37.4M 46.3M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 209.9M   37M 46.3M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 207.8M 36.6M 46.3M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 206.2M 36.4M 46.2M         [SWAP]
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G  204M 36.1M 46.1M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 203.6M   36M 46.1M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 203.1M   36M 46.1M         [SWAP]
NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd          1.2G 202.4M 35.9M 46.1M         [SWAP]

During this test, I also modified vm.swappiness to 199 (range: 0-200) which instructs kernel to be aggressive in in swapping out memory pages.

[pradeep@nixos:~]$ sudo sysctl vm.swappiness
vm.swappiness = 199

Conclusion

We might have a misconception that swap is a workaround for emergency purpose during memory starvation situations but actually it makes the memory reclaimation efficient. If you understand your application properly, you can tune various swap parameters to get the best use of it.

© 2026 Pradeep Chhetri