skip to content
Pradeep Chhetri

Software Engineer. Writing about databases, infrastructure, and distributed systems.

Main navigation

  • Home
  • Blog
  • TIL
GitHub LinkedIn RSS

ClickHouse Keeper

April 3, 2022

Tags:
  • clickhouse
  • keeper

Introduction

ClickHouse recently replaced ZooKeeper with their own implementation ClickHouse-Keeper which can either run as part of ClickHouse server or as a standalone server. Unlike Zookeeper which uses ZAB as the coordination algorithm, ClickHouse-Keeper uses RAFT.

In this post, we will see if we can use the standalone version of ClickHouse-Keeper with other distributed systems.

Kafka

Kafka is a widely adopted distributed event streaming datastore which uses Zookeeper extensively. Let’s try to run a single-node Kafka backed by ClickHouse-Keeper.

docker-compose.yaml
services:
  clickhouse-keeper:
    image: clickhouse/clickhouse-keeper:head-alpine
    entrypoint:
      - /usr/bin/clickhouse-keeper 
    command:
      - --config-file=/etc/clickhouse-keeper/keeper_config.xml
    ports:
      - 9181:9181
    volumes:
      - ${PWD}/clickhouse-keeper/coordination:/var/lib/clickhouse/coordination
      - ${PWD}/clickhouse-keeper/keeper_config.xml:/etc/clickhouse-keeper/keeper_config.xml
  kafka:
    image: bitnami/kafka:3.1
    ports:
      - 9092:9092
    environment:
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_ZOOKEEPER_CONNECT=clickhouse-keeper:9181
    depends_on:
      - clickhouse-keeper
keeper_config.xml
  <clickhouse>
      <listen_host>0.0.0.0</listen_host>
      <logger>
          <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace

              [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
          -->
          <level>trace</level>
          <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
          <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
          <!-- Rotation policy
              See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
          <size>1000M</size>
          <count>10</count>
          <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
      </logger>

      <max_connections>4096</max_connections>

      <keeper_server>
          <tcp_port>9181</tcp_port>

          <!-- Must be unique among all keeper serves -->
          <server_id>1</server_id>

          <log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
          <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

          <coordination_settings>
              <operation_timeout_ms>10000</operation_timeout_ms>
              <min_session_timeout_ms>10000</min_session_timeout_ms>
              <session_timeout_ms>100000</session_timeout_ms>
              <raft_logs_level>information</raft_logs_level>
              <!-- All settings listed in https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/CoordinationSettings.h -->
          </coordination_settings>

          <raft_configuration>
              <server>
                  <id>1</id>

                  <!-- Internal port and hostname -->
                  <hostname>localhost</hostname>
                  <port>9234</port>
              </server>

              <!-- Add more servers here -->

          </raft_configuration>
      </keeper_server>


      <openSSL>
      <server>
              <!-- Used for secure tcp port -->
              <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
              <certificateFile>/etc/clickhouse-keeper/server.crt</certificateFile>
              <privateKeyFile>/etc/clickhouse-keeper/server.key</privateKeyFile>
              <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                  To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-keeper/dhparam.pem 4096
                  Only file format with BEGIN DH PARAMETERS is supported.
              -->
              <dhParamsFile>/etc/clickhouse-keeper/dhparam.pem</dhParamsFile>
              <verificationMode>none</verificationMode>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
          </server>
      </openSSL>

  </clickhouse>

Let’s try to do basic operations like creating topics, produce/consume messages, describe topic information.

$ kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foobar
Created topic foobar.
$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic foobar
>first
>second
>third
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic foobar --from-beginning
first
second
third
Processed a total of 3 messages

Let’s validate that statistics from ClickHouse-Keeper side

$ echo stat | nc localhost 9181
ClickHouse Keeper version: v22.4.1.1173-testing-7c3227adc5a716eef2b2f9dcdf4874b83ecf56da
Clients:
 127.0.0.1:45333(recved=0,sent=0)
 172.23.0.3:40540(recved=469,sent=476)

Latency min/avg/max: 0/1/7
Received: 469
Sent : 476
Connections: 1
Outstanding: 0
Zxid: 350
Mode: standalone
Node count: 142

We can see the zookeeper nodes count and number of sent and received packets and connection details.

Solr

Solr is a popular search engine written on top of Lucene. It’s major competitor is Elasticsearch. Let’s try to run a single-node Solr backed by ClickHouse-Keeper.

docker-compose.yaml
services:
  clickhouse-keeper:
    image: clickhouse/clickhouse-keeper:head-alpine
    entrypoint:
      - /usr/bin/clickhouse-keeper
    command:
      - --config-file=/etc/clickhouse-keeper/keeper_config.xml
    ports:
      - 2181:2181
    volumes:
      - ${PWD}/clickhouse-keeper/coordination:/var/lib/clickhouse/coordination
      - ${PWD}/clickhouse-keeper/keeper_config.xml:/etc/clickhouse-keeper/keeper_config.xml
  solr:
    image: solr:8.11.1
    ports:
      - 8983:8983
    environment:
      - ZK_HOST=clickhouse-keeper:2181
    depends_on:
      - clickhouse-keeper
keeper_config.xml
  <clickhouse>
      <listen_host>0.0.0.0</listen_host>
      <logger>
          <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace

              [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
          -->
          <level>trace</level>
          <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
          <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
          <!-- Rotation policy
              See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
          <size>1000M</size>
          <count>10</count>
          <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
      </logger>

      <max_connections>4096</max_connections>

      <keeper_server>
          <tcp_port>2181</tcp_port>

          <!-- Must be unique among all keeper serves -->
          <server_id>1</server_id>

          <log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
          <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

          <coordination_settings>
              <operation_timeout_ms>10000</operation_timeout_ms>
              <min_session_timeout_ms>10000</min_session_timeout_ms>
              <session_timeout_ms>100000</session_timeout_ms>
              <raft_logs_level>information</raft_logs_level>
              <!-- All settings listed in https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/CoordinationSettings.h -->
          </coordination_settings>

          <raft_configuration>
              <server>
                  <id>1</id>

                  <!-- Internal port and hostname -->
                  <hostname>localhost</hostname>
                  <port>9234</port>
              </server>

              <!-- Add more servers here -->

          </raft_configuration>
      </keeper_server>


      <openSSL>
      <server>
              <!-- Used for secure tcp port -->
              <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
              <certificateFile>/etc/clickhouse-keeper/server.crt</certificateFile>
              <privateKeyFile>/etc/clickhouse-keeper/server.key</privateKeyFile>
              <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                  To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-keeper/dhparam.pem 4096
                  Only file format with BEGIN DH PARAMETERS is supported.
              -->
              <dhParamsFile>/etc/clickhouse-keeper/dhparam.pem</dhParamsFile>
              <verificationMode>none</verificationMode>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
          </server>
      </openSSL>

  </clickhouse>

Let’s try to do basic operations like creating collection, indexing some data into the collection and then trying to read the indexed data.

$ ./bin/solr create_collection -c techproducts -V
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin/solr config -c techproducts -p 8983 -action set-user-property -property update.autoCreateFields -value false

Connecting to ZooKeeper at clickhouse-keeper:2181 ...
INFO  - 2022-04-03 11:12:15.425; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
WARN  - 2022-04-03 11:12:15.482; org.apache.zookeeper.ClientCnxnSocket; Connected to an old server; r-o mode will be unavailable
INFO  - 2022-04-03 11:12:15.494; org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO  - 2022-04-03 11:12:15.494; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO  - 2022-04-03 11:12:15.542; org.apache.solr.common.cloud.ZkStateReader; Updated live nodes from ZooKeeper... (0) -> (1)
INFO  - 2022-04-03 11:12:15.579; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at clickhouse-keeper:2181 ready
Uploading /opt/solr-8.11.1/server/solr/configsets/_default/conf for config techproducts to ZooKeeper at clickhouse-keeper:2181

Creating new collection 'techproducts' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=1&replicationFactor=1&maxShardsPerNode=-1&collection.configName=techproducts

{
  "responseHeader":{
    "status":0,
    "QTime":1100},
  "success":{"172.26.0.3:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":619},
      "core":"techproducts_shard1_replica_n1"}}}
$ ./bin/post -c techproducts example/exampledocs/books.csv
/usr/local/openjdk-11/bin/java -classpath /opt/solr-8.11.1/dist/solr-core-8.11.1.jar -Dauto=yes -Dc=techproducts -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/books.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:00.651
$ curl "http://localhost:8983/solr/techproducts/select?indent=on&q=*:*"
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*",
      "indent":"on"}},
  "response":{"numFound":10,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"0553573403",
        "cat":["book"],
        "name":["A Game of Thrones"],
        "price":[7.99],
        "inStock":[true],
        "author":["George R.R. Martin"],
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1729085445208276992},
....
....
....
      {
        "id":"080508049X",
        "cat":["book"],
        "name":["The Black Cauldron"],
        "price":[5.99],
        "inStock":[true],
        "author":["Lloyd Alexander"],
        "series_t":"The Chronicles of Prydain",
        "sequence_i":2,
        "genre_s":"fantasy",
        "_version_":1729085445227151360}]
  }}

Let’s validate that statistics from ClickHouse-Keeper side

$ echo "stat" | nc localhost 2181
ClickHouse Keeper version: v22.4.1.1173-testing-7c3227adc5a716eef2b2f9dcdf4874b83ecf56da
Clients:
 127.0.0.1:33215(recved=0,sent=0)
 172.26.0.3:54574(recved=3843,sent=3874)

Latency min/avg/max: 0/0/21
Received: 4027
Sent : 4057
Connections: 1
Outstanding: 0
Zxid: 443
Mode: standalone
Node count: 134

We can see the zookeeper nodes count and number of sent and received packets and connection details.

Mesos

Mesos was the first project which abstracted your complete datacenter as a single large pool of hardware (cpu, memory and disk) resources (yes you heard it right, even before kubernetes). Let’s try to run a three-node Mesos Masters cluster backed by ClickHouse-Keeper.

docker-compose.yaml
services:
  clickhouse-keeper:
    image: clickhouse/clickhouse-keeper:head-alpine
    entrypoint:
      - /usr/bin/clickhouse-keeper
    command:
      - --config-file=/etc/clickhouse-keeper/keeper_config.xml
    ports:
      - 2181:2181
    volumes:
      - ${PWD}/clickhouse-keeper/coordination:/var/lib/clickhouse/coordination
      - ${PWD}/clickhouse-keeper/keeper_config.xml:/etc/clickhouse-keeper/keeper_config.xml
  mesos-master-1:
    image: mesosphere/mesos-master:1.7.1
    environment:
      MESOS_ZK: 'zk://clickhouse-keeper:2181/mesos'
      MESOS_QUORUM: '1'
      MESOS_CLUSTER: 'test-mesos'
      MESOS_HOSTNAME: 'localhost'
      MESOS_LOG_DIR: '/var/log/mesos/master'
    depends_on:
      - clickhouse-keeper
  mesos-master-2:
    image: mesosphere/mesos-master:1.7.1
    environment:
      MESOS_ZK: 'zk://clickhouse-keeper:2181/mesos'
      MESOS_QUORUM: '1'
      MESOS_CLUSTER: 'test-mesos'
      MESOS_HOSTNAME: 'localhost'
      MESOS_LOG_DIR: '/var/log/mesos/master'
    depends_on:
      - clickhouse-keeper
  mesos-master-3:
    image: mesosphere/mesos-master:1.7.1
    environment:
      MESOS_ZK: 'zk://clickhouse-keeper:2181/mesos'
      MESOS_QUORUM: '1'
      MESOS_CLUSTER: 'test-mesos'
      MESOS_HOSTNAME: 'localhost'
      MESOS_LOG_DIR: '/var/log/mesos/master'
    depends_on:
      - clickhouse-keeper
keeper_config.xml
  <clickhouse>
      <listen_host>0.0.0.0</listen_host>
      <logger>
          <!-- Possible levels [1]:

          - none (turns off logging)
          - fatal
          - critical
          - error
          - warning
          - notice
          - information
          - debug
          - trace

              [1]: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105-L114
          -->
          <level>trace</level>
          <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
          <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
          <!-- Rotation policy
              See https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/FileChannel.h#L54-L85
          -->
          <size>1000M</size>
          <count>10</count>
          <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
      </logger>

      <max_connections>4096</max_connections>

      <keeper_server>
          <tcp_port>2181</tcp_port>

          <!-- Must be unique among all keeper serves -->
          <server_id>1</server_id>

          <log_storage_path>/var/lib/clickhouse/coordination/logs</log_storage_path>
          <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

          <coordination_settings>
              <operation_timeout_ms>10000</operation_timeout_ms>
              <min_session_timeout_ms>10000</min_session_timeout_ms>
              <session_timeout_ms>100000</session_timeout_ms>
              <raft_logs_level>information</raft_logs_level>
              <!-- All settings listed in https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/CoordinationSettings.h -->
          </coordination_settings>

          <raft_configuration>
              <server>
                  <id>1</id>

                  <!-- Internal port and hostname -->
                  <hostname>localhost</hostname>
                  <port>9234</port>
              </server>

              <!-- Add more servers here -->

          </raft_configuration>
      </keeper_server>


      <openSSL>
      <server>
              <!-- Used for secure tcp port -->
              <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
              <certificateFile>/etc/clickhouse-keeper/server.crt</certificateFile>
              <privateKeyFile>/etc/clickhouse-keeper/server.key</privateKeyFile>
              <!-- dhparams are optional. You can delete the <dhParamsFile> element.
                  To generate dhparams, use the following command:
                  openssl dhparam -out /etc/clickhouse-keeper/dhparam.pem 4096
                  Only file format with BEGIN DH PARAMETERS is supported.
              -->
              <dhParamsFile>/etc/clickhouse-keeper/dhparam.pem</dhParamsFile>
              <verificationMode>none</verificationMode>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
          </server>
      </openSSL>

  </clickhouse>

Mesos uses zookeeper for running highly-available setup of mesos-master. Hence we are starting 3 mesos-masters. Let’s validate who is the current leader

$ curl localhost:5050/master/state | jq ''
....
....
  "leader": "master@172.18.0.3:5050",

Let’s kill the container with that IP address and see if leader election happens properly and another mesos-master becomes the new leader.

$ docker stop mesos-mesos-master-2-1
mesos-mesos-master-2-1
I0417 06:45:46.542330    13 detector.cpp:152] Detected a new leader: (id='1')
I0417 06:45:46.542580    13 group.cpp:700] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I0417 06:45:46.544179    13 zookeeper.cpp:262] A new leading master (UPID=master@172.18.0.4:5050) is detected
I0417 06:45:46.544409    13 master.cpp:2107] Elected as the leading master!
I0417 06:45:46.544483    13 master.cpp:1662] Recovering from registrar
I0417 06:45:46.545598     9 registrar.cpp:383] Successfully fetched the registry (0B) in 883968ns
I0417 06:45:46.545730     9 registrar.cpp:487] Applied 1 operations in 25020ns; attempting to update the registry
I0417 06:45:46.546718    15 registrar.cpp:544] Successfully updated the registry in 877824ns
I0417 06:45:46.546821    15 registrar.cpp:416] Successfully recovered registrar
I0417 06:45:46.547056    15 master.cpp:1776] Recovered 0 agents from the registry (124B); allowing 10mins for agents to reregister
$ curl localhost:5050/master/state | jq ''
....
....
  "leader": "master@172.18.0.4:5050",

As we can see, mesos-master failover happened successfully.

Let’s validate that statistics from ClickHouse-Keeper side.

$ echo stat | nc localhost 2181
ClickHouse Keeper version: v22.4.1.1173-testing-7c3227adc5a716eef2b2f9dcdf4874b83ecf56da
Clients:
 172.18.0.5:44286(recved=1096,sent=1096)
 172.18.0.4:47924(recved=1097,sent=1098)
 172.18.0.5:44288(recved=1096,sent=1096)
 172.18.0.3:50138(recved=1099,sent=1102)
 172.18.0.4:47922(recved=1098,sent=1100)
 127.0.0.1:34779(recved=0,sent=0)
 172.18.0.3:50140(recved=1099,sent=1101)

Latency min/avg/max: 0/0/2
Received: 6585
Sent : 6593
Connections: 6
Outstanding: 0
Zxid: 6569
Mode: standalone
Node count: 5

We can see the zookeeper nodes count and number of sent and received packets and connection details.

Conclusion

Overall, I am pretty impressed with ClickHouse Keeper and how it just works with other distributed systems. It will be interesting to see if replacing Zookeeper with ClickHouse Keeper will help in improving performance of large scale deployment of such systems.

© 2026 Pradeep Chhetri