ClickHouse at Scale: Deploying OLAP Workloads on Kubernetes Without Breaking the Bank

2026-03-02

A hands-on 2026 guide to run ClickHouse on Kubernetes: storage, sharding, schema, resource tuning, autoscaling, and cost controls for startup scale.

Hook: When fast analytics meets unpredictable growth

Nothing breaks developer velocity faster than a production analytics cluster that can’t keep up. Startups pushing product-market fit see spikes in ingestion and ad-hoc queries overnight — and that’s where ClickHouse shines. But deploying ClickHouse on Kubernetes poorly can become a cost and operational nightmare: noisy neighbors, I/O bottlenecks, and slow merges that blow out cloud bills.

This guide captures a pragmatic, step-by-step path to run ClickHouse at scale on Kubernetes in 2026: storage design, sharding and replication, schema and query patterns, resource requests and autoscaling, and cost controls that let a startup scale adoption without breaking the bank.

Why ClickHouse on Kubernetes in 2026?

ClickHouse has exploded in adoption — driven by modern analytics patterns and a wave of investment and product maturity. As of early 2026, ClickHouse Inc. had raised a large financing round, underscoring broader enterprise and cloud adoption across OLAP workloads. Running ClickHouse on Kubernetes gives teams portability, GitOps-style lifecycle management, and closer integration with Kafka, object storage, and CI/CD. The trade-off is operational complexity: stateful workloads, disk performance, and careful cluster topology planning.

High-level architecture pattern

Before diving into YAML and SQL, adopt a proven separation of concerns:

  • Ingestion layer — stateless frontends or Kafka handlers that buffer writes.
  • Storage/compute layer — ClickHouse pods as StatefulSets or operator-managed stateful clusters handling MergeTree storage.
  • Coordination — ClickHouse Keeper (or ZooKeeper) for cluster metadata and replication coordination.
  • Cold storage — S3-compatible object stores for tiered/archived data via ClickHouse’s disk policies.
  • Scaling control — Node pool autoscaling + K8s autoscalers (KEDA/HPA/CA) and resource requests tuned for predictable performance.

1) Storage: the foundation of ClickHouse performance

OLAP workloads are I/O heavy. Mis-provisioned disks or noisy neighbors are the top cause of degraded query and ingestion performance. Design storage with these priorities: low latency, consistent IOPS, and cost-tiering using object storage when appropriate.

Local NVMe vs network-attached volumes

  • Local NVMe (recommended for hot shards): best throughput/latency, predictable merges, ideal for primary shards and heavy write traffic. Use local PersistentVolumes (local PV) or local SSD instance storage with careful PodDisruptionBudgets and node affinity.
  • Networked SSD (EBS/GCE PD): OK for moderate loads. Provision with provisioned IOPS on heavy OLAP workloads to avoid burst throttling.
  • Object storage (S3/MinIO): Use as warm/cold tier for infrequently queried data or for replica backups. In 2026, ClickHouse’s native S3-backed disk policies are standard for cost savings at scale.

Example storageClass for local NVMe (Kubernetes)

# Example: local-path or local-volume StorageClass skeleton
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

Disk policies and tiering (ClickHouse config)

Configure disks and policies in the clickhouse-server config.xml so local NVMe is the default and S3 serves as cold storage. This lets you offload historic partitions to S3 with TTL rules.

<storage_configuration>
  <disks>
    <fast_local>
      <path>/var/lib/clickhouse/fast/</path>
    </fast_local>
    <s3_cold>
      <type>s3</type>
      <endpoint>https://s3.company.example/clickhouse-cold/</endpoint>
      <!-- credentials via access_key_id/secret_access_key, or instance IAM -->
    </s3_cold>
  </disks>
  <policies>
    <cost_optimized>
      <volumes>
        <hot>
          <disk>fast_local</disk>
        </hot>
        <cold>
          <disk>s3_cold</disk>
        </cold>
      </volumes>
    </cost_optimized>
  </policies>
</storage_configuration>

2) Sharding and replication strategy

The core of scaling ClickHouse is how you shard and replicate data. Shard for parallelism; replicate for availability and read scale. Keep sharding aligned with query patterns.

Sharding rules

  • Shard by a high-cardinality dimension that partitions traffic evenly (e.g., customer_id hashed into N buckets).
  • Keep time partitioning (date) in your MergeTree PARTITION BY to make TTLs and drops efficient.
  • Design shards to fit node capacity: aim for 500GB–2TB per shard on fast NVMe depending on compute resources and query mix.
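A candidate sharding key can be sanity-checked for balance before committing to a layout. A minimal sketch, assuming the events table from the distributed table example below and a hypothetical shard count of 4:

```sql
-- Check how evenly a hashed key distributes rows across buckets
-- (4 is a placeholder for your planned shard count)
SELECT
  cityHash64(customer_id) % 4 AS shard_bucket,
  count() AS row_count
FROM default.events
GROUP BY shard_bucket
ORDER BY shard_bucket;
```

A heavily skewed distribution here means one shard will run hot; pick a different key or add a salt before going to production.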

Replication and cluster topology

  • Use at least 2 replicas for production-critical data; 3 if you need higher availability during maintenance.
  • Distribute replicas across failure domains (node pools / AZs) with anti-affinity rules in Kubernetes.
  • Use ClickHouse Keeper for lightweight coordination (or ZooKeeper if your environment standardizes on it).
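These topology choices map directly to server config. A minimal sketch, assuming a 2-shard, 2-replica layout and a 3-node Keeper ensemble (all hostnames are placeholders):

```xml
<!-- remote_servers defines shards and replicas; the <zookeeper>
     section points at ClickHouse Keeper (default client port 9181) -->
<remote_servers>
  <my_cluster>
    <shard>
      <replica><host>ch-0-0</host><port>9000</port></replica>
      <replica><host>ch-0-1</host><port>9000</port></replica>
    </shard>
    <shard>
      <replica><host>ch-1-0</host><port>9000</port></replica>
      <replica><host>ch-1-1</host><port>9000</port></replica>
    </shard>
  </my_cluster>
</remote_servers>
<zookeeper>
  <node><host>keeper-0</host><port>9181</port></node>
  <node><host>keeper-1</host><port>9181</port></node>
  <node><host>keeper-2</host><port>9181</port></node>
</zookeeper>
```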

Distributed table example

-- ReplicatedMergeTree keeps each shard's replicas in sync via Keeper
CREATE TABLE default.events_local ON CLUSTER my_cluster
(
  event_date Date,
  customer_id UInt64,
  event_type String,
  payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (customer_id, event_date);

-- The Distributed table fans reads and writes out across shards
CREATE TABLE default.events ON CLUSTER my_cluster
AS default.events_local
ENGINE = Distributed(my_cluster, default, events_local, cityHash64(customer_id));

3) Schema design and ingestion patterns for cost and performance

Schema design in ClickHouse is an OLAP art: ORDER BY, partitioning, compression, and projections all affect both performance and storage cost.

ORDER BY vs PRIMARY KEY

  • ORDER BY determines how data is sorted on disk and what ranges can be pruned during queries. Put high-selectivity columns first (customer_id before timestamp if queries filter by customer).
  • Don't use too many columns in ORDER BY — keep it narrow to reduce write amplification on merges.
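To verify that a given ORDER BY actually prunes reads, ClickHouse's EXPLAIN with indexes = 1 reports which partitions and primary-key ranges a query touches. A sketch against the events_local table from the distributed table example:

```sql
-- indexes = 1 shows partition and primary-key pruning per query
EXPLAIN indexes = 1
SELECT count()
FROM default.events_local
WHERE customer_id = 42
  AND event_date >= '2026-01-01';
```

If the output shows most parts selected despite a selective filter, the sort key does not match the query shape.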

Use partitions for retention and TTLs

Partition by month (or day for extreme ingestion) so TTL deletions remove whole partitions rather than deleting rows one-by-one, which is costly.

ALTER TABLE default.events_local
MODIFY SETTING storage_policy = 'cost_optimized';

ALTER TABLE default.events_local
MODIFY TTL event_date + INTERVAL 90 DAY;

Ingestion best practices

  • Buffering: Use the Kafka engine plus materialized views, or the Buffer engine, to smooth spikes. In 2026, Kinesis-to-ClickHouse patterns still work, but Kafka remains the most proven at scale.
  • Batch writes: Send larger batches (tens of KB–MB) to reduce CPU and network overhead on ClickHouse.
  • Schema on write: Validate and normalize upstream to avoid heavy query-time transforms.
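The Kafka + materialized view pattern from the first bullet can be sketched as follows (broker, topic, and consumer group names are illustrative; the target is the events_local table from the sharding section):

```sql
-- Kafka engine table: a consumer, not a storage table
CREATE TABLE default.events_queue
(
  event_date Date,
  customer_id UInt64,
  event_type String,
  payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'my-kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-ingest',
         kafka_format = 'JSONEachRow';

-- The materialized view drains the queue into MergeTree storage
CREATE MATERIALIZED VIEW default.events_queue_mv
TO default.events_local
AS SELECT event_date, customer_id, event_type, payload
FROM default.events_queue;
```

ClickHouse consumes in blocks, which gives you the batching behavior recommended above without extra application code.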

4) Resource requests, limits, and QoS

Kubernetes resource requests directly influence scheduler decisions and node autoscaling costs. For stateful ClickHouse pods, be deterministic: reserve CPU and memory to avoid noisy neighbor interference.

Baseline resource recommendations (starting point)

  • Small production replica (ingest+query): 8–16 vCPU, 32–64 GiB RAM, local NVMe.
  • Heavy query node: 32+ vCPU, 128+ GiB RAM, 4–8 TB NVMe depending on concurrency.
  • Coordination (ClickHouse Keeper) nodes: 2–3 vCPU, 4–8 GiB RAM.

Kubernetes manifest snippet (statefulset resource requests)

resources:
  requests:
    cpu: "8"
    memory: "32Gi"
  limits:
    cpu: "12"
    memory: "48Gi"

Set requests so the scheduler places pods on appropriately sized nodes. Keep limits somewhat above requests to allow bursts without the unbounded behavior that could starve other pods.

PodDisruptionBudgets and anti-affinity

Ensure availability during node upgrades by using PodDisruptionBudgets and hard anti-affinity so replicas don’t colocate.

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - clickhouse
    topologyKey: kubernetes.io/hostname

5) Autoscaling strategy — stay scalable and cost-efficient

Stateful OLAP systems don’t autoscale like stateless web apps. Your strategy should combine separation of roles, event-based scaling for ingestion, and node autoscaling for capacity.

What to autoscale

  • Stateless ingestion frontends: Autoscale horizontally with HPA/KEDA using Kafka lag or HTTP QPS as metrics.
  • ClickHouse compute: Scale vertically (bigger nodes) or add/remove shards in planned steps — automatic horizontal scaling of stateful replicas is complex and error-prone.
  • Node pools: Use cluster autoscaler to add nodes when PVCs bind or when scheduled pods request more resources.

KEDA example for ingestion workers

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: clickhouse-ingest-scaler
spec:
  scaleTargetRef:
    name: clickhouse-ingest-deployment
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: my-kafka:9092
      topic: events
      consumerGroup: clickhouse-ingest
      lagThreshold: '5000'

Planned shard scaling

When you need more shards, follow a documented plan: add new ClickHouse instances, create new shards, backfill small time windows, and rebalance metadata. Automate with the ClickHouse operator or custom GitOps flows, but avoid ad-hoc re-sharding under load.
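A backfill step in that plan might look like the following sketch, assuming a new Distributed table over the expanded layout (events_new is a hypothetical name); copying one bounded window at a time keeps merge pressure manageable:

```sql
-- Copy one small time window into the new shard layout,
-- then verify counts before moving to the next window
INSERT INTO default.events_new
SELECT *
FROM default.events
WHERE event_date BETWEEN '2026-01-01' AND '2026-01-07';
```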

6) Query optimization and cost control

Optimize queries and data lifecycle to reduce CPU and I/O costs. In 2026, projections, materialized views, and read-time sampling remain essential tools.

Use materialized views for pre-aggregation

-- Attach the view to the local table so it fires on every insert;
-- event_date is a Date, so we aggregate per day
CREATE MATERIALIZED VIEW default.events_by_customer
TO default.events_agg
AS
SELECT
  customer_id,
  event_date AS day,
  count() AS cnt
FROM default.events_local
GROUP BY customer_id, day;

Projections and skipping indices

  • Projections: store pre-sorted sub-tables for common query shapes and dramatically reduce CPU for aggregations.
  • Skipping indices: use minmax or bloom_filter indices for high-selectivity columns to skip parts during reads.
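Both techniques are plain ALTERs. A sketch against events_local (the projection shape and index granularity here are illustrative, not tuned values):

```sql
-- Projection: pre-aggregated by event_type for common rollups
ALTER TABLE default.events_local
  ADD PROJECTION by_type (SELECT event_type, count() GROUP BY event_type);
ALTER TABLE default.events_local MATERIALIZE PROJECTION by_type;

-- Bloom filter index: skip parts when filtering on event_type
ALTER TABLE default.events_local
  ADD INDEX idx_event_type event_type TYPE bloom_filter GRANULARITY 4;
ALTER TABLE default.events_local MATERIALIZE INDEX idx_event_type;
```

MATERIALIZE rebuilds existing parts in the background; new inserts pick up the projection and index automatically.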

Limit scans with ORDER BY and LIMIT

For dashboard queries, use LIMIT + ORDER BY tuned with appropriate ORDER keys to avoid full-table scans.
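For example, a dashboard query that filters on the leading sort-key column reads only a handful of granules instead of the whole table:

```sql
-- customer_id leads the sort key, so only that customer's
-- ranges are scanned; LIMIT caps the result set
SELECT event_type, event_date
FROM default.events
WHERE customer_id = 42
ORDER BY event_date DESC
LIMIT 100;
```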

7) Operational patterns and reliability

Successful long-term operations come down to observability, safe upgrades, and disaster recovery.

Monitoring and alerting

  • Collect ClickHouse metrics (queries, merges, parts, merges_queue_size) via Prometheus exporters.
  • Alert on sustained high merge rates, long-running merges, disk pressure, and replication lag.
  • Track Kafka lag for ingestion to detect upstream backpressure.
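One illustrative alerting sketch follows; the metric name below is an assumption and varies by exporter, so check what yours actually exposes:

```yaml
# Illustrative Prometheus rule; the replication-delay metric name
# depends on which ClickHouse exporter you run
groups:
  - name: clickhouse
    rules:
      - alert: ClickHouseReplicationLag
        expr: ClickHouseAsyncMetrics_ReplicasMaxAbsoluteDelay > 300
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse replica more than 5 minutes behind"
```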

Backups and DR

  • Configure periodic backups to S3 of table parts and metadata.
  • For cross-cluster DR, use replicated tables with replicas in a secondary region, or snapshot exports to S3 to rehydrate clusters.
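Recent ClickHouse releases ship a native BACKUP statement that can target S3 directly. A sketch with placeholder endpoint and credentials (not real values; credentials may also come from server-side S3 settings or IAM):

```sql
-- Native backup of one table to S3; restore with the mirror
-- RESTORE TABLE ... FROM S3(...) statement
BACKUP TABLE default.events_local
TO S3('https://s3.company.example/clickhouse-backups/events/', '<key>', '<secret>');
```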

Upgrades

Prefer rolling upgrades of the ClickHouse operator and server, upgrade replicas one at a time, and monitor merges during upgrades. In 2026, operators support zero-downtime rolling upgrades when used with proper PDBs.

8) Cost controls and capacity planning

At scale, storage and CPU are the biggest costs. Use these levers:

  • Tier old data to S3 and apply aggressive TTLs for ephemeral datasets.
  • Use compression codecs (LZ4 for hot data, higher-ratio ZSTD for cold) — ClickHouse supports per-column codec selection.
  • Right-size nodes and choose instance types with local NVMe for heavy workloads; avoid oversized memory or CPU that sits idle most of the time.
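Per-column codec selection from the second bullet looks like this in DDL (column and level choices here are illustrative starting points, not benchmarked values):

```sql
-- LZ4 for hot, frequently filtered columns; higher-level ZSTD for
-- bulky, rarely read payloads; Delta helps monotonic date columns
CREATE TABLE default.events_compressed
(
  event_date Date CODEC(Delta, ZSTD(3)),
  customer_id UInt64 CODEC(LZ4),
  event_type String CODEC(ZSTD(3)),
  payload String CODEC(ZSTD(6))
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (customer_id, event_date);
```

Compare compressed sizes per codec with system.columns before standardizing on one scheme.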

9) Example GitOps workflow with ClickHouse Operator

Using an operator reduces error-prone manual steps. Below is a simplified ClickHouseInstallation CR example (conceptual) to manage a 3-shard, 2-replica cluster.

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: analytics
spec:
  configuration:
    zookeeper:
      nodes:
        - host: zk-0
        - host: zk-1
        - host: zk-2
    clusters:
      - name: cluster-1
        templates:
          podTemplate: clickhouse-pod
        layout:
          shardsCount: 3
          replicasCount: 2
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              resources:
                requests:
                  cpu: "8"
                  memory: "32Gi"

Use GitOps (ArgoCD/Flux) to manage this CR. That gives you auditable, automated cluster changes and predictable scaling steps when you add shards or change resources.

10) Case study: a startup that scaled from 3 nodes to 30 nodes without runaway costs

A B2B analytics startup adopted the pattern above in late 2024–2025. They began with a 3-node ClickHouse cluster on Kubernetes. After product-market fit in 2025, traffic increased 12x in three months. Key actions that prevented cost overruns:

  • Moved hot partitions to local NVMe and tiered 18-month history to S3.
  • Introduced pre-aggregations (materialized views) for dashboard queries which cut CPU by 60%.
  • Used KEDA to autoscale ingestion services and cluster autoscaler for node pools; they never autoscaled ClickHouse pods directly.
  • Implemented aggressive TTLs for test and ephemeral datasets, reducing storage footprint by 40%.

The result: they scaled to 30 nodes and handled 10k QPS at peak while keeping monthly costs within forecasted budgets.

Look out for these patterns in 2026:

  • Managed hybrid approaches: Many organizations will use a hybrid model — managed ClickHouse for base needs and self-hosted clusters for cost-sensitive, high-throughput workloads.
  • Smarter tiering: Native tiering between NVMe and object storage will become the default cost-control method for large OLAP datasets.
  • Operator maturity: Operators will add more safe re-sharding automation and better multi-cluster replication tools.
  • Query acceleration: Wider adoption of projections and built-in approximate aggregation functions to reduce cost per query.

"ClickHouse’s 2025 product and funding momentum makes it a first-class option for startups and enterprises building real-time analytics at scale in 2026." — industry roundup, early 2026

Checklist: Deploy ClickHouse on K8s without breaking the bank

  1. Choose local NVMe for hot data; configure an S3 policy for cold data.
  2. Shard by a hashed high-cardinality key; partition by time.
  3. Use 2–3 replicas across failure domains with anti-affinity and PDBs.
  4. Reserve CPU/memory via requests; don’t rely on limits alone.
  5. Separate ingestion (scale horizontally) from ClickHouse (scale planned shards).
  6. Implement KEDA/HPA for ingesters; use cluster autoscaler for node pools.
  7. Materialize common aggregates and use projections to reduce query costs.
  8. Monitor merge activity, replication lag, and disk pressure proactively.
  9. Automate with a ClickHouse operator and GitOps for reproducible rollouts.

Actionable next steps (30/60/90 day plan)

30 days: Set up a small K8s ClickHouse cluster with local NVMe, enable Prometheus metrics, and move one hot table over with a materialized view.

60 days: Add Kafka-based ingestion with KEDA autoscaling, set up S3 policies for cold storage, and create retention TTLs for 90-day archival.

90 days: Run a planned scaling test — add a shard, validate redistribution, and measure cost/throughput. Create runbooks for upgrades and DR.

Closing thoughts and call-to-action

ClickHouse on Kubernetes is powerful — but only when you treat storage, sharding, and autoscaling as architectural levers instead of knobs. With careful topology planning, disciplined schema design, and automated operational practices, startups can grow from prototypes to production at scale while keeping costs under control.

Ready to adopt ClickHouse on Kubernetes with a production-grade blueprint? Download our free repo with operator manifests, sample clickhouse configs, and a 90-day runbook — or book a technical review with our platform engineers to map a cost-optimized migration plan for your workloads.

