Hybrid Transactional and Analytical Processing (HTAP) with TiFlash
- Objective: Learn to deploy TiFlash in a TiDB cluster on AWS (with Kubernetes)
- Prerequisites:
  - Background knowledge of TiDB components
  - Background knowledge of Kubernetes and TiDB Operator
  - Background knowledge of TiFlash
- Optionality: Required
- Estimated time: 30 mins
TiFlash is the key component that makes the TiDB platform an effective Hybrid Transactional and Analytical Processing (HTAP) database. As a columnar storage extension of TiKV, TiFlash provides both isolation and strong consistency guarantees.
In TiFlash, the columnar replicas are asynchronously replicated according to the Raft consensus algorithm. When these replicas are read, the Snapshot Isolation level of consistency is achieved by validating Raft index and Multi-Version Concurrency Control (MVCC).
For more information on TiFlash, refer to TiFlash Overview.
Prepare
This document assumes that you have a TiDB cluster deployed in Kubernetes and data available in the TiDB cluster.
To deploy a TiDB cluster in AWS EKS, you can follow the instructions in Deploy a TiDB Cluster.
To generate data with sysbench, you can follow the instructions in Run Sysbench. Alternatively, you can create your own table and ingest data.
Provision TiFlash Nodes
Before deploying TiFlash, you need to provision TiFlash nodes. To do that, make the following changes in terraform.tfvars:
create_tiflash_node_pool = true
cluster_tiflash_count = 1
cluster_tiflash_instance_type = "i3.4xlarge"
To apply the changes, run:
terraform apply
It might take 10 minutes or more to finish the process.
Deploy TiFlash
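To deploy TiFlash, edit the TidbCluster CR, replacing the placeholders with your own cluster name and namespace:
kubectl edit tidbcluster <cluster-name> -n <namespace>
In the editor, add the TiFlash specification: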
spec:
  tiflash:
    baseImage: pingcap/tiflash
    maxFailoverCount: 3
    replicas: 1
    storageClaims:
    - resources:
        requests:
          storage: 100Gi
      storageClassName: ebs-gp2
Once you have saved the changes, TiDB Operator starts to deploy TiFlash. You can use the following command, with your own namespace, to observe the status of the TiFlash pod:
kubectl get pods -n <namespace>
NAME                              READY   STATUS    RESTARTS   AGE
basic-discovery-6cd9cc794-vn7l6   1/1     Running   0          91m
basic-pd-0                        1/1     Running   0          91m
basic-tidb-0                      2/2     Running   0          89m
basic-tiflash-0                   5/5     Running   0          13m
basic-tikv-0                      1/1     Running   0          90m
basic-tikv-1                      1/1     Running   0          90m
basic-tikv-2                      1/1     Running   0          90m
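If you would rather script this check than watch the pod list, the following is a minimal sketch. The function names tiflash_pods_ready and wait_for_tiflash_pods are hypothetical helpers, not part of kubectl or TiDB Operator, and the sketch assumes that TiFlash pods are named with a -tiflash-<n> suffix, as in the listing above.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and succeeds
# only if at least one pod named *-tiflash-<n> is listed and every such pod
# is Running with all of its containers ready (for example 5/5).
tiflash_pods_ready() {
  awk '
    $1 ~ /-tiflash-[0-9]+$/ {
      found = 1
      split($2, ready, "/")   # READY column, e.g. "5/5"
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { if (!found || bad) exit 1 }
  '
}

# Hypothetical wrapper: polls every 10 seconds until the check passes.
# $1 is the namespace of your TiDB cluster (an assumption; use your own).
wait_for_tiflash_pods() {
  until kubectl get pods -n "$1" | tiflash_pods_ready; do
    sleep 10
  done
}
```

Defining the functions has no side effects. Nothing contacts the cluster until you call wait_for_tiflash_pods with your namespace.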
Add TiFlash Replica
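After TiFlash is deployed, data replication does not begin automatically. You need to manually specify the tables to be replicated. For example, to create one TiFlash replica of the sbtest1 table:
ALTER TABLE sbtest.sbtest1 SET TIFLASH REPLICA 1;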
You can check the status of the TiFlash replicas of a specific table using the following statement:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'sbtest' and TABLE_NAME = 'sbtest1';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| sbtest       | sbtest1    |       89 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.01 sec)
In the result of the above statement:
- The AVAILABLE column indicates whether the TiFlash replicas of this table are available for queries. 1 means available and 0 means unavailable. If you use DDL statements to modify the number of replicas, the replication status will be recalculated.
- The PROGRESS column indicates the progress of the replication. The value is between 0.0 and 1.0. 1.0 means at least one replica is available.
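Replication can take a while on large tables, so you might want to poll the AVAILABLE column instead of rerunning the query by hand. The following is a minimal sketch. The function names replicas_available and wait_for_tiflash_replica are hypothetical, and the sketch assumes that $tidb_host is set and that the root user can connect on port 4000 without a password.

```shell
# Hypothetical helper: reads tab-separated rows of
# "schema<TAB>table<TAB>available<TAB>progress" on stdin (the shape that
# `mysql -BNe` prints) and succeeds only if there is at least one row and
# every row has AVAILABLE = 1.
replicas_available() {
  awk -F '\t' '
    { rows++; if ($3 != 1) pending = 1 }
    END { if (rows == 0 || pending) exit 1 }
  '
}

# Hypothetical wrapper: polls one table every 5 seconds until its TiFlash
# replica is available. $1 is the schema name, $2 is the table name.
wait_for_tiflash_replica() {
  schema=$1
  table=$2
  until mysql -h "$tidb_host" -P 4000 -u root -BNe \
      "SELECT table_schema, table_name, available, progress
       FROM information_schema.tiflash_replica
       WHERE table_schema = '$schema' AND table_name = '$table'" \
      | replicas_available; do
    sleep 5
  done
}
```

For the table used in this document, you would call wait_for_tiflash_replica sbtest sbtest1. An empty result (no replica configured for the table) counts as not ready, so the loop keeps waiting in that case.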
Query with TiFlash
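For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on cost estimation. You can use EXPLAIN ANALYZE to check whether a query reads from TiFlash. For example, for a count query on sbtest1 such as EXPLAIN ANALYZE SELECT COUNT(*) FROM sbtest.sbtest1; the cop[tiflash] tasks in the following output show that TiFlash is used: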
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
| StreamAgg_24 | 1.00 | 1 | root | | time:18.113388ms, loops:2 | funcs:count(Column#10)->Column#5 | 372 Bytes | N/A |
| └─TableReader_25 | 1.00 | 2 | root | | time:18.102075ms, loops:2, rpc num: 2, rpc max:14.679672ms, min:14.493199ms, avg:14.586435ms, p80:14.679672ms, p95:14.679672ms, proc keys max:0, p95:0 | data:StreamAgg_8 | 206 Bytes | N/A |
| └─StreamAgg_8 | 1.00 | 2 | cop[tiflash] | | proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2 | funcs:count(1)->Column#10 | N/A | N/A |
| └─TableFullScan_22 | 10000.00 | 1000 | cop[tiflash] | table:sbtest1 | proc max:0s, min:0s, p80:0s, p95:0s, iters:1, tasks:2 | keep order:false, stats:pseudo | N/A | N/A |
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
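To check a plan like the one above from a script, you can search the task column for the tiflash store. The following is a minimal sketch. The function names plan_uses_tiflash and report_engine are hypothetical, and the sketch assumes that $tidb_host is set and that the root user can connect on port 4000 without a password.

```shell
# Hypothetical helper: reads EXPLAIN or EXPLAIN ANALYZE output on stdin and
# succeeds if any operator's task names the tiflash store, such as
# cop[tiflash].
plan_uses_tiflash() {
  grep -qF '[tiflash]'
}

# Hypothetical wrapper: runs EXPLAIN on the query passed as $1 and reports
# whether the plan contains a TiFlash task.
report_engine() {
  if mysql -h "$tidb_host" -P 4000 -u root -e "EXPLAIN $1" | plan_uses_tiflash; then
    echo "plan reads from TiFlash"
  else
    echo "no TiFlash task in plan"
  fi
}
```

The match is a fixed string, so the brackets are not interpreted as a pattern. A plan that only contains cop[tikv] tasks does not match.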
You can find more detailed usage information about TiFlash in Use TiFlash.
Cleanup TiFlash
Remove TiFlash replicas from all tables
You can use this statement to identify all tables that are currently configured to have TiFlash replicas:
SELECT * FROM information_schema.tiflash_replica;
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| sbtest       | sbtest1    |       47 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.00 sec)
This command generates ALTER TABLE statements that you can execute to remove TiFlash replicas from all tables:
mysql -h "$tidb_host" -P 4000 -u root -BNe \
  'select concat("ALTER TABLE `", table_schema, "`.`", table_name, "` SET TIFLASH REPLICA 0;") from information_schema.tiflash_replica'
ALTER TABLE `sbtest`.`sbtest1` SET TIFLASH REPLICA 0;
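If you want, you can pipe that output back to the MySQL client to execute all the statements:
mysql -h "$tidb_host" -P 4000 -u root -BNe \
  'select concat("ALTER TABLE `", table_schema, "`.`", table_name, "` SET TIFLASH REPLICA 0;") from information_schema.tiflash_replica' \
  | mysql -h "$tidb_host" -P 4000 -u root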
Set replicas to 0 in the tiflash section of the TidbCluster spec structure:
  tiflash:
    baseImage: pingcap/tiflash
    maxFailoverCount: 3
    replicas: 0
    storageClaims:
    - resources:
        requests:
          storage: 100Gi
      storageClassName: ebs-gp2
Wait until kubectl get sts -n poc shows 0/0 for the -tiflash StatefulSet:
NAME                 READY   AGE
my-cluster-pd        3/3     4h8m
my-cluster-tidb      2/2     4h6m
my-cluster-tiflash   0/0     15m
my-cluster-tikv      3/3     4h7m
Delete the my-cluster-tiflash StatefulSet:
kubectl delete sts my-cluster-tiflash -n poc
statefulset.apps "my-cluster-tiflash" deleted
Set create_tiflash_node_pool = false in your Terraform configuration and run terraform apply to remove the unneeded resources.
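The wait for 0/0 in the cleanup steps above can also be scripted, so that the StatefulSet is deleted only after TiFlash has fully scaled down. The following is a minimal sketch. The function names tiflash_scaled_down and wait_for_tiflash_scale_down are hypothetical, and the sketch assumes that the TiFlash StatefulSet name ends in -tiflash, as in the listing above.

```shell
# Hypothetical helper: reads `kubectl get sts` output on stdin and succeeds
# only if a StatefulSet whose name ends in "-tiflash" is listed and its
# READY column is 0/0.
tiflash_scaled_down() {
  awk '
    $1 ~ /-tiflash$/ { found = 1; if ($2 != "0/0") busy = 1 }
    END { if (!found || busy) exit 1 }
  '
}

# Hypothetical wrapper: polls every 10 seconds until TiFlash is scaled down.
# $1 is the namespace of your TiDB cluster (poc in the steps above).
wait_for_tiflash_scale_down() {
  until kubectl get sts -n "$1" | tiflash_scaled_down; do
    sleep 10
  done
}
```

The check fails if no -tiflash StatefulSet is listed at all, so a mistyped namespace keeps the loop waiting instead of reporting success.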