Binlog
- Objective: Learn to use TiDB Binlog to sync data between TiDB clusters on AWS (with Kubernetes)
- Prerequisites:
  - Background knowledge of TiDB components
  - Background knowledge of Kubernetes and TiDB Operator
  - Background knowledge of TiDB Binlog
  - AWS account
  - TiDB cluster on AWS
- Optionality: Optional
- Estimated time: 30 mins
Create Downstream TiDB Cluster
- Optionality: Optional
If you have a downstream cluster already deployed, you can skip this section.
Otherwise, follow these steps to deploy an additional "downstream" TiDB Cluster to use as the destination for TiDB Binlog replication.
Provision Nodes
Create a downstream_cluster.tf file and add the following to provision machines for the downstream cluster.
cat > downstream_cluster.tf <<EOF
module "downstream-cluster" {
  providers = {
    helm = helm.eks
  }
  source                      = "../modules/aws/tidb-cluster"
  eks                         = local.eks
  subnets                     = local.subnets
  region                      = var.region
  cluster_name                = "downstream-cluster"
  ssh_key_name                = module.key-pair.key_name
  pd_count                    = 3
  pd_instance_type            = "c5.large"
  tikv_count                  = 3
  tikv_instance_type          = "c5d.large"
  tidb_count                  = 2
  tidb_instance_type          = "c4.large"
  monitor_instance_type       = "c5.large"
  create_tidb_cluster_release = false
}
EOF
In the above configuration, we provision 3 PD instances, 3 TiKV instances, and 2 TiDB instances. You can modify the instance types and counts to match your needs.
To create the new cluster nodes, execute the following two commands:
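# A sketch, assuming the standard Terraform workflow run from the directory containing downstream_cluster.tf
terraform init
terraform apply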
Deploy TiDB Cluster
Create a configuration file downstream_cluster.yaml for TiDB Operator to deploy the downstream cluster. You need to change CLUSTER_NAME to the same name as the cluster_name in downstream_cluster.tf, and ensure that the replicas in the spec of pd, tikv, and tidb match the pd_count, tikv_count, and tidb_count in downstream_cluster.tf.
downstream_cluster_name=downstream-cluster
sed "s/CLUSTER_NAME/$downstream_cluster_name/" \
manifests/db.yaml.example > downstream_cluster.yaml
To create the TiDB cluster, execute the following commands:
downstream_namespace=poc-downstream
kubectl create namespace "$downstream_namespace"
kubectl create -f downstream_cluster.yaml -n "$downstream_namespace"
Watch the pods get created, and wait until all of them are running before proceeding.
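One way to watch them, assuming the namespace variable set above:
kubectl get pods -n "$downstream_namespace" --watch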
Now we have two TiDB clusters running in two different namespaces of a single Kubernetes cluster.
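You can confirm this by listing the TidbCluster resources in all namespaces; the output below assumes a command along these lines:
kubectl get tidbcluster --all-namespaces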
NAMESPACE NAME READY PD STORAGE READY DESIRE TIKV STORAGE READY DESIRE TIDB READY DESIRE AGE
poc-downstream downstream-cluster True pingcap/pd:v3.0.13 1Gi 3 3 pingcap/tikv:v3.0.13 45Gi 3 3 pingcap/tidb:v3.0.13 2 2 17m
poc my-cluster True pingcap/pd:v3.0.13 1Gi 3 3 pingcap/tikv:v3.0.13 45Gi 3 3 pingcap/tidb:v3.0.13 2 2 139m
Import Data to Downstream Cluster
- Optionality: Optional
If you already have data in your upstream cluster (whether from sysbench or another application), and you want that existing data to form the basis for your TiDB Binlog deployment, you must import the data into the downstream cluster before starting the Drainer component of TiDB Binlog. This is the same concept used in MySQL to "seed" a replica, where you'd typically take a backup or snapshot of the primary/upstream server and then load the backup into the downstream/replica server before setting up replication.
Note that the Pump component of TiDB Binlog must already have been running at the timestamp of the data snapshot/export you use to seed the downstream cluster; otherwise, changes made between the export and the start of Pump would be missing from the binlog. If you have an operational load on the cluster, set up the Pump component (but not the Drainer) before creating a new export from the upstream cluster.
The Import & Export guide shows how to export data from the upstream cluster into S3, and how to restore that export back into the same cluster. To import/seed data into the downstream cluster before deploying TiDB Binlog, we will adapt the Data Import section of the Import & Export guide to act against the downstream cluster instead of the upstream cluster.
Assuming you have already created the export of the data to S3, the process to restore it to the downstream cluster is straightforward.
Get Details of Backup
You'll need the Backup Path and Commit Ts of the export you created. The Backup Path will be used with the Restore job to fetch the data from S3, and the Commit Ts will be used in the Drainer configuration to tell it where to begin replication.
kubectl get backup export-to-s3 -n "$upstream_namespace" -o jsonpath='{.status.backupPath}{"\n"}{.status.commitTs}{"\n"}'
s3://bucket/prefix/backup-2020-06-18T18:18:49Z.tgz
417465454990983179
Create Secrets
Create s3-secret
TiDB Operator needs to access S3 when performing data import and export operations. To do that, create the s3-secret secret, which stores the credentials used to access S3:
kubectl create secret generic s3-secret --from-literal=access_key="${aws_access_key}" --from-literal=secret_key="${aws_secret_key}" -n "$downstream_namespace"
secret/s3-secret created
Create export-secret
TiDB Operator needs to access TiDB when performing data import and export operations. To do that, create a secret that stores the password of the user account used to access the TiDB cluster in the downstream cluster:
kubectl create secret generic export-secret --from-literal=password="${password}" -n "$downstream_namespace"
secret/export-secret created
Import Data
Install RBAC
The RBAC resources used by the Backup and Restore jobs must also be installed in the namespace of the downstream cluster.
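A sketch, assuming the backup-rbac manifest that ships with the tidb-operator repository (the path may differ in your checkout):
kubectl apply -f manifests/backup/backup-rbac.yaml -n "$downstream_namespace"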
role.rbac.authorization.k8s.io/tidb-backup-manager created
serviceaccount/tidb-backup-manager created
rolebinding.rbac.authorization.k8s.io/tidb-backup-manager created
Configure Restore CR
Replace the values enclosed in <> with the correct values for your environment, and save the result as import-from-s3.yaml. Note that these values must match your downstream cluster.
Note that ${backupPath} should be the full path to the file created by the Backup job in the Import & Export guide. It must include the s3:// scheme, the bucket name, any prefix, and the full filename of the backup. You can find the backupPath by looking at the Backup job, as described in Verify Data Export.
downstream_namespace=<downstream_namespace>
cluster_name=<cluster_name>
tidb_port=<tidb_port>
tidb_user=<tidb_user>
region=<region>
backupPath=<backupPath>
cat > import-from-s3.yaml <<EOF
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: import-from-s3
  namespace: ${downstream_namespace}
spec:
  backupType: full
  to:
    host: ${cluster_name}-tidb
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: export-secret
  s3:
    provider: aws
    region: ${region}
    secretName: s3-secret
    path: ${backupPath}
  storageClassName: ebs-gp2
  storageSize: 100Gi
EOF
Perform Data Import
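A minimal sketch of applying the Restore manifest created above (the target namespace is already set in its metadata):
kubectl apply -f import-from-s3.yaml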
restore.pingcap.com/import-from-s3 created
You can check the import status with a command like the following (a sketch; it lists the pods in the downstream namespace, and the output below is trimmed to the restore pod):
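kubectl get pods -n "$downstream_namespace"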
NAME READY STATUS RESTARTS AGE
restore-import-from-s3-5tg75 1/1 Running 0 22s
Deploy TiDB Binlog
The recommended production deployment of TiDB Binlog is to create a number of Pump replicas equal to the number of TiDB replicas and to set "affinity" for Pump and TiDB, so that one Pump replica runs on the same node as each TiDB replica. With this layout, if one of the TiDB nodes goes offline, you will not lose any binlogs.
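For reference, a minimal sketch of such an affinity rule, assuming spec.pump accepts the standard Kubernetes affinity field as other TidbCluster components do (it is not part of the pump.yaml used below):
spec:
  pump:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: tidb
          topologyKey: kubernetes.io/hostname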
We will deploy TiDB Binlog components (Pump & Drainer) in the "upstream" (first) cluster.
Deploy Pump
Create a pump.yaml file with the following contents:
cat > pump.yaml <<EOF
spec:
  pump:
    baseImage: pingcap/tidb-binlog
    replicas: 2
    storageClassName: ebs-gp2
    requests:
      storage: 30Gi
    schedulerName: default-scheduler
    config:
      addr: 0.0.0.0:8250
      gc: 7
      heartbeat-interval: 2
EOF
If you've been following this guide from the beginning, your upstream cluster should be called "my-cluster" and the upstream namespace is "poc". If you've used different values for either of those, make sure you reflect them in the variables you set for the following commands.
upstream_cluster_name=my-cluster
upstream_namespace=poc
kubectl patch tc/"$upstream_cluster_name" -n "$upstream_namespace" \
--type=merge -p "$(<pump.yaml)"
Confirm that the Pump pods are running.
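One way to check, assuming the standard tidb-operator component labels:
kubectl get pods -n "$upstream_namespace" -l app.kubernetes.io/component=pump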
NAME READY STATUS RESTARTS AGE
my-cluster-pump-0 1/1 Running 0 32m
my-cluster-pump-1 1/1 Running 0 33m
Deploy Drainer
Configure Drainer
You can search for the available versions of the tidb-drainer chart.
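Assuming Helm 2, which matches the helm install --name syntax used later in this section:
helm search pingcap/tidb-drainer -l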
pingcap/tidb-drainer v1.1.0 A Helm chart for TiDB Binlog drainer.
pingcap/tidb-drainer v1.0.6 A Helm chart for TiDB Binlog drainer.
pingcap/tidb-drainer v1.0.5 A Helm chart for TiDB Binlog drainer.
pingcap/tidb-drainer v1.0.4 A Helm chart for TiDB Binlog drainer.
pingcap/tidb-drainer latest A Helm chart for TiDB Binlog drainer.
pingcap/tidb-drainer dev A Helm chart for TiDB Binlog drainer.
Make sure you choose the same version of the pingcap/tidb-drainer chart that you are using for your tidb-operator chart.
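You can check which chart version your tidb-operator release uses (again assuming Helm 2):
helm ls tidb-operator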
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
tidb-operator 1 Wed Jun 17 17:36:19 2020 DEPLOYED tidb-operator-v1.1.0 tidb-admin
In this case, the version being used is v1.1.0.
Make these important changes to the drainer.yaml file:
- clusterName must be set to the name of your upstream cluster
- clusterVersion must be the same version as your running upstream and downstream clusters
- initialCommitTs should be set to the Commit Ts of the snapshot you took from the upstream cluster (.status.commitTs), or "0" if both clusters are empty
  - Surround the value with double quotes so that a very large value is not evaluated to have an exponent
- config must be modified so that the [syncer.to] section has valid connection details for the downstream cluster
  - host should be the Kubernetes hostname of the downstream TiDB service
  - user and password must be a user account on the downstream TiDB server
  - port should be the port of the downstream TiDB server (4000)
Alternatively, you can use the following minimal file as your drainer.yaml file:
cat > drainer.yaml <<EOF
clusterName: my-cluster
clusterVersion: v4.0.0
storageClassName: ebs-gp2
storage: 10Gi
initialCommitTs: "0"
config: |
  detect-interval = 10
  [syncer]
  worker-count = 16
  txn-batch = 20
  disable-dispatch = false
  ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"
  safe-mode = false
  db-type = "tidb"
  [syncer.to]
  host = "downstream-cluster-tidb.poc-downstream.svc"
  user = "root"
  password = ""
  port = 4000
EOF
Install Drainer
helm install pingcap/tidb-drainer --name=drainer --namespace=${upstream_namespace} \
--version=v1.1.0 -f drainer.yaml
NAME: drainer
LAST DEPLOYED: Wed Jun 17 22:18:36 2020
NAMESPACE: poc
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME AGE
my-cluster-drainer-drainer-798a65fa 0s
==> v1/Pod(related)
NAME AGE
my-cluster-drainer-drainer-0 0s
==> v1/Service
NAME AGE
my-cluster-drainer-drainer 0s
==> v1/StatefulSet
NAME AGE
my-cluster-drainer-drainer 0s
NOTES:
StatefulSet my-cluster-drainer-drainer installed.
1. Watch if the drainer instance gets created:
watch kubectl get pod -n poc my-cluster-drainer-drainer-0
2. Check if the drainer instance works properly:
kubectl logs -f -n poc my-cluster-drainer-drainer-0
Monitor the progress of the Drainer deployment, and wait until the Drainer pod is Running before proceeding.
NAME READY STATUS RESTARTS AGE
my-cluster-drainer-drainer-0 1/1 Running 0 7m22s
Insert Data
We can connect to the TiDB instances in both clusters from the bastion machine, because they both run in the same VPC.
We need to get the endpoint for the TiDB service in the downstream cluster so that we can connect to it to monitor progress. A command like the following (a sketch, assuming the standard tidb-operator component labels on the TiDB services) will give the endpoints for the TiDB services in both clusters. Make note of both of them.
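kubectl get services --all-namespaces -l app.kubernetes.io/component=tidb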
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
poc-downstream downstream-cluster-tidb LoadBalancer 172.20.139.226 ad90873b7bdb046d086e37806ba563e0-e3e3ed2e571f96ea.elb.us-west-2.amazonaws.com 4000:31863/TCP,10080:32378/TCP 165m
poc-downstream downstream-cluster-tidb-peer ClusterIP None <none> 10080/TCP 165m
poc my-cluster-tidb LoadBalancer 172.20.154.115 ac67299dd43304e83ac09b93d24b0883-82856fa183371c57.elb.us-west-2.amazonaws.com 4000:31108/TCP,10080:32147/TCP 4h47m
poc my-cluster-tidb-peer ClusterIP None <none> 10080/TCP 4h47m
After connecting to the bastion machine, start a screen session to make it easier to go back and forth between the two clusters.
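For example, to start a named session (the session name is arbitrary):
screen -S binlog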
Verify Operation of Downstream Cluster
Connect to the downstream cluster to make sure it's operating correctly and to confirm that the Drainer is connected.
mysql -P4000 -u root --prompt='downstream> ' -h ad90873b7bdb046d086e37806ba563e0-e3e3ed2e571f96ea.elb.us-west-2.amazonaws.com
downstream> show databases;
+--------------------+
| Database |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA |
| PERFORMANCE_SCHEMA |
| mysql |
| test |
| tidb_binlog |
+--------------------+
6 rows in set (0.00 sec)
downstream> select * from tidb_binlog.checkpoint;
+---------------------+----------------------------------------------------------------------------------------------------------------------------+
| clusterID | checkPoint |
+---------------------+----------------------------------------------------------------------------------------------------------------------------+
| 6839373075379446253 | {"consistent":false,"commitTS":417446991410495489,"ts-map":{"master-ts":417446636244697090,"slave-ts":417446636370788354}} |
+---------------------+----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
The tidb_binlog.checkpoint table is used by TiDB Binlog to keep track of its progress in replicating from the upstream to the downstream cluster.
Insert Data Into Upstream Cluster
Now, connect to the upstream cluster to create a table and insert some data into it.
mysql -P4000 -u root --prompt='upstream> ' -h ac67299dd43304e83ac09b93d24b0883-82856fa183371c57.elb.us-west-2.amazonaws.com
create database binlog_test;
create table binlog_test.binlog_test (id int unsigned not null auto_increment primary key, v varchar(255));
insert into binlog_test.binlog_test (v) values (@@hostname);
upstream> create database binlog_test;
Query OK, 0 rows affected (0.09 sec)
upstream> create table binlog_test.binlog_test (id int unsigned not null auto_increment primary key, v varchar(255));
Query OK, 0 rows affected (0.10 sec)
upstream> insert into binlog_test.binlog_test (v) values (@@hostname);
Query OK, 1 row affected (0.01 sec)
Confirm Replication to Downstream Cluster
Now, back in the downstream instance, make sure the binlog_test database is there:
downstream> show databases;
+--------------------+
| Database |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA |
| PERFORMANCE_SCHEMA |
| binlog_test |
| mysql |
| test |
| tidb_binlog |
+--------------------+
7 rows in set (0.00 sec)
And let's see the content of the binlog_test table we created:
downstream> select binlog_test.*, @@hostname from binlog_test.binlog_test;
+----+-------------------+---------------------------+
| id | v | @@hostname |
+----+-------------------+---------------------------+
| 1 | my-cluster-tidb-1 | downstream-cluster-tidb-1 |
+----+-------------------+---------------------------+
1 row in set (0.00 sec)
The value in the v column is the hostname inserted on the upstream cluster, and the value in the @@hostname column is the hostname of the downstream cluster node. This shows that the data has replicated successfully from the upstream cluster to the downstream cluster.