Import & Export
- Objective: Learn how to export data from and import data into a TiDB cluster on AWS (with Kubernetes)
- Prerequisites:
- Background knowledge of TiDB components
- Background knowledge of Kubernetes and TiDB Operator
- Background knowledge of Mydumper & TiDB Lightning
- AWS account
- TiDB cluster on AWS
- Optionality: Optional
- Estimated time: 30 minutes
In this document, we will demonstrate how to use Mydumper and TiDB Lightning to export data from a TiDB cluster as SQL files and import SQL files to TiDB.
Prepare
Prepare Data
- Optionality: You can skip this section if you have already run sysbench in the previous step, or if you've inserted data into the cluster by some other means.
If you haven't already run sysbench or loaded other data into the cluster, refer to Run Sysbench to import some data. After you finish loading data, switch back here and continue with the following steps.
Grant AWS Account Permissions
Before you perform a backup, AWS account permissions need to be granted to the Backup Custom Resource (CR) object. There are three methods to grant AWS account permissions:
- Grant permissions by importing AccessKey and SecretKey
- Grant permissions by associating IAM with Pod
- Grant permissions by associating IAM with ServiceAccount
In this document, we will grant permissions by importing AccessKey and SecretKey.
Note
Granting permissions by associating an IAM role with the Pod or with a ServiceAccount is recommended in a production setting.
Create S3 Bucket
You can skip this section if you already have an S3 bucket to store backup data.
If you don't already have an S3 bucket for data export, create an S3 bucket in the same AWS region as your EKS cluster:
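For example, with the aws CLI configured for your account (the bucket name and region below are placeholders to replace):
aws s3 mb s3://<bucket-name> --region <region>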
Install RBAC
Working from the deploy/aws subdirectory of your tidb-operator clone, create the RBAC resources related to backups:
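A minimal sketch, assuming the backup RBAC manifest is at manifests/backup/backup-rbac.yaml in the repository root (its usual location in the tidb-operator repository), applied here via a path relative to deploy/aws:
kubectl apply -f ../../manifests/backup/backup-rbac.yaml -n "$namespace"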
role.rbac.authorization.k8s.io/tidb-backup-manager created
serviceaccount/tidb-backup-manager created
rolebinding.rbac.authorization.k8s.io/tidb-backup-manager created
Create Secrets
Create s3-secret
TiDB Operator needs to access S3 when performing data import & export operations. To do that, you can create the s3-secret secret, which stores the credentials used to access S3 (remember to replace the keys with real values):
aws_access_key=<AWS Access Key>
aws_secret_key=<AWS Secret Key>
kubectl create secret generic s3-secret --from-literal=access_key="${aws_access_key}" --from-literal=secret_key="${aws_secret_key}" -n "$namespace"
secret/s3-secret created
Create export-secret
TiDB Operator needs to access TiDB when performing data import & export operations. To do that, you can create a secret which stores the password of the user account used to access the TiDB cluster (remember to replace the password with the real value):
password=<TiDB Password>
kubectl create secret generic export-secret --from-literal=password="${password}" -n "$namespace"
secret/export-secret created
Data Export
This section describes how to perform data export. We use a Backup Custom Resource (CR) to describe a data export. TiDB Operator performs the data export operation based on the specification in the Backup CR.
Checksum Table
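One way to capture a baseline checksum before the export, assuming the sysbench data is in the sbtest.sbtest1 table, is to run TiDB's ADMIN CHECKSUM TABLE statement from a MySQL client connected to the cluster:
ADMIN CHECKSUM TABLE sbtest.sbtest1;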
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| sbtest  | sbtest1    | xxx                | xxx       | xxx         |
+---------+------------+--------------------+-----------+-------------+
Configure Backup CR
The following is an example Backup CR. You should replace values enclosed in <> with the correct values for your environment and save the file as export-aws-s3.yaml.
namespace=<namespace>
cluster_name=<cluster_name>
tidb_port=<tidb_port>
tidb_user=<tidb_user>
region=<region>
bucket=<bucket>
prefix=<prefix>
cat > export-aws-s3.yaml <<EOF
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: export-to-s3
  namespace: ${namespace}
spec:
  from:
    host: ${cluster_name}-tidb
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: export-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${region}
    bucket: ${bucket}
    prefix: ${prefix}
  storageClassName: ebs-gp2
  storageSize: 100Gi
EOF
Note: Each Backup job you run must have a unique name. If you want to create another export of your data, edit your .yaml file to give a unique value for .metadata.name.
For the deployment scenario described in this guide, the storageClassName of ebs-gp2 is correct, but if you've chosen different instance types you may be able to use local-storage.
Perform Data Export
You can perform data export using the following command:
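Assuming you saved the Backup CR from the previous section as export-aws-s3.yaml:
kubectl apply -f export-aws-s3.yaml -n "$namespace"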
Verify Data Export
You can use the following command to check the data export status:
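A simple way, assuming no particular label selector, is to list the pods in the namespace and look for the one whose name starts with backup-export-to-s3:
kubectl get pods -n "$namespace"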
NAME READY STATUS RESTARTS AGE
backup-export-to-s3-fb7gq 1/1 Running 0 31s
After some time, the Backup job will complete, at which point the Pod state will look like this:
NAME READY STATUS RESTARTS AGE
backup-export-to-s3-fb7gq 0/1 Completed 0 2m49s
If you want to follow along until the job is completed, you can view the logs of the Backup job using this command, adjusting the name of the job to match your .yaml file if necessary:
kubectl logs job/backup-export-to-s3 -n "$namespace" -f
Create rclone.conf file.
/tidb-backup-manager export --namespace=poc --backupName=export-to-s3 --bucket=bucket --storageType=s3
I0616 22:13:11.437458 1 export.go:72] start to process backup poc/export-to-s3
I0616 22:13:11.454114 1 backup_status_updater.go:66] Backup: [poc/export-to-s3] updated successfully
I0616 22:13:11.459206 1 manager.go:169] cluster poc/export-to-s3 tikv_gc_life_time is 10m0s
I0616 22:13:11.467393 1 manager.go:233] set cluster poc/export-to-s3 tikv_gc_life_time to 72h success
I0616 22:13:16.690785 1 manager.go:251] reset cluster poc/export-to-s3 tikv_gc_life_time to 10m0s success
I0616 22:13:16.690806 1 manager.go:266] dump cluster poc/export-to-s3 data to /backup/bucket/prefix/backup-2020-06-16T22:13:11Z success
You can use the aws command-line client to view the artifact/archive/file related to your Backup job:
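For example, reusing the bucket and prefix variables defined for the Backup CR:
aws s3 ls "s3://${bucket}/${prefix}/"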
2020-06-16 19:05:03 1648500666 backup-2020-06-16T19:01:35Z.tgz
To know what file to restore later from this backup, you can get the backupPath of the Backup using this command:
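One way, assuming your TiDB Operator version records the path in the Backup CR's status, is to query it with kubectl:
kubectl get backup export-to-s3 -n "$namespace" -o jsonpath='{.status.backupPath}'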
s3://bucket/prefix/backup-2020-06-16T22:13:11Z.tgz
You should also capture the Commit Ts of the backup, if you want to use it to seed a replica/downstream cluster for use with TiDB Binlog.
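Again assuming it is recorded in the Backup CR's status:
kubectl get backup export-to-s3 -n "$namespace" -o jsonpath='{.status.commitTs}'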
417465454990983179
Checksum Table After Export
You can compare these values later to the data after you've restored the cluster. These values will only be meaningful if you stopped write activity to the cluster before taking the backup.
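Run the same checksum statement as before from a MySQL client connected to the cluster:
ADMIN CHECKSUM TABLE sbtest.sbtest1;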
+---------+------------+---------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+---------------------+-----------+-------------+
| sbtest | sbtest1 | 6011909912281260729 | 34496988 | 4327530401 |
+---------+------------+---------------------+-----------+-------------+
1 row in set (3.68 sec)
Clean Up Backup Resources
Note that the Backup job creates a PVC (and, if necessary, a PV) resource in your Kubernetes cluster. This volume is used to temporarily hold the output of Mydumper before it's uploaded to S3. If you don't want to keep this volume after the backup completes, you should delete it.
Before deleting the PVC associated with the Backup, you must delete the Pod associated with the Backup. This command will delete all pods associated with Backup jobs.
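A sketch of such a command, assuming the Backup pods can be identified by the backup- prefix in their names (a label selector would also work if you know the labels TiDB Operator applies):
kubectl get pods -n "$namespace" --no-headers -o custom-columns=":metadata.name" | grep '^backup-' | xargs -r kubectl delete pod -n "$namespace"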
pod "backup-export-to-s3-fb7qg" deleted
Use this command to view all PVCs associated with Backups:
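For example, list the PVCs in the namespace and look for those with the backup-pvc prefix:
kubectl get pvc -n "$namespace"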
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
backup-pvc-77c7f7f95 Bound pvc-86f03442-3523-449e-b298-ea48ae2341bd 10Gi RWO gp2 5m56s
This command will delete all PVCs associated with Backup jobs.
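A sketch, again assuming the backup-pvc name prefix:
kubectl get pvc -n "$namespace" --no-headers -o custom-columns=":metadata.name" | grep '^backup-pvc' | xargs -r kubectl delete pvc -n "$namespace"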
persistentvolumeclaim "backup-pvc-77c7f7f95" deleted
Data Import
In this section, we will demonstrate how to use the file created in previous sections to restore a database.
Note: If you want to restore from data generated elsewhere, it's important to note that the data files, metadata, and directory structure must be in mydumper format.
Prepare Destination Cluster
To simplify matters, we'll just re-load this data into the cluster where we took the backup. The easiest way to do that is to drop the database(s) that were dumped. If you've been following the earlier parts of this tutorial, that's just the sbtest database for sysbench data. The various system databases are not included in the dump and you should not try to drop them.
If you don't want to drop the data in your cluster, you can set up another cluster to restore the backup into.
Note: These commands will delete all data in your cluster!
select schema_name from information_schema.schemata where lower(schema_name) not in ('information_schema', 'performance_schema', 'mysql');
+-------------+
| schema_name |
+-------------+
| sbtest |
| test |
+-------------+
2 rows in set (0.00 sec)
MySQL [(none)]> drop database test;
Query OK, 0 rows affected (2.01 sec)
MySQL [(none)]> drop database sbtest;
Query OK, 0 rows affected (2.02 sec)
Configure Restore CR
You should replace values enclosed in <> with the correct values for your environment and save the file as restore-from-s3.yaml.
Note that ${backupPath} should be the full path to the file created by the Backup job above. That must include the s3:// scheme, the bucket name, any prefix, and the full filename of the backup. You can find the backupPath by looking at the Backup job, as described above in Verify Data Export.
namespace=<namespace>
cluster_name=<cluster_name>
tidb_port=<tidb_port>
tidb_user=<tidb_user>
region=<region>
backupPath=<backupPath>
cat > restore-from-s3.yaml <<EOF
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: restore-from-s3
  namespace: ${namespace}
spec:
  backupType: full
  to:
    host: ${cluster_name}-tidb
    port: ${tidb_port}
    user: ${tidb_user}
    secretName: export-secret
  s3:
    provider: aws
    region: ${region}
    secretName: s3-secret
    path: ${backupPath}
  storageClassName: ebs-gp2
  storageSize: 100Gi
EOF
Perform Data Import
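Assuming you saved the Restore CR from the previous section as restore-from-s3.yaml:
kubectl apply -f restore-from-s3.yaml -n "$namespace"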
restore.pingcap.com/restore-from-s3 created
Verify Data Import
You can use the following command to check the import status:
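As with the export, a simple way, assuming no particular label selector, is to list the pods in the namespace and look for the one whose name starts with restore-restore-from-s3:
kubectl get pods -n "$namespace"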
NAME READY STATUS RESTARTS AGE
restore-restore-from-s3-5tg75 1/1 Running 0 22s
If you want to follow along until the job is completed, you can view the logs of the Restore job using this command, adjusting the name of the job to match your .yaml file if necessary:
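Assuming the Restore job is named restore-restore-from-s3, as the pod name above suggests:
kubectl logs job/restore-restore-from-s3 -n "$namespace" -f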
Create rclone.conf file.
/tidb-backup-manager import --namespace=poc --restoreName=restore-from-s3 --backupPath=s3://bucket/prefix/backup-2020-06-16T19:01:35Z.tgz
I0616 21:29:07.703681 1 restore.go:71] start to process restore poc/restore-from-s3
I0616 21:29:07.714557 1 restore_status_updater.go:66] Restore: [poc/restore-from-s3] updated successfully
I0616 21:29:23.326415 1 manager.go:152] download cluster poc/restore-from-s3 backup s3://bucket/prefix/backup-2020-06-16T19:01:35Z.tgz data success
I0616 21:30:06.668960 1 manager.go:168] unarchive cluster poc/restore-from-s3 backup /backup/bucket/prefix/backup-2020-06-16T19:01:35Z.tgz data success
I0616 21:33:59.380526 1 manager.go:183] restore cluster poc/restore-from-s3 from backup s3://bucket/prefix/backup-2020-06-16T19:01:35Z.tgz success
I0616 21:33:59.399894 1 restore_status_updater.go:66] Restore: [poc/restore-from-s3] updated successfully
Checksum Table After Import
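Run the checksum statement once more from a MySQL client connected to the cluster:
ADMIN CHECKSUM TABLE sbtest.sbtest1;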
+---------+------------+---------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+---------------------+-----------+-------------+
| sbtest | sbtest1 | 6011909912281260729 | 34496988 | 4327530401 |
+---------+------------+---------------------+-----------+-------------+
1 row in set (3.68 sec)
Now confirm that these checksum values match the ones you captured before the export; if you restored into a separate cluster, this confirms that the two clusters have the same data.
Clean Up Restore Resources
Note that the Restore job, like the Backup job, creates a PVC (and, if necessary, a PV) resource in your Kubernetes cluster. This volume is used to temporarily hold the uncompressed backup archive as it's loaded into the cluster. If you don't want to keep this volume after the import completes, you should delete it.
Before deleting the PVC associated with the Restore, you must delete the Pod associated with the Restore. This command will delete all pods associated with Restore jobs.
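A sketch of such a command, assuming the Restore pods can be identified by the restore- prefix in their names:
kubectl get pods -n "$namespace" --no-headers -o custom-columns=":metadata.name" | grep '^restore-' | xargs -r kubectl delete pod -n "$namespace"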
pod "restore-restore-from-s3-5tg75" deleted
Use this command to view all PVCs associated with Restores:
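For example, list the PVCs in the namespace and look for those with the restore-pvc prefix:
kubectl get pvc -n "$namespace"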
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
restore-pvc-77c7f7f95 Bound pvc-d1b30c24-a225-421e-8071-fab91e543f69 10Gi RWO gp2 3h35m
This command will delete all PVCs associated with Restore jobs.
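A sketch, again assuming the restore-pvc name prefix:
kubectl get pvc -n "$namespace" --no-headers -o custom-columns=":metadata.name" | grep '^restore-pvc' | xargs -r kubectl delete pvc -n "$namespace"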
persistentvolumeclaim "restore-pvc-77c7f7f95" deleted