MySQL Compatibility
- Objective: Learn compatibility differences between TiDB and MySQL
- Prerequisites:
- Background knowledge of TiDB components and architecture
- Background knowledge of MySQL
- Optionality: Optional
- Estimated time: 15 mins
TiDB aims for compatibility with MySQL 5.7, in terms of wire/client protocol, SQL syntax, and many behaviors. There are, however, many exceptions to this that are important to understand.
You should carefully review the MySQL Compatibility page in the documentation to see whether any of the known differences might affect your application.
This document aims to help you understand a few common issues that users encounter when assessing and executing a migration from MySQL to TiDB.
TiDB is cluster-native
TiDB is a clustered database platform, which means that transactions can execute independently on many different front-end TiDB Server instances, against many different back-end TiKV Server instances. As a result, commands and behaviors that you might expect on a standalone MySQL instance work differently in TiDB. Here are some examples:
- SHOW PROCESSLIST works only on a single TiDB server. Use information_schema.cluster_processlist to see all transactions across the whole cluster:
MySQL [(none)]> select * from information_schema.cluster_processlist\G
*************************** 1. row ***************************
INSTANCE: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 1249
    USER: root
    HOST: 10.0.68.251
      DB: NULL
 COMMAND: Query
    TIME: 0
   STATE: autocommit
    INFO: select * from information_schema.cluster_processlist
     MEM: 0
TxnStart: 04-29 23:56:57.365(416338311214530561)
*************************** 2. row ***************************
INSTANCE: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 1048
    USER: root
    HOST: 10.0.68.251
      DB: test
 COMMAND: Query
    TIME: 1
   STATE: autocommit
    INFO: select sleep(30)
     MEM: 0
TxnStart:
*************************** 3. row ***************************
INSTANCE: poc-cluster-tidb-0.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 458
    USER: root
    HOST: 10.0.68.251
      DB: test
 COMMAND: Sleep
    TIME: 123
   STATE: in transaction
    INFO: NULL
     MEM: 0
TxnStart: 04-29 23:54:50.015(416338277830492161)
3 rows in set (0.01 sec)
- KILL requires an additional TIDB keyword, because you could accidentally kill a query on the wrong TiDB Server instance.
- Because DDL jobs are executed by a single TiDB Server instance elected as the "DDL owner" (not necessarily the instance you are connected to), you must use ADMIN SHOW DDL JOBS to view DDL jobs and ADMIN CANCEL DDL JOBS to cancel them:
tidb 1> alter table sbtest1 add key(c);
<blocks>
tidb 2> admin show ddl\G
*************************** 1. row ***************************
   SCHEMA_VER: 134
     OWNER_ID: 9864057a-a2ee-428d-9678-e22c3628f3e0
OWNER_ADDRESS: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:4000
 RUNNING_JOBS:
      SELF_ID: 9864057a-a2ee-428d-9678-e22c3628f3e0
        QUERY:
1 row in set (0.01 sec)
tidb 2> admin cancel ddl jobs 153;
+--------+------------+
| JOB_ID | RESULT     |
+--------+------------+
|    153 | successful |
+--------+------------+
1 row in set (0.01 sec)
(tidb 1)
ERROR 8214 (HY000): Cancelled DDL job
- Locking features like GET_LOCK() and LOCK TABLES don't work like you'd expect in MySQL.
- The TiDB slow query log is written separately by each instance of TiDB Server, so you need to aggregate those logs together when debugging slow queries.
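For example, to kill the select sleep(30) query shown in the cluster_processlist output above, you take its ID (1048), connect to the instance that owns it, and add the TIDB keyword. A minimal sketch reusing that ID:
tidb> kill tidb 1048;
Query OK, 0 rows affected (0.00 sec)
For slow queries, TiDB 4.0 and later also expose information_schema.cluster_slow_query, which aggregates slow query log entries from every TiDB Server instance so you don't have to merge the per-instance log files by hand. A sketch:
tidb> select instance, query_time, query from information_schema.cluster_slow_query order by query_time desc limit 5;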
AUTO_INCREMENT & AUTO_RANDOM
AUTO_INCREMENT in TiDB can have surprising gaps. Each TiDB Server instance is allocated its own range of auto-increment values, so values are monotonically increasing within a single server instance, but not across the entire cluster:
$ mysql -e 'insert into test.t1 () values (); select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-0 |                1 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values (); select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-1 |            30001 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values (); select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-0 |                2 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values (); select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-1 |            30002 |
+--------------------+------------------+
More problematic is the tendency for auto-increment behavior to create a write hotspot in TiKV under insert-heavy workloads, because all new rows land in the same region.
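If you want to see how writes are distributed, SHOW TABLE ... REGIONS (available since TiDB 3.0) lists the regions backing a table along with per-region statistics such as WRITTEN_BYTES; for an auto-increment table under insert load you will typically see a single region absorbing the writes. The table name here is the one from the example above:
tidb> show table t1 regions;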
TiDB has experimental support for AUTO_RANDOM. This spreads inserted values across the whole range of the data type instead of clustering them in a small window. If you have workloads that make heavy use of massively parallel inserts into auto-increment columns, you may wish to plan on testing AUTO_RANDOM.
Note: The allow-auto-random option must be enabled in TiDB Server:
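In TiDB 4.0 this lives in the [experimental] section of the TiDB Server configuration file; a minimal sketch (check the configuration reference for your version):
[experimental]
# permit the AUTO_RANDOM column attribute
allow-auto-random = true
Once enabled, AUTO_RANDOM takes the place of AUTO_INCREMENT on a bigint primary key column. The table name here is hypothetical:
tidb> create table t3 (id bigint auto_random primary key, v varchar(32));
Query OK, 0 rows affected (0.11 sec)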
DDL
DDL operations in a distributed database are more complex than in a standalone database system because schema changes must be synchronized across multiple nodes. Fortunately, TiDB has adopted the approach described in the paper "Online, Asynchronous Schema Change in F1". This affords online DDL that does not block reads or writes, even when adding or removing columns from a table.
One important DDL behavior difference between MySQL and TiDB may affect your application:
Most compound (multiple operation) ALTER TABLE statements are not supported
tidb> alter table t1 add key(a), add key(b);
ERROR 8200 (HY000): Unsupported multi schema change
If your application relies on setup, upgrade, or migration scripts that use this type of compound ALTER TABLE
statement, those will need to be modified to work with TiDB.
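The usual fix is to split each compound statement into single-operation statements, for example:
tidb> alter table t1 add key(a);
Query OK, 0 rows affected
tidb> alter table t1 add key(b);
Query OK, 0 rows affected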
Transactions & Locking
TiDB now defaults to a pessimistic locking model, which means most behaviors are similar to MySQL/InnoDB, with some important caveats: https://pingcap.com/docs/stable/reference/transactions/transaction-pessimistic/#difference-with-mysql-innodb
Older versions of TiDB supported only optimistic locking, which is still available as an option. When using optimistic locking, constraint checking is deferred and deadlocks cannot occur; conflicting writes are instead detected at commit time.
Here's a scenario using pessimistic locking that shows deadlock detection in action:
tidb 1> create table t1 (id int unsigned not null auto_increment primary key);
Query OK, 0 rows affected (1.01 sec)
tidb 1> create table t2 (id int unsigned not null auto_increment primary key);
Query OK, 0 rows affected (1.02 sec)
tidb 1> select @@hostname;
+--------------------+
| @@hostname |
+--------------------+
| poc-cluster-tidb-0 |
+--------------------+
1 row in set (0.00 sec)
tidb 1> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 1> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 1> insert into t1 (id) values (1);
Query OK, 1 row affected (0.02 sec)
tidb 2> select @@hostname;
+--------------------+
| @@hostname |
+--------------------+
| poc-cluster-tidb-1 |
+--------------------+
1 row in set (0.00 sec)
tidb 2> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 2> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 2> insert into t2 (id) values (1);
Query OK, 1 row affected (0.01 sec)
tidb 1> insert into t2 (id) values (1);
<blocks>
tidb 2> insert into t1 (id) values (1);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
(tidb 1)
Query OK, 1 row affected (8.32 sec)
And here's the same scenario with session 1 switched to optimistic locking, showing that there's no deadlock and that the conflict is detected only at commit time:
tidb 1> set tidb_txn_mode=optimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 1> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 1> insert into t1 (id) values (1);
Query OK, 1 row affected (0.01 sec)
tidb 2> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 2> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 2> insert into t2 (id) values (1);
Query OK, 1 row affected (0.02 sec)
tidb 1> insert into t2 (id) values (1);
Query OK, 1 row affected (0.01 sec)
tidb 2> insert into t1 (id) values (1);
Query OK, 1 row affected (0.01 sec)
tidb 1> commit;
<blocks>
tidb 2> commit;
Query OK, 0 rows affected (0.00 sec)
(tidb 1)
ERROR 9007 (HY000): Write conflict, txnStartTS=416337970833653761, conflictStartTS=416337972537589761, conflictCommitTS=416337975893032964, key={tableID=144, handle=1} primary={tableID=144, handle=1} [try again later]
TiDB supports only "Snapshot Isolation" (SI), which is similar to MySQL's REPEATABLE READ but has some important differences: https://pingcap.com/docs/stable/reference/transactions/transaction-isolation/
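For compatibility, TiDB accepts the MySQL syntax for selecting the isolation level and maps REPEATABLE READ onto Snapshot Isolation, so most applications don't need changes here. A quick sketch:
tidb> set session transaction isolation level repeatable read;
Query OK, 0 rows affected (0.00 sec)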
Character Sets and Collations
TiDB supports single-byte (binary, latin1, ascii) character sets as well as utf8 (which is an alias for utf8mb4) and utf8mb4.
Until version 4.0, TiDB only supported "binary" collations. Starting with version 4.0, TiDB supports utf8_general_ci and utf8mb4_general_ci collations as part of a new framework for collations.
Note: A decision must be made at the time of cluster creation whether to enable the new collation model. This is to avoid backward compatibility problems.
To use the new collation model, you must set the new_collations_enabled_on_first_bootstrap configuration item before bootstrapping the cluster.
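This is a top-level item in the TiDB Server configuration file; a minimal sketch (the default may vary by version, so check the configuration reference):
# only takes effect the first time the cluster is bootstrapped
new_collations_enabled_on_first_bootstrap = true
As the name suggests, changing this after the cluster has been bootstrapped has no effect.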
If the old collation model is in place, collation clauses will generally be ignored and binary collations will be used implicitly. You can check which collation model is in use by consulting the mysql.tidb table.
With new collation model disabled:
tidb> select variable_value from mysql.tidb
where variable_name = 'new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| False |
+----------------+
1 row in set (0.00 sec)
tidb> create table t1 (id int unsigned not null auto_increment primary key,
v varchar(32) character set utf8 collate utf8_unicode_ci);
Query OK, 0 rows affected (0.11 sec)
With new collation model enabled:
tidb> select variable_value from mysql.tidb
where variable_name = 'new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| True |
+----------------+
1 row in set (0.00 sec)
tidb> create table t1 (id int unsigned not null auto_increment primary key,
v varchar(32) character set utf8 collate utf8_unicode_ci);
ERROR 1273 (HY000): Unsupported collation when new collation is enabled: 'utf8_unicode_ci'
Partitioning
There's no need to use explicit partitioning in TiDB for data to be split into multiple regions and distributed across multiple TiKV instances.
However, TiDB does support explicit partitioning for cases where the volume or distribution of data is already well-known and the overhead of region splitting can be avoided.
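To give a sense of the syntax, here is a hypothetical range-partitioned table (TiDB supports MySQL-style RANGE and HASH partitioning, with the usual MySQL restrictions, e.g. every unique key must include the partitioning columns):
tidb> create table logs (
    ->   id bigint not null,
    ->   created datetime not null
    -> )
    -> partition by range (year(created)) (
    ->   partition p2019 values less than (2020),
    ->   partition p2020 values less than (2021),
    ->   partition pmax values less than maxvalue
    -> );
Query OK, 0 rows affected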
TiDB also has a SPLIT REGION command that can be used to approach this issue from a somewhat different angle.
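For example, if you know a table is about to receive a burst of inserts over a known key range, you can pre-split and scatter its regions up front. The boundaries and region count below are made up for illustration:
tidb> split table t1 between (0) and (1000000) regions 16;
The command reports how many regions were split and how far the scatter step has progressed.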
SQL Syntax
TiDB accepts but ignores a lot of MySQL-compatible SQL syntax. We already discussed collations above, but here are some other important items that are accepted by the parser but ignored:
- Foreign Key constraints (see the example after this list)
- ENGINE clauses
- SQL modes & optimizer hints that TiDB does not implement (TiDB does support many/most SQL modes from MySQL 5.7)
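For example, a FOREIGN KEY clause parses cleanly but is never enforced. A minimal sketch with hypothetical tables:
tidb> create table parent (id int primary key);
Query OK, 0 rows affected
tidb> create table child (id int primary key, parent_id int,
    ->   foreign key (parent_id) references parent (id));
Query OK, 0 rows affected
tidb> insert into child (id, parent_id) values (1, 999);
Query OK, 1 row affected
The last insert succeeds even though there is no parent row with id 999; in MySQL/InnoDB it would fail with a foreign key constraint error.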
EXPLAIN
TiDB uses a different EXPLAIN output format than MySQL. It takes some getting used to, but EXPLAIN in TiDB offers a lot more information than EXPLAIN in MySQL or MariaDB, especially about the pushdown of work to the "coprocessor", the component of TiKV that executes work pushed down from TiDB Server.
tidb> explain select c from sbtest1 where k between 1000 and 10000;
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
| Projection_4 | 74467098.77 | root | | sbtest.sbtest1.c |
| └─TableReader_7 | 74467098.77 | root | | data:Selection_6 |
| └─Selection_6 | 74467098.77 | cop[tikv] | | ge(sbtest.sbtest1.k, 1000), le(sbtest.sbtest1.k, 10000) |
| └─TableFullScan_5 | 98494812.00 | cop[tikv] | table:sbtest1 | keep order:false |
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
4 rows in set (0.00 sec)
mysql> explain select c from sbtest1 where k between 1000 and 10000;
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
| 1 | SIMPLE | sbtest1 | range | k_1 | k_1 | 4 | NULL | 98640 | Using index condition |
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
1 row in set (0.00 sec)