Data Synchronization Between MySQL and Elasticsearch Using Canal
There are two broad ways to keep MySQL and ES in sync: synchronous dual writes and asynchronous synchronization. We rule out dual writes. Writing to MySQL and ES in the same request slows down the write path, and because no single transaction covers both writes, consistency is hard to guarantee without introducing distributed transactions. It also couples the synchronization logic to the business code, which makes the system harder to extend later.
For asynchronous synchronization, popular tools include Alibaba’s Canal and Debezium. Both implement CDC (Change Data Capture) by reading the MySQL binlog. Debezium is typically deployed with Kafka, so using it means running Kafka and writing Kafka consumers ourselves, which makes the system more complex. We therefore use Alibaba’s Canal to handle data synchronization.
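To make the consistency problem concrete, here is a minimal Python sketch. It uses in-memory dictionaries as hypothetical stand-ins for MySQL and ES, and `FlakyStore` and `dual_write` are illustrative names, not part of any real client library. If the second write fails after the first has succeeded, the two stores diverge unless the application adds its own compensation logic.

```python
class FlakyStore:
    """Hypothetical in-memory stand-in for a datastore that can fail on write."""

    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail

    def write(self, key, value):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.rows[key] = value


def dual_write(mysql, es, key, value):
    """Naive dual write: no transaction spans the two stores."""
    mysql.write(key, value)  # succeeds first
    es.write(key, value)     # may fail after MySQL already holds the row


mysql = FlakyStore()
es = FlakyStore(fail=True)   # simulate an ES outage

try:
    dual_write(mysql, es, 1, {"title": "hello"})
except ConnectionError:
    pass  # the request fails, but the MySQL write is not rolled back

in_sync = mysql.rows == es.rows
print(in_sync)  # False: MySQL has the row, ES does not
```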
1.1 Master-Slave Replication Principle
MySQL’s master-slave replication is based on binlog, which records all changes in MySQL and saves them as binary log files.
Replication works by transferring the binlog data from the master to the slave. It typically runs in asynchronous mode, meaning the master commits a change without waiting for the slave to receive or apply the corresponding binlog events.
Process:
- Master writes binlog: SQL updates (INSERT, UPDATE, DELETE) are written to the binlog.
- Master sends binlog: The master creates a log dump thread to send binlog to the slave.
- Slave writes relay log: The slave creates an I/O thread that receives the binlog and writes it to a relay log.
- Slave replays: The slave’s SQL thread reads the relay log and replays the changes to achieve consistency.
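The four steps above can be sketched as a toy model in Python. This is a simplified illustration, not how MySQL is implemented: the `Master` and `Slave` classes, the tuple-based event format, and the method names are all assumptions made for this example, and the "threads" are plain method calls.

```python
from collections import deque


class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # append-only list of change events

    def execute(self, op, key, value=None):
        # Step 1: apply the change and record it in the binlog.
        if op == "DELETE":
            self.data.pop(key, None)
        else:  # INSERT or UPDATE
            self.data[key] = value
        self.binlog.append((op, key, value))

    def dump(self, from_pos):
        # Step 2: the log dump thread streams events from a given position.
        return self.binlog[from_pos:]


class Slave:
    def __init__(self):
        self.data = {}
        self.relay_log = deque()
        self.read_pos = 0  # how far into the master's binlog we have read

    def io_thread(self, master):
        # Step 3: fetch new binlog events and append them to the relay log.
        events = master.dump(self.read_pos)
        self.relay_log.extend(events)
        self.read_pos += len(events)

    def sql_thread(self):
        # Step 4: replay relay-log events in their original order.
        while self.relay_log:
            op, key, value = self.relay_log.popleft()
            if op == "DELETE":
                self.data.pop(key, None)
            else:
                self.data[key] = value


master, slave = Master(), Slave()
master.execute("INSERT", 1, "a")
master.execute("UPDATE", 1, "b")
master.execute("INSERT", 2, "c")
master.execute("DELETE", 2)

before = slave.data == master.data  # the master did not wait for the slave
slave.io_thread(master)
slave.sql_thread()
after = slave.data == master.data   # the slave has replayed every event
```

Because the master never waits, `before` is false until the slave runs its two steps. This is the replication lag that any binlog subscriber, Canal included, has to tolerate.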
1.2 Canal Basics
Canal is a widely used data synchronization tool. It poses as a MySQL slave, subscribes to the binlog, and implements CDC (Change Data Capture) by parsing the raw byte stream into structured change events, which can be delivered as JSON.
Workflow:
- Canal server sends a dump protocol request to the MySQL master.
- The master responds by pushing binlog events to the Canal server.
- Canal server parses the events and transforms them into structured messages such as JSON.
- Canal client (via TCP or MQ) consumes these messages and syncs the data to ES.
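As an illustration of the last step, the following Python sketch turns one JSON change message into the newline-delimited action lines that Elasticsearch's bulk API accepts. The field names (`type`, `data`, `pkNames`, `isDdl`) follow Canal's flat-message layout as we understand it, so verify them against the Canal version you run. The function name `to_bulk_lines`, the sample payload, and the index name are hypothetical, and the sketch assumes a single-column primary key.

```python
import json


def to_bulk_lines(message, index):
    """Translate one Canal-style JSON change message into ES bulk lines."""
    event = json.loads(message)
    if event.get("isDdl"):
        return []  # schema changes carry no row data to index
    pk = event["pkNames"][0]  # assumption: single-column primary key
    lines = []
    for row in event["data"]:
        meta = {"_index": index, "_id": str(row[pk])}
        if event["type"] == "DELETE":
            lines.append(json.dumps({"delete": meta}))
        elif event["type"] in ("INSERT", "UPDATE"):
            # Re-indexing the full row keeps replays idempotent.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(row))
    return lines


# Hypothetical sample message for an INSERT into shop.product.
sample = json.dumps({
    "database": "shop", "table": "product", "type": "INSERT",
    "isDdl": False, "pkNames": ["id"],
    "data": [{"id": "1", "name": "pen"}],
})
bulk = to_bulk_lines(sample, "product")
```

Using the MySQL primary key as the ES document `_id` means that a message delivered twice overwrites the same document instead of creating a duplicate. The returned lines would still need to be joined with newlines and sent to ES by whichever client the project uses.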