gh-ost Analysis


gh-ost is an online MySQL schema-change tool written in Go and open-sourced by GitHub's DBA team. It solves several of the problems we currently run into with pt-online-schema-change, and its approach is quite novel. The author is very capable and is also the author of the openark kit (a toolset written mostly in Python).

Why is it called gh-ost rather than gh-osc? It coins a new abbreviation: gh-ost can be pronounced as "ghost", and the trailing "t" can be read as Transmogrifier/Translator/Transformer/Transfigurator.

Related links:

  1. The GitHub repository
  2. The author's blog post introducing gh-ost
  3. "GitHub contributes gh-ost, a new online schema-change tool, to the MySQL community"

How It Works

Instead of using triggers to capture incremental changes as pt-online-schema-change does, gh-ost impersonates a replica and reads the row-format binlog to obtain them. The data flow is shown in the figure below.

Data flow diagram

The trigger-based approach has many drawbacks, which the author covers thoroughly in the triggerless-design document. For the concrete problems we hit in production, the main ones are MDL locks and deadlocks, and neither can be worked around when using pt-online-schema-change. We have had multiple incidents caused by MDL locks and deadlocks while running pt-osc online; those MDL-lock and deadlock problems are described in detail in other posts.

With triggers, the statement that fires the trigger and the triggered statement run in the same transaction. So my first reaction to the binlog-based approach was to wonder whether ordering problems could arise (between the binlog events and the row copy). After discussing it with friends and reasoning it through, the conclusion is that no problem exists.

Because the binlog records the full row image, the binlog data is authoritative. Binlog events are also rewritten as shown below before being applied, and the copy of old data uses insert ignore, so the binlog always has the highest priority and no inconsistency can occur.

Source type    Target type
insert         replace
update         update
delete         delete

For insert and update there is no problem: whatever the order of copying old rows and applying binlog events, if the binlog event is applied later it overwrites the copied row, and if it is applied earlier, the row copy is simply ignored because it uses insert ignore.

For delete, we can work through the cases. With three operations a, b, c, three orderings are possible (b always comes after a):

  • a. delete the row from the original table
  • b. apply the delete binlog event to the ghost table
  • c. copy the old row into the ghost table
  1. c, a, b: c copies the row into the ghost table, and b finally deletes it from the ghost table;
  2. a, c, b: c is a no-op (the row is already gone from the original table), and b is a no-op as well;
  3. a, b, c: b is a no-op, and c is a no-op as well;
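
To make the argument concrete, here is a toy Go simulation (illustrative only, not gh-ost code; all names are made up) that models insert ignore as "copy only if the row still exists in the original table and is not yet in the ghost table" and runs the three orderings. The ghost table ends up empty every time:

~~~go
package main

import "fmt"

// Toy model of the original and ghost tables, keyed by primary key, used to
// check that all three orderings of a delete converge to the same state.
func main() {
	type op = func(original, ghost map[int]string)

	deleteOldRow := func(original, ghost map[int]string) { delete(original, 1) }   // a
	applyDeleteBinlog := func(original, ghost map[int]string) { delete(ghost, 1) } // b
	copyOldRow := func(original, ghost map[int]string) {                           // c
		// insert ignore semantics: copy only if the row still exists in the
		// original table and is not already present in the ghost table.
		if v, ok := original[1]; ok {
			if _, exists := ghost[1]; !exists {
				ghost[1] = v
			}
		}
	}

	orderings := []struct {
		name string
		ops  []op
	}{
		{"c,a,b", []op{copyOldRow, deleteOldRow, applyDeleteBinlog}},
		{"a,c,b", []op{deleteOldRow, copyOldRow, applyDeleteBinlog}},
		{"a,b,c", []op{deleteOldRow, applyDeleteBinlog, copyOldRow}},
	}
	for _, o := range orderings {
		original := map[int]string{1: "old value"}
		ghost := map[int]string{}
		for _, apply := range o.ops {
			apply(original, ghost)
		}
		fmt.Printf("%s -> ghost table: %v\n", o.name, ghost) // always map[]
	}
}
~~~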

Detailed Process

Step 1: obtain the table's row count, either exactly via count(*) or approximately via explain, for progress calculation. This is controlled by the --exact-rowcount option; the default is the explain estimate.

-- Method 1: estimate the row count via explain
explain select /* gh-ost */ * from %s.%s where 1=1;
-- Method 2: get an exact count via count(*)
select /* gh-ost */ count(*) as rows from %s.%s;

Step 2: fetch the minimum and maximum values of the chunking key via ORDER BY.

SELECT /* gh-ost `test`.`test` */ `id` 
FROM   `test`.`test` 
ORDER  BY `id` ASC 
LIMIT  1; 

SELECT /* gh-ost `test`.`test` */ `id` 
FROM   `test`.`test` 
ORDER  BY `id` DESC 
LIMIT  1; 

Step 3: repeatedly probe with SQL like the following to compute the upper bound of the next chunk and to check whether any data remains to be copied.

SELECT /* gh-ost `test`.`faredb_detail` iteration:25 */ `faredb_detail_id` 
FROM   (SELECT `faredb_detail_id` 
        FROM   `test`.`faredb_detail` 
        WHERE  (( `faredb_detail_id` > 118887 )) 
               AND ( ( `faredb_detail_id` < '105070846' ) 
                      OR (( `faredb_detail_id` = '105070846' )) ) 
        ORDER  BY `faredb_detail_id` ASC 
        LIMIT  1000) select_osc_chunk 
ORDER  BY `faredb_detail_id` DESC 
LIMIT  1; 

Step 4: copy the original table's rows with SQL of the following form.

insert ignore into new_table (...)
    select ... from old_table force index (`PRIMARY`)
    where <chunk range from step 3>
    lock in share mode
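
As a sketch of how steps 2 through 4 fit together, the following Go program runs the chunk-probe and chunk-copy loop. It is illustrative only, not gh-ost's actual code: the table and ghost-table names, the id/k/c/pad columns (borrowed from the sbtest example in the log below), the chunk size of 1000 and the go-sql-driver/mysql driver are all assumptions.

~~~go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// copyRows fetches the key range once (step 2), then repeatedly probes the next
// chunk boundary (step 3) and copies that chunk with INSERT IGNORE ... LOCK IN
// SHARE MODE (step 4). Names are illustrative; this is not gh-ost code.
func copyRows(db *sql.DB) error {
	var minID, maxID int64
	if err := db.QueryRow("SELECT id FROM test.test ORDER BY id ASC LIMIT 1").Scan(&minID); err != nil {
		return err // empty table or query error
	}
	if err := db.QueryRow("SELECT id FROM test.test ORDER BY id DESC LIMIT 1").Scan(&maxID); err != nil {
		return err
	}

	lower := minID - 1
	for {
		// Step 3: probe the upper bound of the next chunk (chunk-size = 1000).
		var upper int64
		err := db.QueryRow(`
			SELECT id FROM (
				SELECT id FROM test.test
				WHERE id > ? AND id <= ?
				ORDER BY id ASC LIMIT 1000
			) chunk ORDER BY id DESC LIMIT 1`, lower, maxID).Scan(&upper)
		if err == sql.ErrNoRows {
			return nil // nothing left to copy
		}
		if err != nil {
			return err
		}
		// Step 4: copy the chunk; INSERT IGNORE keeps binlog-applied rows authoritative.
		if _, err := db.Exec(`
			INSERT IGNORE INTO test._test_gho (id, k, c, pad)
			(SELECT id, k, c, pad FROM test.test FORCE INDEX (PRIMARY)
			 WHERE id > ? AND id <= ? LOCK IN SHARE MODE)`, lower, upper); err != nil {
			return err
		}
		lower = upper
	}
}

func main() {
	db, err := sql.Open("mysql", "admin:admin@tcp(test02:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := copyRows(db); err != nil {
		log.Fatal(err)
	}
}
~~~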

Step 5: binlog events are read via go/binlog/gomysql_reader.go, then rewritten by go/logic/applier.go and applied to the ghost table.

The details of the rewrite can be seen in buildDMLEventQuery:

Source type    Target type
insert         replace
update         update
delete         delete
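
As an illustration of the rewrite, the mapping can be thought of as a switch on the event type. This is a simplified sketch under assumed names (dmlEvent, buildGhostDML); the real logic lives in buildDMLEventQuery in go/logic/applier.go and additionally handles shared columns, the shared unique key and prepared-statement arguments.

~~~go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dmlEvent is a toy representation of a row-image binlog event (illustrative only).
type dmlEvent struct {
	kind  string            // "insert", "update" or "delete"
	ghost string            // ghost table name, e.g. "`test`.`_sbtest_gho`"
	row   map[string]string // column -> value from the full row image
	key   string            // shared unique key column, e.g. "id"
}

func buildGhostDML(ev dmlEvent) string {
	cols := make([]string, 0, len(ev.row))
	for c := range ev.row {
		cols = append(cols, c)
	}
	sort.Strings(cols)

	switch ev.kind {
	case "insert": // insert -> replace: the binlog event always wins over a copied row
		vals := make([]string, len(cols))
		for i, c := range cols {
			vals[i] = "'" + ev.row[c] + "'"
		}
		return fmt.Sprintf("REPLACE INTO %s (%s) VALUES (%s)",
			ev.ghost, strings.Join(cols, ", "), strings.Join(vals, ", "))
	case "update": // update -> update on the ghost table, matched on the shared unique key
		sets := make([]string, 0, len(cols))
		for _, c := range cols {
			sets = append(sets, fmt.Sprintf("%s = '%s'", c, ev.row[c]))
		}
		return fmt.Sprintf("UPDATE %s SET %s WHERE %s = '%s'",
			ev.ghost, strings.Join(sets, ", "), ev.key, ev.row[ev.key])
	case "delete": // delete -> delete on the ghost table
		return fmt.Sprintf("DELETE FROM %s WHERE %s = '%s'",
			ev.ghost, ev.key, ev.row[ev.key])
	}
	return ""
}

func main() {
	ev := dmlEvent{kind: "insert", ghost: "`test`.`_sbtest_gho`",
		row: map[string]string{"id": "42", "k": "7"}, key: "id"}
	fmt.Println(buildGhostDML(ev))
}
~~~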

Step 6: the cut-over phase (the table-rename phase). The author wrote three consecutive blog posts describing it; it is fairly involved and worth reading for anyone interested: 1. Solving the Facebook-OSC non-atomic table swap problem; 2. Solving the non-atomic table swap, Take II; 3. Solving the non-atomic table swap, Take III: making it atomic.
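
The following sketch shows the two-session lock/rename dance that makes the swap atomic, as seen in the log below. It is illustrative only, not gh-ost code: the sentinel-table definition, the mustExec helper, the sleep standing in for "waiting for events up to lock", and the go-sql-driver/mysql driver are all assumptions; only the table names and SQL statements mirror the log.

~~~go
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func mustExec(ctx context.Context, c *sql.Conn, q string) {
	if _, err := c.ExecContext(ctx, q); err != nil {
		log.Fatal(err)
	}
}

func main() {
	ctx := context.Background()
	db, err := sql.Open("mysql", "admin:admin@tcp(test02:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Session A: create the "magic" sentinel table, then lock both the original
	// table and the sentinel. LOCK TABLES is session-scoped, hence a pinned connection.
	lockConn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	mustExec(ctx, lockConn, "CREATE TABLE `_sbtest_del` (id int unsigned primary key)")
	mustExec(ctx, lockConn, "LOCK TABLES `sbtest` WRITE, `_sbtest_del` WRITE")

	// Session B: issue the atomic RENAME; it blocks behind session A's lock
	// ("Issuing and expecting this to block" in the log).
	renameDone := make(chan error, 1)
	go func() {
		renameConn, err := db.Conn(ctx)
		if err != nil {
			renameDone <- err
			return
		}
		_, err = renameConn.ExecContext(ctx,
			"RENAME TABLE `sbtest` TO `_sbtest_del`, `_sbtest_gho` TO `sbtest`")
		renameDone <- err
	}()

	// gh-ost waits here until all binlog events up to the lock are applied; a short
	// sleep stands in for that wait (gh-ost also verifies the RENAME is really blocked).
	time.Sleep(2 * time.Second)

	// Session A: drop the sentinel while still holding the lock, then unlock. Only now
	// can the pending RENAME proceed and swap the tables. Had session A died earlier,
	// the RENAME would have failed because `_sbtest_del` still existed, so a premature
	// swap is impossible.
	mustExec(ctx, lockConn, "DROP TABLE `_sbtest_del`")
	mustExec(ctx, lockConn, "UNLOCK TABLES")

	if err := <-renameDone; err != nil {
		log.Fatal(err)
	}
	log.Println("tables renamed")
}
~~~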

Step 7: the complete flow can be traced in the following log.

2016-08-12 12:00:31 INFO starting gh-ost 1.0.8
2016-08-12 12:00:31 INFO Migrating `test`.`sbtest`
2016-08-12 12:00:31 INFO connection validated on test02:5002
2016-08-12 12:00:31 INFO User has ALL privileges
2016-08-12 12:00:31 INFO binary logs validated on test02:5002
2016-08-12 12:00:31 INFO Restarting replication on test02:5002 to make sure binlog settings apply to replication thread
2016-08-12 12:00:31 INFO Table found. Engine=InnoDB
2016-08-12 12:00:31 INFO Estimated number of rows via EXPLAIN: 9936
2016-08-12 12:00:31 INFO Master found to be test01:5002
2016-08-12 12:00:31 INFO connection validated on test02:5002
2016-08-12 12:00:31 INFO Registering replica at test02:5002
2016-08-12 12:00:31 INFO Connecting binlog streamer at mysql-bin.000023:954323927
2016-08-12 12:00:31 INFO rotate to next log name: mysql-bin.000023
2016-08-12 12:00:31 INFO connection validated on test01:5002
2016-08-12 12:00:31 INFO connection validated on test01:5002
2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_gho`
2016-08-12 12:00:31 INFO Table dropped
2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_ghc`
2016-08-12 12:00:31 INFO Table dropped
2016-08-12 12:00:31 INFO Creating changelog table `test`.`_sbtest_ghc`
2016-08-12 12:00:32 INFO Changelog table created
2016-08-12 12:00:32 INFO Creating ghost table `test`.`_sbtest_gho`
2016-08-12 12:00:32 INFO Ghost table created
2016-08-12 12:00:32 INFO Altering ghost table `test`.`_sbtest_gho`
2016-08-12 12:00:32 INFO Ghost table altered
2016-08-12 12:00:32 INFO Chosen shared unique key is PRIMARY
2016-08-12 12:00:32 INFO Shared columns are id,k,c,pad
2016-08-12 12:00:32 INFO Listening on unix socket file: /tmp/gh-ost.test.sbtest.sock
2016-08-12 12:00:32 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2016-08-12 12:00:32 INFO Exact number of rows via COUNT: 10000
2016-08-12 12:00:32 INFO Migration min values: [1]
2016-08-12 12:00:32 INFO Migration max values: [10000]
# Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
# Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
# Migration started at Fri Aug 12 12:00:31 +0800 2016
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: mysql-bin.000023:954326679; ETA: N/A
Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954328041; ETA: N/A
2016-08-12 12:00:33 INFO Row copy complete
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954831040; ETA: 0s
2016-08-12 12:00:33 INFO Grabbing voluntary lock: gh-ost.84092.lock
2016-08-12 12:00:33 INFO Setting LOCK timeout as 6 seconds
2016-08-12 12:00:33 INFO Looking for magic cut-over table
2016-08-12 12:00:33 INFO Creating magic cut-over table `test`.`_sbtest_del`
2016-08-12 12:00:33 INFO Magic cut-over table created
2016-08-12 12:00:33 INFO Locking `test`.`sbtest`, `test`.`_sbtest_del`
2016-08-12 12:00:33 INFO Tables locked
2016-08-12 12:00:33 INFO Session locking original & magic tables is 84092
2016-08-12 12:00:33 INFO Writing changelog state: AllEventsUpToLockProcessed
2016-08-12 12:00:33 INFO Waiting for events up to lock
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 1/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954959331; ETA: 0s
2016-08-12 12:00:34 INFO Done waiting for events up to lock; duration=971.748469ms
# Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
# Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
# Migration started at Fri Aug 12 12:00:31 +0800 2016
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954960101; ETA: 0s
2016-08-12 12:00:34 INFO Setting RENAME timeout as 3 seconds
2016-08-12 12:00:34 INFO Session renaming tables is 84090
2016-08-12 12:00:34 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test`.`sbtest` to `test`.`_sbtest_del`, `test`.`_sbtest_gho` to `test`.`sbtest`
2016-08-12 12:00:34 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2016-08-12 12:00:34 INFO Checking session lock: gh-ost.84092.lock
2016-08-12 12:00:34 INFO Connection holding lock on original table still exists
2016-08-12 12:00:34 INFO Will now proceed to drop magic table and unlock tables
2016-08-12 12:00:34 INFO Dropping magic cut-over table
2016-08-12 12:00:34 INFO Releasing lock from `test`.`sbtest`, `test`.`_sbtest_del`
2016-08-12 12:00:34 INFO Tables unlocked
2016-08-12 12:00:34 INFO Tables renamed
2016-08-12 12:00:34 INFO Lock & rename duration: 981.435405ms. During this time, queries on `sbtest` were blocked
2016-08-12 12:00:34 INFO Looking for magic cut-over table
2016-08-12 12:00:34 INFO Droppping table `test`.`_sbtest_ghc`
2016-08-12 12:00:34 INFO Table dropped
2016-08-12 12:00:34 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2016-08-12 12:00:34 INFO -- drop table `test`.`_sbtest_del`
2016-08-12 12:00:34 INFO Done migrating `test`.`sbtest`
2016-08-12 12:00:34 INFO Done

Important Options and Typical Usage

Some important options:

-critical-load --max-load
        Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits

Whether to drop the old table once the migration completes:
-ok-to-drop-table
        Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?     
        
If an old table is found before execution starts, should it be dropped or should the run abort? The default is to abort:
-initially-drop-old-table
        Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists           

Does the -port option apply only to -host? Everywhere else, host and port must be written out in full:
-port int
        MySQL port (preferably a replica, not the master) (default 3306)

Because gh-ost offers no equivalent of recursion-method=processlist, every slave whose lag should be checked must be listed via -throttle-control-replicas. Note that -port above applies only to -host, so each replica here must be written in full as host:port:
-throttle-control-replicas string
    	List of replicas on which to check for lag; comma delimited. Example: myhost1.com:3306,myhost2.com,myhost3.com:3307
    	
After copying each chunk, gh-ost sleeps for nice-ratio * (time spent copying that chunk); the default is 0, i.e. no sleep (a small sketch of this arithmetic follows the option list):
-nice-ratio float
    	force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1.5: for every ms spend in a rowcopy chunk, spend 1.5ms sleeping immediately after    
  
By default the socket file is not deleted, so a second run fails with an error saying the socket file already exists:
-initially-drop-socket-file
    	Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!    	
-exact-rowcount
    	actually count table rows as opposed to estimate them (results in more accurate progress estimation)	
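
As referenced above, a minimal sketch of the nice-ratio arithmetic (illustrative only; copyChunk is a hypothetical stand-in for copying one chunk of rows):

~~~go
package main

import "time"

func main() {
	const niceRatio = 1.5 // e.g. --nice-ratio=1.5; 0 means no sleep (the default)

	// Hypothetical stand-in for copying one chunk of rows.
	copyChunk := func() { time.Sleep(20 * time.Millisecond) }

	for i := 0; i < 3; i++ {
		start := time.Now()
		copyChunk()
		busy := time.Since(start)
		// sleep time per chunk = nice-ratio * time spent copying that chunk
		time.Sleep(time.Duration(float64(busy) * niceRatio))
	}
}
~~~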

Typical usage:

./gh-ost \
--max-load=Threads_running=25 \
--critical-load=Threads_running=64 \
--chunk-size=1000 \
--throttle-control-replicas="test02:3306" \
--max-lag-millis=1500 \
--initially-drop-old-table \
--initially-drop-ghost-table \
--initially-drop-socket-file \
--ok-to-drop-table \
--conf="./my.cnf" \
--host="test02" \
--port=3306 \
--user="admin" \
--password="admin" \
--database="test" \
--table="test" \
--verbose \
--alter="drop index id1" \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--execute

Some shortcomings found:

  1. Compared with pt-osc there is no --check-interval option; the interval is currently hard-coded to 1s (issue link). Since 1s is small enough, the author considers the current behavior acceptable;
  2. A my.cnf file passed via --conf does not support prompt=[\h]\u@\d\r:\m:\s> (issue link); the author has already fixed this;
  3. There is no check when adding a unique index; adding a unique index can silently lose rows (pt-osc has the same problem). The author does not consider this worth fixing and it would be fairly hard to solve, so DBAs need to be aware of it before use;
  4. Compared with pt-osc there is no --check-replication-filters, i.e. no check of whether the table actually exists on the replicas (for example because of replication filters);
  5. Compared with pt-osc there is no --recursion-method=processlist; the replicas must be specified via an option instead (issue link).

Questions:

1. Given that pt-osc's --check-interval option is missing, how often does gh-ost actually check?

~~~go
// Migrate executes the complete migration logic. This is the major gh-ost function.
func (this *Migrator) Migrate() (err error) {
	// The load and slave-lag checks are started from here.
	go this.initiateThrottler()
	// ...
}

// The answer is clearly visible here: load and slave lag are checked once per second.
// time.Tick(1 * time.Second) creates a ticker that writes to the throttlerTick channel
// every second, and the for range loop below reads from it.
//
// initiateThrottler initiates the throttle ticker and sets the basic behavior of throttling.
func (this *Migrator) initiateThrottler() error {
	throttlerTick := time.Tick(1 * time.Second)

	throttlerFunction := func() {
		alreadyThrottling, currentReason := this.migrationContext.IsThrottled()
		shouldThrottle, throttleReason := this.shouldThrottle()
		if shouldThrottle && !alreadyThrottling {
			// New throttling
			this.applier.WriteAndLogChangelog("throttle", throttleReason)
		} else if shouldThrottle && alreadyThrottling && (currentReason != throttleReason) {
			// Change of reason
			this.applier.WriteAndLogChangelog("throttle", throttleReason)
		} else if alreadyThrottling && !shouldThrottle {
			// End of throttling
			this.applier.WriteAndLogChangelog("throttle", "done throttling")
		}
		this.migrationContext.SetThrottled(shouldThrottle, throttleReason)
	}
	throttlerFunction()
	for range throttlerTick {
		throttlerFunction()
	}

	return nil
}

~~~