Commit aae4932

MDEV-12012/MDEV-11969 Can't remove GTIDs for a stale GTID Domain ID
As reported in MDEV-11969, "there's no way to ditch knowledge" about a domain that is no longer updated on a server. Besides cluttering output in the DBA console, stale domains can prevent a slave from connecting to the master, as MDEV-12012 witnesses.

Which domain is obsolete must be evaluated by the user (DBA), according to whether the domain info is still relevant and whether the domain will ever receive another update.

This patch introduces a method to discard obsolete gtid domains from the server binlog state. The removal requires that no event group from such a domain be present in the existing binlog files. If there are any, the containing logs must first be PURGEd in order for FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains) to succeed. Otherwise the command returns an error.

The list of obsolete domains can be computed by intersecting two sets, the earliest (first) binlog's Gtid_list and the current value of @@global.gtid_binlog_state, and extracting the domain id components from the items of the intersection.

The new DELETE_DOMAIN_ID-featured FLUSH still rotates the binlog, omitting the deleted domains from the active binlog file's Gtid_list. Note, though, that when the command is ineffective, that is, none of the domains requested for deletion exists in the binlog state, rotation does not occur.

Obsolete domain deletion is not harmful for connected slaves as long as the *purge* of master-side binlog files is synchronized with FLUSH-DELETE_DOMAIN_ID. The slaves must have processed the last event from the purged files as usual, so that they do not later request a gtid from a file that is already gone.

While the command, like an ordinary FLUSH BINARY LOGS, is not replicated, slaves, even though they have extra domains, won't suffer from reconnection errors, thanks to the master-slave gtid connection protocol allowing the master to be ignorant about a gtid domain. Should such a slave be promoted to the master role at failover, it may run the ex-master's FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains) to clean its own binlog state.

NOTES. suite/perfschema/r/start_server_low_digest.result is re-recorded as a consequence of internal parser code changes.
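The intersection described in the commit message can be sketched roughly as follows. This is only an illustration in Python, not code from the patch: the function names are hypothetical, the inputs are assumed to be the comma-separated `domain-server-seqno` strings shown by `Gtid_list` and `@@global.gtid_binlog_state`, and the extra filtering of domains that have other, advanced entries is an assumption added here for domains with several server ids.

```python
def parse_gtid_list(text):
    """Parse 'D-S-N,D-S-N,...' (optionally wrapped in []) into a set of tuples."""
    gtids = set()
    for item in text.strip().strip("[]").split(","):
        item = item.strip()
        if item:
            domain, server, seq_no = (int(part) for part in item.split("-"))
            gtids.add((domain, server, seq_no))
    return gtids


def obsolete_domain_candidates(earliest_gtid_list, binlog_state):
    """Return domain ids whose entries did not change since the earliest binlog."""
    earliest = parse_gtid_list(earliest_gtid_list)
    current = parse_gtid_list(binlog_state)
    # Items present in both sets: nothing was logged for them in existing files.
    unchanged = earliest & current
    # Assumption of this sketch: skip a domain if any of its entries advanced.
    advanced = {domain for domain, _, _ in current - earliest}
    return sorted({domain for domain, _, _ in unchanged} - advanced)


print(obsolete_domain_candidates("[1-1-1,1-2-2,11-11-11]", "1-1-1,1-2-2,11-11-12"))
```

Whether a candidate is really obsolete remains the DBA's decision, as the message states; the sketch only narrows the list.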
1 parent 7e1326c commit aae4932

20 files changed, +772 -77 lines changed
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# ==== Purpose ====
+#
+# Extract Gtid_list info from SHOW BINLOG EVENTS output masking
+# non-deterministic fields.
+#
+# ==== Usage ====
+#
+# [--let $binlog_file=filename
+#
+if ($binlog_file)
+{
+--let $_in_binlog_file=in '$binlog_file'
+}
+--replace_column 2 # 5 #
+--eval show binlog events $_in_binlog_file limit 1,1
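To illustrate what the include file above produces, here is a small Python sketch of the masking done by `--replace_column 2 # 5 #`: columns 2 and 5 (Pos and End_log_pos) are replaced by `#`, and `limit 1,1` selects the second event of the file, which the recorded results show to be the Gtid_list event. The function name and the numeric positions in the sample row are hypothetical.

```python
def mask_columns(row, masked=(2, 5), placeholder="#"):
    """Return a copy of row with the 1-based columns in `masked` replaced."""
    return [placeholder if index in masked else value
            for index, value in enumerate(row, start=1)]


# Hypothetical row: Log_name, Pos, Event_type, Server_id, End_log_pos, Info.
row = ["master-bin.000003", "256", "Gtid_list", "1", "285", "[1-1-1]"]
print(" ".join(mask_columns(row)))
```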
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+RESET MASTER;
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = ();
+and the command execution is effective thence rotates binlog as usual
+show binary logs;
+Log_name File_size
+master-bin.000001 #
+master-bin.000002 #
+Non-existed domain is warned, the command completes without rotation
+but with a warning
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
+Warnings:
+Warning 1982 The gtid domain being deleted ('99') is not in the current binlog state
+show binary logs;
+Log_name File_size
+master-bin.000001 #
+master-bin.000002 #
+SET @@SESSION.gtid_domain_id=1;
+SET @@SESSION.server_id=1;
+CREATE TABLE t (a int);
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
+FLUSH BINARY LOGS;
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
+PURGE BINARY LOGS TO 'master-bin.000003';;
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+Gtid_list of the current binlog does not contain '1':
+show binlog events in 'master-bin.000004' limit 1,1;
+Log_name Pos Event_type Server_id End_log_pos Info
+master-bin.000004 # Gtid_list 1 # []
+But the previous log's Gtid_list may have it which explains a warning from the following command
+show binlog events in 'master-bin.000003' limit 1,1;
+Log_name Pos Event_type Server_id End_log_pos Info
+master-bin.000003 # Gtid_list 1 # [1-1-1]
+Already deleted domain in Gtid_list of the earliest log is benign
+but may cause a warning
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+Warnings:
+Warning 1982 The current gtid binlog state is incompatible with a former one missing gtids from the '1-1' domain-server pair which is referred to in the gtid list describing an earlier state. Ignore if the domain ('1') was already explicitly deleted.
+Warning 1982 The gtid domain being deleted ('1') is not in the current binlog state
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0);
+ERROR HY000: Could not delete gtid domain. Reason: binlog files may contain gtids from the domain ('1') being deleted. Make sure to first purge those files.
+FLUSH BINARY LOGS;
+PURGE BINARY LOGS TO 'master-bin.000005';
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0);
+Warnings:
+Warning 1982 The gtid domain being deleted ('0') is not in the current binlog state
+Gtid_list of the current binlog does not contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 0:
+show binlog events in 'master-bin.000006' limit 1,1;
+Log_name Pos Event_type Server_id End_log_pos Info
+master-bin.000006 # Gtid_list 1 # []
+SET @@SESSION.gtid_domain_id=1;;
+SET @@SESSION.server_id=1;
+SET @@SESSION.gtid_seq_no=1;
+INSERT INTO t SET a=1;
+SET @@SESSION.server_id=2;
+SET @@SESSION.gtid_seq_no=2;
+INSERT INTO t SET a=2;
+SET @@SESSION.gtid_domain_id=11;
+SET @@SESSION.server_id=11;
+SET @@SESSION.gtid_seq_no=11;
+INSERT INTO t SET a=11;
+SET @gtid_binlog_state_saved=@@GLOBAL.gtid_binlog_state;
+FLUSH BINARY LOGS;
+SET @@SESSION.gtid_domain_id=11;
+SET @@SESSION.server_id=11;
+SET @@SESSION.gtid_seq_no=1;
+INSERT INTO t SET a=1;
+SELECT @gtid_binlog_state_saved "as original state", @@GLOBAL.gtid_binlog_state as "out of order for 11 domain state";
+as original state out of order for 11 domain state
+1-1-1,1-2-2,11-11-11 1-1-1,1-2-2,11-11-1
+PURGE BINARY LOGS TO 'master-bin.000007';
+the following command succeeds with warnings
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+Warnings:
+Warning 1982 The current gtid binlog state is incompatible with a former one having a gtid '11-11-1' which is less than the '11-11-11' of the gtid list describing an earlier state. The state may have been affected by manually injecting a lower sequence number gtid or via replication.
+DROP TABLE t;
+RESET MASTER;
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+SET @@SESSION.debug_dbug='+d,inject_binlog_delete_domain_init_error';
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
+ERROR HY000: Could not delete gtid domain. Reason: injected error.
+SHOW WARNINGS;
+Level Code Message
+Error 1982 Could not delete gtid domain. Reason: injected error.
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+# Prove basic properties of
+#
+# FLUSH BINARY LOGS DELETE_DOMAIN_ID = (...)
+#
+# The command removes the supplied list of domains from the current
+# @@global.gtid_binlog_state provided the binlog files do not contain
+# events from such domains.
+
+# The test is not format specific. One format is chosen to run it.
+--source include/have_binlog_format_mixed.inc
+
+# Reset binlog state
+RESET MASTER;
+
+# Empty list is accepted
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = ();
+--echo and the command execution is effective thence rotates binlog as usual
+--source include/show_binary_logs.inc
+
+--echo Non-existed domain is warned, the command completes without rotation
+--echo but with a warning
+--let $binlog_pre_flush=query_get_value(SHOW MASTER STATUS, Position, 1)
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
+--let $binlog_start=$binlog_pre_flush
+--source include/show_binary_logs.inc
+
+# Log one event in a specified domain and try to delete the domain
+SET @@SESSION.gtid_domain_id=1;
+SET @@SESSION.server_id=1;
+CREATE TABLE t (a int);
+
+--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+
+# the same error after log rotation
+FLUSH BINARY LOGS;
+--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+
+# the latest binlog does not really contain any events incl ones from 1-domain
+--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
+--eval PURGE BINARY LOGS TO '$purge_to_binlog';
+# So now it's safe to delete
+--error 0
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+--echo Gtid_list of the current binlog does not contain '1':
+--let $binlog_file=query_get_value(SHOW MASTER STATUS, File, 1)
+--source include/show_gtid_list.inc
+--echo But the previous log's Gtid_list may have it which explains a warning from the following command
+--let $binlog_file=$purge_to_binlog
+--source include/show_gtid_list.inc
+
+--echo Already deleted domain in Gtid_list of the earliest log is benign
+--echo but may cause a warning
+--error 0
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (1);
+
+# Few domains delete. The chosen number verifies among others how
+# expected overrun of the static buffers of underlying dynamic arrays is doing.
+--let $domain_cnt=17
+--let $server_in_domain_cnt=3
+--let $domain_list=
+--disable_query_log
+while ($domain_cnt)
+{
+--let servers=$server_in_domain_cnt
+--eval SET @@SESSION.gtid_domain_id=$domain_cnt
+while ($servers)
+{
+--eval SET @@SESSION.server_id=10*$domain_cnt + $servers
+--eval INSERT INTO t SET a=@@SESSION.server_id
+
+--dec $servers
+}
+--let $domain_list= $domain_cnt, $domain_list
+
+--dec $domain_cnt
+}
+--enable_query_log
+--let $zero=0
+--let $domain_list= $domain_list$zero
+
+--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
+--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($domain_list)
+
+# Now satisfy the safety condtion to purge log files containing $domain list
+FLUSH BINARY LOGS;
+--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
+--eval PURGE BINARY LOGS TO '$purge_to_binlog'
+--error 0
+--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($domain_list)
+--echo Gtid_list of the current binlog does not contain $domain_list:
+--let $binlog_file=query_get_value(SHOW MASTER STATUS, File, 1)
+--source include/show_gtid_list.inc
+
+# Show reaction on @@global.gtid_binlog_state not succeeding
+# earlier state as described by the 1st binlog' Gtid_list.
+# Now let it be out-order gtid logged to a domain unrelated to deletion.
+
+--let $del_d_id=1
+--eval SET @@SESSION.gtid_domain_id=$del_d_id;
+SET @@SESSION.server_id=1;
+SET @@SESSION.gtid_seq_no=1;
+INSERT INTO t SET a=1;
+SET @@SESSION.server_id=2;
+SET @@SESSION.gtid_seq_no=2;
+INSERT INTO t SET a=2;
+
+SET @@SESSION.gtid_domain_id=11;
+SET @@SESSION.server_id=11;
+SET @@SESSION.gtid_seq_no=11;
+INSERT INTO t SET a=11;
+
+SET @gtid_binlog_state_saved=@@GLOBAL.gtid_binlog_state;
+FLUSH BINARY LOGS;
+
+# Inject out of order for domain '11' before
+SET @@SESSION.gtid_domain_id=11;
+SET @@SESSION.server_id=11;
+SET @@SESSION.gtid_seq_no=1;
+INSERT INTO t SET a=1;
+
+SELECT @gtid_binlog_state_saved "as original state", @@GLOBAL.gtid_binlog_state as "out of order for 11 domain state";
+
+# to delete '1', first to purge logs containing its events
+--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
+--eval PURGE BINARY LOGS TO '$purge_to_binlog'
+
+--echo the following command succeeds with warnings
+--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID = ($del_d_id)
+
+#
+# Cleanup
+#
+
+DROP TABLE t;
+RESET MASTER;
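The behaviour exercised by the test above can be summarized with a toy model. The Python sketch below is inferred only from the recorded results and the commit message; it is not the server's implementation, and the names `flush_delete_domains` and `CannotDeleteGtidDomain`, the dictionary representation of the state, and the shortened warning texts are all assumptions made for illustration.

```python
class CannotDeleteGtidDomain(Exception):
    """Stands in for ER_BINLOG_CANT_DELETE_GTID_DOMAIN in this toy model."""


def flush_delete_domains(state, earliest_gtid_list, domains):
    """Model FLUSH BINARY LOGS DELETE_DOMAIN_ID=(domains).

    state and earliest_gtid_list map (domain_id, server_id) -> seq_no.
    Returns (new_state, warnings, rotated); raises when a domain may still
    have events in the existing binlog files.
    """
    warnings = []
    # The current state is expected to cover the earliest binlog's Gtid_list.
    for (domain, server), seq_no in sorted(earliest_gtid_list.items()):
        current = state.get((domain, server))
        if current is None:
            warnings.append(f"state misses gtids of the '{domain}-{server}' pair")
        elif current < seq_no:
            warnings.append(
                f"gtid '{domain}-{server}-{current}' is less than "
                f"'{domain}-{server}-{seq_no}'")

    new_state = dict(state)
    removed_any = False
    for domain in domains:
        entries = {key: seq for key, seq in new_state.items() if key[0] == domain}
        if not entries:
            warnings.append(f"domain ('{domain}') is not in the current binlog state")
            continue
        # Deletable only if nothing was logged in the domain since the
        # earliest binlog file was started.
        if any(earliest_gtid_list.get(key) != seq for key, seq in entries.items()):
            raise CannotDeleteGtidDomain(
                f"binlog files may contain gtids from the domain ('{domain}')")
        for key in entries:
            del new_state[key]
        removed_any = True

    # An empty list behaves like a plain FLUSH BINARY LOGS and rotates.
    rotated = removed_any or not domains
    return new_state, warnings, rotated
```

Under these assumptions the model reproduces the sequence in the result file: an empty list rotates, an unknown domain only warns, a domain with events in unpurged files raises, and after PURGE the domain is dropped from the state.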
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Check "internal" error branches of
+# FLUSH BINARY LOGS DELETE_DOMAIN_ID = (...)
+# handler.
+--source include/have_debug.inc
+--source include/have_binlog_format_mixed.inc
+
+SET @@SESSION.debug_dbug='+d,inject_binlog_delete_domain_init_error';
+--error ER_BINLOG_CANT_DELETE_GTID_DOMAIN
+FLUSH BINARY LOGS DELETE_DOMAIN_ID = (99);
+
+SHOW WARNINGS;

mysql-test/suite/perfschema/r/start_server_low_digest.result

Lines changed: 2 additions & 2 deletions
@@ -8,5 +8,5 @@ SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
 ####################################
 SELECT event_name, digest, digest_text, sql_text FROM events_statements_history_long;
 event_name digest digest_text sql_text
-statement/sql/truncate e1c917a43f978456fab15240f89372ca TRUNCATE TABLE truncate table events_statements_history_long
-statement/sql/select 3f7ca34376814d0e985337bd588b5ffd SELECT ? + ? + SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
+statement/sql/truncate 6206ac02a54d832f55015e480e6f2213 TRUNCATE TABLE truncate table events_statements_history_long
+statement/sql/select 4cc1c447d79877c4e8df0423fd0cde9a SELECT ? + ? + SELECT 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+include/master-slave.inc
+[connection master]
+SET @@SESSION.gtid_domain_id=0;
+CREATE TABLE t (a INT);
+call mtr.add_suppression("connecting slave requested to start from.*which is not in the master's binlog");
+include/stop_slave.inc
+CHANGE MASTER TO master_use_gtid=slave_pos;
+SET @@SESSION.gtid_domain_id=11;
+SET @@SESSION.server_id=111;
+SET @@SESSION.gtid_seq_no=1;
+INSERT INTO t SET a=1;
+SET @save.gtid_slave_pos=@@global.gtid_slave_pos;
+SET @@global.gtid_slave_pos=concat(@@global.gtid_slave_pos, ",", 11, "-", 111, "-", 1 + 1);
+Warnings:
+Warning 1947 Specified GTID 0-1-1 conflicts with the binary log which contains a more recent GTID 0-2-2. If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos.
+START SLAVE IO_THREAD;
+include/wait_for_slave_io_error.inc [errno=1236]
+FLUSH BINARY LOGS;
+PURGE BINARY LOGS TO 'master-bin.000002';;
+FLUSH BINARY LOGS DELETE_DOMAIN_ID=(11);
+include/start_slave.inc
+INSERT INTO t SET a=1;
+include/wait_for_slave_io_error.inc [errno=1236]
+FLUSH BINARY LOGS;
+PURGE BINARY LOGS TO 'master-bin.000004';;
+FLUSH BINARY LOGS DELETE_DOMAIN_ID=(11);
+include/start_slave.inc
+SET @@SESSION.gtid_domain_id=0;
+DROP TABLE t;
+include/rpl_end.inc
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+# In case master's gtid binlog state is divergent from the slave's gtid_slave_pos
+# slave may not be able to connect.
+# For instance when slave is more updated in some of domains, see
+# MDEV-12012 as example, the master's state may require adjustment.
+# In a specific case of an "old" divergent domain, that is there
+# won't be no more event groups from it generated, the states can be
+# made compatible with wiping the problematic domain away. After that slave
+# becomes connectable.
+#
+# Notice that the slave applied gtid state is not really required to
+# be similarly cleaned in order for replication to flow.
+# However this could lead to an expected error when the master
+# resumes binlogging of such domain which the test demonstrate.
+
+--source include/master-slave.inc
+
+--connection master
+# enforce the default domain_id binlogging explicitly
+SET @@SESSION.gtid_domain_id=0;
+CREATE TABLE t (a INT);
+--sync_slave_with_master
+
+--connection slave
+call mtr.add_suppression("connecting slave requested to start from.*which is not in the master's binlog");
+
+--source include/stop_slave.inc
+CHANGE MASTER TO master_use_gtid=slave_pos;
+
+--connection master
+# create extra gtid domains for binlog state
+--let $extra_domain_id=11
+--let $extra_domain_server_id=111
+--let $extra_gtid_seq_no=1
+--eval SET @@SESSION.gtid_domain_id=$extra_domain_id
+--eval SET @@SESSION.server_id=$extra_domain_server_id
+--eval SET @@SESSION.gtid_seq_no=$extra_gtid_seq_no
+INSERT INTO t SET a=1;
+
+#
+# Set up the slave replication state as if slave knows more events from the extra
+# domain.
+#
+--connection slave
+SET @save.gtid_slave_pos=@@global.gtid_slave_pos;
+--eval SET @@global.gtid_slave_pos=concat(@@global.gtid_slave_pos, ",", $extra_domain_id, "-", $extra_domain_server_id, "-", $extra_gtid_seq_no + 1)
+
+# unsuccessful attempt to start slave
+START SLAVE IO_THREAD;
+--let $slave_io_errno=1236
+--source include/wait_for_slave_io_error.inc
+
+--connection master
+# adjust the master binlog state
+FLUSH BINARY LOGS;
+--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
+--eval PURGE BINARY LOGS TO '$purge_to_binlog';
+# with final removal of the extra domain
+--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID=($extra_domain_id)
+
+--connection slave
+# start the slave sucessfully
+--source include/start_slave.inc
+
+--connection master
+# but the following gtid from the *extra* domain will break replication
+INSERT INTO t SET a=1;
+
+# take note of the slave io thread error due to being dismissed
+# extra domain at connection to master which tried becoming active;
+# slave is to stop.
+--connection slave
+--let $errno=1236
+--source include/wait_for_slave_io_error.inc
+
+# let's apply the very same medicine
+--connection master
+FLUSH BINARY LOGS;
+--let $purge_to_binlog= query_get_value(SHOW MASTER STATUS, File, 1)
+--eval PURGE BINARY LOGS TO '$purge_to_binlog';
+# with final removal of the extra domain
+--eval FLUSH BINARY LOGS DELETE_DOMAIN_ID=($extra_domain_id)
+
+--connection slave
+--source include/start_slave.inc
+
+#
+# cleanup
+#
+--connection master
+SET @@SESSION.gtid_domain_id=0;
+DROP TABLE t;
+
+sync_slave_with_master;
+
+--source include/rpl_end.inc
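The connection scenario in the replication test can be pictured with a deliberately simplified Python model. It assumes only what the test comments and the commit message state: a slave whose position in some domain is ahead of the master's binlog is refused (error 1236), while a domain the master knows nothing about is tolerated. The function name and the `domain_id -> seq_no` dictionaries are hypothetical, and the real GTID connection protocol considers more than this.

```python
def slave_io_can_connect(master_state, slave_pos):
    """Both arguments map domain_id -> last seq_no known for that domain."""
    for domain, slave_seq_no in slave_pos.items():
        if domain not in master_state:
            continue  # master is ignorant of the domain: tolerated
        if slave_seq_no > master_state[domain]:
            return False  # slave asks for a gtid the master never logged
    return True


# Slave is ahead in domain 11, so the first connection attempt fails.
print(slave_io_can_connect({0: 1, 11: 1}, {0: 1, 11: 2}))
# After DELETE_DOMAIN_ID=(11) on the master the same slave position connects.
print(slave_io_can_connect({0: 1}, {0: 1, 11: 2}))
```

When the master later logs into domain 11 again, the first case reappears, which is why the test has to repeat the FLUSH, PURGE and DELETE_DOMAIN_ID sequence.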

sql/lex.h

Lines changed: 1 addition & 0 deletions
@@ -179,6 +179,7 @@ static SYMBOL symbols[] = {
 { "DELAYED",SYM(DELAYED_SYM)},
 { "DELAY_KEY_WRITE",SYM(DELAY_KEY_WRITE_SYM)},
 { "DELETE",SYM(DELETE_SYM)},
+{ "DELETE_DOMAIN_ID", SYM(DELETE_DOMAIN_ID_SYM)},
 { "DESC",SYM(DESC)},
 { "DESCRIBE",SYM(DESCRIBE)},
 { "DES_KEY_FILE",SYM(DES_KEY_FILE)},
