- Notifications
You must be signed in to change notification settings - Fork 68
Description
When a sentinel failover happens between initial connection in parent process and fork, somewhere in hiredis-client (or hiredis itself) a connection to old primary is wrongly reused, despite correct discovery of new primary via sentinel. This produces Expected to connect to a master, but the server is a replica (RedisClient::FailoverError)
Note resolve_master being called twice after fork, and while resolved correctly (to port 6980) the connection still goes to the previous master (which is now replica of 6980), while having correct port in config of the client object.
This does not occur when using pure-ruby redis-client.
How to reproduce:
docker-compose.yml to setup a simple primary-replica pair with one sentinel:
services: redis-master: &redis image: redis:8.2-alpine # exact version is not relevant, also tested on 7.4 container_name: redis-master ports: ["6379:6379"] entrypoint: ["redis-server", "--bind", "0.0.0.0", "--protected-mode", "no", "--appendonly", "no", "--save", ""] command: ["--port", "6379"] redis-replica: <<: *redis container_name: redis-replica ports: ["6380:6380"] command: ["--port", "6380", "--replicaof", redis-master, "6379"] depends_on: [redis-master] redis-sentinel: <<: *redis container_name: redis-sentinel ports: ["26379:26379"] entrypoint: [sh, "-c"] command: - | cat > /tmp/sentinel.conf << EOF port 26379 bind 0.0.0.0 protected-mode no dir /tmp sentinel monitor mymaster redis-master 6379 1 sentinel down-after-milliseconds mymaster 5000 sentinel failover-timeout mymaster 10000 sentinel parallel-syncs mymaster 1 sentinel resolve-hostnames yes EOF exec redis-sentinel /tmp/sentinel.conf depends_on: [redis-master, redis-replica]Script:
#!/usr/bin/env ruby # frozen_string_literal: true require 'bundler/inline' gemfile do source 'https://rubygems.org' gem 'redis-client', '0.26.0' # also tested 0.22 gem 'hiredis-client', '0.26.0' end module SentinelOverride def resolve_master super.tap do |res| res.instance_variable_set(:@host, '127.0.0.1') # because script runs outside docker puts "master resolved to #{res.port}" end end end RedisClient::SentinelConfig.prepend SentinelOverride $log = true module ConnectionMixinLogging def call(command, timeout) super.tap {|res| puts " @#{self.config.port} #{command.inspect} => #{res.inspect}" if res != $prev_res && $log $prev_res = res } end def call_pipelined(commands, timeouts, exception: true) super.tap { |res| puts " @#{self.config.port} #{commands.inspect} => #{res.inspect}" if res != $prev_res && $log $prev_res = res } end end RedisClient::HiredisConnection.prepend(ConnectionMixinLogging) if defined?(RedisClient::HiredisConnection) def wait_for(msg, no_log: false, &block) puts "Waiting #{msg}" $log = false if no_log 60.times { break if block.call; sleep 1 } $log = true if no_log end redis_config = { role: :master, sentinels: [{ host: "127.0.0.1", port: 26379 }], url: "redis://mymaster" }.freeze redis = RedisClient.sentinel(**redis_config).new_client redis.call("PING") # so that connection is established in parent process sentinel = RedisClient.config(**redis_config[:sentinels].first).new_client old_master = sentinel.call("sentinel", "get-master-addr-by-name", "mymaster") puts "Starting failover" sentinel.call("sentinel", "failover", "mymaster") wait_for("failover") { old_master != sentinel.call("sentinel", "get-master-addr-by-name", "mymaster") } wait_for("old master to become replica...", no_log: true) { sentinel.call("sentinel", "REPLICAS", "mymaster").any? { _1['role-reported'] == 'slave' } } puts puts "Forking" Process.waitpid(fork { redis.close # just to be sure, but does not help puts "#{Process.pid}: #{redis.call("role").inspect}" })Result:
@26379 [["HELLO", "3"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>3, "mode"=>"sentinel", "modules"=>[]}] @26379 ["SENTINEL", "get-master-addr-by-name", "mymaster"] => ["172.25.0.2", "6379"] @26379 ["SENTINEL", "sentinels", "mymaster"] => [] master resolved to 6379 @6379 [["HELLO", "3"], ["ROLE"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>8, "mode"=>"standalone", "role"=>"master", "modules"=>[{"name"=>"vectorset", "ver"=>1, "path"=>"", "args"=>[]}]}, ["master", 1405255, [["172.25.0.3", "6380", "1405255"]]]] @6379 ["PING"] => "PONG" @26379 [["HELLO", "3"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>4, "mode"=>"sentinel", "modules"=>[]}] @26379 ["sentinel", "get-master-addr-by-name", "mymaster"] => ["172.25.0.2", "6379"] Starting failover @26379 ["sentinel", "failover", "mymaster"] => "OK" Waiting failover @26379 ["sentinel", "get-master-addr-by-name", "mymaster"] => ["172.25.0.2", "6379"] @26379 ["sentinel", "get-master-addr-by-name", "mymaster"] => ["172.25.0.3", "6380"] Waiting old master to become replica... Forking @6379 [["HELLO", "3"], ["ROLE"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>15, "mode"=>"standalone", "role"=>"replica", "modules"=>[{"name"=>"vectorset", "ver"=>1, "path"=>"", "args"=>[]}]}, ["slave", "172.25.0.3", 6380, "connected", 1406786]] @26379 [["HELLO", "3"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>5, "mode"=>"sentinel", "modules"=>[]}] @26379 ["SENTINEL", "get-master-addr-by-name", "mymaster"] => ["172.25.0.3", "6380"] @26379 ["SENTINEL", "sentinels", "mymaster"] => [] master resolved to 6380 @6380 [["HELLO", "3"], ["ROLE"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>16, "mode"=>"standalone", "role"=>"replica", "modules"=>[{"name"=>"vectorset", "ver"=>1, "path"=>"", "args"=>[]}]}, ["slave", "172.25.0.3", 6380, "connected", 1406786]] @26379 [["HELLO", "3"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>6, "mode"=>"sentinel", "modules"=>[]}] @26379 ["SENTINEL", "get-master-addr-by-name", "mymaster"] => ["172.25.0.3", "6380"] @26379 ["SENTINEL", "sentinels", "mymaster"] => [] master resolved to 6380 @6380 [["HELLO", "3"], ["ROLE"]] => [{"server"=>"redis", "version"=>"8.2.1", "proto"=>3, "id"=>17, "mode"=>"standalone", "role"=>"replica", "modules"=>[{"name"=>"vectorset", "ver"=>1, "path"=>"", "args"=>[]}]}, ["slave", "172.25.0.3", 6380, "connected", 1406786]] /Users/vasfed/.rvm/gems/ruby-3.2.9/gems/redis-client-0.26.0/lib/redis_client/sentinel_config.rb:110:in `check_role!': Expected to connect to a master, but the server is a replica (RedisClient::FailoverError) from /Users/vasfed/.rvm/gems/ruby-3.2.9/gems/redis-client-0.26.0/lib/redis_client.rb:816:in `connect' from /Users/vasfed/.rvm/gems/ruby-3.2.9/gems/redis-client-0.26.0/lib/redis_client.rb:781:in `raw_connection' from /Users/vasfed/.rvm/gems/ruby-3.2.9/gems/redis-client-0.26.0/lib/redis_client.rb:739:in `ensure_connected' from /Users/vasfed/.rvm/gems/ruby-3.2.9/gems/redis-client-0.26.0/lib/redis_client.rb:318:in `call' from ./test_hiredis.rb:62:in `block in <main>' from ./test_hiredis.rb:60:in `fork' from ./test_hiredis.rb:60:in `<main>' Expected result
When running the same with pure-ruby redis-client (sans raw connection data):
master resolved to 6379 Starting failover Waiting failover Waiting old master to become replica... Forking master resolved to 6380 16330: ["master", 1430905, [["172.25.0.2", "6379", "1430770"]]]