
As part of a project I'm involved in, a system is required with as close to 99.999% uptime as possible (the system involves healthcare). The solution I am investigating involves multiple sites, each with its own load balancer, multiple internal servers, and a replicated database synchronised with every other site. In front of all of this sits a DNS-based failover system that redirects traffic if a site goes down (or is manually taken down for maintenance).

What I'm struggling with, however, is how the DNS aspect functions without itself becoming a single point of failure. I've seen talk of floating IPs (which present exactly that point of failure), of various managed services such as DNSMadeEasy (which doesn't provide the ability to fully test its failover process during the free trial, so I can't verify whether it's right for the project), and more. I've also been playing around with simple solutions such as assigning multiple A records to a domain name, which I understand falls far short given the discrepancies in how different browsers interact with such a setup.
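For reference, a multiple-A-record setup is just several address records under one name with a short TTL, along these lines (a hypothetical zone fragment; the addresses are from the RFC 5737 documentation ranges):

```
; example.com zone fragment (hypothetical addresses)
www  60  IN  A  203.0.113.10   ; site A
www  60  IN  A  198.51.100.20  ; site B
```

Resolvers typically return both records and the client picks one, but retry-on-failure behaviour varies widely between browsers and stub resolvers, which is the discrepancy mentioned above.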

For a more robust DNS-based approach, do you simply delegate the domain to a nameserver at each location, run a nameserver at each site, and update each nameserver's independent records whenever a failure is detected at another site (using scripts on each nameserver that check all the other sites)? If so, don't the same issues arise as with regularly changed A records (browsers not picking up the new records, or ignoring very low TTLs)?
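To make the question concrete, here is a minimal sketch of the decision logic such a per-site script might use. Everything here is hypothetical: the site names and addresses are invented, the actual HTTP probe and the DNS update (e.g. via `nsupdate` or a provider API) are stubbed out so the sketch is self-contained.

```python
# Hypothetical failover decision logic for a per-site health-check script.
# A real version would probe each site over HTTP(S) and push the resulting
# record set to the local nameserver; both steps are stubbed here.

SITES = {
    "site-a": "203.0.113.10",   # hypothetical addresses (RFC 5737 ranges)
    "site-b": "198.51.100.20",
    "site-c": "192.0.2.30",
}

def healthy_records(probe_results):
    """Given {site: bool} probe results, return the A records to publish.

    If every probe fails (e.g. the monitoring host itself is partitioned),
    fall back to publishing all addresses so the zone is never left empty.
    """
    up = [ip for site, ip in SITES.items() if probe_results.get(site)]
    return up if up else list(SITES.values())

# Example: site-b is down, so only the other two addresses are published.
records = healthy_records({"site-a": True, "site-b": False, "site-c": True})
```

Note the deliberate fail-safe: an empty record set would take the whole service offline, so total probe failure publishes everything rather than nothing.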

Here's a visual representation of how I understand the system would work.

I have been reading around this subject for several days now (including plenty of Q&As on here), but feel like I'm missing a fundamental piece of the puzzle.

Thanks in advance!

  • Ideally the nameservers should be located separately from the web servers. So if one of the webserver datacenters goes down, you'll still have both nameservers running. Commented Apr 20, 2017 at 19:27
  • One solution is to use a third-party DNS service that implements failover, such as Amazon Route53. Commented Apr 20, 2017 at 19:29

1 Answer


A failover system based on updating information in DNS will not be good enough for five nines of availability.

The lowest DNS TTL that can generally be relied upon to be honoured is 300 seconds, and 0.001% of a year is about 315 seconds. So a DNS-based system can absorb at most one failover per year before it breaks five nines. It doesn't matter how well you build your DNS infrastructure, since this is a limitation of the general behaviour of DNS clients, which you cannot change.
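The availability budget can be checked with a quick calculation (assuming the conventional 365-day year):

```python
# Five-nines availability budget versus the practical DNS TTL floor.
seconds_per_year = 365 * 24 * 3600                   # 31,536,000 seconds
downtime_budget = seconds_per_year * (1 - 0.99999)   # 0.001% of a year

min_reliable_ttl = 300  # seconds; lower TTLs are widely ignored by resolvers

print(round(downtime_budget))                 # ~315 seconds per year
print(downtime_budget // min_reliable_ttl)    # whole TTL windows in the budget
```

One 300-second TTL window per failover already consumes almost the entire annual budget, which is the whole argument in one number.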

I suggest you start looking at building your resilience at the IP address level, via anycast or something like that (not my area of expertise, so I can't give detailed advice there). You'll still need a good DNS infrastructure, of course, but with largely static DNS data just buying a standard service from a reputable DNS service provider will be good enough.
