Thursday, March 19, 2009

Misuse of Round-Robin DNS

This is a "re-print" of an article I had on that I notice that a number of people search for.

See additional information on how not to use Round-Robin DNS.

The use of Round-Robin DNS for load-balancing has been around for a number of years. It is meant to serve as a way to have multiple hosts with different IP addresses represent the sam hostname. It is useful, but as dedicated load-balancing technology began to evolve, its use began to decrease, as it was not sensitive to the conditions that exist within a server-farm; it simply told clients which IP they should connect to, without any consideration of the condition of the server they were connecting to.

Recently, I have seen a few instances where companies have switched back to Round-Robin DNS without considering the pitfalls and limitations of this particular method of load-balancing. The most dangerous of these is the limitation, set out in RFC 1035, that a DNS message carried in UDP cannot exceed 512 bytes.
2.3.4. Size limits

Various objects and parameters in the DNS have size limits. They are
listed below. Some could be easily changed, others are more

labels 63 octets or less
names 255 octets or less
TTL positive values of a signed 32 bit number.
UDP messages 512 octets or less

When a UDP DNS message exceeds 512 octets/bytes, the TRUNCATED bit is included in the response, indicating to the client/resolver that not all of the answers were returned, and they should re-query using a TCP DNS message.

It's the final phrase of the previous paragraph that should set everyone's ears a-twitter: the only thing that should be requested using TCP DNS messages are zone transfers between servers. If a someone who is not one of your authorized servers is attempting to connect using TCP to port 53 of your DNS server, they may be attempting to do an illegal zone transfer of your domain information.

As a result, TCP connections to port 53 are usually blocked inbound (and outbound, if you run an ISP and you assume your users are up to no good). So, guess what happens when the truncated DNS information is re-requested over TCP? Web performance is negatively affected by between 21 and 93 seconds.[1]

So, simply put, DNS information needs to be tightly managed and examined, to ensure that only the most appropriate information sent out when a DNS query arrives. Using Round-Robin DNS to load-balance more than 5 servers indicates that you should be examining other load-balancing schemes.

24 April 2003 — Round-Robin Authority Records
This morning, I spoke with a client who was experiencing extremely variable DNS lookup times on their site after they implemented a new architecture. This architecture saw them locate a full mirror of the production site in Europe, for any potential failover situations.

This looks good on paper — apparently the project plan ran to eleven pages — but they made one very serious technical faux pas: they deployed two of the authoritative name servers in the UK.

The DNS gurus in the audience are saying "So what?". This is because when you set up 4 authoritative DNS servers correctly, the query is sent to all 4 simultaneously and the one with the fastest response time is used. This usually results from the following type of configuration:

The client this morning, however, had a very different type of configuration.

When the authority record is returned in this fashion, the results are easily understood. The host name is the same for all four IP addresses, so the querying name server does what it is supposed to do in these situations: resort to the Round-RObin algorithm. Instead of querying all four name servers simultaneously, the querying name server rotates through the authoritative names.

Depending on where the authoritative name servers are located, the DNS lookup time could vary wildly. In the case of the client this morning, 50% of the DNS lookups were being routed to the UK, regardless of where the querying name server was located.

[1] This value varies depending on the operating system. For Windows 2000, the TCP timeout is 21 seconds; for Linux, this value is 93 seconds.

No comments:

Post a Comment