DNS Lookup Delay
Lastmod: 2023-01-26

Overview

This issue happens when your application pauses a significant period of time when doing a DNS lookup.

Check RunBook Match

If your application regularly pauses when doing anything on the network, but successfully continues, this runbook may help.

Initial Steps Overview

  1. Gather information

  2. Run lookup on the command line

  3. Subset of URLs?

  4. Using DNSMasq?

  5. IPTables?

  6. IPv6 issue?

Detailed Steps

1) Gather information

1.1) Operating system

First, determine which OS you are running. See the Determine Operating System How-To

1.2) DNS servers

Next, determine which DNS servers you are using.

If your OS is Darwin/Mac/iOS, then run

scutil --dns > /tmp/runbooks-dns-servers.txt

and capture the output.

If your OS is Linux, then run:

cat /etc/resolv.conf | grep ^nameserver > /tmp/runbooks-dns-servers.txt

1.3) Get host you are trying to look up

You may need to get this from your application logs.

If you can’t work this out, just use a ‘standard’ one, like google.com.

This domain will be referred to as [DOMAIN] from here on.

1.4) Figure Out When This Started

If this behaviour was not always happening on this host, then try and work out when it started.

1.5) How widespread is the problem

If you can, determine whether this affects all DNS lookups or just a subset.

If it’s just a subset, then you may want to skip to Step 3

1.5) Are you running a DNS proxy?

A DNS proxy is a program that intercepts DNS requests and replies to DNS requests, or passes them on to other DNS servers. Sometimes these run as local services on the host making the request, to

1.5.1) Use netstat

First try connecting to localhost on port 53 (the standard DNS port) with netstat, eg if you run:

netstat -l | grep -w 53

and get a response that contains a line like:

udp        0      0 127.0.0.53:domain       0.0.0.0:*

then you have a server running on the DNS port on your local machine.

If you have suspicions that a DNS proxy may be running on your host, then running the same command without grep’s -w flag may help. Many DNS proxies have 53 somewhere in their port number (eg see Vagrant landrush, below).

1.5.2) DNSMasq?
  • Are you running DNSMasq? Running this may help determine that:
ps -ef | grep -i dnsmasq
1.5.3) Vagrant landrush?

Vagrant landrush is another DNS proxy that can ‘take over’ your DNS requests. It uses IPTables to divert DNS requests on the host to port 53 to itself. If Landrush can’t reply, then it passes the request on to its original destination.

It uses port 10053.

1.5.4) Kubernetes?

TODO

1.5.5) systemd-resolved?

It uses port 10053.

2) Run lookup on the command line

2.1) Reproduce problem on the command line

First, try and reproduce the problem by running a lookup on your machine, using time to get a report of the time taken, and dig to do the lookup itself:

time dig [DOMAIN]

If you can’t reproduce the problem, then it may be that your application is using a different DNS lookup method than your host does.

2.2) Run lookup against each nameserver in turn

Then, do the same lookup on each nameserver you extracted from step 1.2.

time dig @1.2.3.4 [DOMAIN]

If one of the nameservers is delayed, and the others are fast, consider moving to Solution A

2.3) Run lookup against a well known DNS server

Use a well-known public DNS server, for example Google’s 8.8.8.8:

time dig @8.8.8.8 [DOMAIN]

to check whether the issue is with the nameservers you are using.

3) Subset of URLs?

If this issue affects a subset of URLs, then consider:

  • do the affected domains return a large DNS response?

If so, then you may be hitting limits on the size of the DNS response, which in turn trigger network MTUs limits, or exceed the RFC length of DNS responses (see size limits here).

This has been seen when enterprises have

4) Using DNSMasq?

If you are using DNSMasq (a popular local DNS server), consider:

4.1) DNSMasq config

  • whether DNSMasq could be causing the problem. Config to consider includes:
IGNORE_RESOLVCONF=yes

4.2) /etc/resolv.conf DNSMasq Config

[...]
nameserver [...]

restart dnsmasq

4.3) Check DNSMasq logs

This may give a clue as to where the issue lies.

Typically, this will involve running (as root):

journalctl -u dnsmasq

4.4) Further information

See below for more background that may help, and in further information:

Possible layers involved, along with systemd

5) IPTables/NetFilter?

It may be possible that IPTables/NetFilter is interfering with your DNS request somehow.

See the DNS lookup failure article step for more guidance.

6) IPv6 issue?

If there is no delay when running

host [DOMAIN]

then the issue may be related to IPv6. See solution C.

Solutions List

A) Edit the nameserver list

B) Reduce the timeout

C) Disable IPv6

Solution Detail

A) Edit the nameserver list

If you are using Linux, then this is likely to be in /etc/resolv.conf.

Please be aware of systemd when editing /etc/resolv.conf. It is possible that changes to this file will be overwritten.

If you are using Mac/iOS, then (?TODO?)

B) Reduce the timeout

This is less of a solution than a workaround, since it masks the problem. It may help to confirm whether the delay is due to a problem with a specific DNS server among others.

If you add the line options timeout:1 to your /etc/resolv.conf file and see the delay fall to 1 second, then the issue is likely to be with a specific DNS server in your list.

C) Disable IPv6

Before applying this solution, be aware that switching off IPv6 may not be advisable if you are using it.

If you run this as root, then IPv6 will be disabled.

sysctl net.ipv6.conf.all.disable_ipv6=1

TODO: persisting this?

Check Resolution

Your application should no longer be pausing on DNS lookups.

Further Steps

None

Further Information

Network MTUs

DNSMasq Documentation

RFC1035 - Domain Names - Implementation and Specification

DNSMasq layers

systemd-resolved and /etc/resolv.conf

Systemd has made DNS configuration a lot more complicated in recent years.

There are two systemd services associated with DNS resolution.

systemd-resolved.service

resolvconf.service

TODO: more on this, and advice on how to update your configuration

Owner

Ian Miell

comments powered by Disqus