How to deal with Alpine DNS issues

Apr 7, 2023
Dmitry Chuyko

Alpine Linux is a go-to OS for those striving to reduce the size of their containers. It is secure, minimalistic, performant, and free to use. What’s not to like? All is well until one day your Kubernetes clusters start behaving strangely. A closer look at the logs will tell you that the devil is in the DNS, namely in the way your Alpine-based application deals with it. And this devil is hard to chase away.

Read on to learn how the DNS problems arise and what you can do to solve or prevent them.

The root cause of Alpine DNS issues

How the problem manifests

The DNS resolution issues affect musl-based distributions, one of which is Alpine. In a nutshell, an application running in a Docker container on Kubernetes resolves most DNS queries normally, but when a DNS response is large, host resolution fails. A Java application throws UnknownHostException, and command-line tools report a similar error:

/ # ping google.com
ping: bad address 'google.com'

This DNS issue is reproducible in specific cases only. For instance, large DNS responses can be encountered in environments like K8s with many nodes running Ingress. In the majority of cases, DNS resolution works fine.

The downside is that this issue is unpredictable. Even if your containerized application runs fine on Kubernetes now, one day a large DNS response may come your way and wreak havoc in your clusters.

musl specifics

musl is a lightweight alternative to glibc, the GNU C library. It has lower overhead, but unfortunately it comes hand in hand with the DNS problem described above. Note that this is not a musl bug but intended behavior stemming from a functional difference between the musl and glibc resolvers. musl supports DNS over UDP (User Datagram Protocol) only and doesn’t support DNS over TCP, which is used for messages larger than 512 bytes, the limit for classic DNS over UDP. It also doesn’t support the Extension Mechanisms for DNS (EDNS, specified in RFC 6891), which let a client accept UDP messages larger than 512 bytes. As a result, musl has limited support for big DNS packets, and when a response is too large, name resolution in musl fails.
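To get a feel for how quickly the 512-byte limit is reached, here is a rough sketch that estimates the size of a reply carrying only A records. It assumes the standard DNS wire format with name compression, where each answer points back to the question name with a 2-byte pointer; the function names are made up for this illustration. The numbers agree with the packet sizes in the capture shown later in this post (34, 498, and 898 bytes).

```shell
#!/bin/sh
# Rough size estimate for a DNS reply that carries only IPv4 (A) records.
# Assumption: every answer refers to the question name through a 2-byte
# compression pointer, so one A record costs 2+2+2+4+2+4 = 16 bytes.

# Query size: 12-byte header + encoded name + 4 bytes (type and class).
dns_query_size() {
    name=${1%.}    # drop a trailing dot, if any
    # Encoded name = one length byte per label + label bytes + final zero,
    # which equals the length of the dotted name plus 2.
    echo $(( 12 + ${#name} + 2 + 4 ))
}

# Reply size: the echoed question plus 16 bytes per A record.
dns_a_reply_size() {
    echo $(( $(dns_query_size "$1") + 16 * $2 ))
}

# How many A records fit into a classic 512-byte UDP reply.
max_a_records_udp() {
    echo $(( (512 - $(dns_query_size "$1")) / 16 ))
}

dns_query_size node.bell-sw.org        # prints 34
dns_a_reply_size node.bell-sw.org 29   # prints 498
dns_a_reply_size node.bell-sw.org 54   # prints 898
max_a_records_udp node.bell-sw.org     # prints 29
```

Under these assumptions, a name of this length leaves room for 29 addresses in a UDP reply, so a service with 30 or more endpoints already needs TCP or EDNS.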

The musl author explains that he left out TCP support for the sake of performance. Indeed, UDP transmits data without establishing or verifying a connection, which makes the exchange faster. TCP, in turn, requires a handshake between client and server, which leads to higher latency and overhead.

There are other peculiarities of DNS resolution in musl that may cause additional problems. For instance, musl queries all configured name servers in parallel and can’t switch to sequential querying like glibc. So if you start the Docker daemon with --dns=172.17.42.1 --dns=10.0.2.15, where the former is the local DNS server and the latter is used for external DNS resolution, there is no guarantee that 172.17.42.1 will be tried first, which may lead to unforeseen failures.

To find out more about differences between musl and glibc, refer to a dedicated guide.

Possible solutions

There are feasible workarounds for the issues mentioned above. Firstly, use fully qualified domain names (FQDNs) to avoid resolution failures. An FQDN in DNS records ends with a dot (for instance, “google.com.” is an FQDN, whereas “google.com” is not). Secondly, you can use a local caching DNS server for caching and search path routing. It reduces the load on the CPU and network, and it can even speed up name resolution with glibc by compensating for the sequential nature of requests there.

Below is an example of setting up a caching server with dnsmasq, a flexible tool for this purpose:
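As a small illustration of the first workaround, the helper below appends the trailing dot when it is missing. The trailing dot marks the name as absolute, so the resolver does not combine it with search domains from /etc/resolv.conf. The function name to_fqdn is hypothetical, and the sketch assumes the input is already a complete domain name rather than a short host name.

```shell
#!/bin/sh
# Turn a complete domain name into its FQDN form by adding a trailing dot.
# Assumption: the argument is a full name such as "google.com", not a short
# host name that still needs a search domain.
to_fqdn() {
    case "$1" in
        *.) printf '%s\n' "$1" ;;    # already ends with a dot
        *)  printf '%s.\n' "$1" ;;   # append the missing dot
    esac
}

to_fqdn google.com     # prints google.com.
to_fqdn google.com.    # prints google.com.
```

A call such as ping "$(to_fqdn google.com)" then passes the absolute name to the resolver.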

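# Install dnsmasq
apk add dnsmasq

# Point the system resolver at the local caching server
cat > /etc/resolv.conf << EOF
nameserver 127.0.0.1
EOF

# Configure dnsmasq to listen locally and forward to your upstream servers
# (replace IP_address_1 and IP_address_2 with real addresses)
cat >> /etc/dnsmasq.conf << EOF
port=53
listen-address=127.0.0.1
strict-order
no-resolv
no-poll
server=IP_address_1
server=IP_address_2
EOF

# Start dnsmasq now and on every boot
rc-update add dnsmasq default
rc-service dnsmasq start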
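After a setup like this, it may be worth confirming that the system resolver really points at the local cache, since a resolv.conf line without the nameserver keyword is not picked up as a name server. The sketch below is one way to check; the function names are hypothetical, and the file path defaults to /etc/resolv.conf.

```shell
#!/bin/sh
# Print the name servers listed in a resolv.conf-style file, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed only if 127.0.0.1 is the sole name server in the file.
uses_local_cache() {
    [ "$(list_nameservers "$1")" = "127.0.0.1" ]
}
```

For example, uses_local_cache && echo "resolver points at dnsmasq" prints the message only when the local cache is the single configured name server.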

Finally, you can use a specialized DNS library to resolve names whose responses exceed 512 bytes.

As far as the problem with large DNS responses is concerned, the Alpine community is aware of it and has been working on fixes for quite a while now. The exciting news is that the fixes are ready! The following set of patches, taken from the musl upstream repository, adds DNS TCP fallback support and further network fixes.

0009-fix-fallback-when-ipv6-is-disabled-but-resolv.conf-h.patch
0010-dns-fail-if-ipv6-is-disabled-and-resolv.conf-has-onl.patch
0011-res_mkquery-error-out-on-consecutive-final-dots-in-n.patch
0012-dns-treat-names-rejected-by-res_mkquery-as-nonexiste.patch
0013-fix-return-value-of-gethostnbyname-2-_r-on-result-no.patch
0014-remove-impossible-error-case-from-gethostbyname2_r.patch
0015-fix-error-cases-in-gethostbyaddr_r.patch
0016-getaddrinfo-add-EAI_NODATA-error-code-to-distinguish.patch
0017-adapt-res_msend-DNS-query-core-for-working-with-mult.patch
0018-res_send-use-a-temp-buffer-if-caller-s-buffer-is-und.patch
0019-dns-implement-tcp-fallback-in-__res_msend-query-core.patch
0020-getaddrinfo-dns-lookup-use-larger-answer-buffer-to-h.patch
0021-dns-query-core-detect-udp-truncation-at-recv-time.patch
0022-dns-response-handling-ignore-presence-of-wrong-type-.patch
0023-dns-response-handling-don-t-treat-too-many-addresses.patch
0024-clean-up-dns_parse_callback.patch
0025-fix-return-value-of-gethostby-name-2-addr-with-no-re.patch
0026-inet_pton-fix-uninitialized-memory-use-for-IPv4-mapp.patch
0027-dns-prefer-monotonic-clock-for-timeouts.patch
0105-dns-check-length-field-in-tcp-response-message.patch

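These patches allow the resolver to use TCP for DNS data exchange. For instance, you can check a large DNS reply (54 A records, 10.23.0.2 through 10.23.0.55) by calling getaddrinfo(). Before the patches, we could see the following output. The “|” after the query ID marks a truncated reply: only 29 of the 54 records arrive, and the resolver never asks for the rest: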

IP 10.0.2.15.60368 > 10.0.2.2.53: 7847+ A? node.bell-sw.org. (34)

IP 10.0.2.2.53 > 10.0.2.15.60368: 7847*-| 29/0/0 A 10.23.0.48, A 10.23.0.52, A 10.23.0.24, A 10.23.0.54, A 10.23.0.6, A 10.23.0.17, A 10.23.0.2, A 10.23.0.30, A 10.23.0.22, A 10.23.0.21, A 10.23.0.3, A 10.23.0.9, A 10.23.0.4, A 10.23.0.47, A 10.23.0.16, A 10.23.0.14, A 10.23.0.46, A 10.23.0.42, A 10.23.0.8, A 10.23.0.31, A 10.23.0.5, A 10.23.0.13, A 10.23.0.20, A 10.23.0.10, A 10.23.0.36, A 10.23.0.40, A 10.23.0.12, A 10.23.0.7, A 10.23.0.50 (498)

IP 10.0.2.15.60368 > 10.0.2.2.53: 8614+ AAAA? node.bell-sw.org. (34)

IP 10.0.2.2.53 > 10.0.2.15.60368: 8614*- 0/1/0 (79)

However, with the patches, the resolver reacts to the same truncated UDP reply by repeating the query over TCP and receives all 54 records:

IP 10.0.2.15.55127 > 10.0.2.2.53: 2485+ A? node.bell-sw.org. (34)

IP 10.0.2.15.55127 > 10.0.2.2.53: 3608+ AAAA? node.bell-sw.org. (34)

IP 10.0.2.2.53 > 10.0.2.15.55127: 2485*-| 29/0/0 A 10.23.0.47, A 10.23.0.18, A 10.23.0.43, A 10.23.0.41, A 10.23.0.9, A 10.23.0.48, A 10.23.0.11, A 10.23.0.51, A 10.23.0.8, A 10.23.0.29, A 10.23.0.27, A 10.23.0.15, A 10.23.0.5, A 10.23.0.7, A 10.23.0.19, A 10.23.0.17, A 10.23.0.12, A 10.23.0.49, A 10.23.0.2, A 10.23.0.34, A 10.23.0.14, A 10.23.0.38, A 10.23.0.3, A 10.23.0.45, A 10.23.0.10, A 10.23.0.22, A 10.23.0.55, A 10.23.0.20, A 10.23.0.4 (498)

IP 10.0.2.2.53 > 10.0.2.15.55127: 3608*- 0/1/0 (79)

IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [S], seq 562297716, win 64240, options [mss 1460,sackOK,TS val 2626445450 ecr 0,nop,wscale 9,exp-tfo cookiereq], length 0

IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [S.], seq 877696001, ack 562297717, win 65535, options [mss 1460], length 0

IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [.], ack 1, win 64240, length 0

IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [P.], seq 1:37, ack 1, win 64240, length 36 2485+ A? node.bell-sw.org. (34)

IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [.], ack 37, win 65535, length 0

IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [P.], seq 1:901, ack 37, win 65535, length 900 2485*- 54/0/0 A 10.23.0.50, A 10.23.0.2, A 10.23.0.24, A 10.23.0.3, A 10.23.0.8, A 10.23.0.5, A 10.23.0.32, A 10.23.0.29, A 10.23.0.38, A 10.23.0.23, A 10.23.0.40, A 10.23.0.48, A 10.23.0.7, A 10.23.0.30, A 10.23.0.13, A 10.23.0.14, A 10.23.0.26, A 10.23.0.15, A 10.23.0.6, A 10.23.0.43, A 10.23.0.17, A 10.23.0.9, A 10.23.0.20, A 10.23.0.42, A 10.23.0.10, A 10.23.0.44, A 10.23.0.36, A 10.23.0.22, A 10.23.0.11, A 10.23.0.33, A 10.23.0.4, A 10.23.0.18, A 10.23.0.12, A 10.23.0.19, A 10.23.0.16, A 10.23.0.45, A 10.23.0.35, A 10.23.0.31, A 10.23.0.28, A 10.23.0.53, A 10.23.0.39, A 10.23.0.25, A 10.23.0.27, A 10.23.0.51, A 10.23.0.34, A 10.23.0.21, A 10.23.0.46, A 10.23.0.47, A 10.23.0.49, A 10.23.0.41, A 10.23.0.52, A 10.23.0.37, A 10.23.0.54, A 10.23.0.55 (898)

IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [.], ack 901, win 63900, length 0

IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [R.], seq 37, ack 901, win 63900, length 0
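To spot such truncated replies in your own environment, you could filter a packet capture for them. The sketch below assumes the captures above are tcpdump output, where a reply is printed as the query ID followed by flag characters and the record counts, and where “|” among the flags stands for the truncation (TC) bit. The function name truncated_replies is hypothetical.

```shell
#!/bin/sh
# Read tcpdump-style DNS lines on stdin and print only the truncated replies.
# Assumption: a reply looks like "<id><flags> <an>/<ns>/<ar> ...", and a "|"
# among the flags marks the TC (truncated) bit, as in the captures above.
truncated_replies() {
    grep -E ' [0-9]+[*-]*\|[^ ]* [0-9]+/[0-9]+/[0-9]+'
}
```

A live check could then look like tcpdump -n -l port 53 | truncated_replies, which prints a line each time a DNS server has to cut its answer short.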

These patches are already integrated into Alpine 3.18, so upgrading will eliminate the above issues.
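If you maintain many images, a quick version check can tell you which ones still predate the fix. The following is a minimal sketch that takes a release string in the form found in /etc/alpine-release (for example, 3.18.0) and compares it with 3.18. The function name is hypothetical, and version strings that do not have a plain numeric major.minor prefix are reported as unknown rather than guessed at.

```shell
#!/bin/sh
# Succeed if the given Alpine release is 3.18 or newer.
# Returns 0 for 3.18+, 1 for older releases, 2 if the version is unparsable.
# With no argument, the version is read from /etc/alpine-release.
alpine_has_dns_tcp_fallback() {
    version=${1:-$(cat /etc/alpine-release 2>/dev/null)}
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    # Reject empty or non-numeric components (e.g. pre-release suffixes).
    for part in "$major" "$minor"; do
        case "$part" in
            ''|*[!0-9]*) return 2 ;;
        esac
    done
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 18 ]; }
}
```

Inside a container, alpine_has_dns_tcp_fallback || echo "consider upgrading" flags releases older than 3.18 as well as versions the check cannot parse.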

However, if you are not planning to upgrade your Alpine version soon, or don't want to wait for another long-lasting problem to emerge, consider migrating to Alpaquita Linux, a 100% Alpine-compatible distribution backed by vendor support.

Lightweight Alpaquita Linux with musl and glibc

The motivation behind Alpaquita was to provide companies with a small Linux distribution that overcomes known Alpine issues, including the ones described above. Therefore, we provide two Alpaquita options: with optimized musl and glibc.

The BellSoft engineers took the above patches from the musl upstream repository and integrated them into our musl builds for Alpaquita. Therefore, you can use Alpaquita Linux with optimized musl without the risk of encountering DNS issues.

However, if you have been using glibc-based Linux or are still cautious about musl, take a look at our glibc-based Alpaquita variant. It is almost as small as Alpine (8.32 MB), and we also tuned the libc implementation so that it shows even better performance than the default glibc in some cases.

Comparison of container images based on various Linux distributions

Alpaquita comes with other enhancements as well:

  • Increased kernel performance;
  • Additional security features;
  • Four malloc implementations for various workloads;
  • LTS releases as part of commercial support.

Coupled with Liberica Lite, Alpaquita enables you to create microcontainers without sacrificing performance and usability.

Have we piqued your interest? Download the white paper on Alpaquita to learn more about our Linux distro: technical characteristics, support options, and comparative performance studies. Or head over to Docker Hub and try free Alpaquita Stream with your application right now!
