Alpine Linux is a go-to OS for those striving to reduce the size of their containers. It is secure, minimalistic, performant, and free to use. What’s there not to like? All is well until one day your Kubernetes clusters start behaving strangely. A closer look at the logs will tell you that the devil is in the DNS, namely in the way your Alpine-based application deals with it. And this devil is hard to chase away.
Read on to learn how these DNS problems may arise and what you can do to solve or prevent them.
The root cause of Alpine DNS issues
How the problem manifests
The DNS resolution issues affect musl-based distributions, Alpine among them. In a nutshell, an application running in a Docker container on Kubernetes resolves most names normally, but when a DNS response is large, host resolution fails: a Java application throws an UnknownHostException, and other programs fail with a similar error, for example:

```
/ # ping google.com
ping: bad address 'google.com'
```
This issue is reproducible only in specific cases. For instance, large DNS responses are common in environments such as Kubernetes clusters with many nodes running Ingress. In the majority of cases, though, DNS resolution works fine.
The downside is that the issue is unpredictable. Even if your containerized application runs fine on Kubernetes now, one day a large DNS response may come your way and wreak havoc in your clusters.
musl is a lightweight alternative to glibc. It has a smaller overhead but, unfortunately, comes hand in hand with the DNS problem described above. Note that this is not a musl bug but intended behavior stemming from a functional difference between the musl and glibc resolvers. musl supports DNS over UDP (User Datagram Protocol) only and does not support DNS over TCP, which DNS falls back to when a response does not fit into a 512-byte UDP packet. It also does not support the Extension Mechanisms for DNS (EDNS), an RFC-specified feature that allows a client to accept UDP packets larger than the classic 512-byte limit. As a result, musl has limited support for big DNS responses: when a response is too large, the musl resolver cannot retrieve it in full.
The musl author explains that he did not add TCP support for the sake of performance. Indeed, UDP transmits data without establishing or verifying a connection, which makes data transfer faster. TCP, in turn, requires a handshake between client and server, which adds latency and overhead.
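To check whether a particular name is affected, you can query it the way a resolver without EDNS and without TCP fallback would, and look for the TC (truncated) flag in the reply header. The sketch below is one possible way to do this, not an official tool. It assumes `dig` is installed (on Alpine it is part of the bind-tools package), and the helper names `is_truncated` and `check_name` are hypothetical. The flag check only inspects dig's `;; flags:` header line.

```sh
# Hypothetical helper: succeeds if the dig output read on stdin has the TC
# (truncated) flag in its header line, e.g.
# ";; flags: qr aa tc rd ra; QUERY: 1, ANSWER: 29, ..."
is_truncated() {
    grep -Eq '^;; flags:[^;]*[[:space:]]tc([[:space:]]|;)'
}

# Hypothetical helper: queries the A records of NAME like a UDP-only resolver.
#   +noedns  send a classic query without EDNS, so the 512-byte limit applies
#   +ignore  keep the truncated UDP answer instead of retrying over TCP
check_name() {
    if dig +noedns +ignore "$1" A | is_truncated; then
        echo "$1: UDP answer truncated; a resolver without TCP fallback misses records"
    else
        echo "$1: answer fits into a single UDP packet"
    fi
}
```

For example, `check_name large.example.com`, where large.example.com stands for the name you suspect. Because `+ignore` suppresses dig's own TCP retry, the result approximates what a UDP-only resolver receives.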
There are other peculiarities of DNS resolution in musl that may cause additional problems. For instance, musl queries all configured name servers in parallel and, unlike glibc, cannot be switched to querying them sequentially. So if you start the Docker daemon with --dns=172.17.42.1 --dns=10.0.2.15, where the former is the local DNS server and the latter is used for external DNS resolution, there is no guarantee that 172.17.42.1 will be consulted first. Whichever server answers first wins, which can lead to unforeseen failures.
There are feasible workarounds for the issues mentioned above. Firstly, use fully qualified domain names (FQDNs) to avoid resolution failures. An FQDN in DNS records ends with a dot (for instance, “google.com.” is an FQDN, whereas “google.com” is not). The trailing dot tells the resolver to skip the search domains listed in /etc/resolv.conf, so fewer queries are sent. Secondly, you can run a local caching DNS server for caching and search path routing. It reduces CPU and network load and can even speed up name resolution with glibc by compensating for the sequential nature of its requests.
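To see why the trailing dot matters, consider how a stub resolver expands a name. In Kubernetes, a pod's /etc/resolv.conf usually lists several search domains together with `options ndots:5`, so a name with fewer than five dots, such as google.com, is tried against every search domain before being tried as-is. The hypothetical `lookup_order` function below is a simplified model of musl's ordering, written for illustration: search domains are tried first when the name has fewer than ndots dots, and skipped entirely when the name ends with a dot or already has enough dots. It only prints names and sends no queries.

```sh
# Hypothetical helper (simplified model of musl's ndots/search handling):
# prints, in order, the names a resolver would query for NAME.
#   usage: lookup_order NAME NDOTS [SEARCH_DOMAIN...]
lookup_order() {
    name=$1; ndots=$2; shift 2

    # A trailing dot marks a fully qualified name: query it as-is, no search.
    case $name in
        *.) echo "$name"; return ;;
    esac

    # Count the dots by comparing the length with and without them.
    stripped=$(printf '%s' "$name" | tr -d '.')
    dots=$(( ${#name} - ${#stripped} ))

    # Fewer dots than ndots: every search domain is tried first.
    if [ "$dots" -lt "$ndots" ]; then
        for domain in "$@"; do
            echo "$name.$domain."
        done
    fi

    # Finally, the name itself as an absolute name.
    echo "$name."
}
```

For example, `lookup_order google.com 5 default.svc.cluster.local svc.cluster.local cluster.local` prints four candidate names ending with google.com., while the same call with google.com. prints only google.com.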
Below is an example of setting up dnsmasq, a flexible caching DNS server, on Alpine:
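```sh
# Install dnsmasq, a lightweight caching DNS forwarder
apk add dnsmasq

# Send all local lookups to dnsmasq on the loopback interface
cat > /etc/resolv.conf << EOF
nameserver 127.0.0.1
EOF

# strict-order: query the upstream servers in the order listed below
# no-resolv:    take upstream servers from this file, not from /etc/resolv.conf
# no-poll:      do not watch /etc/resolv.conf for changes
cat >> /etc/dnsmasq.conf << EOF
port=53
listen-address=127.0.0.1
strict-order
no-resolv
no-poll
server=IP_address_1
server=IP_address_2
EOF

# Start dnsmasq now and at boot (OpenRC)
rc-update add dnsmasq default
rc-service dnsmasq start
```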
Finally, you can use a specialized DNS library to resolve names whose DNS responses exceed 512 bytes.
As far as the problem with large DNS responses is concerned, the Alpine community is aware of it and has been working on fixes for quite a while now. The exciting news is that the fixes are ready! The following patches, taken from the upstream musl repository, add DNS TCP fallback support and additional network fixes:
- 0009-fix-fallback-when-ipv6-is-disabled-but-resolv.conf-h.patch
- 0010-dns-fail-if-ipv6-is-disabled-and-resolv.conf-has-onl.patch
- 0011-res_mkquery-error-out-on-consecutive-final-dots-in-n.patch
- 0012-dns-treat-names-rejected-by-res_mkquery-as-nonexiste.patch
- 0013-fix-return-value-of-gethostnbyname-2-_r-on-result-no.patch
- 0014-remove-impossible-error-case-from-gethostbyname2_r.patch
- 0015-fix-error-cases-in-gethostbyaddr_r.patch
- 0016-getaddrinfo-add-EAI_NODATA-error-code-to-distinguish.patch
- 0017-adapt-res_msend-DNS-query-core-for-working-with-mult.patch
- 0018-res_send-use-a-temp-buffer-if-caller-s-buffer-is-und.patch
- 0019-dns-implement-tcp-fallback-in-__res_msend-query-core.patch
- 0020-getaddrinfo-dns-lookup-use-larger-answer-buffer-to-h.patch
- 0021-dns-query-core-detect-udp-truncation-at-recv-time.patch
- 0022-dns-response-handling-ignore-presence-of-wrong-type-.patch
- 0023-dns-response-handling-don-t-treat-too-many-addresses.patch
- 0024-clean-up-dns_parse_callback.patch
- 0025-fix-return-value-of-gethostby-name-2-addr-with-no-re.patch
- 0026-inet_pton-fix-uninitialized-memory-use-for-IPv4-mapp.patch
- 0027-dns-prefer-monotonic-clock-for-timeouts.patch
- 0105-dns-check-length-field-in-tcp-response-message.patch
These patches let the musl resolver fall back to TCP when a UDP response is truncated. For instance, you can check a large DNS reply (54 A records, 10.23.0.2 to 10.23.0.55) by calling getaddrinfo() and capturing the traffic with tcpdump. Before the patches, the output was as follows:
```
IP 10.0.2.15.60368 > 10.0.2.2.53: 7847+ A? node.bell-sw.org. (34)
IP 10.0.2.2.53 > 10.0.2.15.60368: 7847*-| 29/0/0 A 10.23.0.48, A 10.23.0.52, A 10.23.0.24, A 10.23.0.54, A 10.23.0.6, A 10.23.0.17, A 10.23.0.2, A 10.23.0.30, A 10.23.0.22, A 10.23.0.21, A 10.23.0.3, A 10.23.0.9, A 10.23.0.4, A 10.23.0.47, A 10.23.0.16, A 10.23.0.14, A 10.23.0.46, A 10.23.0.42, A 10.23.0.8, A 10.23.0.31, A 10.23.0.5, A 10.23.0.13, A 10.23.0.20, A 10.23.0.10, A 10.23.0.36, A 10.23.0.40, A 10.23.0.12, A 10.23.0.7, A 10.23.0.50 (498)
IP 10.0.2.15.60368 > 10.0.2.2.53: 8614+ AAAA? node.bell-sw.org. (34)
IP 10.0.2.2.53 > 10.0.2.15.60368: 8614*- 0/1/0 (79)
```
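The | after the query ID 7847 marks the response as truncated (the TC bit is set): only 29 of the 54 A records fit into the 498-byte UDP reply, and the resolver goes no further. With the patches, the result is: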
```
IP 10.0.2.15.55127 > 10.0.2.2.53: 2485+ A? node.bell-sw.org. (34)
IP 10.0.2.15.55127 > 10.0.2.2.53: 3608+ AAAA? node.bell-sw.org. (34)
IP 10.0.2.2.53 > 10.0.2.15.55127: 2485*-| 29/0/0 A 10.23.0.47, A 10.23.0.18, A 10.23.0.43, A 10.23.0.41, A 10.23.0.9, A 10.23.0.48, A 10.23.0.11, A 10.23.0.51, A 10.23.0.8, A 10.23.0.29, A 10.23.0.27, A 10.23.0.15, A 10.23.0.5, A 10.23.0.7, A 10.23.0.19, A 10.23.0.17, A 10.23.0.12, A 10.23.0.49, A 10.23.0.2, A 10.23.0.34, A 10.23.0.14, A 10.23.0.38, A 10.23.0.3, A 10.23.0.45, A 10.23.0.10, A 10.23.0.22, A 10.23.0.55, A 10.23.0.20, A 10.23.0.4 (498)
IP 10.0.2.2.53 > 10.0.2.15.55127: 3608*- 0/1/0 (79)
IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [S], seq 562297716, win 64240, options [mss 1460,sackOK,TS val 2626445450 ecr 0,nop,wscale 9,exp-tfo cookiereq], length 0
IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [S.], seq 877696001, ack 562297717, win 65535, options [mss 1460], length 0
IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [.], ack 1, win 64240, length 0
IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [P.], seq 1:37, ack 1, win 64240, length 36 2485+ A? node.bell-sw.org. (34)
IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [.], ack 37, win 65535, length 0
IP 10.0.2.2.53 > 10.0.2.15.53870: Flags [P.], seq 1:901, ack 37, win 65535, length 900 2485*- 54/0/0 A 10.23.0.50, A 10.23.0.2, A 10.23.0.24, A 10.23.0.3, A 10.23.0.8, A 10.23.0.5, A 10.23.0.32, A 10.23.0.29, A 10.23.0.38, A 10.23.0.23, A 10.23.0.40, A 10.23.0.48, A 10.23.0.7, A 10.23.0.30, A 10.23.0.13, A 10.23.0.14, A 10.23.0.26, A 10.23.0.15, A 10.23.0.6, A 10.23.0.43, A 10.23.0.17, A 10.23.0.9, A 10.23.0.20, A 10.23.0.42, A 10.23.0.10, A 10.23.0.44, A 10.23.0.36, A 10.23.0.22, A 10.23.0.11, A 10.23.0.33, A 10.23.0.4, A 10.23.0.18, A 10.23.0.12, A 10.23.0.19, A 10.23.0.16, A 10.23.0.45, A 10.23.0.35, A 10.23.0.31, A 10.23.0.28, A 10.23.0.53, A 10.23.0.39, A 10.23.0.25, A 10.23.0.27, A 10.23.0.51, A 10.23.0.34, A 10.23.0.21, A 10.23.0.46, A 10.23.0.47, A 10.23.0.49, A 10.23.0.41, A 10.23.0.52, A 10.23.0.37, A 10.23.0.54, A 10.23.0.55 (898)
IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [.], ack 901, win 63900, length 0
IP 10.0.2.15.53870 > 10.0.2.2.53: Flags [R.], seq 37, ack 901, win 63900, length 0
```
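The byte counts in parentheses can be reproduced from the DNS wire format defined in RFC 1035, which shows why exactly 29 records fit into the UDP reply. The arithmetic below is a worked check, not part of the patches. It assumes that every A record in the answer refers back to the question name with a 2-byte compression pointer, which is what the 16-byte-per-record totals in the captures suggest.

```sh
# Byte sizes of the DNS messages in the captures (RFC 1035 wire format)
header=12                               # fixed DNS header
qname=$(( 1+4 + 1+7 + 1+3 + 1 ))        # "node" "bell-sw" "org" + root label = 18
question=$(( qname + 2 + 2 ))           # QNAME + QTYPE + QCLASS = 22
query=$(( header + question ))          # 34, matches "(34)"

# One A record with a compressed owner name:
# NAME(2) + TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2) + RDATA(4)
a_record=$(( 2 + 2 + 2 + 4 + 2 + 4 ))   # 16

# The response repeats the question, then carries the answer records
udp_fit=$(( (512 - query) / a_record ))       # records that fit into 512 bytes
udp_reply=$(( query + udp_fit * a_record ))   # matches "(498)"
full_reply=$(( query + 54 * a_record ))       # matches "(898)"

# Over TCP every DNS message is prefixed with a 2-byte length field
tcp_query=$(( query + 2 ))                    # matches "length 36"
tcp_reply=$(( full_reply + 2 ))               # matches "length 900"

echo "query=$query udp_fit=$udp_fit udp_reply=$udp_reply full_reply=$full_reply tcp=$tcp_query/$tcp_reply"
# prints: query=34 udp_fit=29 udp_reply=498 full_reply=898 tcp=36/900
```

The complete answer needs 898 bytes, well above the 512-byte UDP limit, so without EDNS the only way to receive all 54 records is the TCP retry visible in the second capture.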
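In the capture with the patches, the A reply over UDP is again truncated, but the resolver then opens a TCP connection to the same server, repeats the A query, and receives all 54 records in an 898-byte reply. These patches are already integrated into Alpine 3.18, so upgrading eliminates the issues described above.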
However, if you are not planning to upgrade your Alpine version soon, or don't want to wait for another long-lasting problem to emerge, consider migrating to Alpaquita Linux, a 100% Alpine-compatible distribution backed by vendor support.
Lightweight Alpaquita Linux with musl and glibc
The motivation behind Alpaquita was to provide companies with a small Linux distribution that overcomes known Alpine issues, including the ones described above. That is why Alpaquita comes in two variants: one with an optimized musl and one with glibc.
BellSoft engineers took the patches listed above from the upstream musl repository and integrated them into our musl builds for Alpaquita. Therefore, you can use Alpaquita Linux with optimized musl without the risk of running into these DNS issues.
However, if you have been using glibc-based Linux or are still cautious about musl, take a look at our glibc-based Alpaquita variant. It is almost as small as Alpine (8.32 MB), and we also tuned the libc implementation so that it shows even better performance than the default glibc in some cases.
Comparison of container images based on various Linux distributions
Alpaquita comes with other enhancements as well:
- Increased kernel performance;
- Additional security features;
- Four malloc implementations for various workloads;
- LTS releases as part of commercial support.
Coupled with Liberica Lite, Alpaquita enables you to create microcontainers without sacrificing performance and usability.
Have we piqued your interest? Download the white paper on Alpaquita to learn more about our Linux distro: technical characteristics, support options, and comparative performance studies. Or head over to Docker Hub and try free Alpaquita Stream with your application right now!