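DNS rate limiting is crucial for protecting email systems from attacks like DNS amplification and cache poisoning, which can disrupt email delivery and overload servers. This article outlines five practical strategies to secure your DNS infrastructure and ensure email stability: disabling recursion on authoritative servers, setting response rate limits, using sliding windows and leak rates, restricting access with ACLs and TSIG, and monitoring traffic so thresholds can adapt.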
These steps protect your email systems by blocking malicious traffic early, reducing server strain, and ensuring reliable email delivery.
DNS Rate Limiting Best Practices: Security Benefits and Email Stability Impact
One of the most important steps in securing DNS servers is disabling recursion on authoritative servers. These servers are designed to provide definitive answers, not resolve queries on behalf of clients. If recursion is enabled, it opens the door to DNS amplification attacks. In such attacks, a small spoofed query triggers a much larger response that is delivered to the victim. With EDNS0, a query of a few dozen bytes can draw a response of several hundred bytes or more.
"Ideally, your Internet-facing authoritative servers should not perform recursion for any clients at all." - Internet Systems Consortium (ISC)
Disabling recursion also protects against cache poisoning. With recursion turned off, your server can't be tricked into storing fake DNS records that could misdirect traffic. This is especially crucial for maintaining the integrity of email authentication protocols like SPF, DKIM, and DMARC, ensuring their records stay accurate and reliable.
Another benefit? Turning off recursion reduces the load on your server's CPU, memory, and network buffers. This not only improves server stability but also makes troubleshooting easier.
To disable recursion in BIND, add recursion no; to the options block in your configuration file. If you need recursion for internal clients, you can limit it to trusted IP ranges using Access Control Lists (ACLs). After implementing these changes, always test SPF and DKIM queries to verify that your server is providing accurate authoritative responses.
After disabling recursion, the next critical step is setting response rate limits. This measure protects your infrastructure from being overwhelmed by high query volumes, a tactic often used in DNS amplification attacks. In these attacks, a small 36-byte query can generate a massive 3,576-byte response, creating nearly 100 times the amplification effect. Configuring and fine-tuning Response Rate Limiting (RRL) settings is essential to mitigate such risks.
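The amplification figure quoted above can be checked with a one-line ratio. The following is a minimal sketch; the function name `amplification_factor` is illustrative and not part of any DNS tool.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Return how many response bytes an attacker gets per query byte."""
    if query_bytes <= 0:
        raise ValueError("query_bytes must be positive")
    return response_bytes / query_bytes


# Figures taken from the paragraph above: 36-byte query, 3,576-byte response.
factor = amplification_factor(36, 3576)
print(f"{factor:.1f}x")  # 99.3x
```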
Begin by enabling RRL in your DNS server's configuration. For instance, in Knot Resolver, you can enable it with the following setting: rate-limiting/enable: true.
Next, define a base rate limit, which caps the number of UDP responses allowed per IP per second. This prevents any single source from overwhelming your server with excessive requests.
To manage occasional bursts of traffic from inactive hosts, configure an instant-limit. Combine this with the SLIP mechanism to handle restricted responses more effectively. The SLIP value determines how many responses are truncated versus dropped entirely. Setting SLIP to 2 ensures that half of the responses are dropped while the other half are truncated, encouraging clients to retry using TCP.
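As a rough model of the SLIP behavior described above, the sketch below decides what happens to each response that has already exceeded the rate limit. The class `SlipPolicy` is hypothetical and uses a simple counter; real servers may select the truncated responses differently (for example, at random).

```python
class SlipPolicy:
    """Choose an action for responses that are over the rate limit.

    slip = 0: drop every limited response
    slip = 1: truncate every limited response
    slip = N: truncate every Nth limited response and drop the rest
    """

    def __init__(self, slip: int) -> None:
        if slip < 0:
            raise ValueError("slip must be zero or positive")
        self.slip = slip
        self._limited_seen = 0

    def action(self) -> str:
        if self.slip == 0:
            return "drop"
        self._limited_seen += 1
        # Every Nth limited response goes out truncated (TC bit set),
        # which tells a legitimate client to retry over TCP.
        if self._limited_seen % self.slip == 0:
            return "truncate"
        return "drop"


policy = SlipPolicy(slip=2)
print([policy.action() for _ in range(4)])
# ['drop', 'truncate', 'drop', 'truncate']
```

With a slip of 2, the output alternates between the two actions, which matches the half-dropped, half-truncated split described above.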
"The idea behind SLIP mechanism is to send each Nth response as truncated, thus allowing client to reconnect via TCP for at least some degree of service." – Knot Resolver Documentation
Run your server in dry-run mode and monitor key metrics, such as coredns_ratelimit_dropped_request_total, using tools like Prometheus. This allows you to fine-tune thresholds without immediately impacting live traffic. Additionally, whitelist trusted IPs through Access Control Lists (ACLs) to ensure legitimate queries - like those for SPF, DKIM, and DMARC records - are not inadvertently blocked.
Sliding window algorithms are a smart way to monitor DNS queries over a rolling 60-second period, addressing the limitations of fixed time intervals. This method closes the gaps that attackers might exploit, such as the "double-burst" issue. In this scenario, attackers time their queries to hit just before one window ends and right after the next begins, effectively doubling their allowed rate. With a sliding window, the time frame moves continuously, ensuring consistent protection without leaving exploitable gaps.
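To make the double-burst problem concrete, here is a minimal sliding-window sketch that stores one timestamp per accepted query. The class name, the limit of 100 queries, and the client address are assumptions chosen for illustration; production limiters usually use more memory-efficient counters.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` queries per client in any rolling window."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client -> timestamps of accepted queries

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Discard timestamps that have slid out of the rolling window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100)
# A burst just before second 60 and another just after it: a fixed
# 60-second window would accept both, the sliding window does not.
first = sum(limiter.allow("198.51.100.7", 59.0) for _ in range(100))
second = sum(limiter.allow("198.51.100.7", 61.0) for _ in range(100))
print(first, second)  # 100 0
```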
To enhance rate limiting, you can combine sliding windows with leak rate mechanisms. Leak rates help smooth out traffic spikes by processing queries at a steady pace. This is particularly useful for handling sudden bursts of DNS traffic, like those caused by batch email sends or authentication checks for SPF, DKIM, and DMARC records. By doing so, you prevent your DNS resolvers from being overwhelmed.
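The leak rate idea can be sketched as a leaky bucket: each query adds to a level that drains at a constant pace, and queries are refused once the bucket is full. `LeakyBucket`, its capacity, and its leak rate are illustrative values, not settings from any particular DNS server.

```python
class LeakyBucket:
    """Accept bursts up to `capacity`, then drain at `leak_rate` per second."""

    def __init__(self, capacity: float, leak_rate: float) -> None:
        self.capacity = capacity    # largest burst that can be absorbed
        self.leak_rate = leak_rate  # queries drained per second
        self.level = 0.0
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Drain the bucket for the time that has passed since the last call.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False
        self.level += 1
        return True


bucket = LeakyBucket(capacity=5, leak_rate=1.0)
burst = sum(bucket.allow(0.0) for _ in range(10))
print(burst)              # 5: only the first five of the burst fit
print(bucket.allow(2.0))  # True: two seconds of draining freed space
```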
Set up a slip parameter (e.g., slip 1) to handle overflow queries. This configuration sends truncated responses to excess requests, encouraging legitimate mail servers to retry using TCP. Additionally, you can customize responses-per-second (RPS) limits per IP block (e.g., /24 for IPv4) to avoid localized overloads. This setup allows genuine mail servers sending large batches of emails to operate smoothly while throttling suspicious activity such as DNS tunneling.
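Limiting per IP block means every address in the same prefix shares one counter. The sketch below derives that shared key with Python's standard `ipaddress` module. The function name `rate_limit_key` is hypothetical, and the /56 prefix for IPv6 is an assumption you should adjust to your own server's settings.

```python
import ipaddress


def rate_limit_key(addr: str, v4_prefix: int = 24, v6_prefix: int = 56) -> str:
    """Map a client address to the network block used as its rate-limit bucket."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


print(rate_limit_key("192.0.2.17"))   # 192.0.2.0/24
print(rate_limit_key("192.0.2.200"))  # 192.0.2.0/24
```

Both addresses land in the same bucket, so a sender spreading queries across one /24 is still counted as a single source.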
Keep an eye on your DNS logs for signs of tunneling, such as high-entropy domain names or frequent NXDOMAIN responses. Catching this traffic early reduces the strain on your system's CPU and memory and keeps email authentication queries flowing efficiently, even during peak traffic periods.
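One simple way to flag high-entropy names in logs is to compute the Shannon entropy of the leftmost label. This is a heuristic sketch only: the function names, the 3.5-bit threshold, and the 20-character minimum length are assumptions that would need tuning against your own traffic.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character; random-looking strings score higher."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_tunneling(qname: str, threshold: float = 3.5, min_len: int = 20) -> bool:
    """Flag query names whose first label is both long and high-entropy."""
    label = qname.rstrip(".").split(".")[0].lower()
    return len(label) >= min_len and shannon_entropy(label) > threshold


print(looks_like_tunneling("mail.example.com"))                        # False
print(looks_like_tunneling("a9f3k2z8q1x7m4b6c0d5e2g8h1.example.com"))  # True
```

A check like this produces false positives on some legitimate names (long DKIM selectors or CDN hostnames, for instance), so it is better suited to raising alerts than to blocking outright.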
Access Control Lists (ACLs) and Transaction Signatures (TSIG) are essential tools for securing DNS access, ensuring that only authorized sources can interact with your email infrastructure. Together, they help prevent spoofing and unauthorized changes. ACLs specify which clients are allowed to perform functions like allow-query, allow-recursion, allow-transfer, and allow-update.
That said, relying solely on IP-based ACLs can be risky since source IP addresses can be easily spoofed. This is where TSIG steps in. Originally defined in RFC 2845 and now specified in RFC 8945, TSIG uses cryptographic shared secrets to authenticate DNS messages. For sensitive operations such as dynamic updates and zone transfers, the BIND 9 Documentation emphasizes the importance of TSIG:
"We strongly recommend that updates be cryptographically authenticated by means of transaction signatures (TSIG). That is, the allow-update option should list only TSIG key names, not IP addresses or network prefixes."
For email systems, configuring allow-update and allow-transfer to accept only TSIG keys ensures that unauthorized changes are blocked, maintaining the integrity of your DNS records. You can also enhance security by using nested ACLs, which require both a specific IP address and a valid TSIG key - offering an added layer of verification.
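The principle behind TSIG, a keyed hash over the message plus a signing time, can be illustrated with Python's standard `hmac` module. This is a conceptual sketch only: it does not produce the TSIG wire format, and the function names, the sample secret, and the 300-second tolerance are assumptions for illustration.

```python
import hashlib
import hmac
import struct

FUDGE_SECONDS = 300  # assumed tolerance for clock skew between the two servers


def sign(message: bytes, secret: bytes, signed_at: int) -> bytes:
    """Compute a MAC over the message and the time it was signed."""
    payload = message + struct.pack("!Q", signed_at)
    return hmac.new(secret, payload, hashlib.sha256).digest()


def verify(message: bytes, secret: bytes, signed_at: int, mac: bytes, now: int) -> bool:
    """Reject stale messages first, then compare MACs in constant time."""
    if abs(now - signed_at) > FUDGE_SECONDS:
        return False
    return hmac.compare_digest(sign(message, secret, signed_at), mac)


secret = b"example-shared-secret"  # placeholder; real keys are randomly generated
update = b"update example.com TXT v=spf1 -all"
mac = sign(update, secret, signed_at=1_000)

print(verify(update, secret, 1_000, mac, now=1_100))            # True
print(verify(update + b" tampered", secret, 1_000, mac, 1_100))  # False
print(verify(update, secret, 1_000, mac, now=5_000))             # False
```

A spoofed source address alone is not enough to pass this check, because the sender must also hold the shared secret.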
When implementing these measures, keep in mind that ACLs follow a first-match logic. Once a query matches an earlier rule, subsequent rules are ignored, so the order of entries matters: put narrow exceptions before broad allowances. This design keeps evaluation fast while safeguarding your DNS servers from misuse.
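First-match evaluation is easy to get wrong when a broad rule sits above a narrow one. The sketch below models it with a hypothetical `acl_allows` helper; the rule format (a CIDR string paired with an allow or deny flag) is an assumption for illustration, not the syntax of any DNS server.

```python
import ipaddress


def acl_allows(client: str, rules: list[tuple[str, bool]]) -> bool:
    """Return the verdict of the first rule whose network contains the client."""
    ip = ipaddress.ip_address(client)
    for cidr, allow in rules:
        if ip in ipaddress.ip_network(cidr):
            return allow  # first match wins; later rules are never consulted
    return False  # no rule matched: deny by default


deny_first = [("10.0.0.5/32", False), ("10.0.0.0/8", True)]
allow_first = [("10.0.0.0/8", True), ("10.0.0.5/32", False)]

print(acl_allows("10.0.0.5", deny_first))   # False: the specific deny is hit first
print(acl_allows("10.0.0.5", allow_first))  # True: the broad allow shadows the deny
```

The two rule lists contain the same entries, yet they give opposite answers for the same client, which is why ordering deserves a review whenever an ACL changes.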
Static rate limits often fall short when traffic fluctuates. To handle surges and potential attacks effectively, adaptive thresholds come into play. Modern monitoring tools track query patterns in real time and adjust these thresholds dynamically, ensuring your system stays resilient. Dynamic algorithms are at the heart of this approach, offering a smarter way to manage traffic.
A popular method for dynamic rate limiting is the token bucket algorithm. Here’s how it works: a "bucket" refills with tokens at a steady pace, and each DNS query uses one token. If the bucket runs out of tokens, additional requests are temporarily blocked. Tools like Rspamd enhance this approach by using dynamic multipliers - values like 1.01 to increase the rate for legitimate traffic and 0.99 to decrease it for spam. These multipliers can adjust the bucket’s capacity by up to 5x. This adds a layer of real-time adaptability to your DNS defenses, complementing other security measures.
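The token bucket and its dynamic multiplier can be sketched together as follows. The class `AdaptiveTokenBucket` is hypothetical: it applies the 1.01 and 0.99 factors and the 5x cap mentioned above to the refill rate, which is one possible interpretation, and the lower bound of one fifth is an added assumption.

```python
class AdaptiveTokenBucket:
    """Token bucket whose refill rate drifts with traffic reputation."""

    def __init__(self, rate: float, capacity: float, max_mult: float = 5.0,
                 good_factor: float = 1.01, bad_factor: float = 0.99) -> None:
        self.base_rate = rate      # tokens added per second at multiplier 1.0
        self.capacity = capacity   # maximum tokens the bucket can hold
        self.max_mult = max_mult
        self.good_factor = good_factor
        self.bad_factor = bad_factor
        self.mult = 1.0
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, scaled by the current multiplier.
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.base_rate * self.mult
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def feedback(self, legitimate: bool) -> None:
        # Nudge the multiplier up for good traffic and down for bad traffic,
        # clamped to the range [1 / max_mult, max_mult].
        factor = self.good_factor if legitimate else self.bad_factor
        self.mult = min(self.max_mult, max(1 / self.max_mult, self.mult * factor))


bucket = AdaptiveTokenBucket(rate=1.0, capacity=3)
print([bucket.allow(0.0) for _ in range(4)])  # [True, True, True, False]
for _ in range(100):
    bucket.feedback(legitimate=True)
print(round(bucket.mult, 2))  # 2.7
```

Because each adjustment is only one percent, a source has to behave well over many queries before its allowance grows noticeably, which limits how quickly an attacker can earn a higher rate.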
Monitoring plays a crucial role in identifying potential threats early. For example, a spike in DNS queries resulting in NXDOMAIN or REFUSED responses could signal automated probing or botnet activity. Similarly, frequent changes in hostname-to-IP mappings paired with unusually short TTL values might indicate the presence of Fast-Flux networks, which are often used for phishing or malware distribution.
In email systems, DMARC aggregate reports (RUA) provide critical insights. These reports show which servers are attempting to send mail on your behalf, helping you detect unauthorized sources and fine-tune your rate limits. Adding multiple layers of filtering - such as DNS-based blocklists (DNSBLs) at various stages like the initial connection, pre-data SMTP phase, and post-data content inspection - strengthens your defenses even further.
To automate threshold adjustments, export DNS metrics to tools like Prometheus. For instance, the CoreDNS ratelimit plugin tracks dropped requests using the coredns_ratelimit_dropped_request_total metric. A sudden increase in dropped requests could indicate overly strict limits or an active attack. Adjusting thresholds based on these metrics ensures reliable email delivery while blocking malicious traffic. These automated tweaks seamlessly integrate with broader DNS security practices.
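A sudden increase in a dropped-request counter can be detected by comparing the latest interval with the recent baseline. The sketch below assumes you have already scraped successive values of a monotonically increasing counter; the function name `drop_spike`, the 3x factor, and the 50-drop minimum are illustrative assumptions.

```python
from statistics import mean


def drop_spike(counter_samples: list[int], factor: float = 3.0,
               min_drops: int = 50) -> bool:
    """Return True when the newest interval's drops far exceed the baseline."""
    # Convert cumulative counter readings into per-interval increases.
    # A negative difference means the counter was reset; treat it as zero.
    deltas = [max(0, b - a) for a, b in zip(counter_samples, counter_samples[1:])]
    if len(deltas) < 2:
        return False  # not enough history to form a baseline
    *history, latest = deltas
    baseline = max(mean(history), 1.0)
    return latest >= min_drops and latest > factor * baseline


print(drop_spike([0, 10, 22, 30, 41, 400]))  # True
print(drop_spike([0, 10, 22, 30, 41, 52]))   # False
```

A spike on its own does not say whether the limits are too strict or an attack is under way, so a check like this is best used to alert an operator or to trigger further checks before any threshold is changed.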
Platforms like Infraforge take these practices to the next level by combining automated monitoring and DNS adjustments. This ensures a secure, adaptable, and resilient email infrastructure capable of handling evolving threats.
DNS rate limiting safeguards your email infrastructure through five key strategies: disabling recursion to prevent cache poisoning and protect authoritative records, setting responses-per-second (RPS) limits to counter volumetric attacks, employing sliding windows and leak rates to throttle abuse while accommodating legitimate traffic, using ACLs and TSIG to block unauthorized changes, and relying on automated monitoring to respond to evolving threats while minimizing false positives.
"The key to email filtering is to remove the bulk of unwanted emails right at the beginning of the process before they reach the resource-intensive content inspection stage." - The Spamhaus Team
By implementing rate limits at both the DNS and initial SMTP connection stages, malicious traffic is stopped early, reducing the burden on resource-heavy content scanning systems.
Here’s a quick breakdown of these practices and their benefits:
| Practice | Primary Security Benefit | Impact on Email Stability |
|---|---|---|
| Disable Recursion | Prevents cache poisoning and resource exhaustion | Protects critical records (MX, SPF) from being corrupted |
| RPS Limits | Blocks volumetric DDoS and amplification attacks | Ensures DNS availability for mail server lookups |
| Sliding Windows | Controls sustained abuse while allowing traffic bursts | Avoids accidental blocking of legitimate high-volume senders |
| ACLs & TSIG | Restricts access to authorized users and servers | Prevents unauthorized DNS changes or abuse of open resolvers |
| Auto-Monitoring | Adapts to emerging threats in real-time | Reduces false positives while maintaining email deliverability |
For businesses managing complex cold email systems, manual DNS configuration can be both time-consuming and error-prone. That’s where automated solutions come into play. Tools like Infraforge simplify the process by automating DNS setup, rate limit alignment, and bulk updates. This not only eliminates the risk of manual errors but also allows you to focus more on your outreach efforts rather than infrastructure management.
Disabling DNS recursion is a smart move to safeguard your server. It prevents your server from functioning as an open resolver, which attackers could exploit in DNS reflection or amplification attacks. This step significantly lowers the chances of your server being misused in distributed denial-of-service (DDoS) attacks.
On top of that, turning off recursion helps protect against cache-poisoning attacks. These attacks involve bad actors trying to insert fake DNS records into your server's cache. By disabling recursion, you strengthen the security and reliability of your DNS setup.
Sliding window and leak rate algorithms are popular choices for DNS rate limiting because they offer a flexible and effective way to manage traffic flow. These techniques help distribute requests more evenly over time, avoiding sudden surges that could strain your system.
The sliding window method works by continuously monitoring and enforcing rate limits within a rolling time frame. This makes it particularly well-suited for managing traffic in real time. On the other hand, leak rate algorithms ensure a steady stream of requests by processing them at a consistent pace, minimizing the chances of bottlenecks. When used together, these methods help maintain system stability, prevent misuse, and keep performance running smoothly.
Automated monitoring takes DNS rate limiting to the next level by making it a dynamic, proactive defense mechanism. It works by continuously analyzing key metrics like query volume, response times, error codes, and the geographic origins of requests. This allows it to quickly spot unusual spikes that could indicate abuse, such as credential stuffing or DNS amplification attacks. When such activity is detected, the system can either adjust rate limits automatically or notify administrators to act before mail servers get overwhelmed.
Infraforge has built this automated DNS monitoring right into its private email infrastructure. Each dedicated IP and domain is equipped with a monitoring system that flags traffic surges - like sudden increases in MX or TXT lookups - and immediately adjusts limits to keep operations running smoothly. This approach ensures stable cold-email outreach and protects deliverability, all without the need for manual oversight.