netdev.vger.kernel.org archive mirror
* [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s
@ 2021-04-28  2:29 Matt Corallo
  2021-04-28 12:20 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Corallo @ 2021-04-28  2:29 UTC
  To: David S. Miller, netdev, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: Eric Dumazet, Willy Tarreau, Keyu Man

The default IP reassembly timeout of 30 seconds predates git
history (and cursory web searches turn up nothing related to it).
The only relevant source cited in net/ipv4/ip_fragment.c is RFC
791, which defined IPv4 in 1981. RFC 791 suggests allowing the timer
to increase on the receipt of each fragment (which Linux deliberately
does not do), with a default timeout for each fragment of 15
seconds. It suggests 15s because this caps a 10Kb/s flow at a 150Kb
buffer of fragments.
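As an illustration only, the two timer policies can be contrasted in a few lines. This is a minimal sketch under stated assumptions, not code from RFC 791 or from the kernel: the helper names are hypothetical, times are whole seconds, and the RFC rule is modelled as raising the remaining lifetime to the larger of its current value and the arriving fragment's TTL. The last helper reproduces the RFC's sizing argument, buffer = rate × timeout.

```c
#include <assert.h>

/* Hypothetical helpers contrasting two reassembly-timer policies.
 * All times are in whole seconds. */

/* RFC 791 style (as modelled here): each arriving fragment may raise the
 * remaining lifetime to the larger of its current value and its own TTL. */
static long rfc791_update_timer(long remaining_s, long fragment_ttl_s)
{
    return remaining_s > fragment_ttl_s ? remaining_s : fragment_ttl_s;
}

/* Fixed-deadline style, as the text describes for Linux: the deadline is
 * set when the first fragment creates the queue and is never extended. */
static long fixed_deadline(long first_arrival_s, long timeout_s)
{
    return first_arrival_s + timeout_s;
}

/* Buffer needed to hold a flow's fragments for the whole timeout:
 * bits = rate (bits/s) * timeout (s). */
static long buffer_bits(long rate_bps, long timeout_s)
{
    assert(rate_bps >= 0 && timeout_s >= 0);
    return rate_bps * timeout_s;
}
```

For example, `buffer_bits(10000, 15)` returns 150000, the 150Kb figure above, while `rfc791_update_timer(15, 30)` shows how a later fragment can push the deadline out to 30s.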

When Linux receives a fragment, if the total memory used for the
fragment reassembly buffer (across all senders) exceeds
net.ipv4.ipfrag_high_thresh (or net.ipv6.ip6frag_high_thresh for
IPv6), it silently drops all future fragments until the timers on
the already-queued fragments expire.
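The resulting blackout can be reproduced with a toy simulation. This is a deliberately simplified, hypothetical model, not kernel code: every accepted fragment opens a queue that never completes (its siblings were lost), memory is freed only when that queue's fixed timer fires, and a fragment is dropped while total memory exceeds the threshold. It ignores everything else the kernel does (hashing, completed reassemblies, real memory accounting), and all names and toy units are assumptions.

```c
#include <assert.h>

/* Simplified, hypothetical model of the drop behaviour described above.
 * Accepted fragments never complete, so their memory is released only
 * when the queue's fixed timer expires. */
#define MAX_QUEUES 256

struct frag_model {
    long high_thresh;          /* drop threshold, in bytes */
    long timeout_s;            /* fixed queue lifetime, in seconds */
    long mem;                  /* bytes held across all queues */
    long expires[MAX_QUEUES];  /* expiry time of each live queue */
    long bytes[MAX_QUEUES];    /* bytes held by each live queue */
    int nqueues;
};

/* Free every queue whose timer has fired by time now_s. */
static void expire_queues(struct frag_model *m, long now_s)
{
    int kept = 0;
    for (int i = 0; i < m->nqueues; i++) {
        if (m->expires[i] <= now_s) {
            m->mem -= m->bytes[i];
        } else {
            m->expires[kept] = m->expires[i];
            m->bytes[kept] = m->bytes[i];
            kept++;
        }
    }
    m->nqueues = kept;
}

/* Returns 1 if the fragment is queued, 0 if it is silently dropped. */
static int receive_fragment(struct frag_model *m, long now_s, long len)
{
    expire_queues(m, now_s);
    if (m->mem > m->high_thresh || m->nqueues == MAX_QUEUES)
        return 0;
    m->expires[m->nqueues] = now_s + m->timeout_s;
    m->bytes[m->nqueues] = len;
    m->nqueues++;
    m->mem += len;
    return 1;
}

/* Feed one orphaned fragment of len bytes per second for duration_s
 * seconds and return how many of them were dropped. */
static int count_drops(long high_thresh, long timeout_s, long len,
                       long duration_s)
{
    struct frag_model m = { .high_thresh = high_thresh,
                            .timeout_s = timeout_s };
    int drops = 0;

    assert(timeout_s > 0 && len > 0);
    for (long t = 0; t < duration_s; t++)
        if (!receive_fragment(&m, t, len))
            drops++;
    return drops;
}
```

With a toy 4-byte threshold, a 30s timeout and one 1-byte orphaned fragment per second, the first five fragments are queued and every fragment from t=5s through t=29s is dropped (25 drops); acceptance only resumes at t=30s when the first timer fires. With a 1s timeout, `count_drops(4, 1, 1, 30)` returns 0.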

In 2021, these numbers feel almost comical. The default buffer size
for fragment reassembly is 4MiB, set in code as
`net->ipv4.fqdir->high_thresh = 4 * 1024 * 1024;`. Holding up to
4MiB (32Mib) of fragments for 30s caps a host at 1.06Mb/s of lost
fragments before all fragments received on the host are dropped
(with independent limits for IPv6).
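The cap follows from dividing the buffer by the timeout. The helper below is a hypothetical sketch of that arithmetic; it counts only payload bits, so it is an idealised upper bound rather than a model of the kernel's actual memory accounting. Rates are expressed in Mib/s (2^20 bits/s), which is the unit the 1.06Mb/s and 32Mb/s figures use.

```c
#include <assert.h>

/* Hypothetical helper: sustained rate (bits/s) of never-completed
 * fragments that a buffer of high_thresh_bytes can absorb when each
 * queue is held for timeout_s seconds before being freed. */
static double frag_loss_budget_bps(double high_thresh_bytes, double timeout_s)
{
    assert(high_thresh_bytes > 0 && timeout_s > 0);
    return high_thresh_bytes * 8.0 / timeout_s;
}

/* Convert bits/s to mebibits/s (2^20 bits/s). */
static double to_mibps(double bps)
{
    return bps / (1024.0 * 1024.0);
}
```

With the default 4MiB threshold, a 30s timeout gives about 1118481 bits/s, i.e. 32/30 ≈ 1.067 Mib/s (the 1.06Mb/s above, truncated), while a 1s timeout gives exactly 32 Mib/s.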

At 1.06Mb/s of lost fragments, this stops being merely a DoS-attack
threshold and becomes a real-world one: at moderate loss rates on
today's consumer networks it is fairly easy to hit, causing end hosts
whose MTU is (mis)configured such that they fragment to suffer
stretches of 100% packet loss lasting nearly 10-20 seconds.

Reducing the default fragment timeout to 1s gives us 32Mb/s of
fragments before we drop all fragments, which is certainly more in
line with today's network speeds than 1.06Mb/s, though the optimal
value may be lower still. Sadly, reducing it further requires a
change to the sysctl interface, as net.ipv4.ipfrag_time is only
specified in whole seconds.
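Inverting the same relation shows why whole-second granularity is the limiting factor: for a given threshold, the timeout needed to absorb a target rate of lost fragments is threshold × 8 / rate. The helper below is a hypothetical sketch of that calculation, and the 100Mb/s target in the usage note is an arbitrary illustrative value, not a recommendation from the text.

```c
#include <assert.h>

/* Hypothetical helper: timeout, in milliseconds, at which a buffer of
 * high_thresh_bytes absorbs target_bps of never-completed fragments.
 * This is the inverse of budget = high_thresh * 8 / timeout. */
static double required_timeout_ms(double high_thresh_bytes, double target_bps)
{
    assert(high_thresh_bytes > 0 && target_bps > 0);
    return high_thresh_bytes * 8.0 * 1000.0 / target_bps;
}
```

With the default 4MiB threshold, a 32Mib/s (2^25 bits/s) target corresponds to exactly 1000ms, while a hypothetical 100Mb/s (10^8 bits/s) target would need roughly 335.5ms, which a whole-second net.ipv4.ipfrag_time cannot express.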
---
  include/net/ip.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 2d6b985d11cc..f1473ac5a27c 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -135,7 +135,7 @@ struct ip_ra_chain {
  #define IP_MF        0x2000        /* Flag: "More Fragments"    */
  #define IP_OFFSET    0x1FFF        /* "Fragment Offset" part    */

-#define IP_FRAG_TIME    (30 * HZ)        /* fragment lifetime    */
+#define IP_FRAG_TIME    (1 * HZ)        /* fragment lifetime    */

  struct msghdr;
  struct net_device;
-- 
2.30.2



Thread overview: 13+ messages
2021-04-28  2:29 [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s Matt Corallo
2021-04-28 12:20 ` Eric Dumazet
2021-04-28 14:09   ` Matt Corallo
2021-04-28 14:13     ` Willy Tarreau
2021-04-28 14:28       ` Matt Corallo
2021-04-28 15:38         ` Eric Dumazet
2021-04-28 16:35           ` Matt Corallo
     [not found]             ` <1baf048d-18e8-3e0c-feee-a01b381b0168@bluematt.me>
2021-04-30 17:09               ` Eric Dumazet
2021-04-30 17:42                 ` Matt Corallo
2021-04-30 17:49                   ` Eric Dumazet
2021-04-30 17:53                     ` Matt Corallo
2021-04-30 18:04                       ` Matt Corallo
2021-05-03 14:30                         ` Matt Corallo
