* [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies @ 2019-11-28 21:36 Guillaume Nault 2019-11-28 22:04 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Guillaume Nault @ 2019-11-28 21:36 UTC (permalink / raw) To: David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Arnd Bergmann In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the time_after32() call might underflow and return the opposite of the expected result. This happens after socket initialisation, when ->synq_overflow_ts and ->rx_opt.ts_recent_stamp are still set to zero. In this case, they can't be compared reliably to the current value of jiffies using time_after32(), because jiffies may be too far apart (especially soon after system startup, when it's close to 2^32). In such a situation, the erroneous time_after32() result prevents tcp_synq_overflow() from updating ->synq_overflow_ts and ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps and exceeds HZ. Practical consequences should be quite limited though, because the time_after32() call of tcp_synq_no_recent_overflow() would also underflow (unless jiffies wrapped since the first time_after32() call), thus detecting a socket overflow and triggering the syncookie verification anyway. Also, since commit 399040847084 ("bpf: add helper to check for a valid SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be triggered from BPF programs. Even though such programs would normally pair these two operations, so both underflows would compensate each other as described above, we'd better avoid exposing the problem outside of the kernel networking stack. Let's fix it by initialising ->rx_opt.ts_recent_stamp and ->synq_overflow_ts to a value that can be safely compared to jiffies using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to indicate that we're not in a socket overflow phase. Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") Signed-off-by: Guillaume Nault <gnault@redhat.com> --- net/core/sock_reuseport.c | 10 ++++++++++ net/ipv4/tcp.c | 8 ++++++++ 2 files changed, 18 insertions(+) diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index f19f179538b9..87c287433a52 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -11,6 +11,7 @@ #include <linux/idr.h> #include <linux/filter.h> #include <linux/rcupdate.h> +#include <net/tcp.h> #define INIT_SOCKS 128 @@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany) reuse->socks[0] = sk; reuse->num_socks = 1; reuse->bind_inany = bind_inany; + + /* synq_overflow_ts can be used for syncookies. Ensure that it has a + * recent value, so that tcp_synq_overflow() and + * tcp_synq_no_recent_overflow() can safely use time_after32(). + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to + * ensure that we start in the 'no recent overflow' case. + */ + reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1; + rcu_assign_pointer(sk->sk_reuseport_cb, reuse); out: diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 9b48aec29aca..e9555db95dff 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk) tp->tsoffset = 0; tp->rack.reo_wnd_steps = 1; + /* ts_recent_stamp can be used for syncookies. Ensure that it has a + * recent value, so that tcp_synq_overflow() and + * tcp_synq_no_recent_overflow() can safely use time_after32(). + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to + * ensure that we start in the 'no recent overflow' case. + */ + tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1; + sk->sk_state = TCP_CLOSE; sk->sk_write_space = sk_stream_write_space; -- 2.21.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-11-28 21:36 [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies Guillaume Nault @ 2019-11-28 22:04 ` Eric Dumazet 2019-11-29 8:13 ` Guillaume Nault 2019-12-02 21:51 ` Guillaume Nault 0 siblings, 2 replies; 10+ messages in thread From: Eric Dumazet @ 2019-11-28 22:04 UTC (permalink / raw) To: Guillaume Nault; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the > time_after32() call might underflow and return the opposite of the > expected result. > > This happens after socket initialisation, when ->synq_overflow_ts and > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they > can't be compared reliably to the current value of jiffies using > time_after32(), because jiffies may be too far apart (especially soon > after system startup, when it's close to 2^32). > > In such a situation, the erroneous time_after32() result prevents > tcp_synq_overflow() from updating ->synq_overflow_ts and > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps > and exceeds HZ. > > Practical consequences should be quite limited though, because the > time_after32() call of tcp_synq_no_recent_overflow() would also > underflow (unless jiffies wrapped since the first time_after32() call), > thus detecting a socket overflow and triggering the syncookie > verification anyway. > > Also, since commit 399040847084 ("bpf: add helper to check for a valid > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be > triggered from BPF programs. Even though such programs would normally > pair these two operations, so both underflows would compensate each > other as described above, we'd better avoid exposing the problem > outside of the kernel networking stack. > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and > ->synq_overflow_ts to a value that can be safely compared to jiffies > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to > indicate that we're not in a socket overflow phase. > > Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") > Signed-off-by: Guillaume Nault <gnault@redhat.com> > --- > net/core/sock_reuseport.c | 10 ++++++++++ > net/ipv4/tcp.c | 8 ++++++++ > 2 files changed, 18 insertions(+) > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > index f19f179538b9..87c287433a52 100644 > --- a/net/core/sock_reuseport.c > +++ b/net/core/sock_reuseport.c > @@ -11,6 +11,7 @@ > #include <linux/idr.h> > #include <linux/filter.h> > #include <linux/rcupdate.h> > +#include <net/tcp.h> > > #define INIT_SOCKS 128 > > @@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany) > reuse->socks[0] = sk; > reuse->num_socks = 1; > reuse->bind_inany = bind_inany; > + > + /* synq_overflow_ts can be used for syncookies. Ensure that it has a > + * recent value, so that tcp_synq_overflow() and > + * tcp_synq_no_recent_overflow() can safely use time_after32(). > + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to > + * ensure that we start in the 'no recent overflow' case. > + */ > + reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1; > + > rcu_assign_pointer(sk->sk_reuseport_cb, reuse); > > out: > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 9b48aec29aca..e9555db95dff 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk) > tp->tsoffset = 0; > tp->rack.reo_wnd_steps = 1; > > + /* ts_recent_stamp can be used for syncookies. Ensure that it has a > + * recent value, so that tcp_synq_overflow() and > + * tcp_synq_no_recent_overflow() can safely use time_after32(). > + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to > + * ensure that we start in the 'no recent overflow' case. > + */ > + tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1; > + > sk->sk_state = TCP_CLOSE; > > sk->sk_write_space = sk_stream_write_space; > -- > 2.21.0 > A listener could be live for one year, and flip its ' I am under synflood' status every 24 days (assuming HZ=1000) You only made sure the first 24 days are ok, but the problem is still there. We need to refresh the values, maybe in tcp_synq_no_recent_overflow() (Note the issue has been there forever on 32bit arches) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-11-28 22:04 ` Eric Dumazet @ 2019-11-29 8:13 ` Guillaume Nault 2019-12-02 21:51 ` Guillaume Nault 1 sibling, 0 replies; 10+ messages in thread From: Guillaume Nault @ 2019-11-29 8:13 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote: > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: > > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the > > time_after32() call might underflow and return the opposite of the > > expected result. > > > > This happens after socket initialisation, when ->synq_overflow_ts and > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they > > can't be compared reliably to the current value of jiffies using > > time_after32(), because jiffies may be too far apart (especially soon > > after system startup, when it's close to 2^32). > > > > In such a situation, the erroneous time_after32() result prevents > > tcp_synq_overflow() from updating ->synq_overflow_ts and > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps > > and exceeds HZ. > > > > Practical consequences should be quite limited though, because the > > time_after32() call of tcp_synq_no_recent_overflow() would also > > underflow (unless jiffies wrapped since the first time_after32() call), > > thus detecting a socket overflow and triggering the syncookie > > verification anyway. > > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be > > triggered from BPF programs. Even though such programs would normally > > pair these two operations, so both underflows would compensate each > > other as described above, we'd better avoid exposing the problem > > outside of the kernel networking stack. > > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and > > ->synq_overflow_ts to a value that can be safely compared to jiffies > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to > > indicate that we're not in a socket overflow phase. > > > > Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS") > > Signed-off-by: Guillaume Nault <gnault@redhat.com> > > --- > > net/core/sock_reuseport.c | 10 ++++++++++ > > net/ipv4/tcp.c | 8 ++++++++ > > 2 files changed, 18 insertions(+) > > > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > > index f19f179538b9..87c287433a52 100644 > > --- a/net/core/sock_reuseport.c > > +++ b/net/core/sock_reuseport.c > > @@ -11,6 +11,7 @@ > > #include <linux/idr.h> > > #include <linux/filter.h> > > #include <linux/rcupdate.h> > > +#include <net/tcp.h> > > > > #define INIT_SOCKS 128 > > > > @@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany) > > reuse->socks[0] = sk; > > reuse->num_socks = 1; > > reuse->bind_inany = bind_inany; > > + > > + /* synq_overflow_ts can be used for syncookies. Ensure that it has a > > + * recent value, so that tcp_synq_overflow() and > > + * tcp_synq_no_recent_overflow() can safely use time_after32(). > > + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to > > + * ensure that we start in the 'no recent overflow' case. > > + */ > > + reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1; > > + > > rcu_assign_pointer(sk->sk_reuseport_cb, reuse); > > > > out: > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index 9b48aec29aca..e9555db95dff 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk) > > tp->tsoffset = 0; > > tp->rack.reo_wnd_steps = 1; > > > > + /* ts_recent_stamp can be used for syncookies. Ensure that it has a > > + * recent value, so that tcp_synq_overflow() and > > + * tcp_synq_no_recent_overflow() can safely use time_after32(). > > + * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to > > + * ensure that we start in the 'no recent overflow' case. > > + */ > > + tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1; > > + > > sk->sk_state = TCP_CLOSE; > > > > sk->sk_write_space = sk_stream_write_space; > > -- > > 2.21.0 > > > > A listener could be live for one year, and flip its ' I am under > synflood' status every 24 days (assuming HZ=1000) > > You only made sure the first 24 days are ok, but the problem is still there. > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow() > Indeed. I'll work on that. > (Note the issue has been there forever on 32bit arches) > Yes, I'll also update the Fixes tag. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-11-28 22:04 ` Eric Dumazet 2019-11-29 8:13 ` Guillaume Nault @ 2019-12-02 21:51 ` Guillaume Nault 2019-12-02 22:23 ` Eric Dumazet 1 sibling, 1 reply; 10+ messages in thread From: Guillaume Nault @ 2019-12-02 21:51 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote: > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: > > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the > > time_after32() call might underflow and return the opposite of the > > expected result. > > > > This happens after socket initialisation, when ->synq_overflow_ts and > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they > > can't be compared reliably to the current value of jiffies using > > time_after32(), because jiffies may be too far apart (especially soon > > after system startup, when it's close to 2^32). > > > > In such a situation, the erroneous time_after32() result prevents > > tcp_synq_overflow() from updating ->synq_overflow_ts and > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps > > and exceeds HZ. > > > > Practical consequences should be quite limited though, because the > > time_after32() call of tcp_synq_no_recent_overflow() would also > > underflow (unless jiffies wrapped since the first time_after32() call), > > thus detecting a socket overflow and triggering the syncookie > > verification anyway. > > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be > > triggered from BPF programs. Even though such programs would normally > > pair these two operations, so both underflows would compensate each > > other as described above, we'd better avoid exposing the problem > > outside of the kernel networking stack. > > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and > > ->synq_overflow_ts to a value that can be safely compared to jiffies > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to > > indicate that we're not in a socket overflow phase. > > > > A listener could be live for one year, and flip its ' I am under > synflood' status every 24 days (assuming HZ=1000) > > You only made sure the first 24 days are ok, but the problem is still there. > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow() > Yes, but can't we refresh it in tcp_synq_overflow() instead? We basically always want to update the timestamp, unless it's already in the [last_overflow, last_overflow + HZ] interval: diff --git a/include/net/tcp.h b/include/net/tcp.h index 36f195fb576a..1a3d76dafba8 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk) reuse = rcu_dereference(sk->sk_reuseport_cb); if (likely(reuse)) { last_overflow = READ_ONCE(reuse->synq_overflow_ts); - if (time_after32(now, last_overflow + HZ)) + if (time_before32(now, last_overflow) || + time_after32(now, last_overflow + HZ)) WRITE_ONCE(reuse->synq_overflow_ts, now); return; } } last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; - if (time_after32(now, last_overflow + HZ)) + if (time_before32(now, last_overflow) || + time_after32(now, last_overflow + HZ)) tcp_sk(sk)->rx_opt.ts_recent_stamp = now; } This way, tcp_synq_no_recent_overflow() should always have a recent timestamp to work on, unless tcp_synq_overflow() wasn't called. But I can't see this case happening for a legitimate connection (unless I've missed something of course). One could send an ACK without a SYN and get into this situation, but then the timestamp value doesn't have too much importance since we have to drop the connection anyway. So, even though an expired timestamp could let the packet pass the tcp_synq_no_recent_overflow() test, the syncookie validation would fail. So the packet is correctly rejected in any case. ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-02 21:51 ` Guillaume Nault @ 2019-12-02 22:23 ` Eric Dumazet 2019-12-04 0:46 ` Guillaume Nault 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2019-12-02 22:23 UTC (permalink / raw) To: Guillaume Nault; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote: > > On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote: > > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: > > > > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the > > > time_after32() call might underflow and return the opposite of the > > > expected result. > > > > > > This happens after socket initialisation, when ->synq_overflow_ts and > > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they > > > can't be compared reliably to the current value of jiffies using > > > time_after32(), because jiffies may be too far apart (especially soon > > > after system startup, when it's close to 2^32). > > > > > > In such a situation, the erroneous time_after32() result prevents > > > tcp_synq_overflow() from updating ->synq_overflow_ts and > > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps > > > and exceeds HZ. > > > > > > Practical consequences should be quite limited though, because the > > > time_after32() call of tcp_synq_no_recent_overflow() would also > > > underflow (unless jiffies wrapped since the first time_after32() call), > > > thus detecting a socket overflow and triggering the syncookie > > > verification anyway. > > > > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid > > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie > > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be > > > triggered from BPF programs. Even though such programs would normally > > > pair these two operations, so both underflows would compensate each > > > other as described above, we'd better avoid exposing the problem > > > outside of the kernel networking stack. > > > > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and > > > ->synq_overflow_ts to a value that can be safely compared to jiffies > > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to > > > indicate that we're not in a socket overflow phase. > > > > > > > A listener could be live for one year, and flip its ' I am under > > synflood' status every 24 days (assuming HZ=1000) > > > > You only made sure the first 24 days are ok, but the problem is still there. > > > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow() > > > Yes, but can't we refresh it in tcp_synq_overflow() instead? We > basically always want to update the timestamp, unless it's already in > the [last_overflow, last_overflow + HZ] interval: > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index 36f195fb576a..1a3d76dafba8 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk) > reuse = rcu_dereference(sk->sk_reuseport_cb); > if (likely(reuse)) { > last_overflow = READ_ONCE(reuse->synq_overflow_ts); > - if (time_after32(now, last_overflow + HZ)) > + if (time_before32(now, last_overflow) || > + time_after32(now, last_overflow + HZ)) > WRITE_ONCE(reuse->synq_overflow_ts, now); > return; > } > } > > last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; > - if (time_after32(now, last_overflow + HZ)) > + if (time_before32(now, last_overflow) || > + time_after32(now, last_overflow + HZ)) > tcp_sk(sk)->rx_opt.ts_recent_stamp = now; > } > > This way, tcp_synq_no_recent_overflow() should always have a recent > timestamp to work on, unless tcp_synq_overflow() wasn't called. But I > can't see this case happening for a legitimate connection (unless I've > missed something of course). > > One could send an ACK without a SYN and get into this situation, but > then the timestamp value doesn't have too much importance since we have > to drop the connection anyway. So, even though an expired timestamp > could let the packet pass the tcp_synq_no_recent_overflow() test, the > syncookie validation would fail. So the packet is correctly rejected in > any case. But the bug never has been that we could accept an invalid cookie (this is unlikely because of the secret matching) The bug was that even if we have not sent recent SYNCOOKIES, we would still try to validate a cookie when a stray packet comes. In the end, the packet would be dropped anyway, but we could drop the packet much faster. Updating timestamps in tcp_synq_overflow() is not going to work if we never call it :) I believe tcp_synq_no_recent_overflow() is the right place to add the fix. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-02 22:23 ` Eric Dumazet @ 2019-12-04 0:46 ` Guillaume Nault 2019-12-04 2:20 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Guillaume Nault @ 2019-12-04 0:46 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Mon, Dec 02, 2019 at 02:23:35PM -0800, Eric Dumazet wrote: > On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote: > > > > On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote: > > > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: > > > > > > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the > > > > time_after32() call might underflow and return the opposite of the > > > > expected result. > > > > > > > > This happens after socket initialisation, when ->synq_overflow_ts and > > > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they > > > > can't be compared reliably to the current value of jiffies using > > > > time_after32(), because jiffies may be too far apart (especially soon > > > > after system startup, when it's close to 2^32). > > > > > > > > In such a situation, the erroneous time_after32() result prevents > > > > tcp_synq_overflow() from updating ->synq_overflow_ts and > > > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps > > > > and exceeds HZ. > > > > > > > > Practical consequences should be quite limited though, because the > > > > time_after32() call of tcp_synq_no_recent_overflow() would also > > > > underflow (unless jiffies wrapped since the first time_after32() call), > > > > thus detecting a socket overflow and triggering the syncookie > > > > verification anyway. > > > > > > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid > > > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie > > > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be > > > > triggered from BPF programs. Even though such programs would normally > > > > pair these two operations, so both underflows would compensate each > > > > other as described above, we'd better avoid exposing the problem > > > > outside of the kernel networking stack. > > > > > > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and > > > > ->synq_overflow_ts to a value that can be safely compared to jiffies > > > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to > > > > indicate that we're not in a socket overflow phase. > > > > > > > > > > A listener could be live for one year, and flip its ' I am under > > > synflood' status every 24 days (assuming HZ=1000) > > > > > > You only made sure the first 24 days are ok, but the problem is still there. > > > > > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow() > > > > > Yes, but can't we refresh it in tcp_synq_overflow() instead? We > > basically always want to update the timestamp, unless it's already in > > the [last_overflow, last_overflow + HZ] interval: > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > index 36f195fb576a..1a3d76dafba8 100644 > > --- a/include/net/tcp.h > > +++ b/include/net/tcp.h > > @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk) > > reuse = rcu_dereference(sk->sk_reuseport_cb); > > if (likely(reuse)) { > > last_overflow = READ_ONCE(reuse->synq_overflow_ts); > > - if (time_after32(now, last_overflow + HZ)) > > + if (time_before32(now, last_overflow) || > > + time_after32(now, last_overflow + HZ)) > > WRITE_ONCE(reuse->synq_overflow_ts, now); > > return; > > } > > } > > > > last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; > > - if (time_after32(now, last_overflow + HZ)) > > + if (time_before32(now, last_overflow) || > > + time_after32(now, last_overflow + HZ)) > > tcp_sk(sk)->rx_opt.ts_recent_stamp = now; > > } > > > > This way, tcp_synq_no_recent_overflow() should always have a recent > > timestamp to work on, unless tcp_synq_overflow() wasn't called. But I > > can't see this case happening for a legitimate connection (unless I've > > missed something of course). > > > > One could send an ACK without a SYN and get into this situation, but > > then the timestamp value doesn't have too much importance since we have > > to drop the connection anyway. So, even though an expired timestamp > > could let the packet pass the tcp_synq_no_recent_overflow() test, the > > syncookie validation would fail. So the packet is correctly rejected in > > any case. > > But the bug never has been that we could accept an invalid cookie > (this is unlikely because of the secret matching) > I didn't mean that it was. Maybe I wasn't clear, but this last paragraph was there just to explain why I didn't address the stray ACK scenario: such packets are correctly rejected anyway (yes we could be more efficient at it, but that wasn't the original purpose of the patch). > The bug was that even if we have not sent recent SYNCOOKIES, we would > still try to validate a cookie when a stray packet comes. > I realise that I misunderstood your original answer. I was looking at this problem from a different angle: my original intend was to fix the legitimate syncookie packet case. Stale last_overflow timestamps can cause tcp_synq_overflow() to reject valid packets in the following situation: * tcp_synq_overflow() is called while jiffies is not within the (last_overflow + HZ, last_overflow + HZ + 2^31] interval. So the stale last_overflow timestamp isn't updated because time_after32(jiffies, last_overflow + HZ) returns false. * Then tcp_synq_no_recent_overflow() is called, jiffies might have increased a bit, but can still be in the (last_overflow + TCP_SYNCOOKIE_VALID, last_overflow + TCP_SYNCOOKIE_VALID + 2^31] interval. If so, time_after32(jiffies, last_overflow + TCP_SYNCOOKIE_VALID) returns true and the packet is rejected. The case is unlikely, and whenever it happens, it can't last more than two minutes. But the problem is now exposed to BPF programs and the fix is small, so I think it's worth considering it. Admittedly my original submission wasn't complete. Checking that jiffies is outside of the [last_overflow, last_overflow + HZ] interval, as in my second proposal, should fix the problem completely. > In the end, the packet would be dropped anyway, but we could drop the > packet much faster. > > Updating timestamps in tcp_synq_overflow() is not going to work if we > never call it :) > > I believe tcp_synq_no_recent_overflow() is the right place to add the fix. > Are you talking about a flood of stray ACKs (just to be sure we're on the same page here)? If so, I guess tcp_synq_no_recent_overflow() is our only chance to update last_overflow. But do we really need it? I think tcp_synq_no_recent_overflow() should first check if jiffies isn't within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. Currently, we just test that it's not within (last_overflow + TCP_SYNCOOKIE_VALID - 2^31, last_overflow + TCP_SYNCOOKIE_VALID]. That leaves a lot of room for stale timestamps to erroneously trigger syncookie verification. By reducing the interval, we'd make sure that only values that are undistinguishable from accurate fresh overflow timestamps can trigger syncookie verification. The only bad scenario that'd remain is if a stray ACK flood happens when last_overflow is stale and jiffies is within the (last_overflow, last_overflow + TCP_SYNCOOKIE_VALID) interval. If the flood starts while we're in this state, then I fear that we can't do anything but to wait for jiffies to exeed the interval's upper bound (2 minutes max) and to rely on syncookies verification to reject stray ACKs in the mean time. For cases where the flood starts before jiffies enters that interval, then refreshing last_overflow in tcp_synq_no_recent_overflow() would prevent jiffies from entering the interval for the duration of the flood. last_overflow would have to be set TCP_SYNCOOKIE_VALID seconds in the past. But since we have no synchronisation with tcp_synq_overflow(), we'd have a race where tcp_synq_overflow() would refresh last_overflow (entering syncookie validation mode) and tcp_synq_no_recent_overflow() would immediately move it backward (leaving syncookie validation mode). We'd have to wait for the next SYN to finally reach a stable state with syncookies validation re-enabled. This is a bit far fetched, and requires a stray ACK arriving at the same moment a SYN flood is detected. Still, I feel that moving last_overflow backward in tcp_synq_no_recent_overflow() is a bit dangerous. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-04 0:46 ` Guillaume Nault @ 2019-12-04 2:20 ` Eric Dumazet 2019-12-04 14:34 ` Guillaume Nault 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2019-12-04 2:20 UTC (permalink / raw) To: Guillaume Nault, Eric Dumazet Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann On 12/3/19 4:46 PM, Guillaume Nault wrote: > On Mon, Dec 02, 2019 at 02:23:35PM -0800, Eric Dumazet wrote: >> On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote: >>> >>> On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote: >>>> On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote: >>>>> >>>>> In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the >>>>> time_after32() call might underflow and return the opposite of the >>>>> expected result. >>>>> >>>>> This happens after socket initialisation, when ->synq_overflow_ts and >>>>> ->rx_opt.ts_recent_stamp are still set to zero. In this case, they >>>>> can't be compared reliably to the current value of jiffies using >>>>> time_after32(), because jiffies may be too far apart (especially soon >>>>> after system startup, when it's close to 2^32). >>>>> >>>>> In such a situation, the erroneous time_after32() result prevents >>>>> tcp_synq_overflow() from updating ->synq_overflow_ts and >>>>> ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps >>>>> and exceeds HZ. >>>>> >>>>> Practical consequences should be quite limited though, because the >>>>> time_after32() call of tcp_synq_no_recent_overflow() would also >>>>> underflow (unless jiffies wrapped since the first time_after32() call), >>>>> thus detecting a socket overflow and triggering the syncookie >>>>> verification anyway. >>>>> >>>>> Also, since commit 399040847084 ("bpf: add helper to check for a valid >>>>> SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie >>>>> helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be >>>>> triggered from BPF programs. Even though such programs would normally >>>>> pair these two operations, so both underflows would compensate each >>>>> other as described above, we'd better avoid exposing the problem >>>>> outside of the kernel networking stack. >>>>> >>>>> Let's fix it by initialising ->rx_opt.ts_recent_stamp and >>>>> ->synq_overflow_ts to a value that can be safely compared to jiffies >>>>> using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to >>>>> indicate that we're not in a socket overflow phase. >>>>> >>>> >>>> A listener could be live for one year, and flip its ' I am under >>>> synflood' status every 24 days (assuming HZ=1000) >>>> >>>> You only made sure the first 24 days are ok, but the problem is still there. >>>> >>>> We need to refresh the values, maybe in tcp_synq_no_recent_overflow() >>>> >>> Yes, but can't we refresh it in tcp_synq_overflow() instead? We >>> basically always want to update the timestamp, unless it's already in >>> the [last_overflow, last_overflow + HZ] interval: >>> >>> diff --git a/include/net/tcp.h b/include/net/tcp.h >>> index 36f195fb576a..1a3d76dafba8 100644 >>> --- a/include/net/tcp.h >>> +++ b/include/net/tcp.h >>> @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk) >>> reuse = rcu_dereference(sk->sk_reuseport_cb); >>> if (likely(reuse)) { >>> last_overflow = READ_ONCE(reuse->synq_overflow_ts); >>> - if (time_after32(now, last_overflow + HZ)) >>> + if (time_before32(now, last_overflow) || >>> + time_after32(now, last_overflow + HZ)) >>> WRITE_ONCE(reuse->synq_overflow_ts, now); >>> return; >>> } >>> } >>> >>> last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; >>> - if (time_after32(now, last_overflow + HZ)) >>> + if (time_before32(now, last_overflow) || >>> + time_after32(now, last_overflow + HZ)) >>> tcp_sk(sk)->rx_opt.ts_recent_stamp = now; >>> } >>> >>> This way, tcp_synq_no_recent_overflow() should always have a recent >>> timestamp to work on, unless tcp_synq_overflow() wasn't called. But I >>> can't see this case happening for a legitimate connection (unless I've >>> missed something of course). >>> >>> One could send an ACK without a SYN and get into this situation, but >>> then the timestamp value doesn't have too much importance since we have >>> to drop the connection anyway. So, even though an expired timestamp >>> could let the packet pass the tcp_synq_no_recent_overflow() test, the >>> syncookie validation would fail. So the packet is correctly rejected in >>> any case. >> >> But the bug never has been that we could accept an invalid cookie >> (this is unlikely because of the secret matching) >> > I didn't mean that it was. Maybe I wasn't clear, but this last > paragraph was there just to explain why I didn't address the stray > ACK scenario: such packets are correctly rejected anyway (yes we could > be more efficient at it, but that wasn't the original purpose of the > patch). > >> The bug was that even if we have not sent recent SYNCOOKIES, we would >> still try to validate a cookie when a stray packet comes. >> > I realise that I misunderstood your original answer. I was looking at > this problem from a different angle: my original intend was to fix the > legitimate syncookie packet case. > > Stale last_overflow timestamps can cause tcp_synq_overflow() to reject > valid packets in the following situation: > > * tcp_synq_overflow() is called while jiffies is not within the > (last_overflow + HZ, last_overflow + HZ + 2^31] interval. So the > stale last_overflow timestamp isn't updated because > time_after32(jiffies, last_overflow + HZ) returns false. > > * Then tcp_synq_no_recent_overflow() is called, jiffies might have > increased a bit, but can still be in the (last_overflow + TCP_SYNCOOKIE_VALID, > last_overflow + TCP_SYNCOOKIE_VALID + 2^31] interval. If so, > time_after32(jiffies, last_overflow + TCP_SYNCOOKIE_VALID) returns > true and the packet is rejected. > > The case is unlikely, and whenever it happens, it can't last more than > two minutes. But the problem is now exposed to BPF programs and the fix > is small, so I think it's worth considering it. Admittedly my original > submission wasn't complete. Checking that jiffies is outside of the > [last_overflow, last_overflow + HZ] interval, as in my second proposal, > should fix the problem completely. > >> In the end, the packet would be dropped anyway, but we could drop the >> packet much faster. >> >> Updating timestamps in tcp_synq_overflow() is not going to work if we >> never call it :) >> >> I believe tcp_synq_no_recent_overflow() is the right place to add the fix. >> > Are you talking about a flood of stray ACKs (just to be sure we're on > the same page here)? If so, I guess tcp_synq_no_recent_overflow() is > our only chance to update last_overflow. But do we really need it? > > I think tcp_synq_no_recent_overflow() should first check if jiffies > isn't within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] > interval. Currently, we just test that it's not within > (last_overflow + TCP_SYNCOOKIE_VALID - 2^31, last_overflow + TCP_SYNCOOKIE_VALID]. > That leaves a lot of room for stale timestamps to erroneously trigger > syncookie verification. By reducing the interval, we'd make sure that > only values that are undistinguishable from accurate fresh overflow > timestamps can trigger syncookie verification. > > The only bad scenario that'd remain is if a stray ACK flood happens > when last_overflow is stale and jiffies is within the > (last_overflow, last_overflow + TCP_SYNCOOKIE_VALID) interval. If the > flood starts while we're in this state, then I fear that we can't do > anything but to wait for jiffies to exeed the interval's upper bound > (2 minutes max) and to rely on syncookies verification to reject stray > ACKs in the mean time. For cases where the flood starts before > jiffies enters that interval, then refreshing last_overflow in > tcp_synq_no_recent_overflow() would prevent jiffies from entering the > interval for the duration of the flood. last_overflow would have to be > set TCP_SYNCOOKIE_VALID seconds in the past. But since we have > no synchronisation with tcp_synq_overflow(), we'd have a race where > tcp_synq_overflow() would refresh last_overflow (entering > syncookie validation mode) and tcp_synq_no_recent_overflow() would > immediately move it backward (leaving syncookie validation mode). We'd > have to wait for the next SYN to finally reach a stable state with > syncookies validation re-enabled. This is a bit far fetched, and > requires a stray ACK arriving at the same moment a SYN flood is > detected. Still, I feel that moving last_overflow backward in > tcp_synq_no_recent_overflow() is a bit dangerous. > Sorry I am completely lost with this amount of text. Whatever solution you come up with, make sure it covers all the points that have been raised. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-04 2:20 ` Eric Dumazet @ 2019-12-04 14:34 ` Guillaume Nault 2019-12-04 16:53 ` Eric Dumazet 0 siblings, 1 reply; 10+ messages in thread From: Guillaume Nault @ 2019-12-04 14:34 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote: > > Sorry I am completely lost with this amount of text. > No problem. I'll write a v2 and remork the problem description to make it clearer. > Whatever solution you come up with, make sure it covers all the points > that have been raised. > Will do. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-04 14:34 ` Guillaume Nault @ 2019-12-04 16:53 ` Eric Dumazet 2019-12-04 18:05 ` Guillaume Nault 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2019-12-04 16:53 UTC (permalink / raw) To: Guillaume Nault, Eric Dumazet Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann On 12/4/19 6:34 AM, Guillaume Nault wrote: > On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote: >> >> Sorry I am completely lost with this amount of text. >> > No problem. I'll write a v2 and remork the problem description to make > it clearer. > >> Whatever solution you come up with, make sure it covers all the points >> that have been raised. >> > Will do. > Thanks. > Le me apologize for my last email. I should have slept first after my long day. Thanks a lot for working on this ! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies 2019-12-04 16:53 ` Eric Dumazet @ 2019-12-04 18:05 ` Guillaume Nault 0 siblings, 0 replies; 10+ messages in thread From: Guillaume Nault @ 2019-12-04 18:05 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann On Wed, Dec 04, 2019 at 08:53:08AM -0800, Eric Dumazet wrote: > > > On 12/4/19 6:34 AM, Guillaume Nault wrote: > > On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote: > >> > >> Sorry I am completely lost with this amount of text. > >> > > No problem. I'll write a v2 and remork the problem description to make > > it clearer. > > > >> Whatever solution you come up with, make sure it covers all the points > >> that have been raised. > >> > > Will do. > > Thanks. > > > > Le me apologize for my last email. > > I should have slept first after my long day. > > Thanks a lot for working on this ! > No problem :) I'm preparing a small series, splitting the valid syncookie and stray ACK cases. I expect to have something ready later tonight. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-12-04 18:05 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-28 21:36 [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies Guillaume Nault 2019-11-28 22:04 ` Eric Dumazet 2019-11-29 8:13 ` Guillaume Nault 2019-12-02 21:51 ` Guillaume Nault 2019-12-02 22:23 ` Eric Dumazet 2019-12-04 0:46 ` Guillaume Nault 2019-12-04 2:20 ` Eric Dumazet 2019-12-04 14:34 ` Guillaume Nault 2019-12-04 16:53 ` Eric Dumazet 2019-12-04 18:05 ` Guillaume Nault
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).