netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
@ 2019-11-28 21:36 Guillaume Nault
  2019-11-28 22:04 ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Guillaume Nault @ 2019-11-28 21:36 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Arnd Bergmann

In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
time_after32() call might underflow and return the opposite of the
expected result.

This happens after socket initialisation, when ->synq_overflow_ts and
->rx_opt.ts_recent_stamp are still set to zero. In this case, they
can't be compared reliably to the current value of jiffies using
time_after32(), because jiffies may be too far apart (especially soon
after system startup, when it's close to 2^32).

In such a situation, the erroneous time_after32() result prevents
tcp_synq_overflow() from updating ->synq_overflow_ts and
->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
and exceeds HZ.

Practical consequences should be quite limited though, because the
time_after32() call of tcp_synq_no_recent_overflow() would also
underflow (unless jiffies wrapped since the first time_after32() call),
thus detecting a socket overflow and triggering the syncookie
verification anyway.

Also, since commit 399040847084 ("bpf: add helper to check for a valid
SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
triggered from BPF programs. Even though such programs would normally
pair these two operations, so both underflows would compensate each
other as described above, we'd better avoid exposing the problem
outside of the kernel networking stack.

Let's fix it by initialising ->rx_opt.ts_recent_stamp and
->synq_overflow_ts to a value that can be safely compared to jiffies
using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
indicate that we're not in a socket overflow phase.

Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 net/core/sock_reuseport.c | 10 ++++++++++
 net/ipv4/tcp.c            |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f19f179538b9..87c287433a52 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -11,6 +11,7 @@
 #include <linux/idr.h>
 #include <linux/filter.h>
 #include <linux/rcupdate.h>
+#include <net/tcp.h>
 
 #define INIT_SOCKS 128
 
@@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)
 	reuse->socks[0] = sk;
 	reuse->num_socks = 1;
 	reuse->bind_inany = bind_inany;
+
+	/* synq_overflow_ts can be used for syncookies. Ensure that it has a
+	 * recent value, so that tcp_synq_overflow() and
+	 * tcp_synq_no_recent_overflow() can safely use time_after32().
+	 * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
+	 * ensure that we start in the 'no recent overflow' case.
+	 */
+	reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1;
+
 	rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
 
 out:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9b48aec29aca..e9555db95dff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk)
 	tp->tsoffset = 0;
 	tp->rack.reo_wnd_steps = 1;
 
+	/* ts_recent_stamp can be used for syncookies. Ensure that it has a
+	 * recent value, so that tcp_synq_overflow() and
+	 * tcp_synq_no_recent_overflow() can safely use time_after32().
+	 * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
+	 * ensure that we start in the 'no recent overflow' case.
+	 */
+	tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1;
+
 	sk->sk_state = TCP_CLOSE;
 
 	sk->sk_write_space = sk_stream_write_space;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-11-28 21:36 [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies Guillaume Nault
@ 2019-11-28 22:04 ` Eric Dumazet
  2019-11-29  8:13   ` Guillaume Nault
  2019-12-02 21:51   ` Guillaume Nault
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2019-11-28 22:04 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
>
> In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> time_after32() call might underflow and return the opposite of the
> expected result.
>
> This happens after socket initialisation, when ->synq_overflow_ts and
> ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> can't be compared reliably to the current value of jiffies using
> time_after32(), because jiffies may be too far apart (especially soon
> after system startup, when it's close to 2^32).
>
> In such a situation, the erroneous time_after32() result prevents
> tcp_synq_overflow() from updating ->synq_overflow_ts and
> ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> and exceeds HZ.
>
> Practical consequences should be quite limited though, because the
> time_after32() call of tcp_synq_no_recent_overflow() would also
> underflow (unless jiffies wrapped since the first time_after32() call),
> thus detecting a socket overflow and triggering the syncookie
> verification anyway.
>
> Also, since commit 399040847084 ("bpf: add helper to check for a valid
> SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> triggered from BPF programs. Even though such programs would normally
> pair these two operations, so both underflows would compensate each
> other as described above, we'd better avoid exposing the problem
> outside of the kernel networking stack.
>
> Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> ->synq_overflow_ts to a value that can be safely compared to jiffies
> using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> indicate that we're not in a socket overflow phase.
>
> Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>  net/core/sock_reuseport.c | 10 ++++++++++
>  net/ipv4/tcp.c            |  8 ++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> index f19f179538b9..87c287433a52 100644
> --- a/net/core/sock_reuseport.c
> +++ b/net/core/sock_reuseport.c
> @@ -11,6 +11,7 @@
>  #include <linux/idr.h>
>  #include <linux/filter.h>
>  #include <linux/rcupdate.h>
> +#include <net/tcp.h>
>
>  #define INIT_SOCKS 128
>
> @@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)
>         reuse->socks[0] = sk;
>         reuse->num_socks = 1;
>         reuse->bind_inany = bind_inany;
> +
> +       /* synq_overflow_ts can be used for syncookies. Ensure that it has a
> +        * recent value, so that tcp_synq_overflow() and
> +        * tcp_synq_no_recent_overflow() can safely use time_after32().
> +        * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
> +        * ensure that we start in the 'no recent overflow' case.
> +        */
> +       reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1;
> +
>         rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
>
>  out:
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9b48aec29aca..e9555db95dff 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk)
>         tp->tsoffset = 0;
>         tp->rack.reo_wnd_steps = 1;
>
> +       /* ts_recent_stamp can be used for syncookies. Ensure that it has a
> +        * recent value, so that tcp_synq_overflow() and
> +        * tcp_synq_no_recent_overflow() can safely use time_after32().
> +        * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
> +        * ensure that we start in the 'no recent overflow' case.
> +        */
> +       tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1;
> +
>         sk->sk_state = TCP_CLOSE;
>
>         sk->sk_write_space = sk_stream_write_space;
> --
> 2.21.0
>

A listener could be live for one year, and flip its ' I am under
synflood' status every 24 days (assuming HZ=1000)

You only made sure the first 24 days are ok, but the problem is still there.

We need to refresh the values, maybe in tcp_synq_no_recent_overflow()

(Note the issue has been there forever on 32bit arches)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-11-28 22:04 ` Eric Dumazet
@ 2019-11-29  8:13   ` Guillaume Nault
  2019-12-02 21:51   ` Guillaume Nault
  1 sibling, 0 replies; 10+ messages in thread
From: Guillaume Nault @ 2019-11-29  8:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
> On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
> >
> > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> > time_after32() call might underflow and return the opposite of the
> > expected result.
> >
> > This happens after socket initialisation, when ->synq_overflow_ts and
> > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> > can't be compared reliably to the current value of jiffies using
> > time_after32(), because jiffies may be too far apart (especially soon
> > after system startup, when it's close to 2^32).
> >
> > In such a situation, the erroneous time_after32() result prevents
> > tcp_synq_overflow() from updating ->synq_overflow_ts and
> > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> > and exceeds HZ.
> >
> > Practical consequences should be quite limited though, because the
> > time_after32() call of tcp_synq_no_recent_overflow() would also
> > underflow (unless jiffies wrapped since the first time_after32() call),
> > thus detecting a socket overflow and triggering the syncookie
> > verification anyway.
> >
> > Also, since commit 399040847084 ("bpf: add helper to check for a valid
> > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> > triggered from BPF programs. Even though such programs would normally
> > pair these two operations, so both underflows would compensate each
> > other as described above, we'd better avoid exposing the problem
> > outside of the kernel networking stack.
> >
> > Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> > ->synq_overflow_ts to a value that can be safely compared to jiffies
> > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> > indicate that we're not in a socket overflow phase.
> >
> > Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > ---
> >  net/core/sock_reuseport.c | 10 ++++++++++
> >  net/ipv4/tcp.c            |  8 ++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> > index f19f179538b9..87c287433a52 100644
> > --- a/net/core/sock_reuseport.c
> > +++ b/net/core/sock_reuseport.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/idr.h>
> >  #include <linux/filter.h>
> >  #include <linux/rcupdate.h>
> > +#include <net/tcp.h>
> >
> >  #define INIT_SOCKS 128
> >
> > @@ -85,6 +86,15 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)
> >         reuse->socks[0] = sk;
> >         reuse->num_socks = 1;
> >         reuse->bind_inany = bind_inany;
> > +
> > +       /* synq_overflow_ts can be used for syncookies. Ensure that it has a
> > +        * recent value, so that tcp_synq_overflow() and
> > +        * tcp_synq_no_recent_overflow() can safely use time_after32().
> > +        * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
> > +        * ensure that we start in the 'no recent overflow' case.
> > +        */
> > +       reuse->synq_overflow_ts = jiffies - TCP_SYNCOOKIE_VALID - 1;
> > +
> >         rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
> >
> >  out:
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 9b48aec29aca..e9555db95dff 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -443,6 +443,14 @@ void tcp_init_sock(struct sock *sk)
> >         tp->tsoffset = 0;
> >         tp->rack.reo_wnd_steps = 1;
> >
> > +       /* ts_recent_stamp can be used for syncookies. Ensure that it has a
> > +        * recent value, so that tcp_synq_overflow() and
> > +        * tcp_synq_no_recent_overflow() can safely use time_after32().
> > +        * Initialise it 'TCP_SYNCOOKIE_VALID + 1' jiffies in the past, to
> > +        * ensure that we start in the 'no recent overflow' case.
> > +        */
> > +       tp->rx_opt.ts_recent_stamp = jiffies - TCP_SYNCOOKIE_VALID - 1;
> > +
> >         sk->sk_state = TCP_CLOSE;
> >
> >         sk->sk_write_space = sk_stream_write_space;
> > --
> > 2.21.0
> >
> 
> A listener could be live for one year, and flip its ' I am under
> synflood' status every 24 days (assuming HZ=1000)
> 
> You only made sure the first 24 days are ok, but the problem is still there.
> 
> We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
> 
Indeed. I'll work on that.

> (Note the issue has been there forever on 32bit arches)
> 
Yes, I'll also update the Fixes tag.

Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-11-28 22:04 ` Eric Dumazet
  2019-11-29  8:13   ` Guillaume Nault
@ 2019-12-02 21:51   ` Guillaume Nault
  2019-12-02 22:23     ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Guillaume Nault @ 2019-12-02 21:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
> On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
> >
> > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> > time_after32() call might underflow and return the opposite of the
> > expected result.
> >
> > This happens after socket initialisation, when ->synq_overflow_ts and
> > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> > can't be compared reliably to the current value of jiffies using
> > time_after32(), because jiffies may be too far apart (especially soon
> > after system startup, when it's close to 2^32).
> >
> > In such a situation, the erroneous time_after32() result prevents
> > tcp_synq_overflow() from updating ->synq_overflow_ts and
> > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> > and exceeds HZ.
> >
> > Practical consequences should be quite limited though, because the
> > time_after32() call of tcp_synq_no_recent_overflow() would also
> > underflow (unless jiffies wrapped since the first time_after32() call),
> > thus detecting a socket overflow and triggering the syncookie
> > verification anyway.
> >
> > Also, since commit 399040847084 ("bpf: add helper to check for a valid
> > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> > triggered from BPF programs. Even though such programs would normally
> > pair these two operations, so both underflows would compensate each
> > other as described above, we'd better avoid exposing the problem
> > outside of the kernel networking stack.
> >
> > Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> > ->synq_overflow_ts to a value that can be safely compared to jiffies
> > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> > indicate that we're not in a socket overflow phase.
> >
> 
> A listener could be live for one year, and flip its ' I am under
> synflood' status every 24 days (assuming HZ=1000)
> 
> You only made sure the first 24 days are ok, but the problem is still there.
> 
> We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
>
Yes, but can't we refresh it in tcp_synq_overflow() instead? We
basically always want to update the timestamp, unless it's already in
the [last_overflow, last_overflow + HZ] interval:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 36f195fb576a..1a3d76dafba8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
 		reuse = rcu_dereference(sk->sk_reuseport_cb);
 		if (likely(reuse)) {
 			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
-			if (time_after32(now, last_overflow + HZ))
+			if (time_before32(now, last_overflow) ||
+			    time_after32(now, last_overflow + HZ))
 				WRITE_ONCE(reuse->synq_overflow_ts, now);
 			return;
 		}
 	}
 
 	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
-	if (time_after32(now, last_overflow + HZ))
+	if (time_before32(now, last_overflow) ||
+	    time_after32(now, last_overflow + HZ))
 		tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
 }
 
This way, tcp_synq_no_recent_overflow() should always have a recent
timestamp to work on, unless tcp_synq_overflow() wasn't called. But I
can't see this case happening for a legitimate connection (unless I've
missed something of course).

One could send an ACK without a SYN and get into this situation, but
then the timestamp value doesn't have too much importance since we have
to drop the connection anyway. So, even though an expired timestamp
could let the packet pass the tcp_synq_no_recent_overflow() test, the
syncookie validation would fail. So the packet is correctly rejected in
any case.


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-02 21:51   ` Guillaume Nault
@ 2019-12-02 22:23     ` Eric Dumazet
  2019-12-04  0:46       ` Guillaume Nault
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2019-12-02 22:23 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote:
>
> On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
> > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
> > >
> > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> > > time_after32() call might underflow and return the opposite of the
> > > expected result.
> > >
> > > This happens after socket initialisation, when ->synq_overflow_ts and
> > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> > > can't be compared reliably to the current value of jiffies using
> > > time_after32(), because jiffies may be too far apart (especially soon
> > > after system startup, when it's close to 2^32).
> > >
> > > In such a situation, the erroneous time_after32() result prevents
> > > tcp_synq_overflow() from updating ->synq_overflow_ts and
> > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> > > and exceeds HZ.
> > >
> > > Practical consequences should be quite limited though, because the
> > > time_after32() call of tcp_synq_no_recent_overflow() would also
> > > underflow (unless jiffies wrapped since the first time_after32() call),
> > > thus detecting a socket overflow and triggering the syncookie
> > > verification anyway.
> > >
> > > Also, since commit 399040847084 ("bpf: add helper to check for a valid
> > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> > > triggered from BPF programs. Even though such programs would normally
> > > pair these two operations, so both underflows would compensate each
> > > other as described above, we'd better avoid exposing the problem
> > > outside of the kernel networking stack.
> > >
> > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> > > ->synq_overflow_ts to a value that can be safely compared to jiffies
> > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> > > indicate that we're not in a socket overflow phase.
> > >
> >
> > A listener could be live for one year, and flip its ' I am under
> > synflood' status every 24 days (assuming HZ=1000)
> >
> > You only made sure the first 24 days are ok, but the problem is still there.
> >
> > We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
> >
> Yes, but can't we refresh it in tcp_synq_overflow() instead? We
> basically always want to update the timestamp, unless it's already in
> the [last_overflow, last_overflow + HZ] interval:
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 36f195fb576a..1a3d76dafba8 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
>                 reuse = rcu_dereference(sk->sk_reuseport_cb);
>                 if (likely(reuse)) {
>                         last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> -                       if (time_after32(now, last_overflow + HZ))
> +                       if (time_before32(now, last_overflow) ||
> +                           time_after32(now, last_overflow + HZ))
>                                 WRITE_ONCE(reuse->synq_overflow_ts, now);
>                         return;
>                 }
>         }
>
>         last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> -       if (time_after32(now, last_overflow + HZ))
> +       if (time_before32(now, last_overflow) ||
> +           time_after32(now, last_overflow + HZ))
>                 tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
>  }
>
> This way, tcp_synq_no_recent_overflow() should always have a recent
> timestamp to work on, unless tcp_synq_overflow() wasn't called. But I
> can't see this case happening for a legitimate connection (unless I've
> missed something of course).
>
> One could send an ACK without a SYN and get into this situation, but
> then the timestamp value doesn't have too much importance since we have
> to drop the connection anyway. So, even though an expired timestamp
> could let the packet pass the tcp_synq_no_recent_overflow() test, the
> syncookie validation would fail. So the packet is correctly rejected in
> any case.

But the bug never has been that we could accept an invalid cookie
(this is unlikely because of the secret matching)

The bug was that even if we have not sent recent SYNCOOKIES, we would
still try to validate
a cookie when a stray packet comes.

In the end, the packet would be dropped anyway, but we could drop the
packet much faster.

Updating timestamps in tcp_synq_overflow() is not going to work if we
never call it :)

I believe tcp_synq_no_recent_overflow() is the right place to add the fix.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-02 22:23     ` Eric Dumazet
@ 2019-12-04  0:46       ` Guillaume Nault
  2019-12-04  2:20         ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Guillaume Nault @ 2019-12-04  0:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Mon, Dec 02, 2019 at 02:23:35PM -0800, Eric Dumazet wrote:
> On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote:
> >
> > On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
> > > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
> > > >
> > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> > > > time_after32() call might underflow and return the opposite of the
> > > > expected result.
> > > >
> > > > This happens after socket initialisation, when ->synq_overflow_ts and
> > > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> > > > can't be compared reliably to the current value of jiffies using
> > > > time_after32(), because jiffies may be too far apart (especially soon
> > > > after system startup, when it's close to 2^32).
> > > >
> > > > In such a situation, the erroneous time_after32() result prevents
> > > > tcp_synq_overflow() from updating ->synq_overflow_ts and
> > > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> > > > and exceeds HZ.
> > > >
> > > > Practical consequences should be quite limited though, because the
> > > > time_after32() call of tcp_synq_no_recent_overflow() would also
> > > > underflow (unless jiffies wrapped since the first time_after32() call),
> > > > thus detecting a socket overflow and triggering the syncookie
> > > > verification anyway.
> > > >
> > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid
> > > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> > > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> > > > triggered from BPF programs. Even though such programs would normally
> > > > pair these two operations, so both underflows would compensate each
> > > > other as described above, we'd better avoid exposing the problem
> > > > outside of the kernel networking stack.
> > > >
> > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> > > > ->synq_overflow_ts to a value that can be safely compared to jiffies
> > > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> > > > indicate that we're not in a socket overflow phase.
> > > >
> > >
> > > A listener could be live for one year, and flip its ' I am under
> > > synflood' status every 24 days (assuming HZ=1000)
> > >
> > > You only made sure the first 24 days are ok, but the problem is still there.
> > >
> > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
> > >
> > Yes, but can't we refresh it in tcp_synq_overflow() instead? We
> > basically always want to update the timestamp, unless it's already in
> > the [last_overflow, last_overflow + HZ] interval:
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 36f195fb576a..1a3d76dafba8 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
> >                 reuse = rcu_dereference(sk->sk_reuseport_cb);
> >                 if (likely(reuse)) {
> >                         last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> > -                       if (time_after32(now, last_overflow + HZ))
> > +                       if (time_before32(now, last_overflow) ||
> > +                           time_after32(now, last_overflow + HZ))
> >                                 WRITE_ONCE(reuse->synq_overflow_ts, now);
> >                         return;
> >                 }
> >         }
> >
> >         last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > -       if (time_after32(now, last_overflow + HZ))
> > +       if (time_before32(now, last_overflow) ||
> > +           time_after32(now, last_overflow + HZ))
> >                 tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
> >  }
> >
> > This way, tcp_synq_no_recent_overflow() should always have a recent
> > timestamp to work on, unless tcp_synq_overflow() wasn't called. But I
> > can't see this case happening for a legitimate connection (unless I've
> > missed something of course).
> >
> > One could send an ACK without a SYN and get into this situation, but
> > then the timestamp value doesn't have too much importance since we have
> > to drop the connection anyway. So, even though an expired timestamp
> > could let the packet pass the tcp_synq_no_recent_overflow() test, the
> > syncookie validation would fail. So the packet is correctly rejected in
> > any case.
> 
> But the bug never has been that we could accept an invalid cookie
> (this is unlikely because of the secret matching)
> 
I didn't mean that it was. Maybe I wasn't clear, but this last
paragraph was there just to explain why I didn't address the stray
ACK scenario: such packets are correctly rejected anyway (yes we could
be more efficient at it, but that wasn't the original purpose of the
patch).

> The bug was that even if we have not sent recent SYNCOOKIES, we would
> still try to validate a cookie when a stray packet comes.
> 
I realise that I misunderstood your original answer. I was looking at
this problem from a different angle: my original intend was to fix the
legitimate syncookie packet case.

Stale last_overflow timestamps can cause tcp_synq_overflow() to reject
valid packets in the following situation:

  * tcp_synq_overflow() is called while jiffies is not within the
    (last_overflow + HZ, last_overflow + HZ + 2^31] interval. So the
    stale last_overflow timestamp isn't updated because
    time_after32(jiffies, last_overflow + HZ) returns false.

  * Then tcp_synq_no_recent_overflow() is called, jiffies might have
    increased a bit, but can still be in the (last_overflow + TCP_SYNCOOKIE_VALID,
    last_overflow + TCP_SYNCOOKIE_VALID + 2^31] interval. If so,
    time_after32(jiffies, last_overflow + TCP_SYNCOOKIE_VALID) returns
    true and the packet is rejected.

The case is unlikely, and whenever it happens, it can't last more than
two minutes. But the problem is now exposed to BPF programs and the fix
is small, so I think it's worth considering it. Admittedly my original
submission wasn't complete. Checking that jiffies is outside of the
[last_overflow, last_overflow + HZ] interval, as in my second proposal,
should fix the problem completely.

> In the end, the packet would be dropped anyway, but we could drop the
> packet much faster.
> 
> Updating timestamps in tcp_synq_overflow() is not going to work if we
> never call it :)
> 
> I believe tcp_synq_no_recent_overflow() is the right place to add the fix.
> 
Are you talking about a flood of stray ACKs (just to be sure we're on
the same page here)? If so, I guess tcp_synq_no_recent_overflow() is
our only chance to update last_overflow. But do we really need it?

I think tcp_synq_no_recent_overflow() should first check if jiffies
isn't within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID]
interval. Currently, we just test that it's not within
(last_overflow + TCP_SYNCOOKIE_VALID - 2^31, last_overflow + TCP_SYNCOOKIE_VALID].
That leaves a lot of room for stale timestamps to erroneously trigger
syncookie verification. By reducing the interval, we'd make sure that
only values that are undistinguishable from accurate fresh overflow
timestamps can trigger syncookie verification.

The only bad scenario that'd remain is if a stray ACK flood happens
when last_overflow is stale and jiffies is within the
(last_overflow, last_overflow + TCP_SYNCOOKIE_VALID) interval. If the
flood starts while we're in this state, then I fear that we can't do
anything but to wait for jiffies to exeed the interval's upper bound
(2 minutes max) and to rely on syncookies verification to reject stray
ACKs in the mean time. For cases where the flood starts before
jiffies enters that interval, then refreshing last_overflow in
tcp_synq_no_recent_overflow() would prevent jiffies from entering the
interval for the duration of the flood. last_overflow would have to be
set TCP_SYNCOOKIE_VALID seconds in the past. But since we have
no synchronisation with tcp_synq_overflow(), we'd have a race where
tcp_synq_overflow() would refresh last_overflow (entering
syncookie validation mode) and tcp_synq_no_recent_overflow() would
immediately move it backward (leaving syncookie validation mode). We'd
have to wait for the next SYN to finally reach a stable state with
syncookies validation re-enabled. This is a bit far fetched, and
requires a stray ACK arriving at the same moment a SYN flood is
detected. Still, I feel that moving last_overflow backward in
tcp_synq_no_recent_overflow() is a bit dangerous.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-04  0:46       ` Guillaume Nault
@ 2019-12-04  2:20         ` Eric Dumazet
  2019-12-04 14:34           ` Guillaume Nault
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2019-12-04  2:20 UTC (permalink / raw)
  To: Guillaume Nault, Eric Dumazet
  Cc: David Miller, Jakub Kicinski, netdev, Arnd Bergmann



On 12/3/19 4:46 PM, Guillaume Nault wrote:
> On Mon, Dec 02, 2019 at 02:23:35PM -0800, Eric Dumazet wrote:
>> On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@redhat.com> wrote:
>>>
>>> On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
>>>> On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@redhat.com> wrote:
>>>>>
>>>>> In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
>>>>> time_after32() call might underflow and return the opposite of the
>>>>> expected result.
>>>>>
>>>>> This happens after socket initialisation, when ->synq_overflow_ts and
>>>>> ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
>>>>> can't be compared reliably to the current value of jiffies using
>>>>> time_after32(), because jiffies may be too far apart (especially soon
>>>>> after system startup, when it's close to 2^32).
>>>>>
>>>>> In such a situation, the erroneous time_after32() result prevents
>>>>> tcp_synq_overflow() from updating ->synq_overflow_ts and
>>>>> ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
>>>>> and exceeds HZ.
>>>>>
>>>>> Practical consequences should be quite limited though, because the
>>>>> time_after32() call of tcp_synq_no_recent_overflow() would also
>>>>> underflow (unless jiffies wrapped since the first time_after32() call),
>>>>> thus detecting a socket overflow and triggering the syncookie
>>>>> verification anyway.
>>>>>
>>>>> Also, since commit 399040847084 ("bpf: add helper to check for a valid
>>>>> SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
>>>>> helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
>>>>> triggered from BPF programs. Even though such programs would normally
>>>>> pair these two operations, so both underflows would compensate each
>>>>> other as described above, we'd better avoid exposing the problem
>>>>> outside of the kernel networking stack.
>>>>>
>>>>> Let's fix it by initialising ->rx_opt.ts_recent_stamp and
>>>>> ->synq_overflow_ts to a value that can be safely compared to jiffies
>>>>> using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
>>>>> indicate that we're not in a socket overflow phase.
>>>>>
>>>>
>>>> A listener could be live for one year, and flip its ' I am under
>>>> synflood' status every 24 days (assuming HZ=1000)
>>>>
>>>> You only made sure the first 24 days are ok, but the problem is still there.
>>>>
>>>> We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
>>>>
>>> Yes, but can't we refresh it in tcp_synq_overflow() instead? We
>>> basically always want to update the timestamp, unless it's already in
>>> the [last_overflow, last_overflow + HZ] interval:
>>>
>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>> index 36f195fb576a..1a3d76dafba8 100644
>>> --- a/include/net/tcp.h
>>> +++ b/include/net/tcp.h
>>> @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
>>>                 reuse = rcu_dereference(sk->sk_reuseport_cb);
>>>                 if (likely(reuse)) {
>>>                         last_overflow = READ_ONCE(reuse->synq_overflow_ts);
>>> -                       if (time_after32(now, last_overflow + HZ))
>>> +                       if (time_before32(now, last_overflow) ||
>>> +                           time_after32(now, last_overflow + HZ))
>>>                                 WRITE_ONCE(reuse->synq_overflow_ts, now);
>>>                         return;
>>>                 }
>>>         }
>>>
>>>         last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
>>> -       if (time_after32(now, last_overflow + HZ))
>>> +       if (time_before32(now, last_overflow) ||
>>> +           time_after32(now, last_overflow + HZ))
>>>                 tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
>>>  }
>>>
>>> This way, tcp_synq_no_recent_overflow() should always have a recent
>>> timestamp to work on, unless tcp_synq_overflow() wasn't called. But I
>>> can't see this case happening for a legitimate connection (unless I've
>>> missed something of course).
>>>
>>> One could send an ACK without a SYN and get into this situation, but
>>> then the timestamp value doesn't have too much importance since we have
>>> to drop the connection anyway. So, even though an expired timestamp
>>> could let the packet pass the tcp_synq_no_recent_overflow() test, the
>>> syncookie validation would fail. So the packet is correctly rejected in
>>> any case.
>>
>> But the bug never has been that we could accept an invalid cookie
>> (this is unlikely because of the secret matching)
>>
> I didn't mean that it was. Maybe I wasn't clear, but this last
> paragraph was there just to explain why I didn't address the stray
> ACK scenario: such packets are correctly rejected anyway (yes we could
> be more efficient at it, but that wasn't the original purpose of the
> patch).
> 
>> The bug was that even if we have not sent recent SYNCOOKIES, we would
>> still try to validate a cookie when a stray packet comes.
>>
> I realise that I misunderstood your original answer. I was looking at
> this problem from a different angle: my original intend was to fix the
> legitimate syncookie packet case.
> 
> Stale last_overflow timestamps can cause tcp_synq_overflow() to reject
> valid packets in the following situation:
> 
>   * tcp_synq_overflow() is called while jiffies is not within the
>     (last_overflow + HZ, last_overflow + HZ + 2^31] interval. So the
>     stale last_overflow timestamp isn't updated because
>     time_after32(jiffies, last_overflow + HZ) returns false.
> 
>   * Then tcp_synq_no_recent_overflow() is called, jiffies might have
>     increased a bit, but can still be in the (last_overflow + TCP_SYNCOOKIE_VALID,
>     last_overflow + TCP_SYNCOOKIE_VALID + 2^31] interval. If so,
>     time_after32(jiffies, last_overflow + TCP_SYNCOOKIE_VALID) returns
>     true and the packet is rejected.
> 
> The case is unlikely, and whenever it happens, it can't last more than
> two minutes. But the problem is now exposed to BPF programs and the fix
> is small, so I think it's worth considering it. Admittedly my original
> submission wasn't complete. Checking that jiffies is outside of the
> [last_overflow, last_overflow + HZ] interval, as in my second proposal,
> should fix the problem completely.
> 
>> In the end, the packet would be dropped anyway, but we could drop the
>> packet much faster.
>>
>> Updating timestamps in tcp_synq_overflow() is not going to work if we
>> never call it :)
>>
>> I believe tcp_synq_no_recent_overflow() is the right place to add the fix.
>>
> Are you talking about a flood of stray ACKs (just to be sure we're on
> the same page here)? If so, I guess tcp_synq_no_recent_overflow() is
> our only chance to update last_overflow. But do we really need it?
> 
> I think tcp_synq_no_recent_overflow() should first check if jiffies
> isn't within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID]
> interval. Currently, we just test that it's not within
> (last_overflow + TCP_SYNCOOKIE_VALID - 2^31, last_overflow + TCP_SYNCOOKIE_VALID].
> That leaves a lot of room for stale timestamps to erroneously trigger
> syncookie verification. By reducing the interval, we'd make sure that
> only values that are undistinguishable from accurate fresh overflow
> timestamps can trigger syncookie verification.
> 
> The only bad scenario that'd remain is if a stray ACK flood happens
> when last_overflow is stale and jiffies is within the
> (last_overflow, last_overflow + TCP_SYNCOOKIE_VALID) interval. If the
> flood starts while we're in this state, then I fear that we can't do
> anything but to wait for jiffies to exeed the interval's upper bound
> (2 minutes max) and to rely on syncookies verification to reject stray
> ACKs in the mean time. For cases where the flood starts before
> jiffies enters that interval, then refreshing last_overflow in
> tcp_synq_no_recent_overflow() would prevent jiffies from entering the
> interval for the duration of the flood. last_overflow would have to be
> set TCP_SYNCOOKIE_VALID seconds in the past. But since we have
> no synchronisation with tcp_synq_overflow(), we'd have a race where
> tcp_synq_overflow() would refresh last_overflow (entering
> syncookie validation mode) and tcp_synq_no_recent_overflow() would
> immediately move it backward (leaving syncookie validation mode). We'd
> have to wait for the next SYN to finally reach a stable state with
> syncookies validation re-enabled. This is a bit far fetched, and
> requires a stray ACK arriving at the same moment a SYN flood is
> detected. Still, I feel that moving last_overflow backward in
> tcp_synq_no_recent_overflow() is a bit dangerous.
> 


Sorry I am completely lost with this amount of text.

Whatever solution you come up with, make sure it covers all the points
that have been raised.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-04  2:20         ` Eric Dumazet
@ 2019-12-04 14:34           ` Guillaume Nault
  2019-12-04 16:53             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Guillaume Nault @ 2019-12-04 14:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote:
> 
> Sorry I am completely lost with this amount of text.
> 
No problem. I'll write a v2 and remork the problem description to make
it clearer.

> Whatever solution you come up with, make sure it covers all the points
> that have been raised.
> 
Will do.
Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-04 14:34           ` Guillaume Nault
@ 2019-12-04 16:53             ` Eric Dumazet
  2019-12-04 18:05               ` Guillaume Nault
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2019-12-04 16:53 UTC (permalink / raw)
  To: Guillaume Nault, Eric Dumazet
  Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann



On 12/4/19 6:34 AM, Guillaume Nault wrote:
> On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote:
>>
>> Sorry I am completely lost with this amount of text.
>>
> No problem. I'll write a v2 and remork the problem description to make
> it clearer.
> 
>> Whatever solution you come up with, make sure it covers all the points
>> that have been raised.
>>
> Will do.
> Thanks.
> 

Le me apologize for my last email.

I should have slept first after my long day.

Thanks a lot for working on this !

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies
  2019-12-04 16:53             ` Eric Dumazet
@ 2019-12-04 18:05               ` Guillaume Nault
  0 siblings, 0 replies; 10+ messages in thread
From: Guillaume Nault @ 2019-12-04 18:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev, Arnd Bergmann

On Wed, Dec 04, 2019 at 08:53:08AM -0800, Eric Dumazet wrote:
> 
> 
> On 12/4/19 6:34 AM, Guillaume Nault wrote:
> > On Tue, Dec 03, 2019 at 06:20:24PM -0800, Eric Dumazet wrote:
> >>
> >> Sorry I am completely lost with this amount of text.
> >>
> > No problem. I'll write a v2 and remork the problem description to make
> > it clearer.
> > 
> >> Whatever solution you come up with, make sure it covers all the points
> >> that have been raised.
> >>
> > Will do.
> > Thanks.
> > 
> 
> Le me apologize for my last email.
> 
> I should have slept first after my long day.
> 
> Thanks a lot for working on this !
> 
No problem :)

I'm preparing a small series, splitting the valid syncookie and stray
ACK cases.
I expect to have something ready later tonight.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-12-04 18:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-28 21:36 [PATCH net] tcp: Avoid time_after32() underflow when handling syncookies Guillaume Nault
2019-11-28 22:04 ` Eric Dumazet
2019-11-29  8:13   ` Guillaume Nault
2019-12-02 21:51   ` Guillaume Nault
2019-12-02 22:23     ` Eric Dumazet
2019-12-04  0:46       ` Guillaume Nault
2019-12-04  2:20         ` Eric Dumazet
2019-12-04 14:34           ` Guillaume Nault
2019-12-04 16:53             ` Eric Dumazet
2019-12-04 18:05               ` Guillaume Nault

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).