netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/2] tcp: fix handling of stale syncookies timestamps
@ 2019-12-05  0:58 Guillaume Nault
  2019-12-05  0:59 ` [PATCH net v2 1/2] tcp: fix rejected syncookies due to stale timestamps Guillaume Nault
  2019-12-05  0:59 ` [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket Guillaume Nault
  0 siblings, 2 replies; 8+ messages in thread
From: Guillaume Nault @ 2019-12-05  0:58 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Arnd Bergmann

The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are
only refreshed when the syncookie protection triggers. Therefore, their
value can become very far apart from jiffies if no synflood happens for
a long time.

If jiffies grows too much and wraps while the synflood timestamp isn't
refreshed, then time_after32() might consider the later to be in the
future. This can trick tcp_synq_no_recent_overflow() into returning
erroneous values and rejecting valid ACKs.

Patch 1 handles the case of ACKs using legitimate syncookies.
Patch 2 handles the case of stray ACKs.

Changes from v1:
  - Initialising timestamps at socket creation time is not enough
    because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet).
    Handle stale timestamps in tcp_synq_overflow() and
    tcp_synq_no_recent_overflow() instead.
  - Rework commit description.
  - Add a second patch to handle the case of stray ACKs.

Guillaume Nault (2):
  tcp: fix rejected syncookies due to stale timestamps
  tcp: tighten acceptance of ACKs not matching a child socket

 include/net/tcp.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

-- 
@DaveM, I'm sending both patches in one series as they logically fit
together, although patch 2 is arguably a performance optimisation. I
can drop it from the series and repost it when net-next reopens if
you prefer. Although that'd make the link between the two less obvious.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net v2 1/2] tcp: fix rejected syncookies due to stale timestamps
  2019-12-05  0:58 [PATCH net v2 0/2] tcp: fix handling of stale syncookies timestamps Guillaume Nault
@ 2019-12-05  0:59 ` Guillaume Nault
  2019-12-05  0:59 ` [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket Guillaume Nault
  1 sibling, 0 replies; 8+ messages in thread
From: Guillaume Nault @ 2019-12-05  0:59 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Arnd Bergmann

If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.

Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.

For example, let's consider the following scenario on a system
with HZ=1000:

  * The synflood timestamp is 0, either because that's the timestamp
    of the last synflood or, more commonly, because we're working with
    a freshly created socket.

  * We receive a new SYN, which triggers synflood protection. Let's say
    that this happens when jiffies == 2147484649 (that is,
    'synflood timestamp' + HZ + 2^31 + 1).

  * Then tcp_synq_overflow() doesn't update the synflood timestamp,
    because time_after32(2147484649, 1000) returns false.
    With:
      - 2147484649: the value of jiffies, aka. 'now'.
      - 1000: the value of 'last_overflow' + HZ.

  * A bit later, we receive the ACK completing the 3WHS. But
    cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
    says that we're not under synflood. That's because
    time_after32(2147484649, 120000) returns false.
    With:
      - 2147484649: the value of jiffies, aka. 'now'.
      - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.

    Of course, in reality jiffies would have increased a bit, but this
    condition will last for the next 119 seconds, which is far enough
    to accommodate for jiffie's growth.

Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact: only stale timestamps would trigger the
time_before32() case, and that can only happen for the first SYN of a
flood.

Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.

Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.

For 64 bits architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32 bits architectures.

Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 include/net/tcp.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 36f195fb576a..f0eae83ee555 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
 		reuse = rcu_dereference(sk->sk_reuseport_cb);
 		if (likely(reuse)) {
 			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
-			if (time_after32(now, last_overflow + HZ))
+			if (time_after32(now, last_overflow + HZ) ||
+			    time_before32(now, last_overflow))
 				WRITE_ONCE(reuse->synq_overflow_ts, now);
 			return;
 		}
 	}
 
 	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
-	if (time_after32(now, last_overflow + HZ))
+	if (time_after32(now, last_overflow + HZ) ||
+	    time_before32(now, last_overflow))
 		tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05  0:58 [PATCH net v2 0/2] tcp: fix handling of stale syncookies timestamps Guillaume Nault
  2019-12-05  0:59 ` [PATCH net v2 1/2] tcp: fix rejected syncookies due to stale timestamps Guillaume Nault
@ 2019-12-05  0:59 ` Guillaume Nault
  2019-12-05  3:08   ` Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Guillaume Nault @ 2019-12-05  0:59 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet

When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.

That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.

Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.

Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().

However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 include/net/tcp.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f0eae83ee555..005d4c691543 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -520,12 +520,14 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
 		if (likely(reuse)) {
 			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
 			return time_after32(now, last_overflow +
-					    TCP_SYNCOOKIE_VALID);
+					    TCP_SYNCOOKIE_VALID) ||
+				time_before32(now, last_overflow);
 		}
 	}
 
 	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
-	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
+	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID) ||
+		time_before32(now, last_overflow);
 }
 
 static inline u32 tcp_cookie_time(void)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05  0:59 ` [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket Guillaume Nault
@ 2019-12-05  3:08   ` Eric Dumazet
  2019-12-05 18:00     ` Guillaume Nault
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2019-12-05  3:08 UTC (permalink / raw)
  To: Guillaume Nault, David Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet



On 12/4/19 4:59 PM, Guillaume Nault wrote:
> When no synflood occurs, the synflood timestamp isn't updated.
> Therefore it can be so old that time_after32() can consider it to be
> in the future.
> 
> That's a problem for tcp_synq_no_recent_overflow() as it may report
> that a recent overflow occurred while, in fact, it's just that jiffies
> has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
> 
> Spurious detection of recent overflows lead to extra syncookie
> verification in cookie_v[46]_check(). At that point, the verification
> should fail and the packet dropped. But we should have dropped the
> packet earlier as we didn't even send a syncookie.
> 
> Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
> only if jiffies is within the
> [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
> way, no spurious recent overflow is reported when jiffies wraps and
> 'last_overflow' becomes in the future from the point of view of
> time_after32().
> 
> However, if jiffies wraps and enters the
> [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
> 'last_overflow' being a stale synflood timestamp), then
> tcp_synq_no_recent_overflow() still erroneously reports an
> overflow. In such cases, we have to rely on syncookie verification
> to drop the packet. We unfortunately have no way to differentiate
> between a fresh and a stale syncookie timestamp.
> 
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>  include/net/tcp.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index f0eae83ee555..005d4c691543 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -520,12 +520,14 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
>  		if (likely(reuse)) {
>  			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
>  			return time_after32(now, last_overflow +
> -					    TCP_SYNCOOKIE_VALID);
> +					    TCP_SYNCOOKIE_VALID) ||
> +				time_before32(now, last_overflow);
>  		}
>  	}
>  
>  	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> -	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
> +	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID) ||
> +		time_before32(now, last_overflow);
>  }


There is a race I believe here.

CPU1                                 CPU2
 
now = jiffies.
    ...
                                     jiffies++
                                     ...
                                     SYN received, last_overflow is updated to the new jiffies.


CPU1 
 timer_before32(now, last_overflow) is true, because last_overflow was set to now+1


I suggest some cushion here.

Also we TCP uses between() macro, we might add a time_between32(a, b, c) macro
to ease code review.

->
  return !time_between32(last_overflow - HZ, now, last_overflow + TCP_SYNCOOKIE_VALID);



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05  3:08   ` Eric Dumazet
@ 2019-12-05 18:00     ` Guillaume Nault
  2019-12-05 18:14       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Guillaume Nault @ 2019-12-05 18:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, netdev, Eric Dumazet

On Wed, Dec 04, 2019 at 07:08:49PM -0800, Eric Dumazet wrote:
> 
> 
> On 12/4/19 4:59 PM, Guillaume Nault wrote:
> > When no synflood occurs, the synflood timestamp isn't updated.
> > Therefore it can be so old that time_after32() can consider it to be
> > in the future.
> > 
> > That's a problem for tcp_synq_no_recent_overflow() as it may report
> > that a recent overflow occurred while, in fact, it's just that jiffies
> > has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
> > 
> > Spurious detection of recent overflows lead to extra syncookie
> > verification in cookie_v[46]_check(). At that point, the verification
> > should fail and the packet dropped. But we should have dropped the
> > packet earlier as we didn't even send a syncookie.
> > 
> > Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
> > only if jiffies is within the
> > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
> > way, no spurious recent overflow is reported when jiffies wraps and
> > 'last_overflow' becomes in the future from the point of view of
> > time_after32().
> > 
> > However, if jiffies wraps and enters the
> > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
> > 'last_overflow' being a stale synflood timestamp), then
> > tcp_synq_no_recent_overflow() still erroneously reports an
> > overflow. In such cases, we have to rely on syncookie verification
> > to drop the packet. We unfortunately have no way to differentiate
> > between a fresh and a stale syncookie timestamp.
> > 
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > ---
> >  include/net/tcp.h | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index f0eae83ee555..005d4c691543 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -520,12 +520,14 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
> >  		if (likely(reuse)) {
> >  			last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> >  			return time_after32(now, last_overflow +
> > -					    TCP_SYNCOOKIE_VALID);
> > +					    TCP_SYNCOOKIE_VALID) ||
> > +				time_before32(now, last_overflow);
> >  		}
> >  	}
> >  
> >  	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > -	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
> > +	return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID) ||
> > +		time_before32(now, last_overflow);
> >  }
> 
> 
> There is a race I believe here.
> 
> CPU1                                 CPU2
>  
> now = jiffies.
>     ...
>                                      jiffies++
>                                      ...
>                                      SYN received, last_overflow is updated to the new jiffies.
> 
> 
> CPU1 
>  timer_before32(now, last_overflow) is true, because last_overflow was set to now+1
> 
> 
> I suggest some cushion here.
> 
Yes, we should wrap access to ->rx_opt.ts_recent_stamp into READ_ONCE(),
to ensure that last_overflow won't be reloaded between the
time_after32() and the time_before32() calls. Is that what you had in
mind?

-	last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
+	last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);

Patch 1 would need the same fix BTW.

> Also we TCP uses between() macro, we might add a time_between32(a, b, c) macro
> to ease code review.
> 
I didn't realise that. I'll define it in v3.

> ->
>   return !time_between32(last_overflow - HZ, now, last_overflow + TCP_SYNCOOKIE_VALID);
> 
'last_overflow - HZ'? I don't get why we'd take HZ into account here.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05 18:00     ` Guillaume Nault
@ 2019-12-05 18:14       ` Eric Dumazet
  2019-12-05 19:22         ` Guillaume Nault
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2019-12-05 18:14 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev

On Thu, Dec 5, 2019 at 10:00 AM Guillaume Nault <gnault@redhat.com> wrote:
>
> On Wed, Dec 04, 2019 at 07:08:49PM -0800, Eric Dumazet wrote:
> >
> >
> > On 12/4/19 4:59 PM, Guillaume Nault wrote:
> > > When no synflood occurs, the synflood timestamp isn't updated.
> > > Therefore it can be so old that time_after32() can consider it to be
> > > in the future.
> > >
> > > That's a problem for tcp_synq_no_recent_overflow() as it may report
> > > that a recent overflow occurred while, in fact, it's just that jiffies
> > > has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
> > >
> > > Spurious detection of recent overflows lead to extra syncookie
> > > verification in cookie_v[46]_check(). At that point, the verification
> > > should fail and the packet dropped. But we should have dropped the
> > > packet earlier as we didn't even send a syncookie.
> > >
> > > Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
> > > only if jiffies is within the
> > > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
> > > way, no spurious recent overflow is reported when jiffies wraps and
> > > 'last_overflow' becomes in the future from the point of view of
> > > time_after32().
> > >
> > > However, if jiffies wraps and enters the
> > > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
> > > 'last_overflow' being a stale synflood timestamp), then
> > > tcp_synq_no_recent_overflow() still erroneously reports an
> > > overflow. In such cases, we have to rely on syncookie verification
> > > to drop the packet. We unfortunately have no way to differentiate
> > > between a fresh and a stale syncookie timestamp.
> > >
> > > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > > ---
> > >  include/net/tcp.h | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > index f0eae83ee555..005d4c691543 100644
> > > --- a/include/net/tcp.h
> > > +++ b/include/net/tcp.h
> > > @@ -520,12 +520,14 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
> > >             if (likely(reuse)) {
> > >                     last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> > >                     return time_after32(now, last_overflow +
> > > -                                       TCP_SYNCOOKIE_VALID);
> > > +                                       TCP_SYNCOOKIE_VALID) ||
> > > +                           time_before32(now, last_overflow);
> > >             }
> > >     }
> > >
> > >     last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > > -   return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
> > > +   return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID) ||
> > > +           time_before32(now, last_overflow);
> > >  }
> >
> >
> > There is a race I believe here.
> >
> > CPU1                                 CPU2
> >
> > now = jiffies.
> >     ...
> >                                      jiffies++
> >                                      ...
> >                                      SYN received, last_overflow is updated to the new jiffies.
> >
> >
> > CPU1
> >  timer_before32(now, last_overflow) is true, because last_overflow was set to now+1
> >
> >
> > I suggest some cushion here.
> >
> Yes, we should wrap access to ->rx_opt.ts_recent_stamp into READ_ONCE(),
> to ensure that last_overflow won't be reloaded between the
> time_after32() and the time_before32() calls. Is that what you had in
> mind?
>
> -       last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> +       last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);
>
> Patch 1 would need the same fix BTW.
>
> > Also we TCP uses between() macro, we might add a time_between32(a, b, c) macro
> > to ease code review.
> >
> I didn't realise that. I'll define it in v3.
>
> > ->
> >   return !time_between32(last_overflow - HZ, now, last_overflow + TCP_SYNCOOKIE_VALID);
> >
> 'last_overflow - HZ'? I don't get why we'd take HZ into account here.
>

Please read carefuly my prior feedback.

Even with READ_ONCE(), you still have a race.


CPU1                                 CPU2

now = jiffies.
 <some long interrupt>
                                     jiffies++  (or jiffies += 3 or 4
if CPU1 has been interrupted by a long interrupt)
                                     ...
                                     SYN received, last_overflow is
updated to the new jiffies.


CPU1

 @now still has a stale values (an old jiffies value)
 timer_before32(now, last_overflow) is true, because last_overflow was
set to now+1 (or now + 2 or now + 3)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05 18:14       ` Eric Dumazet
@ 2019-12-05 19:22         ` Guillaume Nault
  2019-12-05 19:30           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Guillaume Nault @ 2019-12-05 19:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev

On Thu, Dec 05, 2019 at 10:14:15AM -0800, Eric Dumazet wrote:
> On Thu, Dec 5, 2019 at 10:00 AM Guillaume Nault <gnault@redhat.com> wrote:
> >
> > On Wed, Dec 04, 2019 at 07:08:49PM -0800, Eric Dumazet wrote:
> > >
> > >
> > > On 12/4/19 4:59 PM, Guillaume Nault wrote:
> > > > When no synflood occurs, the synflood timestamp isn't updated.
> > > > Therefore it can be so old that time_after32() can consider it to be
> > > > in the future.
> > > >
> > > > That's a problem for tcp_synq_no_recent_overflow() as it may report
> > > > that a recent overflow occurred while, in fact, it's just that jiffies
> > > > has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
> > > >
> > > > Spurious detection of recent overflows lead to extra syncookie
> > > > verification in cookie_v[46]_check(). At that point, the verification
> > > > should fail and the packet dropped. But we should have dropped the
> > > > packet earlier as we didn't even send a syncookie.
> > > >
> > > > Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
> > > > only if jiffies is within the
> > > > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
> > > > way, no spurious recent overflow is reported when jiffies wraps and
> > > > 'last_overflow' becomes in the future from the point of view of
> > > > time_after32().
> > > >
> > > > However, if jiffies wraps and enters the
> > > > [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
> > > > 'last_overflow' being a stale synflood timestamp), then
> > > > tcp_synq_no_recent_overflow() still erroneously reports an
> > > > overflow. In such cases, we have to rely on syncookie verification
> > > > to drop the packet. We unfortunately have no way to differentiate
> > > > between a fresh and a stale syncookie timestamp.
> > > >
> > > > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > > > ---
> > > >  include/net/tcp.h | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > > index f0eae83ee555..005d4c691543 100644
> > > > --- a/include/net/tcp.h
> > > > +++ b/include/net/tcp.h
> > > > @@ -520,12 +520,14 @@ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk)
> > > >             if (likely(reuse)) {
> > > >                     last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> > > >                     return time_after32(now, last_overflow +
> > > > -                                       TCP_SYNCOOKIE_VALID);
> > > > +                                       TCP_SYNCOOKIE_VALID) ||
> > > > +                           time_before32(now, last_overflow);
> > > >             }
> > > >     }
> > > >
> > > >     last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > > > -   return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID);
> > > > +   return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID) ||
> > > > +           time_before32(now, last_overflow);
> > > >  }
> > >
> > >
> > > There is a race I believe here.
> > >
> > > CPU1                                 CPU2
> > >
> > > now = jiffies.
> > >     ...
> > >                                      jiffies++
> > >                                      ...
> > >                                      SYN received, last_overflow is updated to the new jiffies.
> > >
> > >
> > > CPU1
> > >  timer_before32(now, last_overflow) is true, because last_overflow was set to now+1
> > >
> > >
> > > I suggest some cushion here.
> > >
> > Yes, we should wrap access to ->rx_opt.ts_recent_stamp into READ_ONCE(),
> > to ensure that last_overflow won't be reloaded between the
> > time_after32() and the time_before32() calls. Is that what you had in
> > mind?
> >
> > -       last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > +       last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp);
> >
> > Patch 1 would need the same fix BTW.
> >
> > > Also we TCP uses between() macro, we might add a time_between32(a, b, c) macro
> > > to ease code review.
> > >
> > I didn't realise that. I'll define it in v3.
> >
> > > ->
> > >   return !time_between32(last_overflow - HZ, now, last_overflow + TCP_SYNCOOKIE_VALID);
> > >
> > 'last_overflow - HZ'? I don't get why we'd take HZ into account here.
> >
> 
> Please read carefuly my prior feedback.
> 
> Even with READ_ONCE(), you still have a race.
> 
> 
> CPU1                                 CPU2
> 
> now = jiffies.
>  <some long interrupt>
>                                      jiffies++  (or jiffies += 3 or 4
> if CPU1 has been interrupted by a long interrupt)
>                                      ...
>                                      SYN received, last_overflow is
> updated to the new jiffies.
> 
> 
> CPU1
> 
>  @now still has a stale values (an old jiffies value)
>  timer_before32(now, last_overflow) is true, because last_overflow was
> set to now+1 (or now + 2 or now + 3)
> 
Ok, I get it now. Thanks!
Will send v3 using 'last_overflow - HZ' as lower bound.

I think READ_ONCE()/WRITE_ONCE() are still necessary to prevent reloading
and imaginary write of last_overflow. At least that's my understanding
after reading memory-barriers.txt again.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket
  2019-12-05 19:22         ` Guillaume Nault
@ 2019-12-05 19:30           ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-12-05 19:30 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: Eric Dumazet, David Miller, Jakub Kicinski, netdev

On Thu, Dec 5, 2019 at 11:23 AM Guillaume Nault <gnault@redhat.com> wrote:
>

> Ok, I get it now. Thanks!
> Will send v3 using 'last_overflow - HZ' as lower bound.
>
> I think READ_ONCE()/WRITE_ONCE() are still necessary to prevent reloading
> and imaginary write of last_overflow. At least that's my understanding
> after reading memory-barriers.txt again.

Sure, it can not hurt, but please make this as a separate patch, thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-12-05 19:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-05  0:58 [PATCH net v2 0/2] tcp: fix handling of stale syncookies timestamps Guillaume Nault
2019-12-05  0:59 ` [PATCH net v2 1/2] tcp: fix rejected syncookies due to stale timestamps Guillaume Nault
2019-12-05  0:59 ` [PATCH net v2 2/2] tcp: tighten acceptance of ACKs not matching a child socket Guillaume Nault
2019-12-05  3:08   ` Eric Dumazet
2019-12-05 18:00     ` Guillaume Nault
2019-12-05 18:14       ` Eric Dumazet
2019-12-05 19:22         ` Guillaume Nault
2019-12-05 19:30           ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).