* [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Cambda Zhu @ 2020-12-08  9:19 UTC
  To: netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu, Cambda Zhu

For each TCP zero window probe, icsk_backoff is increased by one, and
its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
the probe0 timeout shift count can exceed the width of the 64-bit
operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
0 to 63, while on ARMv7 the result is zero. If the shift count is
masked, only a few probes are sent with a timeout shorter than
TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
to end this spurious timeout. Besides, a bitwise shift greater than
or equal to the operand width is undefined behavior.

This patch caps the backoff: since max_when never exceeds TCP_RTO_MAX
and the timeout base is never below TCP_RTO_MIN, allowing just enough
backoff to take TCP_RTO_MIN beyond TCP_RTO_MAX loses nothing.
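
For illustration (a minimal user-space sketch, not part of the patch,
assuming HZ = 1000):

#include <stdio.h>

#define HZ		1000
#define TCP_RTO_MIN	(HZ / 5)	/* 200 ms in jiffies */
#define TCP_RTO_MAX	(120 * HZ)	/* 120 s in jiffies */

int main(void)
{
	unsigned int backoff = 64;	/* reachable if tcp_retries2 > 63 */

	/* Undefined behavior: shift count >= width of the u64 operand.
	 * x86_64/ARMv8/MIPS mask the count to 0..63; ARMv7 yields 0.
	 *
	 * u64 when = (u64)TCP_RTO_MIN << backoff;
	 */

	/* The cap: TCP_RTO_MAX / TCP_RTO_MIN == 600 and ilog2(600) == 9;
	 * TCP_RTO_MIN << 10 == 204800 already exceeds TCP_RTO_MAX, so a
	 * shift count above 10 never changes the clamped result.
	 */
	if (backoff > 10)
		backoff = 10;
	printf("timeout = %d jiffies\n", TCP_RTO_MIN << backoff);
	return 0;
}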

Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
 include/net/tcp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d4ef5bf94168..82044179c345 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const struct sock *sk)
 static inline unsigned long tcp_probe0_when(const struct sock *sk,
 					    unsigned long max_when)
 {
-	u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff;
+	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
+			   inet_csk(sk)->icsk_backoff);
+	u64 when = (u64)tcp_probe0_base(sk) << backoff;
 
 	return (unsigned long)min_t(u64, when, max_when);
 }
-- 
2.16.6



* Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Jakub Kicinski @ 2020-12-12 22:32 UTC
  To: Cambda Zhu, Eric Dumazet; +Cc: netdev, Eric Dumazet, Dust Li, Tony Lu

On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
> For each TCP zero window probe, icsk_backoff is increased by one, and
> its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
> the probe0 timeout shift count can exceed the width of the 64-bit
> operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
> 0 to 63, while on ARMv7 the result is zero. If the shift count is
> masked, only a few probes are sent with a timeout shorter than
> TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
> to end this spurious timeout. Besides, a bitwise shift greater than
> or equal to the operand width is undefined behavior.

If icsk_backoff can reach 64, can it not also reach 256 and wrap?

Adding Eric's address from MAINTAINERS to CC.

> This patch caps the backoff: since max_when never exceeds TCP_RTO_MAX
> and the timeout base is never below TCP_RTO_MIN, allowing just enough
> backoff to take TCP_RTO_MIN beyond TCP_RTO_MAX loses nothing.

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d4ef5bf94168..82044179c345 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const struct sock *sk)
>  static inline unsigned long tcp_probe0_when(const struct sock *sk,
>  					    unsigned long max_when)
>  {
> -	u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff;
> +	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
> +			   inet_csk(sk)->icsk_backoff);
> +	u64 when = (u64)tcp_probe0_base(sk) << backoff;
>  
>  	return (unsigned long)min_t(u64, when, max_when);
>  }



* Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Cambda Zhu @ 2020-12-13 13:59 UTC
  To: Jakub Kicinski; +Cc: Eric Dumazet, netdev, Dust Li, Tony Lu


> On Dec 13, 2020, at 06:32, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
>> For each TCP zero window probe, icsk_backoff is increased by one, and
>> its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
>> the probe0 timeout shift count can exceed the width of the 64-bit
>> operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
>> 0 to 63, while on ARMv7 the result is zero. If the shift count is
>> masked, only a few probes are sent with a timeout shorter than
>> TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
>> to end this spurious timeout. Besides, a bitwise shift greater than
>> or equal to the operand width is undefined behavior.
> 
> If icsk_backoff can reach 64, can it not also reach 256 and wrap?

If tcp_retries2 is set greater than 255, it can indeed wrap. But for
TCP probe0 that does not seem to be a serious problem: the timeout
simply falls back to icsk_rto and backs off again. And since
icsk_backoff is only 8 bits, it may stay below tcp_retries2 forever,
and likewise below tcp_orphan_retries. icsk_probes_out is 8 bits too,
so if max_probes is greater than 255, in some cases the connection
won't abort even if it's an orphan sock.

We could widen the types of icsk_backoff/icsk_probes_out to fix these
problems. But retries greater than 255 arguably make no sense, and the
RFC only requires the timeout (R2) to be at least 100 seconds. Would
it be better to limit the min/max ranges of the corresponding sysctls?
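
(A minimal user-space illustration of the 8-bit wrap; an editor's
sketch, not from the original mail:)

#include <stdio.h>

int main(void)
{
	unsigned char icsk_backoff = 255;	/* the kernel field is 8 bits */

	icsk_backoff++;		/* wraps to 0 */

	/* With tcp_retries2 > 255 the backoff can therefore stay below
	 * the retry limit forever; an 8-bit icsk_probes_out likewise can
	 * never exceed a max_probes above 255.
	 */
	printf("backoff after wrap: %u\n", icsk_backoff);
	return 0;
}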

Regards,
Cambda

> 
> Adding Eric's address from MAINTAINERS to CC.
> 
>> This patch caps the backoff: since max_when never exceeds TCP_RTO_MAX
>> and the timeout base is never below TCP_RTO_MIN, allowing just enough
>> backoff to take TCP_RTO_MIN beyond TCP_RTO_MAX loses nothing.
> 
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index d4ef5bf94168..82044179c345 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -1321,7 +1321,9 @@ static inline unsigned long tcp_probe0_base(const struct sock *sk)
>> static inline unsigned long tcp_probe0_when(const struct sock *sk,
>> 					    unsigned long max_when)
>> {
>> -	u64 when = (u64)tcp_probe0_base(sk) << inet_csk(sk)->icsk_backoff;
>> +	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
>> +			   inet_csk(sk)->icsk_backoff);
>> +	u64 when = (u64)tcp_probe0_base(sk) << backoff;
>> 
>> 	return (unsigned long)min_t(u64, when, max_when);
>> }



* Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Jakub Kicinski @ 2020-12-15  2:08 UTC
  To: Cambda Zhu; +Cc: Eric Dumazet, netdev, Dust Li, Tony Lu

On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote:
> > On Dec 13, 2020, at 06:32, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:  
> >> For each TCP zero window probe, icsk_backoff is increased by one, and
> >> its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
> >> the probe0 timeout shift count can exceed the width of the 64-bit
> >> operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
> >> 0 to 63, while on ARMv7 the result is zero. If the shift count is
> >> masked, only a few probes are sent with a timeout shorter than
> >> TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
> >> to end this spurious timeout. Besides, a bitwise shift greater than
> >> or equal to the operand width is undefined behavior.
> > 
> > If icsk_backoff can reach 64, can it not also reach 256 and wrap?  
> 
> If tcp_retries2 is set greater than 255, it can indeed wrap. But for
> TCP probe0 that does not seem to be a serious problem: the timeout
> simply falls back to icsk_rto and backs off again. And since
> icsk_backoff is only 8 bits, it may stay below tcp_retries2 forever,
> and likewise below tcp_orphan_retries. icsk_probes_out is 8 bits too,
> so if max_probes is greater than 255, in some cases the connection
> won't abort even if it's an orphan sock.
> 
> We could widen the types of icsk_backoff/icsk_probes_out to fix these
> problems. But retries greater than 255 arguably make no sense, and the
> RFC only requires the timeout (R2) to be at least 100 seconds. Would
> it be better to limit the min/max ranges of the corresponding sysctls?

All right, I think the patch is good as is, applied for 5.11, thank you!


* Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Eric Dumazet @ 2020-12-15 11:06 UTC
  To: Jakub Kicinski; +Cc: Cambda Zhu, netdev, Dust Li, Tony Lu

On Tue, Dec 15, 2020 at 3:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote:
> > > On Dec 13, 2020, at 06:32, Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
> > >> For each TCP zero window probe, icsk_backoff is increased by one, and
> > >> its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
> > >> the probe0 timeout shift count can exceed the width of the 64-bit
> > >> operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
> > >> 0 to 63, while on ARMv7 the result is zero. If the shift count is
> > >> masked, only a few probes are sent with a timeout shorter than
> > >> TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
> > >> to end this spurious timeout. Besides, a bitwise shift greater than
> > >> or equal to the operand width is undefined behavior.
> > >
> > > If icsk_backoff can reach 64, can it not also reach 256 and wrap?
> >
> > If tcp_retries2 is set greater than 255, it can indeed wrap. But for
> > TCP probe0 that does not seem to be a serious problem: the timeout
> > simply falls back to icsk_rto and backs off again. And since
> > icsk_backoff is only 8 bits, it may stay below tcp_retries2 forever,
> > and likewise below tcp_orphan_retries. icsk_probes_out is 8 bits too,
> > so if max_probes is greater than 255, in some cases the connection
> > won't abort even if it's an orphan sock.
> >
> > We could widen the types of icsk_backoff/icsk_probes_out to fix these
> > problems. But retries greater than 255 arguably make no sense, and the
> > RFC only requires the timeout (R2) to be at least 100 seconds. Would
> > it be better to limit the min/max ranges of the corresponding sysctls?
>
> All right, I think the patch is good as is, applied for 5.11, thank you!

It looks like we can remove the (u64) casts then.

Also, if we _really_ care about icsk_backoff approaching 63, don't we
need to change inet_csk_rto_backoff() as well?

Was your patch based on real-world use, or on some fuzzer/UBSAN report?

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7338b3865a2a3d278dc27c0167bba1b966bbda9f..a2a145e3b062c0230935c293fc1900df095937d4 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -242,9 +242,10 @@ static inline unsigned long
 inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
 		     unsigned long max_when)
 {
-        u64 when = (u64)icsk->icsk_rto << icsk->icsk_backoff;
+	u8 backoff = min_t(u8, 32U, icsk->icsk_backoff);
+	u64 when = (u64)icsk->icsk_rto << backoff;
 
-        return (unsigned long)min_t(u64, when, max_when);
+	return (unsigned long)min_t(u64, when, max_when);
 }
 
 struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 78d13c88720fda50e3f1880ac741cea1985ef3e9..fc6e4d40fd94a717d24ebd8aef7f7930a4551fe9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1328,9 +1328,8 @@ static inline unsigned long tcp_probe0_when(const struct sock *sk,
 {
 	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
 			   inet_csk(sk)->icsk_backoff);
-	u64 when = (u64)tcp_probe0_base(sk) << backoff;
 
-	return (unsigned long)min_t(u64, when, max_when);
+	return min(tcp_probe0_base(sk) << backoff, max_when);
 }
 
 static inline void tcp_check_probe_timer(struct sock *sk)
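
As a sanity check on removing the cast (a stand-alone sketch, not part
of either patch, assuming HZ = 1000): tcp_probe0_base() never exceeds
icsk_rto's cap of TCP_RTO_MAX, and the backoff is now at most
ilog2(600) + 1 = 10, so the shifted value fits an unsigned long even
on 32-bit.

#include <assert.h>

#define HZ		1000
#define TCP_RTO_MIN	(HZ / 5)	/* 200 ms in jiffies */
#define TCP_RTO_MAX	(120 * HZ)	/* 120 s in jiffies */

int main(void)
{
	/* Worst case: the largest base shifted by the largest capped
	 * backoff. 120000 << 10 == 122880000 < 2^31, so no u64
	 * widening is needed before the min() against max_when.
	 */
	assert(((unsigned long)TCP_RTO_MAX << 10) < (1UL << 31));
	return 0;
}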


* Re: [PATCH net-next] net: Limit logical shift left of TCP probe0 timeout
From: Cambda Zhu @ 2020-12-15 13:29 UTC
  To: Eric Dumazet; +Cc: Jakub Kicinski, netdev, Dust Li, Tony Lu


> On Dec 15, 2020, at 19:06, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Tue, Dec 15, 2020 at 3:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
>> 
>> On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote:
>>>> On Dec 13, 2020, at 06:32, Jakub Kicinski <kuba@kernel.org> wrote:
>>>> On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
>>>>> For each TCP zero window probe, icsk_backoff is increased by one, and
>>>>> its maximum value is tcp_retries2. If tcp_retries2 is greater than 63,
>>>>> the probe0 timeout shift count can exceed the width of the 64-bit
>>>>> operand. On x86_64/ARMv8/MIPS the shift count is masked to the range
>>>>> 0 to 63, while on ARMv7 the result is zero. If the shift count is
>>>>> masked, only a few probes are sent with a timeout shorter than
>>>>> TCP_RTO_MAX; but if the result is zero, it takes tcp_retries2 probes
>>>>> to end this spurious timeout. Besides, a bitwise shift greater than
>>>>> or equal to the operand width is undefined behavior.
>>>> 
>>>> If icsk_backoff can reach 64, can it not also reach 256 and wrap?
>>> 
>>> If tcp_retries2 is set greater than 255, it can indeed wrap. But for
>>> TCP probe0 that does not seem to be a serious problem: the timeout
>>> simply falls back to icsk_rto and backs off again. And since
>>> icsk_backoff is only 8 bits, it may stay below tcp_retries2 forever,
>>> and likewise below tcp_orphan_retries. icsk_probes_out is 8 bits too,
>>> so if max_probes is greater than 255, in some cases the connection
>>> won't abort even if it's an orphan sock.
>>> 
>>> We could widen the types of icsk_backoff/icsk_probes_out to fix these
>>> problems. But retries greater than 255 arguably make no sense, and the
>>> RFC only requires the timeout (R2) to be at least 100 seconds. Would
>>> it be better to limit the min/max ranges of the corresponding sysctls?
>> 
>> All right, I think the patch is good as is, applied for 5.11, thank you!
> 
> It looks like we can remove the (u64) casts then.
> 
> Also, if we _really_ care about icsk_backoff approaching 63, don't we
> need to change inet_csk_rto_backoff() as well?

Yes, we do. But the socket can close after tcp_orphan_retries probes even if alive is
always true. And there are a few things I'm not yet clear on:
1) inet_csk_rto_backoff() may not be used only by TCP, and RFC 6298 requires the
   maximum RTO to be at least 60 seconds. So what's the proper shift limit?
2) If max_probes is greater than 255, icsk_probes_out can never exceed max_probes
   and the connection may never close. This looks more serious.

> 
> Was your patch based on real-world use, or on some fuzzer/UBSAN report?
> 

I didn't find this issue on TCP itself. I'm developing a private protocol that does
something like probe0 with a max RTO of less than 120 seconds. I hit similar issues
while testing it and found that Linux TCP has the same ones. So it's not a real-world
use of TCP, and it may be OK to ignore these issues.

> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 7338b3865a2a3d278dc27c0167bba1b966bbda9f..a2a145e3b062c0230935c293fc1900df095937d4 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -242,9 +242,10 @@ static inline unsigned long
>  inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
>  		     unsigned long max_when)
>  {
> -        u64 when = (u64)icsk->icsk_rto << icsk->icsk_backoff;
> +	u8 backoff = min_t(u8, 32U, icsk->icsk_backoff);
> +	u64 when = (u64)icsk->icsk_rto << backoff;
> 
> -        return (unsigned long)min_t(u64, when, max_when);
> +	return (unsigned long)min_t(u64, when, max_when);
>  }
> 
>  struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern);
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 78d13c88720fda50e3f1880ac741cea1985ef3e9..fc6e4d40fd94a717d24ebd8aef7f7930a4551fe9 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1328,9 +1328,8 @@ static inline unsigned long tcp_probe0_when(const struct sock *sk,
>  {
>  	u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
>  			   inet_csk(sk)->icsk_backoff);
> -	u64 when = (u64)tcp_probe0_base(sk) << backoff;
> 
> -	return (unsigned long)min_t(u64, when, max_when);
> +	return min(tcp_probe0_base(sk) << backoff, max_when);
>  }
> 
>  static inline void tcp_check_probe_timer(struct sock *sk)


