All of lore.kernel.org
 help / color / mirror / Atom feed
* [regression] TC_MD5SIG on established sockets
@ 2020-05-13 19:38 Mathieu Desnoyers
  2020-05-13 19:49 ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-05-13 19:38 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien

Hi,

I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT
on established sockets. It is observed by a customer.

This issue is introduced by this commit:

commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"

The intent of this commit appears to be to fix a use of uninitialized value in
tcp_parse_options(). The change introduced by this commit is to disallow setting
the TCP_MD5SIG{,_EXT} socket options on an established socket.

The justification for this change appears in the commit message:

   "I believe this was caused by a TCP_MD5SIG being set on live
    flow.
    
    This is highly unexpected, since TCP option space is limited.
    
    For instance, presence of TCP MD5 option automatically disables
    TCP TimeStamp option at SYN/SYNACK time, which we can not do
    once flow has been established.
    
    Really, adding/deleting an MD5 key only makes sense on sockets
    in CLOSE or LISTEN state."

However, reading through RFC2385 [1], this justification does not appear
correct. Quoting to the RFC:

   "This password never appears in the connection stream, and the actual
    form of the password is up to the application. It could even change
    during the lifetime of a particular connection so long as this change
    was synchronized on both ends"

The paragraph above clearly underlines that changing the MD5 signature of
a live TCP socket is allowed.

I also do not understand why it would be invalid to transition an established
TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the
RFC:

  "The total header size is also an issue.  The TCP header specifies
   where segment data starts with a 4-bit field which gives the total
   size of the header (including options) in 32-byte words.  This means
   that the total size of the header plus option must be less than or
   equal to 60 bytes -- this leaves 40 bytes for options."

The paragraph above seems to be the only indication that some TCP options
cannot be combined on a given TCP socket: if the resulting header size does
not fit. However, I do not see anything in the specification preventing any
of the following use-cases on an established TCP socket:

- Transition from no-MD5 to MD5,
- Transition from MD5 to no-MD5,
- Changing the MD5 key associated with a socket.

As long as the resulting combination of options does not exceed the available
header space.

Can we please fix this KASAN report in a way that does not break user-space
applications expectations about Linux' implementation of RFC2385 ?

Thanks,

Mathieu

[1] RFC2385: https://tools.ietf.org/html/rfc2385

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TC_MD5SIG on established sockets
  2020-05-13 19:38 [regression] TC_MD5SIG on established sockets Mathieu Desnoyers
@ 2020-05-13 19:49 ` Eric Dumazet
  2020-05-13 19:56   ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-05-13 19:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

I do not think we want to transition sockets in the middle. since
packets can be re-ordered in the network.

MD5 is about security (and a loose form of it), so better make sure
all packets have it from the beginning of the flow.

A flow with TCP TS on can not suddenly be sending packets without TCP TS.

Clearly, trying to support this operation is a can of worms, I do not
want to maintain such atrocity.

RFC can state whatever it wants, sometimes reality forces us to have
sane operations.

Thanks.

On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> Hi,
>
> I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT
> on established sockets. It is observed by a customer.
>
> This issue is introduced by this commit:
>
> commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
>
> The intent of this commit appears to be to fix a use of uninitialized value in
> tcp_parse_options(). The change introduced by this commit is to disallow setting
> the TCP_MD5SIG{,_EXT} socket options on an established socket.
>
> The justification for this change appears in the commit message:
>
>    "I believe this was caused by a TCP_MD5SIG being set on live
>     flow.
>
>     This is highly unexpected, since TCP option space is limited.
>
>     For instance, presence of TCP MD5 option automatically disables
>     TCP TimeStamp option at SYN/SYNACK time, which we can not do
>     once flow has been established.
>
>     Really, adding/deleting an MD5 key only makes sense on sockets
>     in CLOSE or LISTEN state."
>
> However, reading through RFC2385 [1], this justification does not appear
> correct. Quoting to the RFC:
>
>    "This password never appears in the connection stream, and the actual
>     form of the password is up to the application. It could even change
>     during the lifetime of a particular connection so long as this change
>     was synchronized on both ends"
>
> The paragraph above clearly underlines that changing the MD5 signature of
> a live TCP socket is allowed.
>
> I also do not understand why it would be invalid to transition an established
> TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the
> RFC:
>
>   "The total header size is also an issue.  The TCP header specifies
>    where segment data starts with a 4-bit field which gives the total
>    size of the header (including options) in 32-byte words.  This means
>    that the total size of the header plus option must be less than or
>    equal to 60 bytes -- this leaves 40 bytes for options."
>
> The paragraph above seems to be the only indication that some TCP options
> cannot be combined on a given TCP socket: if the resulting header size does
> not fit. However, I do not see anything in the specification preventing any
> of the following use-cases on an established TCP socket:
>
> - Transition from no-MD5 to MD5,
> - Transition from MD5 to no-MD5,
> - Changing the MD5 key associated with a socket.
>
> As long as the resulting combination of options does not exceed the available
> header space.
>
> Can we please fix this KASAN report in a way that does not break user-space
> applications expectations about Linux' implementation of RFC2385 ?
>
> Thanks,
>
> Mathieu
>
> [1] RFC2385: https://tools.ietf.org/html/rfc2385
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TC_MD5SIG on established sockets
  2020-05-13 19:49 ` Eric Dumazet
@ 2020-05-13 19:56   ` Eric Dumazet
  2020-06-29 19:43     ` [regression] TCP_MD5SIG " Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-05-13 19:56 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> I do not think we want to transition sockets in the middle. since
> packets can be re-ordered in the network.
>
> MD5 is about security (and a loose form of it), so better make sure
> all packets have it from the beginning of the flow.
>
> A flow with TCP TS on can not suddenly be sending packets without TCP TS.
>
> Clearly, trying to support this operation is a can of worms, I do not
> want to maintain such atrocity.
>
> RFC can state whatever it wants, sometimes reality forces us to have
> sane operations.
>
> Thanks.


Also the RFC states :

"This password never appears in the connection stream, and the actual
    form of the password is up to the application. It could even change
    during the lifetime of a particular connection so long as this change
    was synchronized on both ends"

It means the key can be changed, but this does not imply the option
can be turned on/off dynamically.



>
> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > Hi,
> >
> > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT
> > on established sockets. It is observed by a customer.
> >
> > This issue is introduced by this commit:
> >
> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
> >
> > The intent of this commit appears to be to fix a use of uninitialized value in
> > tcp_parse_options(). The change introduced by this commit is to disallow setting
> > the TCP_MD5SIG{,_EXT} socket options on an established socket.
> >
> > The justification for this change appears in the commit message:
> >
> >    "I believe this was caused by a TCP_MD5SIG being set on live
> >     flow.
> >
> >     This is highly unexpected, since TCP option space is limited.
> >
> >     For instance, presence of TCP MD5 option automatically disables
> >     TCP TimeStamp option at SYN/SYNACK time, which we can not do
> >     once flow has been established.
> >
> >     Really, adding/deleting an MD5 key only makes sense on sockets
> >     in CLOSE or LISTEN state."
> >
> > However, reading through RFC2385 [1], this justification does not appear
> > correct. Quoting to the RFC:
> >
> >    "This password never appears in the connection stream, and the actual
> >     form of the password is up to the application. It could even change
> >     during the lifetime of a particular connection so long as this change
> >     was synchronized on both ends"
> >
> > The paragraph above clearly underlines that changing the MD5 signature of
> > a live TCP socket is allowed.
> >
> > I also do not understand why it would be invalid to transition an established
> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the
> > RFC:
> >
> >   "The total header size is also an issue.  The TCP header specifies
> >    where segment data starts with a 4-bit field which gives the total
> >    size of the header (including options) in 32-byte words.  This means
> >    that the total size of the header plus option must be less than or
> >    equal to 60 bytes -- this leaves 40 bytes for options."
> >
> > The paragraph above seems to be the only indication that some TCP options
> > cannot be combined on a given TCP socket: if the resulting header size does
> > not fit. However, I do not see anything in the specification preventing any
> > of the following use-cases on an established TCP socket:
> >
> > - Transition from no-MD5 to MD5,
> > - Transition from MD5 to no-MD5,
> > - Changing the MD5 key associated with a socket.
> >
> > As long as the resulting combination of options does not exceed the available
> > header space.
> >
> > Can we please fix this KASAN report in a way that does not break user-space
> > applications expectations about Linux' implementation of RFC2385 ?
> >
> > Thanks,
> >
> > Mathieu
> >
> > [1] RFC2385: https://tools.ietf.org/html/rfc2385
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-05-13 19:56   ` Eric Dumazet
@ 2020-06-29 19:43     ` Mathieu Desnoyers
  2020-06-29 20:47       ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-06-29 19:43 UTC (permalink / raw)
  To: Eric Dumazet, Linus Torvalds
  Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

----- On May 13, 2020, at 3:56 PM, Eric Dumazet edumazet@google.com wrote:

> On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote:
>>
>>
>> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
>> <mathieu.desnoyers@efficios.com> wrote:
>> >
>> > Hi,
>> >
>> > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT
>> > on established sockets. It is observed by a customer.
>> >
>> > This issue is introduced by this commit:
>> >
>> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on
>> > established sockets"
>> >
>> > The intent of this commit appears to be to fix a use of uninitialized value in
>> > tcp_parse_options(). The change introduced by this commit is to disallow setting
>> > the TCP_MD5SIG{,_EXT} socket options on an established socket.
>> >
>> > The justification for this change appears in the commit message:
>> >
>> >    "I believe this was caused by a TCP_MD5SIG being set on live
>> >     flow.
>> >
>> >     This is highly unexpected, since TCP option space is limited.
>> >
>> >     For instance, presence of TCP MD5 option automatically disables
>> >     TCP TimeStamp option at SYN/SYNACK time, which we can not do
>> >     once flow has been established.
>> >
>> >     Really, adding/deleting an MD5 key only makes sense on sockets
>> >     in CLOSE or LISTEN state."
>> >
>> > However, reading through RFC2385 [1], this justification does not appear
>> > correct. Quoting to the RFC:
>> >
>> >    "This password never appears in the connection stream, and the actual
>> >     form of the password is up to the application. It could even change
>> >     during the lifetime of a particular connection so long as this change
>> >     was synchronized on both ends"
>> >
>> > The paragraph above clearly underlines that changing the MD5 signature of
>> > a live TCP socket is allowed.
>> >
>> > I also do not understand why it would be invalid to transition an established
>> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the
>> > RFC:
>> >
>> >   "The total header size is also an issue.  The TCP header specifies
>> >    where segment data starts with a 4-bit field which gives the total
>> >    size of the header (including options) in 32-byte words.  This means
>> >    that the total size of the header plus option must be less than or
>> >    equal to 60 bytes -- this leaves 40 bytes for options."
>> >
>> > The paragraph above seems to be the only indication that some TCP options
>> > cannot be combined on a given TCP socket: if the resulting header size does
>> > not fit. However, I do not see anything in the specification preventing any
>> > of the following use-cases on an established TCP socket:
>> >
>> > - Transition from no-MD5 to MD5,
>> > - Transition from MD5 to no-MD5,
>> > - Changing the MD5 key associated with a socket.
>> >
>> > As long as the resulting combination of options does not exceed the available
>> > header space.
>> >
>> > Can we please fix this KASAN report in a way that does not break user-space
>> > applications expectations about Linux' implementation of RFC2385 ?
[...]
>> > [1] RFC2385: https://tools.ietf.org/html/rfc2385
>>
>>
>> I do not think we want to transition sockets in the middle. since
>> packets can be re-ordered in the network.
>>
>> MD5 is about security (and a loose form of it), so better make sure
>> all packets have it from the beginning of the flow.
>>
>> A flow with TCP TS on can not suddenly be sending packets without TCP TS.
>>
>> Clearly, trying to support this operation is a can of worms, I do not
>> want to maintain such atrocity.
>>
>> RFC can state whatever it wants, sometimes reality forces us to have
>> sane operations.
>>
>> Thanks.
>>
> Also the RFC states :
> 
> "This password never appears in the connection stream, and the actual
>    form of the password is up to the application. It could even change
>    during the lifetime of a particular connection so long as this change
>    was synchronized on both ends"
> 
> It means the key can be changed, but this does not imply the option
> can be turned on/off dynamically.
> 

The change discussed previously (introduced by commit 721230326891 "tcp:
md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks
user-space ABI. As an example, the following BGP application uses
setsockopt TCP_MD5SIG on a live TCP socket:

https://github.com/IPInfusion/SDN-IP

In addition to break user-space, it also breaks network protocol
expectations for network equipment vendors implementing RFC2385.
Considering that the goal of these protocols is interaction between
different network equipment, breaking compatibility on that side
is unexpected as well. Requiring to bring down/up the connection
just to change the TCP MD5 password is a no-go in networks with
high availability requirements. Changing the BGP authentication
password must be allowed without tearing down and re-establishing
the TCP sockets. Otherwise it doesn't scale for large network
operators to have to individually manage each individual TCP socket
in their network. However, based on the feedback I received, it
would be acceptable to tear-down the TCP connections and re-establish
them when enabling or disabling the MD5 option.

Here is a list of a few network vendors along with their behavior
with respect to TCP MD5:

- Cisco: Allows for password to be changed, but within the hold-down
timer (~180 seconds).
- Juniper: When password is initially set on active connection it will
reset, but after that any subsequent password changes no network
resets.
- Nokia: No notes on if they flap the tcp connection or not.
- Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
both sides are ok with new passwords.
- Meta-Switch: Expects the password to be set before a connection is
attempted, but no further info on whether they reset the TCP
connection on a change.
- Avaya: Disable the neighbor, then set password, then re-enable.
- Zebos: Would normally allow the change when socket connected.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-29 19:43     ` [regression] TCP_MD5SIG " Mathieu Desnoyers
@ 2020-06-29 20:47       ` Eric Dumazet
  2020-06-30 19:43         ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-29 20:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, David S. Miller, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Mon, Jun 29, 2020 at 12:43 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On May 13, 2020, at 3:56 PM, Eric Dumazet edumazet@google.com wrote:
>
> > On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote:
> >>
> >>
> >> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers
> >> <mathieu.desnoyers@efficios.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT
> >> > on established sockets. It is observed by a customer.
> >> >
> >> > This issue is introduced by this commit:
> >> >
> >> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on
> >> > established sockets"
> >> >
> >> > The intent of this commit appears to be to fix a use of uninitialized value in
> >> > tcp_parse_options(). The change introduced by this commit is to disallow setting
> >> > the TCP_MD5SIG{,_EXT} socket options on an established socket.
> >> >
> >> > The justification for this change appears in the commit message:
> >> >
> >> >    "I believe this was caused by a TCP_MD5SIG being set on live
> >> >     flow.
> >> >
> >> >     This is highly unexpected, since TCP option space is limited.
> >> >
> >> >     For instance, presence of TCP MD5 option automatically disables
> >> >     TCP TimeStamp option at SYN/SYNACK time, which we can not do
> >> >     once flow has been established.
> >> >
> >> >     Really, adding/deleting an MD5 key only makes sense on sockets
> >> >     in CLOSE or LISTEN state."
> >> >
> >> > However, reading through RFC2385 [1], this justification does not appear
> >> > correct. Quoting to the RFC:
> >> >
> >> >    "This password never appears in the connection stream, and the actual
> >> >     form of the password is up to the application. It could even change
> >> >     during the lifetime of a particular connection so long as this change
> >> >     was synchronized on both ends"
> >> >
> >> > The paragraph above clearly underlines that changing the MD5 signature of
> >> > a live TCP socket is allowed.
> >> >
> >> > I also do not understand why it would be invalid to transition an established
> >> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the
> >> > RFC:
> >> >
> >> >   "The total header size is also an issue.  The TCP header specifies
> >> >    where segment data starts with a 4-bit field which gives the total
> >> >    size of the header (including options) in 32-byte words.  This means
> >> >    that the total size of the header plus option must be less than or
> >> >    equal to 60 bytes -- this leaves 40 bytes for options."
> >> >
> >> > The paragraph above seems to be the only indication that some TCP options
> >> > cannot be combined on a given TCP socket: if the resulting header size does
> >> > not fit. However, I do not see anything in the specification preventing any
> >> > of the following use-cases on an established TCP socket:
> >> >
> >> > - Transition from no-MD5 to MD5,
> >> > - Transition from MD5 to no-MD5,
> >> > - Changing the MD5 key associated with a socket.
> >> >
> >> > As long as the resulting combination of options does not exceed the available
> >> > header space.
> >> >
> >> > Can we please fix this KASAN report in a way that does not break user-space
> >> > applications expectations about Linux' implementation of RFC2385 ?
> [...]
> >> > [1] RFC2385: https://tools.ietf.org/html/rfc2385
> >>
> >>
> >> I do not think we want to transition sockets in the middle. since
> >> packets can be re-ordered in the network.
> >>
> >> MD5 is about security (and a loose form of it), so better make sure
> >> all packets have it from the beginning of the flow.
> >>
> >> A flow with TCP TS on can not suddenly be sending packets without TCP TS.
> >>
> >> Clearly, trying to support this operation is a can of worms, I do not
> >> want to maintain such atrocity.
> >>
> >> RFC can state whatever it wants, sometimes reality forces us to have
> >> sane operations.
> >>
> >> Thanks.
> >>
> > Also the RFC states :
> >
> > "This password never appears in the connection stream, and the actual
> >    form of the password is up to the application. It could even change
> >    during the lifetime of a particular connection so long as this change
> >    was synchronized on both ends"
> >
> > It means the key can be changed, but this does not imply the option
> > can be turned on/off dynamically.
> >
>
> The change discussed previously (introduced by commit 721230326891 "tcp:
> md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks
> user-space ABI. As an example, the following BGP application uses
> setsockopt TCP_MD5SIG on a live TCP socket:
>
> https://github.com/IPInfusion/SDN-IP
>
> In addition to break user-space, it also breaks network protocol
> expectations for network equipment vendors implementing RFC2385.
> Considering that the goal of these protocols is interaction between
> different network equipment, breaking compatibility on that side
> is unexpected as well. Requiring to bring down/up the connection
> just to change the TCP MD5 password is a no-go in networks with
> high availability requirements. Changing the BGP authentication
> password must be allowed without tearing down and re-establishing
> the TCP sockets. Otherwise it doesn't scale for large network
> operators to have to individually manage each individual TCP socket
> in their network. However, based on the feedback I received, it
> would be acceptable to tear-down the TCP connections and re-establish
> them when enabling or disabling the MD5 option.
>
> Here is a list of a few network vendors along with their behavior
> with respect to TCP MD5:
>
> - Cisco: Allows for password to be changed, but within the hold-down
> timer (~180 seconds).
> - Juniper: When password is initially set on active connection it will
> reset, but after that any subsequent password changes no network
> resets.
> - Nokia: No notes on if they flap the tcp connection or not.
> - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
> both sides are ok with new passwords.
> - Meta-Switch: Expects the password to be set before a connection is
> attempted, but no further info on whether they reset the TCP
> connection on a change.
> - Avaya: Disable the neighbor, then set password, then re-enable.
> - Zebos: Would normally allow the change when socket connected.
>
>

If you want to be able to _change_ md5 key, this is fine by me, please
send a patch.

We can not dynamically turn on MD5, this is mentioned briefly in
tcp_synack_options().

If you want to turn on MD5 on an established flow, then you must
ensure that both SACK and TS were not enabled in the 3WHS,
and then make sure nothing blows up in the stack.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-29 20:47       ` Eric Dumazet
@ 2020-06-30 19:43         ` Linus Torvalds
  2020-06-30 19:52           ` Linus Torvalds
  2020-06-30 20:21           ` David Miller
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2020-06-30 19:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Desnoyers, David S. Miller, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Mon, Jun 29, 2020 at 1:47 PM Eric Dumazet <edumazet@google.com> wrote:
>
> If you want to be able to _change_ md5 key, this is fine by me, please
> send a patch.

Eric, if this change breaks existing users, then it gets reverted.
That's just fundamental.

No RFC's are in the lreast relevant when compared to "this broke
existing users".

If you're not willing to do the work to fix it, I will revert that
commit. Because that's how it works - you can't ask other people to
fix the breakage you introduced.

It really is that simple. We do not allow developers to break things
and then step away and say "not my problem".

         Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 19:43         ` Linus Torvalds
@ 2020-06-30 19:52           ` Linus Torvalds
  2020-06-30 20:34             ` Mathieu Desnoyers
  2020-06-30 20:21           ` David Miller
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2020-06-30 19:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Desnoyers, David S. Miller, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you're not willing to do the work to fix it, I will revert that
> commit.

Hmm. I only now noticed that that commit is over two years old.

So I think it's still wrong (clearly others do change passwords
outside of listening state), but considering that it apparently took
people two years to notice, at least some of the onus on figuring out
a better morel is on people who didn't even bother to test things in a
timely manner.

At some point "entreprise vendor kernels" or whatever who stay with
legacy kernels for a long time only have themselves to blame.

             Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 19:43         ` Linus Torvalds
  2020-06-30 19:52           ` Linus Torvalds
@ 2020-06-30 20:21           ` David Miller
  2020-06-30 20:30             ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: David Miller @ 2020-06-30 20:21 UTC (permalink / raw)
  To: torvalds; +Cc: edumazet, mathieu.desnoyers, linux-kernel, netdev, ycheng, joraj

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 30 Jun 2020 12:43:21 -0700

> If you're not willing to do the work to fix it, I will revert that
> commit.

Please let me handle this situation instead of making threats, this
just got reported.

Thank you.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:21           ` David Miller
@ 2020-06-30 20:30             ` Eric Dumazet
  2020-06-30 20:39               ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 20:30 UTC (permalink / raw)
  To: David Miller
  Cc: Linus Torvalds, Mathieu Desnoyers, LKML, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 1:21 PM David Miller <davem@davemloft.net> wrote:
>
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 30 Jun 2020 12:43:21 -0700
>
> > If you're not willing to do the work to fix it, I will revert that
> > commit.
>
> Please let me handle this situation instead of making threats, this
> just got reported.
>
> Thank you.
>

Also keep in mind the commit fixed a security issue, since we were
sending on the wire
garbage bytes from the kernel.

We can not simply revert it and hope for the best.

I find quite alarming vendors still use TCP MD5 "for security
reasons", but none of them have contributed to it in linux kernel
since 2018
(Time of the 'buggy patch')

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 19:52           ` Linus Torvalds
@ 2020-06-30 20:34             ` Mathieu Desnoyers
  2020-06-30 20:39               ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-06-30 20:34 UTC (permalink / raw)
  To: Linus Torvalds, Eric Dumazet
  Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

----- On Jun 30, 2020, at 3:52 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
[...]
> So I think it's still wrong (clearly others do change passwords
> outside of listening state), but considering that it apparently took
> people two years to notice, at least some of the onus on figuring out
> a better morel is on people who didn't even bother to test things in a
> timely manner.

I'm fully willing to work with Eric on finding a way forward with a
fix which addresses the original issue Eric's patch was trying to
fix while preserving ABI compatibility.

The main thing we need to agree on at this stage is what is our goal. We
can either choose to restore the original ABI behavior entirely, or only
focus on what appears to be the most important use-cases.

AFAIU, restoring full ABI compatibility would require to re-enable all
the following scenarios:

A) Transition of live socket from no key -> MD5 key.
B) Transition of live socket from MD5 key -> no key.
C) Transition of live socket from MD5 key to a different MD5 key.

Scenario (C) appears to be the most important use-case, and probably the
easiest to restore to its original behavior.

AFAIU restoring scenarios A and B would require us to validate how
much header space is needed by each SACK, TS and MD5 option enabled
on the socket, and reject enabling any option that adds header space
requirement exceeding the available space.

I welcome advice on what should be the end goal here.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:34             ` Mathieu Desnoyers
@ 2020-06-30 20:39               ` Eric Dumazet
  2020-06-30 20:44                 ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 20:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, David S. Miller, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 1:34 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Jun 30, 2020, at 3:52 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
>
> > On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> [...]
> > So I think it's still wrong (clearly others do change passwords
> > outside of listening state), but considering that it apparently took
> > people two years to notice, at least some of the onus on figuring out
> > a better morel is on people who didn't even bother to test things in a
> > timely manner.
>
> I'm fully willing to work with Eric on finding a way forward with a
> fix which addresses the original issue Eric's patch was trying to
> fix while preserving ABI compatibility.
>
> The main thing we need to agree on at this stage is what is our goal. We
> can either choose to restore the original ABI behavior entirely, or only
> focus on what appears to be the most important use-cases.
>
> AFAIU, restoring full ABI compatibility would require to re-enable all
> the following scenarios:
>
> A) Transition of live socket from no key -> MD5 key.
> B) Transition of live socket from MD5 key -> no key.
> C) Transition of live socket from MD5 key to a different MD5 key.
>
> Scenario (C) appears to be the most important use-case, and probably the
> easiest to restore to its original behavior.
>
> AFAIU restoring scenarios A and B would require us to validate how
> much header space is needed by each SACK, TS and MD5 option enabled
> on the socket, and reject enabling any option that adds header space
> requirement exceeding the available space.
>
> I welcome advice on what should be the end goal here.
>

The (C) & (B) case are certainly doable.

A) case is more complex, I have no idea of breakages of various TCP
stacks if a flow got SACK
at some point (in 3WHS) but suddenly becomes Reno.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:30             ` Eric Dumazet
@ 2020-06-30 20:39               ` Mathieu Desnoyers
  0 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-06-30 20:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

----- On Jun 30, 2020, at 4:30 PM, Eric Dumazet edumazet@google.com wrote:

> On Tue, Jun 30, 2020 at 1:21 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Tue, 30 Jun 2020 12:43:21 -0700
>>
>> > If you're not willing to do the work to fix it, I will revert that
>> > commit.
>>
>> Please let me handle this situation instead of making threats, this
>> just got reported.
>>
>> Thank you.
>>
> 
> Also keep in mind the commit fixed a security issue, since we were
> sending on the wire
> garbage bytes from the kernel.
> 
> We can not simply revert it and hope for the best.
> 
> I find quite alarming vendors still use TCP MD5 "for security
> reasons", but none of them have contributed to it in linux kernel
> since 2018
> (Time of the 'buggy patch')

I'm helping a customer increase their contributions and feedback to upstream.
As we can see, they have accumulated some backlog over time.

Clearly reverting a security fix is not acceptable here. Coming up with a
proper ABI-compatible fix should not be out of our reach though.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:39               ` Eric Dumazet
@ 2020-06-30 20:44                 ` David Miller
  2020-06-30 20:56                   ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2020-06-30 20:44 UTC (permalink / raw)
  To: edumazet; +Cc: mathieu.desnoyers, torvalds, linux-kernel, netdev, ycheng, joraj

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 30 Jun 2020 13:39:27 -0700

> The (C) & (B) case are certainly doable.
> 
> A) case is more complex, I have no idea of breakages of various TCP
> stacks if a flow got SACK
> at some point (in 3WHS) but suddenly becomes Reno.

I agree that C and B are the easiest to implement without having to
add complicated code to handle various negotiated TCP option
scenerios.

It does seem to be that some entities do A, or did I misread your
behavioral analysis of various implementations Mathieu?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:44                 ` David Miller
@ 2020-06-30 20:56                   ` Eric Dumazet
  2020-06-30 21:17                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 20:56 UTC (permalink / raw)
  To: David Miller
  Cc: Mathieu Desnoyers, Linus Torvalds, LKML, netdev, Yuchung Cheng,
	Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 30 Jun 2020 13:39:27 -0700
>
> > The (C) & (B) case are certainly doable.
> >
> > A) case is more complex, I have no idea of breakages of various TCP
> > stacks if a flow got SACK
> > at some point (in 3WHS) but suddenly becomes Reno.
>
> I agree that C and B are the easiest to implement without having to
> add complicated code to handle various negotiated TCP option
> scenerios.
>
> It does seem to be that some entities do A, or did I misread your
> behavioral analysis of various implementations Mathieu?
>
> Thanks.

Yes, another question about Mathieu cases is do determine the behavior
of all these stacks vs :
SACK option
TCP TS option.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 20:56                   ` Eric Dumazet
@ 2020-06-30 21:17                     ` Mathieu Desnoyers
  2020-06-30 21:23                       ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-06-30 21:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote:

> On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Eric Dumazet <edumazet@google.com>
>> Date: Tue, 30 Jun 2020 13:39:27 -0700
>>
>> > The (C) & (B) case are certainly doable.
>> >
>> > A) case is more complex, I have no idea of breakages of various TCP
>> > stacks if a flow got SACK
>> > at some point (in 3WHS) but suddenly becomes Reno.
>>
>> I agree that C and B are the easiest to implement without having to
>> add complicated code to handle various negotiated TCP option
>> scenerios.
>>
>> It does seem to be that some entities do A, or did I misread your
>> behavioral analysis of various implementations Mathieu?
>>
>> Thanks.
> 
> Yes, another question about Mathieu cases is do determine the behavior
> of all these stacks vs :
> SACK option
> TCP TS option.

I will ask my customer's networking team to investigate these behaviors,
which will allow me to prepare a thorough reply to the questions raised
by Eric and David. I expect to have an answer within 2-3 weeks at most.

Thank you!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 21:17                     ` Mathieu Desnoyers
@ 2020-06-30 21:23                       ` Eric Dumazet
  2020-06-30 21:54                         ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 21:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote:
>
> > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Eric Dumazet <edumazet@google.com>
> >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> >>
> >> > The (C) & (B) case are certainly doable.
> >> >
> >> > A) case is more complex, I have no idea of breakages of various TCP
> >> > stacks if a flow got SACK
> >> > at some point (in 3WHS) but suddenly becomes Reno.
> >>
> >> I agree that C and B are the easiest to implement without having to
> >> add complicated code to handle various negotiated TCP option
> >> scenerios.
> >>
> >> It does seem to be that some entities do A, or did I misread your
> >> behavioral analysis of various implementations Mathieu?
> >>
> >> Thanks.
> >
> > Yes, another question about Mathieu cases is do determine the behavior
> > of all these stacks vs :
> > SACK option
> > TCP TS option.
>
> I will ask my customer's networking team to investigate these behaviors,
> which will allow me to prepare a thorough reply to the questions raised
> by Eric and David. I expect to have an answer within 2-3 weeks at most.
>
> Thank you!


Great, I am working on adding back support for (B) & (C) by the end of
this week.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 21:23                       ` Eric Dumazet
@ 2020-06-30 21:54                         ` Eric Dumazet
  2020-06-30 22:07                           ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 21:54 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote:
> >
> > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
> > >>
> > >> From: Eric Dumazet <edumazet@google.com>
> > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > >>
> > >> > The (C) & (B) case are certainly doable.
> > >> >
> > >> > A) case is more complex, I have no idea of breakages of various TCP
> > >> > stacks if a flow got SACK
> > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > >>
> > >> I agree that C and B are the easiest to implement without having to
> > >> add complicated code to handle various negotiated TCP option
> > >> scenerios.
> > >>
> > >> It does seem to be that some entities do A, or did I misread your
> > >> behavioral analysis of various implementations Mathieu?
> > >>
> > >> Thanks.
> > >
> > > Yes, another question about Mathieu cases is do determine the behavior
> > > of all these stacks vs :
> > > SACK option
> > > TCP TS option.
> >
> > I will ask my customer's networking team to investigate these behaviors,
> > which will allow me to prepare a thorough reply to the questions raised
> > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> >
> > Thank you!
>
>
> Great, I am working on adding back support for (B) & (C) by the end of
> this week.

Note that the security issue (of sending uninit bytes to the wire) has
been independently fixed with [1]

This means syzbot was able to have MD5+TS+SACK  ~6 months ago.

It seems we (linux) do not enable this combination for PASSIVE flows,
(according to tcp_synack_options()),
but  for ACTIVE flows we do nothing special.

So maybe code in tcp_synack_options() should be mirrored to
tcp_syn_options() for consistency.
(disabling TS if  both MD5 and SACK are enabled)

[1]

commit 9424e2e7ad93ffffa88f882c9bc5023570904b55
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Dec 5 10:10:15 2019 -0800

    tcp: md5: fix potential overestimation of TCP option space

    Back in 2008, Adam Langley fixed the corner case of packets for flows
    having all of the following options : MD5 TS SACK

    Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
    can be cooked from the remaining 8 bytes.

    tcp_established_options() correctly sets opts->num_sack_blocks
    to zero, but returns 36 instead of 32.

    This means TCP cooks packets with 4 extra bytes at the end
    of options, containing unitialized bytes.

    Fixes: 33ad798c924b ("tcp: options clean up")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct
sock *sk, struct sk_buff *skb
                        min_t(unsigned int, eff_sacks,
                              (remaining - TCPOLEN_SACK_BASE_ALIGNED) /
                              TCPOLEN_SACK_PERBLOCK);
-               size += TCPOLEN_SACK_BASE_ALIGNED +
-                       opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
+               if (likely(opts->num_sack_blocks))
+                       size += TCPOLEN_SACK_BASE_ALIGNED +
+                               opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
        }

        return size;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 21:54                         ` Eric Dumazet
@ 2020-06-30 22:07                           ` Eric Dumazet
  2020-06-30 22:38                             ` Eric Dumazet
  2020-07-01 17:24                             ` Eric Dumazet
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 22:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> > >
> > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote:
> > >
> > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
> > > >>
> > > >> From: Eric Dumazet <edumazet@google.com>
> > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > > >>
> > > >> > The (C) & (B) case are certainly doable.
> > > >> >
> > > >> > A) case is more complex, I have no idea of breakages of various TCP
> > > >> > stacks if a flow got SACK
> > > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > > >>
> > > >> I agree that C and B are the easiest to implement without having to
> > > >> add complicated code to handle various negotiated TCP option
> > > >> scenerios.
> > > >>
> > > >> It does seem to be that some entities do A, or did I misread your
> > > >> behavioral analysis of various implementations Mathieu?
> > > >>
> > > >> Thanks.
> > > >
> > > > Yes, another question about Mathieu cases is do determine the behavior
> > > > of all these stacks vs :
> > > > SACK option
> > > > TCP TS option.
> > >
> > > I will ask my customer's networking team to investigate these behaviors,
> > > which will allow me to prepare a thorough reply to the questions raised
> > > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> > >
> > > Thank you!
> >
> >
> > Great, I am working on adding back support for (B) & (C) by the end of
> > this week.
>
> Note that the security issue (of sending uninit bytes to the wire) has
> been independently fixed with [1]
>
> This means syzbot was able to have MD5+TS+SACK  ~6 months ago.
>
> It seems we (linux) do not enable this combination for PASSIVE flows,
> (according to tcp_synack_options()),
> but  for ACTIVE flows we do nothing special.
>
> So maybe code in tcp_synack_options() should be mirrored to
> tcp_syn_options() for consistency.
> (disabling TS if  both MD5 and SACK are enabled)

Oh well, tcp_syn_options() is supposed to have the same logic.

Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)

Nice can of worms.

>
> [1]
>
> commit 9424e2e7ad93ffffa88f882c9bc5023570904b55
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Dec 5 10:10:15 2019 -0800
>
>     tcp: md5: fix potential overestimation of TCP option space
>
>     Back in 2008, Adam Langley fixed the corner case of packets for flows
>     having all of the following options : MD5 TS SACK
>
>     Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
>     can be cooked from the remaining 8 bytes.
>
>     tcp_established_options() correctly sets opts->num_sack_blocks
>     to zero, but returns 36 instead of 32.
>
>     This means TCP cooks packets with 4 extra bytes at the end
>     of options, containing unitialized bytes.
>
>     Fixes: 33ad798c924b ("tcp: options clean up")
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Reported-by: syzbot <syzkaller@googlegroups.com>
>     Acked-by: Neal Cardwell <ncardwell@google.com>
>     Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct
> sock *sk, struct sk_buff *skb
>                         min_t(unsigned int, eff_sacks,
>                               (remaining - TCPOLEN_SACK_BASE_ALIGNED) /
>                               TCPOLEN_SACK_PERBLOCK);
> -               size += TCPOLEN_SACK_BASE_ALIGNED +
> -                       opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
> +               if (likely(opts->num_sack_blocks))
> +                       size += TCPOLEN_SACK_BASE_ALIGNED +
> +                               opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
>         }
>
>         return size;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 22:07                           ` Eric Dumazet
@ 2020-06-30 22:38                             ` Eric Dumazet
  2020-06-30 23:44                               ` Mathieu Desnoyers
  2020-07-01  2:02                               ` Herbert Xu
  2020-07-01 17:24                             ` Eric Dumazet
  1 sibling, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-06-30 22:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote:
> > > >
> > > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote:
> > > > >>
> > > > >> From: Eric Dumazet <edumazet@google.com>
> > > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700
> > > > >>
> > > > >> > The (C) & (B) case are certainly doable.
> > > > >> >
> > > > >> > A) case is more complex, I have no idea of breakages of various TCP
> > > > >> > stacks if a flow got SACK
> > > > >> > at some point (in 3WHS) but suddenly becomes Reno.
> > > > >>
> > > > >> I agree that C and B are the easiest to implement without having to
> > > > >> add complicated code to handle various negotiated TCP option
> > > > >> scenerios.
> > > > >>
> > > > >> It does seem to be that some entities do A, or did I misread your
> > > > >> behavioral analysis of various implementations Mathieu?
> > > > >>
> > > > >> Thanks.
> > > > >
> > > > > Yes, another question about Mathieu cases is do determine the behavior
> > > > > of all these stacks vs :
> > > > > SACK option
> > > > > TCP TS option.
> > > >
> > > > I will ask my customer's networking team to investigate these behaviors,
> > > > which will allow me to prepare a thorough reply to the questions raised
> > > > by Eric and David. I expect to have an answer within 2-3 weeks at most.
> > > >
> > > > Thank you!
> > >
> > >
> > > Great, I am working on adding back support for (B) & (C) by the end of
> > > this week.
> >
> > Note that the security issue (of sending uninit bytes to the wire) has
> > been independently fixed with [1]
> >
> > This means syzbot was able to have MD5+TS+SACK  ~6 months ago.
> >
> > It seems we (linux) do not enable this combination for PASSIVE flows,
> > (according to tcp_synack_options()),
> > but  for ACTIVE flows we do nothing special.
> >
> > So maybe code in tcp_synack_options() should be mirrored to
> > tcp_syn_options() for consistency.
> > (disabling TS if  both MD5 and SACK are enabled)
>
> Oh well, tcp_syn_options() is supposed to have the same logic.
>
> Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)
>
> Nice can of worms.
>

For updates of keys, it seems existing code lacks some RCU care.

MD5 keys use RCU for lookups/hashes, but the replacement of a key does
not allocate a new piece of memory.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
 int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
tcp_md5sig_key *key)
 {
        struct scatterlist sg;
+       u8 keylen = key->keylen;

-       sg_init_one(&sg, key->key, key->keylen);
-       ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen);
+       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
+
+       sg_init_one(&sg, key->key, keylen);
+       ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
        return crypto_ahash_update(hp->md5_req);
 }
 EXPORT_SYMBOL(tcp_md5_hash_key);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
        if (key) {
                /* Pre-existing entry - just update that one. */
                memcpy(key->key, newkey, newkeylen);
+
+               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
+
                key->keylen = newkeylen;
                return 0;
        }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 22:38                             ` Eric Dumazet
@ 2020-06-30 23:44                               ` Mathieu Desnoyers
  2020-07-01  0:01                                 ` Eric Dumazet
  2020-07-01  2:02                               ` Herbert Xu
  1 sibling, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-06-30 23:44 UTC (permalink / raw)
  To: Eric Dumazet, paulmck
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

----- On Jun 30, 2020, at 6:38 PM, Eric Dumazet edumazet@google.com wrote:
[...]
> 
> For updates of keys, it seems existing code lacks some RCU care.
> 
> MD5 keys use RCU for lookups/hashes, but the replacement of a key does
> not allocate a new piece of memory.

How is that RCU-safe ?

Based on what I see here:

tcp_md5_do_add() has a comment stating:

"/* This can be called on a newly created socket, from other files */"

which appears to be untrue if this can indeed be called on a live socket.

The path for pre-existing keys does:

        key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index);
        if (key) {
                /* Pre-existing entry - just update that one. */
                memcpy(key->key, newkey, newkeylen);
                key->keylen = newkeylen;
                return 0;
        }

AFAIU, this works only if you assume there are no concurrent readers
accessing key->key, else they can see a corrupted key.

The change you are proposing adds smp_wmb/smp_rmb to pair stores
to key before key_len with loads of key_len before key. I'm not sure
what this is trying to achieve, and how it prevents the readers from
observing a corrupted state if the key is updated on a live socket ?

Based on my understanding, this path which deals with pre-existing keys
in-place should only ever be used when there are no concurrent readers,
else a new memory allocation would be needed to guarantee that readers
always observe a valid copy.

Thanks,

Mathieu

> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index
> 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
>        struct scatterlist sg;
> +       u8 keylen = key->keylen;
> 
> -       sg_init_one(&sg, key->key, key->keylen);
> -       ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen);
> +       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> +
> +       sg_init_one(&sg, key->key, keylen);
> +       ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
>        return crypto_ahash_update(hp->md5_req);
> }
> EXPORT_SYMBOL(tcp_md5_hash_key);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index
> ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>        if (key) {
>                /* Pre-existing entry - just update that one. */
>                memcpy(key->key, newkey, newkeylen);
> +
> +               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +
>                key->keylen = newkeylen;
>                return 0;
>         }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 23:44                               ` Mathieu Desnoyers
@ 2020-07-01  0:01                                 ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01  0:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paulmck, David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 4:44 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Jun 30, 2020, at 6:38 PM, Eric Dumazet edumazet@google.com wrote:
> [...]
> >
> > For updates of keys, it seems existing code lacks some RCU care.
> >
> > MD5 keys use RCU for lookups/hashes, but the replacement of a key does
> > not allocate a new piece of memory.
>
> How is that RCU-safe ?
>
> Based on what I see here:
>
> tcp_md5_do_add() has a comment stating:
>
> "/* This can be called on a newly created socket, from other files */"
>
> which appears to be untrue if this can indeed be called on a live socket.

"This can be called" is not the same than "this is always called for
newly created socket"

>
> The path for pre-existing keys does:
>
>         key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index);
>         if (key) {
>                 /* Pre-existing entry - just update that one. */
>                 memcpy(key->key, newkey, newkeylen);
>                 key->keylen = newkeylen;
>                 return 0;
>         }
>
> AFAIU, this works only if you assume there are no concurrent readers
> accessing key->key, else they can see a corrupted key.

This is fine.

>
> The change you are proposing adds smp_wmb/smp_rmb to pair stores
> to key before key_len with loads of key_len before key. I'm not sure
> what this is trying to achieve, and how it prevents the readers from
> observing a corrupted state if the key is updated on a live socket ?



By definition if you change the MD5 key on a socket while packets are
flying, the incoming packet could either

1) See old key (packet is dropped)
2) See new key.

So any other decision (catching intermediate state) is really not an
issue, since you already accepted the fact that a packet could be
dropped,
and TCP will retransmit.

TCP MD5 implementation does not support multiple keys  for one flow,
you can not have both old and new keys being checked.


>
> Based on my understanding, this path which deals with pre-existing keys
> in-place should only ever be used when there are no concurrent readers,
> else a new memory allocation would be needed to guarantee that readers
> always observe a valid copy.

This is not _needed,_ and since memory allocations can fail, we would
potentially break applications
assuming that changing MD5 key would never fail.

Patch has been sent for review on netdev@ (
https://patchwork.ozlabs.org/project/netdev/patch/20200630234101.3259179-1-edumazet@google.com/
)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 22:38                             ` Eric Dumazet
  2020-06-30 23:44                               ` Mathieu Desnoyers
@ 2020-07-01  2:02                               ` Herbert Xu
  2020-07-01  2:17                                 ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2020-07-01  2:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: mathieu.desnoyers, davem, torvalds, linux-kernel, netdev, ycheng, joraj

Eric Dumazet <edumazet@google.com> wrote:
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
>        struct scatterlist sg;
> +       u8 keylen = key->keylen;
> 
> -       sg_init_one(&sg, key->key, key->keylen);
> -       ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen);
> +       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> +
> +       sg_init_one(&sg, key->key, keylen);
> +       ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
>        return crypto_ahash_update(hp->md5_req);
> }
> EXPORT_SYMBOL(tcp_md5_hash_key);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>        if (key) {
>                /* Pre-existing entry - just update that one. */
>                memcpy(key->key, newkey, newkeylen);
> +
> +               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +
>                key->keylen = newkeylen;
>                return 0;
>        }

This doesn't make sense.  Your smp_rmb only guarantees that you
see a version of key->key that's newer than keylen.  What if the
key got changed twice? You could still read more than what's in
the key (if that's what you're trying to protect against).

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:02                               ` Herbert Xu
@ 2020-07-01  2:17                                 ` Eric Dumazet
  2020-07-01  2:22                                   ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01  2:17 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 7:02 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Eric Dumazet <edumazet@google.com> wrote:
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> > tcp_md5sig_key *key)
> > {
> >        struct scatterlist sg;
> > +       u8 keylen = key->keylen;
> >
> > -       sg_init_one(&sg, key->key, key->keylen);
> > -       ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen);
> > +       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> > +
> > +       sg_init_one(&sg, key->key, keylen);
> > +       ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
> >        return crypto_ahash_update(hp->md5_req);
> > }
> > EXPORT_SYMBOL(tcp_md5_hash_key);
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f
> > 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >        if (key) {
> >                /* Pre-existing entry - just update that one. */
> >                memcpy(key->key, newkey, newkeylen);
> > +
> > +               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> > +
> >                key->keylen = newkeylen;
> >                return 0;
> >        }
>
> This doesn't make sense.  Your smp_rmb only guarantees that you
> see a version of key->key that's newer than keylen.  What if the
> key got changed twice? You could still read more than what's in
> the key (if that's what you're trying to protect against).

The worst that can happen is a dropped packet.

Which is anyway going to happen anyway eventually if an application
changes a TCP MD5 key on a live flow.

The main issue of the prior code was the double read of key->keylen in
tcp_md5_hash_key(), not that few bytes could change under us.

I used smp_rmb() to ease backports, since old kernels had no
READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:17                                 ` Eric Dumazet
@ 2020-07-01  2:22                                   ` Herbert Xu
  2020-07-01  2:30                                     ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2020-07-01  2:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
>
> The main issue of the prior code was the double read of key->keylen in
> tcp_md5_hash_key(), not that few bytes could change under us.
>
> I used smp_rmb() to ease backports, since old kernels had no
> READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.

If it's the double-read that you're protecting against, you should
just use barrier() and the comment should say so too.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:22                                   ` Herbert Xu
@ 2020-07-01  2:30                                     ` Eric Dumazet
  2020-07-01  2:39                                       ` Joe Perches
  2020-07-01  2:58                                       ` Herbert Xu
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01  2:30 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
> >
> > The main issue of the prior code was the double read of key->keylen in
> > tcp_md5_hash_key(), not that few bytes could change under us.
> >
> > I used smp_rmb() to ease backports, since old kernels had no
> > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.
>
> If it's the double-read that you're protecting against, you should
> just use barrier() and the comment should say so too.

I made this clear in the changelog, do we want comments all over the places ?
Do not get me wrong, we had this bug for years and suddenly this is a
big deal...

    MD5 keys are read with RCU protection, and tcp_md5_do_add()
    might update in-place a prior key.

    Normally, typical RCU updates would allocate a new piece
    of memory. In this case only key->key and key->keylen might
    be updated, and we do not care if an incoming packet could
    see the old key, the new one, or some intermediate value,
    since changing the key on a live flow is known to be problematic
    anyway.

    We only want to make sure that in the case key->keylen
    is changed, cpus in tcp_md5_hash_key() wont try to use
    uninitialized data, or crash because key->keylen was
    read twice to feed sg_init_one() and ahash_request_set_crypt()

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:30                                     ` Eric Dumazet
@ 2020-07-01  2:39                                       ` Joe Perches
  2020-07-01  2:58                                       ` Herbert Xu
  1 sibling, 0 replies; 32+ messages in thread
From: Joe Perches @ 2020-07-01  2:39 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, 2020-06-30 at 19:30 -0700, Eric Dumazet wrote:
> On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote:
> > > The main issue of the prior code was the double read of key->keylen in
> > > tcp_md5_hash_key(), not that few bytes could change under us.
> > > 
> > > I used smp_rmb() to ease backports, since old kernels had no
> > > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead.
> > 
> > If it's the double-read that you're protecting against, you should
> > just use barrier() and the comment should say so too.
> 
> I made this clear in the changelog, do we want comments all over the places ?

Having to run git for every line of code isn't great.

Comments in code is better than comments in changelogs.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:30                                     ` Eric Dumazet
  2020-07-01  2:39                                       ` Joe Perches
@ 2020-07-01  2:58                                       ` Herbert Xu
  2020-07-01  3:36                                         ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2020-07-01  2:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
>
> I made this clear in the changelog, do we want comments all over the places ?
> Do not get me wrong, we had this bug for years and suddenly this is a
> big deal...

I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
already exist in the code then I agree it's not a big deal.  But
adding a new pair of bogus smp_Xmb's is bad for maintenance.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  2:58                                       ` Herbert Xu
@ 2020-07-01  3:36                                         ` Eric Dumazet
  2020-07-01  3:50                                           ` Herbert Xu
  2020-07-01 12:19                                           ` Mathieu Desnoyers
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01  3:36 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
> >
> > I made this clear in the changelog, do we want comments all over the places ?
> > Do not get me wrong, we had this bug for years and suddenly this is a
> > big deal...
>
> I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
> already exist in the code then I agree it's not a big deal.  But
> adding a new pair of bogus smp_Xmb's is bad for maintenance.
>

If I knew so many people were excited about TCP / MD5, I would have
posted all my patches on lkml ;)

Without the smp_wmb() we would still need something to prevent KMSAN
from detecting that we read uninitialized bytes,
if key->keylen is increased.  (initial content of key->key[] is garbage)

Something like this :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);

 int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
tcp_md5sig_key *key)
 {
-       u8 keylen = key->keylen;
+       u8 keylen = READ_ONCE(key->keylen); /* paired with
WRITE_ONCE() in tcp_md5_do_add */
        struct scatterlist sg;

-       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
-
        sg_init_one(&sg, key->key, keylen);
        ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
        return crypto_ahash_update(hp->md5_req);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
                /* Pre-existing entry - just update that one. */
                memcpy(key->key, newkey, newkeylen);

-               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
+               /* Pairs with READ_ONCE() in tcp_md5_hash_key().
+                * Also note that a reader could catch new key->keylen value
+                * but old key->key[], this is the reason we use __GFP_ZERO
+                * at sock_kmalloc() time below these lines.
+                */
+               WRITE_ONCE(key->keylen, newkeylen);

-               key->keylen = newkeylen;
                return 0;
        }

@@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
tcp_md5_addr *addr,
                rcu_assign_pointer(tp->md5sig_info, md5sig);
        }

-       key = sock_kmalloc(sk, sizeof(*key), gfp);
+       key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
        if (!key)
                return -ENOMEM;
        if (!tcp_alloc_md5sig_pool()) {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  3:36                                         ` Eric Dumazet
@ 2020-07-01  3:50                                           ` Herbert Xu
  2020-07-01 12:19                                           ` Mathieu Desnoyers
  1 sibling, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2020-07-01  3:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 08:36:51PM -0700, Eric Dumazet wrote:
>
> If I knew so many people were excited about TCP / MD5, I would have
> posted all my patches on lkml ;)
> 
> Without the smp_wmb() we would still need something to prevent KMSAN
> from detecting that we read uninitialized bytes,
> if key->keylen is increased.  (initial content of key->key[] is garbage)
> 
> Something like this :

LGTM.  Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01  3:36                                         ` Eric Dumazet
  2020-07-01  3:50                                           ` Herbert Xu
@ 2020-07-01 12:19                                           ` Mathieu Desnoyers
  2020-07-01 15:15                                             ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2020-07-01 12:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, David S. Miller, Linus Torvalds, linux-kernel,
	netdev, Yuchung Cheng, Jonathan Rajotte-Julien

----- On Jun 30, 2020, at 11:36 PM, Eric Dumazet edumazet@google.com wrote:

> On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>
>> On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote:
>> >
>> > I made this clear in the changelog, do we want comments all over the places ?
>> > Do not get me wrong, we had this bug for years and suddenly this is a
>> > big deal...
>>
>> I thought you were adding a new pair of smp_rmb/smp_wmb.  If they
>> already exist in the code then I agree it's not a big deal.  But
>> adding a new pair of bogus smp_Xmb's is bad for maintenance.
>>
> 
> If I knew so many people were excited about TCP / MD5, I would have
> posted all my patches on lkml ;)
> 
> Without the smp_wmb() we would still need something to prevent KMSAN
> from detecting that we read uninitialized bytes,
> if key->keylen is increased.  (initial content of key->key[] is garbage)
> 
> Something like this :

The approach below looks good to me, but you'll also need to annotate
both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use
data_race(expr) to let the concurrency sanitizer know that there is
a known data race which is there on purpose (triggered by memcpy in tcp_md5_do_add
and somewhere within crypto_ahash_update). See Documentation/dev-tools/kcsan.rst
for details.

Thanks,

Mathieu

> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index
> f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> 
> int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> tcp_md5sig_key *key)
> {
> -       u8 keylen = key->keylen;
> +       u8 keylen = READ_ONCE(key->keylen); /* paired with
> WRITE_ONCE() in tcp_md5_do_add */
>        struct scatterlist sg;
> 
> -       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> -
>        sg_init_one(&sg, key->key, keylen);
>        ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
>        return crypto_ahash_update(hp->md5_req);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index
> 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>                /* Pre-existing entry - just update that one. */
>                memcpy(key->key, newkey, newkeylen);
> 
> -               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> +               /* Pairs with READ_ONCE() in tcp_md5_hash_key().
> +                * Also note that a reader could catch new key->keylen value
> +                * but old key->key[], this is the reason we use __GFP_ZERO
> +                * at sock_kmalloc() time below these lines.
> +                */
> +               WRITE_ONCE(key->keylen, newkeylen);
> 
> -               key->keylen = newkeylen;
>                return 0;
>        }
> 
> @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
> tcp_md5_addr *addr,
>                rcu_assign_pointer(tp->md5sig_info, md5sig);
>        }
> 
> -       key = sock_kmalloc(sk, sizeof(*key), gfp);
> +       key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
>        if (!key)
>                return -ENOMEM;
>         if (!tcp_alloc_md5sig_pool()) {

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-07-01 12:19                                           ` Mathieu Desnoyers
@ 2020-07-01 15:15                                             ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01 15:15 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Herbert Xu, David S. Miller, Linus Torvalds, linux-kernel,
	netdev, Yuchung Cheng, Jonathan Rajotte-Julien

On Wed, Jul 1, 2020 at 5:19 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:

> The approach below looks good to me, but you'll also need to annotate
> both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use
> data_race(expr) to let the concurrency sanitizer know that there is
> a known data race which is there on purpose (triggered by memcpy in tcp_md5_do_add
> and somewhere within crypto_ahash_update). See Documentation/dev-tools/kcsan.rst
> for details.

Sure, I can add a data_race() and let stable teams handle the
backports without it ;)

>
> Thanks,
>
> Mathieu
>
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index
> > f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data);
> >
> > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct
> > tcp_md5sig_key *key)
> > {
> > -       u8 keylen = key->keylen;
> > +       u8 keylen = READ_ONCE(key->keylen); /* paired with
> > WRITE_ONCE() in tcp_md5_do_add */
> >        struct scatterlist sg;
> >
> > -       smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */
> > -
> >        sg_init_one(&sg, key->key, keylen);
> >        ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen);
> >        return crypto_ahash_update(hp->md5_req);
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index
> > 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f
> > 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >                /* Pre-existing entry - just update that one. */
> >                memcpy(key->key, newkey, newkeylen);
> >
> > -               smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */
> > +               /* Pairs with READ_ONCE() in tcp_md5_hash_key().
> > +                * Also note that a reader could catch new key->keylen value
> > +                * but old key->key[], this is the reason we use __GFP_ZERO
> > +                * at sock_kmalloc() time below these lines.
> > +                */
> > +               WRITE_ONCE(key->keylen, newkeylen);
> >
> > -               key->keylen = newkeylen;
> >                return 0;
> >        }
> >
> > @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union
> > tcp_md5_addr *addr,
> >                rcu_assign_pointer(tp->md5sig_info, md5sig);
> >        }
> >
> > -       key = sock_kmalloc(sk, sizeof(*key), gfp);
> > +       key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
> >        if (!key)
> >                return -ENOMEM;
> >         if (!tcp_alloc_md5sig_pool()) {
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [regression] TCP_MD5SIG on established sockets
  2020-06-30 22:07                           ` Eric Dumazet
  2020-06-30 22:38                             ` Eric Dumazet
@ 2020-07-01 17:24                             ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2020-07-01 17:24 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev,
	Yuchung Cheng, Jonathan Rajotte-Julien

On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet <edumazet@google.com> wrote:

> Oh well, tcp_syn_options() is supposed to have the same logic.
>
> Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK)
>
> Nice can of worms.


Yes, MD5 does not like SYNCOOKIES in some cases.

In this trace, S is a linux host, the SYNACK is a syncookie.

C host is attempting a SYN with MD5+TS+SACK, which a standard linux
host would not attempt.
But we could expect another stack to attempt this combination.

TCP stack believes it can encode selected TCP options (in the TSVAL value),
but since MD5 option is set, TS option is not sent in the SYNACK.
However we still send other options that MUST not be sent if TS option
could not fit the TCP option space.

10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535,
options [nop,nop,md5 valid,mss 65495,sackOK,TS val 456965269 ecr
0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602,
win 65535, options [nop,nop,md5 valid,mss
65495,nop,nop,sackOK,nop,wscale 8], length 0

<When ACK packets comes back from client, the server has no state and
no TS ecr value, so must assume no option was negotiated>

10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options
[nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256,
options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options
[nop,nop,md5 valid], length 0

We can see for instance the windows are wrong by a 256 factor (wscale 8)

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-07-01 17:24 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13 19:38 [regression] TC_MD5SIG on established sockets Mathieu Desnoyers
2020-05-13 19:49 ` Eric Dumazet
2020-05-13 19:56   ` Eric Dumazet
2020-06-29 19:43     ` [regression] TCP_MD5SIG " Mathieu Desnoyers
2020-06-29 20:47       ` Eric Dumazet
2020-06-30 19:43         ` Linus Torvalds
2020-06-30 19:52           ` Linus Torvalds
2020-06-30 20:34             ` Mathieu Desnoyers
2020-06-30 20:39               ` Eric Dumazet
2020-06-30 20:44                 ` David Miller
2020-06-30 20:56                   ` Eric Dumazet
2020-06-30 21:17                     ` Mathieu Desnoyers
2020-06-30 21:23                       ` Eric Dumazet
2020-06-30 21:54                         ` Eric Dumazet
2020-06-30 22:07                           ` Eric Dumazet
2020-06-30 22:38                             ` Eric Dumazet
2020-06-30 23:44                               ` Mathieu Desnoyers
2020-07-01  0:01                                 ` Eric Dumazet
2020-07-01  2:02                               ` Herbert Xu
2020-07-01  2:17                                 ` Eric Dumazet
2020-07-01  2:22                                   ` Herbert Xu
2020-07-01  2:30                                     ` Eric Dumazet
2020-07-01  2:39                                       ` Joe Perches
2020-07-01  2:58                                       ` Herbert Xu
2020-07-01  3:36                                         ` Eric Dumazet
2020-07-01  3:50                                           ` Herbert Xu
2020-07-01 12:19                                           ` Mathieu Desnoyers
2020-07-01 15:15                                             ` Eric Dumazet
2020-07-01 17:24                             ` Eric Dumazet
2020-06-30 20:21           ` David Miller
2020-06-30 20:30             ` Eric Dumazet
2020-06-30 20:39               ` Mathieu Desnoyers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.