* [regression] TC_MD5SIG on established sockets @ 2020-05-13 19:38 Mathieu Desnoyers 2020-05-13 19:49 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-05-13 19:38 UTC (permalink / raw) To: Eric Dumazet, David S. Miller Cc: linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien Hi, I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT on established sockets. It is observed by a customer. This issue is introduced by this commit: commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets" The intent of this commit appears to be to fix a use of uninitialized value in tcp_parse_options(). The change introduced by this commit is to disallow setting the TCP_MD5SIG{,_EXT} socket options on an established socket. The justification for this change appears in the commit message: "I believe this was caused by a TCP_MD5SIG being set on live flow. This is highly unexpected, since TCP option space is limited. For instance, presence of TCP MD5 option automatically disables TCP TimeStamp option at SYN/SYNACK time, which we can not do once flow has been established. Really, adding/deleting an MD5 key only makes sense on sockets in CLOSE or LISTEN state." However, reading through RFC2385 [1], this justification does not appear correct. Quoting to the RFC: "This password never appears in the connection stream, and the actual form of the password is up to the application. It could even change during the lifetime of a particular connection so long as this change was synchronized on both ends" The paragraph above clearly underlines that changing the MD5 signature of a live TCP socket is allowed. I also do not understand why it would be invalid to transition an established TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the RFC: "The total header size is also an issue. The TCP header specifies where segment data starts with a 4-bit field which gives the total size of the header (including options) in 32-byte words. This means that the total size of the header plus option must be less than or equal to 60 bytes -- this leaves 40 bytes for options." The paragraph above seems to be the only indication that some TCP options cannot be combined on a given TCP socket: if the resulting header size does not fit. However, I do not see anything in the specification preventing any of the following use-cases on an established TCP socket: - Transition from no-MD5 to MD5, - Transition from MD5 to no-MD5, - Changing the MD5 key associated with a socket. As long as the resulting combination of options does not exceed the available header space. Can we please fix this KASAN report in a way that does not break user-space applications expectations about Linux' implementation of RFC2385 ? Thanks, Mathieu [1] RFC2385: https://tools.ietf.org/html/rfc2385 -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TC_MD5SIG on established sockets 2020-05-13 19:38 [regression] TC_MD5SIG on established sockets Mathieu Desnoyers @ 2020-05-13 19:49 ` Eric Dumazet 2020-05-13 19:56 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-05-13 19:49 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien I do not think we want to transition sockets in the middle. since packets can be re-ordered in the network. MD5 is about security (and a loose form of it), so better make sure all packets have it from the beginning of the flow. A flow with TCP TS on can not suddenly be sending packets without TCP TS. Clearly, trying to support this operation is a can of worms, I do not want to maintain such atrocity. RFC can state whatever it wants, sometimes reality forces us to have sane operations. Thanks. On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > Hi, > > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT > on established sockets. It is observed by a customer. > > This issue is introduced by this commit: > > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets" > > The intent of this commit appears to be to fix a use of uninitialized value in > tcp_parse_options(). The change introduced by this commit is to disallow setting > the TCP_MD5SIG{,_EXT} socket options on an established socket. > > The justification for this change appears in the commit message: > > "I believe this was caused by a TCP_MD5SIG being set on live > flow. > > This is highly unexpected, since TCP option space is limited. > > For instance, presence of TCP MD5 option automatically disables > TCP TimeStamp option at SYN/SYNACK time, which we can not do > once flow has been established. > > Really, adding/deleting an MD5 key only makes sense on sockets > in CLOSE or LISTEN state." > > However, reading through RFC2385 [1], this justification does not appear > correct. Quoting to the RFC: > > "This password never appears in the connection stream, and the actual > form of the password is up to the application. It could even change > during the lifetime of a particular connection so long as this change > was synchronized on both ends" > > The paragraph above clearly underlines that changing the MD5 signature of > a live TCP socket is allowed. > > I also do not understand why it would be invalid to transition an established > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the > RFC: > > "The total header size is also an issue. The TCP header specifies > where segment data starts with a 4-bit field which gives the total > size of the header (including options) in 32-byte words. This means > that the total size of the header plus option must be less than or > equal to 60 bytes -- this leaves 40 bytes for options." > > The paragraph above seems to be the only indication that some TCP options > cannot be combined on a given TCP socket: if the resulting header size does > not fit. However, I do not see anything in the specification preventing any > of the following use-cases on an established TCP socket: > > - Transition from no-MD5 to MD5, > - Transition from MD5 to no-MD5, > - Changing the MD5 key associated with a socket. > > As long as the resulting combination of options does not exceed the available > header space. > > Can we please fix this KASAN report in a way that does not break user-space > applications expectations about Linux' implementation of RFC2385 ? > > Thanks, > > Mathieu > > [1] RFC2385: https://tools.ietf.org/html/rfc2385 > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TC_MD5SIG on established sockets 2020-05-13 19:49 ` Eric Dumazet @ 2020-05-13 19:56 ` Eric Dumazet 2020-06-29 19:43 ` [regression] TCP_MD5SIG " Mathieu Desnoyers 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-05-13 19:56 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote: > > I do not think we want to transition sockets in the middle. since > packets can be re-ordered in the network. > > MD5 is about security (and a loose form of it), so better make sure > all packets have it from the beginning of the flow. > > A flow with TCP TS on can not suddenly be sending packets without TCP TS. > > Clearly, trying to support this operation is a can of worms, I do not > want to maintain such atrocity. > > RFC can state whatever it wants, sometimes reality forces us to have > sane operations. > > Thanks. Also the RFC states : "This password never appears in the connection stream, and the actual form of the password is up to the application. It could even change during the lifetime of a particular connection so long as this change was synchronized on both ends" It means the key can be changed, but this does not imply the option can be turned on/off dynamically. > > On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: > > > > Hi, > > > > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT > > on established sockets. It is observed by a customer. > > > > This issue is introduced by this commit: > > > > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets" > > > > The intent of this commit appears to be to fix a use of uninitialized value in > > tcp_parse_options(). The change introduced by this commit is to disallow setting > > the TCP_MD5SIG{,_EXT} socket options on an established socket. > > > > The justification for this change appears in the commit message: > > > > "I believe this was caused by a TCP_MD5SIG being set on live > > flow. > > > > This is highly unexpected, since TCP option space is limited. > > > > For instance, presence of TCP MD5 option automatically disables > > TCP TimeStamp option at SYN/SYNACK time, which we can not do > > once flow has been established. > > > > Really, adding/deleting an MD5 key only makes sense on sockets > > in CLOSE or LISTEN state." > > > > However, reading through RFC2385 [1], this justification does not appear > > correct. Quoting to the RFC: > > > > "This password never appears in the connection stream, and the actual > > form of the password is up to the application. It could even change > > during the lifetime of a particular connection so long as this change > > was synchronized on both ends" > > > > The paragraph above clearly underlines that changing the MD5 signature of > > a live TCP socket is allowed. > > > > I also do not understand why it would be invalid to transition an established > > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the > > RFC: > > > > "The total header size is also an issue. The TCP header specifies > > where segment data starts with a 4-bit field which gives the total > > size of the header (including options) in 32-byte words. This means > > that the total size of the header plus option must be less than or > > equal to 60 bytes -- this leaves 40 bytes for options." > > > > The paragraph above seems to be the only indication that some TCP options > > cannot be combined on a given TCP socket: if the resulting header size does > > not fit. However, I do not see anything in the specification preventing any > > of the following use-cases on an established TCP socket: > > > > - Transition from no-MD5 to MD5, > > - Transition from MD5 to no-MD5, > > - Changing the MD5 key associated with a socket. > > > > As long as the resulting combination of options does not exceed the available > > header space. > > > > Can we please fix this KASAN report in a way that does not break user-space > > applications expectations about Linux' implementation of RFC2385 ? > > > > Thanks, > > > > Mathieu > > > > [1] RFC2385: https://tools.ietf.org/html/rfc2385 > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-05-13 19:56 ` Eric Dumazet @ 2020-06-29 19:43 ` Mathieu Desnoyers 2020-06-29 20:47 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-06-29 19:43 UTC (permalink / raw) To: Eric Dumazet, Linus Torvalds Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On May 13, 2020, at 3:56 PM, Eric Dumazet edumazet@google.com wrote: > On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote: >> >> >> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers >> <mathieu.desnoyers@efficios.com> wrote: >> > >> > Hi, >> > >> > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT >> > on established sockets. It is observed by a customer. >> > >> > This issue is introduced by this commit: >> > >> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on >> > established sockets" >> > >> > The intent of this commit appears to be to fix a use of uninitialized value in >> > tcp_parse_options(). The change introduced by this commit is to disallow setting >> > the TCP_MD5SIG{,_EXT} socket options on an established socket. >> > >> > The justification for this change appears in the commit message: >> > >> > "I believe this was caused by a TCP_MD5SIG being set on live >> > flow. >> > >> > This is highly unexpected, since TCP option space is limited. >> > >> > For instance, presence of TCP MD5 option automatically disables >> > TCP TimeStamp option at SYN/SYNACK time, which we can not do >> > once flow has been established. >> > >> > Really, adding/deleting an MD5 key only makes sense on sockets >> > in CLOSE or LISTEN state." >> > >> > However, reading through RFC2385 [1], this justification does not appear >> > correct. Quoting to the RFC: >> > >> > "This password never appears in the connection stream, and the actual >> > form of the password is up to the application. It could even change >> > during the lifetime of a particular connection so long as this change >> > was synchronized on both ends" >> > >> > The paragraph above clearly underlines that changing the MD5 signature of >> > a live TCP socket is allowed. >> > >> > I also do not understand why it would be invalid to transition an established >> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the >> > RFC: >> > >> > "The total header size is also an issue. The TCP header specifies >> > where segment data starts with a 4-bit field which gives the total >> > size of the header (including options) in 32-byte words. This means >> > that the total size of the header plus option must be less than or >> > equal to 60 bytes -- this leaves 40 bytes for options." >> > >> > The paragraph above seems to be the only indication that some TCP options >> > cannot be combined on a given TCP socket: if the resulting header size does >> > not fit. However, I do not see anything in the specification preventing any >> > of the following use-cases on an established TCP socket: >> > >> > - Transition from no-MD5 to MD5, >> > - Transition from MD5 to no-MD5, >> > - Changing the MD5 key associated with a socket. >> > >> > As long as the resulting combination of options does not exceed the available >> > header space. >> > >> > Can we please fix this KASAN report in a way that does not break user-space >> > applications expectations about Linux' implementation of RFC2385 ? [...] >> > [1] RFC2385: https://tools.ietf.org/html/rfc2385 >> >> >> I do not think we want to transition sockets in the middle. since >> packets can be re-ordered in the network. >> >> MD5 is about security (and a loose form of it), so better make sure >> all packets have it from the beginning of the flow. >> >> A flow with TCP TS on can not suddenly be sending packets without TCP TS. >> >> Clearly, trying to support this operation is a can of worms, I do not >> want to maintain such atrocity. >> >> RFC can state whatever it wants, sometimes reality forces us to have >> sane operations. >> >> Thanks. >> > Also the RFC states : > > "This password never appears in the connection stream, and the actual > form of the password is up to the application. It could even change > during the lifetime of a particular connection so long as this change > was synchronized on both ends" > > It means the key can be changed, but this does not imply the option > can be turned on/off dynamically. > The change discussed previously (introduced by commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks user-space ABI. As an example, the following BGP application uses setsockopt TCP_MD5SIG on a live TCP socket: https://github.com/IPInfusion/SDN-IP In addition to break user-space, it also breaks network protocol expectations for network equipment vendors implementing RFC2385. Considering that the goal of these protocols is interaction between different network equipment, breaking compatibility on that side is unexpected as well. Requiring to bring down/up the connection just to change the TCP MD5 password is a no-go in networks with high availability requirements. Changing the BGP authentication password must be allowed without tearing down and re-establishing the TCP sockets. Otherwise it doesn't scale for large network operators to have to individually manage each individual TCP socket in their network. However, based on the feedback I received, it would be acceptable to tear-down the TCP connections and re-establish them when enabling or disabling the MD5 option. Here is a list of a few network vendors along with their behavior with respect to TCP MD5: - Cisco: Allows for password to be changed, but within the hold-down timer (~180 seconds). - Juniper: When password is initially set on active connection it will reset, but after that any subsequent password changes no network resets. - Nokia: No notes on if they flap the tcp connection or not. - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until both sides are ok with new passwords. - Meta-Switch: Expects the password to be set before a connection is attempted, but no further info on whether they reset the TCP connection on a change. - Avaya: Disable the neighbor, then set password, then re-enable. - Zebos: Would normally allow the change when socket connected. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-29 19:43 ` [regression] TCP_MD5SIG " Mathieu Desnoyers @ 2020-06-29 20:47 ` Eric Dumazet 2020-06-30 19:43 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-29 20:47 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Mon, Jun 29, 2020 at 12:43 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On May 13, 2020, at 3:56 PM, Eric Dumazet edumazet@google.com wrote: > > > On Wed, May 13, 2020 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote: > >> > >> > >> On Wed, May 13, 2020 at 12:38 PM Mathieu Desnoyers > >> <mathieu.desnoyers@efficios.com> wrote: > >> > > >> > Hi, > >> > > >> > I am reporting a regression with respect to use of TCP_MD5SIG/TCP_MD5SIG_EXT > >> > on established sockets. It is observed by a customer. > >> > > >> > This issue is introduced by this commit: > >> > > >> > commit 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on > >> > established sockets" > >> > > >> > The intent of this commit appears to be to fix a use of uninitialized value in > >> > tcp_parse_options(). The change introduced by this commit is to disallow setting > >> > the TCP_MD5SIG{,_EXT} socket options on an established socket. > >> > > >> > The justification for this change appears in the commit message: > >> > > >> > "I believe this was caused by a TCP_MD5SIG being set on live > >> > flow. > >> > > >> > This is highly unexpected, since TCP option space is limited. > >> > > >> > For instance, presence of TCP MD5 option automatically disables > >> > TCP TimeStamp option at SYN/SYNACK time, which we can not do > >> > once flow has been established. > >> > > >> > Really, adding/deleting an MD5 key only makes sense on sockets > >> > in CLOSE or LISTEN state." > >> > > >> > However, reading through RFC2385 [1], this justification does not appear > >> > correct. Quoting to the RFC: > >> > > >> > "This password never appears in the connection stream, and the actual > >> > form of the password is up to the application. It could even change > >> > during the lifetime of a particular connection so long as this change > >> > was synchronized on both ends" > >> > > >> > The paragraph above clearly underlines that changing the MD5 signature of > >> > a live TCP socket is allowed. > >> > > >> > I also do not understand why it would be invalid to transition an established > >> > TCP socket from no-MD5 to MD5, or transition from MD5 to no-MD5. Quoting the > >> > RFC: > >> > > >> > "The total header size is also an issue. The TCP header specifies > >> > where segment data starts with a 4-bit field which gives the total > >> > size of the header (including options) in 32-byte words. This means > >> > that the total size of the header plus option must be less than or > >> > equal to 60 bytes -- this leaves 40 bytes for options." > >> > > >> > The paragraph above seems to be the only indication that some TCP options > >> > cannot be combined on a given TCP socket: if the resulting header size does > >> > not fit. However, I do not see anything in the specification preventing any > >> > of the following use-cases on an established TCP socket: > >> > > >> > - Transition from no-MD5 to MD5, > >> > - Transition from MD5 to no-MD5, > >> > - Changing the MD5 key associated with a socket. > >> > > >> > As long as the resulting combination of options does not exceed the available > >> > header space. > >> > > >> > Can we please fix this KASAN report in a way that does not break user-space > >> > applications expectations about Linux' implementation of RFC2385 ? > [...] > >> > [1] RFC2385: https://tools.ietf.org/html/rfc2385 > >> > >> > >> I do not think we want to transition sockets in the middle. since > >> packets can be re-ordered in the network. > >> > >> MD5 is about security (and a loose form of it), so better make sure > >> all packets have it from the beginning of the flow. > >> > >> A flow with TCP TS on can not suddenly be sending packets without TCP TS. > >> > >> Clearly, trying to support this operation is a can of worms, I do not > >> want to maintain such atrocity. > >> > >> RFC can state whatever it wants, sometimes reality forces us to have > >> sane operations. > >> > >> Thanks. > >> > > Also the RFC states : > > > > "This password never appears in the connection stream, and the actual > > form of the password is up to the application. It could even change > > during the lifetime of a particular connection so long as this change > > was synchronized on both ends" > > > > It means the key can be changed, but this does not imply the option > > can be turned on/off dynamically. > > > > The change discussed previously (introduced by commit 721230326891 "tcp: > md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") breaks > user-space ABI. As an example, the following BGP application uses > setsockopt TCP_MD5SIG on a live TCP socket: > > https://github.com/IPInfusion/SDN-IP > > In addition to break user-space, it also breaks network protocol > expectations for network equipment vendors implementing RFC2385. > Considering that the goal of these protocols is interaction between > different network equipment, breaking compatibility on that side > is unexpected as well. Requiring to bring down/up the connection > just to change the TCP MD5 password is a no-go in networks with > high availability requirements. Changing the BGP authentication > password must be allowed without tearing down and re-establishing > the TCP sockets. Otherwise it doesn't scale for large network > operators to have to individually manage each individual TCP socket > in their network. However, based on the feedback I received, it > would be acceptable to tear-down the TCP connections and re-establish > them when enabling or disabling the MD5 option. > > Here is a list of a few network vendors along with their behavior > with respect to TCP MD5: > > - Cisco: Allows for password to be changed, but within the hold-down > timer (~180 seconds). > - Juniper: When password is initially set on active connection it will > reset, but after that any subsequent password changes no network > resets. > - Nokia: No notes on if they flap the tcp connection or not. > - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until > both sides are ok with new passwords. > - Meta-Switch: Expects the password to be set before a connection is > attempted, but no further info on whether they reset the TCP > connection on a change. > - Avaya: Disable the neighbor, then set password, then re-enable. > - Zebos: Would normally allow the change when socket connected. > > If you want to be able to _change_ md5 key, this is fine by me, please send a patch. We can not dynamically turn on MD5, this is mentioned briefly in tcp_synack_options(). If you want to turn on MD5 on an established flow, then you must ensure that both SACK and TS were not enabled in the 3WHS, and then make sure nothing blows up in the stack. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-29 20:47 ` Eric Dumazet @ 2020-06-30 19:43 ` Linus Torvalds 2020-06-30 19:52 ` Linus Torvalds 2020-06-30 20:21 ` David Miller 0 siblings, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2020-06-30 19:43 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Mon, Jun 29, 2020 at 1:47 PM Eric Dumazet <edumazet@google.com> wrote: > > If you want to be able to _change_ md5 key, this is fine by me, please > send a patch. Eric, if this change breaks existing users, then it gets reverted. That's just fundamental. No RFC's are in the lreast relevant when compared to "this broke existing users". If you're not willing to do the work to fix it, I will revert that commit. Because that's how it works - you can't ask other people to fix the breakage you introduced. It really is that simple. We do not allow developers to break things and then step away and say "not my problem". Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 19:43 ` Linus Torvalds @ 2020-06-30 19:52 ` Linus Torvalds 2020-06-30 20:34 ` Mathieu Desnoyers 2020-06-30 20:21 ` David Miller 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2020-06-30 19:52 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > If you're not willing to do the work to fix it, I will revert that > commit. Hmm. I only now noticed that that commit is over two years old. So I think it's still wrong (clearly others do change passwords outside of listening state), but considering that it apparently took people two years to notice, at least some of the onus on figuring out a better morel is on people who didn't even bother to test things in a timely manner. At some point "entreprise vendor kernels" or whatever who stay with legacy kernels for a long time only have themselves to blame. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 19:52 ` Linus Torvalds @ 2020-06-30 20:34 ` Mathieu Desnoyers 2020-06-30 20:39 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-06-30 20:34 UTC (permalink / raw) To: Linus Torvalds, Eric Dumazet Cc: David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On Jun 30, 2020, at 3:52 PM, Linus Torvalds torvalds@linux-foundation.org wrote: > On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> [...] > So I think it's still wrong (clearly others do change passwords > outside of listening state), but considering that it apparently took > people two years to notice, at least some of the onus on figuring out > a better morel is on people who didn't even bother to test things in a > timely manner. I'm fully willing to work with Eric on finding a way forward with a fix which addresses the original issue Eric's patch was trying to fix while preserving ABI compatibility. The main thing we need to agree on at this stage is what is our goal. We can either choose to restore the original ABI behavior entirely, or only focus on what appears to be the most important use-cases. AFAIU, restoring full ABI compatibility would require to re-enable all the following scenarios: A) Transition of live socket from no key -> MD5 key. B) Transition of live socket from MD5 key -> no key. C) Transition of live socket from MD5 key to a different MD5 key. Scenario (C) appears to be the most important use-case, and probably the easiest to restore to its original behavior. AFAIU restoring scenarios A and B would require us to validate how much header space is needed by each SACK, TS and MD5 option enabled on the socket, and reject enabling any option that adds header space requirement exceeding the available space. I welcome advice on what should be the end goal here. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:34 ` Mathieu Desnoyers @ 2020-06-30 20:39 ` Eric Dumazet 2020-06-30 20:44 ` David Miller 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 20:39 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, David S. Miller, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 1:34 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On Jun 30, 2020, at 3:52 PM, Linus Torvalds torvalds@linux-foundation.org wrote: > > > On Tue, Jun 30, 2020 at 12:43 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > >> > [...] > > So I think it's still wrong (clearly others do change passwords > > outside of listening state), but considering that it apparently took > > people two years to notice, at least some of the onus on figuring out > > a better morel is on people who didn't even bother to test things in a > > timely manner. > > I'm fully willing to work with Eric on finding a way forward with a > fix which addresses the original issue Eric's patch was trying to > fix while preserving ABI compatibility. > > The main thing we need to agree on at this stage is what is our goal. We > can either choose to restore the original ABI behavior entirely, or only > focus on what appears to be the most important use-cases. > > AFAIU, restoring full ABI compatibility would require to re-enable all > the following scenarios: > > A) Transition of live socket from no key -> MD5 key. > B) Transition of live socket from MD5 key -> no key. > C) Transition of live socket from MD5 key to a different MD5 key. > > Scenario (C) appears to be the most important use-case, and probably the > easiest to restore to its original behavior. > > AFAIU restoring scenarios A and B would require us to validate how > much header space is needed by each SACK, TS and MD5 option enabled > on the socket, and reject enabling any option that adds header space > requirement exceeding the available space. > > I welcome advice on what should be the end goal here. > The (C) & (B) case are certainly doable. A) case is more complex, I have no idea of breakages of various TCP stacks if a flow got SACK at some point (in 3WHS) but suddenly becomes Reno. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:39 ` Eric Dumazet @ 2020-06-30 20:44 ` David Miller 2020-06-30 20:56 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: David Miller @ 2020-06-30 20:44 UTC (permalink / raw) To: edumazet; +Cc: mathieu.desnoyers, torvalds, linux-kernel, netdev, ycheng, joraj From: Eric Dumazet <edumazet@google.com> Date: Tue, 30 Jun 2020 13:39:27 -0700 > The (C) & (B) case are certainly doable. > > A) case is more complex, I have no idea of breakages of various TCP > stacks if a flow got SACK > at some point (in 3WHS) but suddenly becomes Reno. I agree that C and B are the easiest to implement without having to add complicated code to handle various negotiated TCP option scenerios. It does seem to be that some entities do A, or did I misread your behavioral analysis of various implementations Mathieu? Thanks. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:44 ` David Miller @ 2020-06-30 20:56 ` Eric Dumazet 2020-06-30 21:17 ` Mathieu Desnoyers 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 20:56 UTC (permalink / raw) To: David Miller Cc: Mathieu Desnoyers, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: > > From: Eric Dumazet <edumazet@google.com> > Date: Tue, 30 Jun 2020 13:39:27 -0700 > > > The (C) & (B) case are certainly doable. > > > > A) case is more complex, I have no idea of breakages of various TCP > > stacks if a flow got SACK > > at some point (in 3WHS) but suddenly becomes Reno. > > I agree that C and B are the easiest to implement without having to > add complicated code to handle various negotiated TCP option > scenerios. > > It does seem to be that some entities do A, or did I misread your > behavioral analysis of various implementations Mathieu? > > Thanks. Yes, another question about Mathieu cases is do determine the behavior of all these stacks vs : SACK option TCP TS option. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:56 ` Eric Dumazet @ 2020-06-30 21:17 ` Mathieu Desnoyers 2020-06-30 21:23 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-06-30 21:17 UTC (permalink / raw) To: Eric Dumazet Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote: > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: >> >> From: Eric Dumazet <edumazet@google.com> >> Date: Tue, 30 Jun 2020 13:39:27 -0700 >> >> > The (C) & (B) case are certainly doable. >> > >> > A) case is more complex, I have no idea of breakages of various TCP >> > stacks if a flow got SACK >> > at some point (in 3WHS) but suddenly becomes Reno. >> >> I agree that C and B are the easiest to implement without having to >> add complicated code to handle various negotiated TCP option >> scenerios. >> >> It does seem to be that some entities do A, or did I misread your >> behavioral analysis of various implementations Mathieu? >> >> Thanks. > > Yes, another question about Mathieu cases is do determine the behavior > of all these stacks vs : > SACK option > TCP TS option. I will ask my customer's networking team to investigate these behaviors, which will allow me to prepare a thorough reply to the questions raised by Eric and David. I expect to have an answer within 2-3 weeks at most. Thank you! Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 21:17 ` Mathieu Desnoyers @ 2020-06-30 21:23 ` Eric Dumazet 2020-06-30 21:54 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 21:23 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote: > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: > >> > >> From: Eric Dumazet <edumazet@google.com> > >> Date: Tue, 30 Jun 2020 13:39:27 -0700 > >> > >> > The (C) & (B) case are certainly doable. > >> > > >> > A) case is more complex, I have no idea of breakages of various TCP > >> > stacks if a flow got SACK > >> > at some point (in 3WHS) but suddenly becomes Reno. > >> > >> I agree that C and B are the easiest to implement without having to > >> add complicated code to handle various negotiated TCP option > >> scenerios. > >> > >> It does seem to be that some entities do A, or did I misread your > >> behavioral analysis of various implementations Mathieu? > >> > >> Thanks. > > > > Yes, another question about Mathieu cases is do determine the behavior > > of all these stacks vs : > > SACK option > > TCP TS option. > > I will ask my customer's networking team to investigate these behaviors, > which will allow me to prepare a thorough reply to the questions raised > by Eric and David. I expect to have an answer within 2-3 weeks at most. > > Thank you! Great, I am working on adding back support for (B) & (C) by the end of this week. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 21:23 ` Eric Dumazet @ 2020-06-30 21:54 ` Eric Dumazet 2020-06-30 22:07 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 21:54 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: > > > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote: > > > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: > > >> > > >> From: Eric Dumazet <edumazet@google.com> > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700 > > >> > > >> > The (C) & (B) case are certainly doable. > > >> > > > >> > A) case is more complex, I have no idea of breakages of various TCP > > >> > stacks if a flow got SACK > > >> > at some point (in 3WHS) but suddenly becomes Reno. > > >> > > >> I agree that C and B are the easiest to implement without having to > > >> add complicated code to handle various negotiated TCP option > > >> scenerios. > > >> > > >> It does seem to be that some entities do A, or did I misread your > > >> behavioral analysis of various implementations Mathieu? > > >> > > >> Thanks. > > > > > > Yes, another question about Mathieu cases is do determine the behavior > > > of all these stacks vs : > > > SACK option > > > TCP TS option. > > > > I will ask my customer's networking team to investigate these behaviors, > > which will allow me to prepare a thorough reply to the questions raised > > by Eric and David. I expect to have an answer within 2-3 weeks at most. > > > > Thank you! > > > Great, I am working on adding back support for (B) & (C) by the end of > this week. Note that the security issue (of sending uninit bytes to the wire) has been independently fixed with [1] This means syzbot was able to have MD5+TS+SACK ~6 months ago. It seems we (linux) do not enable this combination for PASSIVE flows, (according to tcp_synack_options()), but for ACTIVE flows we do nothing special. So maybe code in tcp_synack_options() should be mirrored to tcp_syn_options() for consistency. (disabling TS if both MD5 and SACK are enabled) [1] commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 Author: Eric Dumazet <edumazet@google.com> Date: Thu Dec 5 10:10:15 2019 -0800 tcp: md5: fix potential overestimation of TCP option space Back in 2008, Adam Langley fixed the corner case of packets for flows having all of the following options : MD5 TS SACK Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block can be cooked from the remaining 8 bytes. tcp_established_options() correctly sets opts->num_sack_blocks to zero, but returns 36 instead of 32. This means TCP cooks packets with 4 extra bytes at the end of options, containing unitialized bytes. Fixes: 33ad798c924b ("tcp: options clean up") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb min_t(unsigned int, eff_sacks, (remaining - TCPOLEN_SACK_BASE_ALIGNED) / TCPOLEN_SACK_PERBLOCK); - size += TCPOLEN_SACK_BASE_ALIGNED + - opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; + if (likely(opts->num_sack_blocks)) + size += TCPOLEN_SACK_BASE_ALIGNED + + opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; } return size; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 21:54 ` Eric Dumazet @ 2020-06-30 22:07 ` Eric Dumazet 2020-06-30 22:38 ` Eric Dumazet 2020-07-01 17:24 ` Eric Dumazet 0 siblings, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 22:07 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers > > <mathieu.desnoyers@efficios.com> wrote: > > > > > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote: > > > > > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: > > > >> > > > >> From: Eric Dumazet <edumazet@google.com> > > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700 > > > >> > > > >> > The (C) & (B) case are certainly doable. > > > >> > > > > >> > A) case is more complex, I have no idea of breakages of various TCP > > > >> > stacks if a flow got SACK > > > >> > at some point (in 3WHS) but suddenly becomes Reno. > > > >> > > > >> I agree that C and B are the easiest to implement without having to > > > >> add complicated code to handle various negotiated TCP option > > > >> scenerios. > > > >> > > > >> It does seem to be that some entities do A, or did I misread your > > > >> behavioral analysis of various implementations Mathieu? > > > >> > > > >> Thanks. > > > > > > > > Yes, another question about Mathieu cases is do determine the behavior > > > > of all these stacks vs : > > > > SACK option > > > > TCP TS option. > > > > > > I will ask my customer's networking team to investigate these behaviors, > > > which will allow me to prepare a thorough reply to the questions raised > > > by Eric and David. I expect to have an answer within 2-3 weeks at most. > > > > > > Thank you! > > > > > > Great, I am working on adding back support for (B) & (C) by the end of > > this week. > > Note that the security issue (of sending uninit bytes to the wire) has > been independently fixed with [1] > > This means syzbot was able to have MD5+TS+SACK ~6 months ago. > > It seems we (linux) do not enable this combination for PASSIVE flows, > (according to tcp_synack_options()), > but for ACTIVE flows we do nothing special. > > So maybe code in tcp_synack_options() should be mirrored to > tcp_syn_options() for consistency. > (disabling TS if both MD5 and SACK are enabled) Oh well, tcp_syn_options() is supposed to have the same logic. Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK) Nice can of worms. > > [1] > > commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 > Author: Eric Dumazet <edumazet@google.com> > Date: Thu Dec 5 10:10:15 2019 -0800 > > tcp: md5: fix potential overestimation of TCP option space > > Back in 2008, Adam Langley fixed the corner case of packets for flows > having all of the following options : MD5 TS SACK > > Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block > can be cooked from the remaining 8 bytes. > > tcp_established_options() correctly sets opts->num_sack_blocks > to zero, but returns 36 instead of 32. > > This means TCP cooks packets with 4 extra bytes at the end > of options, containing unitialized bytes. > > Fixes: 33ad798c924b ("tcp: options clean up") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: syzbot <syzkaller@googlegroups.com> > Acked-by: Neal Cardwell <ncardwell@google.com> > Acked-by: Soheil Hassas Yeganeh <soheil@google.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index be6d22b8190fa375074062032105879270af4be5..b184f03d743715ef4b2d166ceae651529be77953 > 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -755,8 +755,9 @@ static unsigned int tcp_established_options(struct > sock *sk, struct sk_buff *skb > min_t(unsigned int, eff_sacks, > (remaining - TCPOLEN_SACK_BASE_ALIGNED) / > TCPOLEN_SACK_PERBLOCK); > - size += TCPOLEN_SACK_BASE_ALIGNED + > - opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; > + if (likely(opts->num_sack_blocks)) > + size += TCPOLEN_SACK_BASE_ALIGNED + > + opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; > } > > return size; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 22:07 ` Eric Dumazet @ 2020-06-30 22:38 ` Eric Dumazet 2020-06-30 23:44 ` Mathieu Desnoyers 2020-07-01 2:02 ` Herbert Xu 2020-07-01 17:24 ` Eric Dumazet 1 sibling, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 22:38 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jun 30, 2020 at 2:54 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Jun 30, 2020 at 2:23 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Jun 30, 2020 at 2:17 PM Mathieu Desnoyers > > > <mathieu.desnoyers@efficios.com> wrote: > > > > > > > > ----- On Jun 30, 2020, at 4:56 PM, Eric Dumazet edumazet@google.com wrote: > > > > > > > > > On Tue, Jun 30, 2020 at 1:44 PM David Miller <davem@davemloft.net> wrote: > > > > >> > > > > >> From: Eric Dumazet <edumazet@google.com> > > > > >> Date: Tue, 30 Jun 2020 13:39:27 -0700 > > > > >> > > > > >> > The (C) & (B) case are certainly doable. > > > > >> > > > > > >> > A) case is more complex, I have no idea of breakages of various TCP > > > > >> > stacks if a flow got SACK > > > > >> > at some point (in 3WHS) but suddenly becomes Reno. > > > > >> > > > > >> I agree that C and B are the easiest to implement without having to > > > > >> add complicated code to handle various negotiated TCP option > > > > >> scenerios. > > > > >> > > > > >> It does seem to be that some entities do A, or did I misread your > > > > >> behavioral analysis of various implementations Mathieu? > > > > >> > > > > >> Thanks. > > > > > > > > > > Yes, another question about Mathieu cases is do determine the behavior > > > > > of all these stacks vs : > > > > > SACK option > > > > > TCP TS option. > > > > > > > > I will ask my customer's networking team to investigate these behaviors, > > > > which will allow me to prepare a thorough reply to the questions raised > > > > by Eric and David. I expect to have an answer within 2-3 weeks at most. > > > > > > > > Thank you! > > > > > > > > > Great, I am working on adding back support for (B) & (C) by the end of > > > this week. > > > > Note that the security issue (of sending uninit bytes to the wire) has > > been independently fixed with [1] > > > > This means syzbot was able to have MD5+TS+SACK ~6 months ago. > > > > It seems we (linux) do not enable this combination for PASSIVE flows, > > (according to tcp_synack_options()), > > but for ACTIVE flows we do nothing special. > > > > So maybe code in tcp_synack_options() should be mirrored to > > tcp_syn_options() for consistency. > > (disabling TS if both MD5 and SACK are enabled) > > Oh well, tcp_syn_options() is supposed to have the same logic. > > Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK) > > Nice can of worms. > For updates of keys, it seems existing code lacks some RCU care. MD5 keys use RCU for lookups/hashes, but the replacement of a key does not allocate a new piece of memory. diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key) { struct scatterlist sg; + u8 keylen = key->keylen; - sg_init_one(&sg, key->key, key->keylen); - ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen); + smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ + + sg_init_one(&sg, key->key, keylen); + ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); return crypto_ahash_update(hp->md5_req); } EXPORT_SYMBOL(tcp_md5_hash_key); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr, if (key) { /* Pre-existing entry - just update that one. */ memcpy(key->key, newkey, newkeylen); + + smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ + key->keylen = newkeylen; return 0; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 22:38 ` Eric Dumazet @ 2020-06-30 23:44 ` Mathieu Desnoyers 2020-07-01 0:01 ` Eric Dumazet 2020-07-01 2:02 ` Herbert Xu 1 sibling, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-06-30 23:44 UTC (permalink / raw) To: Eric Dumazet, paulmck Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On Jun 30, 2020, at 6:38 PM, Eric Dumazet edumazet@google.com wrote: [...] > > For updates of keys, it seems existing code lacks some RCU care. > > MD5 keys use RCU for lookups/hashes, but the replacement of a key does > not allocate a new piece of memory. How is that RCU-safe ? Based on what I see here: tcp_md5_do_add() has a comment stating: "/* This can be called on a newly created socket, from other files */" which appears to be untrue if this can indeed be called on a live socket. The path for pre-existing keys does: key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index); if (key) { /* Pre-existing entry - just update that one. */ memcpy(key->key, newkey, newkeylen); key->keylen = newkeylen; return 0; } AFAIU, this works only if you assume there are no concurrent readers accessing key->key, else they can see a corrupted key. The change you are proposing adds smp_wmb/smp_rmb to pair stores to key before key_len with loads of key_len before key. I'm not sure what this is trying to achieve, and how it prevents the readers from observing a corrupted state if the key is updated on a live socket ? Based on my understanding, this path which deals with pre-existing keys in-place should only ever be used when there are no concurrent readers, else a new memory allocation would be needed to guarantee that readers always observe a valid copy. Thanks, Mathieu > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index > 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124 > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct > tcp_md5sig_key *key) > { > struct scatterlist sg; > + u8 keylen = key->keylen; > > - sg_init_one(&sg, key->key, key->keylen); > - ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen); > + smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ > + > + sg_init_one(&sg, key->key, keylen); > + ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); > return crypto_ahash_update(hp->md5_req); > } > EXPORT_SYMBOL(tcp_md5_hash_key); > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index > ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f > 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union > tcp_md5_addr *addr, > if (key) { > /* Pre-existing entry - just update that one. */ > memcpy(key->key, newkey, newkeylen); > + > + smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ > + > key->keylen = newkeylen; > return 0; > } -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 23:44 ` Mathieu Desnoyers @ 2020-07-01 0:01 ` Eric Dumazet 0 siblings, 0 replies; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 0:01 UTC (permalink / raw) To: Mathieu Desnoyers Cc: paulmck, David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 4:44 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On Jun 30, 2020, at 6:38 PM, Eric Dumazet edumazet@google.com wrote: > [...] > > > > For updates of keys, it seems existing code lacks some RCU care. > > > > MD5 keys use RCU for lookups/hashes, but the replacement of a key does > > not allocate a new piece of memory. > > How is that RCU-safe ? > > Based on what I see here: > > tcp_md5_do_add() has a comment stating: > > "/* This can be called on a newly created socket, from other files */" > > which appears to be untrue if this can indeed be called on a live socket. "This can be called" is not the same than "this is always called for newly created socket" > > The path for pre-existing keys does: > > key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index); > if (key) { > /* Pre-existing entry - just update that one. */ > memcpy(key->key, newkey, newkeylen); > key->keylen = newkeylen; > return 0; > } > > AFAIU, this works only if you assume there are no concurrent readers > accessing key->key, else they can see a corrupted key. This is fine. > > The change you are proposing adds smp_wmb/smp_rmb to pair stores > to key before key_len with loads of key_len before key. I'm not sure > what this is trying to achieve, and how it prevents the readers from > observing a corrupted state if the key is updated on a live socket ? By definition if you change the MD5 key on a socket while packets are flying, the incoming packet could either 1) See old key (packet is dropped) 2) See new key. So any other decision (catching intermediate state) is really not an issue, since you already accepted the fact that a packet could be dropped, and TCP will retransmit. TCP MD5 implementation does not support multiple keys for one flow, you can not have both old and new keys being checked. > > Based on my understanding, this path which deals with pre-existing keys > in-place should only ever be used when there are no concurrent readers, > else a new memory allocation would be needed to guarantee that readers > always observe a valid copy. This is not _needed,_ and since memory allocations can fail, we would potentially break applications assuming that changing MD5 key would never fail. Patch has been sent for review on netdev@ ( https://patchwork.ozlabs.org/project/netdev/patch/20200630234101.3259179-1-edumazet@google.com/ ) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 22:38 ` Eric Dumazet 2020-06-30 23:44 ` Mathieu Desnoyers @ 2020-07-01 2:02 ` Herbert Xu 2020-07-01 2:17 ` Eric Dumazet 1 sibling, 1 reply; 32+ messages in thread From: Herbert Xu @ 2020-07-01 2:02 UTC (permalink / raw) To: Eric Dumazet Cc: mathieu.desnoyers, davem, torvalds, linux-kernel, netdev, ycheng, joraj Eric Dumazet <edumazet@google.com> wrote: > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124 > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct > tcp_md5sig_key *key) > { > struct scatterlist sg; > + u8 keylen = key->keylen; > > - sg_init_one(&sg, key->key, key->keylen); > - ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen); > + smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ > + > + sg_init_one(&sg, key->key, keylen); > + ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); > return crypto_ahash_update(hp->md5_req); > } > EXPORT_SYMBOL(tcp_md5_hash_key); > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f > 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union > tcp_md5_addr *addr, > if (key) { > /* Pre-existing entry - just update that one. */ > memcpy(key->key, newkey, newkeylen); > + > + smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ > + > key->keylen = newkeylen; > return 0; > } This doesn't make sense. Your smp_rmb only guarantees that you see a version of key->key that's newer than keylen. What if the key got changed twice? You could still read more than what's in the key (if that's what you're trying to protect against). Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:02 ` Herbert Xu @ 2020-07-01 2:17 ` Eric Dumazet 2020-07-01 2:22 ` Herbert Xu 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 2:17 UTC (permalink / raw) To: Herbert Xu Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 7:02 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: > > Eric Dumazet <edumazet@google.com> wrote: > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index 810cc164f795f8e1e8ca747ed5df51bb20fec8a2..ecc0e3fabce8b03bef823cbfc5c1b0a9e24df124 > > 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -4034,9 +4034,12 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); > > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct > > tcp_md5sig_key *key) > > { > > struct scatterlist sg; > > + u8 keylen = key->keylen; > > > > - sg_init_one(&sg, key->key, key->keylen); > > - ahash_request_set_crypt(hp->md5_req, &sg, NULL, key->keylen); > > + smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ > > + > > + sg_init_one(&sg, key->key, keylen); > > + ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); > > return crypto_ahash_update(hp->md5_req); > > } > > EXPORT_SYMBOL(tcp_md5_hash_key); > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index ad6435ba6d72ffd8caf783bb25cad7ec151d6909..99916fcc15ca0be12c2c133ff40516f79e6fdf7f > > 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -1113,6 +1113,9 @@ int tcp_md5_do_add(struct sock *sk, const union > > tcp_md5_addr *addr, > > if (key) { > > /* Pre-existing entry - just update that one. */ > > memcpy(key->key, newkey, newkeylen); > > + > > + smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ > > + > > key->keylen = newkeylen; > > return 0; > > } > > This doesn't make sense. Your smp_rmb only guarantees that you > see a version of key->key that's newer than keylen. What if the > key got changed twice? You could still read more than what's in > the key (if that's what you're trying to protect against). The worst that can happen is a dropped packet. Which is anyway going to happen anyway eventually if an application changes a TCP MD5 key on a live flow. The main issue of the prior code was the double read of key->keylen in tcp_md5_hash_key(), not that few bytes could change under us. I used smp_rmb() to ease backports, since old kernels had no READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:17 ` Eric Dumazet @ 2020-07-01 2:22 ` Herbert Xu 2020-07-01 2:30 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Herbert Xu @ 2020-07-01 2:22 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote: > > The main issue of the prior code was the double read of key->keylen in > tcp_md5_hash_key(), not that few bytes could change under us. > > I used smp_rmb() to ease backports, since old kernels had no > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead. If it's the double-read that you're protecting against, you should just use barrier() and the comment should say so too. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:22 ` Herbert Xu @ 2020-07-01 2:30 ` Eric Dumazet 2020-07-01 2:39 ` Joe Perches 2020-07-01 2:58 ` Herbert Xu 0 siblings, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 2:30 UTC (permalink / raw) To: Herbert Xu Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote: > > > > The main issue of the prior code was the double read of key->keylen in > > tcp_md5_hash_key(), not that few bytes could change under us. > > > > I used smp_rmb() to ease backports, since old kernels had no > > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead. > > If it's the double-read that you're protecting against, you should > just use barrier() and the comment should say so too. I made this clear in the changelog, do we want comments all over the places ? Do not get me wrong, we had this bug for years and suddenly this is a big deal... MD5 keys are read with RCU protection, and tcp_md5_do_add() might update in-place a prior key. Normally, typical RCU updates would allocate a new piece of memory. In this case only key->key and key->keylen might be updated, and we do not care if an incoming packet could see the old key, the new one, or some intermediate value, since changing the key on a live flow is known to be problematic anyway. We only want to make sure that in the case key->keylen is changed, cpus in tcp_md5_hash_key() wont try to use uninitialized data, or crash because key->keylen was read twice to feed sg_init_one() and ahash_request_set_crypt() ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:30 ` Eric Dumazet @ 2020-07-01 2:39 ` Joe Perches 2020-07-01 2:58 ` Herbert Xu 1 sibling, 0 replies; 32+ messages in thread From: Joe Perches @ 2020-07-01 2:39 UTC (permalink / raw) To: Eric Dumazet, Herbert Xu Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, 2020-06-30 at 19:30 -0700, Eric Dumazet wrote: > On Tue, Jun 30, 2020 at 7:23 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Tue, Jun 30, 2020 at 07:17:46PM -0700, Eric Dumazet wrote: > > > The main issue of the prior code was the double read of key->keylen in > > > tcp_md5_hash_key(), not that few bytes could change under us. > > > > > > I used smp_rmb() to ease backports, since old kernels had no > > > READ_ONCE()/WRITE_ONCE(), but ACCESS_ONCE() instead. > > > > If it's the double-read that you're protecting against, you should > > just use barrier() and the comment should say so too. > > I made this clear in the changelog, do we want comments all over the places ? Having to run git for every line of code isn't great. Comments in code is better than comments in changelogs. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:30 ` Eric Dumazet 2020-07-01 2:39 ` Joe Perches @ 2020-07-01 2:58 ` Herbert Xu 2020-07-01 3:36 ` Eric Dumazet 1 sibling, 1 reply; 32+ messages in thread From: Herbert Xu @ 2020-07-01 2:58 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote: > > I made this clear in the changelog, do we want comments all over the places ? > Do not get me wrong, we had this bug for years and suddenly this is a > big deal... I thought you were adding a new pair of smp_rmb/smp_wmb. If they already exist in the code then I agree it's not a big deal. But adding a new pair of bogus smp_Xmb's is bad for maintenance. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 2:58 ` Herbert Xu @ 2020-07-01 3:36 ` Eric Dumazet 2020-07-01 3:50 ` Herbert Xu 2020-07-01 12:19 ` Mathieu Desnoyers 0 siblings, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 3:36 UTC (permalink / raw) To: Herbert Xu Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote: > > > > I made this clear in the changelog, do we want comments all over the places ? > > Do not get me wrong, we had this bug for years and suddenly this is a > > big deal... > > I thought you were adding a new pair of smp_rmb/smp_wmb. If they > already exist in the code then I agree it's not a big deal. But > adding a new pair of bogus smp_Xmb's is bad for maintenance. > If I knew so many people were excited about TCP / MD5, I would have posted all my patches on lkml ;) Without the smp_wmb() we would still need something to prevent KMSAN from detecting that we read uninitialized bytes, if key->keylen is increased. (initial content of key->key[] is garbage) Something like this : diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key) { - u8 keylen = key->keylen; + u8 keylen = READ_ONCE(key->keylen); /* paired with WRITE_ONCE() in tcp_md5_do_add */ struct scatterlist sg; - smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ - sg_init_one(&sg, key->key, keylen); ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); return crypto_ahash_update(hp->md5_req); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr, /* Pre-existing entry - just update that one. */ memcpy(key->key, newkey, newkeylen); - smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ + /* Pairs with READ_ONCE() in tcp_md5_hash_key(). + * Also note that a reader could catch new key->keylen value + * but old key->key[], this is the reason we use __GFP_ZERO + * at sock_kmalloc() time below these lines. + */ + WRITE_ONCE(key->keylen, newkeylen); - key->keylen = newkeylen; return 0; } @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr, rcu_assign_pointer(tp->md5sig_info, md5sig); } - key = sock_kmalloc(sk, sizeof(*key), gfp); + key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO); if (!key) return -ENOMEM; if (!tcp_alloc_md5sig_pool()) { ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 3:36 ` Eric Dumazet @ 2020-07-01 3:50 ` Herbert Xu 2020-07-01 12:19 ` Mathieu Desnoyers 1 sibling, 0 replies; 32+ messages in thread From: Herbert Xu @ 2020-07-01 3:50 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, David Miller, Linus Torvalds, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 08:36:51PM -0700, Eric Dumazet wrote: > > If I knew so many people were excited about TCP / MD5, I would have > posted all my patches on lkml ;) > > Without the smp_wmb() we would still need something to prevent KMSAN > from detecting that we read uninitialized bytes, > if key->keylen is increased. (initial content of key->key[] is garbage) > > Something like this : LGTM. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 3:36 ` Eric Dumazet 2020-07-01 3:50 ` Herbert Xu @ 2020-07-01 12:19 ` Mathieu Desnoyers 2020-07-01 15:15 ` Eric Dumazet 1 sibling, 1 reply; 32+ messages in thread From: Mathieu Desnoyers @ 2020-07-01 12:19 UTC (permalink / raw) To: Eric Dumazet Cc: Herbert Xu, David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On Jun 30, 2020, at 11:36 PM, Eric Dumazet edumazet@google.com wrote: > On Tue, Jun 30, 2020 at 7:59 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: >> >> On Tue, Jun 30, 2020 at 07:30:43PM -0700, Eric Dumazet wrote: >> > >> > I made this clear in the changelog, do we want comments all over the places ? >> > Do not get me wrong, we had this bug for years and suddenly this is a >> > big deal... >> >> I thought you were adding a new pair of smp_rmb/smp_wmb. If they >> already exist in the code then I agree it's not a big deal. But >> adding a new pair of bogus smp_Xmb's is bad for maintenance. >> > > If I knew so many people were excited about TCP / MD5, I would have > posted all my patches on lkml ;) > > Without the smp_wmb() we would still need something to prevent KMSAN > from detecting that we read uninitialized bytes, > if key->keylen is increased. (initial content of key->key[] is garbage) > > Something like this : The approach below looks good to me, but you'll also need to annotate both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use data_race(expr) to let the concurrency sanitizer know that there is a known data race which is there on purpose (triggered by memcpy in tcp_md5_do_add and somewhere within crypto_ahash_update). See Documentation/dev-tools/kcsan.rst for details. Thanks, Mathieu > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index > f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9 > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); > > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct > tcp_md5sig_key *key) > { > - u8 keylen = key->keylen; > + u8 keylen = READ_ONCE(key->keylen); /* paired with > WRITE_ONCE() in tcp_md5_do_add */ > struct scatterlist sg; > > - smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ > - > sg_init_one(&sg, key->key, keylen); > ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); > return crypto_ahash_update(hp->md5_req); > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index > 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f > 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union > tcp_md5_addr *addr, > /* Pre-existing entry - just update that one. */ > memcpy(key->key, newkey, newkeylen); > > - smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ > + /* Pairs with READ_ONCE() in tcp_md5_hash_key(). > + * Also note that a reader could catch new key->keylen value > + * but old key->key[], this is the reason we use __GFP_ZERO > + * at sock_kmalloc() time below these lines. > + */ > + WRITE_ONCE(key->keylen, newkeylen); > > - key->keylen = newkeylen; > return 0; > } > > @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union > tcp_md5_addr *addr, > rcu_assign_pointer(tp->md5sig_info, md5sig); > } > > - key = sock_kmalloc(sk, sizeof(*key), gfp); > + key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO); > if (!key) > return -ENOMEM; > if (!tcp_alloc_md5sig_pool()) { -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-07-01 12:19 ` Mathieu Desnoyers @ 2020-07-01 15:15 ` Eric Dumazet 0 siblings, 0 replies; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 15:15 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Herbert Xu, David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Wed, Jul 1, 2020 at 5:19 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > The approach below looks good to me, but you'll also need to annotate > both tcp_md5_hash_key and tcp_md5_do_add with __no_kcsan or use > data_race(expr) to let the concurrency sanitizer know that there is > a known data race which is there on purpose (triggered by memcpy in tcp_md5_do_add > and somewhere within crypto_ahash_update). See Documentation/dev-tools/kcsan.rst > for details. Sure, I can add a data_race() and let stable teams handle the backports without it ;) > > Thanks, > > Mathieu > > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index > > f111660453241692a17c881dd6dc2910a1236263..c3af8180c7049d5c4987bf5c67e4aff2ed6967c9 > > 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -4033,11 +4033,9 @@ EXPORT_SYMBOL(tcp_md5_hash_skb_data); > > > > int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct > > tcp_md5sig_key *key) > > { > > - u8 keylen = key->keylen; > > + u8 keylen = READ_ONCE(key->keylen); /* paired with > > WRITE_ONCE() in tcp_md5_do_add */ > > struct scatterlist sg; > > > > - smp_rmb(); /* paired with smp_wmb() in tcp_md5_do_add() */ > > - > > sg_init_one(&sg, key->key, keylen); > > ahash_request_set_crypt(hp->md5_req, &sg, NULL, keylen); > > return crypto_ahash_update(hp->md5_req); > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index > > 99916fcc15ca0be12c2c133ff40516f79e6fdf7f..0d08e0134335a21d23702e6a5c24a0f2b3c61c6f > > 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -1114,9 +1114,13 @@ int tcp_md5_do_add(struct sock *sk, const union > > tcp_md5_addr *addr, > > /* Pre-existing entry - just update that one. */ > > memcpy(key->key, newkey, newkeylen); > > > > - smp_wmb(); /* pairs with smp_rmb() in tcp_md5_hash_key() */ > > + /* Pairs with READ_ONCE() in tcp_md5_hash_key(). > > + * Also note that a reader could catch new key->keylen value > > + * but old key->key[], this is the reason we use __GFP_ZERO > > + * at sock_kmalloc() time below these lines. > > + */ > > + WRITE_ONCE(key->keylen, newkeylen); > > > > - key->keylen = newkeylen; > > return 0; > > } > > > > @@ -1132,7 +1136,7 @@ int tcp_md5_do_add(struct sock *sk, const union > > tcp_md5_addr *addr, > > rcu_assign_pointer(tp->md5sig_info, md5sig); > > } > > > > - key = sock_kmalloc(sk, sizeof(*key), gfp); > > + key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO); > > if (!key) > > return -ENOMEM; > > if (!tcp_alloc_md5sig_pool()) { > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 22:07 ` Eric Dumazet 2020-06-30 22:38 ` Eric Dumazet @ 2020-07-01 17:24 ` Eric Dumazet 1 sibling, 0 replies; 32+ messages in thread From: Eric Dumazet @ 2020-07-01 17:24 UTC (permalink / raw) To: Mathieu Desnoyers Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 3:07 PM Eric Dumazet <edumazet@google.com> wrote: > Oh well, tcp_syn_options() is supposed to have the same logic. > > Maybe we have an issue with SYNCOOKIES (with MD5 + TS + SACK) > > Nice can of worms. Yes, MD5 does not like SYNCOOKIES in some cases. In this trace, S is a linux host, the SYNACK is a syncookie. C host is attempting a SYN with MD5+TS+SACK, which a standard linux host would not attempt. But we could expect another stack to attempt this combination. TCP stack believes it can encode selected TCP options (in the TSVAL value), but since MD5 option is set, TS option is not sent in the SYNACK. However we still send other options that MUST not be sent if TS option could not fit the TCP option space. 10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 65495,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0 10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 65495,nop,nop,sackOK,nop,wscale 8], length 0 <When ACK packets comes back from client, the server has no state and no TS ecr value, so must assume no option was negotiated> 10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0 10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12 10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0 We can see for instance the windows are wrong by a 256 factor (wscale 8) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 19:43 ` Linus Torvalds 2020-06-30 19:52 ` Linus Torvalds @ 2020-06-30 20:21 ` David Miller 2020-06-30 20:30 ` Eric Dumazet 1 sibling, 1 reply; 32+ messages in thread From: David Miller @ 2020-06-30 20:21 UTC (permalink / raw) To: torvalds; +Cc: edumazet, mathieu.desnoyers, linux-kernel, netdev, ycheng, joraj From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 30 Jun 2020 12:43:21 -0700 > If you're not willing to do the work to fix it, I will revert that > commit. Please let me handle this situation instead of making threats, this just got reported. Thank you. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:21 ` David Miller @ 2020-06-30 20:30 ` Eric Dumazet 2020-06-30 20:39 ` Mathieu Desnoyers 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2020-06-30 20:30 UTC (permalink / raw) To: David Miller Cc: Linus Torvalds, Mathieu Desnoyers, LKML, netdev, Yuchung Cheng, Jonathan Rajotte-Julien On Tue, Jun 30, 2020 at 1:21 PM David Miller <davem@davemloft.net> wrote: > > From: Linus Torvalds <torvalds@linux-foundation.org> > Date: Tue, 30 Jun 2020 12:43:21 -0700 > > > If you're not willing to do the work to fix it, I will revert that > > commit. > > Please let me handle this situation instead of making threats, this > just got reported. > > Thank you. > Also keep in mind the commit fixed a security issue, since we were sending on the wire garbage bytes from the kernel. We can not simply revert it and hope for the best. I find quite alarming vendors still use TCP MD5 "for security reasons", but none of them have contributed to it in linux kernel since 2018 (Time of the 'buggy patch') ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [regression] TCP_MD5SIG on established sockets 2020-06-30 20:30 ` Eric Dumazet @ 2020-06-30 20:39 ` Mathieu Desnoyers 0 siblings, 0 replies; 32+ messages in thread From: Mathieu Desnoyers @ 2020-06-30 20:39 UTC (permalink / raw) To: Eric Dumazet Cc: David S. Miller, Linus Torvalds, linux-kernel, netdev, Yuchung Cheng, Jonathan Rajotte-Julien ----- On Jun 30, 2020, at 4:30 PM, Eric Dumazet edumazet@google.com wrote: > On Tue, Jun 30, 2020 at 1:21 PM David Miller <davem@davemloft.net> wrote: >> >> From: Linus Torvalds <torvalds@linux-foundation.org> >> Date: Tue, 30 Jun 2020 12:43:21 -0700 >> >> > If you're not willing to do the work to fix it, I will revert that >> > commit. >> >> Please let me handle this situation instead of making threats, this >> just got reported. >> >> Thank you. >> > > Also keep in mind the commit fixed a security issue, since we were > sending on the wire > garbage bytes from the kernel. > > We can not simply revert it and hope for the best. > > I find quite alarming vendors still use TCP MD5 "for security > reasons", but none of them have contributed to it in linux kernel > since 2018 > (Time of the 'buggy patch') I'm helping a customer increase their contributions and feedback to upstream. As we can see, they have accumulated some backlog over time. Clearly reverting a security fix is not acceptable here. Coming up with a proper ABI-compatible fix should not be out of our reach though. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2020-07-01 17:24 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-13 19:38 [regression] TC_MD5SIG on established sockets Mathieu Desnoyers 2020-05-13 19:49 ` Eric Dumazet 2020-05-13 19:56 ` Eric Dumazet 2020-06-29 19:43 ` [regression] TCP_MD5SIG " Mathieu Desnoyers 2020-06-29 20:47 ` Eric Dumazet 2020-06-30 19:43 ` Linus Torvalds 2020-06-30 19:52 ` Linus Torvalds 2020-06-30 20:34 ` Mathieu Desnoyers 2020-06-30 20:39 ` Eric Dumazet 2020-06-30 20:44 ` David Miller 2020-06-30 20:56 ` Eric Dumazet 2020-06-30 21:17 ` Mathieu Desnoyers 2020-06-30 21:23 ` Eric Dumazet 2020-06-30 21:54 ` Eric Dumazet 2020-06-30 22:07 ` Eric Dumazet 2020-06-30 22:38 ` Eric Dumazet 2020-06-30 23:44 ` Mathieu Desnoyers 2020-07-01 0:01 ` Eric Dumazet 2020-07-01 2:02 ` Herbert Xu 2020-07-01 2:17 ` Eric Dumazet 2020-07-01 2:22 ` Herbert Xu 2020-07-01 2:30 ` Eric Dumazet 2020-07-01 2:39 ` Joe Perches 2020-07-01 2:58 ` Herbert Xu 2020-07-01 3:36 ` Eric Dumazet 2020-07-01 3:50 ` Herbert Xu 2020-07-01 12:19 ` Mathieu Desnoyers 2020-07-01 15:15 ` Eric Dumazet 2020-07-01 17:24 ` Eric Dumazet 2020-06-30 20:21 ` David Miller 2020-06-30 20:30 ` Eric Dumazet 2020-06-30 20:39 ` Mathieu Desnoyers
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.