linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Steam is broken on new kernels
@ 2019-06-21 21:27 Pierre-Loup A. Griffais
  2019-06-21 21:41 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Pierre-Loup A. Griffais @ 2019-06-21 21:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman, edumazet, lkml; +Cc: torvalds

This seems to have broken us:

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.1.11

Here's some affected users:

https://github.com/ValveSoftware/steam-for-linux/issues/6326

https://www.reddit.com/r/linux_gaming/comments/c37lmh/psa_steam_does_not_connect_on_kernels_newer_than/

https://www.phoronix.com/scan.php?page=news_item&px=Steam-Networking-Kernel-Woes

I don't really understand that distributions that claim to be desktop 
products would have fast-tracked a server-oriented fix to all their 
users without testing one of the primary desktop usecases, but that's 
another thing to figure out later.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-21 21:27 Steam is broken on new kernels Pierre-Loup A. Griffais
@ 2019-06-21 21:41 ` Greg Kroah-Hartman
  2019-06-21 21:43   ` Pierre-Loup A. Griffais
  2019-06-21 22:19   ` Linus Torvalds
  0 siblings, 2 replies; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-21 21:41 UTC (permalink / raw)
  To: Pierre-Loup A. Griffais; +Cc: edumazet, lkml, torvalds

On Fri, Jun 21, 2019 at 02:27:41PM -0700, Pierre-Loup A. Griffais wrote:
> This seems to have broken us:
> 
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.1.11
> 
> Here's some affected users:
> 
> https://github.com/ValveSoftware/steam-for-linux/issues/6326
> 
> https://www.reddit.com/r/linux_gaming/comments/c37lmh/psa_steam_does_not_connect_on_kernels_newer_than/
> 
> https://www.phoronix.com/scan.php?page=news_item&px=Steam-Networking-Kernel-Woes
> 
> I don't really understand that distributions that claim to be desktop
> products would have fast-tracked a server-oriented fix to all their users
> without testing one of the primary desktop usecases, but that's another
> thing to figure out later.
> 

What specific commit caused the breakage?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-21 21:41 ` Greg Kroah-Hartman
@ 2019-06-21 21:43   ` Pierre-Loup A. Griffais
  2019-06-21 22:19   ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: Pierre-Loup A. Griffais @ 2019-06-21 21:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: edumazet, lkml, torvalds



On 6/21/19 2:41 PM, Greg Kroah-Hartman wrote:
> On Fri, Jun 21, 2019 at 02:27:41PM -0700, Pierre-Loup A. Griffais wrote:
>> This seems to have broken us:
>>
>> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.1.11
>>
>> Here's some affected users:
>>
>> https://github.com/ValveSoftware/steam-for-linux/issues/6326
>>
>> https://www.reddit.com/r/linux_gaming/comments/c37lmh/psa_steam_does_not_connect_on_kernels_newer_than/
>>
>> https://www.phoronix.com/scan.php?page=news_item&px=Steam-Networking-Kernel-Woes
>>
>> I don't really understand that distributions that claim to be desktop
>> products would have fast-tracked a server-oriented fix to all their users
>> without testing one of the primary desktop usecases, but that's another
>> thing to figure out later.
>>
> 
> What specific commit caused the breakage?

Unclear as of yet, we're starting that process on our end just now. Not 
sure if a user has already performed a bisect and shared his findings in 
one of the links below, but we'll be scouring these and doing our own 
testing as next steps.

> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-21 21:41 ` Greg Kroah-Hartman
  2019-06-21 21:43   ` Pierre-Loup A. Griffais
@ 2019-06-21 22:19   ` Linus Torvalds
       [not found]     ` <CANn89iL5+x3n9H9v4O6y39W=jvQs=uuXbzOvN5mBbcj0t+wdeg@mail.gmail.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2019-06-21 22:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Pierre-Loup A. Griffais, Eric Dumazet, lkml

On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> What specific commit caused the breakage?

Both on reddit and on github there seems to be confusion about whether
it's a problem or not. Some people have it working with the exact same
kernel that breaks for others.

And then some people seem to say it works intermittently for them,
which seems to indicate a timing issue.

Looking at the SACK patches (assuming it's one of them), I'd suspect
the "tcp: tcp_fragment() should apply sane memory limits".

Eric, that one does

       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
               NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
               return -ENOMEM;
       }

but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
same size as sk_sndbuf. So if there is some fragmentation, and we add
more skb's to it, that would seem to trigger fairly easily.
Particularly since this is all in "truesize" units, which can be a lot
bigger than the packets themselves.

I don't know the code, so I may be out to lunch and barking up
completely the wrong tree, but that particular check does seem like it
might trigger much more easily than I think the code _intended_ it to
trigger?

Pierre-Loup - do you guys have a test-case inside of valve? Or is this
purely "we see some people with problems"?

               Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
       [not found]     ` <CANn89iL5+x3n9H9v4O6y39W=jvQs=uuXbzOvN5mBbcj0t+wdeg@mail.gmail.com>
@ 2019-06-21 23:04       ` Pierre-Loup A. Griffais
  2019-06-21 23:42         ` Josh Hunt
  2019-06-21 23:54       ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Pierre-Loup A. Griffais @ 2019-06-21 23:04 UTC (permalink / raw)
  To: Eric Dumazet, Linus Torvalds; +Cc: Greg Kroah-Hartman, lkml



On 6/21/19 3:38 PM, Eric Dumazet wrote:
> Please look at my recent patch.
>   Sorry I am travelling....
> 
> On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds 
> <torvalds@linux-foundation.org <mailto:torvalds@linux-foundation.org>> 
> wrote:
> 
>     On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
>     <gregkh@linuxfoundation.org <mailto:gregkh@linuxfoundation.org>> wrote:
>      >
>      > What specific commit caused the breakage?
> 
>     Both on reddit and on github there seems to be confusion about whether
>     it's a problem or not. Some people have it working with the exact same
>     kernel that breaks for others.
> 
>     And then some people seem to say it works intermittently for them,
>     which seems to indicate a timing issue.
> 
>     Looking at the SACK patches (assuming it's one of them), I'd suspect
>     the "tcp: tcp_fragment() should apply sane memory limits".
> 
>     Eric, that one does
> 
>             if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
>                     NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>                     return -ENOMEM;
>             }
> 
>     but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
>     same size as sk_sndbuf. So if there is some fragmentation, and we add
>     more skb's to it, that would seem to trigger fairly easily.
>     Particularly since this is all in "truesize" units, which can be a lot
>     bigger than the packets themselves.
> 
>     I don't know the code, so I may be out to lunch and barking up
>     completely the wrong tree, but that particular check does seem like it
>     might trigger much more easily than I think the code _intended_ it to
>     trigger?
> 
>     Pierre-Loup - do you guys have a test-case inside of valve? Or is this
>     purely "we see some people with problems"?

Definitely the latter, although the volume of complaints clearly points 
to a real problem from our experience. Reproducing locally, bisecting 
and testing possible fixes is just now starting on our end.

I agree not all users seem affected; most affected people report success 
by using -tcp to launch Steam, which makes it use direct TCP instead of 
WebSockets, our current default connection method for Linux.

Thanks,
  - Pierre-Loup

> 
>                     Linus
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-21 23:04       ` Pierre-Loup A. Griffais
@ 2019-06-21 23:42         ` Josh Hunt
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Hunt @ 2019-06-21 23:42 UTC (permalink / raw)
  To: Pierre-Loup A. Griffais
  Cc: Eric Dumazet, Linus Torvalds, Greg Kroah-Hartman, lkml

On Fri, Jun 21, 2019 at 4:07 PM Pierre-Loup A. Griffais
<pgriffais@valvesoftware.com> wrote:
>
>
>
> On 6/21/19 3:38 PM, Eric Dumazet wrote:
> > Please look at my recent patch.
> >   Sorry I am travelling....
> >
> > On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds
> > <torvalds@linux-foundation.org <mailto:torvalds@linux-foundation.org>>
> > wrote:
> >
> >     On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
> >     <gregkh@linuxfoundation.org <mailto:gregkh@linuxfoundation.org>> wrote:
> >      >
> >      > What specific commit caused the breakage?
> >
> >     Both on reddit and on github there seems to be confusion about whether
> >     it's a problem or not. Some people have it working with the exact same
> >     kernel that breaks for others.
> >
> >     And then some people seem to say it works intermittently for them,
> >     which seems to indicate a timing issue.
> >
> >     Looking at the SACK patches (assuming it's one of them), I'd suspect
> >     the "tcp: tcp_fragment() should apply sane memory limits".
> >
> >     Eric, that one does
> >
> >             if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> >                     NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> >                     return -ENOMEM;
> >             }
> >
> >     but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
> >     same size as sk_sndbuf. So if there is some fragmentation, and we add
> >     more skb's to it, that would seem to trigger fairly easily.
> >     Particularly since this is all in "truesize" units, which can be a lot
> >     bigger than the packets themselves.
> >
> >     I don't know the code, so I may be out to lunch and barking up
> >     completely the wrong tree, but that particular check does seem like it
> >     might trigger much more easily than I think the code _intended_ it to
> >     trigger?
> >
> >     Pierre-Loup - do you guys have a test-case inside of valve? Or is this
> >     purely "we see some people with problems"?
>
> Definitely the latter, although the volume of complaints clearly points
> to a real problem from our experience. Reproducing locally, bisecting
> and testing possible fixes is just now starting on our end.
>
> I agree not all users seem affected; most affected people report success
> by using -tcp to launch Steam, which makes it use direct TCP instead of
> WebSockets, our current default connection method for Linux.
>
> Thanks,
>   - Pierre-Loup
>
> >
> >                     Linus
> >
>
I asked on the github thread if users seeing the problem could check
the new wqueue too big counter:
https://github.com/ValveSoftware/steam-for-linux/issues/6326

So far one person is seeing the counter increase when they see the
problem, and another that doesn't see the problem has the counter at
0. Obviously not a great sample size, but hopefully more will report.
If nothing else, someone is seeing the counter increase while trying
to connect to steam.
-- 
Josh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
       [not found]     ` <CANn89iL5+x3n9H9v4O6y39W=jvQs=uuXbzOvN5mBbcj0t+wdeg@mail.gmail.com>
  2019-06-21 23:04       ` Pierre-Loup A. Griffais
@ 2019-06-21 23:54       ` Linus Torvalds
  2019-06-22  0:19         ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2019-06-21 23:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Greg Kroah-Hartman, Pierre-Loup A. Griffais, lkml

Eric is talking about this patch, I think:

   https://patchwork.ozlabs.org/patch/1120222/

I guess I'll ask people on the github thread to test that too.

                  Linus

On Fri, Jun 21, 2019 at 3:38 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Please look at my recent patch.
>  Sorry I am travelling....
>
> On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> >
>> > What specific commit caused the breakage?
>>
>> Both on reddit and on github there seems to be confusion about whether
>> it's a problem or not. Some people have it working with the exact same
>> kernel that breaks for others.
>>
>> And then some people seem to say it works intermittently for them,
>> which seems to indicate a timing issue.
>>
>> Looking at the SACK patches (assuming it's one of them), I'd suspect
>> the "tcp: tcp_fragment() should apply sane memory limits".
>>
>> Eric, that one does
>>
>>        if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
>>                NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>>                return -ENOMEM;
>>        }
>>
>> but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
>> same size as sk_sndbuf. So if there is some fragmentation, and we add
>> more skb's to it, that would seem to trigger fairly easily.
>> Particularly since this is all in "truesize" units, which can be a lot
>> bigger than the packets themselves.
>>
>> I don't know the code, so I may be out to lunch and barking up
>> completely the wrong tree, but that particular check does seem like it
>> might trigger much more easily than I think the code _intended_ it to
>> trigger?
>>
>> Pierre-Loup - do you guys have a test-case inside of valve? Or is this
>> purely "we see some people with problems"?
>>
>>                Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-21 23:54       ` Linus Torvalds
@ 2019-06-22  0:19         ` Eric Dumazet
  2019-06-22  1:01           ` Pierre-Loup A. Griffais
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2019-06-22  0:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Greg Kroah-Hartman, Pierre-Loup A. Griffais, lkml

On Fri, Jun 21, 2019 at 7:54 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Eric is talking about this patch, I think:
>
>    https://patchwork.ozlabs.org/patch/1120222/
>

That  is correct.

I am about to take a flight from Boston to Paris, so I can not really
follow discussions/tests for the following hours.

Thanks.

> I guess I'll ask people on the github thread to test that too.
>
>                   Linus
>
> On Fri, Jun 21, 2019 at 3:38 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Please look at my recent patch.
> >  Sorry I am travelling....
> >
> > On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> >
> >> > What specific commit caused the breakage?
> >>
> >> Both on reddit and on github there seems to be confusion about whether
> >> it's a problem or not. Some people have it working with the exact same
> >> kernel that breaks for others.
> >>
> >> And then some people seem to say it works intermittently for them,
> >> which seems to indicate a timing issue.
> >>
> >> Looking at the SACK patches (assuming it's one of them), I'd suspect
> >> the "tcp: tcp_fragment() should apply sane memory limits".
> >>
> >> Eric, that one does
> >>
> >>        if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> >>                NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> >>                return -ENOMEM;
> >>        }
> >>
> >> but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
> >> same size as sk_sndbuf. So if there is some fragmentation, and we add
> >> more skb's to it, that would seem to trigger fairly easily.
> >> Particularly since this is all in "truesize" units, which can be a lot
> >> bigger than the packets themselves.
> >>
> >> I don't know the code, so I may be out to lunch and barking up
> >> completely the wrong tree, but that particular check does seem like it
> >> might trigger much more easily than I think the code _intended_ it to
> >> trigger?
> >>
> >> Pierre-Loup - do you guys have a test-case inside of valve? Or is this
> >> purely "we see some people with problems"?
> >>
> >>                Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-22  0:19         ` Eric Dumazet
@ 2019-06-22  1:01           ` Pierre-Loup A. Griffais
  2019-06-22  5:28             ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Pierre-Loup A. Griffais @ 2019-06-22  1:01 UTC (permalink / raw)
  To: Eric Dumazet, Linus Torvalds; +Cc: Greg Kroah-Hartman, lkml



On 6/21/19 5:19 PM, Eric Dumazet wrote:
> On Fri, Jun 21, 2019 at 7:54 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Eric is talking about this patch, I think:
>>
>>     https://patchwork.ozlabs.org/patch/1120222/
>>
> 
> That  is correct.
> 
> I am about to take a flight from Boston to Paris, so I can not really
> follow discussions/tests for the following hours.

I built the tip of linux-5.1.y and reproduced the issue while trying to 
log out and back into Steam; it exhibited this symptom as well:

pgriffais@pgriffais:~$ nstat -az | grep -i wqueue
TcpExtTCPWqueueTooBig           31                 0.0

I applied Eric's path to the tip of the branch and ran that kernel and 
the bug didn't occur through several logout / login cycles, so things 
look good at first glance. I'll keep running that kernel and report back 
if anything crops up in the future, but I believe we're good, beyond 
getting distros to ship this additional fix.

Thanks,
  - Pierre-Loup

> 
> Thanks.
> 
>> I guess I'll ask people on the github thread to test that too.
>>
>>                    Linus
>>
>> On Fri, Jun 21, 2019 at 3:38 PM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> Please look at my recent patch.
>>>   Sorry I am travelling....
>>>
>>> On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>>
>>>> On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
>>>> <gregkh@linuxfoundation.org> wrote:
>>>>>
>>>>> What specific commit caused the breakage?
>>>>
>>>> Both on reddit and on github there seems to be confusion about whether
>>>> it's a problem or not. Some people have it working with the exact same
>>>> kernel that breaks for others.
>>>>
>>>> And then some people seem to say it works intermittently for them,
>>>> which seems to indicate a timing issue.
>>>>
>>>> Looking at the SACK patches (assuming it's one of them), I'd suspect
>>>> the "tcp: tcp_fragment() should apply sane memory limits".
>>>>
>>>> Eric, that one does
>>>>
>>>>         if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
>>>>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>>>>                 return -ENOMEM;
>>>>         }
>>>>
>>>> but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
>>>> same size as sk_sndbuf. So if there is some fragmentation, and we add
>>>> more skb's to it, that would seem to trigger fairly easily.
>>>> Particularly since this is all in "truesize" units, which can be a lot
>>>> bigger than the packets themselves.
>>>>
>>>> I don't know the code, so I may be out to lunch and barking up
>>>> completely the wrong tree, but that particular check does seem like it
>>>> might trigger much more easily than I think the code _intended_ it to
>>>> trigger?
>>>>
>>>> Pierre-Loup - do you guys have a test-case inside of valve? Or is this
>>>> purely "we see some people with problems"?
>>>>
>>>>                 Linus
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-22  1:01           ` Pierre-Loup A. Griffais
@ 2019-06-22  5:28             ` Linus Torvalds
  2019-06-22  6:35               ` Greg Kroah-Hartman
  2019-06-22  7:37               ` Greg Kroah-Hartman
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2019-06-22  5:28 UTC (permalink / raw)
  To: Pierre-Loup A. Griffais; +Cc: Eric Dumazet, Greg Kroah-Hartman, lkml

On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
<pgriffais@valvesoftware.com> wrote:
>
> I applied Eric's path to the tip of the branch and ran that kernel and
> the bug didn't occur through several logout / login cycles, so things
> look good at first glance. I'll keep running that kernel and report back
> if anything crops up in the future, but I believe we're good, beyond
> getting distros to ship this additional fix.

Good. It's now in my tree, so we can get it quickly into stable and
then quickly to distributions.

Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
tcp_fragment()"), and I'm building it right now and I'll push it out
in a couple of minutes assuming nothing odd is going on.

             Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-22  5:28             ` Linus Torvalds
@ 2019-06-22  6:35               ` Greg Kroah-Hartman
  2019-06-22  7:37               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-22  6:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Pierre-Loup A. Griffais, Eric Dumazet, lkml

On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> <pgriffais@valvesoftware.com> wrote:
> >
> > I applied Eric's path to the tip of the branch and ran that kernel and
> > the bug didn't occur through several logout / login cycles, so things
> > look good at first glance. I'll keep running that kernel and report back
> > if anything crops up in the future, but I believe we're good, beyond
> > getting distros to ship this additional fix.
> 
> Good. It's now in my tree, so we can get it quickly into stable and
> then quickly to distributions.
> 
> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> tcp_fragment()"), and I'm building it right now and I'll push it out
> in a couple of minutes assuming nothing odd is going on.

Thanks, I will pick it up now.

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-22  5:28             ` Linus Torvalds
  2019-06-22  6:35               ` Greg Kroah-Hartman
@ 2019-06-22  7:37               ` Greg Kroah-Hartman
  2019-06-26  2:02                 ` Guenter Roeck
  1 sibling, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-22  7:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Pierre-Loup A. Griffais, Eric Dumazet, lkml

On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> <pgriffais@valvesoftware.com> wrote:
> >
> > I applied Eric's path to the tip of the branch and ran that kernel and
> > the bug didn't occur through several logout / login cycles, so things
> > look good at first glance. I'll keep running that kernel and report back
> > if anything crops up in the future, but I believe we're good, beyond
> > getting distros to ship this additional fix.
> 
> Good. It's now in my tree, so we can get it quickly into stable and
> then quickly to distributions.
> 
> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> tcp_fragment()"), and I'm building it right now and I'll push it out
> in a couple of minutes assuming nothing odd is going on.

This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
a bit for them.

But for 4.14 and older, we don't have the "hint" to know this is an
outbound going packet and not to apply these checks at that point in
time, so this patch doesn't work.

I'll see if I can figure anything else later this afternoon for those
kernels...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-22  7:37               ` Greg Kroah-Hartman
@ 2019-06-26  2:02                 ` Guenter Roeck
  2019-06-26  2:29                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Guenter Roeck @ 2019-06-26  2:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Pierre-Loup A. Griffais, Eric Dumazet, lkml

Hi Greg,

On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > <pgriffais@valvesoftware.com> wrote:
> > >
> > > I applied Eric's path to the tip of the branch and ran that kernel and
> > > the bug didn't occur through several logout / login cycles, so things
> > > look good at first glance. I'll keep running that kernel and report back
> > > if anything crops up in the future, but I believe we're good, beyond
> > > getting distros to ship this additional fix.
> > 
> > Good. It's now in my tree, so we can get it quickly into stable and
> > then quickly to distributions.
> > 
> > Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > tcp_fragment()"), and I'm building it right now and I'll push it out
> > in a couple of minutes assuming nothing odd is going on.
> 
> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> a bit for them.
> 
> But for 4.14 and older, we don't have the "hint" to know this is an
> outbound going packet and not to apply these checks at that point in
> time, so this patch doesn't work.
> 
> I'll see if I can figure anything else later this afternoon for those
> kernels...
> 

I may have missed it, but I don't see a fix for the problem in
older stable branches. Any news ?

One possibility might be be to apply the part of 75c119afe14f7 which
introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
is acceptable.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  2:02                 ` Guenter Roeck
@ 2019-06-26  2:29                   ` Greg Kroah-Hartman
  2019-06-26  3:43                     ` Guenter Roeck
  0 siblings, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-26  2:29 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Linus Torvalds, Pierre-Loup A. Griffais, Eric Dumazet, lkml

On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> Hi Greg,
> 
> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > > On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > > <pgriffais@valvesoftware.com> wrote:
> > > >
> > > > I applied Eric's path to the tip of the branch and ran that kernel and
> > > > the bug didn't occur through several logout / login cycles, so things
> > > > look good at first glance. I'll keep running that kernel and report back
> > > > if anything crops up in the future, but I believe we're good, beyond
> > > > getting distros to ship this additional fix.
> > > 
> > > Good. It's now in my tree, so we can get it quickly into stable and
> > > then quickly to distributions.
> > > 
> > > Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > > tcp_fragment()"), and I'm building it right now and I'll push it out
> > > in a couple of minutes assuming nothing odd is going on.
> > 
> > This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> > a bit for them.
> > 
> > But for 4.14 and older, we don't have the "hint" to know this is an
> > outbound going packet and not to apply these checks at that point in
> > time, so this patch doesn't work.
> > 
> > I'll see if I can figure anything else later this afternoon for those
> > kernels...
> > 
> 
> I may have missed it, but I don't see a fix for the problem in
> older stable branches. Any news ?
> 
> One possibility might be be to apply the part of 75c119afe14f7 which
> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> is acceptable.

That's what people have already discussed on the stable mailing list a
few hours ago, hopefully a patch shows up soon as I'm traveling at the
moment and can't do it myself...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  2:29                   ` Greg Kroah-Hartman
@ 2019-06-26  3:43                     ` Guenter Roeck
  2019-06-26  4:20                       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Guenter Roeck @ 2019-06-26  3:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Pierre-Loup A. Griffais, Eric Dumazet, lkml

On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
>> Hi Greg,
>>
>> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
>>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
>>>> <pgriffais@valvesoftware.com> wrote:
>>>>>
>>>>> I applied Eric's path to the tip of the branch and ran that kernel and
>>>>> the bug didn't occur through several logout / login cycles, so things
>>>>> look good at first glance. I'll keep running that kernel and report back
>>>>> if anything crops up in the future, but I believe we're good, beyond
>>>>> getting distros to ship this additional fix.
>>>>
>>>> Good. It's now in my tree, so we can get it quickly into stable and
>>>> then quickly to distributions.
>>>>
>>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
>>>> tcp_fragment()"), and I'm building it right now and I'll push it out
>>>> in a couple of minutes assuming nothing odd is going on.
>>>
>>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
>>> a bit for them.
>>>
>>> But for 4.14 and older, we don't have the "hint" to know this is an
>>> outbound going packet and not to apply these checks at that point in
>>> time, so this patch doesn't work.
>>>
>>> I'll see if I can figure anything else later this afternoon for those
>>> kernels...
>>>
>>
>> I may have missed it, but I don't see a fix for the problem in
>> older stable branches. Any news ?
>>
>> One possibility might be be to apply the part of 75c119afe14f7 which
>> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
>> is acceptable.
> 
> That's what people have already discussed on the stable mailing list a
> few hours ago, hopefully a patch shows up soon as I'm traveling at the
> moment and can't do it myself...
> 

Sounds good. Let me know if nothing shows up; I'll be happy to do it
if needed.

Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  3:43                     ` Guenter Roeck
@ 2019-06-26  4:20                       ` Eric Dumazet
  2019-06-26  5:17                         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2019-06-26  4:20 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Greg Kroah-Hartman, Linus Torvalds, Pierre-Loup A. Griffais, lkml

On Wed, Jun 26, 2019 at 5:43 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> > On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> >> Hi Greg,
> >>
> >> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> >>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> >>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> >>>> <pgriffais@valvesoftware.com> wrote:
> >>>>>
> >>>>> I applied Eric's path to the tip of the branch and ran that kernel and
> >>>>> the bug didn't occur through several logout / login cycles, so things
> >>>>> look good at first glance. I'll keep running that kernel and report back
> >>>>> if anything crops up in the future, but I believe we're good, beyond
> >>>>> getting distros to ship this additional fix.
> >>>>
> >>>> Good. It's now in my tree, so we can get it quickly into stable and
> >>>> then quickly to distributions.
> >>>>
> >>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> >>>> tcp_fragment()"), and I'm building it right now and I'll push it out
> >>>> in a couple of minutes assuming nothing odd is going on.
> >>>
> >>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> >>> a bit for them.
> >>>
> >>> But for 4.14 and older, we don't have the "hint" to know this is an
> >>> outbound going packet and not to apply these checks at that point in
> >>> time, so this patch doesn't work.
> >>>
> >>> I'll see if I can figure anything else later this afternoon for those
> >>> kernels...
> >>>
> >>
> >> I may have missed it, but I don't see a fix for the problem in
> >> older stable branches. Any news ?
> >>
> >> One possibility might be be to apply the part of 75c119afe14f7 which
> >> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> >> is acceptable.
> >
> > That's what people have already discussed on the stable mailing list a
> > few hours ago, hopefully a patch shows up soon as I'm traveling at the
> > moment and can't do it myself...
> >
>
> Sounds good. Let me know if nothing shows up; I'll be happy to do it
> if needed.


Without the rb-tree for rtx queues, old kernels are vulnerable to SACK
attacks if sk_sndbuf is too big,
so I would simply  add a cushion in the test, instead of trying to
backport an illusion of the rb-tree fixes.



diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a8772e11dc1cb42d4319b6fc072c625d284c7ad5..a554213afa4ac41120d781fe64b7cd18ff9b56e8
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1274,7 +1274,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff
*skb, u32 len,
        if (nsize < 0)
                nsize = 0;

-       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
+       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 131072)) {
                NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
                return -ENOMEM;
        }

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  4:20                       ` Eric Dumazet
@ 2019-06-26  5:17                         ` Greg Kroah-Hartman
  2019-06-26  6:38                           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-26  5:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Guenter Roeck, Linus Torvalds, Pierre-Loup A. Griffais, lkml

On Wed, Jun 26, 2019 at 06:20:17AM +0200, Eric Dumazet wrote:
> On Wed, Jun 26, 2019 at 5:43 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> > > On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> > >> Hi Greg,
> > >>
> > >> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> > >>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > >>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > >>>> <pgriffais@valvesoftware.com> wrote:
> > >>>>>
> > >>>>> I applied Eric's path to the tip of the branch and ran that kernel and
> > >>>>> the bug didn't occur through several logout / login cycles, so things
> > >>>>> look good at first glance. I'll keep running that kernel and report back
> > >>>>> if anything crops up in the future, but I believe we're good, beyond
> > >>>>> getting distros to ship this additional fix.
> > >>>>
> > >>>> Good. It's now in my tree, so we can get it quickly into stable and
> > >>>> then quickly to distributions.
> > >>>>
> > >>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > >>>> tcp_fragment()"), and I'm building it right now and I'll push it out
> > >>>> in a couple of minutes assuming nothing odd is going on.
> > >>>
> > >>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> > >>> a bit for them.
> > >>>
> > >>> But for 4.14 and older, we don't have the "hint" to know this is an
> > >>> outbound going packet and not to apply these checks at that point in
> > >>> time, so this patch doesn't work.
> > >>>
> > >>> I'll see if I can figure anything else later this afternoon for those
> > >>> kernels...
> > >>>
> > >>
> > >> I may have missed it, but I don't see a fix for the problem in
> > >> older stable branches. Any news ?
> > >>
> > >> One possibility might be be to apply the part of 75c119afe14f7 which
> > >> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> > >> is acceptable.
> > >
> > > That's what people have already discussed on the stable mailing list a
> > > few hours ago, hopefully a patch shows up soon as I'm traveling at the
> > > moment and can't do it myself...
> > >
> >
> > Sounds good. Let me know if nothing shows up; I'll be happy to do it
> > if needed.
> 
> 
> Without the rb-tree for rtx queues, old kernels are vulnerable to SACK
> attacks if sk_sndbuf is too big,
> so I would simply  add a cushion in the test, instead of trying to
> backport an illusion of the rb-tree fixes.
> 
> 
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a8772e11dc1cb42d4319b6fc072c625d284c7ad5..a554213afa4ac41120d781fe64b7cd18ff9b56e8
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1274,7 +1274,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff
> *skb, u32 len,
>         if (nsize < 0)
>                 nsize = 0;
> 
> -       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> +       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 131072)) {
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>                 return -ENOMEM;
>         }

That's a funny magic number, can we document what it means?

And yes, it's a much simpler patch, I'd rather take this than the fake
backport.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  5:17                         ` Greg Kroah-Hartman
@ 2019-06-26  6:38                           ` Eric Dumazet
  2019-06-26  8:23                             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2019-06-26  6:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, Linus Torvalds, Pierre-Loup A. Griffais, lkml

On Wed, Jun 26, 2019 at 8:22 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 26, 2019 at 06:20:17AM +0200, Eric Dumazet wrote:
> > On Wed, Jun 26, 2019 at 5:43 AM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> > > > On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> > > >> Hi Greg,
> > > >>
> > > >> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> > > >>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > > >>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > > >>>> <pgriffais@valvesoftware.com> wrote:
> > > >>>>>
> > > >>>>> I applied Eric's path to the tip of the branch and ran that kernel and
> > > >>>>> the bug didn't occur through several logout / login cycles, so things
> > > >>>>> look good at first glance. I'll keep running that kernel and report back
> > > >>>>> if anything crops up in the future, but I believe we're good, beyond
> > > >>>>> getting distros to ship this additional fix.
> > > >>>>
> > > >>>> Good. It's now in my tree, so we can get it quickly into stable and
> > > >>>> then quickly to distributions.
> > > >>>>
> > > >>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > > >>>> tcp_fragment()"), and I'm building it right now and I'll push it out
> > > >>>> in a couple of minutes assuming nothing odd is going on.
> > > >>>
> > > >>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> > > >>> a bit for them.
> > > >>>
> > > >>> But for 4.14 and older, we don't have the "hint" to know this is an
> > > >>> outbound going packet and not to apply these checks at that point in
> > > >>> time, so this patch doesn't work.
> > > >>>
> > > >>> I'll see if I can figure anything else later this afternoon for those
> > > >>> kernels...
> > > >>>
> > > >>
> > > >> I may have missed it, but I don't see a fix for the problem in
> > > >> older stable branches. Any news ?
> > > >>
> > > >> One possibility might be be to apply the part of 75c119afe14f7 which
> > > >> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> > > >> is acceptable.
> > > >
> > > > That's what people have already discussed on the stable mailing list a
> > > > few hours ago, hopefully a patch shows up soon as I'm traveling at the
> > > > moment and can't do it myself...
> > > >
> > >
> > > Sounds good. Let me know if nothing shows up; I'll be happy to do it
> > > if needed.
> >
> >
> > Without the rb-tree for rtx queues, old kernels are vulnerable to SACK
> > attacks if sk_sndbuf is too big,
> > so I would simply  add a cushion in the test, instead of trying to
> > backport an illusion of the rb-tree fixes.
> >
> >
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index a8772e11dc1cb42d4319b6fc072c625d284c7ad5..a554213afa4ac41120d781fe64b7cd18ff9b56e8
> > 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1274,7 +1274,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff
> > *skb, u32 len,
> >         if (nsize < 0)
> >                 nsize = 0;
> >
> > -       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> > +       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 131072)) {
> >                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> >                 return -ENOMEM;
> >         }
>
> That's a funny magic number, can we document what it means?

This is because TCP can cook skb with about 64KB of payload in
tcp_sendmsg() before
checking if memory limits are exceeded. (This is mentioned in commit
b6653b3629e5b88202be3c9abc44713973f5c4b4
" tcp: refine memory limit test in tcp_fragment()" changelog)

Then, if this giant TSO skb needs to be split in ~45 smaller skbs of
one segment each,
the resulting truesize might be twice bigger.

You could use 2 * 65536 if that looks better, and possibly a macro,
 but I feel that adding a macro for this one particular spot and
stable kernels might be overkill ?

>
> And yes, it's a much simpler patch, I'd rather take this than the fake
> backport.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Steam is broken on new kernels
  2019-06-26  6:38                           ` Eric Dumazet
@ 2019-06-26  8:23                             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 19+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-26  8:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Guenter Roeck, Linus Torvalds, Pierre-Loup A. Griffais, lkml

On Wed, Jun 26, 2019 at 08:38:01AM +0200, Eric Dumazet wrote:
> On Wed, Jun 26, 2019 at 8:22 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jun 26, 2019 at 06:20:17AM +0200, Eric Dumazet wrote:
> > > On Wed, Jun 26, 2019 at 5:43 AM Guenter Roeck <linux@roeck-us.net> wrote:
> > > >
> > > > On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> > > > > On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> > > > >> Hi Greg,
> > > > >>
> > > > >> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> > > > >>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > > > >>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > > > >>>> <pgriffais@valvesoftware.com> wrote:
> > > > >>>>>
> > > > >>>>> I applied Eric's path to the tip of the branch and ran that kernel and
> > > > >>>>> the bug didn't occur through several logout / login cycles, so things
> > > > >>>>> look good at first glance. I'll keep running that kernel and report back
> > > > >>>>> if anything crops up in the future, but I believe we're good, beyond
> > > > >>>>> getting distros to ship this additional fix.
> > > > >>>>
> > > > >>>> Good. It's now in my tree, so we can get it quickly into stable and
> > > > >>>> then quickly to distributions.
> > > > >>>>
> > > > >>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > > > >>>> tcp_fragment()"), and I'm building it right now and I'll push it out
> > > > >>>> in a couple of minutes assuming nothing odd is going on.
> > > > >>>
> > > > >>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels in
> > > > >>> a bit for them.
> > > > >>>
> > > > >>> But for 4.14 and older, we don't have the "hint" to know this is an
> > > > >>> outbound going packet and not to apply these checks at that point in
> > > > >>> time, so this patch doesn't work.
> > > > >>>
> > > > >>> I'll see if I can figure anything else later this afternoon for those
> > > > >>> kernels...
> > > > >>>
> > > > >>
> > > > >> I may have missed it, but I don't see a fix for the problem in
> > > > >> older stable branches. Any news ?
> > > > >>
> > > > >> One possibility might be be to apply the part of 75c119afe14f7 which
> > > > >> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> > > > >> is acceptable.
> > > > >
> > > > > That's what people have already discussed on the stable mailing list a
> > > > > few hours ago, hopefully a patch shows up soon as I'm traveling at the
> > > > > moment and can't do it myself...
> > > > >
> > > >
> > > > Sounds good. Let me know if nothing shows up; I'll be happy to do it
> > > > if needed.
> > >
> > >
> > > Without the rb-tree for rtx queues, old kernels are vulnerable to SACK
> > > attacks if sk_sndbuf is too big,
> > > so I would simply  add a cushion in the test, instead of trying to
> > > backport an illusion of the rb-tree fixes.
> > >
> > >
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index a8772e11dc1cb42d4319b6fc072c625d284c7ad5..a554213afa4ac41120d781fe64b7cd18ff9b56e8
> > > 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -1274,7 +1274,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff
> > > *skb, u32 len,
> > >         if (nsize < 0)
> > >                 nsize = 0;
> > >
> > > -       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> > > +       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 131072)) {
> > >                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> > >                 return -ENOMEM;
> > >         }
> >
> > That's a funny magic number, can we document what it means?
> 
> This is because TCP can cook skb with about 64KB of payload in
> tcp_sendmsg() before
> checking if memory limits are exceeded. (This is mentioned in commit
> b6653b3629e5b88202be3c9abc44713973f5c4b4
> " tcp: refine memory limit test in tcp_fragment()" changelog)
> 
> Then, if this giant TSO skb needs to be split in ~45 smaller skbs of
> one segment each,
> the resulting truesize might be twice bigger.
> 
> You could use 2 * 65536 if that looks better, and possibly a macro,
>  but I feel that adding a macro for this one particular spot and
> stable kernels might be overkill ?

Ah, yeah, 2*65536 makes more sense to me, seeing 131072 didn't trigger
the same "power of 2" thing in my brain :)

I'll fix this up and queue it up now, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-06-26  8:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21 21:27 Steam is broken on new kernels Pierre-Loup A. Griffais
2019-06-21 21:41 ` Greg Kroah-Hartman
2019-06-21 21:43   ` Pierre-Loup A. Griffais
2019-06-21 22:19   ` Linus Torvalds
     [not found]     ` <CANn89iL5+x3n9H9v4O6y39W=jvQs=uuXbzOvN5mBbcj0t+wdeg@mail.gmail.com>
2019-06-21 23:04       ` Pierre-Loup A. Griffais
2019-06-21 23:42         ` Josh Hunt
2019-06-21 23:54       ` Linus Torvalds
2019-06-22  0:19         ` Eric Dumazet
2019-06-22  1:01           ` Pierre-Loup A. Griffais
2019-06-22  5:28             ` Linus Torvalds
2019-06-22  6:35               ` Greg Kroah-Hartman
2019-06-22  7:37               ` Greg Kroah-Hartman
2019-06-26  2:02                 ` Guenter Roeck
2019-06-26  2:29                   ` Greg Kroah-Hartman
2019-06-26  3:43                     ` Guenter Roeck
2019-06-26  4:20                       ` Eric Dumazet
2019-06-26  5:17                         ` Greg Kroah-Hartman
2019-06-26  6:38                           ` Eric Dumazet
2019-06-26  8:23                             ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).