* Re: Soft lockup issue in Linux 4.1.9
[not found] ` <CANn89i+B5T4Rhs8HnrC0+f+GhLvBFfpr4BVDvhkVOveSfy9B8Q@mail.gmail.com>
@ 2015-10-01 11:43 ` Holger Hoffstätte
2015-10-01 11:52 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Holger Hoffstätte @ 2015-10-01 11:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
Greg Kroah-Hartman, linux-kernel, stable, netdev
On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 1 Oct. 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>
>>>>> for information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9, and have some random soft lockup. If this can help :
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [ 204.478380] Call Trace:
>>>>> [ 204.478381] <IRQ>
>>>>> [ 204.478385] [<ffffffff81076121>] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [ 204.478386] [<ffffffff810760de>] ? del_timer+0x4d/0x4d
>>>>> [ 204.478388] [<ffffffff8107614b>] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert
>>>>
>>>> [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>>>
>>>> and see how that works for you? I'll do the same on my end. So far the
>>>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>>>> but never anything regarding the timer - though it was definitely
>>>> related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
>
> Looks like an old and known problem...
>
> The following commit should be sent/added to the 4.1 stable tree:
>
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet <edumazet@google.com>
> Date: Thu Aug 13 15:44:51 2015 -0700
>
> inet: fix potential deadlock in reqsk_queue_unlink()
>
> When replacing del_timer() with del_timer_sync(), I introduced
> a deadlock condition :
>
> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>
> inet_csk_reqsk_queue_drop() can be called from many contexts,
> one being the timer handler itself (reqsk_timer_handler()).
>
> In this case, del_timer_sync() loops forever.
>
> Simple fix is to test if timer is pending.
>
> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.
Thanks for the quick reply!
Holger
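
The deadlock described in the quoted commit message can be illustrated with
a small user-space model. This is only a sketch in Python, not kernel code:
the Timer class, the fire() helper, the unlink_unguarded()/unlink_guarded()
names and the max_spins bound (which stands in for "loops forever") are all
assumptions made for illustration. Only the idea comes from the thread:
waiting synchronously for a timer callback from inside that same callback
can never finish, and testing whether the timer is pending first avoids the
wait.

```python
class Timer:
    """Toy stand-in for a kernel timer (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = False  # armed and waiting to fire
        self.running = False  # callback is currently executing


def try_to_del_timer_sync(timer):
    """Return -1 while the callback runs, else 1 if it was pending, else 0."""
    if timer.running:
        return -1
    was_pending = timer.pending
    timer.pending = False
    return 1 if was_pending else 0


def del_timer_sync(timer, max_spins=1000):
    """Retry until the callback has finished.

    max_spins is an assumption of this model: it turns the endless loop
    into an exception so the example terminates.
    """
    for _ in range(max_spins):
        ret = try_to_del_timer_sync(timer)
        if ret >= 0:
            return ret
    raise RuntimeError("del_timer_sync() is waiting for its own caller")


def unlink_unguarded(timer):
    """Always wait for the callback, even when called from the callback."""
    return del_timer_sync(timer)


def unlink_guarded(timer):
    """Only wait if the timer is pending (the 'simple fix' idea)."""
    if timer.pending:
        return del_timer_sync(timer)
    return 0


def fire(timer, handler):
    """Run handler as the timer callback: detach the timer, then call it."""
    timer.pending = False
    timer.running = True
    try:
        return handler(timer)
    finally:
        timer.running = False
```

In this model, fire(t, unlink_unguarded) raises RuntimeError because the
callback waits for itself, while fire(t, unlink_guarded) returns 0 at once
because the timer is no longer pending by the time its callback runs.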
* Re: Soft lockup issue in Linux 4.1.9
2015-10-01 11:43 ` Soft lockup issue in Linux 4.1.9 Holger Hoffstätte
@ 2015-10-01 11:52 ` Eric Dumazet
2015-10-02 6:52 ` Andre Tomt
2015-10-02 20:04 ` Thomas Gleixner
0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-10-01 11:52 UTC (permalink / raw)
To: Holger Hoffstätte
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
Greg Kroah-Hartman, LKML, stable, netdev
On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On 10/01/15 13:29, Eric Dumazet wrote:
>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> Author: Eric Dumazet <edumazet@google.com>
>> Date: Thu Aug 13 15:44:51 2015 -0700
>>
>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> When replacing del_timer() with del_timer_sync(), I introduced
>> a deadlock condition :
>>
>> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>>
>> inet_csk_reqsk_queue_drop() can be called from many contexts,
>> one being the timer handler itself (reqsk_timer_handler()).
>>
>> In this case, del_timer_sync() loops forever.
>>
>> Simple fix is to test if timer is pending.
>>
>> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> we speak. Let's hope that this fixes the lockups.
>
It definitely should help !
David, since patch is not yet seen on
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
could you please add it to your queue ?
Thanks.
* Re: Soft lockup issue in Linux 4.1.9
2015-10-01 11:52 ` Eric Dumazet
@ 2015-10-02 6:52 ` Andre Tomt
2015-10-02 7:17 ` Holger Hoffstätte
2015-10-02 20:04 ` Thomas Gleixner
1 sibling, 1 reply; 11+ messages in thread
From: Andre Tomt @ 2015-10-02 6:52 UTC (permalink / raw)
To: Eric Dumazet, Holger Hoffstätte
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
Greg Kroah-Hartman, LKML, stable, netdev
On 1 Oct. 2015 13:52, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>> On 10/01/15 13:29, Eric Dumazet wrote:
>
>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>> Author: Eric Dumazet <edumazet@google.com>
>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>
>>> inet: fix potential deadlock in reqsk_queue_unlink()
<snip>
>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>> we speak. Let's hope that this fixes the lockups.
>>
>
> It definitely should help !
>
> David, since patch is not yet seen on
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> could you please add it to your queue ?
Seems to fix it for me as well. 3 systems have been running varying
types of production-like loads with it for 14+ hours without hanging.
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 6:52 ` Andre Tomt
@ 2015-10-02 7:17 ` Holger Hoffstätte
2015-10-02 19:25 ` Wolfgang Walter
2015-10-03 19:14 ` Thomas D.
0 siblings, 2 replies; 11+ messages in thread
From: Holger Hoffstätte @ 2015-10-02 7:17 UTC (permalink / raw)
To: Andre Tomt, Eric Dumazet
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
Greg Kroah-Hartman, LKML, stable, netdev
On 10/02/15 08:52, Andre Tomt wrote:
> On 1 Oct. 2015 13:52, Eric Dumazet wrote:
>> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>> <holger.hoffstaette@googlemail.com> wrote:
>>> On 10/01/15 13:29, Eric Dumazet wrote:
>>
>>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>>
>>>> inet: fix potential deadlock in reqsk_queue_unlink()
> <snip>
>>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>>> we speak. Let's hope that this fixes the lockups.
>>>
>>
>> It definitely should help !
>>
>> David, since patch is not yet seen on
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>> could you please add it to your queue ?
>
> Seems to fix it for me as well. 3 systems have been running varying
> types of production-like loads with it for 14+ hours without hanging.
Just got up, and yes - my systems survived the night as well, no issues.
Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
will get another broken release.
cheers
Holger
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 7:17 ` Holger Hoffstätte
@ 2015-10-02 19:25 ` Wolfgang Walter
2015-10-03 19:14 ` Thomas D.
1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Walter @ 2015-10-02 19:25 UTC (permalink / raw)
To: Holger Hoffstätte
Cc: Andre Tomt, Eric Dumazet, David S. Miller, Eric W. Biederman,
Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev
On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 1 Oct. 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >>
> >> <holger.hoffstaette@googlemail.com> wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet <edumazet@google.com>
> >>>> Date: Thu Aug 13 15:44:51 2015 -0700
> >>>>
> >>>> inet: fix potential deadlock in reqsk_queue_unlink()
> >
> > <snip>
> >
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >>
> >> It definitely should help !
> >>
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> >
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
>
> Just got up, and yes - my systems survived the night as well, no issues.
>
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
>
Fixes the problem here, too.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
* Re: Soft lockup issue in Linux 4.1.9
2015-10-01 11:52 ` Eric Dumazet
2015-10-02 6:52 ` Andre Tomt
@ 2015-10-02 20:04 ` Thomas Gleixner
2015-10-02 20:59 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2015-10-02 20:04 UTC (permalink / raw)
To: Eric Dumazet
Cc: Holger Hoffstätte, David S. Miller, Eric W. Biederman,
Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev
On Thu, 1 Oct 2015, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
> > On 10/01/15 13:29, Eric Dumazet wrote:
>
> >> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >> Author: Eric Dumazet <edumazet@google.com>
> >> Date: Thu Aug 13 15:44:51 2015 -0700
> >>
> >> inet: fix potential deadlock in reqsk_queue_unlink()
> >>
> >> When replacing del_timer() with del_timer_sync(), I introduced
> >> a deadlock condition :
> >>
> >> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> >>
> >> inet_csk_reqsk_queue_drop() can be called from many contexts,
> >> one being the timer handler itself (reqsk_timer_handler()).
> >>
> >> In this case, del_timer_sync() loops forever.
> >>
> >> Simple fix is to test if timer is pending.
> >>
> >> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> >> Signed-off-by: Eric Dumazet <edumazet@google.com>
> >> Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> > we speak. Let's hope that this fixes the lockups.
> >
>
> It definitely should help !
What ensures that the timer cannot be re-added while that timer
callback is running?
Thanks,
tglx
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 20:04 ` Thomas Gleixner
@ 2015-10-02 20:59 ` Eric Dumazet
2015-10-02 21:04 ` Thomas Gleixner
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2015-10-02 20:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
stable, netdev
On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> What ensures that the timer cannot be re-added while that timer
> callback is running?
What exactly is your question?
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 20:59 ` Eric Dumazet
@ 2015-10-02 21:04 ` Thomas Gleixner
2015-10-02 21:32 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2015-10-02 21:04 UTC (permalink / raw)
To: Eric Dumazet
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
stable, netdev
On Fri, 2 Oct 2015, Eric Dumazet wrote:
> On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
>
> > What ensures that the timer cannot be re-added while that timer
> > callback is running?
>
> What exactly is your question?
CPU0                            CPU1

timer expires
  callback
                                add timer
    timer_pending() == true
    ===> del_timer_sync()
I was just curious how this is prevented as I got lost in the
networking code as usual :)
Thanks,
tglx
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 21:04 ` Thomas Gleixner
@ 2015-10-02 21:32 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-10-02 21:32 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
stable, netdev
On Fri, 2015-10-02 at 23:04 +0200, Thomas Gleixner wrote:
> On Fri, 2 Oct 2015, Eric Dumazet wrote:
> > On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> >
> > > What ensures that the timer cannot be re-added while that timer
> > > callback is running?
> >
> > What exactly is your question?
>
> CPU0                          CPU1
>
> timer expires
>   callback
>                               add timer
>     timer_pending() == true
>     ===> del_timer_sync()
>
> I was just curious how this is prevented as I got lost in the
> networking code as usual :)
Sure ;)
I believe this cannot happen, for the following reasons:
mod_timer_pinned() is used only when the req is created, at which point
the timer cannot possibly be running on the same req. The _pinned part is
critical because we set req->refcnt _after_ starting the timer,
to avoid being visible and caught by RCU lookups in the hash tables.
After that, the timer can be modified only by mod_timer_pending() from
tcp_check_req(). This should not restart the timer if another CPU is in
the timer callback.
Thanks
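
The difference Eric relies on can be shown with a toy model. This is a
Python sketch under stated assumptions, not kernel code: the Timer class
and the run_callback() helper are hypothetical, and the model assumes that
a timer is detached (no longer pending) before its callback is invoked.
mod_timer() and mod_timer_pending() here copy only the documented contract
of the kernel functions of the same name: the former arms an inactive
timer, the latter never does.

```python
class Timer:
    """Toy stand-in for a kernel timer (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = False
        self.expires = None


def mod_timer(timer, expires):
    """Arm or re-arm unconditionally; return 1 if it was already pending."""
    was_pending = 1 if timer.pending else 0
    timer.pending = True
    timer.expires = expires
    return was_pending


def mod_timer_pending(timer, expires):
    """Update the expiry only if still pending; never re-arm an inactive timer."""
    if not timer.pending:
        return 0
    timer.expires = expires
    return 1


def run_callback(timer, other_cpu_action):
    """Model a callback in flight while another CPU touches the timer.

    The timer is detached first (assumption of this model), then the
    other CPU's action runs. The return value is what timer_pending()
    would report to the callback afterwards.
    """
    timer.pending = False
    other_cpu_action(timer)
    return timer.pending
```

With other_cpu_action set to a mod_timer() call, run_callback() returns
True, which is the situation in Thomas's diagram. With a
mod_timer_pending() call it returns False, so a guarded del_timer_sync()
in the callback would be skipped.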
* Re: Soft lockup issue in Linux 4.1.9
2015-10-02 7:17 ` Holger Hoffstätte
2015-10-02 19:25 ` Wolfgang Walter
@ 2015-10-03 19:14 ` Thomas D.
2015-10-17 23:41 ` Greg Kroah-Hartman
1 sibling, 1 reply; 11+ messages in thread
From: Thomas D. @ 2015-10-03 19:14 UTC (permalink / raw)
To: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable
Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
Greg Kroah-Hartman, LKML, netdev
Hi,
Holger Hoffstätte wrote:
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
It looks to me like the request came too late: the patch is not included
in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.
Greg, do you need a dedicated inclusion request for
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
in 4.1.x or is it already on your list?
-Thomas
* Re: Soft lockup issue in Linux 4.1.9
2015-10-03 19:14 ` Thomas D.
@ 2015-10-17 23:41 ` Greg Kroah-Hartman
0 siblings, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2015-10-17 23:41 UTC (permalink / raw)
To: Thomas D.
Cc: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable,
David S. Miller, Eric W. Biederman, Stephen Hemminger, LKML,
netdev
On Sat, Oct 03, 2015 at 09:14:16PM +0200, Thomas D. wrote:
> Hi,
>
> Holger Hoffstätte wrote:
> > Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> > will get another broken release.
>
> It looks to me like the request came too late: the patch is not included
> in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.
>
> Greg, do you need a dedicated inclusion request for
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> in 4.1.x or is it already on your list?
Now applied, thanks.
greg k-h