netdev.vger.kernel.org archive mirror
* Re: Soft lockup issue in Linux 4.1.9
       [not found]   ` <CANn89i+B5T4Rhs8HnrC0+f+GhLvBFfpr4BVDvhkVOveSfy9B8Q@mail.gmail.com>
@ 2015-10-01 11:43     ` Holger Hoffstätte
  2015-10-01 11:52       ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Holger Hoffstätte @ 2015-10-01 11:43 UTC
  To: Eric Dumazet
  Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Greg Kroah-Hartman, linux-kernel, stable, netdev

On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 01. okt. 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>
>>>>> For information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9, and am seeing random soft lockups. In case this helps:
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [  204.478380] Call Trace:
>>>>> [  204.478381]  <IRQ>
>>>>> [  204.478385]  [<ffffffff81076121>] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [  204.478386]  [<ffffffff810760de>] ? del_timer+0x4d/0x4d
>>>>> [  204.478388]  [<ffffffff8107614b>] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert
>>>>
>>>>     [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>>>
>>>> and see how that works for you? I'll do the same on my end. So far the
>>>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>>>> but never anything regarding the timer - though it was definitely
>>>> related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
> 
> Looks like an old and known problem...
> 
> The following commit should be sent/added to the 4.1 stable tree:
> 
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Aug 13 15:44:51 2015 -0700
> 
>     inet: fix potential deadlock in reqsk_queue_unlink()
> 
>     When replacing del_timer() with del_timer_sync(), I introduced
>     a deadlock condition :
> 
>     reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> 
>     inet_csk_reqsk_queue_drop() can be called from many contexts,
>     one being the timer handler itself (reqsk_timer_handler()).
> 
>     In this case, del_timer_sync() loops forever.
> 
>     Simple fix is to test if timer is pending.
> 
>     Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.

Thanks for the quick reply!

Holger


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-01 11:43     ` Soft lockup issue in Linux 4.1.9 Holger Hoffstätte
@ 2015-10-01 11:52       ` Eric Dumazet
  2015-10-02  6:52         ` Andre Tomt
  2015-10-02 20:04         ` Thomas Gleixner
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-10-01 11:52 UTC
  To: Holger Hoffstätte
  Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Greg Kroah-Hartman, LKML, stable, netdev

On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On 10/01/15 13:29, Eric Dumazet wrote:

>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>
>>     inet: fix potential deadlock in reqsk_queue_unlink()
>>
>>     When replacing del_timer() with del_timer_sync(), I introduced
>>     a deadlock condition :
>>
>>     reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>>
>>     inet_csk_reqsk_queue_drop() can be called from many contexts,
>>     one being the timer handler itself (reqsk_timer_handler()).
>>
>>     In this case, del_timer_sync() loops forever.
>>
>>     Simple fix is to test if timer is pending.
>>
>>     Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> we speak. Let's hope that this fixes the lockups.
>

It definitely should help !

David, since patch is not yet seen on
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
could you please add it to your queue ?

Thanks.


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-01 11:52       ` Eric Dumazet
@ 2015-10-02  6:52         ` Andre Tomt
  2015-10-02  7:17           ` Holger Hoffstätte
  2015-10-02 20:04         ` Thomas Gleixner
  1 sibling, 1 reply; 11+ messages in thread
From: Andre Tomt @ 2015-10-02  6:52 UTC
  To: Eric Dumazet, Holger Hoffstätte
  Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Greg Kroah-Hartman, LKML, stable, netdev

On 01. okt. 2015 13:52, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
>> On 10/01/15 13:29, Eric Dumazet wrote:
>
>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>> Author: Eric Dumazet <edumazet@google.com>
>>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>>
>>>      inet: fix potential deadlock in reqsk_queue_unlink()
<snip>
>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>> we speak. Let's hope that this fixes the lockups.
>>
>
> It definitely should help !
>
> David, since patch is not yet seen on
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> could you please add it to your queue ?

Seems to fix it for me as well. 3 systems have been running varying 
types of production-like loads with it for 14+ hours without hanging.


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02  6:52         ` Andre Tomt
@ 2015-10-02  7:17           ` Holger Hoffstätte
  2015-10-02 19:25             ` Wolfgang Walter
  2015-10-03 19:14             ` Thomas D.
  0 siblings, 2 replies; 11+ messages in thread
From: Holger Hoffstätte @ 2015-10-02  7:17 UTC
  To: Andre Tomt, Eric Dumazet
  Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Greg Kroah-Hartman, LKML, stable, netdev

On 10/02/15 08:52, Andre Tomt wrote:
> On 01. okt. 2015 13:52, Eric Dumazet wrote:
>> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>> <holger.hoffstaette@googlemail.com> wrote:
>>> On 10/01/15 13:29, Eric Dumazet wrote:
>>
>>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>>>
>>>>      inet: fix potential deadlock in reqsk_queue_unlink()
> <snip>
>>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>>> we speak. Let's hope that this fixes the lockups.
>>>
>>
>> It definitely should help !
>>
>> David, since patch is not yet seen on
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>> could you please add it to your queue ?
> 
> Seems to fix it for me as well. 3 systems have been running varying
> types of production-like loads with it for 14+ hours without hanging.

Just got up, and yes - my systems survived the night as well, no issues.

Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
will get another broken release.

cheers
Holger


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02  7:17           ` Holger Hoffstätte
@ 2015-10-02 19:25             ` Wolfgang Walter
  2015-10-03 19:14             ` Thomas D.
  1 sibling, 0 replies; 11+ messages in thread
From: Wolfgang Walter @ 2015-10-02 19:25 UTC
  To: Holger Hoffstätte
  Cc: Andre Tomt, Eric Dumazet, David S. Miller, Eric W. Biederman,
	Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

Am Freitag, 2. Oktober 2015, 09:17:16 schrieb Holger Hoffstätte:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 01. okt. 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >> 
> >> <holger.hoffstaette@googlemail.com> wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet <edumazet@google.com>
> >>>> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>>> 
> >>>>      inet: fix potential deadlock in reqsk_queue_unlink()
> > 
> > <snip>
> > 
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >> 
> >> It definitely should help !
> >> 
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> > 
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
> 
> Just got up, and yes - my systems survived the night as well, no issues.
> 
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
> 

Fixes the problem here, too.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-01 11:52       ` Eric Dumazet
  2015-10-02  6:52         ` Andre Tomt
@ 2015-10-02 20:04         ` Thomas Gleixner
  2015-10-02 20:59           ` Eric Dumazet
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2015-10-02 20:04 UTC
  To: Eric Dumazet
  Cc: Holger Hoffstätte, David S. Miller, Eric W. Biederman,
	Stephen Hemminger, Greg Kroah-Hartman, LKML, stable, netdev

On Thu, 1 Oct 2015, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
> > On 10/01/15 13:29, Eric Dumazet wrote:
> 
> >> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >> Author: Eric Dumazet <edumazet@google.com>
> >> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>
> >>     inet: fix potential deadlock in reqsk_queue_unlink()
> >>
> >>     When replacing del_timer() with del_timer_sync(), I introduced
> >>     a deadlock condition :
> >>
> >>     reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> >>
> >>     inet_csk_reqsk_queue_drop() can be called from many contexts,
> >>     one being the timer handler itself (reqsk_timer_handler()).
> >>
> >>     In this case, del_timer_sync() loops forever.
> >>
> >>     Simple fix is to test if timer is pending.
> >>
> >>     Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> >>     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >>     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> > we speak. Let's hope that this fixes the lockups.
> >
> 
> It definitely should help !

What makes sure, that the timer cannot be readded while that timer
callback is running?

Thanks,

	tglx



* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02 20:04         ` Thomas Gleixner
@ 2015-10-02 20:59           ` Eric Dumazet
  2015-10-02 21:04             ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2015-10-02 20:59 UTC
  To: Thomas Gleixner
  Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
	Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
	stable, netdev

On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:

> What makes sure, that the timer cannot be readded while that timer
> callback is running?

What is exactly your question ?


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02 20:59           ` Eric Dumazet
@ 2015-10-02 21:04             ` Thomas Gleixner
  2015-10-02 21:32               ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2015-10-02 21:04 UTC
  To: Eric Dumazet
  Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
	Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
	stable, netdev

On Fri, 2 Oct 2015, Eric Dumazet wrote:
> On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> 
> > What makes sure, that the timer cannot be readded while that timer
> > callback is running?
> 
> What is exactly your question ?

CPU0   	  	       		CPU1

timer expires
  callback
				add timer
  timer_pending() == true
  ===> del_timer_sync()

I was just curious how this is prevented as I got lost in the
networking code as usual :)

Thanks,

	tglx


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02 21:04             ` Thomas Gleixner
@ 2015-10-02 21:32               ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2015-10-02 21:32 UTC
  To: Thomas Gleixner
  Cc: Eric Dumazet, Holger Hoffstätte, David S. Miller,
	Eric W. Biederman, Stephen Hemminger, Greg Kroah-Hartman, LKML,
	stable, netdev

On Fri, 2015-10-02 at 23:04 +0200, Thomas Gleixner wrote:
> On Fri, 2 Oct 2015, Eric Dumazet wrote:
> > On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> > 
> > > What makes sure, that the timer cannot be readded while that timer
> > > callback is running?
> > 
> > What is exactly your question ?
> 
> CPU0   	  	       		CPU1
> 
> timer expires
>   callback
> 				add timer
>   timer_pending() == true
>   ===> del_timer_sync()
> 
> I was just curious how this is prevented as I got lost in the
> networking code as usual :)

Sure ;)

I believe this cannot happen, for the following reasons:

mod_timer_pinned() is used only when the req is created, when the timer
cannot possibly be running on that req. The _pinned part is critical
because we set req->refcnt _after_ starting the timer, to avoid the req
being visible to, and caught by, RCU lookups in hash tables.

After that, the timer can only be modified by mod_timer_pending() from
tcp_check_req(). This should not restart the timer if another CPU is in
the timer callback.

Thanks


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-02  7:17           ` Holger Hoffstätte
  2015-10-02 19:25             ` Wolfgang Walter
@ 2015-10-03 19:14             ` Thomas D.
  2015-10-17 23:41               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas D. @ 2015-10-03 19:14 UTC
  To: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable
  Cc: David S. Miller, Eric W. Biederman, Stephen Hemminger,
	Greg Kroah-Hartman, LKML, netdev

Hi,

Holger Hoffstätte wrote:
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

To me it looks like the request came too late: the patch is not included
in 4.1.10. So don't forget to re-apply the patch when upgrading.

Greg, do you need a dedicated inclusion request for
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
in 4.1.x or is it already on your list?


-Thomas


* Re: Soft lockup issue in Linux 4.1.9
  2015-10-03 19:14             ` Thomas D.
@ 2015-10-17 23:41               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2015-10-17 23:41 UTC
  To: Thomas D.
  Cc: Holger Hoffstätte, Andre Tomt, Eric Dumazet, stable,
	David S. Miller, Eric W. Biederman, Stephen Hemminger, LKML,
	netdev

On Sat, Oct 03, 2015 at 09:14:16PM +0200, Thomas D. wrote:
> Hi,
> 
> Holger Hoffstätte wrote:
> > Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> > will get another broken release.
> 
> To me it looks like the request came too late: the patch is not included
> in 4.1.10. So don't forget to re-apply the patch when upgrading.
> 
> Greg, do you need a dedicated inclusion request for
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> in 4.1.x or is it already on your list?

Now applied, thanks.

greg k-h


end of thread (~2015-10-17 23:41 UTC)

Thread overview: 11+ messages
     [not found] <1443650383.13282.10.camel@daevel.fr>
     [not found] ` <560D1223.3070606@googlemail.com>
     [not found]   ` <CANn89i+B5T4Rhs8HnrC0+f+GhLvBFfpr4BVDvhkVOveSfy9B8Q@mail.gmail.com>
2015-10-01 11:43     ` Soft lockup issue in Linux 4.1.9 Holger Hoffstätte
2015-10-01 11:52       ` Eric Dumazet
2015-10-02  6:52         ` Andre Tomt
2015-10-02  7:17           ` Holger Hoffstätte
2015-10-02 19:25             ` Wolfgang Walter
2015-10-03 19:14             ` Thomas D.
2015-10-17 23:41               ` Greg Kroah-Hartman
2015-10-02 20:04         ` Thomas Gleixner
2015-10-02 20:59           ` Eric Dumazet
2015-10-02 21:04             ` Thomas Gleixner
2015-10-02 21:32               ` Eric Dumazet
