* Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
@ 2014-09-02 15:45 Itzcak Pechtalt
2014-09-02 16:08 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Itzcak Pechtalt @ 2014-09-02 15:45 UTC (permalink / raw)
To: linux-kernel; +Cc: Alan Cox, Linus Torvalds
Hi,
I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description.
There is also a short fix patch for kernel/hrtimer.c file.
Even if this bug occurs rary, however it resolves system hard lockup option.
I suspect that it was targeted by mistake to not active list (timers_realtime-clock@kernel-bugs.osdl.org).
Following is the fix patch based on kernel 3.16.1 (just simple):
diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
--- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
+++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
@@ -941,7 +941,7 @@
if (!timerqueue_getnext(&base->active))
base->cpu_base->active_bases &= ~(1 << base->index);
out:
- timer->state = newstate;
+ timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
}
/*
Is there a chance for this patch fix to insert into next kernel release?
Thanks
Itzcak Pechtalt
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
2014-09-02 15:45 Race condition in HR timers that cause double insertion and hard lockup -- all latest versions Itzcak Pechtalt
@ 2014-09-02 16:08 ` Linus Torvalds
2014-09-02 17:46 ` Itzcak Pechtalt
2014-09-02 19:08 ` Thomas Gleixner
0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2014-09-02 16:08 UTC (permalink / raw)
To: Itzcak Pechtalt, Thomas Gleixner; +Cc: linux-kernel
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
<itzcak@flashnetworks.com> wrote:
>
> I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description.
> There is also a short fix patch for kernel/hrtimer.c file.
> Even if this bug occurs rary, however it resolves system hard lockup option.
The patch is whitespace-damaged, but with a small oneliner like this
that doesn't much matter (the timer files moved to kernel/time/ during
this merge window, so the patch wouldn't apply as-is anyway).
It needs a sign-off (see Documentation/SubmittingPatches), but even
more importantly it needs to go to the right people for
double-checking.
But the patch is more broken than whitespace and even lack of
sign-off. It cannot even have compiled. I'm assuming "timer_state" was
intended to be "timer->state". Also, every caller but one already has
"HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in
"newstate", so I suspect if this patch is the real fix (which I'll
leave for Thomas to comment more on), afaik the actual problem can
only happen through migrate_hrtimer_list() which uconditionally sets
the whole state to HRTIMER_STATE_MIGRATE.
Thomas? Leaving damaged patch quoted below.
Linus
> I suspect that it was targeted by mistake to not active list (timers_realtime-clock@kernel-bugs.osdl.org).
> Following is the fix patch based on kernel 3.16.1 (just simple):
> diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
> +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
> @@ -941,7 +941,7 @@
> if (!timerqueue_getnext(&base->active))
> base->cpu_base->active_bases &= ~(1 << base->index);
> out:
> - timer->state = newstate;
> + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
> }
>
> /*
>
> Is there a chance for this patch fix to insert into next kernel release?
>
> Thanks
>
> Itzcak Pechtalt
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
2014-09-02 16:08 ` Linus Torvalds
@ 2014-09-02 17:46 ` Itzcak Pechtalt
2014-09-02 19:08 ` Thomas Gleixner
1 sibling, 0 replies; 4+ messages in thread
From: Itzcak Pechtalt @ 2014-09-02 17:46 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2899 bytes --]
Indeed, my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in this version function remove_hrtimer doesn't preserve "HRTIMER_STATE_CALLBACK" flag and triggers hard lockup.
So this bug was already fixed in version 2.6.35 to set the flag by remove_hrtimer function.
Seems that migrate_hrtimer_list case isn't a problem because it is called when old CPU is already dead.
I'll fix the bug report to resolved.
Thanks
Itzcak Pechtalt
-----Original Message-----
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
Sent: Tuesday, September 02, 2014 7:09 PM
To: Itzcak Pechtalt; Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Subject: Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt <itzcak@flashnetworks.com> wrote:
>
> I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description.
> There is also a short fix patch for kernel/hrtimer.c file.
> Even if this bug occurs rary, however it resolves system hard lockup option.
The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway).
It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking.
But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming "timer_state" was intended to be "timer->state". Also, every caller but one already has "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in "newstate", so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE.
Thomas? Leaving damaged patch quoted below.
Linus
> I suspect that it was targeted by mistake to not active list (timers_realtime-clock@kernel-bugs.osdl.org).
> Following is the fix patch based on kernel 3.16.1 (just simple):
> diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
> +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
> @@ -941,7 +941,7 @@
> if (!timerqueue_getnext(&base->active))
> base->cpu_base->active_bases &= ~(1 << base->index);
> out:
> - timer->state = newstate;
> + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
> }
>
> /*
>
> Is there a chance for this patch fix to insert into next kernel release?
>
> Thanks
>
> Itzcak Pechtalt
>
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
2014-09-02 16:08 ` Linus Torvalds
2014-09-02 17:46 ` Itzcak Pechtalt
@ 2014-09-02 19:08 ` Thomas Gleixner
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Gleixner @ 2014-09-02 19:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Itzcak Pechtalt, linux-kernel
On Tue, 2 Sep 2014, Linus Torvalds wrote:
> On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
> <itzcak@flashnetworks.com> wrote:
> >
> > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description.
> > There is also a short fix patch for kernel/hrtimer.c file.
> > Even if this bug occurs rary, however it resolves system hard lockup option.
>
> The patch is whitespace-damaged, but with a small oneliner like this
> that doesn't much matter (the timer files moved to kernel/time/ during
> this merge window, so the patch wouldn't apply as-is anyway).
>
> It needs a sign-off (see Documentation/SubmittingPatches), but even
> more importantly it needs to go to the right people for
> double-checking.
>
> But the patch is more broken than whitespace and even lack of
> sign-off. It cannot even have compiled. I'm assuming "timer_state" was
> intended to be "timer->state". Also, every caller but one already has
> "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in
> "newstate", so I suspect if this patch is the real fix (which I'll
> leave for Thomas to comment more on), afaik the actual problem can
> only happen through migrate_hrtimer_list() which uconditionally sets
> the whole state to HRTIMER_STATE_MIGRATE.
>
> Thomas? Leaving damaged patch quoted below.
Right. It's been fixed long ago and the migrate path cannot suffer
from this problem because at this point a callback running on the dead
cpu would cause the
BUG_ON(hrtimer_callback_running(timer));
a few lines above the remove_hrtimer() call to trigger and send the
machine into lala land.
Thanks,
tglx
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-09-02 19:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 15:45 Race condition in HR timers that cause double insertion and hard lockup -- all latest versions Itzcak Pechtalt
2014-09-02 16:08 ` Linus Torvalds
2014-09-02 17:46 ` Itzcak Pechtalt
2014-09-02 19:08 ` Thomas Gleixner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.