* RCU question
@ 2004-12-09 23:59 George Anzinger
2004-12-10 4:31 ` Dipankar Sarma
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-09 23:59 UTC (permalink / raw)
To: Dipankar Sarma, Manfred Spraul, lkml
I am working on VST code. This code is called from the idle loop to check for
future timers. It then sets up a timer to interrupt in time to handle the
nearest timer and turns off the time base interrupt source. As part of
qualifying the entry to this state I want to make sure there is no pending work
so, from the idle task I have this:
	if (local_softirq_pending())
		do_softirq();

	BUG_ON(local_softirq_pending());
I did not really expect to find any pending softirqs, but, not only are there
some, they don't go away and the system BUGs. The offender is the RCU task.
The question is: is this normal or is there something wrong?
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: RCU question
2004-12-09 23:59 RCU question George Anzinger
@ 2004-12-10 4:31 ` Dipankar Sarma
2004-12-10 19:42 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Dipankar Sarma @ 2004-12-10 4:31 UTC (permalink / raw)
To: ganzinger; +Cc: Manfred Spraul, lkml
On Thu, Dec 09, 2004 at 03:59:45PM -0800, George Anzinger wrote:
> I am working on VST code. This code is called from the idle loop to check
> for future timers. It then sets up a timer to interrupt in time to handle
> the nearest timer and turns off the time base interrupt source. As part of
> qualifying the entry to this state I want to make sure there is no pending
> work so, from the idle task I have this:
>
> if (local_softirq_pending())
> do_softirq();
>
> BUG_ON(local_softirq_pending());
>
> I did not really expect to find any pending softirqs, but, not only are
> there some, they don't go away and the system BUGs. The offender is the
> RCU task. The question is: is this normal or is there something wrong?
Why do you think there would not be any softirq pending after do_softirq() ?
What if the cpu gets a network interrupt which raises a softirq ?
And yes, RCU processing in softirq context can re-raise the softirq.
AFAICS, it is perfectly normal.
Thanks
Dipankar
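A minimal userspace sketch of the point above: a single softirq pass can leave work pending when a handler re-raises itself. Every name here (sim_pending, sim_do_softirq(), sim_rcu_action() and so on) is a hypothetical stand-in, not kernel code, and the real do_softirq() is more involved than this one-pass model.

```c
#include <stdint.h>

/* Hypothetical stand-ins for per-CPU softirq state; not kernel code. */
enum { SIM_RCU = 1 };

static uint32_t sim_pending;     /* bit n set => softirq n is pending */
static int sim_rcu_backlog;      /* callbacks still waiting to run */

static void sim_raise(int nr)
{
	sim_pending |= 1u << nr;
}

/* Handles one callback per pass and re-raises itself while work remains. */
static void sim_rcu_action(void)
{
	if (sim_rcu_backlog > 0)
		sim_rcu_backlog--;
	if (sim_rcu_backlog > 0)
		sim_raise(SIM_RCU);
}

/* One pass: snapshot the pending mask, clear it, run the handler once. */
static void sim_do_softirq(void)
{
	uint32_t snapshot = sim_pending;

	sim_pending = 0;
	if (snapshot & (1u << SIM_RCU))
		sim_rcu_action();
}

/* Returns nonzero if the idle-path BUG_ON() from the question would fire. */
int sim_idle_check_would_bug(int backlog)
{
	sim_pending = 0;
	sim_rcu_backlog = backlog;
	if (backlog > 0)
		sim_raise(SIM_RCU);

	if (sim_pending)
		sim_do_softirq();
	return sim_pending != 0;
}
```

With a backlog of one callback the check passes. With a larger backlog the handler re-raises itself and the softirq is still pending after the pass, which matches the BUG reported in the question.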
* Re: RCU question
2004-12-10 4:31 ` Dipankar Sarma
@ 2004-12-10 19:42 ` George Anzinger
2004-12-10 20:40 ` Dipankar Sarma
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-10 19:42 UTC (permalink / raw)
To: dipankar; +Cc: ganzinger, Manfred Spraul, lkml
Dipankar Sarma wrote:
> On Thu, Dec 09, 2004 at 03:59:45PM -0800, George Anzinger wrote:
>
>>I am working on VST code. This code is called from the idle loop to check
>>for future timers. It then sets up a timer to interrupt in time to handle
>>the nearest timer and turns off the time base interrupt source. As part of
>>qualifying the entry to this state I want to make sure there is no pending
>>work so, from the idle task I have this:
>>
>> if (local_softirq_pending())
>> do_softirq();
>>
>> BUG_ON(local_softirq_pending());
>>
>>I did not really expect to find any pending softirqs, but, not only are
>>there some, they don't go away and the system BUGs. The offender is the
>>RCU task. The question is: is this normal or is there something wrong?
>
>
> Why do you think there would not be any softirq pending after do_softirq() ?
> What if the cpu gets a network interrupt which raises a softirq ?
Yes, but it is serviced on interrupt exit and the task level code would never
see it.
> And yes, RCU processing in softirq context can re-raise the softirq.
> AFAICS, it is perfectly normal.
My assumption was that, this being the idle task, RCU would be more than happy
to finish all its pending tasks.
It may be necessary for me to rethink the conditions required to go into the VST
state. I had assumed that it required NO softirq pending as a pre condition.
From this point on we would have the interrupt system off until the hardware
sleep instruction (hlt in the x86 case).
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-10 19:42 ` George Anzinger
@ 2004-12-10 20:40 ` Dipankar Sarma
2004-12-10 20:45 ` Lee Revell
0 siblings, 1 reply; 43+ messages in thread
From: Dipankar Sarma @ 2004-12-10 20:40 UTC (permalink / raw)
To: George Anzinger; +Cc: ganzinger, Manfred Spraul, lkml
On Fri, Dec 10, 2004 at 11:42:55AM -0800, George Anzinger wrote:
> Dipankar Sarma wrote:
> >And yes, RCU processing in softirq context can re-raise the softirq.
> >AFAICS, it is perfectly normal.
>
> My assumption was that, this being the idle task, RCU would be more than
> happy to finish all its pending tasks.
We try to avoid really long running softirqs (RCU tasklet in this case)
for better scheduling latency. A long running rcu tasklet during
an idle cpu may delay running of an RT process that becomes runnable
during the rcu tasklet.
>
> It may be necessary for me to rethink the conditions required to go into
> the VST state. I had assumed that it required NO softirq pending as a pre
> condition. From this point on we would have the interrupt system off until
> the hardware sleep instruction (hlt in the x86 case).
Unfortunately, we aren't there yet. But it is in my TODO list for a
generic nohz system.
Thanks
Dipankar
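The latency argument above can be sketched as a bounded batch: run at most a fixed number of callbacks per invocation and ask to be run again if any remain. This is a userspace illustration under assumptions. The types, the max_batch parameter and the reraised flag are all hypothetical and are not taken from the RCU implementation.

```c
#include <stddef.h>

/* Hypothetical callback list; not the kernel's RCU data structures. */
struct sim_callback {
	struct sim_callback *next;
	void (*func)(struct sim_callback *cb);
};

struct sim_cb_list {
	struct sim_callback *head;
	int reraised;            /* set when the handler wants to run again */
};

int sim_invoked;                 /* number of callbacks run so far */

void sim_count_cb(struct sim_callback *cb)
{
	(void)cb;
	sim_invoked++;
}

/* Invoke at most max_batch callbacks; return how many actually ran. */
int sim_do_batch(struct sim_cb_list *list, int max_batch)
{
	int count = 0;

	list->reraised = 0;
	while (list->head && count < max_batch) {
		struct sim_callback *cb = list->head;

		list->head = cb->next;   /* unlink before invoking */
		cb->func(cb);
		count++;
	}
	if (list->head)
		list->reraised = 1;      /* work remains: run again later */
	return count;
}

/* Queue n callbacks and count the passes needed (max_batch must be > 0). */
int sim_passes_to_drain(struct sim_callback *cbs, int n, int max_batch)
{
	struct sim_cb_list list = { NULL, 0 };
	int passes = 0;

	sim_invoked = 0;
	for (int i = n - 1; i >= 0; i--) {   /* build the list in array order */
		cbs[i].func = sim_count_cb;
		cbs[i].next = list.head;
		list.head = &cbs[i];
	}
	do {
		sim_do_batch(&list, max_batch);
		passes++;
	} while (list.reraised);
	return passes;
}
```

Each pass is short, so a newly runnable task never waits behind more than max_batch callbacks. The price is that the handler stays pending across several passes.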
* Re: RCU question
2004-12-10 20:40 ` Dipankar Sarma
@ 2004-12-10 20:45 ` Lee Revell
2004-12-10 21:02 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Lee Revell @ 2004-12-10 20:45 UTC (permalink / raw)
To: dipankar; +Cc: George Anzinger, ganzinger, Manfred Spraul, lkml
On Sat, 2004-12-11 at 02:10 +0530, Dipankar Sarma wrote:
> On Fri, Dec 10, 2004 at 11:42:55AM -0800, George Anzinger wrote:
> > Dipankar Sarma wrote:
> > >And yes, RCU processing in softirq context can re-raise the softirq.
> > >AFAICS, it is perfectly normal.
> >
> > My assumption was that, this being the idle task, RCU would be more than
> > happy to finish all its pending tasks.
>
> We try to avoid really long running softirqs (RCU tasklet in this case)
> for better scheduling latency. A long running rcu tasklet during
> an idle cpu may delay running of an RT process that becomes runnable
> during the rcu tasklet.
>
Well, softirqs should really be preemptible if you care about RT task
latency. Ingo's patches have had this for months. Works great. Maybe
it's time to push it upstream.
Lee
* Re: RCU question
2004-12-10 20:45 ` Lee Revell
@ 2004-12-10 21:02 ` George Anzinger
2004-12-10 22:58 ` Zwane Mwaikambo
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-10 21:02 UTC (permalink / raw)
To: Lee Revell; +Cc: dipankar, ganzinger, Manfred Spraul, lkml
Lee Revell wrote:
> On Sat, 2004-12-11 at 02:10 +0530, Dipankar Sarma wrote:
>
>>On Fri, Dec 10, 2004 at 11:42:55AM -0800, George Anzinger wrote:
>>
>>>Dipankar Sarma wrote:
>>>
>>>>And yes, RCU processing in softirq context can re-raise the softirq.
>>>>AFAICS, it is perfectly normal.
>>>
>>>My assumption was that, this being the idle task, RCU would be more than
>>>happy to finish all its pending tasks.
>>
>>We try to avoid really long running softirqs (RCU tasklet in this case)
>>for better scheduling latency. A long running rcu tasklet during
>>an idle cpu may delay running of an RT process that becomes runnable
>>during the rcu tasklet.
>>
>
>
> Well, softirqs should really be preemptible if you care about RT task
> latency. Ingo's patches have had this for months. Works great. Maybe
> it's time to push it upstream.
Yes, I understand, and soft_irq() does turn on interrupts...
I was thinking of something like:
	while(softirq_pending()) {
		local_irq_enable();
		do_softirq();
		local_irq_disable();
	}
	<proceed to idle hlt...>
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-10 21:02 ` George Anzinger
@ 2004-12-10 22:58 ` Zwane Mwaikambo
2004-12-11 2:22 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-10 22:58 UTC (permalink / raw)
To: George Anzinger; +Cc: Lee Revell, dipankar, ganzinger, Manfred Spraul, lkml
On Fri, 10 Dec 2004, George Anzinger wrote:
> > Well, softirqs should really be preemptible if you care about RT task
> > latency. Ingo's patches have had this for months. Works great. Maybe
> > it's time to push it upstream.
>
> Yes, I understand, and soft_irq() does turn on interrupts...
> I was thinking of something like:
>
> while(softirq_pending()) {
> local_irq_enable();
> do_softirq();
> local_irq_disable();
> }
> <proceed to idle hlt...>
But that's a deadlock and if you enable interrupts you race.
* Re: RCU question
2004-12-10 22:58 ` Zwane Mwaikambo
@ 2004-12-11 2:22 ` George Anzinger
2004-12-11 2:45 ` Zwane Mwaikambo
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-11 2:22 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Lee Revell, dipankar, ganzinger, Manfred Spraul, lkml
Zwane Mwaikambo wrote:
> On Fri, 10 Dec 2004, George Anzinger wrote:
>
>
>>>Well, softirqs should really be preemptible if you care about RT task
>>>latency. Ingo's patches have had this for months. Works great. Maybe
>>>it's time to push it upstream.
>>
>>Yes, I understand, and soft_irq() does turn on interrupts...
>>I was thinking of something like:
>>
>> while(softirq_pending()) {
>> local_irq_enable();
>> do_softirq();
>> local_irq_disable();
>> }
>> <proceed to idle hlt...>
>
>
> But that's a deadlock and if you enable interrupts you race.
Again, I remind you we are in the idle task. Nothing more important to do. Or
do you mean that softirq_pending() will NEVER return false?
The other question is: "Is useful work being done?"
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-11 2:22 ` George Anzinger
@ 2004-12-11 2:45 ` Zwane Mwaikambo
2004-12-11 3:29 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-11 2:45 UTC (permalink / raw)
To: George Anzinger; +Cc: Lee Revell, dipankar, ganzinger, Manfred Spraul, lkml
On Fri, 10 Dec 2004, George Anzinger wrote:
> > But that's a deadlock and if you enable interrupts you race.
>
> Again, I remind you we are in the idle task. Nothing more important to do.
> Or do you mean that softirq_pending() will NEVER return false?
>
> The other question is: "Is useful work being done?"
We're in the idle task but obviously interrupts (such as network) are
still coming in. So you may take an interrupt after your while
(softirq_pending()) loop has exited.
* Re: RCU question
2004-12-11 2:45 ` Zwane Mwaikambo
@ 2004-12-11 3:29 ` George Anzinger
2004-12-11 14:52 ` Zwane Mwaikambo
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-11 3:29 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Lee Revell, dipankar, ganzinger, Manfred Spraul, lkml
Zwane Mwaikambo wrote:
> On Fri, 10 Dec 2004, George Anzinger wrote:
>
>
>>>But that's a deadlock and if you enable interrupts you race.
>>
>>Again, I remind you we are in the idle task. Nothing more important to do.
>>Or do you mean that softirq_pending() will NEVER return false?
>>
>>The other question is: "Is useful work being done?"
>
>
> We're in the idle task but obviously interrupts (such as network) are
> still coming in. So you may take an interrupt after your while
> (softirq_pending()) loop has exited.
That is ok. Either we have interrupts off and no softirqs are pending and we
proceed to the "hlt" (where the interrupt will be taken), or softirqs are
pending, we turn interrupts on, do the softirq, turn interrupts off and try
again. Unless some tasklet (RCU?) never "gives up", we will exit the while
with interrupts off and move on to the "hlt". Or did I miss something?
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-11 3:29 ` George Anzinger
@ 2004-12-11 14:52 ` Zwane Mwaikambo
2004-12-11 16:32 ` Manfred Spraul
0 siblings, 1 reply; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-11 14:52 UTC (permalink / raw)
To: George Anzinger; +Cc: Lee Revell, dipankar, ganzinger, Manfred Spraul, lkml
On Fri, 10 Dec 2004, George Anzinger wrote:
> That is ok. Either we have interrupts off and no softirqs are pending and we
> proceed to the "hlt" (where the interrupt will be taken), or softirqs are
> pending, we turn interrupts on, do the softirq, turn interrupts off and try
> again. Unless some tasklet (RCU?) never "gives up" or we will exit the while
> with interrupts off and move on to the "hlt". Or did I miss something?
But the point is that you cannot execute hlt with interrupts disabled.
* Re: RCU question
2004-12-11 14:52 ` Zwane Mwaikambo
@ 2004-12-11 16:32 ` Manfred Spraul
2004-12-11 16:52 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Manfred Spraul @ 2004-12-11 16:32 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: George Anzinger, Lee Revell, dipankar, ganzinger, lkml
Zwane Mwaikambo wrote:
>On Fri, 10 Dec 2004, George Anzinger wrote:
>
>
>
>>That is ok. Either we have interrupts off and no softirqs are pending and we
>>proceed to the "hlt" (where the interrupt will be taken), or softirqs are
>>pending, we turn interrupts on, do the softirq, turn interrupts off and try
>>again. Unless some tasklet (RCU?) never "gives up" or we will exit the while
>>with interrupts off and move on to the "hlt". Or did I miss something?
>>
>>
>
>But the point is that you cannot execute hlt with interrupts disabled.
>
>
The trick is the sti instruction: It enables interrupt processing after
the following instruction.

Thus

	sti
	hlt

cannot race - it atomically enables interrupts and waits.
--
Manfred
* Re: RCU question
2004-12-11 16:32 ` Manfred Spraul
@ 2004-12-11 16:52 ` George Anzinger
2004-12-12 2:53 ` Zwane Mwaikambo
0 siblings, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-11 16:52 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Zwane Mwaikambo, Lee Revell, dipankar, ganzinger, lkml
Manfred Spraul wrote:
> Zwane Mwaikambo wrote:
>
>> On Fri, 10 Dec 2004, George Anzinger wrote:
>>
>>
>>
>>> That is ok. Either we have interrupts off and no softirqs are
>>> pending and we
>>> proceed to the "hlt" (where the interrupt will be taken), or softirqs
>>> are
>>> pending, we turn interrupts on, do the softirq, turn interrupts off
>>> and try
>>> again. Unless some tasklet (RCU?) never "gives up" or we will exit
>>> the while
>>> with interrupts off and move on to the "hlt". Or did I miss something?
>>>
>>
>>
>> But the point is that you cannot execute hlt with interrupts disabled.
>>
>>
> The trick is the sti instruction: It enables interrupt processing after
> the following instruction.
>
> Thus
> sti
> hlt
>
> cannot race - it atomically enables interrupts and waits.
Exactly :)
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-11 16:52 ` George Anzinger
@ 2004-12-12 2:53 ` Zwane Mwaikambo
2004-12-12 8:59 ` Manfred Spraul
0 siblings, 1 reply; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-12 2:53 UTC (permalink / raw)
To: George Anzinger; +Cc: Manfred Spraul, Lee Revell, dipankar, ganzinger, lkml
On Sat, 11 Dec 2004, George Anzinger wrote:
> Manfred Spraul wrote:
> > >
> > The trick is the sti instruction: It enables interrupt processing after the
> > following instruction.
> >
> > Thus
> > sti
> > hlt
> >
> > cannot race - it atomically enables interrupts and waits.
>
> Exactly :)
Ok i wasn't aware that it was safe_halt() that he was referring to, my
poor assumption. But regardless, this seems highly fragile and relies on
behaviour which may change across processor models/vendors. I also found
the following excerpt from (http://sandpile.org/ia32/inter.htm) which you
may find interesting:
"Intel processors don't suppress SMI or NMI after an STI instruction.
Since the INTR suppression is not preserved across an SMI or NMI handler,
this may result in an INTR being serviced after the STI, which constitutes
a violation of the INTR suppression. Therefore, ideally the STI instruction
also suppresses SMI and NMI."
George thanks for persisting and explaining your point, i can be very slow
=)
Zwane
* Re: RCU question
2004-12-12 2:53 ` Zwane Mwaikambo
@ 2004-12-12 8:59 ` Manfred Spraul
2004-12-12 9:37 ` Andrea Arcangeli
2004-12-12 16:26 ` Zwane Mwaikambo
0 siblings, 2 replies; 43+ messages in thread
From: Manfred Spraul @ 2004-12-12 8:59 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: George Anzinger, Lee Revell, dipankar, ganzinger, lkml
Zwane Mwaikambo wrote:
>"Intel processors don't suppress SMI or NMI after an STI instruction.
>Since the INTR suppression is not preserved across an SMI or NMI handler,
>this may result in an INTR being serviced after the STI, which constitutes
>a violation of the INTR suppression.
>
>
Interesting find.
It means that our NMI irq return path should check if it points to a hlt
instruction and if yes, then increase the saved EIP by one before doing
the iretd, right?
--
Manfred
* Re: RCU question
2004-12-12 8:59 ` Manfred Spraul
@ 2004-12-12 9:37 ` Andrea Arcangeli
2004-12-12 10:22 ` Manfred Spraul
2004-12-12 16:26 ` Zwane Mwaikambo
1 sibling, 1 reply; 43+ messages in thread
From: Andrea Arcangeli @ 2004-12-12 9:37 UTC (permalink / raw)
To: Manfred Spraul
Cc: Zwane Mwaikambo, George Anzinger, Lee Revell, dipankar, ganzinger, lkml
On Sun, Dec 12, 2004 at 09:59:00AM +0100, Manfred Spraul wrote:
> It means that our NMI irq return path should check if it points to a hlt
> instruction and if yes, then increase the saved EIP by one before doing
> the iretd, right?
I don't think we'll ever post any event through nmi, so it doesn't
matter. We only care about being woken by real irqs, not nmi/smi. Idle loop is
fine to ignore the actions of the nmi handlers and to hang in the
"hlt".
* Re: RCU question
2004-12-12 9:37 ` Andrea Arcangeli
@ 2004-12-12 10:22 ` Manfred Spraul
2004-12-12 12:15 ` Andrea Arcangeli
2004-12-12 16:51 ` RCU question George Anzinger
0 siblings, 2 replies; 43+ messages in thread
From: Manfred Spraul @ 2004-12-12 10:22 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Zwane Mwaikambo, George Anzinger, Lee Revell, dipankar, ganzinger, lkml
Andrea Arcangeli wrote:
>On Sun, Dec 12, 2004 at 09:59:00AM +0100, Manfred Spraul wrote:
>
>
>>It means that our NMI irq return path should check if it points to a hlt
>>instruction and if yes, then increase the saved EIP by one before doing
>>the iretd, right?
>>
>>
>
>I don't think we'll ever post any event through nmi, so it doesn't
>matter. We only care to be waken by real irqs, not nmi/smi. Idle loop is
>fine to ignore the actions of the nmi handlers and to hang into the
>"hlt".
>
>
No, you misunderstood the problem:

	sti
	** NMI handler
	** normal interrupt arrives, is queued by the cpu
	** iretd from NMI handler
	** cpu notices the normal interrupt, handles it.
	** normal interrupt does a wakeup, schedules a tasklet, whatever
	** iretd from normal interrupt
	hlt	<< cpu sleeps.

Thus: lost wakeup.
--
Manfred
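The sequence above can be walked through with a tiny state model. This is only a sketch of the behavior as described in this thread, not verified against real hardware. The struct, its fields and the two function parameters are hypothetical names chosen for illustration.

```c
/* Hypothetical model of the sti;hlt window; not kernel code. */
struct sim_cpu {
	int irq_shadow;    /* sti's one-instruction interrupt inhibit */
	int irq_pending;   /* a maskable interrupt is waiting */
	int need_resched;  /* set by the interrupt handler's wakeup */
	int halted;        /* CPU is sleeping in hlt */
};

/* Service the pending interrupt; its handler wakes a task. */
static void sim_take_irq(struct sim_cpu *cpu)
{
	cpu->irq_pending = 0;
	cpu->need_resched = 1;
}

/*
 * nmi_after_sti: an NMI hits between sti and hlt.
 * nmi_skips_hlt: the NMI handler moves the saved EIP past the hlt.
 * Returns nonzero if the CPU ends up asleep with need_resched set.
 */
int sim_sti_hlt_loses_wakeup(int nmi_after_sti, int nmi_skips_hlt)
{
	struct sim_cpu cpu = { 0, 0, 0, 0 };
	int skip_hlt = 0;

	/* sti: interrupts become enabled only after the next instruction */
	cpu.irq_shadow = 1;

	/* a device interrupt arrives while we are between sti and hlt */
	cpu.irq_pending = 1;

	if (nmi_after_sti) {
		/* the NMI is not held off by the shadow; its handler runs */
		if (nmi_skips_hlt)
			skip_hlt = 1;
		/* return from the NMI: the shadow is not restored */
		cpu.irq_shadow = 0;
	}

	/* with no shadow, the pending interrupt is taken before hlt */
	if (!cpu.irq_shadow && cpu.irq_pending)
		sim_take_irq(&cpu);

	if (!skip_hlt) {
		/* hlt executes; this also ends the sti shadow */
		cpu.irq_shadow = 0;
		if (cpu.irq_pending)
			sim_take_irq(&cpu);   /* interrupt wakes the CPU at once */
		else
			cpu.halted = 1;       /* nothing pending: sleep */
	}

	return cpu.halted && cpu.need_resched;
}
```

Without an NMI the interrupt stays pending until hlt and wakes the CPU immediately. With an NMI in the window the interrupt is consumed before hlt, so the model sleeps with need_resched set. Skipping the hlt on NMI return avoids that outcome in this model.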
* Re: RCU question
2004-12-12 10:22 ` Manfred Spraul
@ 2004-12-12 12:15 ` Andrea Arcangeli
2004-12-14 21:40 ` Lee Revell
2004-12-12 16:51 ` RCU question George Anzinger
1 sibling, 1 reply; 43+ messages in thread
From: Andrea Arcangeli @ 2004-12-12 12:15 UTC (permalink / raw)
To: Manfred Spraul
Cc: Zwane Mwaikambo, George Anzinger, Lee Revell, dipankar, ganzinger, lkml
On Sun, Dec 12, 2004 at 11:22:49AM +0100, Manfred Spraul wrote:
> Andrea Arcangeli wrote:
>
> >On Sun, Dec 12, 2004 at 09:59:00AM +0100, Manfred Spraul wrote:
> >
> >
> >>It means that our NMI irq return path should check if it points to a hlt
> >>instruction and if yes, then increase the saved EIP by one before doing
> >>the iretd, right?
> >>
> >>
> >
> >I don't think we'll ever post any event through nmi, so it doesn't
> >matter. We only care to be waken by real irqs, not nmi/smi. Idle loop is
> >fine to ignore the actions of the nmi handlers and to hang into the
> >"hlt".
> >
> >
> No, You misunderstood the problem:
>
> sti
> ** NMI handler
> ** normal interrupt arrives, is queued by the cpu
> ** irqd from NMI handler
> ** cpu notices the normal interrupt, handles it.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ok. The above just wasn't obvious to me because iret of an nmi is doing
the same thing that sti does (and the nmi itself is like a cli).
Shouldn't iret wait 1 instruction too or is there a special case about
iret? The specs only tell us that sti waits 1 instruction, but they don't say
anything about iret (neither that it waits nor that it doesn't wait). I
realize now that the link posted here assumes iret isn't going to wait 1
instruction before processing pending irqs, which is reasonable given the
specs don't say anything about iret, but I didn't imagine there was a
difference between sti and iret (I mean only when iret is going to
change the interrupt enable flag from 0 to 1 just like sti does).
Overall this is a very minor issue (unless HZ is 0), it would only
introduce a 1/HZ latency to the irq that gets posted while the nmi
handler is running, and the nmi handlers never run in production.
Forcing idle=poll when the nmi watchdog is enabled is probably a
reasonable fix.
As for the SMI, I wonder how you plan to fix it. To me it sounds like a
minor mistake that iret isn't equivalent to sti when it toggles the irq
enable bitflag (in fact I don't see a way to fix it for smi, though I
know very little about smi).
thanks.
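The 1/HZ bound above is simple arithmetic, assuming the next wakeup is the periodic timer tick. A small sketch follows. The function name is hypothetical, and the -1 return is only a convention chosen here for "no tick, so no bound".

```c
/*
 * Worst-case extra delay, in microseconds, for an interrupt whose wakeup
 * is only noticed at the next timer tick. hz is the tick rate.
 */
long sim_worst_case_delay_us(long hz)
{
	return hz > 0 ? 1000000L / hz : -1;   /* -1: no tick, no bound */
}
```

For HZ=1000 the bound is 1000 microseconds and for HZ=100 it is 10000 microseconds. With the tick turned off there is no such bound, which is why the issue matters more for a tickless or VST-style idle.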
* Re: RCU question
2004-12-12 8:59 ` Manfred Spraul
2004-12-12 9:37 ` Andrea Arcangeli
@ 2004-12-12 16:26 ` Zwane Mwaikambo
1 sibling, 0 replies; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-12 16:26 UTC (permalink / raw)
To: Manfred Spraul; +Cc: George Anzinger, Lee Revell, dipankar, ganzinger, lkml
On Sun, 12 Dec 2004, Manfred Spraul wrote:
> Zwane Mwaikambo wrote:
>
> > "Intel processors don't suppress SMI or NMI after an STI instruction. Since
> > the INTR suppression is not preserved across an SMI or NMI handler, this may
> > result in an INTR being serviced after the STI, which constitutes a
> > violation of the INTR suppression.
> >
> Interesting find.
> It means that our NMI irq return path should check if it points to a hlt
> instruction and if yes, then increase the saved EIP by one before doing the
> iretd, right?
Yeah that should do it, but then we also have to worry about SMIs, perhaps
we could add similar logic to interrupt return path instead?
* Re: RCU question
2004-12-12 10:22 ` Manfred Spraul
2004-12-12 12:15 ` Andrea Arcangeli
@ 2004-12-12 16:51 ` George Anzinger
2004-12-12 22:40 ` Manfred Spraul
1 sibling, 1 reply; 43+ messages in thread
From: George Anzinger @ 2004-12-12 16:51 UTC (permalink / raw)
To: Manfred Spraul
Cc: Andrea Arcangeli, Zwane Mwaikambo, Lee Revell, dipankar, ganzinger, lkml
Manfred Spraul wrote:
> Andrea Arcangeli wrote:
>
>> On Sun, Dec 12, 2004 at 09:59:00AM +0100, Manfred Spraul wrote:
>>
>>
>>> It means that our NMI irq return path should check if it points to a
>>> hlt instruction and if yes, then increase the saved EIP by one before
>>> doing the iretd, right?
>>>
>>
>>
>> I don't think we'll ever post any event through nmi, so it doesn't
>> matter. We only care to be waken by real irqs, not nmi/smi. Idle loop is
>> fine to ignore the actions of the nmi handlers and to hang into the
>> "hlt".
>>
>>
> No, You misunderstood the problem:
>
> sti
> ** NMI handler
> ** normal interrupt arrives, is queued by the cpu
> ** iretd from NMI handler
> ** cpu notices the normal interrupt, handles it.
> ** normal interrupt does a wakeup, schedules a tasklet, whatever
I think you are forgetting that the system does the full context switch from the
interrupt handler (well, actually from entry.S) and does not do the iretd until
it is time to go back to the idle thread (i.e. there is nothing left to do), so..
> ** iretd from normal interrupt
> hlt << cpu sleeps.
What we lose here is that idle does not go around its little loop again. If an
interrupt becomes pending on the way to the hlt, i.e. while entry.S has
interrupts masked and is doing the iretd, it will be handled prior to the hlt so
we could lose several of these idle loop spins, until no interrupt is pending
allowing the hlt to be executed. On the next interrupt/iretd the hlt will exit.
So what is lost is one or more spins round the idle loop.
The "normal" idle loop just looks at the need_resched flag and goes right back
to the hlt, however, idle itself never sets this flag, only interrupt code
can set it at this point, and the interrupt exit takes action to clear it so I
don't see it ever being found set in the idle loop (I suppose one could do a
test to see if it is ever found set here), so, in theory, the net effect should
be nil.
Did I miss something?
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-12 16:51 ` RCU question George Anzinger
@ 2004-12-12 22:40 ` Manfred Spraul
2004-12-13 5:22 ` George Anzinger
0 siblings, 1 reply; 43+ messages in thread
From: Manfred Spraul @ 2004-12-12 22:40 UTC (permalink / raw)
To: george
Cc: Andrea Arcangeli, Zwane Mwaikambo, Lee Revell, dipankar, ganzinger, lkml
George Anzinger wrote:
>
> The "normal" idle loop just looks at the need_resched flag and goes
> right back to the hlt,
That's the problem: If the tasklet does a wakeup then the reschedule
is delayed until the next interrupt. Testing need_resched and executing
hlt must be atomic, but it isn't - NMIs break the atomicity.
Not a big deal, except if someone implements a tickless kernel. I think
we can ignore it for now [or was the thread started by someone who
wants to disable the hardware timer when the system is really idle?]
--
Manfred
* Re: RCU question
2004-12-12 22:40 ` Manfred Spraul
@ 2004-12-13 5:22 ` George Anzinger
0 siblings, 0 replies; 43+ messages in thread
From: George Anzinger @ 2004-12-13 5:22 UTC (permalink / raw)
To: Manfred Spraul
Cc: Andrea Arcangeli, Zwane Mwaikambo, Lee Revell, dipankar, ganzinger, lkml
Manfred Spraul wrote:
> George Anzinger wrote:
>
>>
>> The "normal" idle loop just looks at the need_resched flag and goes
>> right back to the hlt,
>
>
> That's the problem: If a the tasklet does a wakeup then the reschedule
> is delayed until the next interrupt.
Not so. On the interrupt that runs the tasklet, on the way out via entry.S, the
need_resched flag is checked and acted on. Thus the task switch is done prior to
getting back to the hlt.
> Testing need_resched and executing
> hlt must be atomic, but it isn't - NMIs break the atomicity.
Actually this is not required, especially if preemption is turned on.
> Not a big deal, except if someone implements a tickless kernel.
Well, it is not tickless, but VST that I am working on :). The notion is to
turn off the ticks when in idle and there are no time events in the list.
> I think
> we can ignore it for now [or was the thread started by someone who
> want's to disable the hardware timer when the system is really idle?]
Yep, me! But still, I keep a timer around to exit, it is just way more than a
tick later (depending on what the next entry in the time list needs).
>
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: RCU question
2004-12-12 12:15 ` Andrea Arcangeli
@ 2004-12-14 21:40 ` Lee Revell
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
0 siblings, 1 reply; 43+ messages in thread
From: Lee Revell @ 2004-12-14 21:40 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Ingo Molnar
On Sun, 2004-12-12 at 13:15 +0100, Andrea Arcangeli wrote:
> Overall this is a very minor issue (unless HZ is 0), it would only
> introduce a 1/HZ latency to the irq that get posted while the nmi
> handler is running, and the nmi handlers never runs in production.
Ingo, couldn't this account for some of the inexplicable outliers some
people were seeing in latency tests?
Lee
* [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 21:40 ` Lee Revell
@ 2004-12-14 22:23 ` Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Ingo Molnar @ 2004-12-14 22:23 UTC (permalink / raw)
To: Lee Revell
Cc: Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Lee Revell <rlrevell@joe-job.com> wrote:
> On Sun, 2004-12-12 at 13:15 +0100, Andrea Arcangeli wrote:
> > Overall this is a very minor issue (unless HZ is 0), it would only
> > introduce a 1/HZ latency to the irq that get posted while the nmi
> > handler is running, and the nmi handlers never runs in production.
>
> Ingo, couldn't this account for some of the inexplicable outliers some
> people were seeing in latency tests?
indeed, there could be a connection, and it's certainly a fun race. The
proper fix is Manfred's suggestion: check whether the EIP is a kernel
text address, and if yes, whether it's a HLT instruction - and if yes
then increase EIP by 1. I've included the fix in the -33-02 -RT patch.
Andrew, Linus: upstream fix is below - i think it's post-2.6.10 stuff.
Tested it on SMP and UP x86, using both the IO-APIC and the local-APIC
based NMI watchdog.
i think x64 needs a similar fix as well.
Ingo
--- linux/arch/i386/kernel/traps.c.orig
+++ linux/arch/i386/kernel/traps.c
@@ -670,6 +670,17 @@ fastcall void do_nmi(struct pt_regs * re
 	cpu = smp_processor_id();
+	/*
+	 * Fix up obscure CPU behavior: if we interrupt safe_hlt() via
+	 * the NMI then we might miss a reschedule if an interrupt is
+	 * posted to the CPU and executes before the HLT instruction.
+	 *
+	 * We check whether the EIP is kernelspace, and if yes, whether
+	 * the instruction is HLT:
+	 */
+	if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
+		regs->eip++;
+
 #ifdef CONFIG_HOTPLUG_CPU
 	if (!cpu_online(cpu)) {
 		nmi_exit();
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
@ 2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
2004-12-14 23:41 ` Andrea Arcangeli
2004-12-14 23:00 ` Linus Torvalds
2004-12-15 6:27 ` Avi Kivity
2 siblings, 2 replies; 43+ messages in thread
From: Ingo Molnar @ 2004-12-14 22:47 UTC (permalink / raw)
To: Lee Revell
Cc: Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Ingo Molnar <mingo@elte.hu> wrote:
> indeed, there could be a connection, and it's certainly a fun race.
> The proper fix is Manfred's suggestion: check whether the EIP is a
> kernel text address, and if yes, whether it's a HLT instruction - and
> if yes then increase EIP by 1. I've included the fix in the -33-02 -RT
> patch. Andrew, Linus: upstream fix is below - i think it's post-2.6.10
> stuff. Tested it on SMP and UP x86, using both the IO-APIC and the
> local-APIC based NMI watchdog.
>
> i think x64 needs a similar fix as well.
find the correct patch below. I've tested it with an NMI watchdog
frequency artificially increased to 10 KHz, and i've instrumented the
new branch in the NMI handler, but even under heavy IRQ load i was not
able to trigger the branch. Maybe newer CPUs handle this case somehow
and make sti;hlt truly atomic? I tried this on an old Celeron
(Mendocino) and on an Athlon64.
Ingo
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--- linux/arch/i386/kernel/traps.c.orig
+++ linux/arch/i386/kernel/traps.c
@@ -670,6 +670,18 @@ fastcall void do_nmi(struct pt_regs * re
cpu = smp_processor_id();
+ /*
+ * Fix up obscure CPU behavior: if we interrupt safe_hlt() via
+ * the NMI then we might miss a reschedule if an interrupt is
+ * posted to the CPU and executes before the HLT instruction.
+ *
+ * We check whether the EIP is kernelspace, and if yes, whether
+ * the instruction is HLT:
+ */
+ if (__kernel_text_address(regs->eip) &&
+ *(unsigned char *)regs->eip == 0xf4)
+ regs->eip++;
+
#ifdef CONFIG_HOTPLUG_CPU
if (!cpu_online(cpu)) {
nmi_exit();
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
@ 2004-12-14 23:00 ` Linus Torvalds
2004-12-15 5:04 ` Andi Kleen
2004-12-15 6:27 ` Avi Kivity
2 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2004-12-14 23:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Tue, 14 Dec 2004, Ingo Molnar wrote:
>
> indeed, there could be a connection, and it's certainly a fun race. The
> proper fix is Manfred's suggestion: check whether the EIP is a kernel
> text address, and if yes, whether it's a HLT instruction - and if yes
> then increase EIP by 1.
You do it the wrong way, though. This is not safe:
if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
does _entirely_ the wrong thing if CS is not the kernel CS.
It can trigger with a regular user CS if you were to run the 4G:4G patches,
but more realistically, I think you can make it trigger even with a
standard kernel by creating a local code segment in your LDT, and then
trying to confuse the kernel that way.
Now, as long as the _only_ thing it does is increment the eip, the worst
that can happen is that it screws over the user program that must have
worked at this a bit, but the basic point is that you shouldn't do this.
In _theory_ you could confuse a real program that wasn't doing anything
bad.
Checking for kernel CS also requires checking that it's not vm86 mode,
btw. So that's not just a "regs->xcs & 0xffff == __KERNEL_CS" either.
But something like
static inline int kernel_mode(struct pt_regs *regs)
{
return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
}
should DTRT.
Can you pls double-check my thinking, and test?
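A minimal user-space sketch of the check suggested above, for anyone who
wants to reason about it outside the kernel. The struct pt_regs_sketch type
and the nmi_skip_hlt() helper are hypothetical stand-ins, not kernel code;
only kernel_mode() follows the snippet above, and VM_MASK is the EFLAGS.VM
bit (bit 17). The sketch leaves out the __kernel_text_address() test, which
real code would presumably still want before dereferencing eip.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the i386 kernel definitions. */
#define VM_MASK    0x00020000u  /* EFLAGS.VM: CPU was in vm86 mode */
#define HLT_OPCODE 0xf4u        /* one-byte HLT instruction */

struct pt_regs_sketch {
	uint32_t eflags;
	uint32_t xcs;
	const unsigned char *eip;  /* the real pt_regs keeps eip as an integer */
};

/* Non-zero only when the saved context is ring 0 and not vm86 mode. */
static inline int kernel_mode(const struct pt_regs_sketch *regs)
{
	return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
}

/* Step past a HLT that the NMI landed on; returns 1 if eip was advanced. */
static inline int nmi_skip_hlt(struct pt_regs_sketch *regs)
{
	if (kernel_mode(regs) && *regs->eip == HLT_OPCODE) {
		regs->eip++;
		return 1;
	}
	return 0;
}
```

The mode check comes first so that a user-mode or vm86 eip is never
dereferenced at all.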
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:47 ` Ingo Molnar
@ 2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
2004-12-16 0:37 ` Alan Cox
2004-12-14 23:41 ` Andrea Arcangeli
1 sibling, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2004-12-14 23:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Tue, 14 Dec 2004, Ingo Molnar wrote:
>
> find the correct patch below. I've tested it with an NMI watchdog
> frequency artificially increased to 10 KHz, and i've instrumented the
> new branch in the NMI handler, but even under heavy IRQ load i was not
> able to trigger the branch. Maybe newer CPUs handle this case somehow
> and make sti;hlt truly atomic?
Now that you mention it, I have this dim memory of the one-instruction
"sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
literally doesn't test for async events following "sti".
Or maybe that was "mov->ss". That one also has that strange "black hole"
for one instruction.
Hmm.. You could be evil and try to fill up 64kB worth of memory with a
"mov %ax,%ss", and jump to it in vm86 mode and see what happens. The eip
will just keep wrapping around...
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
@ 2004-12-14 23:41 ` Andrea Arcangeli
1 sibling, 0 replies; 43+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 23:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Manfred Spraul, Zwane Mwaikambo, George Anzinger,
dipankar, ganzinger, lkml, Andrew Morton, Linus Torvalds,
Andi Kleen
On Tue, Dec 14, 2004 at 11:47:06PM +0100, Ingo Molnar wrote:
> find the correct patch below. I've tested it with an NMI watchdog
> frequency artificially increased to 10 KHz, and i've instrumented the
Nice test, it'd be nice to trigger it in real life.
Along the lines of the 64k mov->ss idea, I wonder if we could create a huge
piece of memory like this:
new_hlt:
cli
sti
hlt
cli
sti
hlt
[..]
jmp original_hlt
and call new_hlt from original_hlt instead of sti;hlt. A dozen megs
of the above should boost the probability of getting interrupted in
"hlt" quite a bit.
However, even if the nmi can execute on top of the "hlt" instruction, it
doesn't necessarily mean the next pending irq will execute before
executing 'hlt' too, so it would need a bit more instrumentation to
confirm that the race is actually happening (it's not enough to see the
branch in the nmi handler being taken). The additional instrumentation
should be quite easy though: just copying the same nmi code to the irq
handler should do the trick.
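As a rough illustration of the repeated cli;sti;hlt block proposed above,
the bytes could be generated along these lines. The opcode values (fa, fb,
f4) are the ones that appear elsewhere in this thread; the function name
fill_cli_sti_hlt() is made up for the sketch, and placing the buffer in
executable kernel memory and appending the final jump are left out.

```c
#include <stddef.h>

enum { OP_CLI = 0xfa, OP_STI = 0xfb, OP_HLT = 0xf4 };

/*
 * Fill buf with as many complete cli;sti;hlt triples as fit in len bytes.
 * Returns the number of bytes written, always a multiple of 3. Any bytes
 * left over at the end of buf are not touched.
 */
static size_t fill_cli_sti_hlt(unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i + 3 <= len) {
		buf[i++] = OP_CLI;
		buf[i++] = OP_STI;
		buf[i++] = OP_HLT;
	}
	return i;
}
```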
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:00 ` Linus Torvalds
@ 2004-12-15 5:04 ` Andi Kleen
0 siblings, 0 replies; 43+ messages in thread
From: Andi Kleen @ 2004-12-15 5:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
> But something like
>
> static inline int kernel_mode(struct pt_regs *regs)
> {
> return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
> }
>
> should DTRT.
>
> Can you pls double-check my thinking, and test?
Reasoning looks correct to me.
-Andi
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:00 ` Linus Torvalds
@ 2004-12-15 6:27 ` Avi Kivity
2004-12-15 8:51 ` Ingo Molnar
2 siblings, 1 reply; 43+ messages in thread
From: Avi Kivity @ 2004-12-15 6:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
Ingo Molnar wrote:
>+ if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
>
>
shouldn't that cast be (unsigned char *), otherwise the test will always
fail?
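A small stand-alone sketch of why the cast matters, assuming plain char is
signed as it is on x86 Linux (the sketch uses signed char explicitly so the
behaviour does not depend on the compiler default). The helper names are
invented for illustration.

```c
/*
 * Read one opcode byte through a signed char pointer. The byte 0xf4 comes
 * back as -12 and is promoted to int, so it can never equal 0xf4 (244).
 */
static int is_hlt_signed(const void *p)
{
	int v = *(const signed char *)p;

	return v == 0xf4;
}

/* The same read through unsigned char yields 244, so the test can match. */
static int is_hlt_unsigned(const void *p)
{
	int v = *(const unsigned char *)p;

	return v == 0xf4;
}
```

Given a byte holding 0xf4, is_hlt_signed() returns 0 and is_hlt_unsigned()
returns 1.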
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 6:27 ` Avi Kivity
@ 2004-12-15 8:51 ` Ingo Molnar
0 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2004-12-15 8:51 UTC (permalink / raw)
To: Avi Kivity
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Avi Kivity <avi@argo.co.il> wrote:
> Ingo Molnar wrote:
>
> >+ if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
> >
> >
> shouldn't that cast be (unsigned char *), otherwise the test will
> always fail?
yes, i fixed this in the second patch. (the compiler warned about it
too)
Ingo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:09 ` Linus Torvalds
@ 2004-12-15 8:52 ` Ingo Molnar
2004-12-15 15:44 ` Linus Torvalds
2004-12-16 0:37 ` Alan Cox
1 sibling, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2004-12-15 8:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
> > find the correct patch below. I've tested it with an NMI watchdog
> > frequency artificially increased to 10 KHz, and i've instrumented the
> > new branch in the NMI handler, but even under heavy IRQ load i was not
> > able to trigger the branch. Maybe newer CPUs handle this case somehow
> > and make sti;hlt truly atomic?
>
> Now that you mention it, I have this dim memory of the one-instruction
> "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> literally doesn't test for async events following "sti".
i ran the stresstest overnight with the 10 KHz NMI, and not a single
time did the new branch trigger, out of hundreds of millions of IRQs and
NMIs. I think this suggests that the race doesn't exist in current CPUs.
Ingo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 8:52 ` Ingo Molnar
@ 2004-12-15 15:44 ` Linus Torvalds
2004-12-15 16:35 ` Ingo Molnar
0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2004-12-15 15:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Wed, 15 Dec 2004, Ingo Molnar wrote:
>
> i ran the stresstest overnight with the 10 KHz NMI, and not a single
> time did the new branch trigger, out of hundreds of millions of IRQs and
> NMIs. I think this suggests that the race doesnt exist in current CPUs.
That may well be true, but I'm not convinced your test is meaningful or
shows anything.
The thing is, either the CPU is busy, or it's idle. If it's busy, you'll
never see this. And if it's idle, it will always be _in_ the "halt"
instruction.
The only way to see the case is in the borderline cases, and if/when there
are multiple different interrupts (first non-NMI interrupt takes it out of
the hlt, and then the NMI happens to catch the sti). And quite frankly, I
don't see how you would stress-test it. A 1kHz timer interrupt with a
10kHz NMI interrupt is still very infrequent interrupts...
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 15:44 ` Linus Torvalds
@ 2004-12-15 16:35 ` Ingo Molnar
0 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2004-12-15 16:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Wed, 15 Dec 2004, Ingo Molnar wrote:
> >
> > i ran the stresstest overnight with the 10 KHz NMI, and not a single
> > time did the new branch trigger, out of hundreds of millions of IRQs and
> > NMIs. I think this suggests that the race doesnt exist in current CPUs.
>
> That may well be true, but I'm not convinced your test is meaningful
> or shows anything.
>
> The thing is, either the CPU is busy, or it's idle. If it's busy,
> you'll never see this. And if it's idle, it will always be _in_ the
> "halt" instruction.
i deliberately started a test where there was roughly 50% idle time.
> The only way to see the case is in the borderline cases, and if/when
> there are multiple different interrupts (first non-NMI interrupt takes
> it out of the hlt, and then the NMI happens to catch the sti). And
> quite frankly, I don't see how you would stress-test it. A 1kHz timer
> interrupt with a 10kHz NMI interrupt is still very infrequent
> interrupts...
i started an infinite loop that generated disk IRQs, and started a
network test that generated network IRQs. The IRQ rate was roughly
10K/sec - this combined with the 10K/sec NMI rate should be an adequate
mix. (I also made sure that it's really default_idle that is used.)
Ingo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
@ 2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
2004-12-16 2:10 ` Zwane Mwaikambo
1 sibling, 2 replies; 43+ messages in thread
From: Alan Cox @ 2004-12-16 0:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> On Tue, 14 Dec 2004, Ingo Molnar wrote:
> Now that you mention it, I have this dim memory of the one-instruction
> "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> literally doesn't test for async events following "sti".
>
> Or maybe that was "mov->ss". That one also has that strange "black hole"
> for one instruction.
The mov to ss one is a bit more magic than that however. If you write
3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
slow it plus mmap repeatedly) and run it you don't get a vastly long irq
delay at least on intel, not tried the others.
Alan
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 0:37 ` Alan Cox
@ 2004-12-16 1:58 ` Linus Torvalds
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 2:10 ` Zwane Mwaikambo
1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2004-12-16 1:58 UTC (permalink / raw)
To: Alan Cox
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Alan Cox wrote:
> On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> > On Tue, 14 Dec 2004, Ingo Molnar wrote:
> > Now that you mention it, I have this dim memory of the one-instruction
> > "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> > literally doesn't test for async events following "sti".
> >
> > Or maybe that was "mov->ss". That one also has that strange "black hole"
> > for one instruction.
>
> The mov to ss one is a bit more magic than that however. If you write
> 3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
> slow it plus mmap repeatedly) and run it you don't get a vastly long irq
> delay at least on intel, not tried the others.
The irq window should actually be open every alternate instruction, I
think. Although it's not actually architected, and I thought that there
was some errata for some CPU about this..
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
@ 2004-12-16 2:10 ` Zwane Mwaikambo
2004-12-16 13:26 ` Alan Cox
1 sibling, 1 reply; 43+ messages in thread
From: Zwane Mwaikambo @ 2004-12-16 2:10 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Ingo Molnar, Lee Revell, Andrea Arcangeli,
Manfred Spraul, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Alan Cox wrote:
> On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> > On Tue, 14 Dec 2004, Ingo Molnar wrote:
> > Now that you mention it, I have this dim memory of the one-instruction
> > "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> > literally doesn't test for async events following "sti".
> >
> > Or maybe that was "mov->ss". That one also has that strange "black hole"
> > for one instruction.
>
> The mov to ss one is a bit more magic than that however. If you write
> 3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
> slow it plus mmap repeatedly) and run it you don't get a vastly long irq
> delay at least on intel, not tried the others.
Might this be because you can't rely on interrupt suppression for back to
back suppressing instructions?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 2:10 ` Zwane Mwaikambo
@ 2004-12-16 13:26 ` Alan Cox
0 siblings, 0 replies; 43+ messages in thread
From: Alan Cox @ 2004-12-16 13:26 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: Linus Torvalds, Ingo Molnar, Lee Revell, Andrea Arcangeli,
Manfred Spraul, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Iau, 2004-12-16 at 02:10, Zwane Mwaikambo wrote:
> Might this be because you can't rely on interrupt suppression for back to
> back suppressing instructions?
The documentation seems to have little to say on this. I've also not
tried things like interleaved mov->ss, sti to see how the interlocking
is done. It would make sense given the original 8086 reason was to allow
ss/sp to be loaded cleanly.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 1:58 ` Linus Torvalds
@ 2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:54 ` Linus Torvalds
0 siblings, 2 replies; 43+ messages in thread
From: Ingo Molnar @ 2004-12-16 14:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
> The irq window should actually be open every alternate instruction, I
> think. Although it's not actually architected, and I thought that
> there was some errata for some CPU about this..
i have generated an instruction-granularity profile of kernel code
executing the following sequence, driven by the NMI watchdog interrupt:
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
the first CLI is done twice, to prove that the NMI profiling works and
that the kernel can be interrupted in those places. Then i called this
kernel code in a loop. Here's the result:
c0125ee9: 1529 fa cli
^---------------------------------- # of profiler hits
c0125eea: 507 fb sti
c0125eeb: 0 fa cli
c0125eec: 3719 fb sti
c0125eed: 0 fa cli
c0125eee: 1579 fb sti
c0125eef: 0 fa cli
c0125ef0: 3317 fb sti
c0125ef1: 0 fa cli
c0125ef2: 3030 fb sti
c0125ef3: 0 fa cli
c0125ef4: 2497 fa cli
c0125ef5: 1055 fb sti
c0125ef6: 0 fa cli
c0125ef7: 4674 fb sti
c0125ef8: 0 fa cli
c0125ef9: 3827 fb sti
c0125efa: 0 fa cli
c0125efb: 1622 fb sti
c0125efc: 0 fa cli
c0125efd: 3155 fb sti
c0125efe: 0 fa cli
c0125eff: 1273 fa cli
c0125f00: 512 fb sti
c0125f01: 0 fa cli
c0125f02: 1312 fb sti
c0125f03: 0 fa cli
c0125f04: 1426 fb sti
c0125f05: 0 fa cli
c0125f06: 1507 fb sti
c0125f07: 0 fa cli
c0125f08: 2720 fb sti
c0125f09: 0 fa cli
c0125f0a: 2469 fa cli
c0125f0b: 787 fb sti
c0125f0c: 0 fa cli
c0125f0d: 2085 fb sti
c0125f0e: 0 fa cli
the 'cli' is always a 'black hole' to the NMI, while the second of two
consecutive cli's is not.
i also played a bit with the %ss instructions, and combined them with
the cli/sti instructions and other instructions in various ways, and
with a bit of experimenting found the following, somewhat surprising
results:
c0125f33: 1016 66 8c d0 mov %ss,%ax
c0125f36: 6626 8e d0 mov %eax,%ss
c0125f38: 34715 8e d0 mov %eax,%ss
c0125f3a: 14682 8e d0 mov %eax,%ss
c0125f3c: 4521 8e d0 mov %eax,%ss
c0125f3e: 7564 8e d0 mov %eax,%ss
c0125f40: 3861 66 8e d0 mov %ax,%ss
c0125f43: 0 66 8c d1 mov %ss,%cx
c0125f46: 1061 66 8c da mov %ds,%dx
c0125f49: 7660 8e d1 mov %ecx,%ss
c0125f4b: 11322 17 pop %ss
c0125f4c: 0 fb sti
c0125f4d: 8935 8e d1 mov %ecx,%ss
c0125f4f: 0 fa cli
c0125f50: 2198 66 8c d1 mov %ss,%cx
c0125f53: 735 66 8c da mov %ds,%dx
c0125f56: 0 8e da mov %edx,%ds
c0125f58: 6400 8e d0 mov %eax,%ss
c0125f5a: 3062 8e d0 mov %eax,%ss
c0125f5c: 3552 8e d0 mov %eax,%ss
c0125f5e: 4818 8e d0 mov %eax,%ss
c0125f60: 0 fb sti
c0125f61: 0 66 8c da mov %ds,%dx
c0125f64: 17788 8e d0 mov %eax,%ss
c0125f66: 64694 8e d0 mov %eax,%ss
c0125f68: 12837 8e d0 mov %eax,%ss
c0125f6a: 9859 8e d0 mov %eax,%ss
c0125f6c: 0 fb sti
c0125f6d: 74506 8e d0 mov %eax,%ss
c0125f6f: 0 fb sti
c0125f70: 8589 fa cli
c0125f71: 10248 8e d0 mov %eax,%ss
c0125f73: 3825 8e d0 mov %eax,%ss
c0125f75: 4903 8e d0 mov %eax,%ss
c0125f77: 71134 8e d0 mov %eax,%ss
c0125f79: 0 fb sti
c0125f7a: 0 fa cli
c0125f7b: 7461 8e d0 mov %eax,%ss
c0125f7d: 0 66 8c d0 mov %ss,%ax
c0125f80: 39387 8e d0 mov %eax,%ss
c0125f82: 0 fa cli
c0125f83: 41484 8e d0 mov %eax,%ss
c0125f85: 0 fa cli
c0125f86: 4490 8e d0 mov %eax,%ss
c0125f88: 0 fa cli
c0125f89: 6024 8e d0 mov %eax,%ss
c0125f8b: 15454 8e d0 mov %eax,%ss
c0125f8d: 0 fb sti
c0125f8e: 0 fb sti
c0125f8f: 115104 fb sti
c0125f90: 39061 fb sti
it shows a number of interesting effects:
- "mov %eax,%ss" followed by the _same_ instruction cancels the
black-hole. This i suspect is done to prevent the lockup in vm86
mode.
- an %ss black-hole instruction followed by 'sti' cancels sti's
black-hole. This is unlikely to occur in real kernel code, but we
might want to add a 'nop' in front of safe_halt()'s sti, to make sure
the black-hole takes effect.
- in one case a two-instruction blackhole was created - but this might
be some prefetch effect.
i played around with the instructions a bit to manufacture combinations
that lengthen the black-hole but failed :) This was on an Athlon64.
Ingo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 14:51 ` Ingo Molnar
@ 2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:54 ` Linus Torvalds
1 sibling, 1 reply; 43+ messages in thread
From: Maciej W. Rozycki @ 2004-12-16 15:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
> c0125ee9: 1529 fa cli
> ^---------------------------------- # of profiler hits
> c0125eea: 507 fb sti
> c0125eeb: 0 fa cli
> c0125eec: 3719 fb sti
> c0125eed: 0 fa cli
> c0125eee: 1579 fb sti
> c0125eef: 0 fa cli
> c0125ef0: 3317 fb sti
> c0125ef1: 0 fa cli
> c0125ef2: 3030 fb sti
> c0125ef3: 0 fa cli
> c0125ef4: 2497 fa cli
> c0125ef5: 1055 fb sti
> c0125ef6: 0 fa cli
[...]
> the 'cli' is always a 'black hole' to the NMI, while the second of two
> consecutive cli's are not.
It looks like the 'sti' is actually the black hole -- remember interrupts
are traps, that is they are probed for and taken after instruction
execution.
Maciej
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 15:08 ` Maciej W. Rozycki
@ 2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:42 ` Maciej W. Rozycki
0 siblings, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2004-12-16 15:11 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
* Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Thu, 16 Dec 2004, Ingo Molnar wrote:
>
> > c0125ee9: 1529 fa cli
> > ^---------------------------------- # of profiler hits
> > c0125eea: 507 fb sti
> > c0125eeb: 0 fa cli
> > c0125eec: 3719 fb sti
> > c0125eed: 0 fa cli
> > c0125eee: 1579 fb sti
> > c0125eef: 0 fa cli
> > c0125ef0: 3317 fb sti
> > c0125ef1: 0 fa cli
> > c0125ef2: 3030 fb sti
> > c0125ef3: 0 fa cli
> > c0125ef4: 2497 fa cli
> > c0125ef5: 1055 fb sti
> > c0125ef6: 0 fa cli
> [...]
> > the 'cli' is always a 'black hole' to the NMI, while the second of two
> > consecutive cli's are not.
>
> It looks like the 'sti' is actually the black hole -- remember
> interrupts are traps, that is they are probed for and taken after
> instruction execution.
The 'sti' "shadows" the cli, i.e. we'll never get an interrupt that gets
inbetween 'sti;cli'. I.e. sti is the black-hole generator, and 'cli' is
in the black hole. In that sense the 'cli' is in a black hole to the
NMI: the NMI will never see cli as the 'next to be executed'
instruction.
Ingo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 15:11 ` Ingo Molnar
@ 2004-12-16 15:42 ` Maciej W. Rozycki
0 siblings, 0 replies; 43+ messages in thread
From: Maciej W. Rozycki @ 2004-12-16 15:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
> The 'sti' "shadows" the cli, i.e. we'll never get an interrupt that gets
> inbetween 'sti;cli'. I.e. sti is the black-hole generator, and 'cli' is
> in the black hole. In that sense the 'cli' is in a black hole to the
> NMI: the NMI will never see cli as the 'next to be executed'
> instruction.
That's what I meant indeed, but I'd like to emphasise, for readers to be
aware, the black hole is not tied to the 'cli' instruction itself in any
way. The black-holed instruction need not be a 'cli' -- it can be an
arbitrary one except for ones creating black holes as you've observed
with your test.
Maciej
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
@ 2004-12-16 15:54 ` Linus Torvalds
1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2004-12-16 15:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alan Cox, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
>
> i also played a bit with the %ss instructions, and combined them with
> the cli/sti instructions and other instructions in various ways, and
> with a bit of experimenting found the following, somewhat surprising
> results:
>
> [ snip ]
>
> it shows a number of interesting effects:
>
> - "mov %eax,%ss" followed by the _same_ instruction cancels the
> black-hole. This i suspect is done to prevent the lockup in vm86
> mode.
I don't think it's the "same instruction". Looking at the pattern, I think
that a "mov->ss" always checks interrupts _before_ it executes, and never
checks interrupts _after_ it executes.
So I think the pattern is (for your athlon64):
- regular instructions check interrupts before they execute, _except_ if
the "dontcheck" flag was set. They clear "dontcheck" after execution.
- "mov->ss" always checks interrupts before it executes, regardless of
"dontcheck". It always sets "dontcheck".
- "sti" sets "dontcheck" if interrupts were disabled before.
So you can get two-instruction holes by doing the sequence
/* interrupts disabled */
mov->ss
sti
/* any instruction except cli/mov->ss */
but no other combination (series of "mov->ss" will always check _before_
each "mov->ss", and series of "sti" will obviously only have interrupts
disabled for the _first_ sti).
And I suspect this is very much micro-architecture-dependent, although the
Athlon64 rules seem very simple and straightforward.
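The three rules above can be written down as a tiny state machine, which
makes it easy to check a given instruction sequence against the profile.
This is only a sketch of the guessed model, not documented CPU behaviour;
the enum, the function name simulate_shadow() and the choice to treat cli
as a regular instruction that also clears IF are assumptions made for the
illustration.

```c
#include <stddef.h>

enum insn { INSN_OTHER, INSN_CLI, INSN_STI, INSN_MOV_SS };

/*
 * For each instruction, record whether the CPU would look for a pending
 * NMI before executing it (1) or skip the check (0). if_flag is the
 * initial interrupt-enable state.
 */
static void simulate_shadow(const enum insn *code, size_t n,
			    int if_flag, int *checked)
{
	int dontcheck = 0;

	for (size_t i = 0; i < n; i++) {
		/* mov->ss always checks; everything else honours dontcheck. */
		checked[i] = (code[i] == INSN_MOV_SS) || !dontcheck;

		switch (code[i]) {
		case INSN_MOV_SS:	/* always opens a one-insn shadow */
			dontcheck = 1;
			break;
		case INSN_STI:		/* shadow only if IF was clear */
			dontcheck = !if_flag;
			if_flag = 1;
			break;
		case INSN_CLI:		/* regular insn that clears IF */
			dontcheck = 0;
			if_flag = 0;
			break;
		default:		/* regular insn ends any shadow */
			dontcheck = 0;
			break;
		}
	}
}
```

Feeding it "mov->ss; sti; other; other" with interrupts initially disabled
marks the sti and the instruction after it as unchecked, which is the
two-instruction hole, while a run of mov->ss is checked before every
instruction.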
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2004-12-16 15:57 UTC | newest]
Thread overview: 43+ messages
2004-12-09 23:59 RCU question George Anzinger
2004-12-10 4:31 ` Dipankar Sarma
2004-12-10 19:42 ` George Anzinger
2004-12-10 20:40 ` Dipankar Sarma
2004-12-10 20:45 ` Lee Revell
2004-12-10 21:02 ` George Anzinger
2004-12-10 22:58 ` Zwane Mwaikambo
2004-12-11 2:22 ` George Anzinger
2004-12-11 2:45 ` Zwane Mwaikambo
2004-12-11 3:29 ` George Anzinger
2004-12-11 14:52 ` Zwane Mwaikambo
2004-12-11 16:32 ` Manfred Spraul
2004-12-11 16:52 ` George Anzinger
2004-12-12 2:53 ` Zwane Mwaikambo
2004-12-12 8:59 ` Manfred Spraul
2004-12-12 9:37 ` Andrea Arcangeli
2004-12-12 10:22 ` Manfred Spraul
2004-12-12 12:15 ` Andrea Arcangeli
2004-12-14 21:40 ` Lee Revell
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
2004-12-15 15:44 ` Linus Torvalds
2004-12-15 16:35 ` Ingo Molnar
2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:42 ` Maciej W. Rozycki
2004-12-16 15:54 ` Linus Torvalds
2004-12-16 2:10 ` Zwane Mwaikambo
2004-12-16 13:26 ` Alan Cox
2004-12-14 23:41 ` Andrea Arcangeli
2004-12-14 23:00 ` Linus Torvalds
2004-12-15 5:04 ` Andi Kleen
2004-12-15 6:27 ` Avi Kivity
2004-12-15 8:51 ` Ingo Molnar
2004-12-12 16:51 ` RCU question George Anzinger
2004-12-12 22:40 ` Manfred Spraul
2004-12-13 5:22 ` George Anzinger
2004-12-12 16:26 ` Zwane Mwaikambo