* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
@ 2004-12-17 23:35 Chuck Ebbert
0 siblings, 0 replies; 21+ messages in thread
From: Chuck Ebbert @ 2004-12-17 23:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, Andi Kleen, Andrew Morton
On Tue, 14 Dec 2004 at 15:00:56 -0800 (PST) Linus Torvalds wrote:
> Checking for kernel CS also requires checking that it's not vm86 mode,
> btw. So that's not just a "regs->xcs & 0xffff == __KERNEL_CS" either.
>
> But something like
>
> static inline int kernel_mode(struct pt_regs *regs)
> {
> return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
> }
>
> should DTRT.
>
> Can you pls double-check my thinking, and test?
There is already a user_mode() macro in asm-i386/ptrace.h but it's slow.
x86_64's macro is ugly but at least there's no logical-or in it.
Your i386 code is better, so how about applying this patch (boots/runs/is faster
on my tests):
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
--- linux-2.6.9.1/include/asm-i386/ptrace.h 2004-10-19 15:28:18.000000000 -0400
+++ linux-2.6.9.2/include/asm-i386/ptrace.h 2004-12-17 16:59:39.956099664 -0500
@@ -55,7 +55,11 @@ struct pt_regs {
#define PTRACE_SET_THREAD_AREA 26
#ifdef __KERNEL__
-#define user_mode(regs) ((VM_MASK & (regs)->eflags) || (3 & (regs)->xcs))
+static inline int kernel_mode(struct pt_regs *regs)
+{
+ return !((3 & (regs)->xcs) | (VM_MASK & (regs)->eflags));
+}
+#define user_mode(regs) (!kernel_mode(regs))
#define instruction_pointer(regs) ((regs)->eip)
#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
extern unsigned long profile_pc(struct pt_regs *regs);
--
Please take it as a sign of my infinite respect for you,
that I insist on you doing all the work.
-- Rusty Russell
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
@ 2004-12-16 15:54 ` Linus Torvalds
1 sibling, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2004-12-16 15:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alan Cox, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
>
> i also played a bit with the %ss instructions, and combined them with
> the cli/sti instructions and other instructions in various ways, and
> with a bit of experimenting found the following, somewhat surprising
> results:
>
> [ snip ]
>
> it shows a number of interesting effects:
>
> - "mov %eax,%ss" followed by the _same_ instruction cancels the
> black-hole. This i suspect is done to prevent the lockup in vm86
> mode.
I don't think it's the "same instruction". Looking at the pattern, I think
that a "mov->ss" always checks interrupts _before_ it executes, and never
checks interrupts _after_ it executes.
So I think the pattern is (for your athlon64):
- regular instructions check interrupts before they execute, _except_ if
the "dontcheck" flag was set. They clear "dontcheck" after execution.
- "mov->ss" always checks interrupts before it executes, regardless of
"dontcheck". It always sets "dontcheck".
- "sti" sets "dontcheck" if interrupts were disabled before.
So you can get two-instruction holes by doing the sequence
/* interrupts disabled */
mov->ss
sti
/* any instruction except cli/mov->ss */
but no other combination (series of "mov->ss" will always check _before_
each "mov->ss", and series of "sti" will obviously only have interrupts
disabled for the _first_ sti).
And I suspect this is very much micro-architecture-dependent, although the
Athlon64 rules seem very simple and straightforward.
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 15:11 ` Ingo Molnar
@ 2004-12-16 15:42 ` Maciej W. Rozycki
0 siblings, 0 replies; 21+ messages in thread
From: Maciej W. Rozycki @ 2004-12-16 15:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
> The 'sti' "shadows" the cli, i.e. we'll never get an interrupt that gets
> inbetween 'sti;cli'. I.e. sti is the black-hole generator, and 'cli' is
> in the black hole. In that sense the 'cli' is in a black hole to the
> NMI: the NMI will never see cli as the 'next to be executed'
> instruction.
That's what I meant indeed, but I'd like to emphasise, for readers to be
aware, the black hole is not tied to the 'cli' instruction itself in any
way. The black-holed instruction needs not be a 'cli' -- it can be an
arbitrary one except from ones creating black holes as you've observed
with your test.
Maciej
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 15:08 ` Maciej W. Rozycki
@ 2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:42 ` Maciej W. Rozycki
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2004-12-16 15:11 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
* Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Thu, 16 Dec 2004, Ingo Molnar wrote:
>
> > c0125ee9: 1529 fa cli
> > ^---------------------------------- # of profiler hits
> > c0125eea: 507 fb sti
> > c0125eeb: 0 fa cli
> > c0125eec: 3719 fb sti
> > c0125eed: 0 fa cli
> > c0125eee: 1579 fb sti
> > c0125eef: 0 fa cli
> > c0125ef0: 3317 fb sti
> > c0125ef1: 0 fa cli
> > c0125ef2: 3030 fb sti
> > c0125ef3: 0 fa cli
> > c0125ef4: 2497 fa cli
> > c0125ef5: 1055 fb sti
> > c0125ef6: 0 fa cli
> [...]
> > the 'cli' is always a 'black hole' to the NMI, while the second of two
> > consecutive cli's are not.
>
> It looks like the 'sti' is actually the black hole -- remember
> interrupts are traps, that is they are probed for and taken after
> instruction execution.
The 'sti' "shadows" the cli, i.e. we'll never get an interrupt that gets
inbetween 'sti;cli'. I.e. sti is the black-hole generator, and 'cli' is
in the black hole. In that sense the 'cli' is in a black hole to the
NMI: the NMI will never see cli as the 'next to be executed'
instruction.
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 14:51 ` Ingo Molnar
@ 2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:54 ` Linus Torvalds
1 sibling, 1 reply; 21+ messages in thread
From: Maciej W. Rozycki @ 2004-12-16 15:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Alan Cox, Lee Revell, Andrea Arcangeli,
Manfred Spraul, Zwane Mwaikambo, George Anzinger, dipankar,
ganzinger, lkml, Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Ingo Molnar wrote:
> c0125ee9: 1529 fa cli
> ^---------------------------------- # of profiler hits
> c0125eea: 507 fb sti
> c0125eeb: 0 fa cli
> c0125eec: 3719 fb sti
> c0125eed: 0 fa cli
> c0125eee: 1579 fb sti
> c0125eef: 0 fa cli
> c0125ef0: 3317 fb sti
> c0125ef1: 0 fa cli
> c0125ef2: 3030 fb sti
> c0125ef3: 0 fa cli
> c0125ef4: 2497 fa cli
> c0125ef5: 1055 fb sti
> c0125ef6: 0 fa cli
[...]
> the 'cli' is always a 'black hole' to the NMI, while the second of two
> consecutive cli's are not.
It looks like the 'sti' is actually the black hole -- remember interrupts
are traps, that is they are probed for and taken after instruction
execution.
Maciej
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 1:58 ` Linus Torvalds
@ 2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:54 ` Linus Torvalds
0 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2004-12-16 14:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
> The irq window should actually be open every alternate instruction, I
> think. Although it's not actually architected, and I thought that
> there was some errata for some CPU about this..
i have generated an instruction-granularity profile of kernel code
executing the following sequence, driven by the NMI watchdog interrupt:
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
asm ("cli; cli; sti; cli; sti; cli; sti; cli; sti; cli; sti; ");
the first CLI is done twice, to prove that the NMI profiling works and
that the kernel can be interrupted in those places. Then i called this
kernel code in a loop. Here's the result:
c0125ee9: 1529 fa cli
^---------------------------------- # of profiler hits
c0125eea: 507 fb sti
c0125eeb: 0 fa cli
c0125eec: 3719 fb sti
c0125eed: 0 fa cli
c0125eee: 1579 fb sti
c0125eef: 0 fa cli
c0125ef0: 3317 fb sti
c0125ef1: 0 fa cli
c0125ef2: 3030 fb sti
c0125ef3: 0 fa cli
c0125ef4: 2497 fa cli
c0125ef5: 1055 fb sti
c0125ef6: 0 fa cli
c0125ef7: 4674 fb sti
c0125ef8: 0 fa cli
c0125ef9: 3827 fb sti
c0125efa: 0 fa cli
c0125efb: 1622 fb sti
c0125efc: 0 fa cli
c0125efd: 3155 fb sti
c0125efe: 0 fa cli
c0125eff: 1273 fa cli
c0125f00: 512 fb sti
c0125f01: 0 fa cli
c0125f02: 1312 fb sti
c0125f03: 0 fa cli
c0125f04: 1426 fb sti
c0125f05: 0 fa cli
c0125f06: 1507 fb sti
c0125f07: 0 fa cli
c0125f08: 2720 fb sti
c0125f09: 0 fa cli
c0125f0a: 2469 fa cli
c0125f0b: 787 fb sti
c0125f0c: 0 fa cli
c0125f0d: 2085 fb sti
c0125f0e: 0 fa cli
the 'cli' is always a 'black hole' to the NMI, while the second of two
consecutive cli's are not.
i also played a bit with the %ss instructions, and combined them with
the cli/sti instructions and other instructions in various ways, and
with a bit of experimenting found the following, somewhat surprising
results:
c0125f33: 1016 66 8c d0 mov %ss,%ax
c0125f36: 6626 8e d0 mov %eax,%ss
c0125f38: 34715 8e d0 mov %eax,%ss
c0125f3a: 14682 8e d0 mov %eax,%ss
c0125f3c: 4521 8e d0 mov %eax,%ss
c0125f3e: 7564 8e d0 mov %eax,%ss
c0125f40: 3861 66 8e d0 mov %ax,%ss
c0125f43: 0 66 8c d1 mov %ss,%cx
c0125f46: 1061 66 8c da mov %ds,%dx
c0125f49: 7660 8e d1 mov %ecx,%ss
c0125f4b: 11322 17 pop %ss
c0125f4c: 0 fb sti
c0125f4d: 8935 8e d1 mov %ecx,%ss
c0125f4f: 0 fa cli
c0125f50: 2198 66 8c d1 mov %ss,%cx
c0125f53: 735 66 8c da mov %ds,%dx
c0125f56: 0 8e da mov %edx,%ds
c0125f58: 6400 8e d0 mov %eax,%ss
c0125f5a: 3062 8e d0 mov %eax,%ss
c0125f5c: 3552 8e d0 mov %eax,%ss
c0125f5e: 4818 8e d0 mov %eax,%ss
c0125f60: 0 fb sti
c0125f61: 0 66 8c da mov %ds,%dx
c0125f64: 17788 8e d0 mov %eax,%ss
c0125f66: 64694 8e d0 mov %eax,%ss
c0125f68: 12837 8e d0 mov %eax,%ss
c0125f6a: 9859 8e d0 mov %eax,%ss
c0125f6c: 0 fb sti
c0125f6d: 74506 8e d0 mov %eax,%ss
c0125f6f: 0 fb sti
c0125f70: 8589 fa cli
c0125f71: 10248 8e d0 mov %eax,%ss
c0125f73: 3825 8e d0 mov %eax,%ss
c0125f75: 4903 8e d0 mov %eax,%ss
c0125f77: 71134 8e d0 mov %eax,%ss
c0125f79: 0 fb sti
c0125f7a: 0 fa cli
c0125f7b: 7461 8e d0 mov %eax,%ss
c0125f7d: 0 66 8c d0 mov %ss,%ax
c0125f80: 39387 8e d0 mov %eax,%ss
c0125f82: 0 fa cli
c0125f83: 41484 8e d0 mov %eax,%ss
c0125f85: 0 fa cli
c0125f86: 4490 8e d0 mov %eax,%ss
c0125f88: 0 fa cli
c0125f89: 6024 8e d0 mov %eax,%ss
c0125f8b: 15454 8e d0 mov %eax,%ss
c0125f8d: 0 fb sti
c0125f8e: 0 fb sti
c0125f8f: 115104 fb sti
c0125f90: 39061 fb sti
it shows a number of interesting effects:
- "mov %eax,%ss" followed by the _same_ instruction cancels the
black-hole. This i suspect is done to prevent the lockup in vm86
mode.
- an %ss black-hole instruction followed by 'sti' cancels sti's
black-hole. This is unlikely to occur in real kernel code, but we
might want to add a 'nop' in front of safe_halt()'s sti, to make sure
the black-hole takes effect.
- in one case a two-instruction blackhole was created - but this might
be some prefetch effect.
i played around with the instructions a bit to manufacture combinations
that enlengthen the black-hole but failed :) This was on an Athlon64.
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 2:10 ` Zwane Mwaikambo
@ 2004-12-16 13:26 ` Alan Cox
0 siblings, 0 replies; 21+ messages in thread
From: Alan Cox @ 2004-12-16 13:26 UTC (permalink / raw)
To: Zwane Mwaikambo
Cc: Linus Torvalds, Ingo Molnar, Lee Revell, Andrea Arcangeli,
Manfred Spraul, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Iau, 2004-12-16 at 02:10, Zwane Mwaikambo wrote:
> Might this be because you can't rely on interrupt suppression for back to
> back suppressing instructions?
The documentation seems to have little to say on this. I've also not
tried things like interleaved mov->ss, sti to see how the interlocking
is done. It would make sense given the original 8086 reason was to allow
ss/sp to be loaded cleanly.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
@ 2004-12-16 2:10 ` Zwane Mwaikambo
2004-12-16 13:26 ` Alan Cox
1 sibling, 1 reply; 21+ messages in thread
From: Zwane Mwaikambo @ 2004-12-16 2:10 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Ingo Molnar, Lee Revell, Andrea Arcangeli,
Manfred Spraul, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Alan Cox wrote:
> On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> > On Tue, 14 Dec 2004, Ingo Molnar wrote:
> > Now that you mention it, I have this dim memory of the one-instruction
> > "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> > literally doesn't test for async events following "sti".
> >
> > Or maybe that was "mov->ss". That one also has that strange "black hole"
> > for one instruction.
>
> The mov to ss one is a bit more magic than that however. If you write
> 3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
> slow it plus mmap repeatedly) and run it you don't get a vastly long irq
> delay at least on intel, not tried the others.
Might this be because you can't rely on interrupt suppression for back to
back suppressing instructions?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-16 0:37 ` Alan Cox
@ 2004-12-16 1:58 ` Linus Torvalds
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 2:10 ` Zwane Mwaikambo
1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2004-12-16 1:58 UTC (permalink / raw)
To: Alan Cox
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Thu, 16 Dec 2004, Alan Cox wrote:
> On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> > On Tue, 14 Dec 2004, Ingo Molnar wrote:
> > Now that you mention it, I have this dim memory of the one-instruction
> > "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> > literally doesn't test for async events following "sti".
> >
> > Or maybe that was "mov->ss". That one also has that strange "black hole"
> > for one instruction.
>
> The mov to ss one is a bit more magic than that however. If you write
> 3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
> slow it plus mmap repeatedly) and run it you don't get a vastly long irq
> delay at least on intel, not tried the others.
The irq window should actually be open every alternate instruction, I
think. Although it's not actually architected, and I thought that there
was some errata for some CPU about this..
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
@ 2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
2004-12-16 2:10 ` Zwane Mwaikambo
1 sibling, 2 replies; 21+ messages in thread
From: Alan Cox @ 2004-12-16 0:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
On Maw, 2004-12-14 at 23:09, Linus Torvalds wrote:
> On Tue, 14 Dec 2004, Ingo Molnar wrote:
> Now that you mention it, I have this dim memory of the one-instruction
> "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> literally doesn't test for async events following "sti".
>
> Or maybe that was "mov->ss". That one also has that strange "black hole"
> for one instruction.
The mov to ss one is a bit more magic than that however. If you write
3Gb of mov->ss into memory (ie about 64 pages to thrash the cache and
slow it plus mmap repeatedly) and run it you don't get a vastly long irq
delay at least on intel, not tried the others.
Alan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 15:44 ` Linus Torvalds
@ 2004-12-15 16:35 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2004-12-15 16:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Wed, 15 Dec 2004, Ingo Molnar wrote:
> >
> > i ran the stresstest overnight with the 10 KHz NMI, and not a single
> > time did the new branch trigger, out of hundreds of millions of IRQs and
> > NMIs. I think this suggests that the race doesnt exist in current CPUs.
>
> That may well be true, but I'm not convinced your test is meaningful
> or shows anything.
>
> The thing is, either the CPU is busy, or it's idle. If it's busy,
> you'll never see this. And if it's idle, it will always be _in_ the
> "halt" instruction.
i deliberately started a test where there was roughly 50% idle time.
> The only way to see the case is in the borderline cases, and if/when
> there are multiple different interrupts (first non-NMI interrupt takes
> it out of the hlt, and then the NMI happens to catch the sti). And
> quite frankly, I don't see how you would stress-test it. A 1kHz timer
> interrupt with a 10kHz NMI interrupt is still very infrequent
> interrupts...
i started an infinite loop that generated disk IRQs, and started a
network test that generated network IRQs. The IRQ rate was roughly
10K/sec - this combined with the 10K/sec NMI rate should be an adequate
mix. (I also made sure that it's really default_idle that is used.)
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 8:52 ` Ingo Molnar
@ 2004-12-15 15:44 ` Linus Torvalds
2004-12-15 16:35 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2004-12-15 15:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Wed, 15 Dec 2004, Ingo Molnar wrote:
>
> i ran the stresstest overnight with the 10 KHz NMI, and not a single
> time did the new branch trigger, out of hundreds of millions of IRQs and
> NMIs. I think this suggests that the race doesnt exist in current CPUs.
That may well be true, but I'm not convinced your test is meaningful or
shows anything.
The thing is, either the CPU is busy, or it's idle. If it's busy, you'll
never see this. And if it's idle, it will always be _in_ the "halt"
instruction.
The only way to see the case is in the borderline cases, and if/when there
are multiple different interrupts (first non-NMI interrupt takes it out of
the hlt, and then the NMI happens to catch the sti). And quite frankly, I
don't see how you would stress-test it. A 1kHz timer interrupt with a
10kHz NMI interrupt is still very infrequent interrupts...
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:09 ` Linus Torvalds
@ 2004-12-15 8:52 ` Ingo Molnar
2004-12-15 15:44 ` Linus Torvalds
2004-12-16 0:37 ` Alan Cox
1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2004-12-15 8:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
* Linus Torvalds <torvalds@osdl.org> wrote:
> > find the correct patch below. I've tested it with an NMI watchdog
> > frequency artificially increased to 10 KHz, and i've instrumented the
> > new branch in the NMI handler, but even under heavy IRQ load i was not
> > able to trigger the branch. Maybe newer CPUs handle this case somehow
> > and make sti;hlt truly atomic?
>
> Now that you mention it, I have this dim memory of the one-instruction
> "sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
> literally doesn't test for async events following "sti".
i ran the stresstest overnight with the 10 KHz NMI, and not a single
time did the new branch trigger, out of hundreds of millions of IRQs and
NMIs. I think this suggests that the race doesnt exist in current CPUs.
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-15 6:27 ` Avi Kivity
@ 2004-12-15 8:51 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2004-12-15 8:51 UTC (permalink / raw)
To: Avi Kivity
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Avi Kivity <avi@argo.co.il> wrote:
> Ingo Molnar wrote:
>
> >+ if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
> >
> >
> shouldn't that cast be (unsigned char *), otherwise the test will
> always fail?
yes, i fixed this in the second patch. (the compiler warned about it
too)
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:00 ` Linus Torvalds
@ 2004-12-15 6:27 ` Avi Kivity
2004-12-15 8:51 ` Ingo Molnar
2 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2004-12-15 6:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
Ingo Molnar wrote:
>+ if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
>
>
shouldn't that cast be (unsigned char *), otherwise the test will always
fail?
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 23:00 ` Linus Torvalds
@ 2004-12-15 5:04 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2004-12-15 5:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Lee Revell, Andrea Arcangeli, Manfred Spraul,
Zwane Mwaikambo, George Anzinger, dipankar, ganzinger, lkml,
Andrew Morton, Andi Kleen
> But something like
>
> static inline int kernel_mode(struct pt_regs *regs)
> {
> return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
> }
>
> should DTRT.
>
> Can you pls double-check my thinking, and test?
Reasoning looks correct to me.
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
@ 2004-12-14 23:41 ` Andrea Arcangeli
1 sibling, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 23:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Manfred Spraul, Zwane Mwaikambo, George Anzinger,
dipankar, ganzinger, lkml, Andrew Morton, Linus Torvalds,
Andi Kleen
On Tue, Dec 14, 2004 at 11:47:06PM +0100, Ingo Molnar wrote:
> find the correct patch below. I've tested it with an NMI watchdog
> frequency artificially increased to 10 KHz, and i've instrumented the
Nice test, it'd be nice to trigger it in real life.
on the lines of the 64k movl ss, I wonder if we could create an huge
piece of memory like this:
new_htl:
cli
sti
htl
cli
sti
htl
[..]
jmp original_hlt
and to call new_htl from original_hlt instead of sti;hlt. A dozen megs
of the above should boost the probability of getting interrupted in
"hlt" quite a bit.
However even if the nmi can execute on top of the "hlt" instruction, it
doesn't necessairly mean the next pending irq will execute before
executing 'hlt' too, so it'd need a bit more of instrumentation to as
well track down the race as happening (it's not enough to see the branch
in the nmi handler to be taken). The additional instrumentation should
be quite easy though, just copying the same nmi code to the irq handler
should do the trick.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:47 ` Ingo Molnar
@ 2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
2004-12-16 0:37 ` Alan Cox
2004-12-14 23:41 ` Andrea Arcangeli
1 sibling, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2004-12-14 23:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Tue, 14 Dec 2004, Ingo Molnar wrote:
>
> find the correct patch below. I've tested it with an NMI watchdog
> frequency artificially increased to 10 KHz, and i've instrumented the
> new branch in the NMI handler, but even under heavy IRQ load i was not
> able to trigger the branch. Maybe newer CPUs handle this case somehow
> and make sti;hlt truly atomic?
Now that you mention it, I have this dim memory of the one-instruction
"sti-shadow" actually disabling NMI's (and debug traps) too. The CPU
literally doesn't test for async events following "sti".
Or maybe that was "mov->ss". That one also has that strange "black hole"
for one instruction.
Hmm.. You could be evil and try to fill up 64kB worth of memory with a
"mov %ax,%ss", and jump to it in vm86 mode and see what happens. The eip
will just keep wrapping around...
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
@ 2004-12-14 23:00 ` Linus Torvalds
2004-12-15 5:04 ` Andi Kleen
2004-12-15 6:27 ` Avi Kivity
2 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2004-12-14 23:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lee Revell, Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Andi Kleen
On Tue, 14 Dec 2004, Ingo Molnar wrote:
>
> indeed, there could be a connection, and it's certainly a fun race. The
> proper fix is Manfred's suggestion: check whether the EIP is a kernel
> text address, and if yes, whether it's a HLT instruction - and if yes
> then increase EIP by 1.
You do it the wrong way, though. This is not safe:
if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
does _entirely_ the wrong thing if CS is not the kernel CS.
It can trigger with a regular use CS if you were to run the 4G:4G patches,
but more realistically, I think you can make ii trigger even with a
standard kernel by creating a local code segment in your LDT, and then
trying to confuse the kernel that way.
Now, as long as the _only_ thing it does is increment the eip, the worst
that can happen is that it screws over the user program that must have
worked at this a bit, but the basic point is that you shouldn't do this.
In _theory_ you could confuse a real program that wasn't doing anything
bad.
Checking for kernel CS also requires checking that it's not vm86 mode,
btw. So that's not just a "regs->xcs & 0xffff == __KERNEL_CS" either.
But something like
static inline int kernel_mode(struct pt_regs *regs)
{
return !((regs->eflags & VM_MASK) | (regs->xcs & 3));
}
should DTRT.
Can you pls double-check my thinking, and test?
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
@ 2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
2004-12-14 23:41 ` Andrea Arcangeli
2004-12-14 23:00 ` Linus Torvalds
2004-12-15 6:27 ` Avi Kivity
2 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2004-12-14 22:47 UTC (permalink / raw)
To: Lee Revell
Cc: Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Ingo Molnar <mingo@elte.hu> wrote:
> indeed, there could be a connection, and it's certainly a fun race.
> The proper fix is Manfred's suggestion: check whether the EIP is a
> kernel text address, and if yes, whether it's a HLT instruction - and
> if yes then increase EIP by 1. I've included the fix in the -33-02 -RT
> patch. Andrew, Linus: upstream fix is below - i think it's post-2.6.10
> stuff. Tested it on SMP and UP x86, using both the IO-APIC and the
> local-APIC based NMI watchdog.
>
> i think x64 needs a similar fix as well.
find the correct patch below. I've tested it with an NMI watchdog
frequency artificially increased to 10 KHz, and i've instrumented the
new branch in the NMI handler, but even under heavy IRQ load i was not
able to trigger the branch. Maybe newer CPUs handle this case somehow
and make sti;hlt truly atomic? I tried this on an old Celeron
(Mendocino) and on an Athlon64.
Ingo
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--- linux/arch/i386/kernel/traps.c.orig
+++ linux/arch/i386/kernel/traps.c
@@ -670,6 +670,18 @@ fastcall void do_nmi(struct pt_regs * re
cpu = smp_processor_id();
+ /*
+ * Fix up obscure CPU behavior: if we interrupt safe_hlt() via
+ * the NMI then we might miss a reschedule if an interrupt is
+ * posted to the CPU and executes before the HLT instruction.
+ *
+ * We check whether the EIP is kernelspace, and if yes, whether
+ * the instruction is HLT:
+ */
+ if (__kernel_text_address(regs->eip) &&
+ *(unsigned char *)regs->eip == 0xf4)
+ regs->eip++;
+
#ifdef CONFIG_HOTPLUG_CPU
if (!cpu_online(cpu)) {
nmi_exit();
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch, 2.6.10-rc3] safe_hlt() & NMIs
2004-12-14 21:40 ` Lee Revell
@ 2004-12-14 22:23 ` Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Ingo Molnar @ 2004-12-14 22:23 UTC (permalink / raw)
To: Lee Revell
Cc: Andrea Arcangeli, Manfred Spraul, Zwane Mwaikambo,
George Anzinger, dipankar, ganzinger, lkml, Andrew Morton,
Linus Torvalds, Andi Kleen
* Lee Revell <rlrevell@joe-job.com> wrote:
> On Sun, 2004-12-12 at 13:15 +0100, Andrea Arcangeli wrote:
> > Overall this is a very minor issue (unless HZ is 0), it would only
> > introduce a 1/HZ latency to the irq that get posted while the nmi
> > handler is running, and the nmi handlers never runs in production.
>
> Ingo, couldn't this account for some of the inexplicable outliers some
> people were seeing in latency tests?
indeed, there could be a connection, and it's certainly a fun race. The
proper fix is Manfred's suggestion: check whether the EIP is a kernel
text address, and if yes, whether it's a HLT instruction - and if yes
then increase EIP by 1. I've included the fix in the -33-02 -RT patch.
Andrew, Linus: upstream fix is below - i think it's post-2.6.10 stuff.
Tested it on SMP and UP x86, using both the IO-APIC and the local-APIC
based NMI watchdog.
i think x64 needs a similar fix as well.
Ingo
--- linux/arch/i386/kernel/traps.c.orig
+++ linux/arch/i386/kernel/traps.c
@@ -670,6 +670,17 @@ fastcall void do_nmi(struct pt_regs * re
cpu = smp_processor_id();
+ /*
+ * Fix up obscure CPU behavior: if we interrupt safe_hlt() via
+ * the NMI then we might miss a reschedule if an interrupt is
+ * posted to the CPU and executes before the HLT instruction.
+ *
+ * We check whether the EIP is kernelspace, and if yes, whether
+ * the instruction is HLT:
+ */
+ if (__kernel_text_address(regs->eip) && *(char *)regs->eip == 0xf4)
+ regs->eip++;
+
#ifdef CONFIG_HOTPLUG_CPU
if (!cpu_online(cpu)) {
nmi_exit();
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2004-12-17 23:38 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-17 23:35 [patch, 2.6.10-rc3] safe_hlt() & NMIs Chuck Ebbert
-- strict thread matches above, loose matches on Subject: below --
2004-12-11 3:29 RCU question George Anzinger
2004-12-11 14:52 ` Zwane Mwaikambo
2004-12-11 16:32 ` Manfred Spraul
2004-12-11 16:52 ` George Anzinger
2004-12-12 2:53 ` Zwane Mwaikambo
2004-12-12 8:59 ` Manfred Spraul
2004-12-12 9:37 ` Andrea Arcangeli
2004-12-12 10:22 ` Manfred Spraul
2004-12-12 12:15 ` Andrea Arcangeli
2004-12-14 21:40 ` Lee Revell
2004-12-14 22:23 ` [patch, 2.6.10-rc3] safe_hlt() & NMIs Ingo Molnar
2004-12-14 22:47 ` Ingo Molnar
2004-12-14 23:09 ` Linus Torvalds
2004-12-15 8:52 ` Ingo Molnar
2004-12-15 15:44 ` Linus Torvalds
2004-12-15 16:35 ` Ingo Molnar
2004-12-16 0:37 ` Alan Cox
2004-12-16 1:58 ` Linus Torvalds
2004-12-16 14:51 ` Ingo Molnar
2004-12-16 15:08 ` Maciej W. Rozycki
2004-12-16 15:11 ` Ingo Molnar
2004-12-16 15:42 ` Maciej W. Rozycki
2004-12-16 15:54 ` Linus Torvalds
2004-12-16 2:10 ` Zwane Mwaikambo
2004-12-16 13:26 ` Alan Cox
2004-12-14 23:41 ` Andrea Arcangeli
2004-12-14 23:00 ` Linus Torvalds
2004-12-15 5:04 ` Andi Kleen
2004-12-15 6:27 ` Avi Kivity
2004-12-15 8:51 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).