* [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
@ 2003-09-30 7:38 Jamie Lokier
2003-09-30 8:01 ` Nick Piggin
2003-09-30 13:22 ` Dave Jones
0 siblings, 2 replies; 45+ messages in thread
From: Jamie Lokier @ 2003-09-30 7:38 UTC (permalink / raw)
To: akpm; +Cc: torvalds, linux-kernel, richard.brunner
This builds upon Andi Kleen's excellent patch for the AMD prefetch bug
workaround. It is applies to 2.6.0-test6.
There are four changes on top of Andi's recent patch:
1. The workaround code is only included when compiling a kernel
optimised for AMD processors with the bug. For kernels optimised
for a different processor, there is no code size overhead.
This will make some people happier.
It will make some people unhappier, because it means a generic
kernel on an AMD will run fine, but won't fixup _userspace_
prefetch faults. More on this below.
2. Nevertheless, when a kernel _not_ optimised for those AMD processors
is run on one, the "alternative" mechanism now correctly replaces
prefetch instructions with nops.
3. The is_prefetch() instruction decoder now handles all cases of
funny GDT and LDT segments, checks limits and so on. As far as I
can see, there are no further bugs or flaws in that part.
4. A consequence of the is_prefetch() change is that, despite the
longer C code (rarely used code, for segments), is_prefetch()
should run marginally faster now because the access_ok() calls are
not required. They're subsumed into the segment limit
calculation.
Like Andi's patch, this removes 10k or so from current x86 kernels by
removing the silly conditional from the prefetch() function.
Controversial point coming up!
It's a reasonable argument that if a kernel version, say 2.6.0, promises
to hide the errata from _userspace_ programs, then the generic or
not-AMD-optimised kernels should do that too. Otherwise userspace may
as well provide the workaround itself, by catching SIGSEGV - because it
is perfectly normal and common to run a generic kernel on an AMD.
If userspace has to do that, then there's no point in the kernel having
the fixup at all - it may as well use __ex_table entries for the
kernel-space prefetch instructions, and give up on hiding the processor
bug from userspace. That would be much simpler than any of the patches
so far.
Arguably the bug has been around for years already (all K7's), so the
rare userspace programs which use prefetch should be aware of this already.
Something to think about.
Here is Andi's comment from the patch this is derived from:
Here is a new version of the Athlon/Opteron prefetch issue workaround
for 2.6.0test6. The issue was hit regularly by the 2.6 kernel
while doing prefetches on NULL terminated hlists.
These CPUs sometimes generate illegal exception for prefetch
instructions. The operating system can work around this by checking
if the faulting instruction is a prefetch and ignoring it when
it's the case.
The code is structured carefully to ensure that the page fault
will never recurse more than once. Also unmapped EIPs are special
cased to give more friendly oopses when the kernel jumps to
unmapped addresses.
It removes the previous dumb in kernel workaround for this and shrinks the
kernel by >10k.
Small behaviour change is that a SIGBUS fault for a *_user access will
cause an EFAULT now, no SIGBUS.
I'm running this now, on my dual AMD Athlon, and it's working fine. (It
works when I comment out the fast path and force the full segment check,
before you ask ;)
Enjoy,
-- Jamie
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/Kconfig dual-2.6.0-test6/arch/i386/Kconfig
--- orig-2.6.0-test6/arch/i386/Kconfig 2003-09-30 05:39:33.467103692 +0100
+++ dual-2.6.0-test6/arch/i386/Kconfig 2003-09-30 06:05:19.868623767 +0100
@@ -397,6 +397,12 @@
depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
default y
+config X86_PREFETCH_FIXUP
+ bool
+ depends on MK7 || MK8
+ default y
+
+
config HPET_TIMER
bool "HPET Timer Support"
help
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/kernel/cpu/common.c dual-2.6.0-test6/arch/i386/kernel/cpu/common.c
--- orig-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 05:39:33.871035696 +0100
+++ dual-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 06:03:46.222851345 +0100
@@ -327,6 +327,17 @@
* we do "generic changes."
*/
+ /* Prefetch works ok? */
+#ifndef CONFIG_X86_PREFETCH_FIXUP
+ if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
+#endif
+ {
+ if (cpu_has(c, X86_FEATURE_XMM))
+ set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
+ if (cpu_has(c, X86_FEATURE_3DNOW))
+ set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
+ }
+
/* TSC disabled? */
if ( tsc_disable )
clear_bit(X86_FEATURE_TSC, c->x86_capability);
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/mm/fault.c dual-2.6.0-test6/arch/i386/mm/fault.c
--- orig-2.6.0-test6/arch/i386/mm/fault.c 2003-07-08 21:31:09.000000000 +0100
+++ dual-2.6.0-test6/arch/i386/mm/fault.c 2003-09-30 06:03:46.284839268 +0100
@@ -55,6 +55,157 @@
console_loglevel = loglevel_save;
}
+#ifdef CONFIG_X86_PREFETCH_FIXUP
+/*
+ * Return EIP plus the CS segment base. The segment limit is also
+ * adjusted, clamped to the kernel/user address space (whichever is
+ * appropriate), and returned in *eip_limit.
+ *
+ * The segment is checked, because it might have been changed by another
+ * task between the original faulting instruction and here.
+ *
+ * If CS is no longer a valid code segment, or if EIP is beyond the
+ * limit, or if it is a kernel address when CS is not a kernel segment,
+ * then the returned value will be greater than *eip_limit.
+ */
+static inline unsigned long get_segment_eip(struct pt_regs *regs,
+ unsigned long *eip_limit)
+{
+ unsigned long eip = regs->eip;
+ unsigned seg = regs->xcs & 0xffff;
+ u32 seg_ar, seg_limit, base, *desc;
+
+ /* The standard kernel/user address space limit. */
+ *eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+
+ /* Unlikely, but must come before segment checks. */
+ if (unlikely((regs->eflags & VM_MASK) != 0))
+ return eip + (seg << 4);
+
+ /* By far the commonest cases. */
+ if (likely(seg == __USER_CS || seg == __KERNEL_CS))
+ return eip;
+
+ /* Check the segment exists, is within the current LDT/GDT size,
+ that kernel/user (ring 0..3) has the appropriate privelege,
+ that it's a code segment, and get the limit. */
+ __asm__ ("larl %3,%0; lsll %3,%1"
+ : "=&r" (seg_ar), "=r" (seg_limit) : "0" (0), "rm" (seg));
+ if ((~seg_ar & 0x9800) || eip > seg_limit) {
+ *eip_limit = 0;
+ return 1; /* So that returned eip > *eip_limit. */
+ }
+
+ /* Get the GDT/LDT descriptor base. */
+ if (seg & (1<<2)) {
+ /* Must lock the LDT while reading it. */
+ down(¤t->mm->context.sem);
+ desc = current->mm->context.ldt;
+ } else {
+ /* Must disable preemption while reading the GDT. */
+ desc = (u32 *)&cpu_gdt_table[get_cpu()];
+ }
+ desc = (void *)desc + (seg & ~7);
+ base = (desc[0] >> 16) |
+ ((desc[1] & 0xff) << 16) |
+ (desc[1] & 0xff000000);
+ if (seg & (1<<2))
+ up(¤t->mm->context.sem);
+ else
+ put_cpu();
+
+ /* Adjust EIP and segment limit, and clamp at the kernel limit.
+ It's legitimate for segments to wrap at 0xffffffff. */
+ seg_limit += base;
+ if (seg_limit < *eip_limit && seg_limit >= base)
+ *eip_limit = seg_limit;
+ return eip + base;
+}
+
+/*
+ * Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
+ * Check that here and ignore it.
+ */
+static int __is_prefetch(struct pt_regs *regs, unsigned long addr)
+{
+ unsigned long limit;
+ unsigned long instr = get_segment_eip (regs, &limit);
+ int scan_more = 1;
+ int prefetch = 0;
+ int i;
+
+ /*
+ * Avoid recursive faults. This catches the kernel jumping to nirvana.
+ * More complicated races with unmapped EIP are handled elsewhere for
+ * user space.
+ */
+ if (instr == addr)
+ return 0;
+
+ for (i = 0; scan_more && i < 15; i++) {
+ unsigned char opcode;
+ unsigned char instr_hi;
+ unsigned char instr_lo;
+
+ if (instr > limit)
+ break;
+ if (__get_user(opcode, (unsigned char *) instr))
+ break;
+
+ instr_hi = opcode & 0xf0;
+ instr_lo = opcode & 0x0f;
+ instr++;
+
+ switch (instr_hi) {
+ case 0x20:
+ case 0x30:
+ /* Values 0x26,0x2E,0x36,0x3E are valid x86 prefixes. */
+ scan_more = ((instr_lo & 7) == 0x6);
+ break;
+
+ case 0x60:
+ /* 0x64 thru 0x67 are valid prefixes in all modes. */
+ scan_more = (instr_lo & 0xC) == 0x4;
+ break;
+ case 0xF0:
+ /* 0xF0, 0xF2, and 0xF3 are valid prefixes in all modes. */
+ scan_more = !instr_lo || (instr_lo>>1) == 1;
+ break;
+ case 0x00:
+ /* Prefetch instruction is 0x0F0D or 0x0F18 */
+ scan_more = 0;
+ if (instr > limit)
+ break;
+ if (__get_user(opcode, (unsigned char *) instr))
+ break;
+ prefetch = (instr_lo == 0xF) &&
+ (opcode == 0x0D || opcode == 0x18);
+ break;
+ default:
+ scan_more = 0;
+ break;
+ }
+ }
+
+ return prefetch;
+}
+
+static inline int is_prefetch(struct pt_regs *regs, unsigned long addr)
+{
+ /* This code is included only when optimising for AMD, so we
+ may as well bias in favour of it. */
+ if (likely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 >= 6))
+ return __is_prefetch(regs, addr);
+ return 0;
+}
+
+#else /* CONFIG_X86_PREFETCH_FIXUP */
+
+#define is_prefetch(regs, addr) (0)
+
+#endif
+
asmlinkage void do_invalid_op(struct pt_regs *, unsigned long);
/*
@@ -110,7 +261,7 @@
* atomic region then we must not take the fault..
*/
if (in_atomic() || !mm)
- goto no_context;
+ goto bad_area_nosemaphore;
down_read(&mm->mmap_sem);
@@ -198,8 +349,16 @@
bad_area:
up_read(&mm->mmap_sem);
+bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
+ /*
+ * Valid to do another page fault here because this one came
+ * from user space.
+ */
+ if (is_prefetch(regs, address))
+ return;
+
tsk->thread.cr2 = address;
tsk->thread.error_code = error_code;
tsk->thread.trap_no = 14;
@@ -232,6 +391,14 @@
if (fixup_exception(regs))
return;
+ /*
+ * Valid to do another page fault here, because if this fault
+ * had been triggered by is_prefetch fixup_exception would have
+ * handled it.
+ */
+ if (is_prefetch(regs, address))
+ return;
+
/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
@@ -286,10 +453,14 @@
do_sigbus:
up_read(&mm->mmap_sem);
- /*
- * Send a sigbus, regardless of whether we were in kernel
- * or user mode.
- */
+ /* Kernel mode? Handle exceptions or die */
+ if (!(error_code & 4))
+ goto no_context;
+
+ /* User space => ok to do another page fault */
+ if (is_prefetch(regs, address))
+ return;
+
tsk->thread.cr2 = address;
tsk->thread.error_code = error_code;
tsk->thread.trap_no = 14;
@@ -298,10 +469,6 @@
info.si_code = BUS_ADRERR;
info.si_addr = (void *)address;
force_sig_info(SIGBUS, &info, tsk);
-
- /* Kernel mode? Handle exceptions or die */
- if (!(error_code & 4))
- goto no_context;
return;
vmalloc_fault:
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/include/asm-i386/cpufeature.h dual-2.6.0-test6/include/asm-i386/cpufeature.h
--- orig-2.6.0-test6/include/asm-i386/cpufeature.h 2003-09-30 05:41:04.520720265 +0100
+++ dual-2.6.0-test6/include/asm-i386/cpufeature.h 2003-09-30 06:03:46.302835762 +0100
@@ -68,6 +68,9 @@
#define X86_FEATURE_K7 (3*32+ 5) /* Athlon */
#define X86_FEATURE_P3 (3*32+ 6) /* P3 */
#define X86_FEATURE_P4 (3*32+ 7) /* P4 */
+/* Prefetch insns without AMD spurious faults, or with fixup for them. */
+#define X86_FEATURE_XMM_PREFETCH (3*32+ 8) /* SSE */
+#define X86_FEATURE_3DNOW_PREFETCH (3*32+ 9) /* 3DNow! */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_EST (4*32+ 7) /* Enhanced SpeedStep */
diff -urN --exclude-from=dontdiff orig-2.6.0-test6/include/asm-i386/processor.h dual-2.6.0-test6/include/asm-i386/processor.h
--- orig-2.6.0-test6/include/asm-i386/processor.h 2003-09-30 05:41:04.861655987 +0100
+++ dual-2.6.0-test6/include/asm-i386/processor.h 2003-09-30 06:03:46.334829529 +0100
@@ -589,11 +589,9 @@
#define ARCH_HAS_PREFETCH
extern inline void prefetch(const void *x)
{
- if (cpu_data[0].x86_vendor == X86_VENDOR_AMD)
- return; /* Some athlons fault if the address is bad */
alternative_input(ASM_NOP4,
"prefetchnta (%1)",
- X86_FEATURE_XMM,
+ X86_FEATURE_XMM_PREFETCH,
"r" (x));
}
@@ -607,7 +605,7 @@
{
alternative_input(ASM_NOP4,
"prefetchw (%1)",
- X86_FEATURE_3DNOW,
+ X86_FEATURE_3DNOW_PREFETCH,
"r" (x));
}
#define spin_lock_prefetch(x) prefetchw(x)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 7:38 [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch Jamie Lokier
@ 2003-09-30 8:01 ` Nick Piggin
2003-09-30 13:22 ` Dave Jones
1 sibling, 0 replies; 45+ messages in thread
From: Nick Piggin @ 2003-09-30 8:01 UTC (permalink / raw)
To: Jamie Lokier; +Cc: akpm, torvalds, linux-kernel, richard.brunner
Jamie Lokier wrote:
>This builds upon Andi Kleen's excellent patch for the AMD prefetch bug
>workaround. It is applies to 2.6.0-test6.
>
>There are four changes on top of Andi's recent patch:
>
> 1. The workaround code is only included when compiling a kernel
> optimised for AMD processors with the bug. For kernels optimised
> for a different processor, there is no code size overhead.
>
> This will make some people happier.
>
> It will make some people unhappier, because it means a generic
> kernel on an AMD will run fine, but won't fixup _userspace_
> prefetch faults. More on this below.
>
> 2. Nevertheless, when a kernel _not_ optimised for those AMD processors
> is run on one, the "alternative" mechanism now correctly replaces
> prefetch instructions with nops.
>
> 3. The is_prefetch() instruction decoder now handles all cases of
> funny GDT and LDT segments, checks limits and so on. As far as I
> can see, there are no further bugs or flaws in that part.
>
> 4. A consequence of the is_prefetch() change is that, despite the
> longer C code (rarely used code, for segments), is_prefetch()
> should run marginally faster now because the access_ok() calls are
> not required. They're subsumed into the segment limit
> calculation.
>
>Like Andi's patch, this removes 10k or so from current x86 kernels by
>removing the silly conditional from the prefetch() function.
>
>Controversial point coming up!
>
It sounds good Jamie, but could you possibly compile it in
unconditionally? Otherwise it introduces yet another caveat to the CPU
selection / build system. This can be addressed seperately.
I don't think anyone advocates attempting to handle this nicely with the
current CPU selection system, nor were they targeting this patch in
particular. It just kicked off a discussion about the shortfalls of the
selection system.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 7:38 [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch Jamie Lokier
2003-09-30 8:01 ` Nick Piggin
@ 2003-09-30 13:22 ` Dave Jones
2003-09-30 13:39 ` Jamie Lokier
1 sibling, 1 reply; 45+ messages in thread
From: Dave Jones @ 2003-09-30 13:22 UTC (permalink / raw)
To: Jamie Lokier; +Cc: akpm, torvalds, linux-kernel, richard.brunner
On Tue, Sep 30, 2003 at 08:38:14AM +0100, Jamie Lokier wrote:
> diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/Kconfig dual-2.6.0-test6/arch/i386/Kconfig
> --- orig-2.6.0-test6/arch/i386/Kconfig 2003-09-30 05:39:33.467103692 +0100
> +++ dual-2.6.0-test6/arch/i386/Kconfig 2003-09-30 06:05:19.868623767 +0100
> @@ -397,6 +397,12 @@
> depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6
> default y
>
> +config X86_PREFETCH_FIXUP
> + bool
> + depends on MK7 || MK8
> + default y
> +
> +
> config HPET_TIMER
> bool "HPET Timer Support"
> help
> diff -urN --exclude-from=dontdiff orig-2.6.0-test6/arch/i386/kernel/cpu/common.c dual-2.6.0-test6/arch/i386/kernel/cpu/common.c
> --- orig-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 05:39:33.871035696 +0100
> +++ dual-2.6.0-test6/arch/i386/kernel/cpu/common.c 2003-09-30 06:03:46.222851345 +0100
> @@ -327,6 +327,17 @@
> * we do "generic changes."
> */
>
> + /* Prefetch works ok? */
> +#ifndef CONFIG_X86_PREFETCH_FIXUP
> + if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
> +#endif
> + {
> + if (cpu_has(c, X86_FEATURE_XMM))
> + set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
> + if (cpu_has(c, X86_FEATURE_3DNOW))
> + set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
> + }
> +
> /* TSC disabled? */
> if ( tsc_disable )
This looks to be completely gratuitous. Why disable it when we have the
ability to work around it ?
Dave
--
Dave Jones http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 13:22 ` Dave Jones
@ 2003-09-30 13:39 ` Jamie Lokier
2003-09-30 13:53 ` Dave Jones
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-09-30 13:39 UTC (permalink / raw)
To: Dave Jones, akpm, torvalds, linux-kernel, richard.brunner
Dave Jones wrote:
> This looks to be completely gratuitous. Why disable it when we have the
> ability to work around it ?
Because some people expressed a wish to have kernels that don't
contain the workaround code, be they P4-optimised or 486-optimised
kernels. After all we have kernels that don't contain the F00F
workaround too. I'm not pushing this patch as is, it's for
considering the pros and cons.
CONFIG_X86_PREFETCH_WORKAROUND makes more makes more sense with the
recently available "split every x86 CPU into individually selectable
options" patch, and, on reflection, that's probably where it belongs.
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 13:39 ` Jamie Lokier
@ 2003-09-30 13:53 ` Dave Jones
2003-09-30 14:45 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Dave Jones @ 2003-09-30 13:53 UTC (permalink / raw)
To: Jamie Lokier; +Cc: akpm, torvalds, linux-kernel, richard.brunner
On Tue, Sep 30, 2003 at 02:39:36PM +0100, Jamie Lokier wrote:
> Dave Jones wrote:
> > This looks to be completely gratuitous. Why disable it when we have the
> > ability to work around it ?
>
> Because some people expressed a wish to have kernels that don't
> contain the workaround code, be they P4-optimised or 486-optimised
> kernels.
And those people are wrong. If they want to save bloat, instead of
'fixing' things by removing <1 page of .text, how about working on
some of the real problems like shrinking some of the growth of various
data structures that actually *matter*.
The "I don't want Athlon code in my kernel because I run a P4 and
it makes it slow/bigger" argument is totally bogus. It's akin to
the gentoo-esque "I compiled my distro with -march=p4 and now
my /bin/ls is faster than yours" argument.
> After all we have kernels that don't contain the F00F
> workaround too. I'm not pushing this patch as is, it's for
> considering the pros and cons.
F00F workaround was enabled on every kernel that is possible
to boot on affected hardware last time I looked.
This is what you seem to be missing, it's not optional.
If its possible to boot that kernel on affected hardware,
it needs errata workarounds.
> CONFIG_X86_PREFETCH_WORKAROUND makes more makes more sense with the
> recently available "split every x86 CPU into individually selectable
> options" patch, and, on reflection, that's probably where it belongs.
Said patch isn't included in mainline, so this argument is bogus.
Relative merits of that patch were already discussed in another thread.
Dave
--
Dave Jones http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 13:53 ` Dave Jones
@ 2003-09-30 14:45 ` Jamie Lokier
2003-09-30 15:08 ` Dave Jones
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-09-30 14:45 UTC (permalink / raw)
To: Dave Jones, akpm, torvalds, linux-kernel, richard.brunner
Dave Jones wrote:
> On Tue, Sep 30, 2003 at 02:39:36PM +0100, Jamie Lokier wrote:
> > Dave Jones wrote:
> > > This looks to be completely gratuitous. Why disable it when we have the
> > > ability to work around it ?
> >
> > Because some people expressed a wish to have kernels that don't
> > contain the workaround code, be they P4-optimised or 486-optimised
> > kernels.
>
> And those people are wrong. If they want to save bloat, instead of
> 'fixing' things by removing <1 page of .text, how about working on
> some of the real problems like shrinking some of the growth of various
> data structures that actually *matter*.
How about both?
> The "I don't want Athlon code in my kernel because I run a P4 and
> it makes it slow/bigger" argument is totally bogus. It's akin to
> the gentoo-esque "I compiled my distro with -march=p4 and now
> my /bin/ls is faster than yours" argument.
I'm talking about people with embedded 486s or old 486s donated. P4s
are abundant in RAM, but 2MB is still not unheard of in the small
boxes, and in 2MB, 512 bytes of code (which is about the size of the
prefetch workaround) is more significant. Not by itself, but by
virtue of being yet another unnecessary, and very clearly removable
chunk. Diligence and all that.
> > After all we have kernels that don't contain the F00F
> > workaround too. I'm not pushing this patch as is, it's for
> > considering the pros and cons.
>
> F00F workaround was enabled on every kernel that is possible
> to boot on affected hardware last time I looked.
> This is what you seem to be missing, it's not optional.
> If its possible to boot that kernel on affected hardware,
> it needs errata workarounds.
We have a few confusing issues here.
1. First, your point about affected hardware.
I see your point, and the new "alternative" macro is allowing things
to develop along those lines even better: using fancier features (like
prefetch) but not requiring them. Most of the x86 kernel is like that now.
All the stuff in arch/i386/kernel/cpu/* is mandatory. Yet:
- I don't see anything that prevents a PPro-compiled kernel from booting
on a P5MMX with the F00F erratum.
- Nor do I see anything that prevents a PII-compiled kernel from booting
on a PPro with the store ordering erratum (X86_PPRO_FENCE).
Those are both nasty bugs, and the latter can corrupt data on an SMP PPro.
Currently, it seems quite hypocritical to treat the AMD workaround
as, in effect, mandatory, while there are others which aren't.
Perhaps it's this apparent hypocrisy which needs healing.
2. I'm not sure if you're criticising the other chap who wants
rid of the AMD errata workaround, or my X86_PREFETCH_FIXUP code.
In case you hadn't fully grokked it, my code doesn't disable the
workaround! It simply substitutes it for a smaller, slightly
slower one, on kernels which are not optimised for AMD. Those
kernels continue to work just fine on AMD processors.
Given that, I'm not sure what the thrust of your argument is.
> > CONFIG_X86_PREFETCH_WORKAROUND makes more makes more sense with the
> > recently available "split every x86 CPU into individually selectable
> > options" patch, and, on reflection, that's probably where it belongs.
>
> Said patch isn't included in mainline, so this argument is bogus.
> Relative merits of that patch were already discussed in another thread.
That wasn't an argument and the patch from me isn't in the mainline
either, but anyway I agree that topic has its own thread. Let's
please leave it out of this one and focus on the merits, or otherwise,
of my patch and Andi's.
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 14:45 ` Jamie Lokier
@ 2003-09-30 15:08 ` Dave Jones
2003-09-30 16:54 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Dave Jones @ 2003-09-30 15:08 UTC (permalink / raw)
To: Jamie Lokier; +Cc: akpm, torvalds, linux-kernel, richard.brunner
On Tue, Sep 30, 2003 at 03:45:26PM +0100, Jamie Lokier wrote:
> > And those people are wrong. If they want to save bloat, instead of
> > 'fixing' things by removing <1 page of .text, how about working on
> > some of the real problems like shrinking some of the growth of various
> > data structures that actually *matter*.
> How about both?
Sounds like wasted effort, in the same sense that rewriting a crap
algorithm in assembly won't be better than using a more efficient
algorithm in C.
> I'm talking about people with embedded 486s or old 486s donated. P4s
> are abundant in RAM
Mine has 256MB. Sure its a huge amount in comparison to how we kitted
out 486s a few years back, but still hardly an abundance in todays bloatware..
> but 2MB is still not unheard of in the small
> boxes, and in 2MB, 512 bytes of code (which is about the size of the
> prefetch workaround) is more significant.
I'l be *amazed* if you manage to get a 2.6 kernel to boot and function
efficiently in 2MB without config'ing away a lot of the 'growth' like
sysfs. (Sidenote: Before some loon actually tries this, by function
efficiently, I mean is actually usable for some purpose, not "it booted
to a bash prompt")
> > F00F workaround was enabled on every kernel that is possible
> > to boot on affected hardware last time I looked.
> > This is what you seem to be missing, it's not optional.
> > If its possible to boot that kernel on affected hardware,
> > it needs errata workarounds.
>
> We have a few confusing issues here.
>
> 1. First, your point about affected hardware.
>
> - I don't see anything that prevents a PPro-compiled kernel from booting
> on a P5MMX with the F00F erratum.
Compiled with -m686 - Uses CMOV, won't boot.
> - Nor do I see anything that prevents a PII-compiled kernel from booting
> on a PPro with the store ordering erratum (X86_PPRO_FENCE).
Correct. As noted in another mail, it arguably should contain the
workaround.
> Perhaps it's this apparent hypocrisy which needs healing.
Agreed.
> 2. I'm not sure if you're criticising the other chap who wants
> rid of the AMD errata workaround, or my X86_PREFETCH_FIXUP code.
My criticism was twofold.
1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
other than complication.
2. THis chunk...
+ /* Prefetch works ok? */
+#ifndef CONFIG_X86_PREFETCH_FIXUP
+ if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
+#endif
+ {
+ if (cpu_has(c, X86_FEATURE_XMM))
+ set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
+ if (cpu_has(c, X86_FEATURE_3DNOW))
+ set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
+ }
- If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
code path isn't taken, and we end up not doing prefetches on P4's too
as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
leaves them as NOPs.
- Newer C3s are 686's with prefetch, this nobbles them too.
> In case you hadn't fully grokked it, my code doesn't disable the
> workaround! It simply substitutes it for a smaller, slightly
> slower one, on kernels which are not optimised for AMD.
See above. Or have I missed something ?
> Given that, I'm not sure what the thrust of your argument is.
It's possible I'm missing something silly..
Dave
--
Dave Jones http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 15:08 ` Dave Jones
@ 2003-09-30 16:54 ` Jamie Lokier
2003-09-30 17:26 ` Dave Jones
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-09-30 16:54 UTC (permalink / raw)
To: Dave Jones, akpm, torvalds, linux-kernel, richard.brunner
Dave Jones wrote:
> > - I don't see anything that prevents a PPro-compiled kernel from booting
> > on a P5MMX with the F00F erratum.
>
> Compiled with -m686 - Uses CMOV, won't boot.
Ok, not PPro, but with Processor Family set to K6, CYRIXIII, or any of
the 3 WINCHIP choices, it is compiled with -march=i585 and without F00F.
(Fixing that by adding F00F too all those non-Intel processors, just to
make sure non-F00F kernels crash with a cmov instruction is too subtle
for my taste.)
Anyway, it should complain about lack of cmov not crash :)
> 1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
> X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
> other than complication.
I once suggested turning off XMM entirely when prefetch is broken and
not fixed, but that resulted in a mild toasting, hence the extra
synthetic flag.
It's not really split: XMM_PREFETCH is an additional flag, on top of
XMM, which simply means Linux decided it's safe to use the prefetch
instruction.
The only reason it's a feature flag is because the "alternative" macro
needs one.
> + /* Prefetch works ok? */
> +#ifndef CONFIG_X86_PREFETCH_FIXUP
> + if (c->x86_vendor != X86_VENDOR_AMD || c->x86 < 6)
> +#endif
> + {
> + if (cpu_has(c, X86_FEATURE_XMM))
> + set_bit(X86_FEATURE_XMM_PREFETCH, c->x86_capability);
> + if (cpu_has(c, X86_FEATURE_3DNOW))
> + set_bit(X86_FEATURE_3DNOW_PREFETCH, c->x86_capability);
> + }
>
> - If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
> code path isn't taken, and we end up not doing prefetches on P4's too
> as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
> leaves them as NOPs.
> - Newer C3s are 686's with prefetch, this nobbles them too.
Read the code again. It does what you think it doesn't do, so to speak.
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 16:54 ` Jamie Lokier
@ 2003-09-30 17:26 ` Dave Jones
2003-09-30 23:55 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Dave Jones @ 2003-09-30 17:26 UTC (permalink / raw)
To: Jamie Lokier; +Cc: akpm, torvalds, linux-kernel, richard.brunner
On Tue, Sep 30, 2003 at 05:54:50PM +0100, Jamie Lokier wrote:
> > > - I don't see anything that prevents a PPro-compiled kernel from booting
> > > on a P5MMX with the F00F erratum.
> > Compiled with -m686 - Uses CMOV, won't boot.
> Ok, not PPro, but with Processor Family set to K6, CYRIXIII, or any of
> the 3 WINCHIP choices, it is compiled with -march=i585 and without F00F.
in theory, these should be interchangable. Not tested though.
> (Fixing that by adding F00F too all those non-Intel processors, just to
> make sure non-F00F kernels crash with a cmov instruction is too subtle
> for my taste.)
this case is a little more obscure imo, and not worth the effort.
> Anyway, it should complain about lack of cmov not crash :)
not easy, given we execute cmov instructions before we even hit
a printk. Such a test & output needs to be done in assembly in early
startup.
> > 1. The splitting of X86_FEATURE_XMM into X86_FEATURE_XMM_PREFETCH and
> > X86_FEATURE_3DNOW_PREFETCH doesn't seem to really buy us anything
> > other than complication.
> I once suggested turning off XMM entirely when prefetch is broken and
> not fixed, but that resulted in a mild toasting, hence the extra
> synthetic flag.
Which gets us back to the question of why this is needed at all ?
You said earlier "In case you hadn't fully grokked it, my code doesn't
disable the workaround!" So why do you need this ?
> > - If we haven't set CONFIG_X86_PREFETCH_FIXUP (say a P4 kernel), this
> > code path isn't taken, and we end up not doing prefetches on P4's too
> > as you're not setting X86_FEATURE_XMM_PREFETCH anywhere else, and apply_alternatives
> > leaves them as NOPs.
> > - Newer C3s are 686's with prefetch, this nobbles them too.
> Read the code again. It does what you think it doesn't do, so to speak.
Yep, brain fart.
Dave
--
Dave Jones http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 17:26 ` Dave Jones
@ 2003-09-30 23:55 ` Jamie Lokier
2003-10-01 0:27 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-09-30 23:55 UTC (permalink / raw)
To: Dave Jones, akpm, torvalds, linux-kernel, richard.brunner
Dave Jones wrote:
> Which gets us back to the question of why this is needed at all ?
> You said earlier "In case you hadn't fully grokked it, my code doesn't
> disable the workaround!" So why do you need this ?
To change the prefetch workaround from a critical requirement to an
optimisation knob.
X86_USE_3DNOW is very similar: if it's enabled, the kernel has some
extra code to make certain CPUs run faster, but they also run fine
without it. X86_OOSTORE is another.
What I'd really like your opinion on is the appropriate userspace
behaviour. If we don't care about fixing up userspace, then
__ex_table is a much tidier workaround for the prefetch bug. If we do
care about fixing up userspace, then do we need a policy decision that
says it's not acceptable to run on AMD without userspace fixups from
2.6.0 onwards - it must fixup userspace or refuse to run?
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-09-30 23:55 ` Jamie Lokier
@ 2003-10-01 0:27 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2003-10-01 0:27 UTC (permalink / raw)
To: Jamie Lokier; +Cc: davej, torvalds, linux-kernel, richard.brunner
Jamie Lokier <jamie@shareable.org> wrote:
>
> What I'd really like your opinion on is the appropriate userspace
> behaviour. If we don't care about fixing up userspace, then
> __ex_table is a much tidier workaround for the prefetch bug.
I think we should fix up userspace.
> If we do
> care about fixing up userspace, then do we need a policy decision that
> says it's not acceptable to run on AMD without userspace fixups from
> 2.6.0 onwards - it must fixup userspace or refuse to run?
If you're saying that the kernel should refuse to boot if it discovers that
a) the CPU needs the workaround and b) the kernel does not implement the
workaround then yup.
^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <20030930073814.GA26649@mail.jlokier.co.uk.suse.lists.linux.kernel>]
* RE: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
@ 2003-10-01 1:54 Nakajima, Jun
2003-10-01 2:07 ` Andrew Morton
2003-10-01 2:08 ` Mike Fedyk
0 siblings, 2 replies; 45+ messages in thread
From: Nakajima, Jun @ 2003-10-01 1:54 UTC (permalink / raw)
To: Andrew Morton, Jamie Lokier
Cc: davej, torvalds, linux-kernel, richard.brunner
> I think we should fix up userspace.
What do you mean by this? Patch user code at runtime (it's much more
complex than it sounds) or remove prefetch instructions from userspace?
Jun
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Andrew Morton
> Sent: Tuesday, September 30, 2003 5:27 PM
> To: Jamie Lokier
> Cc: davej@redhat.com; torvalds@osdl.org; linux-kernel@vger.kernel.org;
> richard.brunner@amd.com
> Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch
errata
> patch
>
> Jamie Lokier <jamie@shareable.org> wrote:
> >
> > What I'd really like your opinion on is the appropriate userspace
> > behaviour. If we don't care about fixing up userspace, then
> > __ex_table is a much tidier workaround for the prefetch bug.
>
> I think we should fix up userspace.
>
> > If we do
> > care about fixing up userspace, then do we need a policy decision
that
> > says it's not acceptable to run on AMD without userspace fixups from
> > 2.6.0 onwards - it must fixup userspace or refuse to run?
>
> If you're saying that the kernel should refuse to boot if it discovers
> that
> a) the CPU needs the workaround and b) the kernel does not implement
the
> workaround then yup.
>
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 1:54 Nakajima, Jun
@ 2003-10-01 2:07 ` Andrew Morton
2003-10-01 2:08 ` Mike Fedyk
1 sibling, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2003-10-01 2:07 UTC (permalink / raw)
To: Nakajima, Jun; +Cc: jamie, davej, torvalds, linux-kernel, richard.brunner
"Nakajima, Jun" <jun.nakajima@intel.com> wrote:
>
> > I think we should fix up userspace.
> What do you mean by this? Patch user code at runtime (it's much more
> complex than it sounds) or remove prefetch instructions from userspace?
Detect when user code stumbles over this CPU errata and make it look like
nothing happened.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 1:54 Nakajima, Jun
2003-10-01 2:07 ` Andrew Morton
@ 2003-10-01 2:08 ` Mike Fedyk
1 sibling, 0 replies; 45+ messages in thread
From: Mike Fedyk @ 2003-10-01 2:08 UTC (permalink / raw)
To: Nakajima, Jun
Cc: Andrew Morton, Jamie Lokier, davej, torvalds, linux-kernel,
richard.brunner
On Tue, Sep 30, 2003 at 06:54:06PM -0700, Nakajima, Jun wrote:
> > I think we should fix up userspace.
> What do you mean by this? Patch user code at runtime (it's much more
> complex than it sounds) or remove prefetch instructions from userspace?
No, just make sure that userspace doesn't see the exceptions that the errata
will produce in obscure cases. It is all explained in several threads about
this. Search for "Andi Kleen" and "prefetch".
^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
@ 2003-10-01 2:23 Nakajima, Jun
2003-10-01 2:51 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Nakajima, Jun @ 2003-10-01 2:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: jamie, davej, torvalds, linux-kernel, richard.brunner
Oh, I thought you already closed this issue and you were discussing a
different thing.
I agree. They kernel should fix up userspace for this CPU errata.
Jun
> -----Original Message-----
> From: Andrew Morton [mailto:akpm@osdl.org]
> Sent: Tuesday, September 30, 2003 7:07 PM
> To: Nakajima, Jun
> Cc: jamie@shareable.org; davej@redhat.com; torvalds@osdl.org; linux-
> kernel@vger.kernel.org; richard.brunner@amd.com
> Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch
errata
> patch
>
> "Nakajima, Jun" <jun.nakajima@intel.com> wrote:
> >
> > > I think we should fix up userspace.
> > What do you mean by this? Patch user code at runtime (it's much
more
> > complex than it sounds) or remove prefetch instructions from
userspace?
>
> Detect when user code stumbles over this CPU errata and make it look
like
> nothing happened.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 2:23 Nakajima, Jun
@ 2003-10-01 2:51 ` Jamie Lokier
2003-10-01 3:14 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-10-01 2:51 UTC (permalink / raw)
To: Nakajima, Jun
Cc: Andrew Morton, davej, torvalds, linux-kernel, richard.brunner
Nakajima, Jun wrote:
> Oh, I thought you already closed this issue and you were discussing a
> different thing.
>
> I agree. They kernel should fix up userspace for this CPU errata.
Our question is whether kernels configured specifically for non-AMD
(e.g. Winchip or Elan kernels) must have the AMD fixup code (it is a
few hundred bytes), refuse to boot on AMD, ignore the problem, or work
but not fix up userspace.
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 2:51 ` Jamie Lokier
@ 2003-10-01 3:14 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2003-10-01 3:14 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jun.nakajima, davej, torvalds, linux-kernel, richard.brunner
Jamie Lokier <jamie@shareable.org> wrote:
>
> Nakajima, Jun wrote:
> > Oh, I thought you already closed this issue and you were discussing a
> > different thing.
> >
> > I agree. They kernel should fix up userspace for this CPU errata.
>
> Our question is whether kernels configured specifically for non-AMD
> (e.g. Winchip or Elan kernels) must have the AMD fixup code (it is a
> few hundred bytes), refuse to boot on AMD, ignore the problem, or work
> but not fix up userspace.
Refusing to boot would be best I think. Fixing it up anyway would be OK
too.
Ignoring the problem in kernel and/or userspace could be viewed as a
response to pilot error but as always it would be nice if, given a choice,
we can help the pilot.
^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
@ 2003-10-01 4:30 Nakajima, Jun
2003-10-01 5:38 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Nakajima, Jun @ 2003-10-01 4:30 UTC (permalink / raw)
To: Andrew Morton, Jamie Lokier
Cc: davej, torvalds, linux-kernel, richard.brunner
> Refusing to boot would be best I think. Fixing it up anyway would be
OK
> too.
>
I agree.
> Ignoring the problem in kernel and/or userspace could be viewed as a
> response to pilot error but as always it would be nice if, given a
choice,
> we can help the pilot.
Yes. It would kind to tell the pilot about the errata of the engine (and
refuse to start), rather than telling him that the engine might break
down anytime after the takeoff.
Jun
> -----Original Message----
> From: Andrew Morton [mailto:akpm@osdl.org]
> Sent: Tuesday, September 30, 2003 8:14 PM
> To: Jamie Lokier
> Cc: Nakajima, Jun; davej@redhat.com; torvalds@osdl.org; linux-
> kernel@vger.kernel.org; richard.brunner@amd.com
> Subject: Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch
errata
> patch
>
> Jamie Lokier <jamie@shareable.org> wrote:
> >
> > Nakajima, Jun wrote:
> > > Oh, I thought you already closed this issue and you were
discussing a
> > > different thing.
> > >
> > > I agree. The kernel should fix up userspace for this CPU errata.
> >
> > Our question is whether kernels configured specifically for non-AMD
> > (e.g. Winchip or Elan kernels) must have the AMD fixup code (it is a
> > few hundred bytes), refuse to boot on AMD, ignore the problem, or
work
> > but not fix up userspace.
>
> Refusing to boot would be best I think. Fixing it up anyway would be
OK
> too.
>
> Ignoring the problem in kernel and/or userspace could be viewed as a
> response to pilot error but as always it would be nice if, given a
choice,
> we can help the pilot.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 4:30 Nakajima, Jun
@ 2003-10-01 5:38 ` Jamie Lokier
2003-10-01 5:48 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-10-01 5:38 UTC (permalink / raw)
To: Nakajima, Jun
Cc: Andrew Morton, davej, torvalds, linux-kernel, richard.brunner
Nakajima, Jun wrote:
> Yes. It would kind to tell the pilot about the errata of the engine (and
> refuse to start), rather than telling him that the engine might break
> down anytime after the takeoff.
Doesn't refusing to boot seem to heavy handed for this bug? The buggy
CPUs have been around for many years (it is practically the entire AMD
line for the last 4 years or so), and nobody in userspace has
complained about the 2.4 behaviour so far. (Linux 2.4 behaviour is,
of course, to ignore the errata).
ps. Why are we all saying errata - isn't erratum good enough? :)
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 5:38 ` Jamie Lokier
@ 2003-10-01 5:48 ` Andrew Morton
2003-10-01 6:13 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2003-10-01 5:48 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jun.nakajima, davej, torvalds, linux-kernel, richard.brunner
Jamie Lokier <jamie@shareable.org> wrote:
>
> Nakajima, Jun wrote:
> > Yes. It would kind to tell the pilot about the errata of the engine (and
> > refuse to start), rather than telling him that the engine might break
> > down anytime after the takeoff.
>
> Doesn't refusing to boot seem to heavy handed for this bug? The buggy
> CPUs have been around for many years (it is practically the entire AMD
> line for the last 4 years or so), and nobody in userspace has
> complained about the 2.4 behaviour so far. (Linux 2.4 behaviour is,
> of course, to ignore the errata).
That is the case at present. But the 2.6 kernel was hitting this
erracularity daily.
If some smart cookie decides to add prefetches to some STL implementation
or something, they are likely to start hitting it with the same frequency.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 5:48 ` Andrew Morton
@ 2003-10-01 6:13 ` Jamie Lokier
2003-10-01 6:32 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Jamie Lokier @ 2003-10-01 6:13 UTC (permalink / raw)
To: Andrew Morton
Cc: jun.nakajima, davej, torvalds, linux-kernel, richard.brunner
Andrew Morton wrote:
> > Doesn't refusing to boot seem to heavy handed for this bug? The buggy
> > CPUs have been around for many years (it is practically the entire AMD
> > line for the last 4 years or so), and nobody in userspace has
> > complained about the 2.4 behaviour so far. (Linux 2.4 behaviour is,
> > of course, to ignore the errata).
>
> That is the case at present. But the 2.6 kernel was hitting this
> erracularity daily.
We're talking about what to offer userspace now... I think we all
agree that the kernel itself shouldn't be allowed to hit it, one way
or another.
> If some smart cookie decides to add prefetches to some STL implementation
> or something, they are likely to start hitting it with the same frequency.
Especially now that GCC has intrinsics for prefetches, and GCC's
optimiser can generate prefetches automatically (-fprefetch-loop-arrays).
But if they assume the kernel hides it, they'll still hit it on any of
the huge installed base of <=4 year old AMD boxes running 2.2 and 2.4.
Ideally, userspace should just not compile with prefetch if they think
they might run on one of those - or select different code at run time
- it is not so different from the constraints which say SSE code
simply won't run on some CPUs or old kernels anyway. (prefetch is
worse on a P5 than a K7 - it'll throw an illegal opcode exception :)
I understand you're advocating a policy that says we can't do anything
about old systems, but from 2.6 onwards apps can depend on not being
hit by that erratum in userspace, is that right?
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 6:13 ` Jamie Lokier
@ 2003-10-01 6:32 ` Andrew Morton
2003-10-01 6:57 ` Jamie Lokier
0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2003-10-01 6:32 UTC (permalink / raw)
To: Jamie Lokier; +Cc: jun.nakajima, davej, torvalds, linux-kernel, richard.brunner
Jamie Lokier <jamie@shareable.org> wrote:
>
> Andrew Morton wrote:
> > > Doesn't refusing to boot seem to heavy handed for this bug? The buggy
> > > CPUs have been around for many years (it is practically the entire AMD
> > > line for the last 4 years or so), and nobody in userspace has
> > > complained about the 2.4 behaviour so far. (Linux 2.4 behaviour is,
> > > of course, to ignore the errata).
> >
> > That is the case at present. But the 2.6 kernel was hitting this
> > erracularity daily.
>
> We're talking about what to offer userspace now... I think we all
> agree that the kernel itself shouldn't be allowed to hit it, one way
> or another.
Oh yes, if it hits in-kernel you get a dead box.
Looking at Andi's patch, it is also a dead box if the fault happens inside
down_write(mmap_sem). That should be fixed, methinks.
And I think we're also a bit deadlocky if it happens inside down_read(),
because double down_read() is illegal because an intervening down_write()
from another thread can block the second down_read(). Or maybe not: the
rwsem semantics may have changed since I last looked.
And if the fault happens inside spinlock on a !CONFIG_PREEMPT kernel we end
up doing down_read() with spinlocks held, I think?
> > If some smart cookie decides to add prefetches to some STL implementation
> > or something, they are likely to start hitting it with the same frequency.
>
> Especially now that GCC has intrinsics for prefetches, and GCC's
> optimiser can generate prefetches automatically (-fprefetch-loop-arrays).
Yup. Although prefetch-loop-arrays doesn't sound like something which will
deref a dud pointer?
> ...
> I understand you're advocating a policy that says we can't do anything
> about old systems, but from 2.6 onwards apps can depend on not being
> hit by that erratum in userspace, is that right?
I expect this will be backported to 2.4 asap.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch
2003-10-01 6:32 ` Andrew Morton
@ 2003-10-01 6:57 ` Jamie Lokier
0 siblings, 0 replies; 45+ messages in thread
From: Jamie Lokier @ 2003-10-01 6:57 UTC (permalink / raw)
To: Andrew Morton
Cc: jun.nakajima, davej, torvalds, linux-kernel, richard.brunner
Andrew Morton wrote:
> Looking at Andi's patch, it is also a dead box if the fault happens inside
> down_write(mmap_sem). That should be fixed, methinks.
>
> And I think we're also a bit deadlocky if it happens inside down_read(),
> because double down_read() is illegal because an intervening down_write()
> from another thread can block the second down_read(). Or maybe not: the
> rwsem semantics may have changed since I last looked.
>
> And if the fault happens inside spinlock on a !CONFIG_PREEMPT kernel we end
> up doing down_read() with spinlocks held, I think?
Ouch, ouch and ouch. You're right, it is a serious problem with the
patch. It is easy enough to fix by making the fault handler not take
mmap_sem if the fault's in the kernel address range. (With apologies
to the folk running kernel mode userspace...)
> Yup. Although prefetch-loop-arrays doesn't sound like something which will
> deref a dud pointer?
It might deref off the end of a mapped region.
> > I understand you're advocating a policy that says we can't do anything
> > about old systems, but from 2.6 onwards apps can depend on not being
> > hit by that erratum in userspace, is that right?
>
> I expect this will be backported to 2.4 asap.
I agree, but there's going to be an installed base of _current_ 2.4
kernels, on these AMD CPUs, for some years to come. App and library
writers will have to know about the erratum if they want to use the
prefetch instruction, for a while yet.
Btw, does anyone know if the "prefetchw" instruction is affected by
the erratum? I see that in current 2.6, only the "prefetchnta"
instruction is disabled so presumably "prefetchw" is ok?
-- Jamie
^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <7F740D512C7C1046AB53446D3720017304AFCF@scsmsx402.sc.intel.com.suse.lists.linux.kernel>]
end of thread, other threads:[~2003-10-01 16:18 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-09-30 7:38 [PATCH] Mutilated form of Andi Kleen's AMD prefetch errata patch Jamie Lokier
2003-09-30 8:01 ` Nick Piggin
2003-09-30 13:22 ` Dave Jones
2003-09-30 13:39 ` Jamie Lokier
2003-09-30 13:53 ` Dave Jones
2003-09-30 14:45 ` Jamie Lokier
2003-09-30 15:08 ` Dave Jones
2003-09-30 16:54 ` Jamie Lokier
2003-09-30 17:26 ` Dave Jones
2003-09-30 23:55 ` Jamie Lokier
2003-10-01 0:27 ` Andrew Morton
[not found] <20030930073814.GA26649@mail.jlokier.co.uk.suse.lists.linux.kernel>
[not found] ` <20030930132211.GA23333@redhat.com.suse.lists.linux.kernel>
[not found] ` <20030930133936.GA28876@mail.shareable.org.suse.lists.linux.kernel>
[not found] ` <20030930135324.GC5507@redhat.com.suse.lists.linux.kernel>
[not found] ` <20030930144526.GC28876@mail.shareable.org.suse.lists.linux.kernel>
[not found] ` <20030930150825.GD5507@redhat.com.suse.lists.linux.kernel>
[not found] ` <20030930165450.GF28876@mail.shareable.org.suse.lists.linux.kernel>
[not found] ` <20030930172618.GE5507@redhat.com.suse.lists.linux.kernel>
2003-09-30 19:08 ` Andi Kleen
2003-09-30 20:08 ` H. Peter Anvin
2003-10-01 1:54 Nakajima, Jun
2003-10-01 2:07 ` Andrew Morton
2003-10-01 2:08 ` Mike Fedyk
2003-10-01 2:23 Nakajima, Jun
2003-10-01 2:51 ` Jamie Lokier
2003-10-01 3:14 ` Andrew Morton
2003-10-01 4:30 Nakajima, Jun
2003-10-01 5:38 ` Jamie Lokier
2003-10-01 5:48 ` Andrew Morton
2003-10-01 6:13 ` Jamie Lokier
2003-10-01 6:32 ` Andrew Morton
2003-10-01 6:57 ` Jamie Lokier
[not found] <7F740D512C7C1046AB53446D3720017304AFCF@scsmsx402.sc.intel.com.suse.lists.linux.kernel>
[not found] ` <20031001053833.GB1131@mail.shareable.org.suse.lists.linux.kernel>
[not found] ` <20030930224853.15073447.akpm@osdl.org.suse.lists.linux.kernel>
[not found] ` <20031001061348.GE1131@mail.shareable.org.suse.lists.linux.kernel>
[not found] ` <20030930233258.37ed9f7f.akpm@osdl.org.suse.lists.linux.kernel>
2003-10-01 6:47 ` Andi Kleen
2003-10-01 7:00 ` Andrew Morton
2003-10-01 7:06 ` Andi Kleen
2003-10-01 7:31 ` Jamie Lokier
2003-10-01 7:41 ` Andi Kleen
2003-10-01 8:02 ` Hugh Dickins
2003-10-01 8:49 ` Andi Kleen
2003-10-01 9:33 ` Jamie Lokier
2003-10-01 14:51 ` Andrew Morton
2003-10-01 14:56 ` Andi Kleen
2003-10-01 15:19 ` Andrew Morton
2003-10-01 15:24 ` Andi Kleen
2003-10-01 16:18 ` Jamie Lokier
2003-10-01 7:20 ` Jamie Lokier
2003-10-01 7:39 ` Andi Kleen
2003-10-01 8:20 ` Jamie Lokier
[not found] ` <20031001065705.GI1131@mail.shareable.org.suse.lists.linux.kernel>
2003-10-01 7:15 ` Andi Kleen
2003-10-01 7:24 ` Andi Kleen
2003-10-01 7:55 ` Jamie Lokier
2003-10-01 8:00 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).