All of lore.kernel.org
 help / color / mirror / Atom feed
* Dynamic nop selection breaks boot on Geode LX
@ 2010-10-02 19:16 Daniel Drake
  2010-10-03  5:50 ` Borislav Petkov
  2010-10-04 15:46 ` Jason Baron
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Drake @ 2010-10-02 19:16 UTC (permalink / raw)
  To: jbaron; +Cc: Andres Salomon, Chris Ball, linux-kernel, rostedt

Hi,

linux-next fails to boot on OLPC's XO-1 laptop based on Geode LX processor.

This is because of

commit f49aa448561fe9215f43405cac6f31eb86317792
Author: Jason Baron <jbaron@redhat.com>
Date:   Fri Sep 17 11:08:51 2010 -0400

    jump label: Make dynamic no-op selection available outside of ftrace

http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=commitdiff;h=f49aa44

The Geode hangs inside the asm() statement inside arch_init_ideal_nop5.

Any thoughts?

I'm a bit out of my depth debugging assembly code, but will happily
take pointers and patches to help diagnose.

cheers
Daniel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-02 19:16 Dynamic nop selection breaks boot on Geode LX Daniel Drake
@ 2010-10-03  5:50 ` Borislav Petkov
  2010-10-03  9:26   ` Borislav Petkov
  2010-10-03 16:32   ` Daniel Drake
  2010-10-04 15:46 ` Jason Baron
  1 sibling, 2 replies; 19+ messages in thread
From: Borislav Petkov @ 2010-10-03  5:50 UTC (permalink / raw)
  To: Daniel Drake; +Cc: jbaron, Andres Salomon, Chris Ball, linux-kernel, rostedt

From: Daniel Drake <dsd@laptop.org>
Date: Sat, Oct 02, 2010 at 08:16:19PM +0100

> Hi,
>
> linux-next fails to boot on OLPC's XO-1 laptop based on Geode LX
> processor.
>
> This is because of
>
> commit f49aa448561fe9215f43405cac6f31eb86317792 Author: Jason Baron
> <jbaron@redhat.com> Date: Fri Sep 17 11:08:51 2010 -0400
>
>     jump label: Make dynamic no-op selection available outside of
> ftrace
>
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=co
> mmitdiff;h=f49aa44
>
> The Geode hangs inside the asm() statement inside
> arch_init_ideal_nop5.
>
> Any thoughts?

maybe because the Geode doesn't have both the
P6_NOP5 and the NOP with 4 0x66 prefixes:
http://kerneltrap.org/mailarchive/linux-kernel/2010/8/27/4612336

and for some reason the trap can't find the fixup address. You say
"hangs" so you don't even get an "invalid opcode" OOPS?

Maybe on a Geode jmp label should use the 2-byte jmp + 3 1-byte nops
unconditionally after adding a synthetic CPUID flag in init_amd_k6(), in
the Geode part.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-03  5:50 ` Borislav Petkov
@ 2010-10-03  9:26   ` Borislav Petkov
  2010-10-03 14:47     ` Borislav Petkov
  2010-10-03 16:32   ` Daniel Drake
  1 sibling, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2010-10-03  9:26 UTC (permalink / raw)
  To: Daniel Drake; +Cc: jbaron, Andres Salomon, Chris Ball, linux-kernel, rostedt

From: Borislav Petkov <bp@alien8.de>
Date: Sun, Oct 03, 2010 at 07:50:13AM +0200

> Maybe on a Geode jmp label should use the 2-byte jmp + 3 1-byte nops
> unconditionally after adding a synthetic CPUID flag in init_amd_k6(), in
> the Geode part.

Maybe something like the following? (only tested in kvm)

--
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 220e2ea..e2ef56c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -89,7 +89,7 @@
 #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
 #define X86_FEATURE_11AP	(3*32+19) /* "" Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL	(3*32+20) /* The NOPL (0F 1F) instructions */
-					  /* 21 available, was AMD_C1E */
+#define X86_FEATURE_NO_NOPL	(3*32+21) /* "" Missing NOPL (e.g. Geode LX) */
 #define X86_FEATURE_XTOPOLOGY	(3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC	(3*32+24) /* TSC does not stop in C states */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cb0e6d3..3e1fa13 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -666,6 +666,9 @@ void __init arch_init_ideal_nop5(void)
 	 *
 	 * TODO: check the cpuid to determine the best nop.
 	 */
+	if (boot_cpu_has(X86_FEATURE_NO_NOPL))
+		goto early_done;
+
 	asm volatile (
 		"ftrace_test_jmp:"
 		"jmp ftrace_test_p6nop\n"
@@ -688,6 +691,7 @@ void __init arch_init_ideal_nop5(void)
 		_ASM_EXTABLE(ftrace_test_nop5, 3b)
 		: "=r"(faulted) : "0" (faulted));
 
+early_done:
 	switch (faulted) {
 	case 0:
 		pr_info("converting mcount calls to 0f 1f 44 00 00\n");
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 7b875fd..d1e5cf4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -139,7 +139,7 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
 
 	if (c->x86_model == 10) {
 		/* AMD Geode LX is model 10 */
-		/* placeholder for any needed mods */
+		set_cpu_cap(c, X86_FEATURE_NO_NOPL);
 		return;
 	}
 }

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-03  9:26   ` Borislav Petkov
@ 2010-10-03 14:47     ` Borislav Petkov
  0 siblings, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2010-10-03 14:47 UTC (permalink / raw)
  To: Daniel Drake; +Cc: jbaron, Andres Salomon, Chris Ball, linux-kernel, rostedt

From: Borislav Petkov <bp@alien8.de>
Date: Sun, Oct 03, 2010 at 11:26:12AM +0200

> > Maybe on a Geode jmp label should use the 2-byte jmp + 3 1-byte nops
> > unconditionally after adding a synthetic CPUID flag in init_amd_k6(), in
> > the Geode part.
> 
> Maybe something like the following? (only tested in kvm)

Gaah, forgot the actual fallback to jmp + nop, test this one instead:

--
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 220e2ea..e2ef56c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -89,7 +89,7 @@
 #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
 #define X86_FEATURE_11AP	(3*32+19) /* "" Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL	(3*32+20) /* The NOPL (0F 1F) instructions */
-					  /* 21 available, was AMD_C1E */
+#define X86_FEATURE_NO_NOPL	(3*32+21) /* "" Missing NOPL (e.g. Geode LX) */
 #define X86_FEATURE_XTOPOLOGY	(3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC	(3*32+24) /* TSC does not stop in C states */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cb0e6d3..d5fad2b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -666,6 +666,11 @@ void __init arch_init_ideal_nop5(void)
 	 *
 	 * TODO: check the cpuid to determine the best nop.
 	 */
+	if (boot_cpu_has(X86_FEATURE_NO_NOPL)) {
+		faulted = 2;
+		goto early_done;
+	}
+
 	asm volatile (
 		"ftrace_test_jmp:"
 		"jmp ftrace_test_p6nop\n"
@@ -688,6 +693,7 @@ void __init arch_init_ideal_nop5(void)
 		_ASM_EXTABLE(ftrace_test_nop5, 3b)
 		: "=r"(faulted) : "0" (faulted));
 
+early_done:
 	switch (faulted) {
 	case 0:
 		pr_info("converting mcount calls to 0f 1f 44 00 00\n");
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 7b875fd..d1e5cf4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -139,7 +139,7 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
 
 	if (c->x86_model == 10) {
 		/* AMD Geode LX is model 10 */
-		/* placeholder for any needed mods */
+		set_cpu_cap(c, X86_FEATURE_NO_NOPL);
 		return;
 	}
 }

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-03  5:50 ` Borislav Petkov
  2010-10-03  9:26   ` Borislav Petkov
@ 2010-10-03 16:32   ` Daniel Drake
  2010-10-03 17:12     ` Borislav Petkov
  2010-10-04 18:06     ` Steven Rostedt
  1 sibling, 2 replies; 19+ messages in thread
From: Daniel Drake @ 2010-10-03 16:32 UTC (permalink / raw)
  To: Borislav Petkov, Daniel Drake, jbaron, Andres Salomon,
	Chris Ball, linux-kernel, rostedt

On 3 October 2010 06:50, Borislav Petkov <bp@alien8.de> wrote:
> maybe because the Geode doesn't have both the
> P6_NOP5 and the NOP with 4 0x66 prefixes:
> http://kerneltrap.org/mailarchive/linux-kernel/2010/8/27/4612336
>
> and for some reason the trap can't find the fixup address. You say
> "hangs" so you don't even get an "invalid opcode" OOPS?

The XO doesn't have standard VGA, so it is difficult to debug such
early crashes. This is crashing so early that kernel messages don't
even start to get sent over serial.

To debug these things, I checkpoint the code with various calls which
send individual characters over serial:

static void log_serial(char c)
{
   while ((inb(0x3fd) & 0x20) == 0) ;
   outb(c, 0x3f8);
   while ((inb(0x3fd) & 0x40) == 0) ;
}

So, I'm not really sure if/how it crashed or oops'd. However, I can
confirm that panic() does not get reached, since I put a character log
in there and it doesn't get sent. Let me know if you want me to put
character logging in other places.

I applied your two patches by hand and it doesn't solve the issue,
because the init_amd_k6() code is called long after
arch_init_ideal_nop5()

Daniel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-03 16:32   ` Daniel Drake
@ 2010-10-03 17:12     ` Borislav Petkov
  2010-10-04 18:06     ` Steven Rostedt
  1 sibling, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2010-10-03 17:12 UTC (permalink / raw)
  To: Daniel Drake; +Cc: jbaron, Andres Salomon, Chris Ball, linux-kernel, rostedt

From: Daniel Drake <dsd@laptop.org>
Date: Sun, Oct 03, 2010 at 05:32:12PM +0100

> On 3 October 2010 06:50, Borislav Petkov <bp@alien8.de> wrote:
> > maybe because the Geode doesn't have both the
> > P6_NOP5 and the NOP with 4 0x66 prefixes:
> > http://kerneltrap.org/mailarchive/linux-kernel/2010/8/27/4612336
> >
> > and for some reason the trap can't find the fixup address. You say
> > "hangs" so you don't even get an "invalid opcode" OOPS?
> 
> The XO doesn't have standard VGA, so it is difficult to debug such
> early crashes. This is crashing so early that kernel messages don't
> even start to get sent over serial.
> 
> To debug these things, I checkpoint the code with various calls which
> send individual characters over serial:
> 
> static void log_serial(char c)
> {
>    while ((inb(0x3fd) & 0x20) == 0) ;
>    outb(c, 0x3f8);
>    while ((inb(0x3fd) & 0x40) == 0) ;
> }
> 
> So, I'm not really sure if/how it crashed or oops'd. However, I can
> confirm that panic() does not get reached, since I put a character log
> in there and it doesn't get sent. Let me know if you want me to put
> character logging in other places.

Nice! Debugging a machine like that looks like a lot of jumping through
hoops. But let me ask you this: did you bisect linux-next to come up
with the offending patch in your other mail?

> I applied your two patches by hand and it doesn't solve the
> issue, because the init_amd_k6() code is called long after
> arch_init_ideal_nop5()

Dang, we'll have to push the CPUID feature check into the early-routine
which runs before than arch_init_ideal_nop5() in setup_arch(), updated
patch is below. Just to make sure we're matching the right model, does
/proc/cpuinfo say family 5, model 10 on the machine?

Thanks.

--
From: Borislav Petkov <bp@alien8.de>
Date: Sun, 3 Oct 2010 11:11:54 +0200
Subject: [PATCH] jmp label: Fix boot hang on Geode LX

Since Geode LX doesn't implement the NOPL versions of NOP, fallback to
using JMP as a NOP on it.

Signed-off-by: Borislav Petkov <bp@alien8.de>
---
 arch/x86/include/asm/cpufeature.h |    2 +-
 arch/x86/kernel/alternative.c     |    6 ++++++
 arch/x86/kernel/cpu/amd.c         |   12 ++++++------
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 220e2ea..e2ef56c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -89,7 +89,7 @@
 #define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* "" Lfence synchronizes RDTSC */
 #define X86_FEATURE_11AP	(3*32+19) /* "" Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL	(3*32+20) /* The NOPL (0F 1F) instructions */
-					  /* 21 available, was AMD_C1E */
+#define X86_FEATURE_NO_NOPL	(3*32+21) /* "" Missing NOPL (e.g. Geode LX) */
 #define X86_FEATURE_XTOPOLOGY	(3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC	(3*32+24) /* TSC does not stop in C states */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cb0e6d3..d5fad2b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -666,6 +666,11 @@ void __init arch_init_ideal_nop5(void)
 	 *
 	 * TODO: check the cpuid to determine the best nop.
 	 */
+	if (boot_cpu_has(X86_FEATURE_NO_NOPL)) {
+		faulted = 2;
+		goto early_done;
+	}
+
 	asm volatile (
 		"ftrace_test_jmp:"
 		"jmp ftrace_test_p6nop\n"
@@ -688,6 +693,7 @@ void __init arch_init_ideal_nop5(void)
 		_ASM_EXTABLE(ftrace_test_nop5, 3b)
 		: "=r"(faulted) : "0" (faulted));
 
+early_done:
 	switch (faulted) {
 	case 0:
 		pr_info("converting mcount calls to 0f 1f 44 00 00\n");
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 7b875fd..8931977 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -136,12 +136,6 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
 
 		return;
 	}
-
-	if (c->x86_model == 10) {
-		/* AMD Geode LX is model 10 */
-		/* placeholder for any needed mods */
-		return;
-	}
 }
 
 static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
@@ -413,6 +407,12 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
 	}
 #endif
 
+	if (c->x86 == 5 && c->x86_model == 10) {
+		/* AMD Geode LX is model 10 */
+		set_cpu_cap(c, X86_FEATURE_NO_NOPL);
+		return;
+	}
+
 	/* We need to do the following only once */
 	if (c != &boot_cpu_data)
 		return;
-- 
1.7.2.3


-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-02 19:16 Dynamic nop selection breaks boot on Geode LX Daniel Drake
  2010-10-03  5:50 ` Borislav Petkov
@ 2010-10-04 15:46 ` Jason Baron
  2010-10-04 16:49   ` Daniel Drake
                     ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Jason Baron @ 2010-10-04 15:46 UTC (permalink / raw)
  To: Daniel Drake; +Cc: Andres Salomon, Chris Ball, linux-kernel, rostedt, mingo

On Sat, Oct 02, 2010 at 08:16:19PM +0100, Daniel Drake wrote:
> Hi,
> 
> linux-next fails to boot on OLPC's XO-1 laptop based on Geode LX processor.
> 
> This is because of
> 
> commit f49aa448561fe9215f43405cac6f31eb86317792
> Author: Jason Baron <jbaron@redhat.com>
> Date:   Fri Sep 17 11:08:51 2010 -0400
> 
>     jump label: Make dynamic no-op selection available outside of ftrace
> 
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-x86.git;a=commitdiff;h=f49aa44
> 
> The Geode hangs inside the asm() statement inside arch_init_ideal_nop5.
> 
> Any thoughts?
> 
> I'm a bit out of my depth debugging assembly code, but will happily
> take pointers and patches to help diagnose.
> 
> cheers
> Daniel

Hi Daniel,

looks like I moved dynamic no-op selection way too early in the boot -
before the exception tables required to select invalid opcodes were
even set up. The patch belows moves them later and should resolve this
issue for you. thanks for narrowing it down!

thanks,

-Jason


move arch_init_ideal_nop5 later

arch_init_ideal_nop5() was being called from setup_arch() before
the exception table was setup. Move it later into
alternative_instructions().

Fixes a boot hang on OLPC's XO-1 laptop based on Geode LX
processor.


Reported-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 arch/x86/kernel/alternative.c |    5 +++++
 arch/x86/kernel/setup.c       |    6 ------
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cb0e6d3..d8b5b21 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -454,6 +454,7 @@ extern struct paravirt_patch_site __start_parainstructions[],
 
 void __init alternative_instructions(void)
 {
+	unsigned long flags;
 	/* The patching is not fully atomic, so try to avoid local interruptions
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
@@ -508,6 +509,10 @@ void __init alternative_instructions(void)
 				(unsigned long)__smp_locks_end);
 
 	restart_nmi();
+
+	local_irq_save(flags);
+	arch_init_ideal_nop5();
+	local_irq_restore(flags);
 }
 
 /**
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 468ccd2..85b02b6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -112,7 +112,6 @@
 #include <asm/numa_64.h>
 #endif
 #include <asm/mce.h>
-#include <asm/alternative.h>
 
 /*
  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
@@ -690,7 +689,6 @@ void __init setup_arch(char **cmdline_p)
 {
 	int acpi = 0;
 	int k8 = 0;
-	unsigned long flags;
 
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
@@ -1038,10 +1036,6 @@ void __init setup_arch(char **cmdline_p)
 	x86_init.oem.banner();
 
 	mcheck_init();
-
-	local_irq_save(flags);
-	arch_init_ideal_nop5();
-	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 15:46 ` Jason Baron
@ 2010-10-04 16:49   ` Daniel Drake
  2010-10-04 20:31   ` Steven Rostedt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Daniel Drake @ 2010-10-04 16:49 UTC (permalink / raw)
  To: Jason Baron; +Cc: Andres Salomon, Chris Ball, linux-kernel, rostedt, mingo

On 4 October 2010 16:46, Jason Baron <jbaron@redhat.com> wrote:
> looks like I moved dynamic no-op selection way too early in the boot -
> before the exception tables required to select invalid opcodes were
> even set up. The patch belows moves them later and should resolve this
> issue for you. thanks for narrowing it down!

Applied this without the earlier patches from Borislav.

Thanks, that fixes it.

Daniel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-03 16:32   ` Daniel Drake
  2010-10-03 17:12     ` Borislav Petkov
@ 2010-10-04 18:06     ` Steven Rostedt
  1 sibling, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2010-10-04 18:06 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Borislav Petkov, jbaron, Andres Salomon, Chris Ball, linux-kernel

On Sun, 2010-10-03 at 17:32 +0100, Daniel Drake wrote:
> On 3 October 2010 06:50, Borislav Petkov <bp@alien8.de> wrote:
> > maybe because the Geode doesn't have both the
> > P6_NOP5 and the NOP with 4 0x66 prefixes:
> > http://kerneltrap.org/mailarchive/linux-kernel/2010/8/27/4612336
> >
> > and for some reason the trap can't find the fixup address. You say
> > "hangs" so you don't even get an "invalid opcode" OOPS?
> 
> The XO doesn't have standard VGA, so it is difficult to debug such
> early crashes. This is crashing so early that kernel messages don't
> even start to get sent over serial.
> 
> To debug these things, I checkpoint the code with various calls which
> send individual characters over serial:
> 
> static void log_serial(char c)
> {
>    while ((inb(0x3fd) & 0x20) == 0) ;
>    outb(c, 0x3f8);
>    while ((inb(0x3fd) & 0x40) == 0) ;
> }

Did you try earlyprintk? That is, on the kernel command line add:

	earlyprintk=ttyS0,115200

or whatever the tty and baud rate is. This will start printing out the
serial right at kernel startup. Even before printk is configured.

-- Steve

> 
> So, I'm not really sure if/how it crashed or oops'd. However, I can
> confirm that panic() does not get reached, since I put a character log
> in there and it doesn't get sent. Let me know if you want me to put
> character logging in other places.
> 
> I applied your two patches by hand and it doesn't solve the issue,
> because the init_amd_k6() code is called long after
> arch_init_ideal_nop5()
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 15:46 ` Jason Baron
  2010-10-04 16:49   ` Daniel Drake
@ 2010-10-04 20:31   ` Steven Rostedt
  2010-10-04 20:39     ` Jason Baron
  2010-10-04 21:51   ` H. Peter Anvin
  2010-10-26 20:08   ` Daniel Drake
  3 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2010-10-04 20:31 UTC (permalink / raw)
  To: Jason Baron; +Cc: Daniel Drake, Andres Salomon, Chris Ball, linux-kernel, mingo

On Mon, 2010-10-04 at 11:46 -0400, Jason Baron wrote:

>  	restart_nmi();
> +
> +	local_irq_save(flags);
> +	arch_init_ideal_nop5();
> +	local_irq_restore(flags);
>  }
>  
>  /**

BTW, why the irq disabling around the call?

-- Steve



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 20:31   ` Steven Rostedt
@ 2010-10-04 20:39     ` Jason Baron
  2010-10-04 22:11       ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Baron @ 2010-10-04 20:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Drake, Andres Salomon, Chris Ball, linux-kernel, mingo

On Mon, Oct 04, 2010 at 04:31:12PM -0400, Steven Rostedt wrote:
> On Mon, 2010-10-04 at 11:46 -0400, Jason Baron wrote:
> 
> >  	restart_nmi();
> > +
> > +	local_irq_save(flags);
> > +	arch_init_ideal_nop5();
> > +	local_irq_restore(flags);
> >  }
> >  
> >  /**
> 
> BTW, why the irq disabling around the call?
> 
> -- Steve
> 
> 

Only b/c the selection of the nop was originally done by
'ftrace_dyn_arch_init()' which had irq's disabled around it. So maybe
that can be dropped now?

thanks,

-Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 15:46 ` Jason Baron
  2010-10-04 16:49   ` Daniel Drake
  2010-10-04 20:31   ` Steven Rostedt
@ 2010-10-04 21:51   ` H. Peter Anvin
  2010-10-04 22:15     ` Steven Rostedt
  2010-10-26 20:08   ` Daniel Drake
  3 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2010-10-04 21:51 UTC (permalink / raw)
  To: Jason Baron
  Cc: Daniel Drake, Andres Salomon, Chris Ball, linux-kernel, rostedt, mingo

On 10/04/2010 08:46 AM, Jason Baron wrote:
> 
> move arch_init_ideal_nop5 later
> 
> arch_init_ideal_nop5() was being called from setup_arch() before
> the exception table was setup. Move it later into
> alternative_instructions().
> 
> Fixes a boot hang on OLPC's XO-1 laptop based on Geode LX
> processor.
> 

This code is fundamentally toxic and needs to be scrapped completely --
it is simply broken beyond repair.

We tried exactly this type of dynamic selection before, and it doesn't
work on broken virtualizers; in particular Microsoft VirtualPC can pass
the exception test and yet fail later.

The end result is very simple: you can always use NOPL on 64 bits, you
can never use NOPL on 32 bits.

66 66 66 66 90 will always *work* (as in, it will never fail) but it's
pretty slow on older CPUs which took a hit on handle prefixes -- but it
might still be faster than a jump on those.  Thus, in your code the JMP
case will never be reached anyway.

There isn't, of course, a classic 5-byte sequence, although the sequence:

	2E 8D 75 26 00

... should work (leal %ds:0(,%esi,1),%esi).  However, 66 ... 90 is
likely to work better on modern processors (although I haven't measured it.)

	-hpa

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 20:39     ` Jason Baron
@ 2010-10-04 22:11       ` Steven Rostedt
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2010-10-04 22:11 UTC (permalink / raw)
  To: Jason Baron; +Cc: Daniel Drake, Andres Salomon, Chris Ball, linux-kernel, mingo

On Mon, 2010-10-04 at 16:39 -0400, Jason Baron wrote:
> On Mon, Oct 04, 2010 at 04:31:12PM -0400, Steven Rostedt wrote:
> > On Mon, 2010-10-04 at 11:46 -0400, Jason Baron wrote:
> > 
> > >  	restart_nmi();
> > > +
> > > +	local_irq_save(flags);
> > > +	arch_init_ideal_nop5();
> > > +	local_irq_restore(flags);
> > >  }
> > >  
> > >  /**
> > 
> > BTW, why the irq disabling around the call?
> > 
> > -- Steve
> > 
> > 
> 
> Only b/c the selection of the nop was originally done by
> 'ftrace_dyn_arch_init()' which had irq's disabled around it. So maybe
> that can be dropped now?

Yeah, that disable is there for legacy reasons. It can be scrapped.

I'll remove it.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 21:51   ` H. Peter Anvin
@ 2010-10-04 22:15     ` Steven Rostedt
  2010-10-04 22:22       ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2010-10-04 22:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Daniel Drake, Andres Salomon, Chris Ball,
	linux-kernel, mingo

On Mon, 2010-10-04 at 14:51 -0700, H. Peter Anvin wrote:
> On 10/04/2010 08:46 AM, Jason Baron wrote:
> > 
> > move arch_init_ideal_nop5 later
> > 
> > arch_init_ideal_nop5() was being called from setup_arch() before
> > the exception table was setup. Move it later into
> > alternative_instructions().
> > 
> > Fixes a boot hang on OLPC's XO-1 laptop based on Geode LX
> > processor.
> > 
> 
> This code is fundamentally toxic and needs to be scrapped completely --
> it is simply broken beyond repair.
> 
> We tried exactly this type of dynamic selection before, and it doesn't
> work on broken virtualizers; in particular Microsoft VirtualPC can pass
> the exception test and yet fail later.

So the code is broken because of broken virtualizers??

> 
> The end result is very simple: you can always use NOPL on 64 bits, you
> can never use NOPL on 32 bits.
> 
> 66 66 66 66 90 will always *work* (as in, it will never fail) but it's
> pretty slow on older CPUs which took a hit on handle prefixes -- but it
> might still be faster than a jump on those.  Thus, in your code the JMP
> case will never be reached anyway.

The jmp was there because of paranoia, and I never expected it to be
reached.

> 
> There isn't, of course, a classic 5-byte sequence, although the sequence:
> 
> 	2E 8D 75 26 00
> 
> ... should work (leal %ds:0(,%esi,1),%esi).  However, 66 ... 90 is
> likely to work better on modern processors (although I haven't measured it.)

The point is, this nop will be at _every_ function call (it replaces the
mcount call). Not just scattered throughout the kernel. It is imperative
that we have the best nop available.

So what would you recommend?

-- Steve



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 22:15     ` Steven Rostedt
@ 2010-10-04 22:22       ` H. Peter Anvin
  2010-10-04 22:27         ` Nick Lowe
       [not found]         ` <AANLkTikCkwF+yd4kdad8Bcz-6YX+STiy1wgrFfJxsfRg@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2010-10-04 22:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Daniel Drake, Andres Salomon, Chris Ball,
	linux-kernel, mingo, Borislav Petkov

On 10/04/2010 03:15 PM, Steven Rostedt wrote:
>>
>> We tried exactly this type of dynamic selection before, and it doesn't
>> work on broken virtualizers; in particular Microsoft VirtualPC can pass
>> the exception test and yet fail later.
> 
> So the code is broken because of broken virtualizers??
> 

Yup.  Fun, isn't it?  :(  Unfortunately, broken virtualizers appear as
broken CPUs to us.  We used to do the #UD probe for NOPL, but it didn't
work.

>>
>> The end result is very simple: you can always use NOPL on 64 bits, you
>> can never use NOPL on 32 bits.
>>
>> 66 66 66 66 90 will always *work* (as in, it will never fail) but it's
>> pretty slow on older CPUs which took a hit on handle prefixes -- but it
>> might still be faster than a jump on those.  Thus, in your code the JMP
>> case will never be reached anyway.
> 
> The jmp was there because of paranoia, and I never expected it to be
> reached.
> 
>>
>> There isn't, of course, a classic 5-byte sequence, although the sequence:
>>
>> 	2E 8D 75 26 00
>>
>> ... should work (leal %ds:0(,%esi,1),%esi).  However, 66 ... 90 is
>> likely to work better on modern processors (although I haven't measured it.)
> 
> The point is, this nop will be at _every_ function call (it replaces the
> mcount call). Not just scattered throughout the kernel. It is imperative
> that we have the best nop available.
> 
> So what would you recommend?
> 

NOPL is special, because it's the only NOP sequence that isn't actually
*supported* on all processors (and we have found that we can't even use
it on 32 bits, even though the vast majority of all real-life 32-bit
processors do support it.)

Borislav is just checking to see if we can just use NOPL unconditionally
on 64 bits; as far as 32 bits is concerned the only option for picking
what is "best" is probably to benchmark some set of sequences on the set
of processors we care about.  However, I suspect that on any modern
processors either 66 66 66 66 90 or 2E 8D 75 26 00 will work equally well.

With a bit of benchmarking I think we could adopt the policy of using
NOPL on 64 bits and one of the above sequences on 32 bits.

	-hpa



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 22:22       ` H. Peter Anvin
@ 2010-10-04 22:27         ` Nick Lowe
       [not found]         ` <AANLkTikCkwF+yd4kdad8Bcz-6YX+STiy1wgrFfJxsfRg@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Nick Lowe @ 2010-10-04 22:27 UTC (permalink / raw)
  To: nick.lowe; +Cc: linux-kernel

As far as I'm aware, the check is just broken in VMs that trace their
lineage to Connectix's VirtualPC that Microsoft acquired back in 2006.
Is this correct? On balance, is this really to be cared about?
As far as 'common' processors that don't play nicely with NOPL...

VIA Edens based on the 'Samuel 2' design do not support CMOV or NOPL.
All VIA Edens based on the 'Nehemiah' design support CMOV but not
NOPL. (Introduced in
2003.)

Via C3s based on the 'Samuel 2'or 'Ezra'/'Ezra-T' design do not
support CMOV or NOPL.
All C3s based on the 'Nehemiah' design support CMOV but not NOPL.
(Introduced in 2003)

National Semi's GXm, GXLV and GX1 do not support CMOV or NOPL.
All Geodes since and including National Semi's GX2 support CMOV but
not NOPL. (Introduced in
2002)
The AMD branded Geodes (GX and LX) support CMOV but not NOPL.

The Cyrix 6x86 processors do not support CMOV or NOPL.
The Cyrix 8x86MX / Cyrix MII do support CMOV but not NOPL.

The AMD K6 and K6-2 do not support CMOV or NOPL.

There are probably a few more I've forgotten about...
Nick

On Mon, Oct 4, 2010 at 11:22 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 10/04/2010 03:15 PM, Steven Rostedt wrote:
> >>
> >> We tried exactly this type of dynamic selection before, and it doesn't
> >> work on broken virtualizers; in particular Microsoft VirtualPC can pass
> >> the exception test and yet fail later.
> >
> > So the code is broken because of broken virtualizers??
> >
>
> Yup.  Fun, isn't it?  :(  Unfortunately, broken virtualizers appear as
> broken CPUs to us.  We used to do the #UD probe for NOPL, but it didn't
> work.
>
> >>
> >> The end result is very simple: you can always use NOPL on 64 bits, you
> >> can never use NOPL on 32 bits.
> >>
> >> 66 66 66 66 90 will always *work* (as in, it will never fail) but it's
> >> pretty slow on older CPUs which took a hit on handle prefixes -- but it
> >> might still be faster than a jump on those.  Thus, in your code the JMP
> >> case will never be reached anyway.
> >
> > The jmp was there because of paranoia, and I never expected it to be
> > reached.
> >
> >>
> >> There isn't, of course, a classic 5-byte sequence, although the sequence:
> >>
> >>      2E 8D 75 26 00
> >>
> >> ... should work (leal %ds:0(,%esi,1),%esi).  However, 66 ... 90 is
> >> likely to work better on modern processors (although I haven't measured it.)
> >
> > The point is, this nop will be at _every_ function call (it replaces the
> > mcount call). Not just scattered throughout the kernel. It is imperative
> > that we have the best nop available.
> >
> > So what would you recommend?
> >
>
> NOPL is special, because it's the only NOP sequence that isn't actually
> *supported* on all processors (and we have found that we can't even use
> it on 32 bits, even though the vast majority of all real-life 32-bit
> processors do support it.)
>
> Borislav is just checking to see if we can just use NOPL unconditionally
> on 64 bits; as far as 32 bits is concerned the only option for picking
> what is "best" is probably to benchmark some set of sequences on the set
> of processors we care about.  However, I suspect that on any modern
> processors either 66 66 66 66 90 or 2E 8D 75 26 00 will work equally well.
>
> With a bit of benchmarking I think we could adopt the policy of using
> NOPL on 64 bits and one of the above sequences on 32 bits.
>
>        -hpa
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
       [not found]         ` <AANLkTikCkwF+yd4kdad8Bcz-6YX+STiy1wgrFfJxsfRg@mail.gmail.com>
@ 2010-10-04 22:32           ` H. Peter Anvin
  0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2010-10-04 22:32 UTC (permalink / raw)
  To: Nick Lowe
  Cc: Steven Rostedt, Jason Baron, Daniel Drake, Andres Salomon,
	Chris Ball, linux-kernel, mingo, Borislav Petkov

On 10/04/2010 03:25 PM, Nick Lowe wrote:
> > As far as I'm aware, the check is just broken in VMs that trace their
> > lineage to Connectix's VirtualPC that Microsoft acquired back in 2006.
> > Is this correct? On balance, is this really to be cared about?
If it was a huge win from using NOPL, then it would perhaps not be -- or
we'd go through more effort at detecting the broken cases.  However,
since it's not, it's easier to just avoid it.

	-hpa

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-04 15:46 ` Jason Baron
                     ` (2 preceding siblings ...)
  2010-10-04 21:51   ` H. Peter Anvin
@ 2010-10-26 20:08   ` Daniel Drake
  2010-10-26 20:12     ` Jason Baron
  3 siblings, 1 reply; 19+ messages in thread
From: Daniel Drake @ 2010-10-26 20:08 UTC (permalink / raw)
  To: Jason Baron
  Cc: Andres Salomon, Chris Ball, linux-kernel, rostedt, mingo, hpa,
	nick.lowe, Borislav Petkov, tglx

Hi,

On 4 October 2010 16:46, Jason Baron <jbaron@redhat.com> wrote:
> Hi Daniel,
>
> looks like I moved dynamic no-op selection way too early in the boot -
> before the exception tables required to select invalid opcodes were
> even set up. The patch belows moves them later and should resolve this
> issue for you. thanks for narrowing it down!

Bump... This patch has not been merged. I see that there is still some
lack of consensus in this area but please don't forget about it. The
offending commit (f49aa448561fe) has ended up in Linus' tree, so
Linus' tree does not boot on Geode as a result.

>
>
> move arch_init_ideal_nop5 later
>
> arch_init_ideal_nop5() was being called from setup_arch() before
> the exception table was setup. Move it later into
> alternative_instructions().
>
> Fixes a boot hang on OLPC's XO-1 laptop based on Geode LX
> processor.
>
>
> Reported-by: Daniel Drake <dsd@laptop.org>
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> ---
>  arch/x86/kernel/alternative.c |    5 +++++
>  arch/x86/kernel/setup.c       |    6 ------
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index cb0e6d3..d8b5b21 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -454,6 +454,7 @@ extern struct paravirt_patch_site __start_parainstructions[],
>
>  void __init alternative_instructions(void)
>  {
> +       unsigned long flags;
>        /* The patching is not fully atomic, so try to avoid local interruptions
>           that might execute the to be patched code.
>           Other CPUs are not running. */
> @@ -508,6 +509,10 @@ void __init alternative_instructions(void)
>                                (unsigned long)__smp_locks_end);
>
>        restart_nmi();
> +
> +       local_irq_save(flags);
> +       arch_init_ideal_nop5();
> +       local_irq_restore(flags);
>  }
>
>  /**
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 468ccd2..85b02b6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -112,7 +112,6 @@
>  #include <asm/numa_64.h>
>  #endif
>  #include <asm/mce.h>
> -#include <asm/alternative.h>
>
>  /*
>  * end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
> @@ -690,7 +689,6 @@ void __init setup_arch(char **cmdline_p)
>  {
>        int acpi = 0;
>        int k8 = 0;
> -       unsigned long flags;
>
>  #ifdef CONFIG_X86_32
>        memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
> @@ -1038,10 +1036,6 @@ void __init setup_arch(char **cmdline_p)
>        x86_init.oem.banner();
>
>        mcheck_init();
> -
> -       local_irq_save(flags);
> -       arch_init_ideal_nop5();
> -       local_irq_restore(flags);
>  }
>
>  #ifdef CONFIG_X86_32
> --
> 1.7.1
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Dynamic nop selection breaks boot on Geode LX
  2010-10-26 20:08   ` Daniel Drake
@ 2010-10-26 20:12     ` Jason Baron
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Baron @ 2010-10-26 20:12 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Andres Salomon, Chris Ball, linux-kernel, rostedt, mingo, hpa,
	nick.lowe, Borislav Petkov, tglx

On Tue, Oct 26, 2010 at 09:08:27PM +0100, Daniel Drake wrote:
> Hi,
> 
> On 4 October 2010 16:46, Jason Baron <jbaron@redhat.com> wrote:
> > Hi Daniel,
> >
> > looks like I moved dynamic no-op selection way too early in the boot -
> > before the exception tables required to select invalid opcodes were
> > even set up. The patch belows moves them later and should resolve this
> > issue for you. thanks for narrowing it down!
> 
> Bump... This patch has not been merged. I see that there is still some
> lack of consensus in this area but please don't forget about it. The
> offending commit (f49aa448561fe) has ended up in Linus' tree, so
> Linus' tree does not boot on Geode as a result.
> 

yeah, sorry...I've been busy chasing down a compiler issue...I haven't
forgotten, and will post this shortly.

thanks,

-Jason


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-10-26 20:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-02 19:16 Dynamic nop selection breaks boot on Geode LX Daniel Drake
2010-10-03  5:50 ` Borislav Petkov
2010-10-03  9:26   ` Borislav Petkov
2010-10-03 14:47     ` Borislav Petkov
2010-10-03 16:32   ` Daniel Drake
2010-10-03 17:12     ` Borislav Petkov
2010-10-04 18:06     ` Steven Rostedt
2010-10-04 15:46 ` Jason Baron
2010-10-04 16:49   ` Daniel Drake
2010-10-04 20:31   ` Steven Rostedt
2010-10-04 20:39     ` Jason Baron
2010-10-04 22:11       ` Steven Rostedt
2010-10-04 21:51   ` H. Peter Anvin
2010-10-04 22:15     ` Steven Rostedt
2010-10-04 22:22       ` H. Peter Anvin
2010-10-04 22:27         ` Nick Lowe
     [not found]         ` <AANLkTikCkwF+yd4kdad8Bcz-6YX+STiy1wgrFfJxsfRg@mail.gmail.com>
2010-10-04 22:32           ` H. Peter Anvin
2010-10-26 20:08   ` Daniel Drake
2010-10-26 20:12     ` Jason Baron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.