* Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
@ 2008-07-21 13:14 Luis R. Rodriguez
2008-07-21 13:23 ` H. Peter Anvin
0 siblings, 1 reply; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-21 13:14 UTC (permalink / raw)
To: linux kernel, H. Peter Anvin; +Cc: Ivan Seskar, jfm3, Sujith
This bug seems to have been present since 2.6.22 [1], so I hope we can get
this fixed ASAP. Let me know if you have patch suggestions I can test.
This crashes very early; I had to use earlyprintk to capture the output.
BUG: Int 6: CR2 00000000
EDI 00000000 ESI 0009f000 EBP 00000000 ESP c036ff60
EBX c03e6070 EDX 00000006 ECX 0000009f EAX c034d240
err 00000000 EIP c0387ac2 CS 00000060 flg 00010016
Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614 c03b6304
00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030 205d3030
00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794 00000000
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
BUG: Int 6: CR2 00000000
EDI 00000000 ESI c034c018 EBP 00000000 ESP c036feb8
EBX c036ff28 EDX 00000006 ECX c03e6070 EAX c034c018
err 00000000 EIP c013d922 CS 00000060 flg 00010016
Stack: c036ff28 c03e6070 c01305ef c0104a34 c036fffc c036e000 0009f000 00000002
0009f000 00000000 00000000 c0105969 00000000 c02bee94 c0310de8 c02b9129
00000000 c0303d6a 00000000 c034751d c03c7278 c0347faa 00000002 c0347feb
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
node1-1:~# cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 9
model name : VIA Nehemiah
stepping : 8
cpu MHz : 997.108
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx
fxsr sse up rng rng_en ace ace_en
bogomips : 1996.49
clflush size : 32
You can get my config from:
http://www.winlab.rutgers.edu/~mcgrof/configs/config-2.6.26
It's basically the Debian 2.6.25-2 config, just updated for 2.6.26. Let
me know if I can provide more information.
[1] http://orbit-lab.org/wiki/Documentation/SupportedImages/baseline-8.3.ndz
Luis
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-21 13:14 Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot Luis R. Rodriguez
@ 2008-07-21 13:23 ` H. Peter Anvin
2008-07-21 14:01 ` Luis R. Rodriguez
0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-21 13:23 UTC (permalink / raw)
To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
Luis R. Rodriguez wrote:
> This bug seems to be present since 2.6.22 [1], so hope we can get this
> fixed ASAP. Let me know if you have patch suggestions I can test.
>
> This crashes very early, I had to use earlyprintk to get it.
>
> BUG: Int 6: CR2 00000000
> EDI 00000000 ESI 0009f000 EBP 00000000 ESP c036ff60
> EBX c03e6070 EDX 00000006 ECX 0000009f EAX c034d240
> err 00000000 EIP c0387ac2 CS 00000060 flg 00010016
> Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614 c03b6304
> 00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030 205d3030
> 00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794 00000000
> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
Use objdump -d or something to find out what is at 0xc0387ac2; the error
is an undefined instruction exception.
I suspect this is another case of a processor reporting family == 6 and
not providing the 0F 1F NOP opcodes.
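A minimal sketch of the first step suggested here, assuming only the dump format shown in this thread: pull the exception vector and the faulting EIP out of the earlyprintk output. The function name `parse_early_dump` and the small vector table are illustrative and are not kernel code.

```python
import re

# A few architectural x86 exception vectors; vector 6 is #UD (invalid opcode).
VECTOR_NAMES = {
    0: "#DE divide error",
    6: "#UD invalid opcode",
    13: "#GP general protection",
    14: "#PF page fault",
}

def parse_early_dump(text):
    """Return (vector, eip) from a 'BUG: Int N' early exception dump."""
    vector = int(re.search(r"BUG: Int (\d+)", text).group(1))
    eip = int(re.search(r"EIP ([0-9a-f]{8})", text).group(1), 16)
    return vector, eip

# Two lines copied from the first dump in this thread.
dump = """BUG: Int 6: CR2 00000000
err 00000000 EIP c0387ac2 CS 00000060 flg 00010016"""

vector, eip = parse_early_dump(dump)
print(VECTOR_NAMES.get(vector, "unknown vector"), hex(eip))
```

The resulting address is the one to look up in the disassembly of the kernel image, for example by limiting `objdump -d` to a small range with its `--start-address` and `--stop-address` options.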
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-21 13:23 ` H. Peter Anvin
@ 2008-07-21 14:01 ` Luis R. Rodriguez
2008-07-21 23:24 ` H. Peter Anvin
0 siblings, 1 reply; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-21 14:01 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> Luis R. Rodriguez wrote:
>>
>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>
>> This crashes very early, I had to use earlyprintk to get it.
>>
>> BUG: Int 6: CR2 00000000
>> EDI 00000000 ESI 0009f000 EBP 00000000 ESP c036ff60
>> EBX c03e6070 EDX 00000006 ECX 0000009f EAX c034d240
>> err 00000000 EIP c0387ac2 CS 00000060 flg 00010016
>> Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614
>> c03b6304
>> 00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030
>> 205d3030
>> 00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794
>> 00000000
>> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
>
> Use objdump -d or something to find out what is at 0xc0387ac2
I've put extra space around the culprit.
c0387aa0 <free_bootmem>:
c0387aa0:  57                    push   %edi
c0387aa1:  89 c7                 mov    %eax,%edi
c0387aa3:  a1 40 d2 34 c0        mov    0xc034d240,%eax
c0387aa8:  56                    push   %esi
c0387aa9:  89 d6                 mov    %edx,%esi
c0387aab:  53                    push   %ebx
c0387aac:  eb 0e                 jmp    c0387abc <free_bootmem+0x1c>
c0387aae:  89 d8                 mov    %ebx,%eax
c0387ab0:  89 f1                 mov    %esi,%ecx
c0387ab2:  89 fa                 mov    %edi,%edx
c0387ab4:  e8 67 ff ff ff        call   c0387a20 <free_bootmem_core>
c0387ab9:  8b 43 18              mov    0x18(%ebx),%eax
c0387abc:  8d 58 e8              lea    -0x18(%eax),%ebx
c0387abf:  8b 43 18              mov    0x18(%ebx),%eax

c0387ac2:  0f 1f 40 00           nopl   0x0(%eax)

c0387ac6:  81 fb 28 d2 34 c0     cmp    $0xc034d228,%ebx
c0387acc:  75 e0                 jne    c0387aae <free_bootmem+0xe>
c0387ace:  5b                    pop    %ebx
c0387acf:  5e                    pop    %esi
c0387ad0:  5f                    pop    %edi
c0387ad1:  c3                    ret
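The listing can be cross-checked with a short sketch. It computes the offset of the faulting EIP inside free_bootmem and tests whether the bytes at that address use the multi-byte NOP encoding (opcode 0F 1F with a ModRM reg field of 0). The helper name `is_long_nop` is hypothetical, and the check is deliberately partial: it recognizes the encoding but does not compute the instruction length.

```python
LONG_NOP_OPCODE = bytes([0x0F, 0x1F])

def is_long_nop(insn):
    """True if insn starts with the 0F 1F /0 (multi-byte NOP) encoding."""
    if len(insn) < 3 or insn[:2] != LONG_NOP_OPCODE:
        return False
    modrm = insn[2]
    # The reg field (bits 5..3 of ModRM) must be 0 for NOP.
    return ((modrm >> 3) & 0x7) == 0

func_start = 0xC0387AA0            # free_bootmem, from the listing
fault_eip = 0xC0387AC2             # EIP from the register dump
insn = bytes.fromhex("0f1f4000")   # bytes shown at the faulting address

offset = fault_eip - func_start
print(f"free_bootmem+{offset:#x}", is_long_nop(insn))
```

An ordinary instruction from the same function, such as `8d 58 e8` (the `lea`), is rejected by the same check.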
Luis
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-21 14:01 ` Luis R. Rodriguez
@ 2008-07-21 23:24 ` H. Peter Anvin
2008-07-22 4:47 ` Luis R. Rodriguez
2008-07-22 13:14 ` Ingo Molnar
0 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-21 23:24 UTC (permalink / raw)
To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
Luis R. Rodriguez wrote:
> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Luis R. Rodriguez wrote:
>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>
>>> This crashes very early, I had to use earlyprintk to get it.
>>>
>
> I've put extra spaces between the culprit.
>
> c0387ac2: 0f 1f 40 00 nopl 0x0(%eax)
>
Sure enough, our old friend.
You have in your configuration:
CONFIG_M686=y
# CONFIG_X86_GENERIC is not set
... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is
not compatible with such processors.
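As a rough illustration, a .config can be screened for this combination before building. The rule below is a simplification taken from this thread (CONFIG_M686 set, CONFIG_X86_GENERIC not set); it does not reproduce the actual Kconfig dependencies, and the function names are hypothetical.

```python
NOT_SET_SUFFIX = " is not set"

def parse_config(text):
    """Map each CONFIG_ symbol in a .config to its value; unset symbols map to None."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO
            opts[line[2:-len(NOT_SET_SUFFIX)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts

def risks_long_nops(opts):
    """Simplified rule from this thread, not the real Kconfig logic."""
    return opts.get("CONFIG_M686") == "y" and opts.get("CONFIG_X86_GENERIC") != "y"

# The two lines quoted from the reporter's configuration.
sample = """CONFIG_M686=y
# CONFIG_X86_GENERIC is not set"""

print(risks_long_nops(parse_config(sample)))
```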
Not a bug.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-21 23:24 ` H. Peter Anvin
@ 2008-07-22 4:47 ` Luis R. Rodriguez
2008-07-22 13:10 ` H. Peter Anvin
2008-07-22 17:10 ` Jeff Garzik
2008-07-22 13:14 ` Ingo Molnar
1 sibling, 2 replies; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-22 4:47 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
On Mon, Jul 21, 2008 at 4:24 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Luis R. Rodriguez wrote:
>>
>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> Luis R. Rodriguez wrote:
>>>>
>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>
>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>
>>
>> I've put extra spaces between the culprit.
>>
>> c0387ac2: 0f 1f 40 00 nopl 0x0(%eax)
>>
>
> Sure enough, our old friend.
>
> You have in your configuration:
>
> CONFIG_M686=y
> # CONFIG_X86_GENERIC is not set
>
> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is not
> compatible with such processors.
>
> Not a bug.
Thanks for taking a look at this. So it would be a
misconfiguration bug by the distribution to try to support a
generic 686 kernel without GENERIC.
Luis
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 4:47 ` Luis R. Rodriguez
@ 2008-07-22 13:10 ` H. Peter Anvin
2008-07-22 17:10 ` Jeff Garzik
1 sibling, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:10 UTC (permalink / raw)
To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
Luis R. Rodriguez wrote:
>
> Thanks for taking a look at this. So well, it would be a
> misconfiguration bug by the distribution then to try to support a
> generic 686 kernel without GENERIC then.
>
Yes.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-21 23:24 ` H. Peter Anvin
2008-07-22 4:47 ` Luis R. Rodriguez
@ 2008-07-22 13:14 ` Ingo Molnar
2008-07-22 13:24 ` H. Peter Anvin
1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-07-22 13:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
* H. Peter Anvin <hpa@zytor.com> wrote:
> Luis R. Rodriguez wrote:
>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> Luis R. Rodriguez wrote:
>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>
>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>
>>
>> I've put extra spaces between the culprit.
>>
>> c0387ac2: 0f 1f 40 00 nopl 0x0(%eax)
>>
>
> Sure enough, our old friend.
>
> You have in your configuration:
>
> CONFIG_M686=y
> # CONFIG_X86_GENERIC is not set
>
> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is
> not compatible with such processors.
>
> Not a bug.
it would still be nice to get a nice printk and panic during bootup
instead of some obscure crash, hm?
Ingo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 13:14 ` Ingo Molnar
@ 2008-07-22 13:24 ` H. Peter Anvin
2008-07-22 13:46 ` Ingo Molnar
2008-07-26 18:31 ` Andi Kleen
0 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
Ingo Molnar wrote:
>>
>> Not a bug.
>
> it would still be nice to get a nice printk and panic during bootup
> instead of some obscure crash, hm?
>
Yes. The fundamental problem is that Centaur has a set of CPUs which
report family == 6 but don't have the long NOP instructions. We would
need an exact CPUID criterion for these CPUs in order to be able to
report it as an error. An alternative would be to attempt trapping in
the real-mode code (#UD is one of the *very* few CPU exceptions which
can be reliably captured in real mode on a BIOS system), but doing so
would probably mean breaking Loadlin at the very least.
We can't "printk and panic" because we never get that far in the kernel
proper, for obvious reasons: the code is quite littered with these buggers.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 13:24 ` H. Peter Anvin
@ 2008-07-22 13:46 ` Ingo Molnar
2008-07-22 13:54 ` H. Peter Anvin
2008-07-26 18:31 ` Andi Kleen
1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-07-22 13:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
* H. Peter Anvin <hpa@zytor.com> wrote:
> Ingo Molnar wrote:
>>>
>>> Not a bug.
>>
>> it would still be nice to get a nice printk and panic during bootup
>> instead of some obscure crash, hm?
>>
>
> Yes. The fundamental problem is that Centaur has a set of CPUs which
> report family == 6 but don't have the long NOP instructions. We would
> need an exact CPUID criterion for these CPUs in order to be able to
> report it as an error. An alternative would be to attempt trapping in
> the real-mode code (#UD is one of the *very* few CPU exceptions which
> can be reliably captured in real mode on a BIOS system), but doing so
> would probably mean breaking Loadlin at the very least.
>
> We can't "printk and panic" because we never get that far in the
> kernel proper, for obvious reasons: the code is quite littered with
> these buggers.
hm. How about defaulting to a safe NOP all the way up to the point where
we can fix up alternatives and install a different NOP? (We could also
test it first by intentionally jumping to it and catching any exception
via a special exception handler.)
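The idea can be modelled in a few lines, purely as a toy: code is built with one-byte NOP padding, and known padding sites are rewritten with a long NOP only after a probe has shown that the CPU executes it. The site list, the function name `patch_padding`, and the probe result passed in as a boolean are all assumptions for illustration; this is not how the kernel's alternatives code is written.

```python
SAFE_NOP = 0x90                         # one-byte NOP, valid on every x86
LONG_NOP4 = bytes.fromhex("0f1f4000")   # nopl 0x0(%eax), the form seen in this thread

def patch_padding(code, sites, cpu_has_long_nop):
    """Rewrite 4-byte padding sites with a long NOP only if the probe succeeded."""
    patched = bytearray(code)
    if not cpu_has_long_nop:
        return patched                  # keep the safe one-byte NOPs
    for offset in sites:
        # Only touch sites that really hold four bytes of safe padding.
        assert patched[offset:offset + 4] == bytes([SAFE_NOP] * 4)
        patched[offset:offset + 4] = LONG_NOP4
    return patched

# mov 0x18(%ebx),%eax ; 4 bytes of padding ; ret
code = bytes.fromhex("8b4318") + bytes([SAFE_NOP] * 4) + bytes.fromhex("c3")
sites = [3]                             # hypothetical table of padding offsets

print(patch_padding(code, sites, cpu_has_long_nop=False).hex())
print(patch_padding(code, sites, cpu_has_long_nop=True).hex())
```

The model only covers padding that the kernel itself placed and recorded; it says nothing about long NOPs emitted directly by the compiler or assembler.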
Ingo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 13:46 ` Ingo Molnar
@ 2008-07-22 13:54 ` H. Peter Anvin
0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
Ingo Molnar wrote:
>>
>> We can't "printk and panic" because we never get that far in the
>> kernel proper, for obvious reasons: the code is quite littered with
>> these buggers.
>
> hm. How about to default to a safe NOP all the way up to where we can
> fix up alternatives and install a different NOP. (which we could also
> test first via intentionally jumping on it and catching any exception
> via a special exception handler)
>
I don't really think that's realistic, especially if gcc starts using
these instructions (which it really *should*.)
You can make the same argument for every non-i386 instruction (heck,
even every non-8086 instruction), and it quickly gets unworkable.
Since it is extremely likely that the set of processors affected is now
bounded, I think it's just a matter of identifying the relevant CPUID
info. As far as I know, only VIA is affected.
What is worse is that there are a number of "virtual processors" out
there which are, in effect, separate implementations of the x86
architecture, but don't actually identify as anything else. Several of
them have broken nopl implementations, but identify as processors which
are known good in this department. Again, nothing unique to nopl about
this, but it's a generic problem.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 4:47 ` Luis R. Rodriguez
2008-07-22 13:10 ` H. Peter Anvin
@ 2008-07-22 17:10 ` Jeff Garzik
2008-07-22 18:21 ` H. Peter Anvin
1 sibling, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 17:10 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: H. Peter Anvin, linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith
Luis R. Rodriguez wrote:
> On Mon, Jul 21, 2008 at 4:24 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Luis R. Rodriguez wrote:
>>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> Luis R. Rodriguez wrote:
>>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>>
>>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>>
>>> I've put extra spaces between the culprit.
>>>
>>> c0387ac2: 0f 1f 40 00 nopl 0x0(%eax)
>>>
>> Sure enough, our old friend.
>>
>> You have in your configuration:
>>
>> CONFIG_M686=y
>> # CONFIG_X86_GENERIC is not set
>>
>> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is not
>> compatible with such processors.
>>
>> Not a bug.
>
> Thanks for taking a look at this. So well, it would be a
> misconfiguration bug by the distribution then to try to support a
> generic 686 kernel without GENERIC then.
Well, it may be intentional -- some distros simply exclude support for
the lower-volume VIA processors, since that might imply building their
"generic 686 kernel" sans CMOV and some other instructions, and changing
the compiler's instruction scheduling to something less optimal for the
majority. :/
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 17:10 ` Jeff Garzik
@ 2008-07-22 18:21 ` H. Peter Anvin
2008-07-22 18:33 ` Jeff Garzik
0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 18:21 UTC (permalink / raw)
To: Jeff Garzik
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
Jeff Garzik wrote:
>>
>> Thanks for taking a look at this. So well, it would be a
>> misconfiguration bug by the distribution then to try to support a
>> generic 686 kernel without GENERIC then.
>
> Well, it may be intentional -- some distros simply exclude support for
> the lower-volume VIA processors, since that might imply building their
> "generic 686 kernel" sans CMOV and some other instructions, and changing
> the compiler's instruction scheduling to something less optimal for the
> majority. :/
>
X86_GENERIC shouldn't disable CMOV?
We're only referring specifically to the family == 6 VIA processors here.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 18:21 ` H. Peter Anvin
@ 2008-07-22 18:33 ` Jeff Garzik
2008-07-22 18:41 ` H. Peter Anvin
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 18:33 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>>
>>> Thanks for taking a look at this. So well, it would be a
>>> misconfiguration bug by the distribution then to try to support a
>>> generic 686 kernel without GENERIC then.
>>
>> Well, it may be intentional -- some distros simply exclude support for
>> the lower-volume VIA processors, since that might imply building their
>> "generic 686 kernel" sans CMOV and some other instructions, and
>> changing the compiler's instruction scheduling to something less
>> optimal for the majority. :/
>>
>
> X86_GENERIC shouldn't disable CMOV?
I said "generic 686 kernel" not a specific Kconfig option (for reasons
stated below), which is a bit different.
> We're only referring specifically to the family == 6 VIA processors here.
To be specific, I was merely saying that VIA processors where
c->x86_model==6 may lack CMOV.
I have not kept track of what current Kconfig options will set, but in
the past it was quite easy to build a "generic 686 kernel" that required
CMOV and thus excluded these VIA processors.
Distros in the past often wound up intentionally -not- supporting some
of these VIA processors, because they did not want to create a non-CMOV
kernel. (This policy obviously excluded older x86 as well)
If these things have been addressed recently (< 12-18 months) then all good.
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 18:33 ` Jeff Garzik
@ 2008-07-22 18:41 ` H. Peter Anvin
2008-07-22 23:28 ` Jeff Garzik
0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 18:41 UTC (permalink / raw)
To: Jeff Garzik
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
Jeff Garzik wrote:
>
>> We're only referring specifically to the family == 6 VIA processors here.
>
> To be specific, I was merely saying that VIA processors where
> c->x86_model==6 may lack CMOV.
>
> I have not kept track of what current Kconfig options will set, but in
> the past it was quite easy to build a "generic 686 kernel" that required
> CMOV and thus excluded these VIA processors.
>
> Distros in the past often wound up intentionally -not- supporting some
> of these VIA processors, because they did not want to create a non-CMOV
> kernel. (This policy obviously excluded older x86 as well)
>
> If these things have been addressed recently (< 12-18 months) then all
> good.
>
I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since CMOV
is a separate CPUID flag it's all good (if the chip doesn't have it,
it'll trap.)
Unfortunately Intel didn't assign a CPUID flag for the long NOPs, and
then didn't document them (I think partially because they were a
retcon). Even so, it reflects a serious hole in Centaur's
characterization effort that they bumped family to 6 without following
P6 behaviour for a massive range of opcodes.
The main reason for disabling P6 NOPs for CONFIG_X86_GENERIC is that the
win is so small, and that a number of vendors got it wrong.
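The difference between CMOV and the long NOPs can be sketched as a feature-mask check. CMOV has its own bit in CPUID leaf 1 EDX (bit 15), so a required-feature test can catch a CPU that lacks it; there is no such bit to test for the long NOPs. The bit positions below are the architectural ones for the listed flags, while the `required` set and the helper names are assumptions for illustration.

```python
# CPUID leaf 1 EDX bit positions for a few flags that appear in /proc/cpuinfo.
EDX_BITS = {"fpu": 0, "tsc": 4, "cx8": 8, "cmov": 15, "mmx": 23, "fxsr": 24, "sse": 25}

def edx_mask(flags):
    """Build an EDX-style bitmask from a list of flag names."""
    mask = 0
    for name in flags:
        mask |= 1 << EDX_BITS[name]
    return mask

def missing_features(cpu_edx, required):
    """Return the required flags whose bits are clear in cpu_edx."""
    return [f for f in required if not cpu_edx & (1 << EDX_BITS[f])]

# Subset of the flags reported for the Nehemiah in this thread.
nehemiah = edx_mask(["fpu", "tsc", "cx8", "cmov", "mmx", "fxsr", "sse"])
required = ["fpu", "cx8", "cmov"]       # hypothetical requirement set for a 686 build

print(missing_features(nehemiah, required))
no_cmov = nehemiah & ~(1 << EDX_BITS["cmov"])
print(missing_features(no_cmov, required))
```

The Nehemiah passes this kind of check because it does report cmov; nothing in the mask can express whether 0F 1F is implemented.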
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 18:41 ` H. Peter Anvin
@ 2008-07-22 23:28 ` Jeff Garzik
2008-07-23 0:31 ` H. Peter Anvin
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 23:28 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>
>>> We're only referring specifically to the family == 6 VIA processors
>>> here.
>>
>> To be specific, I was merely saying that VIA processors where
>> c->x86_model==6 may lack CMOV.
>>
>> I have not kept track of what current Kconfig options will set, but in
>> the past it was quite easy to build a "generic 686 kernel" that
>> required CMOV and thus excluded these VIA processors.
>>
>> Distros in the past often wound up intentionally -not- supporting some
>> of these VIA processors, because they did not want to create a
>> non-CMOV kernel. (This policy obviously excluded older x86 as well)
>>
>> If these things have been addressed recently (< 12-18 months) then all
>> good.
>>
>
> I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since CMOV
> is a separate CPUID flag it's all good (if the chip doesn't have it,
> it'll trap.)
It's generally more an issue of making sure the compiler is not
instructed to issue cmov (-march=i686).
> Unfortunately Intel didn't assign a CPUID flag for the long NOPs, and
> then didn't document them (I think partially because they were a
> retcon), but yet it reflected a serious hole in Centaur's
> characterization effort that they bumped family to 6 without following
> P6 behaviour for a massive range of opcodes.
>
> The main reason for disabling P6 NOPs for CONFIG_X86_GENERIC is that the
> win is so small, and that a number of vendors got it wrong.
Yeah.
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 23:28 ` Jeff Garzik
@ 2008-07-23 0:31 ` H. Peter Anvin
0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-23 0:31 UTC (permalink / raw)
To: Jeff Garzik
Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
jfm3, Sujith
Jeff Garzik wrote:
>>
>> I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since
>> CMOV is a separate CPUID flag it's all good (if the chip doesn't have
>> it, it'll trap.)
>
> It's generally more an issue of making sure the compiler is not
> instructed to issue cmov (-march=i686).
>
You're missing the point, though. The issues at hand are:
- Luis' distributor is compiling kernels without CONFIG_X86_GENERIC.
- VIA has CPUs with family == 6 that don't support long NOPs.
- There is no CPUID flag for long NOPs.
So the VIA chips in question sail through the check that's supposed to
warn that the kernel is using an unsupported feature, and hard-crash
instead.
A lot of virtualizers do the same thing, since they don't use proper
vendor IDs and instead mimic real chips, sigh.
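A small sketch of why the family test alone is not enough. It decodes the base family, model, and stepping from a CPUID leaf 1 EAX signature and compares the naive rule (family >= 6 implies long NOPs) with a quirk-table variant. The signature value is constructed from the /proc/cpuinfo output in this thread, assuming the extended family and model fields are zero, and the single quirk entry is hypothetical: the thread notes that the exact CPUID criterion for the affected chips is not known.

```python
def decode_signature(eax):
    """Split CPUID leaf 1 EAX into base (family, model, stepping)."""
    return (eax >> 8) & 0xF, (eax >> 4) & 0xF, eax & 0xF

def naive_has_long_nop(family):
    """The rule the kernel build effectively relies on."""
    return family >= 6

# Hypothetical quirk table; only the CPU reported in this thread is listed.
KNOWN_BROKEN = {("CentaurHauls", 6, 9)}

def quirk_has_long_nop(vendor, family, model):
    """Naive rule plus an exclusion list keyed on vendor, family, and model."""
    return naive_has_long_nop(family) and (vendor, family, model) not in KNOWN_BROKEN

eax = 0x698   # family 6, model 9, stepping 8, as in the reported cpuinfo
family, model, stepping = decode_signature(eax)

print(family, model, stepping)
print(naive_has_long_nop(family))
print(quirk_has_long_nop("CentaurHauls", family, model))
```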
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-22 13:24 ` H. Peter Anvin
2008-07-22 13:46 ` Ingo Molnar
@ 2008-07-26 18:31 ` Andi Kleen
2008-07-26 18:35 ` H. Peter Anvin
1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-07-26 18:31 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
Ivan Seskar, jfm3, Sujith
"H. Peter Anvin" <hpa@zytor.com> writes:
> Ingo Molnar wrote:
>>>
>>> Not a bug.
>> it would still be nice to get a nice printk and panic during bootup
>> instead of some obscure crash, hm?
>>
>
> Yes. The fundamental problem is that Centaur has a set of CPUs which
> report family == 6 but don't have the long NOP instructions. We would
> need an exact CPUID criterion for these CPUs in order to be able to
> report it as an error. An alternative would be to attempt trapping in
> the real-mode code (#UD is one of the *very* few CPU exceptions which
> can be reliably captured in real mode on a BIOS system), but doing so
> would probably mean breaking Loadlin at the very least.
>
> We can't "printk and panic" because we never get that far in the
> kernel proper, for obvious reasons: the code is quite littered with
> these buggers.
This was originally supposed to be handled in the early real mode
head.S code. That is why I put the CPUID checking code in there
to error out early when you can still print to the console
using the BIOS functions.
I suspect this regressed when that code was moved to C, because
now the C compiler generates CMOV early.
How about always building the real mode C code with -march=i386?
It is not performance critical so that is ok.
-Andi
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-26 18:31 ` Andi Kleen
@ 2008-07-26 18:35 ` H. Peter Anvin
2008-07-26 18:44 ` Andi Kleen
0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-26 18:35 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
Ivan Seskar, jfm3, Sujith
Andi Kleen wrote:
>
> This was originally supposed to be handled in the early real mode
> head.S code. That is why I put the CPUID checking code in there
> to error out early when you can still print to the console
> using the BIOS functions.
>
> I suspect this regressed when that code was moved to C, because
> now the C compiler generates CMOV early.
>
> How about always building the real mode C code with -march=i386?
> It is not performance critical so that is ok.
>
The real mode code *is* compiled with -march=i386, and in the CMOV case
it will err out with a legible message.
The issue isn't CMOV at all, it's with long NOPs, which don't have a
CPUID bit -- they're supposed to be supported if family >= 6, but some
VIA chips violate that condition.
-hpa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
2008-07-26 18:35 ` H. Peter Anvin
@ 2008-07-26 18:44 ` Andi Kleen
0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-07-26 18:44 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
Ivan Seskar, jfm3, Sujith
"H. Peter Anvin" <hpa@zytor.com> writes:
>
> The real mode code *is* compiled with -march=i386, and in the CMOV
> case it will err out with a legible message.
>
> The issue isn't CMOV at all, it's with long NOPs, which don't have a
> CPUID bit -- they're supposed to be supported if family >= 6, but some
> VIA chips violate that condition.
Ah yes I realized that about 1 minute after sending the original mail %)
Sorry for the noise. The only way to handle this is probably to add
special quirks. Should check with Centaur for the exact CPUID signatures
of these CPUs.
Or perhaps just stop using the special nops. It was always unclear
if optimizing nops was really worth it.
-Andi
^ permalink raw reply [flat|nested] 19+ messages in thread