linux-kernel.vger.kernel.org archive mirror
* Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
@ 2008-07-21 13:14 Luis R. Rodriguez
  2008-07-21 13:23 ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-21 13:14 UTC (permalink / raw)
  To: linux kernel, H. Peter Anvin; +Cc: Ivan Seskar, jfm3, Sujith

This bug seems to be present since 2.6.22 [1], so hope we can get this
fixed ASAP. Let me know if you have patch suggestions I can test.

This crashes very early, I had to use earlyprintk to get it.

BUG: Int 6: CR2 00000000
     EDI 00000000  ESI 0009f000  EBP 00000000  ESP c036ff60
     EBX c03e6070  EDX 00000006  ECX 0000009f  EAX c034d240
     err 00000000  EIP c0387ac2   CS 00000060  flg 00010016
Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614 c03b6304
       00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030 205d3030
       00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794 00000000
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
BUG: Int 6: CR2 00000000
     EDI 00000000  ESI c034c018  EBP 00000000  ESP c036feb8
     EBX c036ff28  EDX 00000006  ECX c03e6070  EAX c034c018
     err 00000000  EIP c013d922   CS 00000060  flg 00010016
Stack: c036ff28 c03e6070 c01305ef c0104a34 c036fffc c036e000 0009f000 00000002
       0009f000 00000000 00000000 c0105969 00000000 c02bee94 c0310de8 c02b9129
       00000000 c0303d6a 00000000 c034751d c03c7278 c0347faa 00000002 c0347feb
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1

node1-1:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 9
model name      : VIA Nehemiah
stepping        : 8
cpu MHz         : 997.108
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse up rng rng_en ace ace_en
bogomips        : 1996.49
clflush size    : 32

You can get my config from:

http://www.winlab.rutgers.edu/~mcgrof/configs/config-2.6.26

It's basically the Debian config from 2.6.25-2, just updated for 2.6.26. Let
me know if I can provide more information.

[1] http://orbit-lab.org/wiki/Documentation/SupportedImages/baseline-8.3.ndz

  Luis


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-21 13:14 Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot Luis R. Rodriguez
@ 2008-07-21 13:23 ` H. Peter Anvin
  2008-07-21 14:01   ` Luis R. Rodriguez
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-21 13:23 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

Luis R. Rodriguez wrote:
> This bug seems to be present since 2.6.22 [1], so hope we can get this
> fixed ASAP. Let me know if you have patch suggestions I can test.
> 
> This crashes very early, I had to use earlyprintk to get it.
> 
> BUG: Int 6: CR2 00000000
>      EDI 00000000  ESI 0009f000  EBP 00000000  ESP c036ff60
>      EBX c03e6070  EDX 00000006  ECX 0000009f  EAX c034d240
>      err 00000000  EIP c0387ac2   CS 00000060  flg 00010016
> Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614 c03b6304
>        00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030 205d3030
>        00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794 00000000
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1

Use objdump -d or something to find out what is at 0xc0387ac2; the error 
is an undefined instruction exception.

I suspect this is another case of a processor reporting family == 6 and 
not providing the 0F 1F NOP opcodes.

	-hpa
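
As an illustration of the diagnosis above (not part of the original message), the early register dump can be decoded mechanically. This is a minimal sketch: the function name `parse_early_bug` and the small vector table are assumptions made for the example, and the dump text is copied from the original report. Vector 6 is the invalid-opcode exception (#UD), and EIP is the address to look up in the disassembly.

```python
import re

# A few x86 exception vector names, as given in the architecture manuals.
VECTOR_NAMES = {
    0: "#DE (divide error)",
    6: "#UD (invalid opcode)",
    13: "#GP (general protection)",
    14: "#PF (page fault)",
}


def parse_early_bug(dump):
    """Return (vector, registers) from an early 'BUG: Int N' register dump."""
    match = re.search(r"BUG: Int (\d+):", dump)
    if match is None:
        raise ValueError("no 'BUG: Int N:' line found")
    # Register fields look like 'EIP c0387ac2': a short name, then 8 hex digits.
    pairs = re.findall(r"\b([A-Za-z][A-Za-z0-9]{1,2})\s+([0-9a-f]{8})\b", dump)
    return int(match.group(1)), {name: int(value, 16) for name, value in pairs}


# Register lines copied from the report (the Stack lines are left out).
DUMP = """\
BUG: Int 6: CR2 00000000
     EDI 00000000  ESI 0009f000  EBP 00000000  ESP c036ff60
     EBX c03e6070  EDX 00000006  ECX 0000009f  EAX c034d240
     err 00000000  EIP c0387ac2   CS 00000060  flg 00010016
"""

vector, regs = parse_early_bug(DUMP)
print(VECTOR_NAMES.get(vector, "unknown vector"), "at EIP", hex(regs["EIP"]))
```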


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-21 13:23 ` H. Peter Anvin
@ 2008-07-21 14:01   ` Luis R. Rodriguez
  2008-07-21 23:24     ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-21 14:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> Luis R. Rodriguez wrote:
>>
>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>
>> This crashes very early, I had to use earlyprintk to get it.
>>
>> BUG: Int 6: CR2 00000000
>>     EDI 00000000  ESI 0009f000  EBP 00000000  ESP c036ff60
>>     EBX c03e6070  EDX 00000006  ECX 0000009f  EAX c034d240
>>     err 00000000  EIP c0387ac2   CS 00000060  flg 00010016
>> Stack: 00000000 00000000 c03b63d8 c037bb11 0001dff0 00003c00 c0399614 c03b6304
>>       00822007 c037aeb4 c0304134 000001df c0304117 00000000 30303030 205d3030
>>       00000000 c034c794 00000000 00000000 00822007 c02b921f c034c794 00000000
>> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.26 #1
>
> Use objdump -d or something to find out what is at 0xc0387ac2

I've put blank lines around the culprit.

c0387aa0 <free_bootmem>:
c0387aa0:       57                      push   %edi
c0387aa1:       89 c7                   mov    %eax,%edi
c0387aa3:       a1 40 d2 34 c0          mov    0xc034d240,%eax
c0387aa8:       56                      push   %esi
c0387aa9:       89 d6                   mov    %edx,%esi
c0387aab:       53                      push   %ebx
c0387aac:       eb 0e                   jmp    c0387abc <free_bootmem+0x1c>
c0387aae:       89 d8                   mov    %ebx,%eax
c0387ab0:       89 f1                   mov    %esi,%ecx
c0387ab2:       89 fa                   mov    %edi,%edx
c0387ab4:       e8 67 ff ff ff          call   c0387a20 <free_bootmem_core>
c0387ab9:       8b 43 18                mov    0x18(%ebx),%eax
c0387abc:       8d 58 e8                lea    -0x18(%eax),%ebx
c0387abf:       8b 43 18                mov    0x18(%ebx),%eax

c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)

c0387ac6:       81 fb 28 d2 34 c0       cmp    $0xc034d228,%ebx
c0387acc:       75 e0                   jne    c0387aae <free_bootmem+0xe>
c0387ace:       5b                      pop    %ebx
c0387acf:       5e                      pop    %esi
c0387ad0:       5f                      pop    %edi
c0387ad1:       c3                      ret

  Luis
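
For readers who want to automate the lookup shown above (this sketch is not part of the original message), the following finds the instruction at a given address in `objdump -d` text output and checks whether it starts with the 0F 1F long-NOP opcode. The helper names `instruction_at` and `is_long_nop` are assumptions for the example; the sample lines are copied from the disassembly above.

```python
import string


def _is_hex_byte(token):
    """True for a two-character hex token such as '0f'."""
    return len(token) == 2 and all(ch in string.hexdigits for ch in token)


def instruction_at(disassembly, address):
    """Return (opcode_bytes, text) for the instruction at `address`, or None."""
    for line in disassembly.splitlines():
        head, colon, rest = line.partition(":")
        if not colon:
            continue
        try:
            line_address = int(head.strip(), 16)
        except ValueError:
            continue  # e.g. a label line such as 'c0387aa0 <free_bootmem>:'
        if line_address != address:
            continue
        tokens = rest.split()
        count = 0
        # Leading two-digit hex tokens are the raw opcode bytes.
        while count < len(tokens) and _is_hex_byte(tokens[count]):
            count += 1
        opcode = bytes(int(token, 16) for token in tokens[:count])
        return opcode, " ".join(tokens[count:])
    return None


def is_long_nop(opcode):
    """True if the encoding starts with 0F 1F, the multi-byte NOP opcode."""
    return opcode[:2] == b"\x0f\x1f"


SAMPLE = """\
c0387aa0 <free_bootmem>:
c0387abf:       8b 43 18                mov    0x18(%ebx),%eax
c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)
c0387ac6:       81 fb 28 d2 34 c0       cmp    $0xc034d228,%ebx
"""

opcode, text = instruction_at(SAMPLE, 0xC0387AC2)
print(text, "(long NOP)" if is_long_nop(opcode) else "")
```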


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-21 14:01   ` Luis R. Rodriguez
@ 2008-07-21 23:24     ` H. Peter Anvin
  2008-07-22  4:47       ` Luis R. Rodriguez
  2008-07-22 13:14       ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-21 23:24 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

Luis R. Rodriguez wrote:
> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Luis R. Rodriguez wrote:
>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>
>>> This crashes very early, I had to use earlyprintk to get it.
>>>
> 
> I've put blank lines around the culprit.
> 
> c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)
> 

Sure enough, our old friend.

You have in your configuration:

CONFIG_M686=y
# CONFIG_X86_GENERIC is not set

... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is 
not compatible with such processors.

Not a bug.

	-hpa
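
To make the configuration point above concrete (this sketch is not part of the original message), a `.config` file can be scanned for the combination named here. The function names are assumptions for the example, and the check covers only the CONFIG_M686 case discussed in this thread; it does not reproduce the kernel's actual Kconfig rule, and other processor choices may also select long NOPs.

```python
def parse_kconfig(text):
    """Parse .config text into a dict; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def m686_without_generic(options):
    """True for the combination flagged in this thread:
    CONFIG_M686=y with CONFIG_X86_GENERIC unset."""
    return (options.get("CONFIG_M686") == "y"
            and options.get("CONFIG_X86_GENERIC", "n") != "y")


# The two lines quoted from the reporter's configuration.
CONFIG = """\
CONFIG_M686=y
# CONFIG_X86_GENERIC is not set
"""

if m686_without_generic(parse_kconfig(CONFIG)):
    print("M686 without X86_GENERIC: not expected to boot on CPUs lacking long NOPs")
```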



* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-21 23:24     ` H. Peter Anvin
@ 2008-07-22  4:47       ` Luis R. Rodriguez
  2008-07-22 13:10         ` H. Peter Anvin
  2008-07-22 17:10         ` Jeff Garzik
  2008-07-22 13:14       ` Ingo Molnar
  1 sibling, 2 replies; 19+ messages in thread
From: Luis R. Rodriguez @ 2008-07-22  4:47 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

On Mon, Jul 21, 2008 at 4:24 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Luis R. Rodriguez wrote:
>>
>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> Luis R. Rodriguez wrote:
>>>>
>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>
>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>
>>
>> I've put blank lines around the culprit.
>>
>> c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)
>>
>
> Sure enough, our old friend.
>
> You have in your configuration:
>
> CONFIG_M686=y
> # CONFIG_X86_GENERIC is not set
>
> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is not
> compatible with such processors.
>
> Not a bug.

Thanks for taking a look at this. So it would be a
misconfiguration bug by the distribution to try to support a
generic 686 kernel without GENERIC, then.

  Luis


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22  4:47       ` Luis R. Rodriguez
@ 2008-07-22 13:10         ` H. Peter Anvin
  2008-07-22 17:10         ` Jeff Garzik
  1 sibling, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:10 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

Luis R. Rodriguez wrote:
> 
> Thanks for taking a look at this. So it would be a
> misconfiguration bug by the distribution to try to support a
> generic 686 kernel without GENERIC, then.
> 

Yes.

	-hpa


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-21 23:24     ` H. Peter Anvin
  2008-07-22  4:47       ` Luis R. Rodriguez
@ 2008-07-22 13:14       ` Ingo Molnar
  2008-07-22 13:24         ` H. Peter Anvin
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-07-22 13:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

* H. Peter Anvin <hpa@zytor.com> wrote:

> Luis R. Rodriguez wrote:
>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> Luis R. Rodriguez wrote:
>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>
>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>
>>
>> I've put blank lines around the culprit.
>>
>> c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)
>>
>
> Sure enough, our old friend.
>
> You have in your configuration:
>
> CONFIG_M686=y
> # CONFIG_X86_GENERIC is not set
>
> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is  
> not compatible with such processors.
>
> Not a bug.

it would still be nice to get a nice printk and panic during bootup 
instead of some obscure crash, hm?

	Ingo


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 13:14       ` Ingo Molnar
@ 2008-07-22 13:24         ` H. Peter Anvin
  2008-07-22 13:46           ` Ingo Molnar
  2008-07-26 18:31           ` Andi Kleen
  0 siblings, 2 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

Ingo Molnar wrote:
>>
>> Not a bug.
> 
> it would still be nice to get a nice printk and panic during bootup 
> instead of some obscure crash, hm?
> 

Yes.  The fundamental problem is that Centaur has a set of CPUs which 
report family == 6 but don't have the long NOP instructions.  We would 
need an exact CPUID criterion for these CPUs in order to be able to 
report it as an error.  An alternative would be to attempt trapping in 
the real-mode code (#UD is one of the *very* few CPU exceptions which 
can be reliably captured in real mode on a BIOS system), but doing so 
would probably mean breaking Loadlin at the very least.

We can't "printk and panic" because we never get that far in the kernel 
proper, for obvious reasons: the code is quite littered with these buggers.

	-hpa
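
As a sketch of what an "exact CPUID criterion" could look like (not part of the original message), the code below decodes the family, model and stepping from the CPUID leaf 1 EAX signature and matches them against a quirk table. The decoding follows Intel's documented rule for the extended family and model fields, which other vendors may apply differently. The table `KNOWN_NO_LONG_NOP` is hypothetical and holds only the single data point from this report (CentaurHauls, family 6, model 9); the signature 0x698 is composed from the /proc/cpuinfo values posted earlier, not read from hardware.

```python
def decode_signature(eax):
    """Decode CPUID leaf 1 EAX into (family, model, stepping), Intel rule."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


# Hypothetical quirk table: (vendor, family, model) entries known to lack
# long NOPs. Only the CPU from this report is listed; a real table would
# need the full list of affected models from the vendor.
KNOWN_NO_LONG_NOP = {("CentaurHauls", 6, 9)}


def lacks_long_nops(vendor, family, model):
    """True if the CPU matches an entry in the quirk table."""
    return (vendor, family, model) in KNOWN_NO_LONG_NOP


family, model, stepping = decode_signature(0x698)
print(family, model, stepping, lacks_long_nops("CentaurHauls", family, model))
```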


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 13:24         ` H. Peter Anvin
@ 2008-07-22 13:46           ` Ingo Molnar
  2008-07-22 13:54             ` H. Peter Anvin
  2008-07-26 18:31           ` Andi Kleen
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-07-22 13:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith


* H. Peter Anvin <hpa@zytor.com> wrote:

> Ingo Molnar wrote:
>>>
>>> Not a bug.
>>
>> it would still be nice to get a nice printk and panic during bootup  
>> instead of some obscure crash, hm?
>>
>
> Yes.  The fundamental problem is that Centaur has a set of CPUs which 
> report family == 6 but don't have the long NOP instructions.  We would 
> need an exact CPUID criterion for these CPUs in order to be able to 
> report it as an error.  An alternative would be to attempt trapping in 
> the real-mode code (#UD is one of the *very* few CPU exceptions which 
> can be reliably captured in real mode on a BIOS system), but doing so 
> would probably mean breaking Loadlin at the very least.
>
> We can't "printk and panic" because we never get that far in the 
> kernel proper, for obvious reasons: the code is quite littered with 
> these buggers.

Hm. How about defaulting to a safe NOP all the way up to where we can 
fix up alternatives and install a different NOP? (We could also test 
that NOP first by intentionally jumping to it and catching any exception 
via a special exception handler.)

	Ingo
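
The proposal above can be sketched as follows (this is an illustration, not part of the original message and not the kernel's alternatives code). The image is built with a NOP that every x86 CPU accepts, and the recorded NOP sites are rewritten to the long form only once long NOPs are known to work. The function `install_nops` and the list of site offsets are assumptions for the example; the long-NOP bytes are the ones seen in the objdump output earlier in the thread.

```python
SAFE_NOP4 = bytes([0x90] * 4)               # four one-byte NOPs, valid on any x86
P6_NOP4 = bytes([0x0F, 0x1F, 0x40, 0x00])   # nopl 0x0(%eax), as in the objdump output


def install_nops(code, sites, long_nops_ok):
    """Return a copy of `code` with each 4-byte NOP site upgraded to the
    long NOP, but only if `long_nops_ok` is true."""
    patched = bytearray(code)
    if not long_nops_ok:
        return bytes(patched)  # keep the safe default everywhere
    for offset in sites:
        if bytes(patched[offset:offset + 4]) != SAFE_NOP4:
            raise ValueError("site at offset %d is not a safe 4-byte NOP" % offset)
        patched[offset:offset + 4] = P6_NOP4
    return bytes(patched)


# mov 0x18(%ebx),%eax ; 4-byte NOP padding ; ret
CODE = b"\x8b\x43\x18" + SAFE_NOP4 + b"\xc3"

print(install_nops(CODE, [3], long_nops_ok=False).hex())
print(install_nops(CODE, [3], long_nops_ok=True).hex())
```

Patching by recorded offset rather than by scanning for byte patterns avoids rewriting bytes that merely look like a NOP inside another instruction.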


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 13:46           ` Ingo Molnar
@ 2008-07-22 13:54             ` H. Peter Anvin
  0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 13:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

Ingo Molnar wrote:
>>
>> We can't "printk and panic" because we never get that far in the 
>> kernel proper, for obvious reasons: the code is quite littered with 
>> these buggers.
> 
> Hm. How about defaulting to a safe NOP all the way up to where we can 
> fix up alternatives and install a different NOP? (We could also test 
> that NOP first by intentionally jumping to it and catching any exception 
> via a special exception handler.)
> 

I don't really think that's realistic, especially if gcc starts using 
these instructions (which it really *should*.)

You can make the same argument for every non-i386 instruction (heck, 
even every non-8086 instruction), and it quickly gets unworkable.

Since it is extremely likely that the set of processors affected is now 
bounded, I think it's just a matter of identifying the relevant CPUID 
info.  As far as I know, only VIA is affected.

What is worse is that there are a number of "virtual processors" out 
there which are, in effect, separate implementations of the x86 
architecture, but don't actually identify as anything else.  Several of 
them have broken nopl implementations, but identify as processors which 
are known good in this department.  Again, nothing unique to nopl about 
this, but it's a generic problem.

	-hpa


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22  4:47       ` Luis R. Rodriguez
  2008-07-22 13:10         ` H. Peter Anvin
@ 2008-07-22 17:10         ` Jeff Garzik
  2008-07-22 18:21           ` H. Peter Anvin
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 17:10 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: H. Peter Anvin, linux kernel, H. Peter Anvin, Ivan Seskar, jfm3, Sujith

Luis R. Rodriguez wrote:
> On Mon, Jul 21, 2008 at 4:24 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Luis R. Rodriguez wrote:
>>> On Mon, Jul 21, 2008 at 6:23 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> Luis R. Rodriguez wrote:
>>>>> This bug seems to be present since 2.6.22 [1], so hope we can get this
>>>>> fixed ASAP. Let me know if you have patch suggestions I can test.
>>>>>
>>>>> This crashes very early, I had to use earlyprintk to get it.
>>>>>
>>> I've put blank lines around the culprit.
>>>
>>> c0387ac2:       0f 1f 40 00             nopl   0x0(%eax)
>>>
>> Sure enough, our old friend.
>>
>> You have in your configuration:
>>
>> CONFIG_M686=y
>> # CONFIG_X86_GENERIC is not set
>>
>> ... so this is fully expected; CONFIG_M686 without CONFIG_X86_GENERIC is not
>> compatible with such processors.
>>
>> Not a bug.
> 
> Thanks for taking a look at this. So it would be a
> misconfiguration bug by the distribution to try to support a
> generic 686 kernel without GENERIC, then.

Well, it may be intentional -- some distros simply exclude support for 
the lower-volume VIA processors, since that might imply building their 
"generic 686 kernel" sans CMOV and some other instructions, and changing 
the compiler's instruction scheduling to something less optimal for the 
majority.  :/

	Jeff





* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 17:10         ` Jeff Garzik
@ 2008-07-22 18:21           ` H. Peter Anvin
  2008-07-22 18:33             ` Jeff Garzik
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 18:21 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

Jeff Garzik wrote:
>>
>> Thanks for taking a look at this. So it would be a
>> misconfiguration bug by the distribution to try to support a
>> generic 686 kernel without GENERIC, then.
> 
> Well, it may be intentional -- some distros simply exclude support for 
> the lower-volume VIA processors, since that might imply building their 
> "generic 686 kernel" sans CMOV and some other instructions, and changing 
> the compiler's instruction scheduling to something less optimal for the 
> majority.  :/
> 

X86_GENERIC shouldn't disable CMOV?

We're only referring specifically to the family == 6 VIA processors here.

	-hpa



* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 18:21           ` H. Peter Anvin
@ 2008-07-22 18:33             ` Jeff Garzik
  2008-07-22 18:41               ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 18:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>>
>>> Thanks for taking a look at this. So it would be a
>>> misconfiguration bug by the distribution to try to support a
>>> generic 686 kernel without GENERIC, then.
>>
>> Well, it may be intentional -- some distros simply exclude support for 
>> the lower-volume VIA processors, since that might imply building their 
>> "generic 686 kernel" sans CMOV and some other instructions, and 
>> changing the compiler's instruction scheduling to something less 
>> optimal for the majority.  :/
>>
> 
> X86_GENERIC shouldn't disable CMOV?

I said "generic 686 kernel" not a specific Kconfig option (for reasons 
stated below), which is a bit different.


> We're only referring specifically to the family == 6 VIA processors here.

To be specific, I was merely saying that VIA processors where 
c->x86_model==6 may lack CMOV.

I have not kept track of what current Kconfig options will set, but in 
the past it was quite easy to build a "generic 686 kernel" that required 
CMOV and thus excluded these VIA processors.

Distros in the past often wound up intentionally -not- supporting some 
of these VIA processors, because they did not want to create a non-CMOV 
kernel.  (This policy obviously excluded older x86 as well)

If these things have been addressed recently (< 12-18 months) then all good.

	Jeff






* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 18:33             ` Jeff Garzik
@ 2008-07-22 18:41               ` H. Peter Anvin
  2008-07-22 23:28                 ` Jeff Garzik
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-22 18:41 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

Jeff Garzik wrote:
> 
>> We're only referring specifically to the family == 6 VIA processors here.
> 
> To be specific, I was merely saying that VIA processors where 
> c->x86_model==6 may lack CMOV.
> 
> I have not kept track of what current Kconfig options will set, but in 
> the past it was quite easy to build a "generic 686 kernel" that required 
> CMOV and thus excluded these VIA processors.
> 
> Distros in the past often wound up intentionally -not- supporting some 
> of these VIA processors, because they did not want to create a non-CMOV 
> kernel.  (This policy obviously excluded older x86 as well)
> 
> If these things have been addressed recently (< 12-18 months) then all 
> good.
> 

I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since CMOV 
is a separate CPUID flag it's all good (if the chip doesn't have it, 
it'll trap.)

Unfortunately Intel didn't assign a CPUID flag for the long NOPs, and 
then didn't document them (I think partially because they were a 
retcon), but yet it reflected a serious hole in Centaur's 
characterization effort that they bumped family to 6 without following 
P6 behaviour for a massive range of opcodes.

The main reason for disabling P6 NOPs for CONFIG_X86_GENERIC is that the 
win is so small, and that a number of vendors got it wrong.

	-hpa


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 18:41               ` H. Peter Anvin
@ 2008-07-22 23:28                 ` Jeff Garzik
  2008-07-23  0:31                   ` H. Peter Anvin
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2008-07-22 23:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>
>>> We're only referring specifically to the family == 6 VIA processors 
>>> here.
>>
>> To be specific, I was merely saying that VIA processors where 
>> c->x86_model==6 may lack CMOV.
>>
>> I have not kept track of what current Kconfig options will set, but in 
>> the past it was quite easy to build a "generic 686 kernel" that 
>> required CMOV and thus excluded these VIA processors.
>>
>> Distros in the past often wound up intentionally -not- supporting some 
>> of these VIA processors, because they did not want to create a 
>> non-CMOV kernel.  (This policy obviously excluded older x86 as well)
>>
>> If these things have been addressed recently (< 12-18 months) then all 
>> good.
>>
> 
> I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since CMOV 
> is a separate CPUID flag it's all good (if the chip doesn't have it, 
> it'll trap.)

It's generally more an issue of making sure the compiler is not 
instructed to issue cmov (-march=i686).


> Unfortunately Intel didn't assign a CPUID flag for the long NOPs, and 
> then didn't document them (I think partially because they were a 
> retcon), but yet it reflected a serious hole in Centaur's 
> characterization effort that they bumped family to 6 without following 
> P6 behaviour for a massive range of opcodes.
> 
> The main reason for disabling P6 NOPs for CONFIG_X86_GENERIC is that the 
> win is so small, and that a number of vendors got it wrong.

Yeah.

	Jeff





* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 23:28                 ` Jeff Garzik
@ 2008-07-23  0:31                   ` H. Peter Anvin
  0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-23  0:31 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Luis R. Rodriguez, linux kernel, H. Peter Anvin, Ivan Seskar,
	jfm3, Sujith

Jeff Garzik wrote:
>>
>> I am pretty sure CONFIG_X86_GENERIC doesn't disable CMOV, and since 
>> CMOV is a separate CPUID flag it's all good (if the chip doesn't have 
>> it, it'll trap.)
> 
> It's generally more an issue of making sure the compiler is not 
> instructed to issue cmov (-march=i686).
> 

You're missing the point, though.  The issues at hand are:

- Luis' distributor is compiling kernels without CONFIG_X86_GENERIC.
- VIA has CPUs with family == 6 that don't support long NOPs.
- There is no CPUID flag for long NOPs.

So the VIA chips in question sail through the system that's supposed to 
warn that the kernel is using an unsupported feature and have a hard 
crash, instead.

A lot of virtualizers do the same thing, since they don't use proper 
vendor IDs and instead mimic real chips, sigh.

	-hpa
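
The three points above can be illustrated with a small sketch (not part of the original message and not the kernel's actual boot-time check). A check that compares required CPUID feature flags and a minimum family against the CPU has nothing to test for long NOPs, so the reporter's CPU passes it. The function `missing_features` and the required-flag list are assumptions for the example; the CPU flags and family are taken from the /proc/cpuinfo output in the original report.

```python
def missing_features(cpu_flags, family, required_flags, min_family):
    """Return the requirements the CPU fails; an empty list means it passes."""
    missing = [flag for flag in required_flags if flag not in cpu_flags]
    if family < min_family:
        missing.append("family >= %d" % min_family)
    return missing


# Flags and family from the reporter's /proc/cpuinfo.
NEHEMIAH_FLAGS = ("fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx "
                  "fxsr sse up rng rng_en ace ace_en").split()
NEHEMIAH_FAMILY = 6

# Hypothetical requirements for a 686 build. Long NOPs cannot appear in
# this list because no CPUID flag exists for them; family >= 6 is the
# only available stand-in.
REQUIRED = ["fpu", "cx8", "cmov"]

print(missing_features(NEHEMIAH_FLAGS, NEHEMIAH_FAMILY, REQUIRED, min_family=6))
```

The empty result is the problem described above: the check reports nothing missing, and the first long NOP then raises #UD.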


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-22 13:24         ` H. Peter Anvin
  2008-07-22 13:46           ` Ingo Molnar
@ 2008-07-26 18:31           ` Andi Kleen
  2008-07-26 18:35             ` H. Peter Anvin
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-07-26 18:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
	Ivan Seskar, jfm3, Sujith

"H. Peter Anvin" <hpa@zytor.com> writes:

> Ingo Molnar wrote:
>>>
>>> Not a bug.
>> it would still be nice to get a nice printk and panic during bootup
>> instead of some obscure crash, hm?
>>
>
> Yes.  The fundamental problem is that Centaur has a set of CPUs which
> report family == 6 but don't have the long NOP instructions.  We would
> need an exact CPUID criterion for these CPUs in order to be able to
> report it as an error.  An alternative would be to attempt trapping in
> the real-mode code (#UD is one of the *very* few CPU exceptions which
> can be reliably captured in real mode on a BIOS system), but doing so
> would probably mean breaking Loadlin at the very least.
>
> We can't "printk and panic" because we never get that far in the
> kernel proper, for obvious reasons: the code is quite littered with
> these buggers.

This was originally supposed to be handled in the early real mode
head.S code. That is why I put the CPUID checking code in there
to error out early when you can still print to the console
using the BIOS functions.

I suspect this regressed when that code was moved to C, because
now the C compiler generates CMOV early.

How about always building the real mode C code with -march=i386?
It is not performance critical so that is ok.

-Andi


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-26 18:31           ` Andi Kleen
@ 2008-07-26 18:35             ` H. Peter Anvin
  2008-07-26 18:44               ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: H. Peter Anvin @ 2008-07-26 18:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
	Ivan Seskar, jfm3, Sujith

Andi Kleen wrote:
> 
> This was originally supposed to be handled in the early real mode
> head.S code. That is why I put the CPUID checking code in there
> to error out early when you can still print to the console
> using the BIOS functions.
> 
> I suspect this regressed when that code was moved to C, because
> now the C compiler generates CMOV early.
> 
> How about always building the real mode C code with -march=i386?
> It is not performance critical so that is ok.
> 

The real mode code *is* compiled with -march=i386, and in the CMOV case 
it will err out with a legible message.

The issue isn't CMOV at all, it's with long NOPs, which don't have a 
CPUID bit -- they're supposed to be supported if family >= 6, but some 
VIA chips violate that condition.

	-hpa


* Re: Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot
  2008-07-26 18:35             ` H. Peter Anvin
@ 2008-07-26 18:44               ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-07-26 18:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Luis R. Rodriguez, linux kernel, H. Peter Anvin,
	Ivan Seskar, jfm3, Sujith

"H. Peter Anvin" <hpa@zytor.com> writes:
>
> The real mode code *is* compiled with -march=i386, and in the CMOV
> case it will err out with a legible message.
>
> The issue isn't CMOV at all, it's with long NOPs, which don't have a
> CPUID bit -- they're supposed to be supported if family >= 6, but some
> VIA chips violate that condition.

Ah yes I realized that about 1 minute after sending the original mail %)
Sorry for the noise. The only way to handle this is probably to add
special quirks. Should check with Centaur for the exact CPUID signatures
of these CPUs.

Or perhaps just stop using the special nops. It was always unclear
if optimizing nops was really worth it.

-Andi


Thread overview: 19+ messages
2008-07-21 13:14 Bug on 2.6.26 - x86 VIA Nehemiah CentaurHauls processor cannot boot Luis R. Rodriguez
2008-07-21 13:23 ` H. Peter Anvin
2008-07-21 14:01   ` Luis R. Rodriguez
2008-07-21 23:24     ` H. Peter Anvin
2008-07-22  4:47       ` Luis R. Rodriguez
2008-07-22 13:10         ` H. Peter Anvin
2008-07-22 17:10         ` Jeff Garzik
2008-07-22 18:21           ` H. Peter Anvin
2008-07-22 18:33             ` Jeff Garzik
2008-07-22 18:41               ` H. Peter Anvin
2008-07-22 23:28                 ` Jeff Garzik
2008-07-23  0:31                   ` H. Peter Anvin
2008-07-22 13:14       ` Ingo Molnar
2008-07-22 13:24         ` H. Peter Anvin
2008-07-22 13:46           ` Ingo Molnar
2008-07-22 13:54             ` H. Peter Anvin
2008-07-26 18:31           ` Andi Kleen
2008-07-26 18:35             ` H. Peter Anvin
2008-07-26 18:44               ` Andi Kleen
