linux-kernel.vger.kernel.org archive mirror
* RE: VIA C3 and random SIGTRAP or segfault
@ 2003-01-15 14:15 Larry Sendlosky
  2003-01-15 14:47 ` Padraig
  2003-01-20  7:30 ` Alan
  0 siblings, 2 replies; 12+ messages in thread
From: Larry Sendlosky @ 2003-01-15 14:15 UTC
  To: Dave Jones, Miklos Szeredi; +Cc: linux-kernel

We're seeing the same thing on a mini-ITX based system.
init is segfaulting :(( .  We've never seen this on our
other non-C3 systems running the same codebase. We've instrumented
the kernel to help catch the initial problem, hopefully it will
trigger soon.

Dave, will the cmov generate a segfault or illegal instr trap (SIGILL?) ?

thanks
larry



-----Original Message-----
From: Dave Jones [mailto:davej@codemonkey.org.uk]
Sent: Wednesday, January 15, 2003 7:23 AM
To: Miklos Szeredi
Cc: linux-kernel@vger.kernel.org
Subject: Re: VIA C3 and random SIGTRAP or segfault


On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
 > 
 > I just bought a VIA C3 866 processor, and under very special
 > circumstances some programs (e.g. mplayer, xmms) randomly crash with
 > trace/breakpoint trap or segmentation fault.  Otherwise the system
 > seems stable even under high load.

Be sure that those programs aren't compiled for 686. The C3 lacks
cmov, so it'll segfault when it hits that opcode. You can confirm
this by running it under gdb, and disassembling where it segv's to.
This is still a common problem that's biting some people. The Debian
folks had a broken libssl for months up until recently.

Note to userspace developers: If you're compiling something as
a 686 binary, you *NEED* to check the feature flags (in an i386
compiled program) to see if the CPU has cmov before you load 686
optimised parts of your app.  This is *NOT* a kernel problem,
it is *NOT* a CPU bug. The cmov extension is optional.
VIA chose to save silicon space by not implementing it. 
Gcc unfortunately always uses cmov when compiling for 686.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 14:15 VIA C3 and random SIGTRAP or segfault Larry Sendlosky
@ 2003-01-15 14:47 ` Padraig
  2003-01-15 15:56   ` Miklos Szeredi
  2003-01-20  7:30 ` Alan
  1 sibling, 1 reply; 12+ messages in thread
From: Padraig @ 2003-01-15 14:47 UTC
  To: Larry Sendlosky; +Cc: Dave Jones, Miklos Szeredi, linux-kernel

Larry Sendlosky wrote:
> We're seeing the same thing on a mini-ITX based system.
> init is segfaulting :(( .  We've never seen this on our
> other non-C3 systems running the same codebase. We've instrumented
> the kernel to help catch the initial problem, hopefully it will
> trigger soon.
> 
> Dave, will the cmov generate a segfault or illegal instr trap (SIGILL?) ?

segfault is what I saw. Something seems to be corrupted (by a cmov
SIGILL?) and from then the app will crash in the same
(arbitrary) place until the machine is restarted. Some apps
are more susceptible than others. Note a Samuel II would work fine?

Hmm, just checking an ssh binary and associated libs that I know
crashed every so often (only in interactive mode, not with ssh -c),
I noticed that libnsl.so.1 (network services lib (part of glibc))
had cmov instructions. Other things noticed to crash were bash,
vi, php, snmpd. So I guess libnsl could be the root of our probs?
Note we built the whole system from SRPMs with the appropriate
flags for C3, but obviously these were ignored for libnsl anyway!
Also possibly related is that most problematic binaries
(php/snmpd/ssh) were linked to libcrypto.so.2 which may be relevant?

To find if a binary has CMOV instructions:
   objdump --disassemble binary | grep cmov

Pádraig.
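
Following up on Pádraig's observation that the cmov turned up in a
library (libnsl) rather than in ssh itself, the objdump check can be
extended to a binary's shared-library dependencies. The sketch below is
a minimal, hedged example: the helper names disasm_has_cmov, has_cmov
and scan_with_libs are invented for illustration, and it assumes GNU
objdump, ldd and awk are installed. Note that ldd may run code from the
file it inspects, so it is best used only on trusted binaries.

```shell
#!/bin/sh
# Sketch: check a binary and every shared library it links against for
# cmov instructions. All function names here are hypothetical.

# Reads disassembly on stdin; succeeds if any cmov mnemonic appears.
# Requiring whitespace on both sides skips symbol names such as <cmov_x>.
disasm_has_cmov() {
    grep -Eq '[[:space:]]cmov[a-z]+[[:space:]]'
}

# Succeeds if the given file disassembles to at least one cmov.
has_cmov() {
    objdump --disassemble "$1" 2>/dev/null | disasm_has_cmov
}

# Reports the binary itself and each resolved library that contains cmov.
scan_with_libs() {
    bin=$1
    # ldd prints "libfoo.so.1 => /path/libfoo.so.1 (0xaddr)"; keep the paths.
    libs=$(ldd "$bin" 2>/dev/null | awk '$2 == "=>" && $3 ~ /^\// { print $3 }')
    # $libs is left unquoted on purpose; paths containing spaces would break.
    for f in "$bin" $libs; do
        if has_cmov "$f"; then
            echo "$f has cmov"
        fi
    done
}

if [ $# -gt 0 ]; then
    scan_with_libs "$1"
fi
```

Saved under a name such as cmovscan.sh (hypothetical), it could be run
as "sh cmovscan.sh /usr/bin/ssh".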


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 14:47 ` Padraig
@ 2003-01-15 15:56   ` Miklos Szeredi
  2003-01-15 16:12     ` Jens Axboe
  2003-01-15 16:13     ` Padraig
  0 siblings, 2 replies; 12+ messages in thread
From: Miklos Szeredi @ 2003-01-15 15:56 UTC
  To: Padraig; +Cc: Larry.Sendlosky, davej, linux-kernel


> segfault is what I saw. Something seems to be corrupted (by a cmov
> SIGILL?) and from then the app will crash in the same
> (arbitrary) place until the machine is restarted. Some apps
> are more susceptible than others. Note a Samuel II would work fine?

Do you mean that after a cmov is encountered other applications will
also randomly crash?  That would explain what I've been seeing.

Miklos

* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 15:56   ` Miklos Szeredi
@ 2003-01-15 16:12     ` Jens Axboe
  2003-01-15 16:13     ` Padraig
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2003-01-15 16:12 UTC
  To: Miklos Szeredi, Padraig; +Cc: Larry.Sendlosky, davej, linux-kernel

On Wed, Jan 15 2003, Miklos Szeredi wrote:
> 
> > segfault is what I saw. Something seems to be corrupted (by a cmov
> > SIGILL?) and from then the app will crash in the same
> > (arbitrary) place until the machine is restarted. Some apps
> > are more susceptible than others. Note a Samuel II would work fine?
> 
> Do you mean that after a cmov is encountered other applications will
> also randomly crash?  That would explain what I've been seeing.

No, it will SIGILL immediately.

-- 
Jens Axboe
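
The question of which signal is actually delivered can be checked from
the shell, since a POSIX shell reports a child killed by signal N with
exit status 128 + N. A minimal sketch, assuming a POSIX sh whose
"kill -l" accepts an exit status; describe_status is a hypothetical
helper name, and the child here raises SIGILL on itself rather than
executing a real cmov:

```shell
#!/bin/sh
# Sketch: translate a shell exit status into the signal that killed the
# process. describe_status is a hypothetical helper name.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Death by signal is reported as 128 + signal number;
        # "kill -l" maps that status back to the signal name.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited normally with status $status"
    fi
}

# Stand-in for a crashing program: the child sends itself SIGILL.
# Using || keeps the script running even if "set -e" is in effect.
sh -c 'kill -s ILL $$' || describe_status $?
```

Statuses above 128 that do not correspond to a signal (a plain
"exit 200", for instance) are not handled by this sketch.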


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 15:56   ` Miklos Szeredi
  2003-01-15 16:12     ` Jens Axboe
@ 2003-01-15 16:13     ` Padraig
  2003-01-15 16:26       ` Miklos Szeredi
  1 sibling, 1 reply; 12+ messages in thread
From: Padraig @ 2003-01-15 16:13 UTC
  To: Miklos.Szeredi; +Cc: Larry.Sendlosky, davej, linux-kernel

Miklos.Szeredi@eth.ericsson.se wrote:
>>segfault is what I saw. Something seems to be corrupted (by a cmov
>>SIGILL?) and from then the app will crash in the same
>>(arbitrary) place until the machine is restarted. Some apps
>>are more susceptible than others. Note a Samuel II would work fine?
> 
> Do you mean that after a cmov is encountered other applications will
> also randomly crash?  That would explain what I've been seeing.

Well I never got SIGILL as would be expected. I got SEGFAULTs
and I'm only speculating that a CMOV was encountered.
But yes that does seem to be what's happening, the
CMOV corrupts something global to many apps, and
"every now and then" SEGFAULT.

You could quickly check your system with something like:

find /bin -perm /111 -type f |
while read -r bin; do
     objdump --disassemble "$bin" 2>/dev/null |
     grep -q cmov && echo "$bin has cmov"
done

Pádraig.


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 16:13     ` Padraig
@ 2003-01-15 16:26       ` Miklos Szeredi
  0 siblings, 0 replies; 12+ messages in thread
From: Miklos Szeredi @ 2003-01-15 16:26 UTC
  To: Padraig; +Cc: Larry.Sendlosky, davej, linux-kernel


> Well I never got SIGILL as would be expected. I got SEGFAULTs
> and I'm only speculating that a CMOV was encountered.
> But yes that does seem to be what's happening, the
> CMOV corrupts something global to many apps, and
> "every now and then" SEGFAULT.

That is exactly the behavior I'm seeing.  When xmms is run by one user
under gnome it crashes after some random amount of time.  Other users
or under kde xmms _never_ crashes.  

> You could quickly check your system with something like:
> 
> find /bin -perm /111 -type f |
> while read -r bin; do
>      objdump --disassemble "$bin" 2>/dev/null |
>      grep -q cmov && echo "$bin has cmov"
> done

Thanks I will check for cmovs.

Miklos

* RE: VIA C3 and random SIGTRAP or segfault
  2003-01-15 14:15 VIA C3 and random SIGTRAP or segfault Larry Sendlosky
  2003-01-15 14:47 ` Padraig
@ 2003-01-20  7:30 ` Alan
  1 sibling, 0 replies; 12+ messages in thread
From: Alan @ 2003-01-20  7:30 UTC
  To: Larry Sendlosky; +Cc: Dave Jones, Miklos Szeredi, Linux Kernel Mailing List

On Wed, 2003-01-15 at 14:15, Larry Sendlosky wrote:
> We're seeing the same thing on a mini-ITX based system.
> init is segfaulting :(( .  We've never seen this on our
> other non-C3 systems running the same codebase. We've instrumented
> the kernel to help catch the initial problem, hopefully it will
> trigger soon.

I run Red Hat 8.x on both EPIA and EPIA-M boards without problems. 
I have seen weird crashes on EPIA boards with marginal RAM (you
need the right cas for EPIA otherwise it will die under any kind
of bus mastering)


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 12:23 ` Dave Jones
  2003-01-15 12:38   ` Miklos Szeredi
@ 2003-01-16  5:53   ` Glen Turner
  1 sibling, 0 replies; 12+ messages in thread
From: Glen Turner @ 2003-01-16  5:53 UTC
  To: Dave Jones; +Cc: Miklos Szeredi, linux-kernel

Dave Jones wrote:
> On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
>  > 
>  > I just bought a VIA C3 866 processor, and under very special
>  > circumstances some programs (e.g. mplayer, xmms) randomly crash with
>  > trace/breakpoint trap or segmentation fault.  Otherwise the system
>  > seems stable even under high load.
> 
> Be sure that those programs aren't compiled for 686. The C3 lacks
> cmov, so it'll segfault when it hits that opcode. You can confirm
> this by running it under gdb, and disassembling where it segv's to.
> This is still a common problem that's biting some people. The Debian
> folks had a broken libssl for months up until recently.
> 
> Note to userspace developers: If you're compiling something as
> a 686 binary, you *NEED* to check the feature flags (in an i386
> compiled program) to see if the CPU has cmov before you load 686
> optimised parts of your app.  This is *NOT* a kernel problem,
> it is *NOT* a CPU bug. The cmov extension is optional.
> VIA chose to save silicon space by not implementing it. 
> Gcc unfortunately always uses cmov when compiling for 686.

Why not use a CMOV in a i686-specific crt0.c?

Then programs compiled for i686 but run on i586 will SIGILL
deterministically at program start-up.  It seems to me that
the major problem with SIGILL is that it occurs depending
upon the program execution flow, and thus appears nondeterministic
to the user.

This doesn't solve the problem of a i386 executable calling
a i686 library, but solving that problem deterministically
requires a lot of baggage:

   - compiler to produce an object file header stating CPU
     features used.

   - run time linker to take union of all CPU features in
     object file headers and check against CPU features
     returned by CPUID.

Even this isn't perfect: consider multi-processor machines
with differing CPU feature sets or applications which attempt
to implement their own run-time checking:

    get_cpu_features(&feature);
    if (feature.cmov && feature.somethingelse && ...)
        mytask_i686();
    else
        mytask_i386();

This leads inevitably to more flags in the object file header
to instruct the run-time linker to skip particular CPU feature
checks:

   gcc -c -mdisable_cpu_feature_check=cmov -o mytask.o mytask.c

SIGILL starts to look lightweight :-)

-- 
  Glen Turner                (08) 8303 3936 or +61 8 8303 3936
  Australian Academic and Research Network   www.aarnet.edu.au


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 12:38   ` Miklos Szeredi
@ 2003-01-15 13:23     ` Dave Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Jones @ 2003-01-15 13:23 UTC
  To: Miklos Szeredi; +Cc: linux-kernel

On Wed, Jan 15, 2003 at 01:38:58PM +0100, Miklos Szeredi wrote:
 > 
 > Thanks, I'll check that out, though I'm a bit sceptical since the
 > crashes occur randomly not predictably, and that makes me feel it's
 > not because of an unimplemented instruction.
 > 
 > Also what about trace/breakpoint trap?  Can that be also generated by
 > an illegal instruction?

Hmm. My theory would explain SIGILL's, but if you're seeing others
as well, it could be something else. Check that the power supply is
rated high enough, and check cooling (though cooling is usually less
of an issue with C3s). A run with memtest86 may also be worth trying.
		
		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15 12:23 ` Dave Jones
@ 2003-01-15 12:38   ` Miklos Szeredi
  2003-01-15 13:23     ` Dave Jones
  2003-01-16  5:53   ` Glen Turner
  1 sibling, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2003-01-15 12:38 UTC
  To: davej; +Cc: linux-kernel


Thanks, I'll check that out, though I'm a bit sceptical since the
crashes occur randomly not predictably, and that makes me feel it's
not because of an unimplemented instruction.

Also what about trace/breakpoint trap?  Can that be also generated by
an illegal instruction?

Thanks,
Miklos

> On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
>  > 
>  > I just bought a VIA C3 866 processor, and under very special
>  > circumstances some programs (e.g. mplayer, xmms) randomly crash with
>  > trace/breakpoint trap or segmentation fault.  Otherwise the system
>  > seems stable even under high load.
> 
> Be sure that those programs aren't compiled for 686. The C3 lacks
> cmov, so it'll segfault when it hits that opcode. You can confirm
> this by running it under gdb, and disassembling where it segv's to.
> This is still a common problem that's biting some people. The Debian
> folks had a broken libssl for months up until recently.


* Re: VIA C3 and random SIGTRAP or segfault
  2003-01-15  9:29 Miklos Szeredi
@ 2003-01-15 12:23 ` Dave Jones
  2003-01-15 12:38   ` Miklos Szeredi
  2003-01-16  5:53   ` Glen Turner
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Jones @ 2003-01-15 12:23 UTC
  To: Miklos Szeredi; +Cc: linux-kernel

On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
 > 
 > I just bought a VIA C3 866 processor, and under very special
 > circumstances some programs (e.g. mplayer, xmms) randomly crash with
 > trace/breakpoint trap or segmentation fault.  Otherwise the system
 > seems stable even under high load.

Be sure that those programs aren't compiled for 686. The C3 lacks
cmov, so it'll segfault when it hits that opcode. You can confirm
this by running it under gdb, and disassembling where it segv's to.
This is still a common problem that's biting some people. The Debian
folks had a broken libssl for months up until recently.

Note to userspace developers: If you're compiling something as
a 686 binary, you *NEED* to check the feature flags (in an i386
compiled program) to see if the CPU has cmov before you load 686
optimised parts of your app.  This is *NOT* a kernel problem,
it is *NOT* a CPU bug. The cmov extension is optional.
VIA chose to save silicon space by not implementing it. 
Gcc unfortunately always uses cmov when compiling for 686.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs
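
Dave's advice to check the feature flags before loading 686-optimised
code can be approximated from a launcher script on Linux, where the
kernel publishes the CPUID feature bits on the "flags" line of
/proc/cpuinfo. This is only a sketch under that assumption:
cpu_has_flag is a hypothetical helper name, and a compiled program
would more likely query CPUID directly.

```shell
#!/bin/sh
# Sketch: report whether the CPU advertises a given feature flag.
# cpu_has_flag is a hypothetical helper name, not from the thread.
# Usage: cpu_has_flag FLAG [CPUINFO_FILE]
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # The "flags" line is a space-separated list; -w matches whole
    # words only, so "mov" does not match "cmov".
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

if cpu_has_flag cmov; then
    echo "cmov present: 686-optimised code can be loaded"
else
    echo "cmov absent or unknown: stay on the i386 build"
fi
```

Treating "unknown" the same as "absent" is a deliberate choice here:
falling back to the i386 build costs some speed, while guessing wrong
in the other direction crashes the program.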

* VIA C3 and random SIGTRAP or segfault
@ 2003-01-15  9:29 Miklos Szeredi
  2003-01-15 12:23 ` Dave Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2003-01-15  9:29 UTC
  To: linux-kernel


I just bought a VIA C3 866 processor, and under very special
circumstances some programs (e.g. mplayer, xmms) randomly crash with
trace/breakpoint trap or segmentation fault.  Otherwise the system
seems stable even under high load.  Tested under various kernels
(generic i386 2.2.19, 2.4.19, and 2.4.19 compiled for the C3), with
different memory modules (some known to be good) and various video
cards and X servers, but the result is always the same.

Can this be a software fault or is the CPU faulty?  Can anything other
than a CPU fault cause programs to receive SIGTRAP?

The system config is:

cpu: C3 866MHz
mb: asus cuv4x-c (via vt82c694x chipset)

The BIOS recognises the CPU as "VIA Cyrix III 866A", which is not
exactly right but almost.

Any advice is greatly appreciated!
Miklos
