linux-kernel.vger.kernel.org archive mirror
* Questionable SIGSEGV signal handling bug concerning siginfo.si_addr on i386-linux 2.4.2
From: Hugo Mildenberger @ 2001-06-26 14:24 UTC
  To: linux-kernel

Dear friends,

I'm working on a library that maps (at least synchronous) kernel signals to
C++ exceptions, so that C++ exception handlers can determine the reason and
location of a failure in great detail. In that context, I noticed a
surprising difference in the behaviour of my test programs, depending on
whether they were compiled with gcc-2.95.2 or gcc-3.0. When I compiled the
program with gcc-3.0, siginfo.si_addr contained an address that was always
4 bytes larger than the original invalid pointer value (e.g. 0x1238 versus
0x1234, or 0x4 versus 0x0). By contrast, the program compiled with
gcc-2.95.2 behaved correctly.

I thought this symptom might be caused by a subtle processor bug that
depends on register usage or instruction ordering, and I tracked it down to
the following difference in the offending instructions (both are located in
the same routine of my test program and both cause the expected SIGSEGV;
suppose eax contains 0x1234):

->gcc-2.95.2:  807c38a:       dd 00         fldl   (%eax)
->gcc-3.0:     806e457:       8b 70 04      mov    0x4(%eax),%esi

siginfo.si_addr contained the correct value in the first case, but a value
offset by +4 from the original eax value in the second case. So it might
really be a kind of processor bug (if not a feature): eax is copied for a
slow memory access running in parallel, while eax itself is incremented
quickly. When the MMU detects the illegal access, the processor reports its
register eax as containing an illegal address, and the kernel machinery then
propagates the contents of eax into siginfo.si_addr... But this is wrong, or
at least eax is propagated too late.

That's what I thought until now. But surprise, surprise: when I examined
the value of eax stored in the ucontext structure, it was correct! It was
the original (but invalid) pointer value. Only siginfo.si_addr was wrong.
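
The observation can be reproduced outside the library with a small
SA_SIGINFO handler. The following is only a sketch, assuming a Linux-like
POSIX system on which a low address such as 0x1234 is unmapped. The names
probe_fault_address, on_segv and TwoWords are made up for illustration and
are not part of the library. It records si_addr for a load from the second
32-bit member of a structure and then recovers with siglongjmp.
Dereferencing an invalid pointer is undefined behaviour in standard C++, so
this is a diagnostic experiment, not a pattern for production code.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <cstdint>

// Hypothetical globals shared between the handler and the probe.
static sigjmp_buf g_recover;                      // where the handler jumps back to
static volatile std::uintptr_t g_fault_addr = 0;  // last si_addr seen by the handler

// SA_SIGINFO-style handler: remember the reported address, then unwind
// back into probe_fault_address() instead of re-executing the faulting load.
static void on_segv(int, siginfo_t* info, void*) {
    g_fault_addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    siglongjmp(g_recover, 1);
}

// Two 32-bit members, so the second one sits at offset 4 from the base.
struct TwoWords {
    std::uint32_t first;
    std::uint32_t second;
};

// Loads TwoWords::second through an (assumed invalid) base address and
// returns the si_addr the kernel reported, or 0 if nothing faulted.
std::uintptr_t probe_fault_address(std::uintptr_t base) {
    struct sigaction action = {};
    struct sigaction previous = {};
    action.sa_sigaction = on_segv;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGSEGV, &action, &previous) != 0) {
        return 0;
    }

    g_fault_addr = 0;
    // Saving the signal mask (second argument 1) lets siglongjmp restore
    // it, so SIGSEGV is not left blocked after the handler exits.
    if (sigsetjmp(g_recover, 1) == 0) {
        volatile TwoWords* p = reinterpret_cast<volatile TwoWords*>(base);
        std::uint32_t value = p->second;  // load from base + 4
        (void)value;
    }

    sigaction(SIGSEGV, &previous, nullptr);  // restore the old disposition
    return g_fault_addr;
}
```

Calling probe_fault_address(0x1234) shows which address the kernel reports
for the member that was actually loaded; the register contents at the time
of the fault live in the ucontext argument, whose layout is platform
specific and is not decoded here.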

Therefore I think it is reasonable to report this somewhat strange
behaviour as a kernel bug. It is possibly located in
arch/i386/kernel/entry.S, where some sparsely documented fiddling is done
with ORIG_EAX and EAX. It would be nice if some real ix86 experts could
comment on this, and also on how exactly ix86 processors report an invalid
address or operand when trapping.


Regards,

Hugo Mildenberger (milden@dialup.nacamar.de)


My kernel version is: Linux 2.4.2 i686 unknown. The kernel headers match my
kernel version…

Output of cat /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 3
cpu MHz	: 645.212
cache size	: 256 KB
fdiv_bug	: no
hlt_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36
mmx fxsr sse
bogomips	: 1287.78



* Re: Questionable SIGSEGV signal handling bug concerning siginfo.si_addr  on i386-linux 2.4.2
From: Brian Gerst @ 2001-06-26 14:49 UTC
  To: Hugo Mildenberger; +Cc: linux-kernel

Hugo Mildenberger wrote:
> 
> Dear friends,
> 
> I'm working on a library that maps (at least synchronous) kernel signals to
> C++ exceptions, so that C++ exception handlers can determine the reason and
> location of a failure in great detail. In that context, I noticed a
> surprising difference in the behaviour of my test programs, depending on
> whether they were compiled with gcc-2.95.2 or gcc-3.0. When I compiled the
> program with gcc-3.0, siginfo.si_addr contained an address that was always
> 4 bytes larger than the original invalid pointer value (e.g. 0x1238 versus
> 0x1234, or 0x4 versus 0x0). By contrast, the program compiled with
> gcc-2.95.2 behaved correctly.
> 
> I thought this symptom might be caused by a subtle processor bug that
> depends on register usage or instruction ordering, and I tracked it down to
> the following difference in the offending instructions (both are located in
> the same routine of my test program and both cause the expected SIGSEGV;
> suppose eax contains 0x1234):
> 
> ->gcc-2.95.2:  807c38a:       dd 00         fldl   (%eax)
> ->gcc-3.0:     806e457:       8b 70 04      mov    0x4(%eax),%esi
> 
> siginfo.si_addr contained the correct value in the first case, but a value
> offset by +4 from the original eax value in the second case.

What you are seeing is the correct behavior.  The address in si_addr is
the exact address that caused the page fault (from register %cr2).  It
appears that you were trying to access an element of a structure, where
the structure pointer was in %eax and the offset of the element within
the structure is 4 bytes.  I suggest that if you are trying to find out
if a fault happened inside a structure you check the whole range of
addresses in that structure, because any of them could have faulted.
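
The range check suggested above could look roughly like the following
sketch. The helper name fault_within is hypothetical; it only assumes that
the caller knows the base address and size of the object it tried to
access and has the value of si_addr at hand.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if fault_addr lies in [base, base + size).
constexpr bool fault_within(std::uintptr_t fault_addr,
                            std::uintptr_t base,
                            std::size_t size) {
    // The subtraction form avoids overflow in base + size near the top of
    // the address space; an address below base wraps around to a huge
    // unsigned value and therefore fails the comparison.
    return fault_addr - base < size;
}

// An 8-byte object at 0x1234 occupies 0x1234..0x123b, so a fault reported
// at 0x1238 still belongs to it, while 0x123c and 0x1230 do not.
static_assert(fault_within(0x1238, 0x1234, 8), "inside the object");
static_assert(!fault_within(0x123c, 0x1234, 8), "one past the end");
static_assert(!fault_within(0x1230, 0x1234, 8), "before the start");
```

A handler would call it with si_addr converted to an integer and the range
of the structure, array or scalar that was being accessed.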

--

				Brian Gerst


* AW: Questionable SIGSEGV signal handling bug concerning siginfo.si_addr on i386-linux 2.4.2
From: Hugo Mildenberger @ 2001-06-26 16:19 UTC
  To: Brian Gerst; +Cc: linux-kernel

Dear Brian,

It is hard to admit, but you are right, even though no structure was
directly involved. The corresponding high-level statements were:

double guarded_log( double *arg ) {
       try {
            return log( *arg ); // e.g. arg==0 or arg==0x1234.
       } catch ( s2e::SCN(SIGSEGV)*e ) {
         // [...]
       } catch ( s2e::SCN(SIGFPE)*e ) {
         // [...]
       }
}

... but the structure access you supposed really is there, because a double
value occupies 8 bytes and the gcc-3.0 developers evidently changed code
generation. At least FPU parameters are now accessed differently, with the
high-order long word first; that explains the difference in program
behavior. Your suggestion of testing against the whole occupied address
range is the general solution, which I had already used intuitively when
dealing with structs and arrays. But I did not realize that a double value
is also a kind of struct from the point of view of the processor, which is
(at the time of writing, usually) only 32 bits wide.
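
To make the "double as a struct" view concrete, here is a small sketch. It
assumes an 8-byte IEEE 754 double on a little-endian machine such as i386;
the names DoubleWords, split_double and second_word_address are invented
for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

// The two 32-bit words a 32-bit processor sees when it moves a double
// through integer registers.
struct DoubleWords {
    std::uint32_t at_offset_0;  // low-order word on little-endian machines
    std::uint32_t at_offset_4;  // high-order word on little-endian machines
};
static_assert(sizeof(DoubleWords) == sizeof(double), "no padding expected");

// Copies the object representation of a double into its two words.
DoubleWords split_double(double value) {
    DoubleWords words;
    std::memcpy(&words, &value, sizeof(value));
    return words;
}

// Address touched when the second word of a double at `base` is loaded,
// as in `mov 0x4(%eax),%esi` with the base address in eax.
constexpr std::uintptr_t second_word_address(std::uintptr_t base) {
    return base + offsetof(DoubleWords, at_offset_4);
}
```

If the compiler chooses to load the word at offset 4 first, the faulting
address is second_word_address(base) rather than base itself, which is why
the check has to cover all 8 bytes.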

Thank you for your exact analysis and your quick response.

Hugo Mildenberger

>From: Brian Gerst [mailto:bgerst@didntduck.org]
>What you are seeing is the correct behavior.  The address in si_addr is
>the exact address that caused the page fault (from register %cr2).  It
>appears that you were trying to access an element of a structure, where
>the structure pointer was in %eax and the offset of the element within
>the structure is 4 bytes.  I suggest that if you are trying to find out
>if a fault happened inside a structure you check the whole range of
>addresses in that structure, because any of them could have faulted.
>
>--
>
>				Brian Gerst

