linux-kernel.vger.kernel.org archive mirror
* Re: Catching SIGSEGV with signal() in 2.6
@ 2004-04-06  0:40 Kevin B. Hendricks
  2004-04-06  2:04 ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin B. Hendricks @ 2004-04-06  0:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: bero

Hi,

Just in case this helps,  this is a simplified testcase of the OpenOffice.org 
code in question that always worked under 2.4 kernels on multiple 
architectures but fails on 2.6.X kernels on those same multiple platforms.

For some reason, the segfault generated by trying to write to address 0 can 
not be properly caught anymore (or at least it appears that way to me).

Hope this helps.

Kevin

[kbhend@base1 solar]$ cat testcase.c
#include <stdio.h>
#include <signal.h>
#include <setjmp.h>
#include <stdlib.h>

typedef int (*TestFunc)( void* );
static jmp_buf check_env;
static int bSignal;

void SignalHdl( int sig )
{
  bSignal = 1;
  longjmp( check_env, sig );
}

int check( TestFunc func, void* p )
{
  int result;
  bSignal = 0;
  if ( !setjmp( check_env ) )
  {
        signal( SIGSEGV,        SignalHdl );
        signal( SIGBUS,         SignalHdl );
        result = func( p );
        signal( SIGSEGV,        SIG_DFL );
        signal( SIGBUS,         SIG_DFL );
  }
  if ( bSignal )
        return -1;
  else
        return 0;
}

int GetAtAddress( void* p )
{
  return *((char*)p);
}

int SetAtAddress( void* p )
{
  return *((char*)p)    = 0;
}

int CheckGetAccess( void* p )
{
  int b;
  b = -1 != check( (TestFunc)GetAtAddress, p );
  return b;
}

int CheckSetAccess( void* p )
{
  int b;
  b = -1 != check( (TestFunc)SetAtAddress, p );
  return b;
}

void InfoMemoryAccess( char* p )
{
  if ( CheckGetAccess( p ) )
    printf( "can read address %p\n", p );
  else
    printf( "can not read address %p\n", p );

  if ( CheckSetAccess( p ) )
    printf( "can write address %p\n", p );
  else
    printf( "can not write address %p\n", p );
}

int
main( int argc, char* argv[] )
{
  {
        char* p = NULL;
        InfoMemoryAccess( p );
        p = (char*)&p;
        InfoMemoryAccess( p );
  }
  exit( 0 );
}




* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-06  0:40 Catching SIGSEGV with signal() in 2.6 Kevin B. Hendricks
@ 2004-04-06  2:04 ` Ulrich Drepper
  2004-04-06  3:01   ` Kevin B. Hendricks
  0 siblings, 1 reply; 14+ messages in thread
From: Ulrich Drepper @ 2004-04-06  2:04 UTC (permalink / raw)
  To: Kevin B. Hendricks; +Cc: linux-kernel

Kevin B. Hendricks wrote:

> For some reason, the segfault generated by trying to write to address 0 can 
> not be properly caught anymore (or at least it appears that way to me).

If the code were correct you'd see the expected behavior.

> void SignalHdl( int sig )
> {
>   bSignal = 1;
>   longjmp( check_env, sig );
> }

Since you jump out of a signal handler you must use siglongjmp.


> int check( TestFunc func, void* p )
> {
>   int result;
>   bSignal = 0;
>   if ( !setjmp( check_env ) )

And sigsetjmp(check_env, 1) here.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-06  2:04 ` Ulrich Drepper
@ 2004-04-06  3:01   ` Kevin B. Hendricks
  2004-04-06  4:08     ` Ulrich Drepper
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin B. Hendricks @ 2004-04-06  3:01 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Hi,

> If the code were correct you'd see the expected behavior.
> Since you jump out of a signal handler you must use siglongjmp.
> And sigsetjmp(check_env, 1) here.

So the code has been wrong since the beginning and we were just "lucky" it 
worked in all pre-2.6 kernels?

I have no doubt you are right but, forgive my ignorance here, please explain 
why we must use siglongjmp when longjmping out of a signal handler given that 

1. before the next use of the handler we use signal again to properly set the 
signal handler (and the set of masked signals).

and 

2. the mask of blocked signals will include SIGSEGV upon entry to the signal 
handler and therefore it will be masked after the normal longjmp since a 
normal longjmp will not change the set of masked signals.

What am I missing that makes sigsetjmp and siglongjmp a requirement, or is 
this just part of some specification someplace?

Kevin



* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-06  3:01   ` Kevin B. Hendricks
@ 2004-04-06  4:08     ` Ulrich Drepper
  2004-04-06 12:02       ` Kevin B. Hendricks
  2004-04-06 15:53       ` Edgar Toernig
  0 siblings, 2 replies; 14+ messages in thread
From: Ulrich Drepper @ 2004-04-06  4:08 UTC (permalink / raw)
  To: Kevin B. Hendricks; +Cc: linux-kernel


Kevin B. Hendricks wrote:

> So the code has been wrong since the beginning and we were just "lucky" it 
> worked in all pre-2.6 kernels?

The old code depended on undefined behavior.


> 1. before the next use of the handler we use signal again to properly set the 
> signal handler (and the set of masked signals).

Where do you set the signal mask?  That's the point.  You don't.  This
means jumping from the signal handler causes the signal to remain
blocked.  And then


~~~~
If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated
while they are blocked, the result is undefined, unless the signal was
generated by the kill() function, the sigqueue() function, or the
raise() function.
~~~~

(see pthread_sigmask in POSIX) comes into play.

The second SIGSEGV is generated while the signal is blocked, and since
it was not generated by any of the functions named in the text above,
anything can happen.  The old kernel queued the signal; the
new kernel terminates the process, which is much better IMO.  Try the
attached program to see why.  Also note, the 2.4 behavior is
inconsistent.  If no handler is installed the process is terminated,
regardless of the signal being masked.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: minsig.c --]
[-- Type: text/x-csrc, Size: 200 bytes --]

#include <signal.h>
int *p;
void
sh (int sig)
{
}
int
main(void)
{
  sigset_t s;
  sigemptyset (&s);
  sigaddset (&s, SIGSEGV);
  sigprocmask (SIG_BLOCK, &s, 0);
  signal(SIGSEGV, sh);
  return *p;
}


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-06  4:08     ` Ulrich Drepper
@ 2004-04-06 12:02       ` Kevin B. Hendricks
  2004-04-06 15:53       ` Edgar Toernig
  1 sibling, 0 replies; 14+ messages in thread
From: Kevin B. Hendricks @ 2004-04-06 12:02 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Hi Ulrich,

> The old code depended on undefined behavior.

Thanks for the explanation.  It makes perfect sense.  I appreciate it.

Our bad assumption was that using signal to install a signal handler on a 
specific signal unblocked that specific signal, but as you show it does not.

I will try to get a fix using sigsetjmp/siglongjmp or fork/wait into the 
forthcoming OOo 1.1.2 tree so that no more "problems" are reported.

Kevin




* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-06  4:08     ` Ulrich Drepper
  2004-04-06 12:02       ` Kevin B. Hendricks
@ 2004-04-06 15:53       ` Edgar Toernig
  1 sibling, 0 replies; 14+ messages in thread
From: Edgar Toernig @ 2004-04-06 15:53 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Kevin B. Hendricks, linux-kernel

Ulrich Drepper wrote:
>
> Kevin B. Hendricks wrote:
> 
> > So the code has been wrong since the beginning and we were just "lucky" it 
> > worked in all pre-2.6 kernels?
> 
> The old code depended on undefined behavior.

Maybe it's simply *old* code, possibly written under libc5.
There, signal() used SA_RESETHAND which implies SA_NODEFER
which in turn did not block the signal and exiting from the
signal handler via longjmp was OK.

With the new signal() behaviour in glibc2 one may get results
undefined by POSIX but it still worked as before because the
sigprocmask was ignored for SIGSEGV under Linux <2.6.

It's the combination of new glibc2 and new kernel that makes
code like the mentioned one break.

It has nothing to do with POSIX - for POSIX all of this is
"undefined/implementation defined behaviour".  I had chosen
to stay compatible...

Ciao, ET.

-- 
Not every program claims to be POSIX compliant (who reads
3600 pages of difficult to obtain specs?) - some are simply
Linux programs...


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 18:17 ` Jamie Lokier
  2004-04-05 19:16   ` Chris Friesen
@ 2004-04-05 21:23   ` bero
  1 sibling, 0 replies; 14+ messages in thread
From: bero @ 2004-04-05 21:23 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel

On Mon, 5 Apr 2004, Jamie Lokier wrote:

> > See http://www.openoffice.org/issues/show_bug.cgi?id=27162
> > 
> > Is this change intentional, or a bug?
> 
> On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
> with the correct fault address, with or without SA_SIGINFO.

Seems to be triggered only by some segfaults -- a simpler test app than 
the one in the OpenOffice bug report works here too, the OpenOffice one 
crashes.

I'll try to debug it some more when I have some time, but that could take 
a while (busy ATM)

LLaP
bero

-- 
Ark Linux - Linux for the masses
http://www.arklinux.org/

Redistribution and processing of this message is subject to
http://www.arklinux.org/terms.php


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 20:59       ` Richard B. Johnson
  2004-04-05 21:11         ` Chris Friesen
@ 2004-04-05 21:12         ` Jamie Lokier
  1 sibling, 0 replies; 14+ messages in thread
From: Jamie Lokier @ 2004-04-05 21:12 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Chris Friesen, bero, Linux kernel

Richard B. Johnson wrote:
> Are you using a longjmp to get out of the signal handler?
> You may find that you can trap SIGSEGV, but you can't exit
> from it because it will return to the instruction that
> caused the trap!!!

Thanks for stating the obvious! :)

No, actually I'm changing memory protection with mprotect() inside the
handler, so when it returns the program can continue.

But that's not relevant to the OpenOffice problem.  They have a
program which traps SIGSEGV with 2.4 and terminates suddenly with 2.6.
Obviously they aren't just returning else it wouldn't work with 2.4.

-- Jamie


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 20:59       ` Richard B. Johnson
@ 2004-04-05 21:11         ` Chris Friesen
  2004-04-05 21:12         ` Jamie Lokier
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Friesen @ 2004-04-05 21:11 UTC (permalink / raw)
  To: root; +Cc: Jamie Lokier, bero, Linux kernel

Richard B. Johnson wrote:

 > Are you using a longjmp to get out of the signal handler?
 > You may find that you can trap SIGSEGV, but you can't exit
 > from it because it will return to the instruction that
 > caused the trap!!!

That's the same as in 2.4 though.  The original poster was talking about 
behaviour changes in 2.6.

Chris


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 20:40     ` Jamie Lokier
@ 2004-04-05 20:59       ` Richard B. Johnson
  2004-04-05 21:11         ` Chris Friesen
  2004-04-05 21:12         ` Jamie Lokier
  0 siblings, 2 replies; 14+ messages in thread
From: Richard B. Johnson @ 2004-04-05 20:59 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Chris Friesen, bero, Linux kernel

On Mon, 5 Apr 2004, Jamie Lokier wrote:

> Chris Friesen wrote:
> > SA_SIGINFO implies sigaction().  The original poster was talking about
> > signal().
> >
> > That said, it seems to work with 2.6.4 on ppc32.
>
> Just tried it with 2.6.3, x86 and signal().  Works fine.
>
> -- Jamie

Are you using a longjmp to get out of the signal handler?
You may find that you can trap SIGSEGV, but you can't exit
from it because it will return to the instruction that
caused the trap!!!

#include <stdio.h>
#include <signal.h>
void handler(int sig) {
    fprintf(stderr, "Caught %d\n", sig);
}
int main() {
    char *foo = NULL;
    signal(SIGSEGV, handler);
    fprintf(stderr, "Send a signal....\n");
    kill(0, SIGSEGV);
    fprintf(stderr, "Okay! That worked!\n");
//    *foo = 0;
    return 0;
}

Just un-comment the null-pointer de-reference and watch!

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.




* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 19:16   ` Chris Friesen
@ 2004-04-05 20:40     ` Jamie Lokier
  2004-04-05 20:59       ` Richard B. Johnson
  0 siblings, 1 reply; 14+ messages in thread
From: Jamie Lokier @ 2004-04-05 20:40 UTC (permalink / raw)
  To: Chris Friesen; +Cc: bero, linux-kernel

Chris Friesen wrote:
> SA_SIGINFO implies sigaction().  The original poster was talking about 
> signal().
> 
> That said, it seems to work with 2.6.4 on ppc32.

Just tried it with 2.6.3, x86 and signal().  Works fine.

-- Jamie


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 18:17 ` Jamie Lokier
@ 2004-04-05 19:16   ` Chris Friesen
  2004-04-05 20:40     ` Jamie Lokier
  2004-04-05 21:23   ` bero
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Friesen @ 2004-04-05 19:16 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: bero, linux-kernel

Jamie Lokier wrote:
> bero@arklinux.org wrote:
> 
>>... doesn't seem to be possible anymore.
>>
>>See http://www.openoffice.org/issues/show_bug.cgi?id=27162
>>
>>Is this change intentional, or a bug?
> 
> 
> On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
> with the correct fault address, with or without SA_SIGINFO.

SA_SIGINFO implies sigaction().  The original poster was talking about 
signal().

That said, it seems to work with 2.6.4 on ppc32.

Chris


* Re: Catching SIGSEGV with signal() in 2.6
  2004-04-05 15:25 bero
@ 2004-04-05 18:17 ` Jamie Lokier
  2004-04-05 19:16   ` Chris Friesen
  2004-04-05 21:23   ` bero
  0 siblings, 2 replies; 14+ messages in thread
From: Jamie Lokier @ 2004-04-05 18:17 UTC (permalink / raw)
  To: bero; +Cc: linux-kernel

bero@arklinux.org wrote:
> ... doesn't seem to be possible anymore.
> 
> See http://www.openoffice.org/issues/show_bug.cgi?id=27162
> 
> Is this change intentional, or a bug?

On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
with the correct fault address, with or without SA_SIGINFO.

-- Jamie


* Catching SIGSEGV with signal() in 2.6
@ 2004-04-05 15:25 bero
  2004-04-05 18:17 ` Jamie Lokier
  0 siblings, 1 reply; 14+ messages in thread
From: bero @ 2004-04-05 15:25 UTC (permalink / raw)
  To: linux-kernel

... doesn't seem to be possible anymore.

See
http://www.openoffice.org/issues/show_bug.cgi?id=27162

Is this change intentional, or a bug?

LLaP
bero

-- 
Ark Linux - Linux for the masses
http://www.arklinux.org/

Redistribution and processing of this message is subject to
http://www.arklinux.org/terms.php

