All of lore.kernel.org
 help / color / mirror / Atom feed
* Question about execve.
@ 2010-03-28  3:26 Carlos O'Donell
  2010-03-28 15:42 ` John David Anglin
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28  3:26 UTC (permalink / raw)
  To: Helge Deller, Kyle McMartin, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]

Helge,

On PARISC I'm seeing the following reproducible behvaiour:

* Parent calls vfork()
* Child of vfork() calls execve()
* Child returns from execve() and starts corrupting parent state
eventually leading to a segmentation fault.
* New process (as a result of execve) runs to completion.

What code in the Linux kernel prevents the child, which calls
execve(), from returning?

Test case attached.

Cheers,
Carlos.

[-- Attachment #2: build.sh --]
[-- Type: application/x-sh, Size: 40 bytes --]

[-- Attachment #3: pt-vfork.S --]
[-- Type: application/octet-stream, Size: 1675 bytes --]

.text ! 
.align 4 ! 
.export __vfork ! 
.type __vfork,@function ! 
__vfork: ! 
.PROC ! 
.CALLINFO FRAME=64,CALLS,SAVE_RP,ENTRY_GR=3 ! 
.ENTRY ! ! 
 /* Save return pointer. */
 stw %rp, -20(%sr0,%sp) !

 /* Get 64 bytes on the stack, save sp, and PIC register. */
 stwm %r3, 64(%sp)
 stw %sp, -4(%sp)
 stw %r19, -32(%sp)

 /* Save/restore PIC register around syscall. */
 copy %r19, %r25

 /* Load thread register. */
 mfctl %cr27, %r26 ! 
 /* Load cached parent PID. */
 ldw -1044(%r26),%r1 ! 
 /* Negate it, such that the child runs with
    a negative PID and no functions work until
    the execve. */
 sub %r0,%r1,%r1 ! 
 /* Store it back. */
 stw %r1,-1044(%r26) !

 /* Call vfork. */
 ble 0x100(%sr2,%r0)
 ldi (0 + 113),%r20 

 /* If this is the child jump to thread_start */
 cmpb,=,n %r0,%ret0,.Lthread_start ! 

 /* This is the parent. */
 /* Load thread register. */
 mfctl %cr27, %r26 ! 
 /* Load cached parent PID */
 ldw -1044(%r26),%r1 ! 
 /* Negate it (restoring it) */
 sub %r0,%r1,%r1 ! 
 /* Save it back. */
 stw %r1,-1044(%r26) ! 

 ldi -4096,%r1
 /* Unsigned compre = Was ret0 between -1 and -4096. */
 comclr,>>= %r1,%ret0,%r0
 b,n .Lerror

.Lthread_start: !

 /* Load return pointer and restore stack. */
 ldw -84(%sp), %rp
 bv %r0(%rp)
 ldwm -64(%sp), %r3

.Lerror:
 /* Negate error code. */
 sub %r0,%ret0,%r3
 .import __errno_location,code ! ! 
 /* Get address of errno. */
 bl __errno_location,%rp !

 copy %r25, %r19
 /* Save errno. */
 stw %r3, 0(%ret0)
 /* vfork returns -1 on error. */
 ldi -1, %ret0
 /* Return. */
 ldw -84(%sp), %rp
 bv %r0(%rp)
 ldwm -64(%sp), %r3
.EXIT ! .PROCEND ! .size __vfork, .-__vfork !

.weak vfork ! vfork = __vfork

[-- Attachment #4: vfork.c --]
[-- Type: text/x-csrc, Size: 589 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>

#define CALL_EXIT 0

int main (void)
{
  pid_t child;
  char *cmd[] = { "bash", "-c", "echo In child $$;", (char *)0 };
  char *env[] = { "HOME=/tmp", (char *)0 };
  int ret;

  child = vfork();

  if (child == 0)
    {
      ret = execve("/bin/bash", cmd, env);
      printf ("ret = %d\n", ret);
#if CALL_EXIT == 1
      _exit(1);
#endif
    }
  else
    {
      printf("child != 0\n");
    }

  printf("parent is %d\n", (unsigned int)getpid());
  printf("child is %d\n", (unsigned int)child);

  return 0;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28  3:26 Question about execve Carlos O'Donell
@ 2010-03-28 15:42 ` John David Anglin
  2010-03-28 16:03   ` John David Anglin
                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: John David Anglin @ 2010-03-28 15:42 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Helge Deller, Kyle McMartin, linux-parisc

On Sat, 27 Mar 2010, Carlos O'Donell wrote:

> Helge,
> 
> On PARISC I'm seeing the following reproducible behvaiour:
> 
> * Parent calls vfork()
> * Child of vfork() calls execve()
> * Child returns from execve() and starts corrupting parent state
> eventually leading to a segmentation fault.
> * New process (as a result of execve) runs to completion.
> 
> What code in the Linux kernel prevents the child, which calls
> execve(), from returning?

All the execute a file commands can return if an error occurs.  Thus,
I question the fact that the execve wrapper doesn't save registers.

The vfork child runs using the parent's stack.  Thus, even the call
to execve potentially corrupts the parent's stack, and depending on
scheduling, this might cause a segmentation fault in either parent or
child.

Here's a couple of excerpts from the vfork man page for HP-UX:

Series 800

Process times for the parent and child processes within the [vfork,exec]
window may be inaccurate.

    * Parent and child processes share the same stack space within the
    [vfork,exec] window. If the size of the stack has been changed within
    this window by the child process (return from or call to a function,
    for example), it is likely that the parent and child processes will
    be killed with signal SIGSEGV or SIGBUS.

The [vfork,exec] window begins at the vfork() call and ends when the child
completes its exec() call.

The Open Group man page says:

The use of vfork() for any purpose except as a prelude to an immediate call
to a function from the exec  family, or to _exit(), is not advised.  On HP-UX,
it's not supported.

I see that this is not a new issue for linux:
http://lkml.indiana.edu/hypermail/linux/kernel/9902.0/0256.html

It is ok to treat vfork() identically to fork(), and the HP-UX
manpage indicates that this may be the case for certain HP-UX
implementations (maybe Series 700).

> Test case attached.

I was not able to save your pt-fork.S file.   mutt didn't understand
the encoding.

I tried the vfork.c test case on my c3750 with 32-bit kernel.  It
didn't segv in a limited number of runs.  However, I did notice that
getpid() is broken after vfork().

The printf call in the child path probably isn't legit.  It's also not
legit to simply return if execve fails.  _exit must be called.

My guess would be that the parent is not blocked leading to corruption.

> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <unistd.h>
> 
> #define CALL_EXIT 0
> 
> int main (void)
> {
>   pid_t child;
>   char *cmd[] = { "bash", "-c", "echo In child $$;", (char *)0 };
>   char *env[] = { "HOME=/tmp", (char *)0 };
>   int ret;
> 
>   child = vfork();
> 
>   if (child == 0)
>     {
>       ret = execve("/bin/bash", cmd, env);
>       printf ("ret = %d\n", ret);
> #if CALL_EXIT == 1
>       _exit(1);
> #endif
>     }
>   else
>     {
>       printf("child != 0\n");
>     }
> 
>   printf("parent is %d\n", (unsigned int)getpid());
>   printf("child is %d\n", (unsigned int)child);
> 
>   return 0;
> }

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 15:42 ` John David Anglin
@ 2010-03-28 16:03   ` John David Anglin
  2010-03-28 17:26   ` Carlos O'Donell
  2010-03-28 17:32   ` John David Anglin
  2 siblings, 0 replies; 23+ messages in thread
From: John David Anglin @ 2010-03-28 16:03 UTC (permalink / raw)
  To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc

> My guess would be that the parent is not blocked leading to corruption.

I did a quick test and the child always seems to run first.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 15:42 ` John David Anglin
  2010-03-28 16:03   ` John David Anglin
@ 2010-03-28 17:26   ` Carlos O'Donell
  2010-03-28 18:00     ` John David Anglin
  2010-03-28 17:32   ` John David Anglin
  2 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 17:26 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 2619 bytes --]

On Sun, Mar 28, 2010 at 11:42 AM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> Test case attached.
>
> I was not able to save your pt-fork.S file.   mutt didn't understand
> the encoding.

Attached as vforktest.tgz.

> I tried the vfork.c test case on my c3750 with 32-bit kernel.  It
> didn't segv in a limited number of runs.  However, I did notice that
> getpid() is broken after vfork().

Broken how?

A small note about getpid():

Since the parent and child share the same memory space, the parent
temporarily negates the cached PID value just before the vfork. Only
the parent, after the vfork, restores the cached PID. This is on
purpose to prevent any functions like PID-related functions from
working in the child() during the window bewteen vfork and execve.

> The printf call in the child path probably isn't legit.  It's also not
> legit to simply return if execve fails.  _exit must be called.

The printf call in the child path is ABSOLUTELY wrong, but I wanted to
see what the return was. You could equally well insert:

#define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)")

And cause the child to abort with an invalid instruction trap, and
inspect r28 at the time of the fault to see what execve returned.

In fact I use this in my testing now to avoid printf.

> My guess would be that the parent is not blocked leading to corruption.

What I see is that the child is returning from the kernel execve, and
continuing to execute.

If I use ABORT_INSTRUCTION in the child right after the execve I see this:

carlos@firin:~/fsrc/vforkfailure/test$ ./vfork
Illegal instruction
carlos@firin:~/fsrc/vforkfailure/test$ In child 28734

This makes *no* sense.

What I don't understand is this:

* vfork'd child shares memory space with parent, therefore space id
has to be the same?
* Does execve give the process a new space id or does it simply
overwrite the entire process image?

So I had the test case try to exec a binary that just calls ABORT_INSTRUCTION.

What I noticed is the following:

carlos@firin:~/fsrc/vforkfailure/test$ ./vfork
parent sr3 is 0x3428000
Illegal instruction

There are 2 entries in dmesg:
vfork (pid 28964): Illegal instruction (code 8) at 00000000000106a7
main (pid 28965): Illegal instruction (code 8) at 00000000000104af

* The child returning from execve has the same space id as the parent
e.g. 0x3418000
* The process run via the execve has a different space id e.g. 0x3418800

What is supposed to stop the child, which called execve, from
returning to the parent?

Cheers,
Carlos.

[-- Attachment #2: vforktest.tgz --]
[-- Type: application/x-gzip, Size: 18051 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 15:42 ` John David Anglin
  2010-03-28 16:03   ` John David Anglin
  2010-03-28 17:26   ` Carlos O'Donell
@ 2010-03-28 17:32   ` John David Anglin
  2010-03-28 18:20     ` Carlos O'Donell
  2 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-28 17:32 UTC (permalink / raw)
  To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc

> I tried the vfork.c test case on my c3750 with 32-bit kernel.  It
> didn't segv in a limited number of runs.  However, I did notice that
> getpid() is broken after vfork().

The vfork (clone) syscall corrupts (i.e., inserts wrong value)
the parent tid.  In the following, I disabled the printf's and
execve call in Carlos's testcase.  The child just does _exit().

The fast path through getpid is:

Dump of assembler code for function getpid:
0x0001ad2c <getpid+0>:	stw rp,-14(sp)
0x0001ad30 <getpid+4>:	mfctl tr3,r20
0x0001ad34 <getpid+8>:	ldw -414(r20),r19
0x0001ad38 <getpid+12>:	cmpib,>= 0,r19,0x1ad48 <getpid+28>
0x0001ad3c <getpid+16>:	copy r19,ret0
0x0001ad40 <getpid+20>:	ldw -14(sp),rp
0x0001ad44 <getpid+24>:	bv,n r0(rp)

Breakpoint 3, 0x0001ad34 in getpid ()
(gdb) del 2
(gdb) p/x $r20
$6 = 0x9a480

Breakpoint 4, main () at vfork.c:17
17	  child = vfork();
(gdb) x/x 0x9a480 - 0x414
0x9a06c:	0x00000000
(gdb) c
Continuing.

Breakpoint 3, 0x0001ad34 in getpid ()
(gdb) x/x 0x9a480 - 0x414
0x9a06c:	0x10101364

Breakpoint 4, main () at vfork.c:17
17	  child = vfork();
(gdb) x/64x 0x9a040
0x9a040:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a050:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a060:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a070:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a080:	0xc0521150	0x00000000	0x00000000	0x00000000

Breakpoint 3, 0x0001ad34 in getpid ()
(gdb) x/64x 0x9a040
0x9a040:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a050:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a060:	0x00000000	0x00000000	0x00000000	0x10101364
0x9a070:	0x00000000	0x00000000	0x00000000	0x00000000
0x9a080:	0xc03ae150	0x00000000	0x00000000	0x00000000

So, the only location changed by vfork is the parent tid.

dave@hiauly6:~$ strace ./vfork
execve("./vfork", ["./vfork"], [/* 16 vars */]) = 0
newuname({sys="Linux", node="hiauly6", ...}) = 0
brk(0)                                  = 0x9a000
brk(0x9acb4)                            = 0x9acb4
brk(0xbbcb4)                            = 0xbbcb4
brk(0xbc000)                            = 0xbc000
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9a068) = 6212
fstat64(0x1, 0xc0258a08)                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
write(1, "parent is 269488996\n", 20parent is 269488996
)   = 20
exit_group(0)                           = ?

(gdb) p/x 269488996
$1 = 0x10101364

I believe this is the "getpid" implementation:

static inline __attribute__((always_inline)) pid_t
really_getpid (pid_t oldval)
{
  if (__builtin_expect (oldval == 0, 1))
    {
      pid_t selftid = THREAD_GETMEM (THREAD_SELF, tid);
      if (__builtin_expect (selftid != 0, 1))
	return selftid;
     }

  INTERNAL_SYSCALL_DECL (err);
  pid_t result = INTERNAL_SYSCALL (getpid, err, 0);

  /* We do not set the PID field in the TID here since we might be
     called from a signal handler while the thread executes fork.  */
  if (oldval == 0)
    THREAD_SETMEM (THREAD_SELF, tid, result);
  return result;
}

As a side issue, gdb can't single step over mfctl instruction ;(

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 17:26   ` Carlos O'Donell
@ 2010-03-28 18:00     ` John David Anglin
  2010-03-28 18:47       ` Carlos O'Donell
  2010-03-28 19:07       ` John David Anglin
  0 siblings, 2 replies; 23+ messages in thread
From: John David Anglin @ 2010-03-28 18:00 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc

On Sun, 28 Mar 2010, Carlos O'Donell wrote:

> Broken how?

See other mail.

> A small note about getpid():
> 
> Since the parent and child share the same memory space, the parent
> temporarily negates the cached PID value just before the vfork. Only
> the parent, after the vfork, restores the cached PID. This is on
> purpose to prevent any functions like PID-related functions from
> working in the child() during the window bewteen vfork and execve.

Ok, it might be glibc that's broken (i.e., it doesn't restore correct
dave).

> The printf call in the child path is ABSOLUTELY wrong, but I wanted to
> see what the return was. You could equally well insert:
> 
> #define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)")

Tried that and it didn't trigger.

> > My guess would be that the parent is not blocked leading to corruption.
> 
> What I see is that the child is returning from the kernel execve, and
> continuing to execute.

Hmmm, I'm not seeing that, but I do have the patch installed that
reloads %r16 after calling schedule.  %r16 isn't clobbered by the
call but I believe %cr30 can change.  As a result, we may run a task
with the wrong registers with the current code.

See <http://article.gmane.org/gmane.linux.ports.parisc/2687>.

> If I use ABORT_INSTRUCTION in the child right after the execve I see this:

Already tried this.

> carlos@firin:~/fsrc/vforkfailure/test$ ./vfork
> Illegal instruction
> carlos@firin:~/fsrc/vforkfailure/test$ In child 28734
> 
> This makes *no* sense.
> 
> What I don't understand is this:
> 
> * vfork'd child shares memory space with parent, therefore space id
> has to be the same?

Sounds right.

> * Does execve give the process a new space id or does it simply
> overwrite the entire process image?
> 
> So I had the test case try to exec a binary that just calls ABORT_INSTRUCTION.
> 
> What I noticed is the following:
> 
> carlos@firin:~/fsrc/vforkfailure/test$ ./vfork
> parent sr3 is 0x3428000
> Illegal instruction
> 
> There are 2 entries in dmesg:
> vfork (pid 28964): Illegal instruction (code 8) at 00000000000106a7
> main (pid 28965): Illegal instruction (code 8) at 00000000000104af
> 
> * The child returning from execve has the same space id as the parent
> e.g. 0x3418000
> * The process run via the execve has a different space id e.g. 0x3418800

That's what I would have expected for execve.

> What is supposed to stop the child, which called execve, from
> returning to the parent?

It's only supposed to happen if there is an error in the execve call.
I think our task scheduling is broken ;(

Could you try my change and see if it helps?

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 17:32   ` John David Anglin
@ 2010-03-28 18:20     ` Carlos O'Donell
  0 siblings, 0 replies; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 18:20 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc

On Sun, Mar 28, 2010 at 1:32 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> I tried the vfork.c test case on my c3750 with 32-bit kernel. =A0It
>> didn't segv in a limited number of runs. =A0However, I did notice th=
at
>> getpid() is broken after vfork().
>
> The vfork (clone) syscall corrupts (i.e., inserts wrong value)
> the parent tid. =A0In the following, I disabled the printf's and
> execve call in Carlos's testcase. =A0The child just does _exit().

The vfork syscall goes through process.c (sys_vfork) which doesn't
tell the kernel about the parent_tidptr, or child_tidptr (passes NULL
for both).

The kernel shouldn't be touching the parent tid pointer at all.

The vfork wrapper in glibc *does* negate the cached PID value, such
that the child doesn't see a valid PID value until after execve
completes.

> The fast path through getpid is:
>
> Dump of assembler code for function getpid:
> 0x0001ad2c <getpid+0>: =A0stw rp,-14(sp)
> 0x0001ad30 <getpid+4>: =A0mfctl tr3,r20
> 0x0001ad34 <getpid+8>: =A0ldw -414(r20),r19
> 0x0001ad38 <getpid+12>: cmpib,>=3D 0,r19,0x1ad48 <getpid+28>
> 0x0001ad3c <getpid+16>: copy r19,ret0
> 0x0001ad40 <getpid+20>: ldw -14(sp),rp
> 0x0001ad44 <getpid+24>: bv,n r0(rp)
>
> Breakpoint 3, 0x0001ad34 in getpid ()
> (gdb) del 2
> (gdb) p/x $r20
> $6 =3D 0x9a480
>
> Breakpoint 4, main () at vfork.c:17
> 17 =A0 =A0 =A0 =A0child =3D vfork();
> (gdb) x/x 0x9a480 - 0x414
> 0x9a06c: =A0 =A0 =A0 =A00x00000000
> (gdb) c
> Continuing.

This is the PID of the parent, not the TID. They are actually two
different fields.

nptl/descr.h
~~~
  /* Thread ID - which is also a 'is this thread descriptor (and
     therefore stack) used' flag.  */
  pid_t tid;

  /* Process ID - thread group ID in kernel speak.  */
  pid_t pid;
~~~

The PID of all the threads in a process group is the same.

Each thread has a unique TID, which is

During the vfork the parent does this:
~~~
 /* Load thread register. */
 mfctl %cr27, %r26 !
 /* Load cached parent PID. */
 ldw -1044(%r26),%r1 !
 /* Negate it, such that the child runs with
    a negative PID and no functions work until
    the execve. */
 sub %r0,%r1,%r1 !
 /* Store it back. */
 stw %r1,-1044(%r26) !
~~~

> Breakpoint 3, 0x0001ad34 in getpid ()
> (gdb) x/x 0x9a480 - 0x414
> 0x9a06c: =A0 =A0 =A0 =A00x10101364

I don't see how this is the negative of 0x0, it should just be 0x0. I
wonder what changed it.

> Breakpoint 4, main () at vfork.c:17
> 17 =A0 =A0 =A0 =A0child =3D vfork();
> (gdb) x/64x 0x9a040
> 0x9a040: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a050: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a060: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a070: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a080: =A0 =A0 =A0 =A00xc0521150 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
>
> Breakpoint 3, 0x0001ad34 in getpid ()
> (gdb) x/64x 0x9a040
> 0x9a040: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a050: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a060: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x10101364
> 0x9a070: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
> 0x9a080: =A0 =A0 =A0 =A00xc03ae150 =A0 =A0 =A00x00000000 =A0 =A0 =A00=
x00000000 =A0 =A0 =A00x00000000
>
> So, the only location changed by vfork is the parent tid.

s/tid/pid/g.

> dave@hiauly6:~$ strace ./vfork
> execve("./vfork", ["./vfork"], [/* 16 vars */]) =3D 0
> newuname({sys=3D"Linux", node=3D"hiauly6", ...}) =3D 0
> brk(0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0=3D 0x9a000
> brk(0x9acb4) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D=
 0x9acb4
> brk(0xbbcb4) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D=
 0xbbcb4
> brk(0xbc000) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D=
 0xbc000
> clone(child_stack=3D0, flags=3DCLONE_CHILD_CLEARTID|CLONE_CHILD_SETTI=
D|SIGCHLD, child_tidptr=3D0x9a068) =3D 6212
> fstat64(0x1, 0xc0258a08) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,=
 0) =3D 0x40000000
> write(1, "parent is 269488996\n", 20parent is 269488996
> ) =A0 =3D 20
> exit_group(0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D=
 ?
>
> (gdb) p/x 269488996
> $1 =3D 0x10101364
>
> I believe this is the "getpid" implementation:
>
> static inline __attribute__((always_inline)) pid_t
> really_getpid (pid_t oldval)
> {
> =A0if (__builtin_expect (oldval =3D=3D 0, 1))
> =A0 =A0{
> =A0 =A0 =A0pid_t selftid =3D THREAD_GETMEM (THREAD_SELF, tid);
> =A0 =A0 =A0if (__builtin_expect (selftid !=3D 0, 1))
> =A0 =A0 =A0 =A0return selftid;
> =A0 =A0 }
>
> =A0INTERNAL_SYSCALL_DECL (err);
> =A0pid_t result =3D INTERNAL_SYSCALL (getpid, err, 0);
>
> =A0/* We do not set the PID field in the TID here since we might be
> =A0 =A0 called from a signal handler while the thread executes fork. =
=A0*/
> =A0if (oldval =3D=3D 0)
> =A0 =A0THREAD_SETMEM (THREAD_SELF, tid, result);
> =A0return result;
> }
>
> As a side issue, gdb can't single step over mfctl instruction ;(

We'll fix gdb next :-)

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 18:00     ` John David Anglin
@ 2010-03-28 18:47       ` Carlos O'Donell
  2010-03-28 19:07       ` John David Anglin
  1 sibling, 0 replies; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 18:47 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc

[-- Attachment #1: Type: text/plain, Size: 1891 bytes --]

On Sun, Mar 28, 2010 at 2:00 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
> Ok, it might be glibc that's broken (i.e., it doesn't restore correct
> dave).

I don't think this is broken.

>> The printf call in the child path is ABSOLUTELY wrong, but I wanted to
>> see what the return was. You could equally well insert:
>>
>> #define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)")
>
> Tried that and it didn't trigger.

OK, your kernel is perhaps fixed?

>> > My guess would be that the parent is not blocked leading to corruption.
>>
>> What I see is that the child is returning from the kernel execve, and
>> continuing to execute.
>
> Hmmm, I'm not seeing that, but I do have the patch installed that
> reloads %r16 after calling schedule.  %r16 isn't clobbered by the
> call but I believe %cr30 can change.  As a result, we may run a task
> with the wrong registers with the current code.

When schedule returns you are either still in the current process, or
have switched to another process via arch/parisc/kernel/entry.S
(_switch_to), so yes %cr30 will have been changed and must be reloaded
(kernel threads IIUC don't have their own register state, you are
simply just switching %cr30 around and reloading the appropriate
pt_regs).

Where are the new space ids loaded into the space registers for the new process?

> See <http://article.gmane.org/gmane.linux.ports.parisc/2687>.

Applied.

> It's only supposed to happen if there is an error in the execve call.
> I think our task scheduling is broken ;(
>
> Could you try my change and see if it helps?

It doesn't help.

Just so we are on the same page, please find attached vforktest.tgz.
Unpack it.
Run ./build.sh
Run ./vfork

What are the results on your machine?

I'm running a 64-bit kernel 2.6.32 (kyle's tree) SMP two PA8700's at 650MHz.

Cheers,
Carlos.

[-- Attachment #2: vforktest.tgz --]
[-- Type: application/x-gzip, Size: 21529 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 18:00     ` John David Anglin
  2010-03-28 18:47       ` Carlos O'Donell
@ 2010-03-28 19:07       ` John David Anglin
  2010-03-28 19:31         ` John David Anglin
  1 sibling, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-28 19:07 UTC (permalink / raw)
  To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc

> > A small note about getpid():
> > 
> > Since the parent and child share the same memory space, the parent
> > temporarily negates the cached PID value just before the vfork. Only
> > the parent, after the vfork, restores the cached PID. This is on
> > purpose to prevent any functions like PID-related functions from
> > working in the child() during the window bewteen vfork and execve.
> 
> Ok, it might be glibc that's broken (i.e., it doesn't restore correct
> dave).

The bad value is stored here:

0x0003f578 <fork+700>:	ldil L%7b800,r26
0x0003f57c <fork+704>:	b,l 0x35d74 <_IO_list_unlock>,rp
0x0003f580 <fork+708>:	stw r19,-414(r6)
(gdb) p/x $r19
$2 = 0x10101364

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 19:07       ` John David Anglin
@ 2010-03-28 19:31         ` John David Anglin
  2010-03-28 19:40           ` Carlos O'Donell
  0 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-28 19:31 UTC (permalink / raw)
  To: John David Anglin; +Cc: dave.anglin, carlos, deller, kyle, linux-parisc

> The bad value is stored here:
> 
> 0x0003f578 <fork+700>:	ldil L%7b800,r26
> 0x0003f57c <fork+704>:	b,l 0x35d74 <_IO_list_unlock>,rp
> 0x0003f580 <fork+708>:	stw r19,-414(r6)
> (gdb) p/x $r19
> $2 = 0x10101364

%r19 is clobbered by the clone syscall:

(gdb) disass 0x0003f44c 0x0003f458
Dump of assembler code from 0x3f44c to 0x3f458:
0x0003f44c <fork+400>:	be,l 100(sr2,r0),sr0,r31
0x0003f450 <fork+404>:	ldi 78,r20
0x0003f454 <fork+408>:	nop

Breakpoint 5, 0x0003f44c in fork ()
(gdb) p/x $r19
$6 = 0x0

Breakpoint 4, 0x0003f454 in fork ()
(gdb) p/x $r19
$7 = 0x10101364

Same value for %r19 is returned for both parent and child.

This is using application linked with -static (i.e., libc.a).  Probably,
shared lib is different.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 19:31         ` John David Anglin
@ 2010-03-28 19:40           ` Carlos O'Donell
       [not found]             ` <20100328202239.162785145@hiauly1.hia.nrc.ca>
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 19:40 UTC (permalink / raw)
  To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc

On Sun, Mar 28, 2010 at 3:31 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> The bad value is stored here:
>>
>> 0x0003f578 <fork+700>: =A0 =A0 =A0 =A0ldil L%7b800,r26
>> 0x0003f57c <fork+704>: =A0 =A0 =A0 =A0b,l 0x35d74 <_IO_list_unlock>,=
rp
>> 0x0003f580 <fork+708>: =A0 =A0 =A0 =A0stw r19,-414(r6)
>> (gdb) p/x $r19
>> $2 =3D 0x10101364
>
> %r19 is clobbered by the clone syscall:

This is a known issue with the kernel assembly, fork, vfork, and clone
all clobber r19, which is why glibc saves/restore r19 across
syscalls... but only in the PIC case.

ARCH_FORK ->
  INLINE_SYSCALL (clone, 5,                                            =
 \
                  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, =
 \
                  NULL, NULL, NULL, &THREAD_SELF->tid)

INLINE_SYSCALL does:
                asm volatile(                                          =
 \
                        SAVE_ASM_PIC                                   =
 \
                        "       ble  0x100(%%sr2, %%r0)\n"             =
 \
                        "       ldi %1, %%r20\n"                       =
 \
                        LOAD_ASM_PIC                                   =
 \
                        : "=3Dr" (__res)                               =
   \
                        : "i" (SYS_ify(name)) PIC_REG_USE ASM_ARGS_##nr=
 \
                        : "memory", CALL_CLOB_REGS CLOB_ARGS_##nr      =
 \
                );                                                     =
 \

So r19 is saved/restored around the clone syscall, but *ONLY* if the
code is PIC.

Therefore in your non-PIC case it will corrupt r19.

Actually, looking at the fork/vfork/clone wrappers in entry.S they
clobber two registers:

        /* These are call-clobbered registers and therefore
           also syscall-clobbered (we hope). */
        STREG   %r2,PT_GR19(%r1)        /* save for child */
        STREG   %r30,PT_GR21(%r1)

It would appear that r21 is also clobbered.

Does this mean that we need to either (a) fix the kernel to stop
clobbering r19 and r21 or fix userspace?

I'm inclined to think the kernel is wrong, and lazy here, and pt_regs
should have another PT_GR2_CHILD and PT_GR30_CHILD entries for the
case above.

Comments?

> (gdb) disass 0x0003f44c 0x0003f458
> Dump of assembler code from 0x3f44c to 0x3f458:
> 0x0003f44c <fork+400>: =A0be,l 100(sr2,r0),sr0,r31
> 0x0003f450 <fork+404>: =A0ldi 78,r20
> 0x0003f454 <fork+408>: =A0nop
>
> Breakpoint 5, 0x0003f44c in fork ()
> (gdb) p/x $r19
> $6 =3D 0x0
>
> Breakpoint 4, 0x0003f454 in fork ()
> (gdb) p/x $r19
> $7 =3D 0x10101364
>
> Same value for %r19 is returned for both parent and child.
>
> This is using application linked with -static (i.e., libc.a). =A0Prob=
ably,
> shared lib is different.

I hope my explanation above shows why r19 is clobbered.

I hadn't thought about the static case and a non-PIC register use of r1=
9.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
       [not found]             ` <20100328202239.162785145@hiauly1.hia.nrc.ca>
@ 2010-03-28 20:39               ` Carlos O'Donell
       [not found]                 ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca>
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 20:39 UTC (permalink / raw)
  To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc

On Sun, Mar 28, 2010 at 4:22 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> Does this mean that we need to either (a) fix the kernel to stop
>> clobbering r19 and r21 or fix userspace?
>
> r21 is an syscall argument register and it looks like CLOB_ARGS_*
> handles it. =A0So, I doubt we need to worry about about r21.

Right.

>> I'm inclined to think the kernel is wrong, and lazy here, and pt_reg=
s
>> should have another PT_GR2_CHILD and PT_GR30_CHILD entries for the
>> case above.
>
> Fixing the kernel will fix broken userspace code. =A0Fixing glibc to
> save and restore r19 in non-PIC code will allow new code to work
> with broken kernels. =A0Thus, I tend to think both should be fixed.

Agreed, I'll get a fix for this into glibc.

> Instead of using PT_GR19 and PT_GR21, can't we use PT_SR4 and PT_SR5
> to store r2 and r30 for the child. =A0We don't seem to ever actually
> save sr4 or sr5 in the syscall path and we restore fixed values.
> This is a hack, but avoids increasing the size of the task structure.

I don't like hacks like this. They are confusing and hard to maintain.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
       [not found]                 ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca>
@ 2010-03-28 21:04                   ` Carlos O'Donell
       [not found]                     ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca>
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 21:04 UTC (permalink / raw)
  To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc

On Sun, Mar 28, 2010 at 5:01 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> > Instead of using PT_GR19 and PT_GR21, can't we use PT_SR4 and PT_S=
R5
>> > to store r2 and r30 for the child. =A0We don't seem to ever actual=
ly
>> > save sr4 or sr5 in the syscall path and we restore fixed values.
>> > This is a hack, but avoids increasing the size of the task structu=
re.
>>
>> I don't like hacks like this. They are confusing and hard to maintai=
n.
>
> However, it says in ptrace.h:
>
> =A0* N.B. gdb/strace care about the size and offsets within this
> =A0* structure. If you change things, you may break object compatibil=
ity
> =A0* for those applications.
>
> There is one free slot in pt_regs:
>
> =A0 =A0 =A0 =A0unsigned long pad0; =A0 =A0 /* available for other use=
s */
>
> Given that this struct leaks to userspace, I don't think we should
> mess with it.

Maybe. I would certainly test this out before adding new entries.

If I provide you with a new glibc with the r19 fix would you be able
to test it out?

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
       [not found]                     ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca>
@ 2010-03-28 23:12                       ` Carlos O'Donell
  2010-03-28 23:59                         ` John David Anglin
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 23:12 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc

On Sun, Mar 28, 2010 at 5:21 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> If I provide you with a new glibc with the r19 fix would you be able
>> to test it out?
>
> Yes.

I know it's easier to debug a static application for these cases, so I
think a fixed glibc will help.

I have an unstable libc6 currently building with the changes to
save/restore r19 in all cases.

We still aren't any closer to explaining why my shared case doesn't work.

I still don't have a good answer to this question:

* What prevents the execve from returning to userspace?

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 23:12                       ` Carlos O'Donell
@ 2010-03-28 23:59                         ` John David Anglin
  2010-03-29  0:24                           ` Carlos O'Donell
  0 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-28 23:59 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Helge Deller, Kyle McMartin, linux-parisc

On Sun, 28 Mar 2010, Carlos O'Donell wrote:

> On Sun, Mar 28, 2010 at 5:21 PM, John David Anglin
> <dave@hiauly1.hia.nrc.ca> wrote:
> >> If I provide you with a new glibc with the r19 fix would you be able
> >> to test it out?
> >
> > Yes.
> 
> I know it's easier to debug a static application for these cases, so I
> think a fixed glibc will help.

Yes.

It's proving more difficult than I thought to fix the kernel and
I'm not sure I understand how this works:

        LDREG   TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE(%r30), %r1
	LDREG   TASK_PT_GR19(%r1),%r2
	b       wrapper_exit
	copy    %r0,%r28

It appears the child has to access the parent's pt_regs struct
to load the return for the child.

> I have an unstable libc6 currently building with the changes to
> save/restore r19 in all cases.

Great.

> We still aren't any closer to explaining why my shared case doesn't work.
> 
> I still don't have a good answer to this question:
> 
> * What prevents the execve from returning to userspace?

I presume you mean the caller's userspace.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-28 23:59                         ` John David Anglin
@ 2010-03-29  0:24                           ` Carlos O'Donell
  2010-03-29  2:38                             ` John David Anglin
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-29  0:24 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc

On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
>> I know it's easier to debug a static application for these cases, so=
 I
>> think a fixed glibc will help.
>
> Yes.
>
> It's proving more difficult than I thought to fix the kernel and
> I'm not sure I understand how this works:
>
> =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE=
(%r30), %r1
> =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2
> =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit
> =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28
>
> It appears the child has to access the parent's pt_regs struct
> to load the return for the child.

I see that copy_process in do_fork will give the child a copy of the
entire pt_reg structure so it has access to these values anyway?

>> * What prevents the execve from returning to userspace?
>
> I presume you mean the caller's userspace.

Yes, the process that calls execve.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-29  0:24                           ` Carlos O'Donell
@ 2010-03-29  2:38                             ` John David Anglin
  2010-03-29 12:11                               ` Carlos O'Donell
  0 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-29  2:38 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc

On Sun, 28 Mar 2010, Carlos O'Donell wrote:

> On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin
> <dave@hiauly1.hia.nrc.ca> wrote:
> >> I know it's easier to debug a static application for these cases, =
so I
> >> think a fixed glibc will help.
> >
> > Yes.
> >
> > It's proving more difficult than I thought to fix the kernel and
> > I'm not sure I understand how this works:
> >
> > =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SI=
ZE(%r30), %r1
> > =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2
> > =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit
> > =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28
> >
> > It appears the child has to access the parent's pt_regs struct
> > to load the return for the child.
>=20
> I see that copy_process in do_fork will give the child a copy of the
> entire pt_reg structure so it has access to these values anyway?

I got a kernel to boot which shouldn't clobber r19.  Turned out to
be a typo.  This uses PT_SR4 for the kernel return for the child.
Will test with gcc build.

Will try again using the pad0 field later.

Dave
--=20
J. David Anglin                                  dave.anglin@nrc-cnrc.g=
c.ca
National Research Council of Canada              (613) 990-0752 (FAX: 9=
52-6602)
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-29  2:38                             ` John David Anglin
@ 2010-03-29 12:11                               ` Carlos O'Donell
  2010-03-29 14:02                                 ` John David Anglin
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-29 12:11 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc

On Sun, Mar 28, 2010 at 10:38 PM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
> On Sun, 28 Mar 2010, Carlos O'Donell wrote:
>
>> On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin
>> <dave@hiauly1.hia.nrc.ca> wrote:
>> >> I know it's easier to debug a static application for these cases,=
 so I
>> >> think a fixed glibc will help.
>> >
>> > Yes.
>> >
>> > It's proving more difficult than I thought to fix the kernel and
>> > I'm not sure I understand how this works:
>> >
>> > =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_S=
IZE(%r30), %r1
>> > =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2
>> > =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit
>> > =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28
>> >
>> > It appears the child has to access the parent's pt_regs struct
>> > to load the return for the child.
>>
>> I see that copy_process in do_fork will give the child a copy of the
>> entire pt_reg structure so it has access to these values anyway?
>
> I got a kernel to boot which shouldn't clobber r19. =A0Turned out to
> be a typo. =A0This uses PT_SR4 for the kernel return for the child.
> Will test with gcc build.
>
> Will try again using the pad0 field later.

Sorry, what turned out to be a typo?

I guess any static binary that calls fork/vfork/clone is susceptible
to corruption of r19 in this way.

I don't know how much that effects userspace, since most is shared,
*however* the dynamic linker is technically a static binary :-)

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-29 12:11                               ` Carlos O'Donell
@ 2010-03-29 14:02                                 ` John David Anglin
  2010-03-31 13:55                                   ` Carlos O'Donell
  0 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-03-29 14:02 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc

On Mon, 29 Mar 2010, Carlos O'Donell wrote:

> > I got a kernel to boot which shouldn't clobber r19. =A0Turned out t=
o
> > be a typo. =A0This uses PT_SR4 for the kernel return for the child.
> > Will test with gcc build.
> >
> > Will try again using the pad0 field later.
>=20
> Sorry, what turned out to be a typo?

I used PT_SR4 instead of TASK_PT_SR4 in child_return ;(

> I guess any static binary that calls fork/vfork/clone is susceptible
> to corruption of r19 in this way.
>=20
> I don't know how much that effects userspace, since most is shared,
> *however* the dynamic linker is technically a static binary :-)

The change may not fix much, but there's no technical reason that
the kernel needs to clobber r19.

Dave
--=20
J. David Anglin                                  dave.anglin@nrc-cnrc.g=
c.ca
National Research Council of Canada              (613) 990-0752 (FAX: 9=
52-6602)
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-29 14:02                                 ` John David Anglin
@ 2010-03-31 13:55                                   ` Carlos O'Donell
  2010-04-01 10:54                                     ` Helge Deller
  0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-31 13:55 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc

On Mon, Mar 29, 2010 at 10:02 AM, John David Anglin
<dave@hiauly1.hia.nrc.ca> wrote:
> The change may not fix much, but there's no technical reason that
> the kernel needs to clobber r19.

Rebuilt 2.10.2-6 here with r19 save for all cases.

http://www.parisc-linux.org/~carlos/saver19/

Two regressions during build, but they appear unrelated.

Could you test this in a chroot and tell me if it works for you?

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-03-31 13:55                                   ` Carlos O'Donell
@ 2010-04-01 10:54                                     ` Helge Deller
  2010-04-01 13:33                                       ` John David Anglin
  0 siblings, 1 reply; 23+ messages in thread
From: Helge Deller @ 2010-04-01 10:54 UTC (permalink / raw)
  To: Carlos O'Donell, dave.anglin; +Cc: linux-parisc, kyle

> Rebuilt 2.10.2-6 here with r19 save for all cases.
>=20
> http://www.parisc-linux.org/~carlos/saver19/

Hi Carlos,

I installed this libc on one of my servers.
I didn't ran any bigger tests yet, but at least it seems to have fixed =
the issue that an ssh into the machine often didn't let me in the first=
 time. Now it just works...

Thanks!
Helge
--=20
NEU: GMX DSL f=FCr 19,99 EUR/mtl. und ohne Mindest-Laufzeit!
http://portal.gmx.net/de/go/dsl02
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-04-01 10:54                                     ` Helge Deller
@ 2010-04-01 13:33                                       ` John David Anglin
  2010-04-01 18:55                                         ` Helge Deller
  0 siblings, 1 reply; 23+ messages in thread
From: John David Anglin @ 2010-04-01 13:33 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, kyle, carlos, dave.anglin

Hi Helge,

> I installed this libc on one of my servers.
> I didn't ran any bigger tests yet, but at least it seems to have fixed the issue that an ssh into the machine often didn't let me in the first time. Now it just works...

I think this might be another issue.  Strangely, I used to see this
one my c3750 and it' now gone away.

I originally noticed a problem with Carlos's vfork testcase when compiled
with -static.  The parent pid printed by the parent was corrupted (always
same value) by the vfork call (r19 was not saved across syscall).  If you
check that it now works, the problem is probably fixed in glibc.

I have attached below a kernel fix for this bug.  This allows old user
code to work correctly.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

Signed-off-by: John Daqvid Anglin <dave.anglin@nrc-cnrc.gc.ca>

diff --git a/arch/parisc/kernel/asm-offsets.c b/arch/parisc/kernel/asm-offsets.c
index ec787b4..b2f35b2 100644
--- a/arch/parisc/kernel/asm-offsets.c
+++ b/arch/parisc/kernel/asm-offsets.c
@@ -137,6 +137,7 @@ int main(void)
 	DEFINE(TASK_PT_IAOQ0, offsetof(struct task_struct, thread.regs.iaoq[0]));
 	DEFINE(TASK_PT_IAOQ1, offsetof(struct task_struct, thread.regs.iaoq[1]));
 	DEFINE(TASK_PT_CR27, offsetof(struct task_struct, thread.regs.cr27));
+	DEFINE(TASK_PT_SYSCALL_RP, offsetof(struct task_struct, thread.regs.pad0));
 	DEFINE(TASK_PT_ORIG_R28, offsetof(struct task_struct, thread.regs.orig_r28));
 	DEFINE(TASK_PT_KSP, offsetof(struct task_struct, thread.regs.ksp));
 	DEFINE(TASK_PT_KPC, offsetof(struct task_struct, thread.regs.kpc));
@@ -225,6 +226,7 @@ int main(void)
 	DEFINE(PT_IAOQ0, offsetof(struct pt_regs, iaoq[0]));
 	DEFINE(PT_IAOQ1, offsetof(struct pt_regs, iaoq[1]));
 	DEFINE(PT_CR27, offsetof(struct pt_regs, cr27));
+	DEFINE(PT_SYSCALL_RP, offsetof(struct pt_regs, pad0));
 	DEFINE(PT_ORIG_R28, offsetof(struct pt_regs, orig_r28));
 	DEFINE(PT_KSP, offsetof(struct pt_regs, ksp));
 	DEFINE(PT_KPC, offsetof(struct pt_regs, kpc));
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 3a44f7f..5772cba 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1772,9 +1744,9 @@ ENTRY(sys_fork_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	/* These are call-clobbered registers and therefore
-	   also syscall-clobbered (we hope). */
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 
 	LDREG	PT_GR30(%r1),%r25
@@ -1804,7 +1776,7 @@ ENTRY(child_return)
 	nop
 
 	LDREG	TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE(%r30), %r1
-	LDREG	TASK_PT_GR19(%r1),%r2
+	LDREG	TASK_PT_SYSCALL_RP(%r1),%r2
 	b	wrapper_exit
 	copy	%r0,%r28
 ENDPROC(child_return)
@@ -1823,8 +1795,9 @@ ENTRY(sys_clone_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	/* WARNING - Clobbers r19 and r21, userspace must save these! */
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 	BL	sys_clone,%r2
 	copy	%r1,%r24
@@ -1847,7 +1820,9 @@ ENTRY(sys_vfork_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 
 	BL	sys_vfork,%r2
diff --git a/arch/parisc/hpux/wrappers.S b/arch/parisc/hpux/wrappers.S
index 58c53c8..bdcea33 100644
--- a/arch/parisc/hpux/wrappers.S
+++ b/arch/parisc/hpux/wrappers.S
@@ -88,7 +88,7 @@ ENTRY(hpux_fork_wrapper)
 
 	STREG	%r2,-20(%r30)
 	ldo	64(%r30),%r30
-	STREG	%r2,PT_GR19(%r1)	;! save for child
+	STREG	%r2,PT_SYSCALL_RP(%r1)	;! save for child
 	STREG	%r30,PT_GR21(%r1)	;! save for child
 
 	LDREG	PT_GR30(%r1),%r25
@@ -132,7 +132,7 @@ ENTRY(hpux_child_return)
 	bl,n	schedule_tail, %r2
 #endif
 
-	LDREG	TASK_PT_GR19-TASK_SZ_ALGN-128(%r30),%r2
+	LDREG	TASK_PT_SYSCALL_RP-TASK_SZ_ALGN-128(%r30),%r2
 	b fork_return
 	copy %r0,%r28
 ENDPROC(hpux_child_return)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Question about execve.
  2010-04-01 13:33                                       ` John David Anglin
@ 2010-04-01 18:55                                         ` Helge Deller
  0 siblings, 0 replies; 23+ messages in thread
From: Helge Deller @ 2010-04-01 18:55 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, kyle, carlos, dave.anglin

On 04/01/2010 03:33 PM, John David Anglin wrote:
> Hi Helge,
>
>> I installed this libc on one of my servers.
>> I didn't ran any bigger tests yet, but at least it seems to have fixed the issue that an ssh into the machine often didn't let me in the first time. Now it just works...
>
> I think this might be another issue.  Strangely, I used to see this
> one my c3750 and it' now gone away.

Sadly I was wrong and answered too fast.
The ssh-issue came back at some point, so the new glibc didn't fixed it.... :-(
  
> I originally noticed a problem with Carlos's vfork testcase when compiled
> with -static.  The parent pid printed by the parent was corrupted (always
> same value) by the vfork call (r19 was not saved across syscall).  If you
> check that it now works, the problem is probably fixed in glibc.
>
> I have attached below a kernel fix for this bug.  This allows old user
> code to work correctly.

Thanks for the work you and Carlos do!

Helge

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2010-04-01 18:55 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-28  3:26 Question about execve Carlos O'Donell
2010-03-28 15:42 ` John David Anglin
2010-03-28 16:03   ` John David Anglin
2010-03-28 17:26   ` Carlos O'Donell
2010-03-28 18:00     ` John David Anglin
2010-03-28 18:47       ` Carlos O'Donell
2010-03-28 19:07       ` John David Anglin
2010-03-28 19:31         ` John David Anglin
2010-03-28 19:40           ` Carlos O'Donell
     [not found]             ` <20100328202239.162785145@hiauly1.hia.nrc.ca>
2010-03-28 20:39               ` Carlos O'Donell
     [not found]                 ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca>
2010-03-28 21:04                   ` Carlos O'Donell
     [not found]                     ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca>
2010-03-28 23:12                       ` Carlos O'Donell
2010-03-28 23:59                         ` John David Anglin
2010-03-29  0:24                           ` Carlos O'Donell
2010-03-29  2:38                             ` John David Anglin
2010-03-29 12:11                               ` Carlos O'Donell
2010-03-29 14:02                                 ` John David Anglin
2010-03-31 13:55                                   ` Carlos O'Donell
2010-04-01 10:54                                     ` Helge Deller
2010-04-01 13:33                                       ` John David Anglin
2010-04-01 18:55                                         ` Helge Deller
2010-03-28 17:32   ` John David Anglin
2010-03-28 18:20     ` Carlos O'Donell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.