vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)

* vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)
       [not found] ` <bug-215769-216477-to2O9X1Knw@https.bugzilla.kernel.org/>
@ 2022-04-02 21:15   ` Alejandro Colomar (man-pages)
  2022-04-04  8:05     ` Christian Brauner
  0 siblings, 1 reply; 7+ messages in thread
From: Alejandro Colomar (man-pages) @ 2022-04-02 21:15 UTC
  To: bugzilla-daemon, linux-kernel,
	Коренберг
	Марк,
	Christian Brauner
  Cc: Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn

[Added some kernel CCs that may know what's going on]

Hi,

On 3/31/22 09:53, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215769
> 
> --- Comment #3 from Коренберг Марк (socketpair@gmail.com) ---
> Hi,
> I appreciate the depth of the validation. Actually, you are right: vfork()
> DOES work with pid=1 processes. I figured out the cause in my case. To
> reproduce, add unshare(CLONE_NEWTIME) just before vfork(). Now, I don't know
> if it's a bug in vfork() or in fork(). Yes, both are clone() actually.
> 
> In any case, they should either both give EINVAL or both succeed. But it's
> definitely a bug in the kernel around CLONE_NEWTIME.
> 

On 3/31/22 10:12, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215769
>
> --- Comment #4 from Коренберг Марк (socketpair@gmail.com) ---
> #define _GNU_SOURCE 1
> #include <stdio.h>
> #include <sched.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <err.h>
>
> #ifndef CLONE_NEWTIME
> #define CLONE_NEWTIME   0x00000080
> #endif
>
> int main (void)
> {
>   if (unshare (CLONE_NEWTIME))  err (EXIT_FAILURE, "UNSHARE_NEWTIME");
>
>   pid_t pid;
>   switch (pid=vfork ())
>   {
>   case 0:
>     _exit(0);
>   case -1:
>     err(EXIT_FAILURE, "vfork BUG");
>   default:
>     waitpid(pid, NULL, 0);
>   }
>   return 0;
> }
>

I could reproduce it with the following code.  I tried
syscall(SYS_vfork) to make sure it's not a problem in the libc wrapper,
and to make sure I do call vfork(2).  If I replace vfork(2) with
fork(2), I don't get the error.


$ cat vfork.c
#define _GNU_SOURCE
#include <err.h>
#include <linux/sched.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	pid_t pid;

	if (unshare(CLONE_NEWTIME) == -1)
		err(EXIT_FAILURE, "unshare(2)");
	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
		err(EXIT_FAILURE, "signal(2)");
	pid = syscall(SYS_vfork);
	switch (pid) {
	case 0:
		errx(EXIT_SUCCESS, "Grandchild exiting normally.");
	case -1:
		/* If we got here, the report is confirmed. */
		err(EXIT_FAILURE, "vfork(2)");
	default:
		errx(EXIT_SUCCESS, "Child exiting normally.");
	}
}

$ cc -Wall -Wextra -Werror vfork.c
$ sudo ./a.out
a.out: vfork(2): Invalid argument



$ grep_syscall_def vfork
kernel/fork.c:2711:
SYSCALL_DEFINE0(vfork)
{
	struct kernel_clone_args args = {
		.flags		= CLONE_VFORK | CLONE_VM,
		.exit_signal	= SIGCHLD,
	};

	return kernel_clone(&args);
}


Maybe someone in the kernel can send some patch for the clone(2) and/or
vfork(2) manual pages that explains the reason (if it's intended).


Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

* Re: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)
  2022-04-02 21:15   ` vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1) Alejandro Colomar (man-pages)
@ 2022-04-04  8:05     ` Christian Brauner
  2022-04-05 19:28       ` vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)) Alejandro Colomar
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2022-04-04  8:05 UTC
  To: Alejandro Colomar (man-pages)
  Cc: bugzilla-daemon, linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn

On Sat, Apr 02, 2022 at 11:15:52PM +0200, Alejandro Colomar (man-pages) wrote:
> [Added some kernel CCs that may know what's going on]
> 
> Hi,
> 
> On 3/31/22 09:53, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215769
> > 
> > --- Comment #3 from Коренберг Марк (socketpair@gmail.com) ---
> > Hi,
> > I appreciate the depth of the validation. Actually, you are right: vfork()
> > DOES work with pid=1 processes. I figured out the cause in my case. To
> > reproduce, add unshare(CLONE_NEWTIME) just before vfork(). Now, I don't know
> > if it's a bug in vfork() or in fork(). Yes, both are clone() actually.
> > 
> > In any case, they should either both give EINVAL or both succeed. But it's
> > definitely a bug in the kernel around CLONE_NEWTIME.
> > 
> 
> On 3/31/22 10:12, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215769
> >
> > --- Comment #4 from Коренберг Марк (socketpair@gmail.com) ---
> > #define _GNU_SOURCE 1
> > #include <stdio.h>
> > #include <sched.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/types.h>
> > #include <sys/wait.h>
> > #include <err.h>
> >
> > #ifndef CLONE_NEWTIME
> > #define CLONE_NEWTIME   0x00000080
> > #endif
> >
> > int main (void)
> > {
> >   if (unshare (CLONE_NEWTIME))  err (EXIT_FAILURE, "UNSHARE_NEWTIME");
> >
> >   pid_t pid;
> >   switch (pid=vfork ())
> >   {
> >   case 0:
> >     _exit(0);
> >   case -1:
> >     err(EXIT_FAILURE, "vfork BUG");
> >   default:
> >     waitpid(pid, NULL, 0);
> >   }
> >   return 0;
> > }
> >
> 
> I could reproduce it with the following code.  I tried
> syscall(SYS_vfork) to make sure it's not a problem in the libc wrapper,
> and to make sure I do call vfork(2).  If I replace vfork(2) with
> fork(2), I don't get the error.
> 
> 
> $ cat vfork.c
> #define _GNU_SOURCE
> #include <err.h>
> #include <linux/sched.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdlib.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> 
> int main(void)
> {
> 	pid_t pid;
> 
> 	if (unshare(CLONE_NEWTIME) == -1)
> 		err(EXIT_FAILURE, "unshare(2)");
> 	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
> 		err(EXIT_FAILURE, "signal(2)");
> 	pid = syscall(SYS_vfork);
> 	switch (pid) {
> 	case 0:
> 		errx(EXIT_SUCCESS, "Grandchild exiting normally.");
> 	case -1:
> 		/* If we got here, the report is confirmed. */
> 		err(EXIT_FAILURE, "vfork(2)");
> 	default:
> 		errx(EXIT_SUCCESS, "Child exiting normally.");
> 	}
> }
> 
> $ cc -Wall -Wextra -Werror vfork.c
> $ sudo ./a.out
> a.out: vfork(2): Invalid argument
> 
> 
> 
> $ grep_syscall_def vfork
> kernel/fork.c:2711:
> SYSCALL_DEFINE0(vfork)
> {
> 	struct kernel_clone_args args = {
> 		.flags		= CLONE_VFORK | CLONE_VM,
> 		.exit_signal	= SIGCHLD,
> 	};
> 
> 	return kernel_clone(&args);
> }
> 
> 
> Maybe someone in the kernel can send some patch for the clone(2) and/or
> vfork(2) manual pages that explains the reason (if it's intended).

Hey Alejandro,

I won't be able to send a patch very soon but I can at least explain why
you see EINVAL. :)

This is intended. 

vfork() suspends the parent process and the child process will share the
same vm as the parent process. If the child process is in a new time
namespace different from its parent process it is not allowed to be in
the same threadgroup or share virtual memory with the parent process.
That's why you see EINVAL.

Note, the unshare(CLONE_NEWTIME) call will _not_ cause the calling
process to be moved into a different time namespace. Only the newly
created child process will be after a subsequent
fork()/vfork()/clone()/clone3()...

The semantics are equivalent to that of CLONE_NEWPID in this regard. You
can see this via /proc/<pid>/ns/ where you see two entries for pid
namespaces and also two entries for time namespaces:

* CLONE_NEWTIME
  * /proc/<pid>/ns/time			// current time namespace
  * /proc/<pid>/ns/time_for_children	// time namespace for the new child process

If during fork:

parent_process->time != parent_process->time_for_children

and either CLONE_VM or CLONE_THREAD is set you see EINVAL.

You can thus replicate the same error via:

unshare(CLONE_NEWTIME)

and a

clone() or clone3() call with CLONE_VM or CLONE_THREAD.

Christian

* vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1))
  2022-04-04  8:05     ` Christian Brauner
@ 2022-04-05 19:28       ` Alejandro Colomar
  2022-04-06  8:46         ` Christian Brauner
  0 siblings, 1 reply; 7+ messages in thread
From: Alejandro Colomar @ 2022-04-05 19:28 UTC
  To: Christian Brauner
  Cc: linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn, bugzilla-daemon

Hey, Christian!

On 4/4/22 10:05, Christian Brauner wrote:
> On Sat, Apr 02, 2022 at 11:15:52PM +0200, Alejandro Colomar (man-pages) wrote:
>> [Added some kernel CCs that may know what's going on]
[...]
>> Maybe someone in the kernel can send some patch for the clone(2) and/or
>> vfork(2) manual pages that explains the reason (if it's intended).
> 
> Hey Alejandro,
> 
> I won't be able to send a patch very soon but I can at least explain why
> you see EINVAL. :)

No hurry; we're not planning a release any time soon :)

> 
> This is intended.
> 
> vfork() suspends the parent process and the child process will share the
> same vm as the parent process. If the child process is in a new time
> namespace different from its parent process it is not allowed to be in
> the same threadgroup or share virtual memory with the parent process.
> That's why you see EINVAL.

That makes a lot of sense to me.

> 
> Note, the unshare(CLONE_NEWTIME) call will _not_ cause the calling
> process to be moved into a different time namespace. Only the newly
> created child process will be after a subsequent
> fork()/vfork()/clone()/clone3()...
> 
> The semantics are equivalent to that of CLONE_NEWPID in this regard. You
> can see this via /proc/<pid>/ns/ where you see two entries for pid
> namespaces and also two entries for time namespaces:
> 
> * CLONE_NEWTIME
>    * /proc/<pid>/ns/time			// current time namespace
>    * /proc/<pid>/ns/time_for_children	// time namespace for the new child process

Also makes sense.  Michael taught me that a few weeks ago :)

This also raises a question:  will the same problem happen with 
CLONE_NEWPID, since it also moves the child into a new ns (in this case a 
PID one)?  See test program below.

> 
> If during fork:
> 
> parent_process->time != parent_process->time_for_children
> 
> and either CLONE_VM or CLONE_THREAD is set you see EINVAL.
> 
> You can thus replicate the same error via:
> 
> unshare(CLONE_NEWTIME)
> 
> and a
> 
> clone() or clone3() call with CLONE_VM or CLONE_THREAD.

So, to test my doubts, I wrote this program (and also similar programs 
where only the CLONE_NEW* flag was changed: one with CLONE_NEWTIME and 
one with CLONE_NEWNS):

$ cat vfork_newpid.c
#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <linux/sched.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

static char *const child_argv[] = {
	"print_pid",
	NULL
};

static char *const child_envp[] = {
	NULL
};

int
main(void)
{
	pid_t pid;

	printf("%s: PID: %ld\n", program_invocation_short_name, (long) getpid());

	if (unshare(CLONE_NEWPID) == -1)
		err(EXIT_FAILURE, "unshare(2)");
	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
		err(EXIT_FAILURE, "signal(2)");

	pid = syscall(SYS_vfork);
	//pid = vfork();  // This behaves differently.
	switch (pid) {
	case 0:
		execve("/home/alx/tmp/print_pid", child_argv, child_envp);
		err(EXIT_SUCCESS, "PID %ld exiting after execve(2)",
		    (long) getpid());
	case -1:
		err(EXIT_FAILURE, "vfork(2)");
	default:
		errx(EXIT_SUCCESS, "Parent exiting after vfork(2).");
	}
}

$ cat print_pid.c
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	errx(EXIT_SUCCESS, "PID %ld exiting.", (long) getpid());
}

$ cc -Wall -Wextra -Werror -o print_pid print_pid.c
$ cc -Wall -Wextra -Werror -o vfork_newpid vfork_newpid.c
$
$
$ sudo ./vfork_newpid
vfork_newpid: PID: 8479
vfork_newpid: PID 8479 exiting after execve(2): Success
print_pid: PID 1 exiting.
$
$
$ sudo ./vfork_newtime
vfork_newtime: PID: 8484
vfork_newtime: vfork(2): Invalid argument
$
$
$ sudo ./vfork_newns
vfork_newns: PID: 8486
vfork_newns: PID 8486 exiting after execve(2): Success
print_pid: PID 8487 exiting.


The first thing I noticed is that usage of vfork(2) differs considerably 
from fork(2), and that's not clear from reading the manual page.  It says 
that the parent process is suspended until the child calls execve(2); I 
took that to mean that vfork(2) doesn't return to the parent until that 
happens, but is otherwise transparent.  I was wrong, and my tests showed 
me that.

I was going to propose an example program for the manual page when I 
decided to try something slightly different: calling vfork() instead of 
syscall(SYS_vfork).  That changed the behavior to match that of fork(2) 
(i.e., the parent resumes after vfork(2) returns the PID of the child).

Is that also intended?  I couldn't find the glibc wrapper source code, 
so I don't know what glibc is doing here, but I straced the processes, 
and they're all calling vfork(), so the behavior should be consistent; 
it's quite weird.  I'm very confused at this point.


I'm also wondering why it's okay for processes in different PID namespaces 
to share the same vm, but I guess those are implementation details that I 
don't need to care about that much.


Thanks for the details!

Cheers,

Alex

* Re: vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1))
  2022-04-05 19:28       ` vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)) Alejandro Colomar
@ 2022-04-06  8:46         ` Christian Brauner
  2022-04-06 19:22           ` Alejandro Colomar
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2022-04-06  8:46 UTC
  To: Alejandro Colomar
  Cc: linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn, bugzilla-daemon

On Tue, Apr 05, 2022 at 09:28:12PM +0200, Alejandro Colomar wrote:
> Hey, Christian!
> 
> On 4/4/22 10:05, Christian Brauner wrote:
> > On Sat, Apr 02, 2022 at 11:15:52PM +0200, Alejandro Colomar (man-pages) wrote:
> > > [Added some kernel CCs that may know what's going on]
> [...]
> > > Maybe someone in the kernel can send some patch for the clone(2) and/or
> > > vfork(2) manual pages that explains the reason (if it's intended).
> > 
> > Hey Alejandro,
> > 
> > I won't be able to send a patch very soon but I can at least explain why
> > you see EINVAL. :)
> 
> No hurry; we're not planning a release any time soon :)
> 
> > 
> > This is intended.
> > 
> > vfork() suspends the parent process and the child process will share the
> > same vm as the parent process. If the child process is in a new time
> > namespace different from its parent process it is not allowed to be in
> > the same threadgroup or share virtual memory with the parent process.
> > That's why you see EINVAL.
> 
> That makes a lot of sense to me.
> 
> > 
> > Note, the unshare(CLONE_NEWTIME) call will _not_ cause the calling
> > process to be moved into a different time namespace. Only the newly
> > created child process will be after a subsequent
> > fork()/vfork()/clone()/clone3()...
> > 
> > The semantics are equivalent to that of CLONE_NEWPID in this regard. You
> > can see this via /proc/<pid>/ns/ where you see two entries for pid
> > namespaces and also two entries for time namespaces:
> > 
> > * CLONE_NEWTIME
> >    * /proc/<pid>/ns/time			// current time namespace
> >    * /proc/<pid>/ns/time_for_children	// time namespace for the new child process
> 
> Also makes sense.  Michael taught me that a few weeks ago :)
> 
> This also raises a question:  will the same problem happen with
> CLONE_NEWPID, since it also moves the child into a new ns (in this case a PID
> one)?  See test program below.

No, it won't. A pid namespace places no relevant constraints on vm usage
whereas a time namespace does.
If a task joins a new time namespace it'll clean the VVAR page tables
and refault them with the new layout after the timens change. That
affects all tasks which use the same task->mm.

Since CLONE_THREAD implies CLONE_VM this would affect the whole
thread-group behind their back. All threads would suddenly change
timens.

No such issues exist for pid namespaces; they don't need to alter
task->mm.

> 
> > 
> > If during fork:
> > 
> > parent_process->time != parent_process->time_for_children
> > 
> > and either CLONE_VM or CLONE_THREAD is set you see EINVAL.
> > 
> > You can thus replicate the same error via:
> > 
> > unshare(CLONE_NEWTIME)
> > 
> > and a
> > 
> > clone() or clone3() call with CLONE_VM or CLONE_THREAD.
> 
> So, to test my doubts, I wrote this program (and also similar programs
> where only the CLONE_NEW* flag was changed: one with CLONE_NEWTIME and
> one with CLONE_NEWNS):
> 
> $ cat vfork_newpid.c
> #define _GNU_SOURCE
> #include <err.h>
> #include <errno.h>
> #include <linux/sched.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> 
> static char *const child_argv[] = {
> 	"print_pid",
> 	NULL
> };
> 
> static char *const child_envp[] = {
> 	NULL
> };
> 
> int
> main(void)
> {
> 	pid_t pid;
> 
> 	printf("%s: PID: %ld\n", program_invocation_short_name, (long) getpid());
> 
> 	if (unshare(CLONE_NEWPID) == -1)
> 		err(EXIT_FAILURE, "unshare(2)");
> 	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
> 		err(EXIT_FAILURE, "signal(2)");
> 
> 	pid = syscall(SYS_vfork);
> 	//pid = vfork();  // This behaves differently.
> 	switch (pid) {
> 	case 0:
> 		execve("/home/alx/tmp/print_pid", child_argv, child_envp);
> 		err(EXIT_SUCCESS, "PID %ld exiting after execve(2)",
> 		    (long) getpid());
> 	case -1:
> 		err(EXIT_FAILURE, "vfork(2)");
> 	default:
> 		errx(EXIT_SUCCESS, "Parent exiting after vfork(2).");
> 	}
> }
> 
> $ cat print_pid.c
> #include <err.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> int
> main(void)
> {
> 	errx(EXIT_SUCCESS, "PID %ld exiting.", (long) getpid());
> }
> 
> $ cc -Wall -Wextra -Werror -o print_pid print_pid.c
> $ cc -Wall -Wextra -Werror -o vfork_newpid vfork_newpid.c
> $
> $
> $ sudo ./vfork_newpid
> vfork_newpid: PID: 8479
> vfork_newpid: PID 8479 exiting after execve(2): Success
> print_pid: PID 1 exiting.
> $
> $
> $ sudo ./vfork_newtime
> vfork_newtime: PID: 8484
> vfork_newtime: vfork(2): Invalid argument
> $
> $
> $ sudo ./vfork_newns
> vfork_newns: PID: 8486
> vfork_newns: PID 8486 exiting after execve(2): Success
> print_pid: PID 8487 exiting.
> 
> 
> The first thing I noticed is that usage of vfork(2) differs considerably from
> fork(2), and that's not clear from reading the manual page.  It says that
> the parent process is suspended until the child calls execve(2); I took
> that to mean that vfork(2) doesn't return to the parent until that happens,
> but is otherwise transparent.  I was wrong, and my tests showed me that.
> 
> I was going to propose an example program for the manual page when I
> decided to try something slightly different: calling vfork() instead of
> syscall(SYS_vfork).  That changed the behavior to match that of fork(2)
> (i.e., the parent resumes after vfork(2) returns the PID of the child).
> 
> Is that also intended?  I couldn't find the glibc wrapper source code, so I
> don't know what glibc is doing here, but I straced the processes, and
> they're all calling vfork(), so the behavior should be consistent; it's
> quite weird.  I'm very confused at this point.

glibc does vfork() via inline assembly massaging. There are probably
atfork handlers and a bunch of other stuff involved, so it's difficult to
do a remote diagnosis.
(And note that calling anything other than execve() or _exit() after
vfork() is basically undefined behavior.)

> 
> 
> I'm also wondering why it's okay for processes in different PID namespaces
> to share the same vm, but I guess those are implementation details that I
> don't need to care about that much.

See earlier in the thread.

* vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1))
  2022-04-06  8:46         ` Christian Brauner
@ 2022-04-06 19:22           ` Alejandro Colomar
  2022-04-06 19:26             ` vfork(2) behavior not consistent with fork(2) Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Alejandro Colomar @ 2022-04-06 19:22 UTC
  To: Christian Brauner, Florian Weimer, Michael Kerrisk
  Cc: linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn, bugzilla-daemon, linux-api

> $ sudo ./vfork_newpid
> vfork_newpid: PID: 8479
> vfork_newpid: PID 8479 exiting after execve(2): Success
> print_pid: PID 1 exiting. 


I definitely think this is a kernel (or glibc) bug.
execve(2) is supposed to _never_ return 0 (and errno 0).
I submitted a new bug to discuss it.

Please see <https://bugzilla.kernel.org/show_bug.cgi?id=215813>

* Re: vfork(2) behavior not consistent with fork(2)
  2022-04-06 19:22           ` Alejandro Colomar
@ 2022-04-06 19:26             ` Florian Weimer
  2022-04-06 19:31               ` [Bug 215813] syscall(SYS_vfork) causes execve() to return 0. (was: vfork(2) behavior not consistent with fork(2)) Alejandro Colomar
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-04-06 19:26 UTC
  To: Alejandro Colomar
  Cc: Christian Brauner, Michael Kerrisk, linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn, bugzilla-daemon, linux-api

* Alejandro Colomar:

>> $ sudo ./vfork_newpid
>> vfork_newpid: PID: 8479
>> vfork_newpid: PID 8479 exiting after execve(2): Success
>> print_pid: PID 1 exiting. 
>
>
> I definitely think this is a kernel (or glibc) bug.
> execve(2) is supposed to _never_ return 0 (and errno 0).
> I submitted a new bug to discuss it.
>
> Please see <https://bugzilla.kernel.org/show_bug.cgi?id=215813>

It's not clear if this is valid.  The syscall function in glibc does not
protect the on-stack return address against overwriting, so it can't be
used to call SYS_vfork on x86.

Can you reproduce this with a true inline syscall, or the glibc vfork
function (which protects the return address)?

Thanks,
Florian


* Re: [Bug 215813] syscall(SYS_vfork) causes execve() to return 0. (was: vfork(2) behavior not consistent with fork(2))
  2022-04-06 19:26             ` vfork(2) behavior not consistent with fork(2) Florian Weimer
@ 2022-04-06 19:31               ` Alejandro Colomar
  0 siblings, 0 replies; 7+ messages in thread
From: Alejandro Colomar @ 2022-04-06 19:31 UTC
  To: Florian Weimer
  Cc: Christian Brauner, Michael Kerrisk, linux-kernel,
	Коренберг
	Марк,
	Andrei Vagin, Dmitry Safonov, Thomas Gleixner, Arnd Bergmann,
	Serge Hallyn, linux-api, bugzilla-daemon

Hi Florian,

On 4/6/22 21:26, Florian Weimer wrote:
> It's not clear if this is valid.  The syscall function in glibc does not
> protect the on-stack return address against overwriting, so it can't be
> used to call SYS_vfork on x86.
> 
> Can you reproduce this with a true inline syscall, or the glibc vfork
> function (which protects the return address)?

If you tell me how I can call a syscall without the libc wrapper or 
syscall(2), sure, I can try :)

If syscall(2) can't be used for certain syscalls, maybe we should 
document that.

Thanks,

Alex

Thread overview: 7+ messages
     [not found] <bug-215769-216477@https.bugzilla.kernel.org/>
     [not found] ` <bug-215769-216477-to2O9X1Knw@https.bugzilla.kernel.org/>
2022-04-02 21:15   ` vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1) Alejandro Colomar (man-pages)
2022-04-04  8:05     ` Christian Brauner
2022-04-05 19:28       ` vfork(2) behavior not consistent with fork(2) (was: vfork(2) fails after unshare(CLONE_NEWTIME) (was: [Bug 215769] man 2 vfork() does not document corner case when PID == 1)) Alejandro Colomar
2022-04-06  8:46         ` Christian Brauner
2022-04-06 19:22           ` Alejandro Colomar
2022-04-06 19:26             ` vfork(2) behavior not consistent with fork(2) Florian Weimer
2022-04-06 19:31               ` [Bug 215813] syscall(SYS_vfork) causes execve() to return 0. (was: vfork(2) behavior not consistent with fork(2)) Alejandro Colomar
