linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Ying Han <yinghan@google.com>
Cc: Greg Kurz <gkurz@fr.ibm.com>, Cedric Le Goater <legoater@free.fr>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-api@vger.kernel.org, containers@lists.linux-foundation.org,
	mpm@selenic.com, linux-kernel@vger.kernel.org,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-mm@kvack.org, tglx@linutronix.de, viro@zeniv.linux.org.uk,
	hpa@zytor.com, mingo@elte.hu, torvalds@linux-foundation.org,
	Alexey Dobriyan <adobriyan@gmail.com>,
	xemul@openvz.org
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?
Date: Fri, 13 Mar 2009 12:37:19 -0500	[thread overview]
Message-ID: <20090313173719.GA12179@us.ibm.com> (raw)
In-Reply-To: <604427e00903122129y37ad791aq5fe7ef2552415da9@mail.gmail.com>

Quoting Ying Han (yinghan@google.com):
> Hi Serge:
> I made a patch based on Oren's tree recently which implement a new
> syscall clone_with_pid. I tested with checkpoint/restart process tree
> and it works as expected.
> This patch has some hack in it which i made a copy of libc's clone and
> made modifications of passing one more argument(pid number). I will
> try to clean up the code and do more testing.
> 
> New syscall clone_with_pid
> Implement a new syscall which clone a thread with a preselected pid number.
> 
> clone_with_pid(child_func, child_stack + CHILD_STACK - 16,
> 			CLONE_WITH_PID|SIGCHLD, pid, NULL);
> 
> Signed-off-by: Ying Han <yinghan@google.com>
> 
> diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
> index 87803da..b5a1b03 100644
> --- a/arch/x86/include/asm/syscalls.h
> +++ b/arch/x86/include/asm/syscalls.h
> @@ -26,6 +26,7 @@ asmlinkage int sys_fork(struct pt_regs);
>  asmlinkage int sys_clone(struct pt_regs);
>  asmlinkage int sys_vfork(struct pt_regs);
>  asmlinkage int sys_execve(struct pt_regs);
> +asmlinkage int sys_clone_with_pid(struct pt_regs);
> 
>  /* kernel/signal_32.c */
>  asmlinkage int sys_sigsuspend(int, int, old_sigset_t);
> diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32
> index a5f9e09..f10ca0e 100644
> --- a/arch/x86/include/asm/unistd_32.h
> +++ b/arch/x86/include/asm/unistd_32.h
> @@ -340,6 +340,7 @@
>  #define __NR_inotify_init1	332
>  #define __NR_checkpoint		333
>  #define __NR_restart		334
> +#define __NR_clone_with_pid	335
> 
>  #ifdef __KERNEL__
> 
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 0a1302f..88ae634 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -8,7 +8,6 @@
>  /*
>   * This file handles the architecture-dependent parts of process handling..
>   */
> -
>  #include <stdarg.h>
> 
>  #include <linux/cpu.h>
> @@ -652,6 +651,28 @@ asmlinkage int sys_clone(struct pt_regs regs)
>  	return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
>  }
> 
> +/**
> + * sys_clone_with_pid - clone a thread with pre-select pid number.
> + */
> +asmlinkage int sys_clone_with_pid(struct pt_regs regs)
> +{
> +	unsigned long clone_flags;
> +	unsigned long newsp;
> +	int __user *parent_tidptr, *child_tidptr;
> +	pid_t pid_nr;
> +
> +	clone_flags = regs.bx;
> +	newsp = regs.cx;
> +	parent_tidptr = (int __user *)regs.dx;
> +	child_tidptr = (int __user *)regs.di;
> +	pid_nr = regs.bp;

Hi,

Thanks for looking at this.  I appreciate the patch.  Two comments
however.

As I was saying in another email, i think that so long as we are going
with a new syscall, we should make sure that it suffices for nested
pid namespaces.  So send in an array of pids and its lengths, then
use an algorithm like Alexey's to fill in the sent-in pids if possible.

> +	if (!newsp)
> +		newsp = regs.sp;
> +	return do_fork(clone_flags, newsp, &regs, pid_nr, parent_tidptr,
> +			child_tidptr);
> +}
> +
>  /*
>   * This is trivial, and on the face of it looks like it
>   * could equally well be done in user mode.

> diff --git a/kernel/fork.c b/kernel/fork.c
> index 085ce56..262ae1e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -10,7 +10,7 @@
>   * Fork is rather simple, once you get the hang of it, but the memory
>   * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
>   */
> -
> +#define DEBUG
>  #include <linux/slab.h>
>  #include <linux/init.h>
>  #include <linux/unistd.h>
> @@ -959,10 +959,19 @@ static struct task_struct *copy_process(unsigned long cl
>  	int retval;
>  	struct task_struct *p;
>  	int cgroup_callbacks_done = 0;
> +	pid_t clone_pid = stack_size;

Note that some architectures (i.e. ia64) actually use the stack_size
sent to copy_thread, so you at least need to zero out stack_size here.

And I suspect there are cases where a stack_size is actually sent in,
so this doesn't seem legitimate, but I haven't tracked down the callers.
If you tell me you think there should be no case where a real stack_size
is sent in, well, I'll feel better if you prove it by breaking the patch
up so that:

1. you remove the stack_size argument from copy_process.  Test such a
kernel on some architectures (boot+ltp, for instance).

2. add a chosen_pid argument to copy_process.

Then we can be sure noone is using the field.

thanks,
-serge

  parent reply	other threads:[~2009-03-13 17:37 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-27 17:07 [RFC v13][PATCH 00/14] Kernel based checkpoint/restart Oren Laadan
2009-01-27 17:07 ` [RFC v13][PATCH 01/14] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-01-27 17:20   ` Randy Dunlap
2009-01-27 17:08 ` [RFC v13][PATCH 02/14] Checkpoint/restart: initial documentation Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 03/14] Make file_pos_read/write() public Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 04/14] General infrastructure for checkpoint restart Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 05/14] x86 support for checkpoint/restart Oren Laadan
2009-02-24  7:47   ` Nathan Lynch
2009-02-24 16:06     ` Dave Hansen
2009-03-18  7:21     ` Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 06/14] Dump memory address space Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 07/14] Restore " Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 08/14] Infrastructure for shared objects Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 09/14] Dump open file descriptors Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 10/14] Restore open file descriprtors Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 11/14] External checkpoint of a task other than ourself Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 12/14] Track in-kernel when we expect checkpoint/restart to work Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 13/14] Checkpoint multiple processes Oren Laadan
2009-01-27 17:08 ` [RFC v13][PATCH 14/14] Restart " Oren Laadan
2009-02-10 17:05 ` [RFC v13][PATCH 00/14] Kernel based checkpoint/restart Dave Hansen
2009-02-11 22:14   ` Andrew Morton
2009-02-12  9:17     ` Ingo Molnar
2009-02-12 18:11       ` Dave Hansen
2009-02-12 20:48         ` Serge E. Hallyn
2009-02-13 10:20         ` Ingo Molnar
2009-02-12 18:11     ` Dave Hansen
2009-02-12 19:30       ` Matt Mackall
2009-02-12 19:42         ` Andrew Morton
2009-02-12 21:51           ` What can OpenVZ do? Dave Hansen
2009-02-12 22:10             ` Andrew Morton
2009-02-12 23:04               ` How much of a mess does OpenVZ make? ;) Was: " Dave Hansen
2009-02-26 15:57                 ` Alexey Dobriyan
2009-03-10 21:53                   ` Alexey Dobriyan
2009-03-10 23:28                     ` Serge E. Hallyn
2009-03-11  8:26                     ` Cedric Le Goater
2009-03-12 14:53                       ` Serge E. Hallyn
2009-03-12 21:01                         ` Greg Kurz
2009-03-12 21:21                           ` Serge E. Hallyn
2009-03-13  4:29                             ` Ying Han
2009-03-13  5:34                               ` Sukadev Bhattiprolu
2009-03-13  6:19                                 ` Ying Han
2009-03-13 17:27                                 ` Linus Torvalds
2009-03-13 19:02                                   ` Serge E. Hallyn
2009-03-13 19:35                                   ` Alexey Dobriyan
2009-03-13 21:01                                     ` Linus Torvalds
2009-03-13 21:51                                       ` Dave Hansen
2009-03-13 22:15                                         ` Oren Laadan
2009-03-14  0:27                                           ` Eric W. Biederman
2009-03-14  8:12                                             ` Ingo Molnar
2009-03-16 22:33                                               ` Kevin Fox
2009-03-19 21:19                                               ` Eric W. Biederman
2009-03-14  0:20                                       ` Alexey Dobriyan
2009-03-14  8:25                                         ` Ingo Molnar
2009-03-16  6:01                                           ` Oren Laadan
2009-03-13 20:48                                   ` Mike Waychison
2009-03-13 22:35                                     ` Oren Laadan
2009-03-18 18:54                                       ` Mike Waychison
2009-03-18 19:04                                         ` Oren Laadan
2009-03-13 15:27                               ` Cedric Le Goater
2009-03-13 17:11                                 ` Greg Kurz
2009-03-13 17:37                               ` Serge E. Hallyn [this message]
2009-03-13 15:47                         ` Cedric Le Goater
2009-03-13 16:35                           ` Serge E. Hallyn
2009-03-13 16:53                             ` Cedric Le Goater
2009-02-26 16:27                 ` Alexey Dobriyan
2009-02-26 17:33                   ` Ingo Molnar
2009-02-26 18:30                     ` Greg Kurz
2009-02-26 22:17                       ` Alexey Dobriyan
2009-02-27  9:19                         ` Greg Kurz
2009-02-27 10:53                           ` Alexey Dobriyan
2009-02-27 14:33                             ` Cedric Le Goater
2009-02-27  9:36                         ` Cedric Le Goater
2009-02-26 22:31                     ` Alexey Dobriyan
2009-02-27  9:03                       ` Ingo Molnar
2009-02-27  9:19                         ` Andrew Morton
2009-02-27 10:57                           ` Alexey Dobriyan
2009-02-27  9:22                         ` Andrew Morton
2009-02-27 10:59                           ` Alexey Dobriyan
2009-02-27 16:14                       ` Dave Hansen
2009-02-27 21:57                         ` Alexey Dobriyan
2009-02-27 21:54                           ` Dave Hansen
2009-03-01  1:33                       ` Alexey Dobriyan
2009-03-01 20:02                         ` Serge E. Hallyn
2009-03-01 20:56                           ` Alexey Dobriyan
2009-03-01 22:21                             ` Serge E. Hallyn
2009-03-03 16:17                             ` Cedric Le Goater
2009-03-03 18:28                               ` Serge E. Hallyn
2009-02-13 10:53               ` Ingo Molnar
2009-02-16 20:51                 ` Dave Hansen
2009-02-17 22:23                   ` Ingo Molnar
2009-02-17 22:30                     ` Dave Hansen
2009-02-18  0:32                       ` Ingo Molnar
2009-02-18  0:40                         ` Dave Hansen
2009-02-18  5:11                           ` Alexey Dobriyan
2009-02-18 18:16                             ` Ingo Molnar
2009-02-18 21:27                               ` Dave Hansen
2009-02-18 23:15                                 ` Ingo Molnar
2009-02-19 19:06                                   ` Banning checkpoint (was: Re: What can OpenVZ do?) Alexey Dobriyan
2009-02-19 19:11                                     ` Dave Hansen
2009-02-24  4:47                                       ` Alexey Dobriyan
2009-02-24  5:11                                         ` Dave Hansen
2009-02-24 15:43                                           ` Serge E. Hallyn
2009-02-24 20:09                                           ` Alexey Dobriyan
2009-02-12 22:17             ` What can OpenVZ do? Alexey Dobriyan
2009-02-13 10:27             ` Ingo Molnar
2009-02-13 11:32               ` Alexey Dobriyan
2009-02-13 11:45                 ` Ingo Molnar
2009-02-13 22:28                   ` Alexey Dobriyan
2009-03-14  0:04                     ` Eric W. Biederman
2009-03-14  0:26                       ` Serge E. Hallyn
2009-02-12 22:57         ` [RFC v13][PATCH 00/14] Kernel based checkpoint/restart Dave Hansen
2009-02-12 23:05           ` Matt Mackall
2009-02-12 23:13             ` Dave Hansen
2009-02-13 23:28       ` Andrew Morton
2009-02-14 23:08         ` Ingo Molnar
2009-02-14 23:31           ` Andrew Morton
2009-02-14 23:50             ` Ingo Molnar
2009-02-16 17:37         ` Dave Hansen
2009-03-13  2:45         ` Oren Laadan
2009-03-13  3:57           ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090313173719.GA12179@us.ibm.com \
    --to=serue@us.ibm.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=gkurz@fr.ibm.com \
    --cc=hpa@zytor.com \
    --cc=legoater@free.fr \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=mpm@selenic.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xemul@openvz.org \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).