* [RFC][v4][PATCH 0/7] clone_with_pids() system call
@ 2009-08-07  6:11 Sukadev Bhattiprolu
From: Sukadev Bhattiprolu @ 2009-08-07  6:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oren Laadan, Eric W. Biederman, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev



=== NEW CLONE() SYSTEM CALL:

To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed.  When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.

This patchset implements a new system call, clone_with_pids(), that lets a
process specify the pids of the child process.

Patches 1 through 5 are helpers and we believe they are needed for application
restart, regardless of the kernel implementation of application restart.

Patch 7/7 defines a prototype of the new system call.

=== IMPORTANT TODO:

The clone() system call has another limitation: all available bits in
clone_flags are in use, so any new clone flag will need a variant of the
clone() system call.

It appears to make sense to try and extend this new system call to address
this limitation as well. The basic requirements of a new clone system call
could then be summarized as:

	- do everything clone() does today, and
	- give the application the ability to choose pids for the child
	  process in all ancestor pid namespaces, and
	- allow more clone_flags

Constraints:

	- system calls are restricted to 6 parameters and clone() already
	  takes 5, so any extension to the clone() interface would require
	  one or more copy_from_user() calls.

	- does copy_from_user() of a few words have a significant impact on
	  the total cost of clone()?

Based on these requirements and constraints, we have been exploring a couple
of system call interfaces and would appreciate any input.

1. =====

	#if 64bit
	#define CLONE_FLAGS_WORDS	1
	#else
	#define CLONE_FLAGS_WORDS	2
	#endif

        struct pid_set {
                int num_pids;
                pid_t *pids;
        };

	typedef struct {
		unsigned long flags[CLONE_FLAGS_WORDS];
	} clone_flags_t;

	int clone_extended(clone_flags_t *flags, void *child_stack, int *unused,
		int *parent_tid, int *child_tid, struct pid_set *pid_set);

	Pros:
		- extendible clone_flags (like sigset_t)

	Cons:
		- copy_from_user() needed on all architectures (we may be able
		  to play some tricks with 'clone_flags_t' to avoid the copy
		  on 64-bit architectures till N_CLONE_FLAGS exceeds 64).

		- Both applications and kernel must use interfaces equivalent
		  to sigsetops(3) to test/set/clear clone flags.
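
For illustration only, the sigsetops(3)-style accessors mentioned in the
last point might look roughly like the user-space sketch below. The helper
names (clone_flags_empty(), clone_flags_add(), clone_flags_del(),
clone_flags_test()) and the N_CLONE_FLAGS value of 64 are assumptions made
for this example; nothing in the patchset defines them.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumption: 64 flag bits in total, matching the CLONE_FLAGS_WORDS idea. */
#define N_CLONE_FLAGS		64
#define BITS_PER_ULONG		(sizeof(unsigned long) * CHAR_BIT)
#define CLONE_FLAGS_WORDS	(N_CLONE_FLAGS / BITS_PER_ULONG)

typedef struct {
	unsigned long flags[CLONE_FLAGS_WORDS];
} clone_flags_t;

/* Hypothetical helpers, modelled on sigemptyset(3) and friends. */
static void clone_flags_empty(clone_flags_t *set)
{
	assert(set != NULL);
	memset(set, 0, sizeof(*set));
}

/* Set one flag bit; returns 0, or -1 if the bit number is out of range. */
static int clone_flags_add(clone_flags_t *set, unsigned int bit)
{
	assert(set != NULL);
	if (bit >= N_CLONE_FLAGS)
		return -1;
	set->flags[bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
	return 0;
}

/* Clear one flag bit; returns 0, or -1 if the bit number is out of range. */
static int clone_flags_del(clone_flags_t *set, unsigned int bit)
{
	assert(set != NULL);
	if (bit >= N_CLONE_FLAGS)
		return -1;
	set->flags[bit / BITS_PER_ULONG] &= ~(1UL << (bit % BITS_PER_ULONG));
	return 0;
}

/* Returns 1 if the bit is set, 0 if clear, -1 if out of range. */
static int clone_flags_test(const clone_flags_t *set, unsigned int bit)
{
	assert(set != NULL);
	if (bit >= N_CLONE_FLAGS)
		return -1;
	return (set->flags[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL;
}
```

With accessors like these, neither the application nor the kernel needs to
know how many words back the flag set, which is the property that makes the
type extensible in the way sigset_t is.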

2. ======

	struct clone_info {
		int num_clone_high_words;
		int *flags_high;
		struct pid_set pid_set;
	};

        int clone_extended(int flags_low, void *child_stack, void *unused,
		int *parent_tid, int *child_tid, struct clone_info *clone_info);

	Pros:
		- copy_from_user() needed only for new flags and pid_set

	Cons:
		- splitting the high and low clone-flags is awkward?
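
To make the "awkward" part of option 2 concrete, here is a rough user-space
sketch of how the low flags and one optional high word might be recombined
into a single 64-bit value. merge_clone_flags() is a hypothetical helper,
and the limit of one high word is an assumption of this sketch, not
something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct pid_set {
	int num_pids;
	pid_t *pids;
};

struct clone_info {
	int num_clone_high_words;
	int *flags_high;
	struct pid_set pid_set;
};

/*
 * Hypothetical helper: combine flags_low with at most one high word.
 * Returns 0 on success, -1 if the high-word description is unusable.
 */
static int merge_clone_flags(int flags_low, const struct clone_info *info,
			     uint64_t *out)
{
	uint64_t flags = (uint32_t)flags_low;

	assert(out != NULL);

	if (info) {
		/* This sketch only understands zero or one high word. */
		if (info->num_clone_high_words < 0 ||
		    info->num_clone_high_words > 1)
			return -1;

		if (info->num_clone_high_words == 1) {
			if (!info->flags_high)
				return -1;
			flags |= (uint64_t)(uint32_t)info->flags_high[0] << 32;
		}
	}

	*out = flags;
	return 0;
}
```

A caller that needs no new flags passes a NULL clone_info and pays no
extra copy, which is the advantage listed under "Pros" above.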


Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>


* [RFC][v4][PATCH 1/7]: Factor out code to allocate pidmap page
From: Sukadev Bhattiprolu @ 2009-08-07  6:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Containers, Eric W. Biederman, hpa, mingo, torvalds,
	Alexey Dobriyan, Pavel Emelyanov



Subject: [RFC][v4][PATCH 1/7]: Factor out code to allocate pidmap page

To support the clone_with_pids() system call we need to allocate a
pidmap page in more than one place. Move this code to a new
function, alloc_pidmap_page().

Changelog[v2]:
	- (Matt Helsley, Dave Hansen) Have alloc_pidmap_page() return
	  -ENOMEM on error instead of -1.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 kernel/pid.c |   46 ++++++++++++++++++++++++++++++----------------
 1 files changed, 30 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/pid.c
===================================================================
--- linux-2.6.orig/kernel/pid.c	2009-08-05 17:00:22.000000000 -0700
+++ linux-2.6/kernel/pid.c	2009-08-05 17:02:40.000000000 -0700
@@ -122,9 +122,34 @@
 	atomic_inc(&map->nr_free);
 }
 
+static int alloc_pidmap_page(struct pidmap *map)
+{
+	void *page;
+
+	if (likely(map->page))
+		return 0;
+
+	page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+
+	/*
+	 * Free the page if someone raced with us installing it:
+	 */
+	spin_lock_irq(&pidmap_lock);
+	if (map->page)
+		kfree(page);
+	else
+		map->page = page;
+	spin_unlock_irq(&pidmap_lock);
+
+	if (unlikely(!map->page))
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int alloc_pidmap(struct pid_namespace *pid_ns)
 {
-	int i, offset, max_scan, pid, last = pid_ns->last_pid;
+	int i, rc, offset, max_scan, pid, last = pid_ns->last_pid;
 	struct pidmap *map;
 
 	pid = last + 1;
@@ -134,21 +159,10 @@
 	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
 	max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
 	for (i = 0; i <= max_scan; ++i) {
-		if (unlikely(!map->page)) {
-			void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
-			/*
-			 * Free the page if someone raced with us
-			 * installing it:
-			 */
-			spin_lock_irq(&pidmap_lock);
-			if (map->page)
-				kfree(page);
-			else
-				map->page = page;
-			spin_unlock_irq(&pidmap_lock);
-			if (unlikely(!map->page))
-				break;
-		}
+		rc = alloc_pidmap_page(map);
+		if (rc)
+			break;
+
 		if (likely(atomic_read(&map->nr_free))) {
 			do {
 				if (!test_and_set_bit(offset, map->page)) {
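
The pattern factored out by this patch (allocate outside the lock, install
under the lock, free the page if someone else won the race) can be tried in
user space with the sketch below. It is only an approximation: calloc()
stands in for kzalloc(), a C11 atomic_flag spin loop stands in for
spin_lock_irq(), and PIDMAP_PAGE_SIZE and the lock helper names are
assumptions of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define PIDMAP_PAGE_SIZE	4096	/* assumed page size for the sketch */

struct pidmap {
	void *page;
};

/* User-space stand-in for pidmap_lock. */
static atomic_flag pidmap_lock = ATOMIC_FLAG_INIT;

static void pidmap_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&pidmap_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void pidmap_lock_release(void)
{
	atomic_flag_clear_explicit(&pidmap_lock, memory_order_release);
}

static int alloc_pidmap_page(struct pidmap *map)
{
	void *page;

	assert(map != NULL);

	/*
	 * Unlocked fast path, as in the patch; strictly portable C11
	 * code would make this an atomic load.
	 */
	if (map->page)
		return 0;

	/* Allocate before taking the lock; the result may be NULL. */
	page = calloc(1, PIDMAP_PAGE_SIZE);

	pidmap_lock_acquire();
	if (map->page)
		free(page);	/* lost the race; free(NULL) is harmless */
	else
		map->page = page;
	pidmap_lock_release();

	/* Only fails if nobody, including a racing thread, installed a page. */
	return map->page ? 0 : -ENOMEM;
}
```

Note that an allocation failure is not fatal by itself: if another thread
installed a page in the meantime, the function still returns 0.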



* [RFC][v4][PATCH 2/7]: Have alloc_pidmap() return actual error code
From: Sukadev Bhattiprolu @ 2009-08-07  6:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Containers, Eric W. Biederman, hpa, mingo, torvalds,
	Alexey Dobriyan, Pavel Emelyanov



Subject: [RFC][v4][PATCH 2/7]: Have alloc_pidmap() return actual error code

alloc_pidmap() can fail either because all pid numbers are in use or
because memory allocation failed.  With support for setting a specific
pid number, alloc_pidmap() would also fail if either the given pid
number is invalid or in use.

Rather than have callers assume -ENOMEM, have alloc_pidmap() return
the actual error.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 kernel/fork.c |    5 +++--
 kernel/pid.c  |    9 ++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2009-08-05 17:00:22.000000000 -0700
+++ linux-2.6/kernel/fork.c	2009-08-05 17:02:45.000000000 -0700
@@ -1124,10 +1124,11 @@
 		goto bad_fork_cleanup_io;
 
 	if (pid != &init_struct_pid) {
-		retval = -ENOMEM;
 		pid = alloc_pid(p->nsproxy->pid_ns);
-		if (!pid)
+		if (IS_ERR(pid)) {
+			retval = PTR_ERR(pid);
 			goto bad_fork_cleanup_io;
+		}
 
 		if (clone_flags & CLONE_NEWPID) {
 			retval = pid_ns_prepare_proc(p->nsproxy->pid_ns);
Index: linux-2.6/kernel/pid.c
===================================================================
--- linux-2.6.orig/kernel/pid.c	2009-08-05 17:02:40.000000000 -0700
+++ linux-2.6/kernel/pid.c	2009-08-05 17:02:45.000000000 -0700
@@ -158,6 +158,7 @@
 	offset = pid & BITS_PER_PAGE_MASK;
 	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
 	max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
+	rc = -EAGAIN;
 	for (i = 0; i <= max_scan; ++i) {
 		rc = alloc_pidmap_page(map);
 		if (rc)
@@ -188,12 +189,14 @@
 		} else {
 			map = &pid_ns->pidmap[0];
 			offset = RESERVED_PIDS;
-			if (unlikely(last == offset))
+			if (unlikely(last == offset)) {
+				rc = -EAGAIN;
 				break;
+			}
 		}
 		pid = mk_pid(pid_ns, map, offset);
 	}
-	return -1;
+	return rc;
 }
 
 int next_pidmap(struct pid_namespace *pid_ns, int last)
@@ -298,7 +301,7 @@
 		free_pidmap(pid->numbers + i);
 
 	kmem_cache_free(ns->pid_cachep, pid);
-	pid = NULL;
+	pid = ERR_PTR(nr);
 	goto out;
 }
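
For readers unfamiliar with the error-pointer convention this patch
switches alloc_pid() to, the following is a user-space approximation of
ERR_PTR(), PTR_ERR() and IS_ERR(). The kernel's own versions live in
include/linux/err.h; the definitions here are simplified re-creations, and
alloc_pid_sketch() is a hypothetical stand-in for alloc_pid().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO	4095

/* Encode a negative errno value in the top 4095 addresses. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct pid {
	int nr;
};

/*
 * Hypothetical stand-in for alloc_pid(): 'nr' plays the role of the
 * value returned by alloc_pidmap(), negative on failure.
 */
static struct pid *alloc_pid_sketch(int nr)
{
	struct pid *pid;

	if (nr < 0)
		return ERR_PTR(nr);	/* pass the real error through */

	pid = malloc(sizeof(*pid));
	if (!pid)
		return ERR_PTR(-ENOMEM);

	pid->nr = nr;
	return pid;
}
```

The caller then mirrors the copy_process() hunk above: test IS_ERR() and
recover the error with PTR_ERR() rather than assuming -ENOMEM.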



* [RFC][v4][PATCH 3/7]: Add target_pid parameter to alloc_pidmap()
From: Sukadev Bhattiprolu @ 2009-08-07  6:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Containers, Eric W. Biederman, hpa, mingo, torvalds,
	Alexey Dobriyan, Pavel Emelyanov



Subject: [RFC][v4][PATCH 3/7]: Add target_pid parameter to alloc_pidmap()

With support for setting a specific pid number for a process,
alloc_pidmap() will need a 'target_pid' parameter.

Changelog[v2]:
	- (Serge Hallyn) Check for 'pid < 0' in set_pidmap(). (The code
	  actually checks for 'pid <= 0' for completeness.)

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 kernel/pid.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/pid.c
===================================================================
--- linux-2.6.orig/kernel/pid.c	2009-08-05 17:02:45.000000000 -0700
+++ linux-2.6/kernel/pid.c	2009-08-05 19:34:37.000000000 -0700
@@ -147,11 +147,35 @@
 	return 0;
 }
 
-static int alloc_pidmap(struct pid_namespace *pid_ns)
+static int set_pidmap(struct pid_namespace *pid_ns, int pid)
+{
+	int offset;
+	struct pidmap *map;
+
+	if (pid <= 0 || pid >= pid_max)
+		return -EINVAL;
+
+	offset = pid & BITS_PER_PAGE_MASK;
+	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
+
+	if (alloc_pidmap_page(map))
+		return -ENOMEM;
+
+	if (test_and_set_bit(offset, map->page))
+		return -EBUSY;
+
+	atomic_dec(&map->nr_free);
+	return pid;
+}
+
+static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid)
 {
 	int i, rc, offset, max_scan, pid, last = pid_ns->last_pid;
 	struct pidmap *map;
 
+	if (target_pid)
+		return set_pidmap(pid_ns, target_pid);
+
 	pid = last + 1;
 	if (pid >= pid_max)
 		pid = RESERVED_PIDS;
@@ -270,7 +294,7 @@
 
 	tmp = ns;
 	for (i = ns->level; i >= 0; i--) {
-		nr = alloc_pidmap(tmp);
+		nr = alloc_pidmap(tmp, 0);
 		if (nr < 0)
 			goto out_free;
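
The bitmap arithmetic in set_pidmap() can be exercised in user space with
the simplified sketch below. It assumes a single statically allocated
bitmap page, a pid_max of 32768 and a non-atomic test_and_set_bit(), so it
shows only the range check and the offset/word computation, not the
kernel's locking or lazy page allocation. PIDMAP_PAGE_SIZE, PID_MAX and
BITS_PER_ULONG are names chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define PIDMAP_PAGE_SIZE	4096	/* assumed page size */
#define BITS_PER_PAGE		(PIDMAP_PAGE_SIZE * 8)
#define BITS_PER_PAGE_MASK	(BITS_PER_PAGE - 1)
#define BITS_PER_ULONG		((int)(sizeof(unsigned long) * CHAR_BIT))
#define PID_MAX			32768	/* assumed; fits in one bitmap page */

struct pidmap {
	int nr_free;
	unsigned long page[BITS_PER_PAGE / BITS_PER_ULONG];
};

/* One page is enough here because PID_MAX == BITS_PER_PAGE. */
static struct pidmap pidmap[1] = { { .nr_free = BITS_PER_PAGE } };

/* Non-atomic stand-in for the kernel's test_and_set_bit(). */
static int test_and_set_bit(int nr, unsigned long *addr)
{
	unsigned long mask;
	unsigned long *word;
	int old;

	assert(nr >= 0);
	mask = 1UL << (nr % BITS_PER_ULONG);
	word = addr + nr / BITS_PER_ULONG;
	old = (*word & mask) != 0;
	*word |= mask;
	return old;
}

/* Claim a specific pid: returns the pid, or a negative errno value. */
static int set_pidmap(int pid)
{
	int offset;
	struct pidmap *map;

	if (pid <= 0 || pid >= PID_MAX)
		return -EINVAL;

	offset = pid & BITS_PER_PAGE_MASK;	/* bit within the page */
	map = &pidmap[pid / BITS_PER_PAGE];	/* which page */

	if (test_and_set_bit(offset, map->page))
		return -EBUSY;			/* pid already in use */

	map->nr_free--;
	return pid;
}
```

Requesting the same pid twice yields -EBUSY on the second call, which is
the case a restart tool has to handle when the target pid is taken.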



* [RFC][v4][PATCH 4/7]: Add target_pids parameter to alloc_pid()
From: Sukadev Bhattiprolu @ 2009-08-07  6:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Containers, Eric W. Biederman, hpa, mingo, torvalds,
	Alexey Dobriyan, Pavel Emelyanov



Subject: [RFC][v4][PATCH 4/7]: Add target_pids parameter to alloc_pid()

This parameter is currently NULL, but will be used in a follow-on patch.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 include/linux/pid.h |    2 +-
 kernel/fork.c       |    3 ++-
 kernel/pid.c        |    9 +++++++--
 3 files changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/pid.h
===================================================================
--- linux-2.6.orig/include/linux/pid.h	2009-08-05 19:34:37.000000000 -0700
+++ linux-2.6/include/linux/pid.h	2009-08-05 19:35:08.000000000 -0700
@@ -119,7 +119,7 @@
 extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 int next_pidmap(struct pid_namespace *pid_ns, int last);
 
-extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids);
 extern void free_pid(struct pid *pid);
 
 /*
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2009-08-05 19:34:37.000000000 -0700
+++ linux-2.6/kernel/fork.c	2009-08-05 19:35:08.000000000 -0700
@@ -954,6 +954,7 @@
 	int retval;
 	struct task_struct *p;
 	int cgroup_callbacks_done = 0;
+	pid_t *target_pids = NULL;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1124,7 +1125,7 @@
 		goto bad_fork_cleanup_io;
 
 	if (pid != &init_struct_pid) {
-		pid = alloc_pid(p->nsproxy->pid_ns);
+		pid = alloc_pid(p->nsproxy->pid_ns, target_pids);
 		if (IS_ERR(pid)) {
 			retval = PTR_ERR(pid);
 			goto bad_fork_cleanup_io;
Index: linux-2.6/kernel/pid.c
===================================================================
--- linux-2.6.orig/kernel/pid.c	2009-08-05 19:34:37.000000000 -0700
+++ linux-2.6/kernel/pid.c	2009-08-05 19:35:08.000000000 -0700
@@ -280,13 +280,14 @@
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
-struct pid *alloc_pid(struct pid_namespace *ns)
+struct pid *alloc_pid(struct pid_namespace *ns, pid_t *target_pids)
 {
 	struct pid *pid;
 	enum pid_type type;
 	int i, nr;
 	struct pid_namespace *tmp;
 	struct upid *upid;
+	int tpid;
 
 	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
 	if (!pid)
@@ -294,7 +295,11 @@
 
 	tmp = ns;
 	for (i = ns->level; i >= 0; i--) {
-		nr = alloc_pidmap(tmp, 0);
+		tpid = 0;
+		if (target_pids)
+			tpid = target_pids[i];
+
+		nr = alloc_pidmap(tmp, tpid);
 		if (nr < 0)
 			goto out_free;
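
As a rough illustration of how the loop above consumes target_pids, the
user-space sketch below walks the namespace levels from the innermost one
outwards. Index i corresponds to namespace level i, so (assuming level 0 is
the initial pid namespace) target_pids[ns->level] names the pid in the
task's own namespace. A zero entry leaves the choice to the allocator,
since alloc_pidmap() only calls set_pidmap() for a non-zero target_pid.
alloc_pidmap_sketch() and choose_pids() are hypothetical stand-ins that
ignore the bitmap and all error cases.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for alloc_pidmap(): honour a non-zero target,
 * otherwise hand out the number after the last one used at this level.
 */
static int alloc_pidmap_sketch(int *last_pid, pid_t target_pid)
{
	if (target_pid)
		return target_pid;
	return ++(*last_pid);
}

/*
 * Mirror of the loop in alloc_pid(): pick one pid per namespace level,
 * from the task's own level down to level 0.
 */
static void choose_pids(int level, int last_pid[], const pid_t *target_pids,
			int out[])
{
	assert(level >= 0);

	for (int i = level; i >= 0; i--) {
		pid_t tpid = 0;

		if (target_pids)
			tpid = target_pids[i];

		out[i] = alloc_pidmap_sketch(&last_pid[i], tpid);
	}
}
```

For a task two levels deep, target_pids = { 0, 77, 5 } would request pid 5
in its own namespace and pid 77 in the parent namespace, and would let the
outermost namespace pick any free pid.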



* [RFC][v4][PATCH 5/7]: Add target_pids parameter to copy_process()
From: Sukadev Bhattiprolu @ 2009-08-07  6:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Containers, Eric W. Biederman, hpa, mingo, torvalds,
	Alexey Dobriyan, Pavel Emelyanov



Subject: [RFC][v4][PATCH 5/7]: Add target_pids parameter to copy_process()

Add a 'target_pids' parameter to copy_process().  The new parameter will be
used in a follow-on patch when clone_with_pids() is implemented.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 kernel/fork.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2009-08-05 19:35:08.000000000 -0700
+++ linux-2.6/kernel/fork.c	2009-08-05 19:35:34.000000000 -0700
@@ -949,12 +949,12 @@
 					unsigned long stack_size,
 					int __user *child_tidptr,
 					struct pid *pid,
+					pid_t *target_pids,
 					int trace)
 {
 	int retval;
 	struct task_struct *p;
 	int cgroup_callbacks_done = 0;
-	pid_t *target_pids = NULL;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
@@ -1330,7 +1330,7 @@
 	struct pt_regs regs;
 
 	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL,
-			    &init_struct_pid, 0);
+			    &init_struct_pid, NULL, 0);
 	if (!IS_ERR(task))
 		init_idle(task, cpu);
 
@@ -1353,6 +1353,7 @@
 	struct task_struct *p;
 	int trace = 0;
 	long nr;
+	pid_t *target_pids = NULL;
 
 	/*
 	 * Do some preliminary argument and permissions checking before we
@@ -1393,7 +1394,7 @@
 		trace = tracehook_prepare_clone(clone_flags);
 
 	p = copy_process(clone_flags, stack_start, regs, stack_size,
-			 child_tidptr, NULL, trace);
+			 child_tidptr, NULL, target_pids, trace);
 	/*
 	 * Do this prior waking up the new thread - the thread pointer
 	 * might get invalid after that point, if the thread exits quickly.



* [RFC][v4][PATCH 6/7]: Define do_fork_with_pids()
       [not found] ` <20090807061103.GA19343-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2009-08-07  6:13   ` [RFC][v4][PATCH 5/7]: Add target_pids parameter to copy_process() Sukadev Bhattiprolu
@ 2009-08-07  6:14   ` Sukadev Bhattiprolu
  2009-08-07  6:15   ` [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall Sukadev Bhattiprolu
  2009-08-13  3:45   ` [RFC][v4][PATCH 0/7] clone_with_pids() system call Eric W. Biederman
  7 siblings, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-07  6:14 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alexey Dobriyan, Pavel Emelyanov


Subject: [RFC][v4][PATCH 6/7]: Define do_fork_with_pids()

do_fork_with_pids() is the same as do_fork(), except that it takes an
additional parameter, 'pid_set'. This parameter, currently unused,
specifies the set of target pids of the process in each of its pid
namespaces.

Changelog[v4]:
	- Rename 'struct target_pid_set' to 'struct pid_set' since it may
	  be useful in other contexts.

Changelog[v3]:
	- Fix "long-line" warning from checkpatch.pl

Changelog[v2]:
	- To facilitate moving architecture-independent code to kernel/fork.c
	  pass in 'struct target_pid_set __user *' to do_fork_with_pids()
	  rather than 'pid_t *' (next patch moves the arch-independent
	  code to kernel/fork.c)

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 include/linux/sched.h |    3 +++
 include/linux/types.h |    5 +++++
 kernel/fork.c         |   16 ++++++++++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h	2009-08-05 19:34:36.000000000 -0700
+++ linux-2.6/include/linux/sched.h	2009-08-05 19:36:59.000000000 -0700
@@ -2053,6 +2053,9 @@
 
 extern int do_execve(char *, char __user * __user *, char __user * __user *, struct pt_regs *);
 extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *);
+extern long do_fork_with_pids(unsigned long, unsigned long, struct pt_regs *,
+				unsigned long, int __user *, int __user *,
+				struct pid_set __user *pid_set);
 struct task_struct *fork_idle(int);
 
 extern void set_task_comm(struct task_struct *tsk, char *from);
Index: linux-2.6/include/linux/types.h
===================================================================
--- linux-2.6.orig/include/linux/types.h	2009-08-05 19:34:36.000000000 -0700
+++ linux-2.6/include/linux/types.h	2009-08-06 19:13:38.000000000 -0700
@@ -204,6 +204,11 @@
 	char			f_fpack[6];
 };
 
+struct pid_set {
+	int num_pids;
+	pid_t *pids;
+};
+
 #endif	/* __KERNEL__ */
 #endif /*  __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2009-08-05 19:35:34.000000000 -0700
+++ linux-2.6/kernel/fork.c	2009-08-06 19:13:38.000000000 -0700
@@ -1343,12 +1343,13 @@
  * It copies the process, and if successful kick-starts
  * it and waits for it to finish using the VM if required.
  */
-long do_fork(unsigned long clone_flags,
+long do_fork_with_pids(unsigned long clone_flags,
 	      unsigned long stack_start,
 	      struct pt_regs *regs,
 	      unsigned long stack_size,
 	      int __user *parent_tidptr,
-	      int __user *child_tidptr)
+	      int __user *child_tidptr,
+	      struct pid_set __user *pid_setp)
 {
 	struct task_struct *p;
 	int trace = 0;
@@ -1451,6 +1452,17 @@
 	return nr;
 }
 
+long do_fork(unsigned long clone_flags,
+	      unsigned long stack_start,
+	      struct pt_regs *regs,
+	      unsigned long stack_size,
+	      int __user *parent_tidptr,
+	      int __user *child_tidptr)
+{
+	return do_fork_with_pids(clone_flags, stack_start, regs, stack_size,
+			parent_tidptr, child_tidptr, NULL);
+}
+
 #ifndef ARCH_MIN_MMSTRUCT_ALIGN
 #define ARCH_MIN_MMSTRUCT_ALIGN 0
 #endif

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC][v4][PATCH 6/7]: Define do_fork_with_pids()
  2009-08-07  6:11 [RFC][v4][PATCH 0/7] clone_with_pids() system call Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2009-08-07  6:13 ` [RFC][v4][PATCH 5/7]: Add target_pids parameter to copy_process() Sukadev Bhattiprolu
@ 2009-08-07  6:14 ` Sukadev Bhattiprolu
  2009-08-07  6:15 ` [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-07  6:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oren Laadan, Eric W. Biederman, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev


Subject: [RFC][v4][PATCH 6/7]: Define do_fork_with_pids()

do_fork_with_pids() is the same as do_fork(), except that it takes an
additional parameter, 'pid_set'. This parameter, currently unused,
specifies the set of target pids of the process in each of its pid
namespaces.

Changelog[v4]:
	- Rename 'struct target_pid_set' to 'struct pid_set' since it may
	  be useful in other contexts.

Changelog[v3]:
	- Fix "long-line" warning from checkpatch.pl

Changelog[v2]:
	- To facilitate moving architecture-independent code to kernel/fork.c
	  pass in 'struct target_pid_set __user *' to do_fork_with_pids()
	  rather than 'pid_t *' (next patch moves the arch-independent
	  code to kernel/fork.c)

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Oren Laadan <orenl@cs.columbia.edu>
---
 include/linux/sched.h |    3 +++
 include/linux/types.h |    5 +++++
 kernel/fork.c         |   16 ++++++++++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h	2009-08-05 19:34:36.000000000 -0700
+++ linux-2.6/include/linux/sched.h	2009-08-05 19:36:59.000000000 -0700
@@ -2053,6 +2053,9 @@
 
 extern int do_execve(char *, char __user * __user *, char __user * __user *, struct pt_regs *);
 extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *);
+extern long do_fork_with_pids(unsigned long, unsigned long, struct pt_regs *,
+				unsigned long, int __user *, int __user *,
+				struct pid_set __user *pid_set);
 struct task_struct *fork_idle(int);
 
 extern void set_task_comm(struct task_struct *tsk, char *from);
Index: linux-2.6/include/linux/types.h
===================================================================
--- linux-2.6.orig/include/linux/types.h	2009-08-05 19:34:36.000000000 -0700
+++ linux-2.6/include/linux/types.h	2009-08-06 19:13:38.000000000 -0700
@@ -204,6 +204,11 @@
 	char			f_fpack[6];
 };
 
+struct pid_set {
+	int num_pids;
+	pid_t *pids;
+};
+
 #endif	/* __KERNEL__ */
 #endif /*  __ASSEMBLY__ */
 #endif /* _LINUX_TYPES_H */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2009-08-05 19:35:34.000000000 -0700
+++ linux-2.6/kernel/fork.c	2009-08-06 19:13:38.000000000 -0700
@@ -1343,12 +1343,13 @@
  * It copies the process, and if successful kick-starts
  * it and waits for it to finish using the VM if required.
  */
-long do_fork(unsigned long clone_flags,
+long do_fork_with_pids(unsigned long clone_flags,
 	      unsigned long stack_start,
 	      struct pt_regs *regs,
 	      unsigned long stack_size,
 	      int __user *parent_tidptr,
-	      int __user *child_tidptr)
+	      int __user *child_tidptr,
+	      struct pid_set __user *pid_setp)
 {
 	struct task_struct *p;
 	int trace = 0;
@@ -1451,6 +1452,17 @@
 	return nr;
 }
 
+long do_fork(unsigned long clone_flags,
+	      unsigned long stack_start,
+	      struct pt_regs *regs,
+	      unsigned long stack_size,
+	      int __user *parent_tidptr,
+	      int __user *child_tidptr)
+{
+	return do_fork_with_pids(clone_flags, stack_start, regs, stack_size,
+			parent_tidptr, child_tidptr, NULL);
+}
+
 #ifndef ARCH_MIN_MMSTRUCT_ALIGN
 #define ARCH_MIN_MMSTRUCT_ALIGN 0
 #endif

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
       [not found] ` <20090807061103.GA19343-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2009-08-07  6:14   ` [RFC][v4][PATCH 6/7]: Define do_fork_with_pids() Sukadev Bhattiprolu
@ 2009-08-07  6:15   ` Sukadev Bhattiprolu
  2009-08-13  3:45   ` [RFC][v4][PATCH 0/7] clone_with_pids() system call Eric W. Biederman
  7 siblings, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-07  6:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alexey Dobriyan, Pavel Emelyanov


Subject: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall

Container restart requires that a task have the same pid it had when it was
checkpointed. When containers are nested the tasks within the containers
exist in multiple pid namespaces and hence have multiple pids to specify
during restart.

clone_with_pids(), intended for use during restart, is the same as clone(),
except that it takes a 'pid_set' parameter. This parameter lets the caller
choose specific pid numbers for the child process in the process's active
and ancestor pid namespaces. (Descendant pid namespaces in general don't
matter since processes don't have pids in them anyway, but see comments
in copy_target_pids() regarding CLONE_NEWPID).

Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to
prevent unprivileged processes from misusing this interface.

Call clone_with_pids() as follows:

	pid_t pids[] = { 0, 77, 99 };
	struct pid_set pid_set;

	pid_set.num_pids = sizeof(pids) / sizeof(pids[0]);
	pid_set.pids = pids;

	syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);

If a target-pid is 0, the kernel continues to assign a pid for the process in
that namespace. In the above example, pids[0] is 0, meaning the kernel will
assign the next available pid to the process in init_pid_ns. But the kernel
will assign pid 77 in the level-1 pid namespace and pid 99 in the level-2 pid
namespace. If either 77 or 99 is taken, the system call fails with -EBUSY.

If 'pid_set.num_pids' is negative or exceeds the number of pid namespaces the
caller is in (its nesting level plus one), the system call fails with -EINVAL.
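
For readers who want to experiment with the placement rule outside the
kernel, here is a minimal user-space sketch of what copy_target_pids() in
this patch does with the requested pids. It is only a model under stated
assumptions: layout_target_pids() is a hypothetical helper name, it uses
memcpy() in place of copy_from_user(), and it reports "no pids requested"
as 0 instead of a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * User-space model (hypothetical helper, not kernel code) of how the
 * requested pids are placed in target_pids[].
 *
 * knum_pids: number of pid namespaces the caller is in (level + 1)
 * unum_pids: number of pids requested by the caller
 * upids:     requested pids, oldest namespace first
 * out:       array of knum_pids + 1 entries, filled oldest namespace first
 *
 * Returns 1 if out[] was filled, 0 if no pids were requested (plain
 * clone() behaviour), or a negative errno value on invalid input.
 */
static int layout_target_pids(int knum_pids, int unum_pids,
			      const pid_t *upids, pid_t *out)
{
	int j;

	if (unum_pids == 0)
		return 0;
	if (unum_pids < 0 || unum_pids > knum_pids)
		return -EINVAL;

	assert(upids != NULL && out != NULL);

	/* Zero everything, including the extra slot kept for CLONE_NEWPID. */
	memset(out, 0, (size_t)(knum_pids + 1) * sizeof(*out));

	/* Requested pids apply to the youngest namespaces: the back end. */
	j = knum_pids - unum_pids;
	memcpy(&out[j], upids, (size_t)unum_pids * sizeof(*out));

	return 1;
}
```

With knum_pids = 4 (a caller in a level-3 pid namespace) and the requested
pids { 5, 77, 99 }, the model leaves out[0] as 0 for the oldest, "unknown"
namespace, places 5, 77 and 99 in out[1..3], and keeps out[4] as 0 for a
possible CLONE_NEWPID namespace.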

It's mostly an exploratory patch seeking feedback on the interface.

NOTE:
	Compared to clone(), clone_with_pids() needs to pass in two more
	pieces of information:

		- number of pids in the set
		- user buffer containing the list of pids.

	But since clone() already takes 5 parameters, use a 'struct pid_set'.

TODO:
	- Gently tested.
	- May need additional sanity checks in do_fork_with_pids().

Changelog[v4]:
	- (Oren Laadan) rename 'struct target_pid_set' to 'struct pid_set'

Changelog[v3]:
	- (Oren Laadan) Allow CLONE_NEWPID flag (by allocating an extra pid
	  in the target_pids[] list and setting it to 0. See copy_target_pids()).
	- (Oren Laadan) Specified target pids should apply only to youngest
	  pid-namespaces (see copy_target_pids())
	- (Matt Helsley) Update patch description.

Changelog[v2]:
	- Remove unnecessary printk and add a note to callers of
	  copy_target_pids() to free target_pids.
	- (Serge Hallyn) Mention CAP_SYS_ADMIN restriction in patch description.
	- (Oren Laadan) Add checks for 'num_pids < 0' (return -EINVAL) and
	  'num_pids == 0' (fall back to normal clone()).
	- Move arch-independent code (sanity checks and copy-in of target-pids)
	  into kernel/fork.c and simplify sys_clone_with_pids()

Changelog[v1]:
	- Fixed some compile errors (had fixed these errors earlier in my
	  git tree but had not refreshed patches before emailing them)

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 arch/x86/include/asm/syscalls.h    |    2 +
 arch/x86/include/asm/unistd_32.h   |    1 +
 arch/x86/kernel/entry_32.S         |    1 +
 arch/x86/kernel/process_32.c       |   21 +++++++
 arch/x86/kernel/syscall_table_32.S |    1 +
 kernel/fork.c                      |  108 +++++++++++++++++++++++++++++++++++-
 6 files changed, 133 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index 372b76e..df3c4a8 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -40,6 +40,8 @@ long sys_iopl(struct pt_regs *);
 
 /* kernel/process_32.c */
 int sys_clone(struct pt_regs *);
+int sys_clone_with_pids(struct pt_regs *);
+int sys_vfork(struct pt_regs *);
 int sys_execve(struct pt_regs *);
 
 /* kernel/signal.c */
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 732a307..f65b750 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -342,6 +342,7 @@
 #define __NR_pwritev		334
 #define __NR_rt_tgsigqueueinfo	335
 #define __NR_perf_counter_open	336
+#define __NR_clone_with_pids	337
 
 #ifdef __KERNEL__
 
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index c097e7d..c7bd1f6 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -718,6 +718,7 @@ ptregs_##name: \
 PTREGSCALL(iopl)
 PTREGSCALL(fork)
 PTREGSCALL(clone)
+PTREGSCALL(clone_with_pids)
 PTREGSCALL(vfork)
 PTREGSCALL(execve)
 PTREGSCALL(sigaltstack)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 59f4524..9965c06 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -443,6 +443,27 @@ int sys_clone(struct pt_regs *regs)
 	return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
 }
 
+int sys_clone_with_pids(struct pt_regs *regs)
+{
+	unsigned long clone_flags;
+	unsigned long newsp;
+	int __user *parent_tidptr;
+	int __user *child_tidptr;
+	void __user *upid_setp;
+
+	clone_flags = regs->bx;
+	newsp = regs->cx;
+	parent_tidptr = (int __user *)regs->dx;
+	child_tidptr = (int __user *)regs->di;
+	upid_setp = (void __user *)regs->bp;
+
+	if (!newsp)
+		newsp = regs->sp;
+
+	return do_fork_with_pids(clone_flags, newsp, regs, 0, parent_tidptr,
+			child_tidptr, upid_setp);
+}
+
 /*
  * sys_execve() executes a new program.
  */
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index d51321d..879e5ec 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -336,3 +336,4 @@ ENTRY(sys_call_table)
 	.long sys_pwritev
 	.long sys_rt_tgsigqueueinfo	/* 335 */
 	.long sys_perf_counter_open
+	.long ptregs_clone_with_pids
diff --git a/kernel/fork.c b/kernel/fork.c
index 64d53d9..29c66f0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1336,6 +1336,97 @@ struct task_struct * __cpuinit fork_idle(int cpu)
 }
 
 /*
+ * If user specified any 'target-pids' in @upid_setp, copy them from
+ * user and return a pointer to a local copy of the list of pids. The
+ * caller must free the list, when they are done using it.
+ *
+ * If user did not specify any target pids, return NULL (caller should
+ * treat this like normal clone).
+ *
+ * On any errors, return the error code
+ */
+static pid_t *copy_target_pids(void __user *upid_setp)
+{
+	int j;
+	int rc;
+	int size;
+	int unum_pids;		/* # of pids specified by user */
+	int knum_pids;		/* # of pids needed in kernel */
+	pid_t *target_pids;
+	struct pid_set pid_set;
+
+	if (!upid_setp)
+		return NULL;
+
+	rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set));
+	if (rc)
+		return ERR_PTR(-EFAULT);
+
+	unum_pids = pid_set.num_pids;
+	knum_pids = task_pid(current)->level + 1;
+
+	if (!unum_pids)
+		return NULL;
+
+	if (unum_pids < 0 || unum_pids > knum_pids)
+		return ERR_PTR(-EINVAL);
+
+	/*
+	 * To keep alloc_pid() simple, allocate an extra pid_t in target_pids[]
+	 * and set it to 0. This last entry in target_pids[] corresponds to the
+	 * (yet-to-be-created) descendant pid-namespace if CLONE_NEWPID was
+	 * specified. If CLONE_NEWPID was not specified, this last entry will
+	 * simply be ignored.
+	 */
+	target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL);
+	if (!target_pids)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * A process running in a level 2 pid namespace has three pid namespaces
+	 * and hence three pid numbers. If this process is checkpointed,
+	 * information about these three namespaces are saved. We refer to these
+	 * namespaces as 'known namespaces'.
+	 *
+	 * If this checkpointed process is however restarted in a level 3 pid
+	 * namespace, the restarted process has an extra ancestor pid namespace
+	 * (i.e 'unknown namespace') and 'knum_pids' exceeds 'unum_pids'.
+	 *
+	 * During restart, the process requests specific pids for its 'known
+	 * namespaces' and lets kernel assign pids to its 'unknown namespaces'.
+	 *
+	 * Since the requested-pids correspond to 'known namespaces' and since
+	 * 'known-namespaces' are younger than (i.e descendants of) 'unknown-
+	 * namespaces', copy requested pids to the back-end of target_pids[]
+	 * (i.e before the last entry for CLONE_NEWPID mentioned above).
+	 * Any entries in target_pids[] not corresponding to a requested pid
+	 * will be set to zero and kernel assigns a pid in those namespaces.
+	 *
+	 * NOTE: The order of pids in target_pids[] is oldest pid namespace to
+	 * 	 youngest (target_pids[0] corresponds to init_pid_ns). i.e.
+	 * 	 the order is:
+	 *
+	 * 		- pids for 'unknown-namespaces' (if any)
+	 * 		- pids for 'known-namespaces' (requested pids)
+	 * 		- 0 in the last entry (for CLONE_NEWPID).
+	 */
+	j = knum_pids - unum_pids;
+	size = unum_pids * sizeof(pid_t);
+
+	rc = copy_from_user(&target_pids[j], pid_set.pids, size);
+	if (rc) {
+		rc = -EFAULT;
+		goto out_free;
+	}
+
+	return target_pids;
+
+out_free:
+	kfree(target_pids);
+	return ERR_PTR(rc);
+}
+
+/*
  *  Ok, this is the main fork-routine.
  *
  * It copies the process, and if successful kick-starts
@@ -1352,7 +1443,7 @@ long do_fork_with_pids(unsigned long clone_flags,
 	struct task_struct *p;
 	int trace = 0;
 	long nr;
-	pid_t *target_pids = NULL;
+	pid_t *target_pids;
 
 	/*
 	 * Do some preliminary argument and permissions checking before we
@@ -1386,6 +1477,17 @@ long do_fork_with_pids(unsigned long clone_flags,
 		}
 	}
 
+	target_pids = copy_target_pids(pid_setp);
+
+	if (target_pids) {
+		if (IS_ERR(target_pids))
+			return PTR_ERR(target_pids);
+
+		nr = -EPERM;
+		if (!capable(CAP_SYS_ADMIN))
+			goto out_free;
+	}
+
 	/*
 	 * When called from kernel_thread, don't do user tracing stuff.
 	 */
@@ -1453,6 +1555,10 @@ long do_fork_with_pids(unsigned long clone_flags,
 	} else {
 		nr = PTR_ERR(p);
 	}
+
+out_free:
+	kfree(target_pids);
+
 	return nr;
 }
 
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
  2009-08-07  6:11 [RFC][v4][PATCH 0/7] clone_with_pids() system call Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2009-08-07  6:14 ` [RFC][v4][PATCH 6/7]: Define do_fork_with_pids() Sukadev Bhattiprolu
@ 2009-08-07  6:15 ` Sukadev Bhattiprolu
  2009-08-10 14:54   ` Pavel Machek
       [not found]   ` <20090807061517.GG20672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
       [not found] ` <20090807061103.GA19343-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-08-13  3:45 ` Eric W. Biederman
  8 siblings, 2 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-07  6:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oren Laadan, Eric W. Biederman, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev


Subject: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall

Container restart requires that a task have the same pid it had when it was
checkpointed. When containers are nested the tasks within the containers
exist in multiple pid namespaces and hence have multiple pids to specify
during restart.

clone_with_pids(), intended for use during restart, is the same as clone(),
except that it takes a 'pid_set' parameter. This parameter lets the caller
choose specific pid numbers for the child process in the process's active
and ancestor pid namespaces. (Descendant pid namespaces in general don't
matter since processes don't have pids in them anyway, but see comments
in copy_target_pids() regarding CLONE_NEWPID).

Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to
prevent unprivileged processes from misusing this interface.

Call clone_with_pids() as follows:

	pid_t pids[] = { 0, 77, 99 };
	struct pid_set pid_set;

	pid_set.num_pids = sizeof(pids) / sizeof(pids[0]);
	pid_set.pids = pids;

	syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);

If a target-pid is 0, the kernel continues to assign a pid for the process in
that namespace. In the above example, pids[0] is 0, meaning the kernel will
assign the next available pid to the process in init_pid_ns. But the kernel
will assign pid 77 in the level-1 pid namespace and pid 99 in the level-2 pid
namespace. If either 77 or 99 is taken, the system call fails with -EBUSY.

If 'pid_set.num_pids' is negative or exceeds the number of pid namespaces the
caller is in (its nesting level plus one), the system call fails with -EINVAL.

It's mostly an exploratory patch seeking feedback on the interface.

NOTE:
	Compared to clone(), clone_with_pids() needs to pass in two more
	pieces of information:

		- number of pids in the set
		- user buffer containing the list of pids.

	But since clone() already takes 5 parameters, use a 'struct pid_set'.

TODO:
	- Gently tested.
	- May need additional sanity checks in do_fork_with_pids().

Changelog[v4]:
	- (Oren Laadan) rename 'struct target_pid_set' to 'struct pid_set'

Changelog[v3]:
	- (Oren Laadan) Allow CLONE_NEWPID flag (by allocating an extra pid
	  in the target_pids[] list and setting it to 0. See copy_target_pids()).
	- (Oren Laadan) Specified target pids should apply only to youngest
	  pid-namespaces (see copy_target_pids())
	- (Matt Helsley) Update patch description.

Changelog[v2]:
	- Remove unnecessary printk and add a note to callers of
	  copy_target_pids() to free target_pids.
	- (Serge Hallyn) Mention CAP_SYS_ADMIN restriction in patch description.
	- (Oren Laadan) Add checks for 'num_pids < 0' (return -EINVAL) and
	  'num_pids == 0' (fall back to normal clone()).
	- Move arch-independent code (sanity checks and copy-in of target-pids)
	  into kernel/fork.c and simplify sys_clone_with_pids()

Changelog[v1]:
	- Fixed some compile errors (had fixed these errors earlier in my
	  git tree but had not refreshed patches before emailing them)

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/x86/include/asm/syscalls.h    |    2 +
 arch/x86/include/asm/unistd_32.h   |    1 +
 arch/x86/kernel/entry_32.S         |    1 +
 arch/x86/kernel/process_32.c       |   21 +++++++
 arch/x86/kernel/syscall_table_32.S |    1 +
 kernel/fork.c                      |  108 +++++++++++++++++++++++++++++++++++-
 6 files changed, 133 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index 372b76e..df3c4a8 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -40,6 +40,8 @@ long sys_iopl(struct pt_regs *);
 
 /* kernel/process_32.c */
 int sys_clone(struct pt_regs *);
+int sys_clone_with_pids(struct pt_regs *);
+int sys_vfork(struct pt_regs *);
 int sys_execve(struct pt_regs *);
 
 /* kernel/signal.c */
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 732a307..f65b750 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -342,6 +342,7 @@
 #define __NR_pwritev		334
 #define __NR_rt_tgsigqueueinfo	335
 #define __NR_perf_counter_open	336
+#define __NR_clone_with_pids	337
 
 #ifdef __KERNEL__
 
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index c097e7d..c7bd1f6 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -718,6 +718,7 @@ ptregs_##name: \
 PTREGSCALL(iopl)
 PTREGSCALL(fork)
 PTREGSCALL(clone)
+PTREGSCALL(clone_with_pids)
 PTREGSCALL(vfork)
 PTREGSCALL(execve)
 PTREGSCALL(sigaltstack)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 59f4524..9965c06 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -443,6 +443,27 @@ int sys_clone(struct pt_regs *regs)
 	return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
 }
 
+int sys_clone_with_pids(struct pt_regs *regs)
+{
+	unsigned long clone_flags;
+	unsigned long newsp;
+	int __user *parent_tidptr;
+	int __user *child_tidptr;
+	void __user *upid_setp;
+
+	clone_flags = regs->bx;
+	newsp = regs->cx;
+	parent_tidptr = (int __user *)regs->dx;
+	child_tidptr = (int __user *)regs->di;
+	upid_setp = (void __user *)regs->bp;
+
+	if (!newsp)
+		newsp = regs->sp;
+
+	return do_fork_with_pids(clone_flags, newsp, regs, 0, parent_tidptr,
+			child_tidptr, upid_setp);
+}
+
 /*
  * sys_execve() executes a new program.
  */
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index d51321d..879e5ec 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -336,3 +336,4 @@ ENTRY(sys_call_table)
 	.long sys_pwritev
 	.long sys_rt_tgsigqueueinfo	/* 335 */
 	.long sys_perf_counter_open
+	.long ptregs_clone_with_pids
diff --git a/kernel/fork.c b/kernel/fork.c
index 64d53d9..29c66f0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1336,6 +1336,97 @@ struct task_struct * __cpuinit fork_idle(int cpu)
 }
 
 /*
+ * If user specified any 'target-pids' in @upid_setp, copy them from
+ * user and return a pointer to a local copy of the list of pids. The
+ * caller must free the list, when they are done using it.
+ *
+ * If user did not specify any target pids, return NULL (caller should
+ * treat this like normal clone).
+ *
+ * On any errors, return the error code
+ */
+static pid_t *copy_target_pids(void __user *upid_setp)
+{
+	int j;
+	int rc;
+	int size;
+	int unum_pids;		/* # of pids specified by user */
+	int knum_pids;		/* # of pids needed in kernel */
+	pid_t *target_pids;
+	struct pid_set pid_set;
+
+	if (!upid_setp)
+		return NULL;
+
+	rc = copy_from_user(&pid_set, upid_setp, sizeof(pid_set));
+	if (rc)
+		return ERR_PTR(-EFAULT);
+
+	unum_pids = pid_set.num_pids;
+	knum_pids = task_pid(current)->level + 1;
+
+	if (!unum_pids)
+		return NULL;
+
+	if (unum_pids < 0 || unum_pids > knum_pids)
+		return ERR_PTR(-EINVAL);
+
+	/*
+	 * To keep alloc_pid() simple, allocate an extra pid_t in target_pids[]
+	 * and set it to 0. This last entry in target_pids[] corresponds to the
+	 * (yet-to-be-created) descendant pid-namespace if CLONE_NEWPID was
+	 * specified. If CLONE_NEWPID was not specified, this last entry will
+	 * simply be ignored.
+	 */
+	target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL);
+	if (!target_pids)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * A process running in a level 2 pid namespace has three pid namespaces
+	 * and hence three pid numbers. If this process is checkpointed,
+	 * information about these three namespaces are saved. We refer to these
+	 * namespaces as 'known namespaces'.
+	 *
+	 * If this checkpointed process is however restarted in a level 3 pid
+	 * namespace, the restarted process has an extra ancestor pid namespace
+	 * (i.e 'unknown namespace') and 'knum_pids' exceeds 'unum_pids'.
+	 *
+	 * During restart, the process requests specific pids for its 'known
+	 * namespaces' and lets kernel assign pids to its 'unknown namespaces'.
+	 *
+	 * Since the requested-pids correspond to 'known namespaces' and since
+	 * 'known-namespaces' are younger than (i.e descendants of) 'unknown-
+	 * namespaces', copy requested pids to the back-end of target_pids[]
+	 * (i.e before the last entry for CLONE_NEWPID mentioned above).
+	 * Any entries in target_pids[] not corresponding to a requested pid
+	 * will be set to zero and kernel assigns a pid in those namespaces.
+	 *
+	 * NOTE: The order of pids in target_pids[] is oldest pid namespace to
+	 * 	 youngest (target_pids[0] corresponds to init_pid_ns). i.e.
+	 * 	 the order is:
+	 *
+	 * 		- pids for 'unknown-namespaces' (if any)
+	 * 		- pids for 'known-namespaces' (requested pids)
+	 * 		- 0 in the last entry (for CLONE_NEWPID).
+	 */
+	j = knum_pids - unum_pids;
+	size = unum_pids * sizeof(pid_t);
+
+	rc = copy_from_user(&target_pids[j], pid_set.pids, size);
+	if (rc) {
+		rc = -EFAULT;
+		goto out_free;
+	}
+
+	return target_pids;
+
+out_free:
+	kfree(target_pids);
+	return ERR_PTR(rc);
+}
+
+/*
  *  Ok, this is the main fork-routine.
  *
  * It copies the process, and if successful kick-starts
@@ -1352,7 +1443,7 @@ long do_fork_with_pids(unsigned long clone_flags,
 	struct task_struct *p;
 	int trace = 0;
 	long nr;
-	pid_t *target_pids = NULL;
+	pid_t *target_pids;
 
 	/*
 	 * Do some preliminary argument and permissions checking before we
@@ -1386,6 +1477,17 @@ long do_fork_with_pids(unsigned long clone_flags,
 		}
 	}
 
+	target_pids = copy_target_pids(pid_setp);
+
+	if (target_pids) {
+		if (IS_ERR(target_pids))
+			return PTR_ERR(target_pids);
+
+		nr = -EPERM;
+		if (!capable(CAP_SYS_ADMIN))
+			goto out_free;
+	}
+
 	/*
 	 * When called from kernel_thread, don't do user tracing stuff.
 	 */
@@ -1453,6 +1555,10 @@ long do_fork_with_pids(unsigned long clone_flags,
 	} else {
 		nr = PTR_ERR(p);
 	}
+
+out_free:
+	kfree(target_pids);
+
 	return nr;
 }
 
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
  2009-08-07  6:15 ` [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall Sukadev Bhattiprolu
@ 2009-08-10 14:54   ` Pavel Machek
  2009-08-10 15:07     ` Serge E. Hallyn
                       ` (2 more replies)
       [not found]   ` <20090807061517.GG20672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 3 replies; 36+ messages in thread
From: Pavel Machek @ 2009-08-10 14:54 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Oren Laadan, Eric W. Biederman, serue,
	Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew,
	mingo, hpa, Containers, sukadev

Hi!

> 
> Subject: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
> 
> Container restart requires that a task have the same pid it had when it was
> checkpointed. When containers are nested the tasks within the containers
> exist in multiple pid namespaces and hence have multiple pids to specify
> during restart.
> 
> clone_with_pids(), intended for use during restart, is the same as clone(),
> except that it takes a 'target_pid_set' parameter. This parameter lets the caller
> choose specific pid numbers for the child process, in the process's active
> and ancestor pid namespaces. (Descendant pid namespaces in general don't
> matter since processes don't have pids in them anyway, but see comments
> in copy_target_pids() regarding CLONE_NEWPID).

This should go to documentation/manpage somewhere.


> Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to
> prevent unprivileged processes from misusing this interface.
> 
> Call clone_with_pids as follows:
> 
> 	pid_t pids[] = { 0, 77, 99 };
> 	struct pid_set pid_set;
> 
> 	pid_set.num_pids = sizeof(pids) / sizeof(pids[0]);
> 	pid_set.pids = pids;
> 
> 	syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);
> 
> If a target-pid is 0, the kernel continues to assign a pid for the process in
> that namespace. In the above example, pids[0] is 0, meaning the kernel will
> assign the next available pid to the process in init_pid_ns. But the kernel
> will assign pid 77 in child pid namespace 1 and pid 99 in pid namespace 2.
> If either 77 or 99 is taken, the system call fails with -EBUSY.
> 
> If 'pid_set.num_pids' exceeds the current nesting level of pid namespaces,
> the system call fails with -EINVAL.

Does it make sense to set the pid in anything but innermost container?


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
  2009-08-10 14:54   ` Pavel Machek
@ 2009-08-10 15:07     ` Serge E. Hallyn
  2009-08-10 22:26     ` Sukadev Bhattiprolu
       [not found]     ` <20090810145425.GA1378-+ZI9xUNit7I@public.gmane.org>
  2 siblings, 0 replies; 36+ messages in thread
From: Serge E. Hallyn @ 2009-08-10 15:07 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sukadev Bhattiprolu, linux-kernel, Oren Laadan,
	Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov,
	Andrew Morton, torvalds, mikew, mingo, hpa, Containers, sukadev

Quoting Pavel Machek (pavel@ucw.cz):
> > Unlike clone(), clone_with_pids() needs CAP_SYS_ADMIN, at least for now, to
> > prevent unprivileged processes from misusing this interface.
> > 
> > Call clone_with_pids as follows:
> > 
> > 	pid_t pids[] = { 0, 77, 99 };
> > 	struct pid_set pid_set;
> > 
> > 	pid_set.num_pids = sizeof(pids) / sizeof(pids[0]);
> > 	pid_set.pids = pids;
> > 
> > 	syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);
> > 
> > If a target-pid is 0, the kernel continues to assign a pid for the process in
> > that namespace. In the above example, pids[0] is 0, meaning the kernel will
> > assign the next available pid to the process in init_pid_ns. But the kernel
> > will assign pid 77 in child pid namespace 1 and pid 99 in pid namespace 2.
> > If either 77 or 99 is taken, the system call fails with -EBUSY.
> > 
> > If 'pid_set.num_pids' exceeds the current nesting level of pid namespaces,
> > the system call fails with -EINVAL.
> 
> Does it make sense to set the pid in anything but innermost container?

Yup, we might be restarting an app using a nested pid namespace, in which
case restart would specify pids for 2 (or more) of the innermost containers.

thanks,
-serge

^ permalink raw reply	[flat|nested] 36+ messages in thread
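
Editorial sketch (not part of the patch): the array layout that the
copy_target_pids() comment in patch 7/7 describes, and that the reply above
relies on, can be modeled in user space. Assume the caller is nested `knum`
pid namespaces deep and supplies `unum` pids for the innermost namespaces;
the outer entries are left at 0 so the kernel picks those pids itself. The
name model_fill_target_pids() and its parameters are hypothetical. Only the
`j = knum_pids - unum_pids` offset and the -EINVAL case are taken from the
thread, and the trailing 0 entry used for CLONE_NEWPID is left out.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * User-space model, not kernel code (hypothetical helper).
 *
 * Fills out[0..knum-1], ordered from the oldest pid namespace to the
 * youngest. Namespaces the caller did not name get 0, meaning "let the
 * kernel choose". The requested pids occupy the last 'unum' slots.
 */
static int model_fill_target_pids(const pid_t *user_pids, int unum,
				  int knum, pid_t *out)
{
	int j;

	/* More pids than nesting levels is rejected, as in the description. */
	if (unum < 0 || knum < 0 || unum > knum)
		return -EINVAL;

	/* Zero every slot first: 0 asks for the next available pid. */
	memset(out, 0, knum * sizeof(*out));

	/* Same offset as 'j = knum_pids - unum_pids' in the patch. */
	j = knum - unum;
	if (unum > 0)
		memcpy(&out[j], user_pids, unum * sizeof(*out));

	return 0;
}
```

With three nested namespaces and requested pids { 77, 99 }, the model yields
{ 0, 77, 99 }, which matches the example array in the patch description.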

* Re: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
  2009-08-10 14:54   ` Pavel Machek
  2009-08-10 15:07     ` Serge E. Hallyn
@ 2009-08-10 22:26     ` Sukadev Bhattiprolu
       [not found]     ` <20090810145425.GA1378-+ZI9xUNit7I@public.gmane.org>
  2 siblings, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-10 22:26 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Oren Laadan, Eric W. Biederman, serue,
	Alexey Dobriyan, Pavel Emelyanov, Andrew Morton, torvalds, mikew,
	mingo, hpa, Containers, sukadev

Pavel Machek [pavel@ucw.cz] wrote:
| Hi!
| 
| > 
| > Subject: [RFC][v4][PATCH 7/7]: Define clone_with_pids syscall
| > 
| > Container restart requires that a task have the same pid it had when it was
| > checkpointed. When containers are nested the tasks within the containers
| > exist in multiple pid namespaces and hence have multiple pids to specify
| > during restart.
| > 
| > clone_with_pids(), intended for use during restart, is the same as clone(),
| > except that it takes a 'target_pid_set' parameter. This parameter lets the caller
| > choose specific pid numbers for the child process, in the process's active
| > and ancestor pid namespaces. (Descendant pid namespaces in general don't
| > matter since processes don't have pids in them anyway, but see comments
| > in copy_target_pids() regarding CLONE_NEWPID).
| 
| This should go to documentation/manpage somewhere.

Agree. Will update once we have some consensus on the interface.

The interface defined in this patch 7/7 

	syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);

meets the requirements of checkpoint/restart. But as mentioned in patch 0/7,
we are just not sure if we should take this opportunity to address the
clone-flags limitation so we are not forced to define another flavor of
clone() soon.

Sukadev

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
  2009-08-07  6:11 [RFC][v4][PATCH 0/7] clone_with_pids() system call Sukadev Bhattiprolu
                   ` (7 preceding siblings ...)
       [not found] ` <20090807061103.GA19343-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-08-13  3:45 ` Eric W. Biederman
  2009-08-13  8:00   ` Sukadev Bhattiprolu
                     ` (2 more replies)
  8 siblings, 3 replies; 36+ messages in thread
From: Eric W. Biederman @ 2009-08-13  3:45 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Oren Laadan, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev, Oleg Nesterov

Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:

> === NEW CLONE() SYSTEM CALL:
>
> To support application checkpoint/restart, a task must have the same pid it
> had when it was checkpointed.  When containers are nested, the tasks within
> the containers exist in multiple pid namespaces and hence have multiple pids
> to specify during restart.
>
> This patchset implements a new system call, clone_with_pids() that lets a
> process specify the pids of the child process.
>
> Patches 1 through 5 are helpers and we believe they are needed for application
> restart, regardless of the kernel implementation of application restart.

I'm not very impressed.

- static int alloc_pidmap(struct pid_namespace *pid_ns)
+ static int alloc_pidmap(struct pid_namespace *pid_ns, int pid_max, int last_pid)

Do that.

That is, pass in pid_max and last_pid, and you don't have to do weird
things in alloc_pidmap, and no set_pidmap is needed.

No changes to copy_process are needed; it already takes a struct pid
argument.

I haven't been following closely: what is gained by having a
clone_with_pids syscall?

As for new namespaces that don't need to happen at process creation time
(which is just about anything that is left) we can create a new syscall that
unshares just that one.


Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
  2009-08-13  3:45 ` Eric W. Biederman
@ 2009-08-13  8:00   ` Sukadev Bhattiprolu
  2009-08-13  9:05     ` Eric W. Biederman
       [not found]     ` <20090813080049.GA16639-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
       [not found]   ` <m1vdks5qc8.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-08-13 13:32   ` Serge E. Hallyn
  2 siblings, 2 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-13  8:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Oren Laadan, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev, Oleg Nesterov

Eric W. Biederman [ebiederm@xmission.com] wrote:
| Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:
| 
| > === NEW CLONE() SYSTEM CALL:
| >
| > To support application checkpoint/restart, a task must have the same pid it
| > had when it was checkpointed.  When containers are nested, the tasks within
| > the containers exist in multiple pid namespaces and hence have multiple pids
| > to specify during restart.
| >
| > This patchset implements a new system call, clone_with_pids() that lets a
| > process specify the pids of the child process.
| >
| > Patches 1 through 5 are helpers and we believe they are needed for application
| > restart, regardless of the kernel implementation of application restart.
| 
| I'm not very impressed.
| 
| - static int alloc_pidmap(struct pid_namespace *pid_ns)
| + static int alloc_pidmap(struct pid_namespace *pid_ns, int pid_max, int last_pid)
| 
| Do that.
| 
| That is pass in pid_max and last_pid, and you don't have to do weird
| things in alloc_pidmap, and no set_pidmap is needed.

But last_pid comes from the pid_ns. Do you mean that alloc_pidmap()
should take a pid_min and a pid_max and, when choosing a specific pid,
be called with pid_min == pid_max == target_pid?

| 
| No changes to copy_process are needed it already takes a struct pid
| argument.


I see your point about passing in both 'struct pid*' and target_pids[].
But in the common case the struct pid passed into copy_process() is
NULL. Wouldn't allocating the pid in do_fork() significantly alter the
existing control flow? alloc_pid() assumes that any new pid namespace
has already been created in copy_namespaces(). Moving the alloc_pid()
call to do_fork() would require parsing clone_flags in do_fork() and
pulling pid namespace code out of copy_namespaces().

| 
| I haven't been following closely what is gained by having a clone_with_pids
| syscall?  

When restarting an application from a checkpoint, the application must get
the same pid it had at the time of checkpoint. clone_with_pids() would be
used during restart so the child can be created with a specific set of pids.

| 
| As for new namespaces that don't need to happen at process creation time
| (which is just about anything that is left) we can create a new syscall that
| unshares just that one.
| 

Ok. If all new namespaces can be handled with a variant of unshare(), we can
decouple clone_with_pids() from the clone-flags issue.

| 
| Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
  2009-08-13  8:00   ` Sukadev Bhattiprolu
@ 2009-08-13  9:05     ` Eric W. Biederman
  2009-08-13 19:46       ` Serge E. Hallyn
                         ` (2 more replies)
       [not found]     ` <20090813080049.GA16639-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 3 replies; 36+ messages in thread
From: Eric W. Biederman @ 2009-08-13  9:05 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Oren Laadan, serue, Alexey Dobriyan,
	Pavel Emelyanov, Andrew Morton, torvalds, mikew, mingo, hpa,
	Containers, sukadev, Oleg Nesterov

Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:

> Eric W. Biederman [ebiederm@xmission.com] wrote:
> | Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:
> | 
> | > === NEW CLONE() SYSTEM CALL:
> | >
> | > To support application checkpoint/restart, a task must have the same pid it
> | > had when it was checkpointed.  When containers are nested, the tasks within
> | > the containers exist in multiple pid namespaces and hence have multiple pids
> | > to specify during restart.
> | >
> | > This patchset implements a new system call, clone_with_pids() that lets a
> | > process specify the pids of the child process.
> | >
> | > Patches 1 through 5 are helpers and we believe they are needed for application
> | > restart, regardless of the kernel implementation of application restart.
> | 
> | I'm not very impressed.
> | 
> | - static int alloc_pidmap(struct pid_namespace *pid_ns)
> | + static int alloc_pidmap(struct pid_namespace *pid_ns, int pid_max, int last_pid)
> | 
> | Do that.
> | 
> | That is pass in pid_max and last_pid, and you don't have to do weird
> | things in alloc_pidmap, and no set_pidmap is needed.
>
> But last_pid is from the pid_ns. Do you mean to have alloc_pidmap()
> take a pid_min and pid_max and when choosing a specific pid, have
> pid_min == pid_max == target_pid ?

Yes. It already takes a pid_min and a pid_max from the environment.
I guess the pid_min is RESERVED_PIDS by default.

> | No changes to copy_process are needed it already takes a struct pid
> | argument.
>
>
> I see your point about passing in both 'struct pid*' and target_pids[].
> But in the common case the struct pid passed into copy_process() is
> NULL - allocating pid in do_fork() would significantly alter the
> existing control flow - no ? alloc_pid() assumes any new pid namespace
> has been created - in copy_namespaces(). Moving the alloc_pid() to
> do_fork() would require parsing clone_flags in do_fork() and pulling
> pid namespace code out of copy_namespaces().

Why change do_fork?

> | I haven't been following closely what is gained by having a clone_with_pids
> | syscall?  
>
> When restarting an application from a checkpoint, the application must get
> the same pid it had at the time of checkpoint. clone_with_pids() would be
> used during restart so the child can be created with a specific set of pids.

That part I understand.  What I don't understand is why have that one part be
special and have user space do the work?

> | As for new namespaces that don't need to happen at process creation time
> | (which is just about anything that is left) we can create a new syscall that
> | unshares just that one.
> | 
>
> Ok. If all new namespaces can be handled with a variant of unshare(), we can
> decouple clone_with_pids() from the clone-flags issue.

What I mean is we should be able to get away with things like:
sys_new_timens();

Very simple syscalls, one for each kind of namespace we want new
instances of.

Eric



^ permalink raw reply	[flat|nested] 36+ messages in thread
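
Editorial sketch (not from the patchset): one way to read the pid_min/pid_max
suggestion above is that a single range-based allocator covers both the
normal case and the "specific pid" case, the latter by passing
pid_min == pid_max == target_pid. The following is a user-space model over a
small array, not the kernel's alloc_pidmap(). The names model_pidmap,
model_alloc_pidmap() and MODEL_PID_LIMIT, and the choice of -EBUSY for an
exhausted range, are assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>

#define MODEL_PID_LIMIT 1024	/* hypothetical, small pid space for the model */

/* One byte per pid keeps the model simple; the kernel uses bitmap pages. */
struct model_pidmap {
	unsigned char used[MODEL_PID_LIMIT];
};

/*
 * Allocate the first free pid in [pid_min, pid_max], starting the search
 * just after last_pid and wrapping around inside the range.
 * Returns the pid, -EINVAL for a bad range, or -EBUSY if the range is full.
 */
static int model_alloc_pidmap(struct model_pidmap *map, int pid_min,
			      int pid_max, int last_pid)
{
	int span, i, pid;

	if (pid_min < 1 || pid_max >= MODEL_PID_LIMIT || pid_min > pid_max)
		return -EINVAL;

	span = pid_max - pid_min + 1;

	/* Resume after last_pid; fall back to pid_min if that is out of range. */
	pid = last_pid + 1;
	if (pid < pid_min || pid > pid_max)
		pid = pid_min;

	for (i = 0; i < span; i++) {
		if (!map->used[pid]) {
			map->used[pid] = 1;
			return pid;
		}
		if (++pid > pid_max)
			pid = pid_min;
	}

	return -EBUSY;
}
```

In this model, a normal allocation might pass a wide range such as
[300, MODEL_PID_LIMIT - 1], while a restart that needs pid 77 passes
pid_min = pid_max = 77 and gets either 77 or -EBUSY, with no separate
set_pidmap()-style helper.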

* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
       [not found]       ` <m1vdks2iea.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-08-13 19:46         ` Serge E. Hallyn
  2009-08-18  3:31         ` Sukadev Bhattiprolu
  1 sibling, 0 replies; 36+ messages in thread
From: Serge E. Hallyn @ 2009-08-13 19:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI,
	Sukadev Bhattiprolu, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alexey Dobriyan, Pavel Emelyanov

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> 
> > Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:
> > | Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> > | 
> > | > === NEW CLONE() SYSTEM CALL:
> > | >
> > | > To support application checkpoint/restart, a task must have the same pid it
> > | > had when it was checkpointed.  When containers are nested, the tasks within
> > | > the containers exist in multiple pid namespaces and hence have multiple pids
> > | > to specify during restart.
> > | >
> > | > This patchset implements a new system call, clone_with_pids() that lets a
> > | > process specify the pids of the child process.
> > | >
> > | > Patches 1 through 5 are helpers and we believe they are needed for application
> > | > restart, regardless of the kernel implementation of application restart.
> > | 
> > | I'm not very impressed.
> > | 
> > | - static int alloc_pidmap(struct pid_namespace *pid_ns)
> > | + static int alloc_pidmap(struct pid_namespace *pid_ns, int pid_max, int last_pid)
> > | 
> > | Do that.
> > | 
> > | That is pass in pid_max and last_pid, and you don't have to do weird
> > | things in alloc_pidmap, and no set_pidmap is needed.
> >
> > But last_pid is from the pid_ns. Do you mean to have alloc_pidmap()
> > take a pid_min and pid_max and when choosing a specific pid, have
> > pid_min == pid_max == target_pid ?
> 
> Yes. It already takes a pid_min and a pid_max from the environment.
> I guess the pid_min is RESERVED_PIDS by default.
> 
> > | No changes to copy_process are needed it already takes a struct pid
> > | argument.
> >
> >
> > I see your point about passing in both 'struct pid*' and target_pids[].
> > But in the common case the struct pid passed into copy_process() is
> > NULL - allocating pid in do_fork() would significantly alter the
> > existing control flow - no ? alloc_pid() assumes any new pid namespace
> > has been created - in copy_namespaces(). Moving the alloc_pid() to
> > do_fork() would require parsing clone_flags in do_fork() and pulling
> > pid namespace code out of copy_namespaces().
> 
> Why change do_fork?
> 
> > | I haven't been following closely what is gained by having a clone_with_pids
> > | syscall?  
> >
> > When restarting an application from a checkpoint, the application must get
> > the same pid it had at the time of checkpoint. clone_with_pids() would be
> > used during restart so the child can be created with a specific set of pids.
> 
> That part I understand.  What I don't understand is why have that one part be
> special and have user space do the work?

How would this be used then?  Let's say I'm recreating a process tree
with two nested pid namespaces.  So just using clone(CLONE_NEWPID) we'd
have P{500} creates P{1501,1} which creates P{1502,1,2} which creates
P{1502,2,3} (1502 in top namespace, 2 in child ns, 3 in lowest pid ns).
But now we want to create P{X, 27, 953} (i.e. X can be anything).  How
do we specify that for pidns 2 we want pid_min=pid_max=27, and for
pidns 3 pid_min=pid_max=953?
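
For reference, the request in this example can be pictured with the
pid_set layout from the cover letter.  This is only a sketch: the
convention that 0 means "kernel picks the pid", the ordering of the
array (outermost namespace first) and the helper requested_pid() are
assumptions made for illustration, not something the patchset defines
in this thread.

```c
#include <assert.h>
#include <sys/types.h>

/* Layout as proposed in the cover letter. */
struct pid_set {
	int num_pids;
	pid_t *pids;
};

/*
 * Hypothetical helper: return the pid requested for one namespace level.
 * Assumption: index 0 is the outermost namespace and 0 means "any pid".
 */
static pid_t requested_pid(const struct pid_set *set, int level)
{
	assert(set != 0);
	if (level < 0 || level >= set->num_pids)
		return 0;	/* nothing requested at this level */
	return set->pids[level];
}
```

With pids[] = { 0, 27, 953 } and num_pids = 3, the request reads as
P{X, 27, 953}: any pid in the top namespace, 27 in the child namespace
and 953 in the lowest one.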

-serge


* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
       [not found]       ` <m1vdks2iea.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-08-13 19:46         ` Serge E. Hallyn
@ 2009-08-18  3:31         ` Sukadev Bhattiprolu
  1 sibling, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-18  3:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Alexey Dobriyan,
	Pavel Emelyanov

Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:

| > But last_pid is from the pid_ns. Do you mean to have alloc_pidmap()
| > take a pid_min and pid_max and when choosing a specific pid, have
| > pid_min == pid_max == target_pid ?
| 
| Yes. It already takes a pid_min and a pid_max from the environment.
| I guess the pid_min is RESERVED_PIDS by default.

Well, defining alloc_pidmap() as:

	int alloc_pidmap(pid_ns, int min, int max)

seems to unnecessarily complicate alloc_pidmap(): what if 'min' is 0
but 'max' is not, or vice versa?  Generalizing alloc_pidmap() to handle
all combinations seems like overkill, and/or it exposes RESERVED_PIDS and
pid_max to the caller.

Maybe we can drop the set_pidmap() call by sticking to 

	int alloc_pidmap(pid_ns, target_pid)

and setting 'max_scan' to 1 when target_pid is set (see quick patch below).
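
The intended semantics of a target_pid argument can be modelled in
user space as follows.  This is a toy sketch, not the kernel code: the
toy_* names, the single 64-bit bitmap, the tiny limits and the -EBUSY
return for an already-taken pid are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_PID_MAX		64	/* hypothetical, tiny on purpose */
#define TOY_RESERVED_PIDS	4	/* stand-in for RESERVED_PIDS */

struct toy_pidmap {
	uint64_t bits;		/* bit N set => pid N is in use */
	int last_pid;
};

/* Claim one pid; returns 0 on success, -1 if it is already in use. */
static int toy_claim(struct toy_pidmap *map, int pid)
{
	uint64_t mask = (uint64_t)1 << pid;

	if (map->bits & mask)
		return -1;
	map->bits |= mask;
	return 0;
}

static int toy_alloc_pidmap(struct toy_pidmap *map, int target_pid)
{
	int i, pid;

	if (target_pid) {
		if (target_pid < 0 || target_pid >= TOY_PID_MAX)
			return -EINVAL;
		/* Exactly one candidate: take it or report it busy. */
		return toy_claim(map, target_pid) == 0 ? target_pid : -EBUSY;
	}

	/* No target: scan upward from last_pid, wrapping to the reserved floor. */
	pid = map->last_pid + 1;
	for (i = 0; i < TOY_PID_MAX; i++, pid++) {
		if (pid >= TOY_PID_MAX)
			pid = TOY_RESERVED_PIDS;
		if (toy_claim(map, pid) == 0) {
			map->last_pid = pid;
			return pid;
		}
	}
	return -EAGAIN;
}
```

The point of the single-argument form is that the caller never sees
the reserved floor or the pid limit; it either names one pid or passes 0.
In this sketch a targeted allocation leaves last_pid untouched, which
is a choice of the toy model only.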

| 
| > | No changes to copy_process are needed it already takes a struct pid
| > | argument.
| >
| >
| > I see your point about passing in both 'struct pid*' and target_pids[].
| > But in the common case the struct pid passed into copy_process() is
| > NULL - allocating pid in do_fork() would significantly alter the
| > existing control flow - no ? alloc_pid() assumes any new pid namespace
| > has been created - in copy_namespaces(). Moving the alloc_pid() to
| > do_fork() would require parsing clone_flags in do_fork() and pulling
| > pid namespace code out of copy_namespaces().
| 
| Why change do_fork?

Sorry, maybe I am missing something. If we don't pass target_pids as a
parameter to copy_process(), how do we specify the target pids ?
Fill in a dummy struct pid with the target-pids and pass it into
copy_process() ?

| 
| > | I haven't been following closely what is gained by having a clone_with_pids
| > | syscall?  
| >
| > When restarting an application from a checkpoint, the application must get
| > the same pid it had at the time of checkpoint. clone_with_pids() would be
| > used during restart so the child can be created with a specific set of pids.
| 
| That part I understand.  What I don't understand is why have that one part be
| special and have user space do the work?

By 'work' do you mean the rest of the process-restart logic ?

The user-level restart program creates the necessary processes using
clone_with_pids(), and each child process then calls another system call,
sys_restart(), which restores the process state.

Sukadev

---

Index: linux-2.6/kernel/pid.c
===================================================================
--- linux-2.6.orig/kernel/pid.c	2009-08-17 18:43:15.000000000 -0700
+++ linux-2.6/kernel/pid.c	2009-08-17 19:41:57.000000000 -0700
@@ -122,18 +122,29 @@
 	atomic_inc(&map->nr_free);
 }
 
-static int alloc_pidmap(struct pid_namespace *pid_ns)
+static int alloc_pidmap(struct pid_namespace *pid_ns, int target_pid)
 {
 	int i, offset, max_scan, pid, last = pid_ns->last_pid;
 	struct pidmap *map;
 	int rc;
 
-	pid = last + 1;
-	if (pid >= pid_max)
-		pid = RESERVED_PIDS;
+	if (target_pid) {
+		if (target_pid < 0 || target_pid >= pid_max)
+			return -EINVAL;
+		pid = target_pid;
+		max_scan = 1;
+	} else {
+		pid = last + 1;
+		if (pid >= pid_max)
+			pid = RESERVED_PIDS;
+	}
+
 	offset = pid & BITS_PER_PAGE_MASK;
 	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
+
 	max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
+	if (target_pid)
+		max_scan = 1;
 
 	rc = -EAGAIN;
 	for (i = 0; i <= max_scan; ++i) {
@@ -258,7 +269,7 @@
 
 	tmp = ns;
 	for (i = ns->level; i >= 0; i--) {
-		nr = alloc_pidmap(tmp);
+		nr = alloc_pidmap(tmp, 0);
 		if (nr < 0)
 			goto out_free;


* Re: [RFC][v4][PATCH 0/7] clone_with_pids() system call
       [not found]         ` <20090813194616.GA10493-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-08-21 16:11           ` Serge E. Hallyn
  0 siblings, 0 replies; 36+ messages in thread
From: Serge E. Hallyn @ 2009-08-21 16:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, mingo-X9Un+BFzKDI,
	Sukadev Bhattiprolu, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alexey Dobriyan, Pavel Emelyanov

Quoting Serge E. Hallyn (serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> > Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> > 
> > > Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:
> > > | Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> > > | 
> > > | > === NEW CLONE() SYSTEM CALL:
> > > | >
> > > | > To support application checkpoint/restart, a task must have the same pid it
> > > | > had when it was checkpointed.  When containers are nested, the tasks within
> > > | > the containers exist in multiple pid namespaces and hence have multiple pids
> > > | > to specify during restart.
> > > | >
> > > | > This patchset implements a new system call, clone_with_pids() that lets a
> > > | > process specify the pids of the child process.
> > > | >
> > > | > Patches 1 through 5 are helpers and we believe they are needed for application
> > > | > restart, regardless of the kernel implementation of application restart.
> > > | 
> > > | I'm not very impressed.
> > > | 
> > > | - static int alloc_pidmap(struct pid_namespace *pid_ns)
> > > | + static int alloc_pidmap(struct pid_namespace *pid_ns, int pid_max, int last_pid)
> > > | 
> > > | Do that.
> > > | 
> > > | That is pass in pid_max and last_pid, and you don't have to do weird
> > > | things in alloc_pidmap, and no set_pidmap is needed.
> > >
> > > But last_pid is from the pid_ns. Do you mean to have alloc_pidmap()
> > > take a pid_min and pid_max and when choosing a specific pid, have
> > > pid_min == pid_max == target_pid ?
> > 
> > Yes. It already takes a pid_min and a pid_max from the environment.
> > I guess the pid_min is RESERVED_PIDS by default.
> > 
> > > | No changes to copy_process are needed it already takes a struct pid
> > > | argument.
> > >
> > >
> > > I see your point about passing in both 'struct pid*' and target_pids[].
> > > But in the common case the struct pid passed into copy_process() is
> > > NULL - allocating pid in do_fork() would significantly alter the
> > > existing control flow - no ? alloc_pid() assumes any new pid namespace
> > > has been created - in copy_namespaces(). Moving the alloc_pid() to
> > > do_fork() would require parsing clone_flags in do_fork() and pulling
> > > pid namespace code out of copy_namespaces().
> > 
> > Why change do_fork?
> > 
> > > | I haven't been following closely what is gained by having a clone_with_pids
> > > | syscall?  
> > >
> > > When restarting an application from a checkpoint, the application must get
> > > the same pid it had at the time of checkpoint. clone_with_pids() would be
> > > used during restart so the child can be created with a specific set of pids.
> > 
> > That part I understand.  What I don't understand is why have that one part be
> > special and have user space do the work?
> 
> How would this be used then?  Let's say I'm recreating a process tree
> with two nested pid namespaces.  so just using clone(CLONE_NEWPID) we'd
> have P{500} creates P{1501,1} which creates P{1502,1,2} which creates
> P{1502,2,3} (1502 in top namespace, 2 in child ns, 3 in lowest pid ns).
> But now we want to create P{X, 27, 953} (i.e. X can be anything).  How
> do we specify that for pidns 2 we want pid_min=pid_max=27, and for
> pidns 3 pid_min=pid_max=953?

Eric, if you have an idea for how to do this, please let me know,
and I'll set about trying a new patchset to do it.  But as it stands
I don't see how to make your suggestion useful from userspace.

thanks,
-serge


* [RFC][v4][PATCH 0/7] clone_with_pids() system call
@ 2009-08-07  6:11 Sukadev Bhattiprolu
  0 siblings, 0 replies; 36+ messages in thread
From: Sukadev Bhattiprolu @ 2009-08-07  6:11 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Containers, Eric W. Biederman, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alexey Dobriyan, Pavel Emelyanov



=== NEW CLONE() SYSTEM CALL:

To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed.  When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.

This patchset implements a new system call, clone_with_pids() that lets a
process specify the pids of the child process.

Patches 1 through 5 are helpers and we believe they are needed for application
restart, regardless of the kernel implementation of application restart.

Patch 7/7 defines a prototype of the new system call.

=== IMPORTANT TODO:

The clone() system call has another limitation: all available bits in
clone_flags are in use, and any new clone flag will need a variant of the
clone() system call.

It appears to make sense to try and extend this new system call to address
this limitation as well. The basic requirements of a new clone system call
could then be summarized as:

	- do everything clone() does today, and
	- give applications the ability to choose pids for the child process
	  in all ancestor pid namespaces, and
	- allow more clone_flags

Constraints:

	- system calls are restricted to 6 parameters and clone() already
	  takes 5, so any extension to the clone() interface would require
	  one or more copy_from_user() calls.

	- does copy_from_user() of a few words have a significant impact on
	  the total cost of clone() ?

Based on these requirements and constraints, we have been exploring a couple
of system call interfaces and would appreciate any input.

1. =====

	#if 64bit
	#define CLONE_FLAGS_WORDS	1
	#else
	#define CLONE_FLAGS_WORDS	2
	#endif

	struct pid_set {
		int num_pids;
		pid_t *pids;
	};

	typedef struct {
		unsigned long flags[CLONE_FLAGS_WORDS];
	} clone_flags_t;

	int clone_extended(clone_flags_t *flags, void *child_stack, int *unused,
		int *parent_tid, int *child_tid, struct pid_set *pid_set);

	Pros:
		- extendible clone_flags (like sigset_t)

	Cons:
		- copy_from_user() needed on all architectures (we may be able
		  to play some tricks with 'clone_flags_t' to avoid the copy
		  on 64-bit architectures till N_CLONE_FLAGS exceeds 64).

		- Both applications and kernel must use interfaces equivalent
		  to sigsetops(3) to test/set/clear clone flags.
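
	To illustrate what "interfaces equivalent to sigsetops(3)" could
	look like for option 1, here is a minimal sketch.  The cloneflag_*
	helper names and their return conventions are hypothetical and
	modelled loosely on sigaddset()/sigismember(); CLONE_FLAGS_WORDS
	is fixed at 2 purely for the example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define CLONE_FLAGS_WORDS	2	/* fixed here for illustration only */
#define CLONE_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

typedef struct {
	unsigned long flags[CLONE_FLAGS_WORDS];
} clone_flags_t;

/* Clear every flag, like sigemptyset(). */
static void cloneflag_empty(clone_flags_t *set)
{
	memset(set, 0, sizeof(*set));
}

/* A flag number is valid if it fits in the fixed-size array. */
static int cloneflag_valid(unsigned int nr)
{
	return nr < CLONE_FLAGS_WORDS * CLONE_BITS_PER_WORD;
}

/* Set flag nr; returns 0 on success, -1 if nr is out of range. */
static int cloneflag_add(clone_flags_t *set, unsigned int nr)
{
	if (!cloneflag_valid(nr))
		return -1;
	set->flags[nr / CLONE_BITS_PER_WORD] |= 1UL << (nr % CLONE_BITS_PER_WORD);
	return 0;
}

/* Clear flag nr; returns 0 on success, -1 if nr is out of range. */
static int cloneflag_del(clone_flags_t *set, unsigned int nr)
{
	if (!cloneflag_valid(nr))
		return -1;
	set->flags[nr / CLONE_BITS_PER_WORD] &= ~(1UL << (nr % CLONE_BITS_PER_WORD));
	return 0;
}

/* Returns 1 if flag nr is set, 0 if clear, -1 if nr is out of range. */
static int cloneflag_ismember(const clone_flags_t *set, unsigned int nr)
{
	if (!cloneflag_valid(nr))
		return -1;
	return (set->flags[nr / CLONE_BITS_PER_WORD] >> (nr % CLONE_BITS_PER_WORD)) & 1UL;
}
```

	Callers would address flags by number rather than by OR-ing masks,
	which is the cost listed under "Cons" above.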

2. =====

	struct clone_info {
		int num_clone_high_words;
		int *flags_high;
		struct pid_set pid_set;
	};

	int clone_extended(int flags_low, void *child_stack, void *unused,
		int *parent_tid, int *child_tid, struct clone_info *clone_info);

	Pros:
		- copy_from_user() needed only for new flags and pid_set

	Cons:
		- splitting the high and low clone-flags is awkward ?
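
	The awkwardness of the split in option 2 shows up as soon as code
	has to test a flag by number.  A minimal sketch, assuming (this is
	not defined in the cover letter) that flag numbers below the width
	of an int live in flags_low and that higher numbers continue
	word by word in flags_high[]; clone_flag_test() is a hypothetical
	helper:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

struct pid_set {
	int num_pids;
	pid_t *pids;
};

struct clone_info {
	int num_clone_high_words;
	int *flags_high;
	struct pid_set pid_set;
};

/*
 * Hypothetical helper: test flag number nr across the low word and the
 * optional high words.  Returns 1 if set, 0 if clear or not representable.
 */
static int clone_flag_test(int flags_low, const struct clone_info *info,
			   unsigned int nr)
{
	const unsigned int bits = sizeof(int) * CHAR_BIT;
	unsigned int word;

	/* Low flags arrive by value, as with clone() today. */
	if (nr < bits)
		return ((unsigned int)flags_low >> nr) & 1U;

	/* High flags live behind a pointer and need the extra struct. */
	if (!info || !info->flags_high || info->num_clone_high_words <= 0)
		return 0;
	word = nr / bits - 1;
	if (word >= (unsigned int)info->num_clone_high_words)
		return 0;
	return ((unsigned int)info->flags_high[word] >> (nr % bits)) & 1U;
}
```

	With a 32-bit int, flag 35 would be bit 3 of flags_high[0] under
	this numbering, while flag 8 stays in flags_low.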


Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
