* [RFC][v6][PATCH 0/9] clone_with_pids() syscall
@ 2009-09-10  6:06 Sukadev Bhattiprolu
From: Sukadev Bhattiprolu @ 2009-09-10  6:06 UTC
  To: linux-kernel
  Cc: Oren Laadan, Eric W. Biederman, Alexey Dobriyan, Pavel Emelyanov,
	Andrew Morton, torvalds, mikew, mingo, hpa, Nathan Lynch, arnd,
	container, sukadev


=== NEW CLONE() SYSTEM CALL:

To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed.  When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.

This patchset implements a new system call, clone_with_pids() that lets a
process specify the pids of the child process.

Patches 1 through 6 are helpers, and we believe they are needed for
application restart regardless of how the kernel implements application
restart.

Patch 7 defines do_fork_with_pids(). Patch 8 defines a prototype of the new
system call. Patch 9 adds some documentation on the new system call.

Changelog[v6]:
	- [Nathan Lynch, Arnd Bergmann, H. Peter Anvin, Linus Torvalds]
	  Change 'pid_set.pids' to 'pid_t pids[]' so sizeof(struct pid_set) is
	  constant across architectures (Patches 7, 8).
	- [Nathan Lynch] Change pid_set.num_pids to unsigned and remove
	  'unum_pids < 0' check (Patches 7,8)
	- [Pavel Machek] New patch (Patch 9) to add some documentation.

Changelog[v5]:
	- Make 'pid_max' a property of pid_ns (Integrated Serge Hallyn's patch
	  into this set)
	- (Eric Biederman): Avoid the new function, set_pidmap() - added
	  couple of checks on 'target_pid' in alloc_pidmap() itself.

=== IMPORTANT TODO:

The clone() system call has another limitation: all available bits in
clone_flags are in use, so any new clone flag will need a variant of the
clone() system call.

It appears to make sense to try and extend this new system call to address
this limitation as well. The basic requirements of a new clone system call
could then be summarized as:

	- do everything clone() does today, and
	- give the application the ability to choose pids for the child process
	  in all ancestor pid namespaces, and
	- allow more clone_flags

Constraints:

	- system calls are restricted to 6 parameters and clone() already
	  takes 5 parameters, so any extension to the clone() interface
	  would require one or more copy_from_user() calls.

	- does a copy_from_user() of a few words have a significant impact
	  on the total cost of clone()?

Based on these requirements and constraints, we have been exploring a couple
of system call interfaces and would appreciate any input.

1. =====

	#if BITS_PER_LONG == 64
	#define CLONE_FLAGS_WORDS	1
	#else
	#define CLONE_FLAGS_WORDS	2
	#endif

	struct pid_set {
		int num_pids;
		pid_t pids[];
	};

	typedef struct {
		unsigned long flags[CLONE_FLAGS_WORDS];
	} clone_flags_t;

	int clone_extended(clone_flags_t *flags, void *child_stack, int *unused,
		int *parent_tid, int *child_tid, struct pid_set *pid_set);

	Pros:
		- extendible clone_flags (like sigset_t)

	Cons:
		- copy_from_user() needed on all architectures (we may be able
		  to play some tricks with 'clone_flags_t' to avoid the copy
		  on 64-bit architectures till N_CLONE_FLAGS exceeds 64).

		- Both applications and kernel must use interfaces equivalent
		  to sigsetops(3) to test/set/clear clone flags.

2. =====

	struct clone_info {
		int num_clone_high_words;
		int *flags_high;
		struct pid_set pid_set;
	};

	int clone_extended(int flags_low, void *child_stack, void *unused,
		int *parent_tid, int *child_tid, struct clone_info *clone_info);

	Pros:
		- copy_from_user() needed only for new flags and pid_set

	Cons:
		- splitting the high and low clone-flags is awkward?


Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>


