All of lore.kernel.org
 help / color / mirror / Atom feed
* Checkpoint/restart of ptys, pgids, and controlling tty
@ 2009-09-04 14:20 Oren Laadan
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

This post packs together the patches (6 for kernel, 3 for user) to
provide checkpoint/restart support for ptys, pgids and controlling
terminal. They apply on top of current ckpt-dev-v17 (kernel, user).

I'm not an expert on PTYs, TTYs, line disciplines and the like,
comments are mostly welcome:

* What (additional) values we need to sanitize on restart ?
* What (additional) locking should be in place ? (especially checkpoint)
* Need to save/restore echo buffer (and position) ?
* /dev/ptmx and /dev/pts/... paths are currently hardcoded
* Other security concerns ? ...

Oren.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] c/r: [objhash] add ckpt_obj_reserve()
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-04 14:20   ` Oren Laadan
  2009-09-04 14:20   ` [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave Oren Laadan
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Use ckpt_obj_reserve() during checkpoint to reserve an objref that
will not be otherwise used for any other object in checkpoint. Such an
objref can be useful for restart to add objects that were not there at
checkpoint.

One use case for this is ptys (subsequent patches): when restoring a
master pty, we create a file pointer, which must be kept somewhere and
later cleaned up. The hash is a good place to store it.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/objhash.c       |   20 +++++++++++++++++++-
 include/linux/checkpoint.h |    1 +
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index a9a10d1..022b405 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -465,6 +465,11 @@ static struct ckpt_obj *obj_find_by_objref(struct ckpt_ctx *ctx, int objref)
 	return NULL;
 }
 
+static inline int obj_alloc_objref(struct ckpt_ctx *ctx)
+{
+	return ctx->obj_hash->next_free_objref++;
+}
+
 /**
  * ckpt_obj_new - add an object to the obj_hash
  * @ctx: checkpoint context
@@ -498,7 +503,7 @@ static struct ckpt_obj *obj_new(struct ckpt_ctx *ctx, void *ptr,
 
 	if (!objref) {
 		/* use @obj->ptr to index, assign objref (checkpoint) */
-		obj->objref = ctx->obj_hash->next_free_objref++;;
+		obj->objref = obj_alloc_objref(ctx);
 		i = hash_long((unsigned long) ptr, CKPT_OBJ_HASH_NBITS);
 	} else {
 		/* use @obj->objref to index (restart) */
@@ -637,6 +642,19 @@ int ckpt_obj_lookup_add(struct ckpt_ctx *ctx, void *ptr,
 }
 
 /**
+ * ckpt_obj_reserve - reserve an objref
+ * @ctx: checkpoint context
+ *
+ * The reserved objref will not be used for subsequent objects. This
+ * gives an objref that can be safely used during restart without a
+ * matching object in checkpoint.  [used during checkpoint].
+ */
+int ckpt_obj_reserve(struct ckpt_ctx *ctx)
+{
+	return obj_alloc_objref(ctx);
+}
+
+/**
  * checkpoint_obj - if not already in hash, add object and checkpoint
  * @ctx: checkpoint context
  * @ptr: pointer to object
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 761cad5..1d050f8 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -121,6 +121,7 @@ extern void *ckpt_obj_fetch(struct ckpt_ctx *ctx, int objref,
 			    enum obj_type type);
 extern int ckpt_obj_insert(struct ckpt_ctx *ctx, void *ptr, int objref,
 			   enum obj_type type);
+extern int ckpt_obj_reserve(struct ckpt_ctx *ctx);
 
 extern void ckpt_ctx_get(struct ckpt_ctx *ctx);
 extern void ckpt_ctx_put(struct ckpt_ctx *ctx);
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 14:20   ` [PATCH 1/6] c/r: [objhash] add ckpt_obj_reserve() Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
       [not found]     ` <1252074054-22241-3-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 14:20   ` [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals Oren Laadan
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

During restart, we need to allocate pty slaves with the same
identifiers as recorded during checkpoint. Modify the allocation code
to allow an in-kernel caller to request a specific slave identifier.

For this, add a new field to task_struct - 'required_id'. It will
hold the desired identifier when restoring a (master) pty.

The code in ptmx_open() will use this value only for tasks that try to
open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
isn't CKPT_REQUIRED_NONE (-1).

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/sys.c           |    7 +++++++
 drivers/char/pty.c         |    9 +++++++--
 fs/devpts/inode.c          |   10 ++++++++--
 include/linux/checkpoint.h |    2 ++
 include/linux/devpts_fs.h  |    2 +-
 include/linux/sched.h      |    1 +
 6 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/checkpoint/sys.c b/checkpoint/sys.c
index 525182a..3db18f7 100644
--- a/checkpoint/sys.c
+++ b/checkpoint/sys.c
@@ -339,6 +339,13 @@ SYSCALL_DEFINE3(restart, pid_t, pid, int, fd, unsigned long, flags)
 	return ret;
 }
 
+static int __init checkpoint_init(void)
+{
+	init_task.required_id = CKPT_REQUIRED_NONE;
+	return 0;
+}
+__initcall(checkpoint_init);
+
 
 /* 'ckpt_debug_level' controls the verbosity level of c/r code */
 #ifdef CONFIG_CHECKPOINT_DEBUG
diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index 6e6942c..326c2e9 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -633,12 +633,17 @@ static int __ptmx_open(struct inode *inode, struct file *filp)
 {
 	struct tty_struct *tty;
 	int retval;
-	int index;
+	int index = 0;
 
 	nonseekable_open(inode, filp);
 
+#ifdef CONFIG_CHECKPOINT
+	/* when restarting, request specific index */
+	if (current->flags & PF_RESTARTING)
+		index = (int) current->required_id;
+#endif
 	/* find a device that is not in use. */
-	index = devpts_new_index(inode);
+	index = devpts_new_index(inode, index);
 	if (index < 0)
 		return index;
 
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 75efb02..8921726 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -433,7 +433,7 @@ static struct file_system_type devpts_fs_type = {
  * to the System V naming convention
  */
 
-int devpts_new_index(struct inode *ptmx_inode)
+int devpts_new_index(struct inode *ptmx_inode, int req_idx)
 {
 	struct super_block *sb = pts_sb_from_inode(ptmx_inode);
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
@@ -445,7 +445,8 @@ retry:
 		return -ENOMEM;
 
 	mutex_lock(&allocated_ptys_lock);
-	ida_ret = ida_get_new(&fsi->allocated_ptys, &index);
+	index = (req_idx >= 0 ? req_idx : 0);
+	ida_ret = ida_get_new_above(&fsi->allocated_ptys, index, &index);
 	if (ida_ret < 0) {
 		mutex_unlock(&allocated_ptys_lock);
 		if (ida_ret == -EAGAIN)
@@ -453,6 +454,11 @@ retry:
 		return -EIO;
 	}
 
+	if (req_idx >= 0 && index != req_idx) {
+		ida_remove(&fsi->allocated_ptys, index);
+		mutex_unlock(&allocated_ptys_lock);
+		return -EBUSY;
+	}
 	if (index >= pty_limit) {
 		ida_remove(&fsi->allocated_ptys, index);
 		mutex_unlock(&allocated_ptys_lock);
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 1d050f8..473164e 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -46,6 +46,8 @@
 #define CHECKPOINT_USER_FLAGS		CHECKPOINT_SUBTREE
 #define RESTART_USER_FLAGS		(RESTART_TASKSELF | RESTART_FROZEN)
 
+#define CKPT_REQUIRED_NONE  ((unsigned long) -1L)
+
 extern void exit_checkpoint(struct task_struct *tsk);
 
 extern int ckpt_kwrite(struct ckpt_ctx *ctx, void *buf, int count);
diff --git a/include/linux/devpts_fs.h b/include/linux/devpts_fs.h
index 5ce0e5f..14948ee 100644
--- a/include/linux/devpts_fs.h
+++ b/include/linux/devpts_fs.h
@@ -17,7 +17,7 @@
 
 #ifdef CONFIG_UNIX98_PTYS
 
-int devpts_new_index(struct inode *ptmx_inode);
+int devpts_new_index(struct inode *ptmx_inode, int req_idx);
 void devpts_kill_index(struct inode *ptmx_inode, int idx);
 /* mknod in devpts */
 int devpts_pty_new(struct inode *ptmx_inode, struct tty_struct *tty);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0e67de7..f6f1350 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1481,6 +1481,7 @@ struct task_struct {
 #endif /* CONFIG_TRACING */
 #ifdef CONFIG_CHECKPOINT
 	struct ckpt_ctx *checkpoint_ctx;
+	unsigned long required_id;
 #endif
 };
 
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 14:20   ` [PATCH 1/6] c/r: [objhash] add ckpt_obj_reserve() Oren Laadan
  2009-09-04 14:20   ` [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
       [not found]     ` <1252074054-22241-4-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 14:20   ` [PATCH 4/6] c/r: tighten logic to protect against bogus pids in input Oren Laadan
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

This patch adds support for checkpoint and restart of pseudo terminals
(PTYs). Since PTYs are shared (pointed to by file, and signal), they
are managed via objhash.

PTYs are master/slave pairs; The code arranges for the master to
always be checkpointed first, followed by the slave. This is important
since during restart both ends are created when restoring the master.

In this patch only UNIX98 style PTYs are supported.

Currently only PTYs that are referenced by open files are handled.
Thus PTYs checkpoint starts with a file in tty_file_checkpoint(). It
will first checkpoint the master and slave PTYs via tty_checkpoint(),
and then complete the saving of the file descriptor. This means that
in the image file, the order of objects is: master-tty, slave-tty,
file-desc.

During restart, to restore the master side, we open the /dev/ptmx
device and get a file handle. But at this point we don't know the
designated objref for this file, because the file is due later on in
the image stream. On the other hand, we can't just fput() the file
because it will close the PTY too.

Instead, when we checkpoint the master PTY, we _reserve_ an objref
for the file (which won't be further used in checkpoint). Then at
restart, use it to insert the file to objhash.

TODO:

* Better sanitize input from checkpoint image on restore
* Check the locking when saving/restoring tty_struct state
* Echo position/buffer isn't saved (is it needed ?)
* Handle multiple devpts mounts (namespaces)
* Paths of ptmx and slaves are hard coded (/dev/ptmx, /dev/pts/...)

Changelog[v1]:
  - Adjust include/asm/checkpoint_hdr.h for s390 architecture
  - Add NCC to kernel constants header (ckpt_hdr_const)
  - [Serge Hallyn] fix calculation of canon_datalen

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 arch/s390/include/asm/checkpoint_hdr.h |   11 +
 arch/x86/include/asm/checkpoint_hdr.h  |   11 +
 checkpoint/checkpoint.c                |    2 +
 checkpoint/files.c                     |    6 +
 checkpoint/objhash.c                   |   26 ++
 checkpoint/restart.c                   |    4 +
 drivers/char/pty.c                     |    2 +-
 drivers/char/tty_io.c                  |  515 +++++++++++++++++++++++++++++++-
 include/linux/checkpoint.h             |    5 +
 include/linux/checkpoint_hdr.h         |   78 +++++
 include/linux/tty.h                    |    6 +
 11 files changed, 664 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/checkpoint_hdr.h b/arch/s390/include/asm/checkpoint_hdr.h
index 1976355..b6ea8ce 100644
--- a/arch/s390/include/asm/checkpoint_hdr.h
+++ b/arch/s390/include/asm/checkpoint_hdr.h
@@ -83,13 +83,24 @@ struct ckpt_hdr_mm_context {
 };
 
 #define CKPT_ARCH_NSIG  64
+#define CKPT_TTY_NCC  8
+
+/* arch dependent constants */
 #ifdef __KERNEL__
+
 #include <asm/signal.h>
 #if CKPT_ARCH_NSIG != _SIGCONTEXT_NSIG
 #error CKPT_ARCH_NSIG size is wrong (asm/sigcontext.h and asm/checkpoint_hdr.h)
 #endif
+
+#include <linux/tty.h>
+#if CKPT_TTY_NCC != NCC
+#error CKPT_TTY_NCC size is wrong per asm-generic/termios.h
 #endif
 
+#endif /* __KERNEL__ */
+
+
 struct ckpt_hdr_header_arch {
 	struct ckpt_hdr h;
 };
diff --git a/arch/x86/include/asm/checkpoint_hdr.h b/arch/x86/include/asm/checkpoint_hdr.h
index 1228d1b..7a24de5 100644
--- a/arch/x86/include/asm/checkpoint_hdr.h
+++ b/arch/x86/include/asm/checkpoint_hdr.h
@@ -48,14 +48,25 @@ enum {
 	CKPT_HDR_MM_CONTEXT_LDT,
 };
 
+/* arch dependent constants */
 #define CKPT_ARCH_NSIG  64
+#define CKPT_TTY_NCC  8
+
 #ifdef __KERNEL__
+
 #include <asm/signal.h>
 #if CKPT_ARCH_NSIG != _NSIG
 #error CKPT_ARCH_NSIG size is wrong per asm/signal.h and asm/checkpoint_hdr.h
 #endif
+
+#include <linux/tty.h>
+#if CKPT_TTY_NCC != NCC
+#error CKPT_TTY_NCC size is wrong per asm-generic/termios.h
 #endif
 
+#endif /* __KERNEL__ */
+
+
 struct ckpt_hdr_header_arch {
 	struct ckpt_hdr h;
 	/* FIXME: add HAVE_HWFP */
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index c19f812..fc26d2b 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -201,6 +201,8 @@ static void fill_kernel_const(struct ckpt_constants *h)
 	h->uts_domainname_len = sizeof(uts->domainname);
 	/* rlimit */
 	h->rlimit_nlimits = RLIM_NLIMITS;
+	/* tty */
+	h->tty_termios_ncc = NCC;
 }
 
 /* write the checkpoint header */
diff --git a/checkpoint/files.c b/checkpoint/files.c
index 204055b..b51324a 100644
--- a/checkpoint/files.c
+++ b/checkpoint/files.c
@@ -572,6 +572,12 @@ static struct restore_file_ops restore_file_ops[] = {
 		.file_type = CKPT_FILE_SOCKET,
 		.restore = sock_file_restore,
 	},
+	/* tty */
+	{
+		.file_name = "TTY",
+		.file_type = CKPT_FILE_TTY,
+		.restore = tty_file_restore,
+	},
 };
 
 static struct file *do_restore_file(struct ckpt_ctx *ctx)
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index 022b405..e9ea176 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -246,6 +246,22 @@ static void obj_sock_drop(void *ptr)
 	sock_put((struct sock *) ptr);
 }
 
+static int obj_tty_grab(void *ptr)
+{
+	tty_kref_get((struct tty_struct *) ptr);
+	return 0;
+}
+
+static void obj_tty_drop(void *ptr)
+{
+	tty_kref_put((struct tty_struct *) ptr);
+}
+
+static int obj_tty_users(void *ptr)
+{
+	return atomic_read(&((struct tty_struct *) ptr)->kref.refcount);
+}
+
 static struct ckpt_obj_ops ckpt_obj_ops[] = {
 	/* ignored object */
 	{
@@ -382,6 +398,16 @@ static struct ckpt_obj_ops ckpt_obj_ops[] = {
 		.ref_drop = obj_sock_drop,
 		.ref_grab = obj_sock_grab,
 	},
+	/* struct tty_struct */
+	{
+		.obj_name = "TTY",
+		.obj_type = CKPT_OBJ_TTY,
+		.ref_drop = obj_tty_drop,
+		.ref_grab = obj_tty_grab,
+		.ref_users = obj_tty_users,
+		.checkpoint = checkpoint_tty,
+		.restore = restore_tty,
+	},
 };
 
 
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index ed42b4b..2282ffc 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -19,6 +19,7 @@
 #include <linux/freezer.h>
 #include <linux/magic.h>
 #include <linux/utsname.h>
+#include <linux/termios.h>
 #include <asm/syscall.h>
 #include <linux/elf.h>
 #include <linux/checkpoint.h>
@@ -374,6 +375,9 @@ static int check_kernel_const(struct ckpt_constants *h)
 	/* rlimit */
 	if (h->rlimit_nlimits != RLIM_NLIMITS)
 		return -EINVAL;
+	/* tty */
+	if (h->tty_termios_ncc != NCC)
+		return -EINVAL;
 
 	return 0;
 }
diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index 326c2e9..afdab5e 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -633,7 +633,7 @@ static int __ptmx_open(struct inode *inode, struct file *filp)
 {
 	struct tty_struct *tty;
 	int retval;
-	int index = 0;
+	int index = -1;
 
 	nonseekable_open(inode, filp);
 
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index a3afa0c..b8f8d79 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -106,6 +106,7 @@
 
 #include <linux/kmod.h>
 #include <linux/nsproxy.h>
+#include <linux/checkpoint.h>
 
 #undef TTY_DEBUG_HANGUP
 
@@ -151,6 +152,13 @@ static long tty_compat_ioctl(struct file *file, unsigned int cmd,
 #define tty_compat_ioctl NULL
 #endif
 static int tty_fasync(int fd, struct file *filp, int on);
+#ifdef CONFIG_CHECKPOINT
+static int tty_file_checkpoint(struct ckpt_ctx *ctx, struct file *file);
+static int tty_file_collect(struct ckpt_ctx *ctx, struct file *file);
+#else
+#define tty_file_checkpoint NULL
+#define tty_file_collect NULL
+#endif /* CONFIG_CHECKPOINT */
 static void release_tty(struct tty_struct *tty, int idx);
 static void __proc_set_tty(struct task_struct *tsk, struct tty_struct *tty);
 static void proc_set_tty(struct task_struct *tsk, struct tty_struct *tty);
@@ -417,6 +425,8 @@ static const struct file_operations tty_fops = {
 	.open		= tty_open,
 	.release	= tty_release,
 	.fasync		= tty_fasync,
+	.checkpoint	= tty_file_checkpoint,
+	.collect	= tty_file_collect,
 };
 
 static const struct file_operations console_fops = {
@@ -439,6 +449,8 @@ static const struct file_operations hung_up_tty_fops = {
 	.unlocked_ioctl	= hung_up_tty_ioctl,
 	.compat_ioctl	= hung_up_tty_compat_ioctl,
 	.release	= tty_release,
+	.checkpoint	= tty_file_checkpoint,
+	.collect	= tty_file_collect,
 };
 
 static DEFINE_SPINLOCK(redirect_lock);
@@ -570,7 +582,9 @@ static void do_tty_hangup(struct work_struct *work)
 	set_bit(TTY_HUPPED, &tty->flags);
 	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
 
-	/* Account for the p->signal references we killed */
+	/* Account
+
+	   for the p->signal references we killed */
 	while (refs--)
 		tty_kref_put(tty);
 
@@ -2586,6 +2600,505 @@ static long tty_compat_ioctl(struct file *file, unsigned int cmd,
 }
 #endif
 
+#ifdef CONFIG_CHECKPOINT
+static int tty_can_checkpoint(struct ckpt_ctx *ctx, struct tty_struct *tty)
+{
+	/* only support pty driver */
+	if (tty->driver->type != TTY_DRIVER_TYPE_PTY) {
+		ckpt_write_err(ctx, "unsupported tty type %d",
+			       tty->driver->type);
+		return 0;
+	}
+	/* only support unix98 style */
+	if (tty->driver->major != UNIX98_PTY_MASTER_MAJOR &&
+	    tty->driver->major != UNIX98_PTY_SLAVE_MAJOR) {
+		ckpt_write_err(ctx, "unsupported legacy pty");
+		return 0;
+	}
+	/* only support n_tty ldisc */
+	if (tty->ldisc->ops->num != N_TTY) {
+		ckpt_write_err(ctx, "unsupported ldisc %d",
+			       tty->ldisc->ops->num);
+		return 0;
+	}
+
+	return 1;
+}
+
+static int tty_file_checkpoint(struct ckpt_ctx *ctx, struct file *file)
+{
+	struct ckpt_hdr_file_tty *h;
+	struct tty_struct *tty, *real_tty;
+	struct inode *inode;
+	int master_objref, slave_objref;
+	int ret;
+
+	tty = (struct tty_struct *)file->private_data;
+	inode = file->f_path.dentry->d_inode;
+	if (tty_paranoia_check(tty, inode, "tty_file_checkpoint"))
+		return -EIO;
+
+	if (!tty_can_checkpoint(ctx, tty))
+		return -ENOSYS;
+
+	/*
+	 * If we ever support more than PTYs, this would be tty-type
+	 * specific (and probably called via tty_operations).
+	 */
+
+	real_tty = tty_pair_get_tty(tty);
+	ckpt_debug("tty: %p, real_tty: %p\n", tty, real_tty);
+
+	master_objref = checkpoint_obj(ctx, real_tty->link, CKPT_OBJ_TTY);
+	if (master_objref < 0)
+		return master_objref;
+	slave_objref = checkpoint_obj(ctx, real_tty, CKPT_OBJ_TTY);
+	if (slave_objref < 0)
+		return slave_objref;
+	ckpt_debug("master %p %d, slave %p %d\n",
+		   real_tty->link, master_objref, real_tty, slave_objref);
+
+	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE);
+	if (!h)
+		return -ENOMEM;
+
+	h->common.f_type = CKPT_FILE_TTY;
+	h->tty_objref = (tty == real_tty ? slave_objref : master_objref);
+
+	ret = checkpoint_file_common(ctx, file, &h->common);
+	if (!ret)
+		ret = ckpt_write_obj(ctx, &h->common.h);
+
+	ckpt_hdr_put(ctx, h);
+	return ret;
+}
+
+static int tty_file_collect(struct ckpt_ctx *ctx, struct file *file)
+{
+	struct tty_struct *tty;
+	struct inode *inode;
+	int ret;
+
+	tty = (struct tty_struct *)file->private_data;
+	inode = file->f_path.dentry->d_inode;
+	if (tty_paranoia_check(tty, inode, "tty_collect"))
+		return -EIO;
+
+	if (!tty_can_checkpoint(ctx, tty))
+		return -ENOSYS;
+
+	ckpt_debug("collecting tty: %p\n", tty);
+	ret = ckpt_obj_collect(ctx, tty, CKPT_OBJ_TTY);
+	if (ret < 0)
+		return ret;
+
+	if (tty->driver->subtype == PTY_TYPE_MASTER) {
+		if (!tty->link) {
+			ckpt_write_err(ctx, "missing tty->link\n");
+			return -EIO;
+		}
+		ckpt_debug("collecting slave tty: %p\n", tty->link);
+		ret = ckpt_obj_collect(ctx, tty->link, CKPT_OBJ_TTY);
+	}
+
+	return ret;
+}
+
+#define CKPT_LDISC_BAD   (1 << TTY_LDISC_CHANGING)
+#define CKPT_LDISC_GOOD  ((1 << TTY_LDISC_OPEN) | (1 << TTY_LDISC))
+#define CKPT_LDISC_FLAGS (CKPT_LDISC_GOOD | CKPT_LDISC_BAD)
+
+static int checkpoint_tty_ldisc(struct ckpt_ctx *ctx, struct tty_struct *tty)
+{
+	struct ckpt_hdr_ldisc_n_tty *h;
+	int datalen, read_tail;
+	int n, ret;
+
+	/* shouldn't reach here unless ldisc is n_tty */
+	BUG_ON(tty->ldisc->ops->num != N_TTY);
+
+	if ((tty->flags & CKPT_LDISC_FLAGS) != CKPT_LDISC_GOOD) {
+		ckpt_write_err(ctx, "bad ldisc flags $#lx\n", tty->flags);
+		return -EBUSY;
+	}
+
+	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_TTY_LDISC);
+	if (!h)
+		return -ENOMEM;
+
+	spin_lock_irq(&tty->read_lock);
+	h->column = tty->column;
+	h->datalen = tty->read_cnt;
+	h->canon_column = tty->canon_column;
+	h->canon_datalen = tty->canon_head;
+	if (tty->canon_head > tty->read_tail)
+		h->canon_datalen -= tty->read_tail;
+	else
+		h->canon_datalen += N_TTY_BUF_SIZE - tty->read_tail;
+	h->canon_data = tty->canon_data;
+
+	datalen = tty->read_cnt;
+	read_tail = tty->read_tail;
+	spin_unlock_irq(&tty->read_lock);
+
+	h->minimum_to_wake = tty->minimum_to_wake;
+
+	h->stopped = tty->stopped;
+	h->hw_stopped = tty->hw_stopped;
+	h->flow_stopped = tty->flow_stopped;
+	h->packet = tty->packet;
+	h->ctrl_status = tty->ctrl_status;
+	h->lnext = tty->lnext;
+	h->erasing = tty->erasing;
+	h->raw = tty->raw;
+	h->real_raw = tty->real_raw;
+	h->icanon = tty->icanon;
+	h->closing = tty->closing;
+
+	BUILD_BUG_ON(sizeof(h->read_flags) != sizeof(tty->read_flags));
+	memcpy(h->read_flags, tty->read_flags, sizeof(tty->read_flags));
+
+	ret = ckpt_write_obj(ctx, &h->h);
+	ckpt_hdr_put(ctx, h);
+	if (ret < 0)
+		return ret;
+
+	ckpt_debug("datalen %d\n", datalen);
+	if (datalen) {
+		ret = ckpt_write_buffer(ctx, NULL, datalen);
+		if (ret < 0)
+			return ret;
+		n = min(datalen, N_TTY_BUF_SIZE - read_tail);
+		ret = ckpt_kwrite(ctx, &tty->read_buf[read_tail], n);
+		if (ret < 0)
+			return ret;
+		n = datalen - n;
+		ret = ckpt_kwrite(ctx, tty->read_buf, n);
+	}
+
+	return ret;
+}
+
+#define CKPT_TTY_BAD   ((1 << TTY_CLOSING) | (1 << TTY_FLUSHING))
+#define CKPT_TTY_GOOD  0
+
+static int do_checkpoint_tty(struct ckpt_ctx *ctx, struct tty_struct *tty)
+{
+	struct ckpt_hdr_tty *h;
+	int link_objref;
+	int master = 0;
+	int ret;
+
+	if ((tty->flags & (CKPT_TTY_BAD | CKPT_TTY_GOOD)) != CKPT_TTY_GOOD) {
+		ckpt_write_err(ctx, "bad tty flags %#lx\n", tty->flags);
+		return -EBUSY;
+	}
+
+	/*
+	 * If we ever support more than PTYs, this would be tty-type
+	 * specific (and probably called via tty_operations).
+	 */
+	link_objref = ckpt_obj_lookup(ctx, tty->link, CKPT_OBJ_TTY);
+
+	if (tty->driver->subtype == PTY_TYPE_MASTER)
+		master = 1;
+
+	/* tty is master if-and-only-if link_objref is zero */
+	BUG_ON(master ^ !link_objref);
+
+	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_TTY);
+	if (!h)
+		return -ENOMEM;
+
+	h->driver_type = tty->driver->type;
+	h->driver_subtype = tty->driver->subtype;
+
+	h->link_objref = link_objref;
+
+	/* if master, reserve an objref (see do_restore_tty) */
+	h->file_objref = (master ? ckpt_obj_reserve(ctx) : 0);
+	ckpt_debug("link %d file %d\n", h->link_objref, h->file_objref);
+
+	h->index = tty->index;
+	h->ldisc = tty->ldisc->ops->num;
+	h->flags = tty->flags;
+
+	spin_lock_irq(&tty->ctrl_lock);
+	if (tty->session)
+		h->session = pid_vnr(tty->session);
+	if (tty->pgrp)
+		h->pgrp = pid_vnr(tty->pgrp);
+	spin_unlock_irq(&tty->ctrl_lock);
+	BUG_ON(h->session < 0);
+	BUG_ON(h->pgrp < 0);
+
+	mutex_lock(&tty->termios_mutex);
+	h->termios.c_line = tty->termios->c_line;
+	h->termios.c_iflag = tty->termios->c_iflag;
+	h->termios.c_oflag = tty->termios->c_oflag;
+	h->termios.c_cflag = tty->termios->c_cflag;
+	h->termios.c_lflag = tty->termios->c_lflag;
+	memcpy(h->termios.c_cc, tty->termios->c_cc, NCC);
+	h->winsize.ws_row = tty->winsize.ws_row;
+	h->winsize.ws_col = tty->winsize.ws_col;
+	h->winsize.ws_ypixel = tty->winsize.ws_ypixel;
+	h->winsize.ws_xpixel = tty->winsize.ws_xpixel;
+	mutex_unlock(&tty->termios_mutex);
+
+	ret  = ckpt_write_obj(ctx, &h->h);
+	ckpt_hdr_put(ctx, h);
+	if (ret < 0)
+		return ret;
+
+	/* save line discipline data (also writes buffer) */
+	if (!test_bit(TTY_HUPPED, &tty->flags))
+		ret = checkpoint_tty_ldisc(ctx, tty);
+
+	return ret;
+}
+
+int checkpoint_tty(struct ckpt_ctx *ctx, void *ptr)
+{
+	return do_checkpoint_tty(ctx, (struct tty_struct *) ptr);
+}
+
+struct file *tty_file_restore(struct ckpt_ctx *ctx, struct ckpt_hdr_file *ptr)
+{
+	struct ckpt_hdr_file_tty *h = (struct ckpt_hdr_file_tty *) ptr;
+	struct tty_struct *tty;
+	struct file *file;
+	char slavepath[16];	/* "/dev/pts/###" */
+	int slavelock;
+	int ret;
+
+	if (ptr->h.type != CKPT_HDR_FILE ||
+	    ptr->h.len != sizeof(*h) || ptr->f_type != CKPT_FILE_TTY)
+		return ERR_PTR(-EINVAL);
+
+	if (h->tty_objref <= 0)
+		return ERR_PTR(-EINVAL);
+
+	tty = ckpt_obj_fetch(ctx, h->tty_objref, CKPT_OBJ_TTY);
+	ckpt_debug("tty %p objref %d\n", tty, h->tty_objref);
+
+	/* at this point the tty should have been restore already */
+	if (IS_ERR(tty))
+		return (struct file *) tty;
+
+	/*
+	 * If we ever support more than PTYs, this would be tty-type
+	 * specific (and probably called via tty_operations).
+	 */
+
+	/*
+	 * If this tty is master, get the corresponding file from
+	 * tty->tty_file. Otherwise, open the slave device.
+	 */
+	if (tty->driver->subtype == PTY_TYPE_MASTER) {
+		file_list_lock();
+		file = list_first_entry(&tty->tty_files,
+					typeof(*file), f_u.fu_list);
+		file_list_unlock();
+		ckpt_debug("master file %p\n", file);
+		get_file(file);
+	} else {
+		sprintf(slavepath, "/dev/pts/%d", tty->index);
+		slavelock = test_bit(TTY_PTY_LOCK, &tty->link->flags);
+		clear_bit(TTY_PTY_LOCK, &tty->link->flags);
+		file = filp_open(slavepath, O_RDWR | O_NOCTTY, 0);
+		ckpt_debug("slave file %p (idnex %d)\n", file, tty->index);
+		if (IS_ERR(file))
+			return file;
+		if (slavelock)
+			set_bit(TTY_PTY_LOCK, &tty->link->flags);
+	}
+
+	ret = restore_file_common(ctx, file, ptr);
+	if (ret < 0) {
+		fput(file);
+		file = ERR_PTR(ret);
+	}
+
+	return file;
+}
+
+static int restore_tty_ldisc(struct ckpt_ctx *ctx,
+			     struct tty_struct *tty,
+			     struct ckpt_hdr_tty *hh)
+{
+	struct ckpt_hdr_ldisc_n_tty *h;
+	int ret;
+
+	h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_TTY_LDISC);
+	if (IS_ERR(h))
+		return PTR_ERR(h);
+
+	/* this is unfair shortcut, because we know ldisc is n_tty */
+	if (hh->ldisc != N_TTY)
+		return -EINVAL;
+	if ((hh->flags & CKPT_LDISC_FLAGS) != CKPT_LDISC_GOOD)
+		return -EINVAL;
+
+	if (h->datalen > N_TTY_BUF_SIZE)
+		return -EINVAL;
+	if (h->canon_datalen > N_TTY_BUF_SIZE)
+		return -EINVAL;
+
+	if (h->datalen) {
+		ret = _ckpt_read_buffer(ctx, tty->read_buf, h->datalen);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* TODO: sanitize all these values ? */
+
+	spin_lock_irq(&tty->read_lock);
+	tty->column = h->column;
+	tty->read_cnt = h->datalen;
+	tty->read_head = h->datalen;
+	tty->read_tail = 0;
+	tty->canon_column = h->canon_column;
+	tty->canon_head = h->canon_datalen;
+	tty->canon_data = h->canon_data;
+	spin_unlock_irq(&tty->read_lock);
+
+	tty->minimum_to_wake = h->minimum_to_wake;
+
+	tty->stopped = h->stopped;
+	tty->hw_stopped = h->hw_stopped;
+	tty->flow_stopped = h->flow_stopped;
+	tty->packet = h->packet;
+	tty->ctrl_status = h->ctrl_status;
+	tty->lnext = h->lnext;
+	tty->erasing = h->erasing;
+	tty->raw = h->raw;
+	tty->real_raw = h->real_raw;
+	tty->icanon = h->icanon;
+	tty->closing = h->closing;
+
+	BUILD_BUG_ON(sizeof(h->read_flags) != sizeof(tty->read_flags));
+	memcpy(tty->read_flags, h->read_flags, sizeof(tty->read_flags));
+
+	return 0;
+}
+
+#define CKPT_PTMX_PATH  "/dev/ptmx"
+
+static struct tty_struct *do_restore_tty(struct ckpt_ctx *ctx)
+{
+	struct ckpt_hdr_tty *h;
+	struct tty_struct *tty = ERR_PTR(-EINVAL);
+	struct file *file = NULL;
+	int master, ret;
+
+	h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_TTY);
+	if (IS_ERR(h))
+		return (struct tty_struct *) h;
+
+	if (h->driver_type != TTY_DRIVER_TYPE_PTY)
+		goto out;
+	if (h->driver_subtype == PTY_TYPE_MASTER)
+		master = 1;
+	else if (h->driver_subtype == PTY_TYPE_SLAVE)
+		master = 0;
+	else
+		goto out;
+	/* @link_object is positive if-and-only-if tty is not master */
+	if (h->link_objref < 0 || (master ^ !h->link_objref))
+		goto out;
+	/* @file_object is positive if-and-only-if tty is master */
+	if (h->file_objref < 0 || (master ^ !!h->file_objref))
+		goto out;
+	if (h->session < 0 || h->pgrp < 0)
+		goto out;
+	if (h->flags & CKPT_TTY_BAD)
+		goto out;
+	/* hung-up tty cannot be master, or have session or pgrp */
+	if (test_bit(TTY_HUPPED, (unsigned long *) &h->flags) &&
+	    (master || h->session || h->pgrp))
+		goto out;
+
+	ckpt_debug("sanity checks passed, link %d\n", h->link_objref);
+
+	/*
+	 * If we ever support more than PTYs, this would be tty-type
+	 * specific (and probably called via tty_operations).
+	 */
+	if (master) {
+		current->required_id = h->index;
+		file = filp_open(CKPT_PTMX_PATH, O_RDWR | O_NOCTTY, 0);
+		current->required_id = CKPT_REQUIRED_NONE;
+		ckpt_debug("master file %p (obj %d)\n", file, h->file_objref);
+		if (IS_ERR(file)) {
+			tty = (struct tty_struct *) file;
+			goto out;
+		}
+
+		/*
+		 * Add file to objhash to ensure proper cleanup later
+		 * (it isn't referenced elsewhere). Use h->file_objref
+		 * which was explicitly during checkpoint for this.
+		 */
+		ret = ckpt_obj_insert(ctx, file, h->file_objref, CKPT_OBJ_FILE);
+		if (ret < 0) {
+			tty = ERR_PTR(ret);
+			fput(file);
+			goto out;
+		}
+
+		tty = file->private_data;
+	} else {
+		tty = ckpt_obj_fetch(ctx, h->link_objref, CKPT_OBJ_TTY);
+		if (IS_ERR(tty))
+			goto out;
+		tty = tty->link;
+	}
+
+	ckpt_debug("tty %p (hup %d)\n",
+		   tty, test_bit(TTY_HUPPED, (unsigned long *) &h->flags));
+
+	/* we now have the desired tty: restore its state as per @h */
+
+	spin_lock_irq(&tty->ctrl_lock);
+	tty->session = find_get_pid(h->session);
+	tty->pgrp = find_get_pid(h->pgrp);
+	spin_unlock_irq(&tty->ctrl_lock);
+
+	mutex_lock(&tty->termios_mutex);
+	tty->termios->c_line = h->termios.c_line;
+	tty->termios->c_iflag = h->termios.c_iflag;
+	tty->termios->c_oflag = h->termios.c_oflag;
+	tty->termios->c_cflag = h->termios.c_cflag;
+	tty->termios->c_lflag = h->termios.c_lflag;
+	memcpy(tty->termios->c_cc, h->termios.c_cc, NCC);
+	tty->winsize.ws_row = h->winsize.ws_row;
+	tty->winsize.ws_col = h->winsize.ws_col;
+	tty->winsize.ws_ypixel = h->winsize.ws_ypixel;
+	tty->winsize.ws_xpixel = h->winsize.ws_xpixel;
+	mutex_unlock(&tty->termios_mutex);
+
+	if (test_bit(TTY_HUPPED, (unsigned long *) &h->flags))
+		tty_vhangup(tty);
+	else {
+		ret = restore_tty_ldisc(ctx, tty, h);
+		if (ret < 0) {
+			tty = ERR_PTR(ret);
+			goto out;
+		}
+	}
+
+	tty_kref_get(tty);
+ out:
+	ckpt_hdr_put(ctx, h);
+	return tty;
+}
+
+void *restore_tty(struct ckpt_ctx *ctx)
+{
+	return (void *) do_restore_tty(ctx);
+}
+#endif /* COFNIG_CHECKPOINT */
+
 /*
  * This implements the "Secure Attention Key" ---  the idea is to
  * prevent trojan horses by killing all processes associated with this
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 473164e..5e90ef9 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -265,6 +265,11 @@ extern int restore_obj_signal(struct ckpt_ctx *ctx, int signal_objref);
 extern int checkpoint_task_signal(struct ckpt_ctx *ctx, struct task_struct *t);
 extern int restore_task_signal(struct ckpt_ctx *ctx);
 
+/* ttys */
+extern int checkpoint_tty(struct ckpt_ctx *ctx, void *ptr);
+extern void *restore_tty(struct ckpt_ctx *ctx);
+
+
 /* useful macros to copy fields and buffers to/from ckpt_hdr_xxx structures */
 #define CKPT_CPT 1
 #define CKPT_RST 2
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 06bc6e2..c0eec49 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -78,6 +78,8 @@ enum {
 	CKPT_HDR_FILE_NAME,
 	CKPT_HDR_FILE,
 	CKPT_HDR_PIPE_BUF,
+	CKPT_HDR_TTY,
+	CKPT_HDR_TTY_LDISC,
 
 	CKPT_HDR_MM = 401,
 	CKPT_HDR_VMA,
@@ -136,6 +138,7 @@ enum obj_type {
 	CKPT_OBJ_USER,
 	CKPT_OBJ_GROUPINFO,
 	CKPT_OBJ_SOCK,
+	CKPT_OBJ_TTY,
 	CKPT_OBJ_MAX
 };
 
@@ -156,6 +159,8 @@ struct ckpt_constants {
 	__u16 uts_domainname_len;
 	/* rlimit */
 	__u16 rlimit_nlimits;
+	/* tty */
+	__u16 tty_termios_ncc;
 } __attribute__((aligned(8)));
 
 /* checkpoint image header */
@@ -344,6 +349,7 @@ enum file_type {
 	CKPT_FILE_PIPE,
 	CKPT_FILE_FIFO,
 	CKPT_FILE_SOCKET,
+	CKPT_FILE_TTY,
 	CKPT_FILE_MAX
 };
 
@@ -638,6 +644,78 @@ struct ckpt_hdr_ipc_sem {
 } __attribute__((aligned(8)));
 
 
+/* devices */
+struct ckpt_hdr_file_tty {
+	struct ckpt_hdr_file common;
+	__s32 tty_objref;
+};
+
+struct ckpt_hdr_tty {
+	struct ckpt_hdr h;
+
+	__u16 driver_type;
+	__u16 driver_subtype;
+
+	__s32 link_objref;
+	__s32 file_objref;
+	__u32 _padding;
+
+	__u32 index;
+	__u32 ldisc;
+	__u64 flags;
+
+	__s32 session;
+	__s32 pgrp;
+
+	/* termios */
+	struct {
+		__u16 c_iflag;
+		__u16 c_oflag;
+		__u16 c_cflag;
+		__u16 c_lflag;
+		__u8 c_line;
+		__u8 c_cc[CKPT_TTY_NCC];
+	} __attribute__((aligned(8))) termios;
+
+	/* winsize */
+	struct {
+		__u16 ws_row;
+		__u16 ws_col;
+		__u16 ws_xpixel;
+		__u16 ws_ypixel;
+	} __attribute__((aligned(8))) winsize;
+} __attribute__((aligned(8)));
+
+struct ckpt_hdr_ldisc_n_tty {
+	struct ckpt_hdr h;
+
+	__u32 column;
+	__u32 datalen;
+	__u32 canon_column;
+	__u32 canon_datalen;
+	__u32 canon_data;
+
+	__u16 minimum_to_wake;
+
+	__u8 stopped;
+	__u8 hw_stopped;
+	__u8 flow_stopped;
+	__u8 packet;
+	__u8 ctrl_status;
+	__u8 lnext;
+	__u8 erasing;
+	__u8 raw;
+	__u8 real_raw;
+	__u8 icanon;
+	__u8 closing;
+	__u8 padding[3];
+
+	__u8 read_flags[N_TTY_BUF_SIZE / 8];
+
+	/* if @datalen > 0, buffer contents follow (next object) */
+} __attribute__((aligned(8)));
+
+
 #define CKPT_TST_OVERFLOW_16(a, b) \
 	((sizeof(a) > sizeof(b)) && ((a) > SHORT_MAX))
 
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 1488d8c..fd77894 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -467,6 +467,12 @@ extern void tty_ldisc_begin(void);
 /* This last one is just for the tty layer internals and shouldn't be used elsewhere */
 extern void tty_ldisc_enable(struct tty_struct *tty);
 
+#ifdef CONFIG_CHECKPOINT
+struct ckpt_ctx;
+struct ckpt_hdr_file;
+extern struct file *tty_file_restore(struct ckpt_ctx *ctx,
+				     struct ckpt_hdr_file *ptr);
+#endif
 
 /* n_tty.c */
 extern struct tty_ldisc_ops tty_ldisc_N_TTY;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] c/r: tighten logic to protect against bogus pids in input
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-09-04 14:20   ` [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
       [not found]     ` <1252074054-22241-5-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 14:20   ` [PATCH 1/3] mktree: add support to ghost tasks (TASK_GHOST) Oren Laadan
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

During restart, each process that completes its restore will wake up
the next process in line, using the pid provided in the image file.
A malicious user could provide bogus pids and cause arbitrary tasks to
be woken up.

This patch tightens the checks and requires that to wake-up a task by
pid during restart, the target task must belong to (have) the same
restart context (task->checkpoint_ctx) as the current process.

Also, update in-code comments about restart and zombie logic.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/restart.c |   35 ++++++++++++++++++++---------------
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index 2282ffc..840b5cb 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -462,12 +462,12 @@ static int restore_read_tree(struct ckpt_ctx *ctx)
 		return PTR_ERR(h);
 
 	ret = -EINVAL;
-	if (h->nr_tasks < 0)
+	if (h->nr_tasks <= 0)
 		goto out;
 
 	ctx->nr_pids = h->nr_tasks;
 	size = sizeof(*ctx->pids_arr) * ctx->nr_pids;
-	if (size < 0)		/* overflow ? */
+	if (size <= 0)		/* overflow ? */
 		goto out;
 
 	ctx->pids_arr = kmalloc(size, GFP_KERNEL);
@@ -520,8 +520,11 @@ int ckpt_activate_next(struct ckpt_ctx *ctx)
 
 	rcu_read_lock();
 	task = find_task_by_pid_ns(pid, ctx->root_nsproxy->pid_ns);
-	if (task)
+	/* target task must have same restart context */
+	if (task && task->checkpoint_ctx == ctx)
 		wake_up_process(task);
+	else
+		task = NULL;
 	rcu_read_unlock();
 
 	if (!task) {
@@ -568,9 +571,8 @@ static int do_restore_task(void)
 	int ret;
 
 	/*
-	 * Wait for coordinator to make become visible, then grab a
-	 * reference to its restart context. If we're the last task to
-	 * do it, notify the coordinator.
+	 * Wait for coordinator to become visible, then grab a
+	 * reference to its restart context.
 	 */
 	ret = wait_event_interruptible(waitq, current->checkpoint_ctx);
 	if (ret < 0)
@@ -610,8 +612,9 @@ static int do_restore_task(void)
 		goto out;
 
 	/*
-	 * zombie: we're done here; Save @ctx on task_struct, to be
-	 * used to ckpt_activate_next(), and released, from do_exit().
+	 * zombie: we're done here; do_exit() will notice the @ctx on
+	 * our current->checkpoint_ctx (and our PF_RESTARTING) - it
+	 * will call ckpt_activate_next() and release the @ctx.
 	 */
 	if (ret)
 		do_exit(current->exit_code);
@@ -638,11 +641,11 @@ static int do_restore_task(void)
 }
 
 /**
- * prepare_descendants - set ->restart_tsk of all descendants
+ * prepare_descendants - set ->checkpoint_ctx of all descendants
  * @ctx: checkpoint context
  * @root: root process for restart
  *
- * Called by the coodinator to set the ->restart_tsk pointer of the
+ * Called by the coodinator to set the ->checkpoint_ctx pointer of the
  * root task and all its descendants.
  */
 static int prepare_descendants(struct ckpt_ctx *ctx, struct task_struct *root)
@@ -662,7 +665,7 @@ static int prepare_descendants(struct ckpt_ctx *ctx, struct task_struct *root)
 			break;
 		}
 		/*
-		 * Set task->restart_tsk of all non-zombie descendants.
+		 * Set task->checkpoint_ctx of all non-zombie descendants.
 		 * If a descendant already has a ->checkpoint_ctx, it
 		 * must be a coordinator (for a different restart ?) so
 		 * we fail.
@@ -727,6 +730,7 @@ static int wait_all_tasks_finish(struct ckpt_ctx *ctx)
 
 	init_completion(&ctx->complete);
 
+	BUG_ON(ctx->active_pid != -1);
 	ret = ckpt_activate_next(ctx);
 	if (ret < 0)
 		return ret;
@@ -839,7 +843,7 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
 		if (ret < 0)
 			goto out;
 	} else {
-		/* prepare descendants' t->restart_tsk point to coord */
+		/* prepare descendants' t->checkpoint_ctx point to coord */
 		ret = prepare_descendants(ctx, ctx->root_task);
 		if (ret < 0)
 			goto out;
@@ -970,11 +974,12 @@ void exit_checkpoint(struct task_struct *tsk)
 {
 	struct ckpt_ctx *ctx;
 
-	ctx = tsk->checkpoint_ctx;
-	tsk->checkpoint_ctx = NULL;
+	/* no one else will touch this, because @tsk is dead already */
+	ctx = xchg(&tsk->checkpoint_ctx, NULL);
 
-	/* restarting zombies will acrivate next task in restart */
+	/* restarting zombies will activate next task in restart */
 	if (tsk->flags & PF_RESTARTING) {
+		BUG_ON(ctx->active_pid == -1);
 		if (ckpt_activate_next(ctx) < 0)
 			pr_warning("c/r: [%d] failed zombie exit\n", tsk->pid);
 	}
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 1/3] mktree: add support to ghost tasks (TASK_GHOST)
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2009-09-04 14:20   ` [PATCH 4/6] c/r: tighten logic to protect against bogus pids in input Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
  2009-09-04 14:20   ` [PATCH 2/3] test/pgrp.c: add test case for process-groups Oren Laadan
  2009-09-04 14:20   ` [PATCH 3/3] restart: correctly handle pgid/ppid/sid = 0 Oren Laadan
  6 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 restart.c |  123 +++++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 80 insertions(+), 43 deletions(-)

diff --git a/restart.c b/restart.c
index 2d8d796..a7f9d22 100644
--- a/restart.c
+++ b/restart.c
@@ -144,12 +144,13 @@ struct task {
 	pid_t real_parent;	/* pid of task's real parent */
 };
 
-#define TASK_ROOT	0x1	/* */
-#define TASK_DEAD	0x2	/* */
-#define TASK_THREAD	0x4	/* */
-#define TASK_SIBLING	0x8	/* */
-#define TASK_SESSION	0x10	/* */
-#define TASK_NEWPID	0x20	/* */
+#define TASK_ROOT	0x1	/* root task */
+#define TASK_GHOST	0x2	/* dead task (pid used as sid/pgid) */
+#define TASK_THREAD	0x4	/* thread (non leader) */
+#define TASK_SIBLING	0x8	/* creator's sibling (use CLONE_PARENT) */
+#define TASK_SESSION	0x10	/* inherits creator's original sid */
+#define TASK_NEWPID	0x20	/* starts a new pid namespace */
+#define TASK_DEAD	0x40	/* dead task (dummy) */
 
 struct ckpt_ctx {
 	pid_t init_pid;
@@ -843,10 +844,12 @@ static int ckpt_build_tree(struct ckpt_ctx *ctx)
 			ckpt_dbg_cont(" prev %d", task->prev_sib->pid);
 		if (task->phantom)
 			ckpt_dbg_cont(" placeholder %d", task->phantom->pid);
-		ckpt_dbg_cont(" %c%c%c%c",
+		ckpt_dbg_cont(" %c%c%c%c%c%c",
 		       (task->flags & TASK_THREAD) ? 'T' : ' ',
 		       (task->flags & TASK_SIBLING) ? 'P' : ' ',
 		       (task->flags & TASK_SESSION) ? 'S' : ' ',
+		       (task->flags & TASK_NEWPID) ? 'N' : ' ',
+		       (task->flags & TASK_GHOST) ? 'G' : ' ',
 		       (task->flags & TASK_DEAD) ? 'D' : ' ');
 		ckpt_dbg_cont("\n");
 	}
@@ -855,7 +858,7 @@ static int ckpt_build_tree(struct ckpt_ctx *ctx)
 	return 0;
 }		
 
-static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid)
+static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid, pid_t ppid)
 {
 	struct task *task;
 
@@ -864,12 +867,13 @@ static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid)
 
 	task = &ctx->tasks_arr[ctx->tasks_nr++];
 
-	task->flags = TASK_DEAD;
+	task->flags = TASK_GHOST;
 
+	/* */
 	task->pid = pid;
-	task->ppid = ckpt_init_task(ctx)->pid;
+	task->ppid = ppid;
 	task->tgid = pid;
-	task->sid = pid;
+	task->sid = ppid;
 
 	task->children = NULL;
 	task->next_sib = NULL;
@@ -892,36 +896,38 @@ static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid)
 
 static int ckpt_init_tree(struct ckpt_ctx *ctx)
 {
+	struct ckpt_hdr_pids *pids_arr = ctx->pids_arr;
+	int pids_nr = ctx->pids_nr;
 	struct task *task;
-	pid_t init_sid;
-	pid_t init_pid;
-	pid_t init_pgid;
+	pid_t root_sid;
+	pid_t root_pid;
+	pid_t root_pgid;
 	int i;
 
-	init_pid = ctx->pids_arr[0].vpid;
-	init_sid = ctx->pids_arr[0].vsid;
-	init_pgid = ctx->pids_arr[0].vpgid;
+	root_pid = pids_arr[0].vpid;
+	root_sid = pids_arr[0].vsid;
+	root_pgid = pids_arr[0].vpgid;
 
 	/* XXX for out-of-container subtrees */
-	for (i = 0; i < ctx->pids_nr; i++) {
-		if (ctx->pids_arr[i].vsid == init_sid)
-			ctx->pids_arr[i].vsid = init_pid;
-		if (ctx->pids_arr[i].vpgid == init_sid)
-			ctx->pids_arr[i].vpgid = init_pid;
-		if (ctx->pids_arr[i].vpgid == init_pgid)
-			ctx->pids_arr[i].vpgid = init_pid;
+	for (i = 0; i < pids_nr; i++) {
+		if (pids_arr[i].vsid == root_sid)
+			pids_arr[i].vsid = root_pid;
+		if (pids_arr[i].vpgid == root_sid)
+			pids_arr[i].vpgid = root_pid;
+		if (pids_arr[i].vpgid == root_pgid)
+			pids_arr[i].vpgid = root_pid;
 	}
 
 	/* populate with known tasks */
-	for (i = 0; i < ctx->pids_nr; i++) {
+	for (i = 0; i < pids_nr; i++) {
 		task = &ctx->tasks_arr[i];
 
 		task->flags = 0;
 
-		task->pid = ctx->pids_arr[i].vpid;
-		task->ppid = ctx->pids_arr[i].vppid;
-		task->tgid = ctx->pids_arr[i].vtgid;
-		task->sid = ctx->pids_arr[i].vsid;
+		task->pid = pids_arr[i].vpid;
+		task->ppid = pids_arr[i].vppid;
+		task->tgid = pids_arr[i].vtgid;
+		task->sid = pids_arr[i].vsid;
 
 		task->children = NULL;
 		task->next_sib = NULL;
@@ -936,19 +942,32 @@ static int ckpt_init_tree(struct ckpt_ctx *ctx)
 			return -1;
 	}
 
-	ctx->tasks_nr = ctx->pids_nr;
+	ctx->tasks_nr = pids_nr;
 
 	/* add pids unaccounted for (no tasks) */
-	for (i = 0; i < ctx->pids_nr; i++) {
-		if (ckpt_setup_task(ctx, ctx->pids_arr[i].vsid) < 0)
+	for (i = 0; i < pids_nr; i++) {
+		/* session leader's parent is root task */
+		if (ckpt_setup_task(ctx, pids_arr->vsid, root_pid) < 0)
 			return -1;
-		if (ckpt_setup_task(ctx, ctx->pids_arr[i].vpgid) < 0)
+
+		/*
+		 * If pgrp != sid, pgrp owner's parent is sid. Other
+		 * tasks with same pgrp will need to have threir sid
+		 * matching, too, when the kernel restores their pgrp.
+		 * If pgrp == sid, then the call above would have
+		 * ensured that the pid is hashed: ckpt_setup_task()
+		 * will return promptly.
+		 */
+		if (ckpt_setup_task(ctx, pids_arr->vpgid, pids_arr->vsid) < 0)
 			return -1;
+
+		pids_arr++;
 	}
 
 	/* mark root task(s) */
 	ctx->tasks_arr[0].flags |= TASK_ROOT;
 
+	ckpt_dbg("total tasks (including ghosts): %d\n", ctx->tasks_nr);
 	return 0;
 }
 
@@ -970,10 +989,11 @@ static int ckpt_init_tree(struct ckpt_ctx *ctx)
  * flags, pid, tgid, sid, pgid, and pointers to the a creator, next
  * and previous sibling, and first child task. Note that the creator
  * may not necessarily correspond to the parent. The possible flags
- * are TASK_DEAD, TASK_THREAD, TASK_SESSION (that asks inherit a
- * session id), and TASK_SIBLING (that asks to inherit the parent via
- * CLONE_PARENT). The algorithm loops through all the entries in the
- * table:
+ * are TASK_ROOT, TASK_GHOST, TASK_THREAD, TASK_SIBLING (that asks to
+ * inherit the parent via CLONE_PARENT), TASK_SESSION (that asks to
+ * inherit a session id), TASK_NEWPID (that asks to start a new pid
+ * namespace), and TASK_DEAD. The algorithm loops through all the
+ * entries in the table:
  *
  * If the entry is a thread and not the thread group leader, we set
  * the creator to be the thread group leader and set TASK_THREAD.
@@ -1023,7 +1043,7 @@ static int ckpt_set_creator(struct ckpt_ctx *ctx, struct task *task)
 	struct task *creator;
 
 	if (task == ckpt_init_task(ctx)) {
-		ckpt_err("pid %d: init - no creator\n", ckpt_init_task(ctx)->pid);
+		ckpt_err("pid %d: no init creator\n", ckpt_init_task(ctx)->pid);
 		return -1;
 	}
 
@@ -1230,6 +1250,7 @@ static int ckpt_make_tree(struct ckpt_ctx *ctx, struct task *task)
 {
 	struct task *child;
 	struct pid_swap swap;
+	unsigned long flags = 0;
 	pid_t newpid;
 	int ret;
 
@@ -1307,9 +1328,20 @@ static int ckpt_make_tree(struct ckpt_ctx *ctx, struct task *task)
 	}
 	close(ctx->pipe_out);
 
+	/*
+	 * Ghost tasks are not restarted and end up dead, but their
+	 * pids are referred to by other tasks' pgids (also sids, that
+	 * are already properly set by now). Therefore, they stick
+	 * around until those tasks actually restore their pgrp, and
+	 * then exit (more precisely, killed). The RESTART_GHOST flag
+	 * tells the kernel that they are not to be restored.
+	 */
+	if (task->flags & TASK_GHOST)
+		flags |= RESTART_GHOST;
+
 	/* on success this doesn't return */
-	ckpt_dbg("about to call sys_restart()\n");
-	ret = restart(0, STDIN_FILENO, 0);
+	ckpt_dbg("about to call sys_restart(), flags %#lx\n", flags);
+	ret = restart(0, STDIN_FILENO, flags);
 	if (ret < 0)
 		perror("task restore failed");
 	return ret;
@@ -1562,7 +1594,10 @@ static int ckpt_adjust_pids(struct ckpt_ctx *ctx)
 	memcpy(ctx->copy_arr, ctx->pids_arr, len);
 
 	/* read in 'pid_swap' data and adjust ctx->pids_arr */
-	for (n = 0; n < ctx->pids_nr; n++) {
+	for (n = 0; n < ctx->tasks_nr; n++) {
+		/* don't expect data from dead tasks */
+		if (ctx->tasks_arr[n].flags & TASK_DEAD)
+			continue;
 		ret = read(ctx->pipe_in, &swap, sizeof(swap));
 		if (ret < 0)
 			ckpt_abort(ctx, "read pipe");
@@ -1577,8 +1612,10 @@ static int ckpt_adjust_pids(struct ckpt_ctx *ctx)
 				ctx->copy_arr[m].vpid = swap.new;
 			if (ctx->pids_arr[m].vtgid == swap.old)
 				ctx->copy_arr[m].vtgid = swap.new;
-			if (ctx->pids_arr[m].vppid == swap.old)
-				ctx->copy_arr[m].vppid = swap.new;
+			if (ctx->pids_arr[m].vpgid == swap.old)
+				ctx->copy_arr[m].vpgid = swap.new;
+			else if (ctx->pids_arr[m].vpgid == -swap.old)
+				ctx->copy_arr[m].vpgid = -swap.new;
 		}
 	}
 
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] test/pgrp.c: add test case for process-groups
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
                     ` (4 preceding siblings ...)
  2009-09-04 14:20   ` [PATCH 1/3] mktree: add support to ghost tasks (TASK_GHOST) Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
  2009-09-04 14:20   ` [PATCH 3/3] restart: correctly handle pgid/ppid/sid = 0 Oren Laadan
  6 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

---
 test/Makefile |    2 +-
 test/pgrp.c   |   79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 1 deletions(-)
 create mode 100644 test/pgrp.c

diff --git a/test/Makefile b/test/Makefile
index 0ad576f..cad40e0 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -7,7 +7,7 @@ CFLAGS += -g $(WARNS)
 
 TESTS=	onetask multitask bigmem pipes pipes2 fifo shmem \
 	ipcshm ipcmsq ipcsem ipcshm_multi zombie sigpending \
-	itimer pty
+	itimer pty pgrp
 
 LDLIBS = -lm
 
diff --git a/test/pgrp.c b/test/pgrp.c
new file mode 100644
index 0000000..e685d12
--- /dev/null
+++ b/test/pgrp.c
@@ -0,0 +1,79 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#define OUTFILE  "/tmp/cr-test.out"
+
+pid_t do_fork(void)
+{
+	pid_t pid = fork();
+
+	if (pid < 0) {
+		perror("fork");
+		exit(1);
+	}
+
+	return pid;
+}
+
+int main(int argc, char *argv[])
+{
+	FILE *file;
+	pid_t p1, p2;
+	int i;
+
+	close(0);
+	close(1);
+	close(2);
+
+	setsid();
+
+	unlink(OUTFILE);
+	file = fopen(OUTFILE, "w+");
+	if (!file) {
+		perror("open");
+		exit(1);
+	}
+	if (dup2(0,2) < 0) {
+		perror("dup2");
+		exit(1);
+	}
+
+	fprintf(file, "hello, world\n");
+	fflush(file);
+
+	p1 = do_fork();
+	if (p1 == 0) {
+		fprintf(file, "[%d] child born 0\n", getpid());
+		fflush(file);
+		sleep(1);
+		fprintf(file, "[%d] child exit 0\n", getpid());
+		exit(0);
+	}
+
+	p2 = do_fork();
+	if (p2 == 0) {
+		fprintf(file, "[%d] child born 1\n", getpid());
+		sleep(1);
+	} else {
+		if (setpgid(p2, p1) < 0) {
+			perror("setpgid");
+			exit(1);
+		}
+	}
+
+	for (i = 0; i < 15; i++) {
+		fprintf(file, "[%d] count %d\n", getpid(), i);
+		fflush(file);
+		sleep(1);
+		fflush(file);
+	}
+
+	fprintf(file, "[pid %d] world, hello\n", getpid());
+	fflush(file);
+
+	return 0;
+}
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] restart: correctly handle pgid/ppid/sid = 0
       [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
                     ` (5 preceding siblings ...)
  2009-09-04 14:20   ` [PATCH 2/3] test/pgrp.c: add test case for process-groups Oren Laadan
@ 2009-09-04 14:20   ` Oren Laadan
  6 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 14:20 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

If pgid, or sid, or ppid is zero, it means that they point to a task
from the ancestor pid-ns. Make sure 'mktree' handles these correctly.

If a 0 pid (pgid/sid/ppid) exists, then force the --pidns option to be
on - to ensure that we have a new pid-ns (and thus a parent pid-ns).

Ignore pid = 0 when setting up new tasks in ckpt_setup_task().

Improve sanity checks for pid 0 and tgid 0 (both are impossible).

Allow ppid = 0 only for root_task. For others, allow sid = 0.

Introduce dummy zero_task that represents the parent of root_task,
likely from ancestor pid-ns. Setting a creator to root_task simplifies
the logic.

Also, ensure that a the tgid of threads is valid (has an entry in the
pids array) even if the thread group leader died.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 restart.c |  126 +++++++++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 89 insertions(+), 37 deletions(-)

diff --git a/restart.c b/restart.c
index a7f9d22..3128d4e 100644
--- a/restart.c
+++ b/restart.c
@@ -63,8 +63,8 @@ static char usage_str[] =
  * the time of the checkpoint may be already in use then. Therefore,
  * by default, 'mktree' creates an equivalen tree without restoring
  * the original pids, assuming that the application can tolerate this.
- * For this, the 'ckpt_hdr_pids' array is transformed on-the-fly
- * before it is fed to the kernel.
+ * For this, the 'ckpt_pids' array is transformed on-the-fly before it
+ * is fed to the kernel.
  *
  * By default, "--pids" implied "--pidns" and vice-versa. The user can
  * use "--pids --no-pidns" for a restart in the currnet namespace -
@@ -144,6 +144,9 @@ struct task {
 	pid_t real_parent;	/* pid of task's real parent */
 };
 
+/* zero_task represents creator of root_task (all pids 0) */
+struct task zero_task;
+
 #define TASK_ROOT	0x1	/* root task */
 #define TASK_GHOST	0x2	/* dead task (pid used as sid/pgid) */
 #define TASK_THREAD	0x4	/* thread (non leader) */
@@ -153,7 +156,7 @@ struct task {
 #define TASK_DEAD	0x40	/* dead task (dummy) */
 
 struct ckpt_ctx {
-	pid_t init_pid;
+	pid_t root_pid;
 	int pipe_in;
 	int pipe_out;
 	int pids_nr;
@@ -196,6 +199,7 @@ int global_sent_sigint;
 
 static int ckpt_build_tree(struct ckpt_ctx *ctx);
 static int ckpt_init_tree(struct ckpt_ctx *ctx);
+static int ckpt_need_pidns(struct ckpt_ctx *ctx);
 static int ckpt_set_creator(struct ckpt_ctx *ctx, struct task *task);
 static int ckpt_placeholder_task(struct ckpt_ctx *ctx, struct task *task);
 static int ckpt_propagate_session(struct ckpt_ctx *ctx, struct task *session);
@@ -545,6 +549,16 @@ int main(int argc, char *argv[])
 	if (ret < 0)
 		exit(1);
 
+	/*
+	 * For a pgid/sid == 0, the corresponding restarting task will
+	 * expect to reference the parent pid-ns (of entire restart).
+	 * We ensure that one does exist by setting ctx.args->pidns.
+	 */
+	if (!ctx.args->pidns && ckpt_need_pidns(&ctx)) {
+		ckpt_dbg("found pgid/sid 0, need pidns\n");
+		ctx.args->pidns = 1;
+	}
+
 	if (ctx.args->pidns && ctx.tasks_arr[0].pid != 1) {
 		ckpt_dbg("new pidns without init\n");
 		if (global_send_sigint == -1)
@@ -821,8 +835,6 @@ static int ckpt_build_tree(struct ckpt_ctx *ctx)
 	/* assign a creator to each task */
 	for (i = 0; i < ctx->tasks_nr; i++) {
 		task = &ctx->tasks_arr[i];
-		if (task == ckpt_init_task(ctx))
-			continue;
 		if (task->creator)
 			continue;
 		if (ckpt_set_creator(ctx, task) < 0) {
@@ -837,7 +849,7 @@ static int ckpt_build_tree(struct ckpt_ctx *ctx)
 		task = &ctx->tasks_arr[i];
 		ckpt_dbg("[%d] pid %d ppid %d sid %d creator %d",
 			 i, task->pid, task->ppid, task->sid,
-		       task->creator ? task->creator->pid : 0);
+			 task->creator->pid);
 		if (task->next_sib)
 			ckpt_dbg_cont(" next %d", task->next_sib->pid);
 		if (task->prev_sib)
@@ -862,7 +874,10 @@ static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid, pid_t ppid)
 {
 	struct task *task;
 
-	if (hash_lookup(ctx, pid))
+	if (pid == 0)  /* ignore if outside namespace */
+		return 0;
+
+	if (hash_lookup(ctx, pid))  /* already handled */
 		return 0;
 
 	task = &ctx->tasks_arr[ctx->tasks_nr++];
@@ -896,7 +911,7 @@ static int ckpt_setup_task(struct ckpt_ctx *ctx, pid_t pid, pid_t ppid)
 
 static int ckpt_init_tree(struct ckpt_ctx *ctx)
 {
-	struct ckpt_hdr_pids *pids_arr = ctx->pids_arr;
+	struct ckpt_pids *pids_arr = ctx->pids_arr;
 	int pids_nr = ctx->pids_nr;
 	struct task *task;
 	pid_t root_sid;
@@ -908,16 +923,6 @@ static int ckpt_init_tree(struct ckpt_ctx *ctx)
 	root_sid = pids_arr[0].vsid;
 	root_pgid = pids_arr[0].vpgid;
 
-	/* XXX for out-of-container subtrees */
-	for (i = 0; i < pids_nr; i++) {
-		if (pids_arr[i].vsid == root_sid)
-			pids_arr[i].vsid = root_pid;
-		if (pids_arr[i].vpgid == root_sid)
-			pids_arr[i].vpgid = root_pid;
-		if (pids_arr[i].vpgid == root_pgid)
-			pids_arr[i].vpgid = root_pid;
-	}
-
 	/* populate with known tasks */
 	for (i = 0; i < pids_nr; i++) {
 		task = &ctx->tasks_arr[i];
@@ -938,6 +943,14 @@ static int ckpt_init_tree(struct ckpt_ctx *ctx)
 		task->rpid = -1;
 		task->ctx = ctx;
 
+		if (task->pid == 0) {
+			ckpt_err("Invalid pid 0 for task#%d\n", i);
+			return -1;
+		} else if (task->tgid == 0) {
+			ckpt_err("Invalid tgid 0 for task#%d\n", i);
+			return -1;
+		}
+
 		if (hash_insert(ctx, task->pid, task) < 0)
 			return -1;
 	}
@@ -946,31 +959,60 @@ static int ckpt_init_tree(struct ckpt_ctx *ctx)
 
 	/* add pids unaccounted for (no tasks) */
 	for (i = 0; i < pids_nr; i++) {
-		/* session leader's parent is root task */
-		if (ckpt_setup_task(ctx, pids_arr->vsid, root_pid) < 0)
+		pid_t sid;
+
+		sid = pids_arr[i].vsid;
+
+		/*
+		 * An unaccounted-for sid belongs to a task that was a
+		 * session leader and died. We can safe set its parent
+		 * (and creator) to be the root task.
+		 */
+		if (ckpt_setup_task(ctx, sid, root_pid) < 0)
 			return -1;
 
 		/*
-		 * If pgrp != sid, pgrp owner's parent is sid. Other
-		 * tasks with same pgrp will need to have threir sid
-		 * matching, too, when the kernel restores their pgrp.
-		 * If pgrp == sid, then the call above would have
-		 * ensured that the pid is hashed: ckpt_setup_task()
-		 * will return promptly.
+		 * If a pid belongs to a dead thread group leader, we
+		 * need to add it with the same sid as current (and
+		 * other) threads.
 		 */
-		if (ckpt_setup_task(ctx, pids_arr->vpgid, pids_arr->vsid) < 0)
+		if (ckpt_setup_task(ctx, pids_arr[i].vtgid, sid) < 0)
 			return -1;
 
-		pids_arr++;
+		/*
+		 * If pgrp == sid, then the pgrp/sid will already have
+		 * been hashed by now (e.g. by the call above) and the
+		 * ckpt_setup_task() will return promptly.
+		 * If pgrp != sid, then the pgrp 'owner' must have the
+		 * same sid as us: all tasks with same pgrp must have
+		 * their sid matching.
+		 */
+		if (ckpt_setup_task(ctx, pids_arr[i].vpgid, sid) < 0)
+			return -1;
 	}
 
-	/* mark root task(s) */
-	ctx->tasks_arr[0].flags |= TASK_ROOT;
+	/* mark root task(s), and set its "creator" to be zero_task */
+	ckpt_init_task(ctx)->flags |= TASK_ROOT;
+	ckpt_init_task(ctx)->creator = &zero_task;
 
 	ckpt_dbg("total tasks (including ghosts): %d\n", ctx->tasks_nr);
 	return 0;
 }
 
+static int ckpt_need_pidns(struct ckpt_ctx *ctx)
+{
+	int i;
+
+	for (i = 0; i < ctx->pids_nr; i++) {
+		if (ctx->pids_arr[i].vpid == 0 ||
+		    ctx->pids_arr[i].vpgid == 0 ||
+		    ctx->pids_arr[i].vsid == 0)
+			return 1;
+	}
+
+	return 0;
+}
+
 /*
  * Algorithm DumpForest
  * "Transparent Checkpoint/Restart of Multiple Processes on Commodity
@@ -1043,10 +1085,20 @@ static int ckpt_set_creator(struct ckpt_ctx *ctx, struct task *task)
 	struct task *creator;
 
 	if (task == ckpt_init_task(ctx)) {
-		ckpt_err("pid %d: no init creator\n", ckpt_init_task(ctx)->pid);
+		ckpt_err("pid %d: logical error\n", ckpt_init_task(ctx)->pid);
 		return -1;
 	}
 
+	/* only root_task can have ppid == 0, parent must always exist */
+	if (task->ppid == 0 || !parent) {
+		ckpt_err("pid %d: invalid ppid %d\n", task->pid, task->ppid);
+		return -1;
+	}
+
+	/* sid == 0 must have been inherited from outside the container */
+	if (task->sid == 0)
+		session = ckpt_init_task(ctx);
+
 	if (task->tgid != task->pid) {
 		/* thread: creator is thread-group-leader */
 		ckpt_dbg("pid %d: thread tgid %d\n", task->pid, task->tgid);
@@ -1073,7 +1125,7 @@ static int ckpt_set_creator(struct ckpt_ctx *ctx, struct task *task)
 		creator = session->phantom;
 	} else {
 		/* first make sure we know the session's creator */
-		if (!session->creator && session != ckpt_init_task(ctx)) {
+		if (!session->creator) {
 			/* (non-session-leader) recursive: session's creator */
 			ckpt_dbg("pid %d: recursive session creator %d\n",
 			       task->pid, task->sid);
@@ -1206,7 +1258,7 @@ static int ckpt_propagate_session(struct ckpt_ctx *ctx, struct task *task)
 		ckpt_dbg("pid %d: moving up to %d\n", task->pid, creator->pid);
 		task = creator;
 
-		if(!task->creator && task != ckpt_init_task(ctx)) {
+		if(!task->creator) {
 			if (ckpt_set_creator(ctx, task) < 0)
 				return -1;
 		}
@@ -1389,7 +1441,7 @@ int ckpt_fork_stub(void *data)
 
 	/* root has some extra work */
 	if (task->flags & TASK_ROOT) {
-		ctx->init_pid = _getpid();
+		ctx->root_pid = _getpid();
 		ckpt_dbg("root task pid %d\n", _getpid());
 	}
 
@@ -1513,13 +1565,13 @@ static int ckpt_fork_feeder(struct ckpt_ctx *ctx)
 static void ckpt_abort(struct ckpt_ctx *ctx, char *str)
 {
 	perror(str);
-	kill(-(ctx->init_pid), SIGKILL);
+	kill(-(ctx->root_pid), SIGKILL);
 	exit(1);
 }
 
 /*
  * feeder process: delegates checkpoint image stream to the kernel.
- * In '--no-pids' mode, transform the pids array (struct ckpt_hdr_pids)
+ * In '--no-pids' mode, transform the pids array (struct ckpt_pids)
  * on the fly and feed the result to the "init" task of the restart
  */
 static int ckpt_do_feeder(void *data)
@@ -1567,7 +1619,7 @@ static int ckpt_do_feeder(void *data)
 }
 
 /*
- * ckpt_adjust_pids: transform the pids array (struct ckpt_hdr_pids) by
+ * ckpt_adjust_pids: transform the pids array (struct ckpt_pids) by
  * substituing actual pid values for original pid values.
  *
  * Collect pids reported by the newly created tasks; each task sends
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found]     ` <1252074054-22241-3-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-04 15:26       ` Serge E. Hallyn
       [not found]         ` <20090904152644.GA15253-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-04 15:26 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> During restart, we need to allocate pty slaves with the same
> identifiers as recorded during checkpoint. Modify the allocation code
> to allow an in-kernel caller to request a specific slave identifier.
> 
> For this, add a new field to task_struct - 'required_id'. It will
> hold the desired identifier when restoring a (master) pty.
> 
> The code in ptmx_open() will use this value only for tasks that try to
> open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
> isn't CKPT_REQUIRED_NONE (-1).

So noone has indicated any preference for this versus the ptmx_create()
approach...

I'm satisfied knowing we have a working fallback in case task->required_id
is deemed unacceptable.

However I'd like to not have linux-kernel folks think us morons for not
having considered that.  Can you add a message to the changelog saying
why we're going with this approach (namely, that it lets us re-use
filp_open() instead of having to do a custom alloc_file in a new code-path,
which introduces maintenance duplication for file permission checking
paths)?

thanks,
-serge

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found]         ` <20090904152644.GA15253-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-09-04 16:08           ` Oren Laadan
  2009-09-08  8:09           ` Louis Rilling
  1 sibling, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-04 16:08 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>> During restart, we need to allocate pty slaves with the same
>> identifiers as recorded during checkpoint. Modify the allocation code
>> to allow an in-kernel caller to request a specific slave identifier.
>>
>> For this, add a new field to task_struct - 'required_id'. It will
>> hold the desired identifier when restoring a (master) pty.
>>
>> The code in ptmx_open() will use this value only for tasks that try to
>> open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
>> isn't CKPT_REQUIRED_NONE (-1).
> 
> So noone has indicated any preference for this versus the ptmx_create()
> approach...
> 
> I'm satisfied knowing we have a working fallback in case task->required_id
> is deemed unacceptable.
> 
> However I'd like to not have linux-kernel folks think us morons for not
> having considered that.  Can you add a message to the changelog saying
> why we're going with this approach (namely, that it lets us re-use
> filp_open() instead of having to do a custom alloc_file in a new code-path,
> which introduces maintenance duplication for file permission checking
> paths)?

Good point, will add.

Oren.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals
       [not found]     ` <1252074054-22241-4-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-04 16:21       ` Serge E. Hallyn
  2009-09-08  8:50       ` Louis Rilling
  1 sibling, 0 replies; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-04 16:21 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> This patch adds support for checkpoint and restart of pseudo terminals
> (PTYs). Since PTYs are shared (pointed to by file, and signal), they
> are managed via objhash.
> 
> PTYs are master/slave pairs; The code arranges for the master to
> always be checkpointed first, followed by the slave. This is important
> since during restart both ends are created when restoring the master.
> 
> In this patch only UNIX98 style PTYs are supported.
> 
> Currently only PTYs that are referenced by open files are handled.
> Thus PTYs checkpoint starts with a file in tty_file_checkpoint(). It
> will first checkpoint the master and slave PTYs via tty_checkpoint(),
> and then complete the saving of the file descriptor. This means that
> in the image file, the order of objects is: master-tty, slave-tty,
> file-desc.
> 
> During restart, to restore the master side, we open the /dev/ptmx
> device and get a file handle. But at this point we don't know the
> designated objref for this file, because the file is due later on in
> the image stream. On the other hand, we can't just fput() the file
> because it will close the PTY too.
> 
> Instead, when we checkpoint the master PTY, we _reserve_ an objref
> for the file (which won't be further used in checkpoint). Then at
> restart, use it to insert the file to objhash.
> 
> TODO:
> 
> * Better sanitize input from checkpoint image on restore
> * Check the locking when saving/restoring tty_struct state
> * Echo position/buffer isn't saved (is it needed ?)
> * Handle multiple devpts mounts (namespaces)
> * Paths of ptmx and slaves are hard coded (/dev/ptmx, /dev/pts/...)
> 
> Changelog[v1]:
>   - Adjust include/asm/checkpoint_hdr.h for s390 architecture
>   - Add NCC to kernel constants header (ckpt_hdr_const)
>   - [Serge Hallyn] fix calculation of canon_datalen
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

First three patches can get

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Tested-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] c/r: tighten logic to protect against bogus pids in input
       [not found]     ` <1252074054-22241-5-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-08  1:09       ` Serge E. Hallyn
  0 siblings, 0 replies; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-08  1:09 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> During restart, each process that completes its restore will wake up
> the next process in line, using the pid provided in the image file.
> A malicious user could provide bogus pids and cause arbitrary tasks to
> be woken up.
> 
> This patch tightens the checks and requires that to wake-up a task by
> pid during restart, the target task must belong to (have) the same
> restart context (task->checkpoint_ctx) as the current process.
> 
> Also, update in-code comments about restart and zombie logic.
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  checkpoint/restart.c |   35 ++++++++++++++++++++---------------
>  1 files changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> index 2282ffc..840b5cb 100644
> --- a/checkpoint/restart.c
> +++ b/checkpoint/restart.c
> @@ -462,12 +462,12 @@ static int restore_read_tree(struct ckpt_ctx *ctx)
>  		return PTR_ERR(h);
> 
>  	ret = -EINVAL;
> -	if (h->nr_tasks < 0)
> +	if (h->nr_tasks <= 0)
>  		goto out;
> 
>  	ctx->nr_pids = h->nr_tasks;
>  	size = sizeof(*ctx->pids_arr) * ctx->nr_pids;
> -	if (size < 0)		/* overflow ? */
> +	if (size <= 0)		/* overflow ? */
>  		goto out;
> 
>  	ctx->pids_arr = kmalloc(size, GFP_KERNEL);
> @@ -520,8 +520,11 @@ int ckpt_activate_next(struct ckpt_ctx *ctx)
> 
>  	rcu_read_lock();
>  	task = find_task_by_pid_ns(pid, ctx->root_nsproxy->pid_ns);
> -	if (task)
> +	/* target task must have same restart context */
> +	if (task && task->checkpoint_ctx == ctx)
>  		wake_up_process(task);
> +	else
> +		task = NULL;
>  	rcu_read_unlock();
> 
>  	if (!task) {
> @@ -568,9 +571,8 @@ static int do_restore_task(void)
>  	int ret;
> 
>  	/*
> -	 * Wait for coordinator to make become visible, then grab a
> -	 * reference to its restart context. If we're the last task to
> -	 * do it, notify the coordinator.
> +	 * Wait for coordinator to become visible, then grab a
> +	 * reference to its restart context.
>  	 */
>  	ret = wait_event_interruptible(waitq, current->checkpoint_ctx);
>  	if (ret < 0)
> @@ -610,8 +612,9 @@ static int do_restore_task(void)
>  		goto out;
> 
>  	/*
> -	 * zombie: we're done here; Save @ctx on task_struct, to be
> -	 * used to ckpt_activate_next(), and released, from do_exit().
> +	 * zombie: we're done here; do_exit() will notice the @ctx on
> +	 * our current->checkpoint_ctx (and our PF_RESTARTING) - it
> +	 * will call ckpt_activate_next() and release the @ctx.
>  	 */
>  	if (ret)
>  		do_exit(current->exit_code);
> @@ -638,11 +641,11 @@ static int do_restore_task(void)
>  }
> 
>  /**
> - * prepare_descendants - set ->restart_tsk of all descendants
> + * prepare_descendants - set ->checkpoint_ctx of all descendants
>   * @ctx: checkpoint context
>   * @root: root process for restart
>   *
> - * Called by the coodinator to set the ->restart_tsk pointer of the
> + * Called by the coodinator to set the ->checkpoint_ctx pointer of the
>   * root task and all its descendants.
>   */
>  static int prepare_descendants(struct ckpt_ctx *ctx, struct task_struct *root)
> @@ -662,7 +665,7 @@ static int prepare_descendants(struct ckpt_ctx *ctx, struct task_struct *root)
>  			break;
>  		}
>  		/*
> -		 * Set task->restart_tsk of all non-zombie descendants.
> +		 * Set task->checkpoint_ctx of all non-zombie descendants.
>  		 * If a descendant already has a ->checkpoint_ctx, it
>  		 * must be a coordinator (for a different restart ?) so
>  		 * we fail.
> @@ -727,6 +730,7 @@ static int wait_all_tasks_finish(struct ckpt_ctx *ctx)
> 
>  	init_completion(&ctx->complete);
> 
> +	BUG_ON(ctx->active_pid != -1);
>  	ret = ckpt_activate_next(ctx);
>  	if (ret < 0)
>  		return ret;
> @@ -839,7 +843,7 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
>  		if (ret < 0)
>  			goto out;
>  	} else {
> -		/* prepare descendants' t->restart_tsk point to coord */
> +		/* prepare descendants' t->checkpoint_ctx point to coord */
>  		ret = prepare_descendants(ctx, ctx->root_task);
>  		if (ret < 0)
>  			goto out;
> @@ -970,11 +974,12 @@ void exit_checkpoint(struct task_struct *tsk)
>  {
>  	struct ckpt_ctx *ctx;
> 
> -	ctx = tsk->checkpoint_ctx;
> -	tsk->checkpoint_ctx = NULL;
> +	/* no one else will touch this, because @tsk is dead already */
> +	ctx = xchg(&tsk->checkpoint_ctx, NULL);
> 
> -	/* restarting zombies will acrivate next task in restart */
> +	/* restarting zombies will activate next task in restart */
>  	if (tsk->flags & PF_RESTARTING) {
> +		BUG_ON(ctx->active_pid == -1);
>  		if (ckpt_activate_next(ctx) < 0)
>  			pr_warning("c/r: [%d] failed zombie exit\n", tsk->pid);
>  	}
> -- 
> 1.6.0.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] c/r: correctly restore pgid
       [not found]   ` <1252074054-22241-6-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-08  3:04     ` Serge E. Hallyn
       [not found]       ` <20090908030445.GB16460-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-08  3:04 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Oren Laadan

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> The main challenge with restoring the pgid of tasks is that the
> original "owner" (the process with that pid) might have exited
> already. I call these "ghost" pgids. 'mktree' does create these
> processes, but they then exit without participating in the restart.
> 
> To solve this, this patch introduces a RESTART_GHOST flag, used for
> "ghost" owners that are created only to pass their pgid to other
> tasks. ('mktree' now makes them call restart(2) instead of exiting).
> 
> When a "ghost" task calls restart(2), it will be placed on a wait
> queue until the restart completes and then exit. This guarantees that
> the pgid that it owns remains available for all (regular) restarting
> tasks for when they need it.
> 
> Regular tasks perform the restart as before, except that they also
> now restore their old pgrp, which is guaranteed to exist.
> 
> Changelog [v1]:
>   - Verify that pgid owner is a thread-group-leader.
>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
> 
> Signed-off-by: Oren Laadan <orenl-3n9FKfXVdfcggxPgtCmjjw@public.gmane.org>
> ---
>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
>  checkpoint/sys.c                 |    3 +-
>  include/linux/checkpoint.h       |   11 ++-
>  include/linux/checkpoint_hdr.h   |    3 +
>  include/linux/checkpoint_types.h |    6 +-
>  6 files changed, 230 insertions(+), 57 deletions(-)
> 
> diff --git a/checkpoint/process.c b/checkpoint/process.c
> index 40b2580..5d6bdb9 100644
> --- a/checkpoint/process.c
> +++ b/checkpoint/process.c
> @@ -23,6 +23,57 @@
>  #include <linux/syscalls.h>
> 
> 
> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
> +{
> +	return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
> +}
> +
> +/* must be called with tasklist_lock or rcu_read_lock() held */
> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
> +{
> +	struct task_struct *p;
> +	struct pid *pgrp;
> +
> +	if (pgid == 0) {
> +		/*
> +		 * At checkpoint the pgid owner lived in an ancestor
> +		 * pid-ns. The best we can do (sanely and safely) is
> +		 * to examine the parent of this restart's root: if in
> +		 * a distinct pid-ns, use its pgrp; otherwise fail.
> +		 */
> +		p = ctx->root_task->real_parent;
> +		if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
> +			return NULL;
> +		pgrp = task_pgrp(p);
> +	} else {
> +		/*
> +		 * Find the owner process of this pgid (it must exist
> +		 * if pgrp exists). It must be a thread group leader.
> +		 */
> +		pgrp = find_vpid(pgid);
> +		p = pid_task(pgrp, PIDTYPE_PID);
> +		if (!p || !thread_group_leader(p))
> +			return NULL;
> +		/*
> +		 * The pgrp must "belong" to our restart tree (compare
> +		 * p->checkpoint_ctx to ours). This prevents malicious
> +		 * input from (guessing and) using unrelated pgrps. If
> +		 * the owner is dead, then it doesn't have a context,
> +		 * so instead compare against its (real) parent's.
> +		 */
> +		if (p->exit_state == EXIT_ZOMBIE)
> +			p = p->real_parent;
> +		if (p->checkpoint_ctx != ctx)
> +			return NULL;
> +	}
> +
> +	if (task_session(current) != task_session(p))
> +		return NULL;
> +
> +	return pgrp;
> +}
> +
> +
>  #ifdef CONFIG_FUTEX
>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
>  					struct task_struct *t)
> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>  		h->exit_signal = t->exit_signal;
>  		h->pdeath_signal = t->pdeath_signal;
> 
> -		h->set_child_tid = t->set_child_tid;
> -		h->clear_child_tid = t->clear_child_tid;
> +		h->set_child_tid = (unsigned long) t->set_child_tid;

note that set_child_tid is an int (signed), not a long.  Same on
x86, but not on other arches.  Shouldn't lose info so could be worse.

On the whole,

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

-serge

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] c/r: support for controlling terminal and job control
       [not found]   ` <1252074054-22241-7-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-08  3:21     ` Serge E. Hallyn
  0 siblings, 0 replies; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-08  3:21 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> Add checkpoint/restart of controlling terminal: current->signal->tty.
> This is only done for session leaders.
> 
> If the session leader belongs to the ancestor pid-ns, then checkpoint
> skips this tty; On restart, it will not be restored, and whatever tty
> is in place from parent pid-ns (at restart) will be inherited.
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Not exactly my area, but looks good

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  checkpoint/signal.c            |   78 ++++++++++++++++++++++++++++++++++++++-
>  drivers/char/tty_io.c          |   33 +++++++++++++----
>  include/linux/checkpoint.h     |    1 +
>  include/linux/checkpoint_hdr.h |    6 +++
>  include/linux/tty.h            |    5 +++
>  5 files changed, 114 insertions(+), 9 deletions(-)
> 
> diff --git a/checkpoint/signal.c b/checkpoint/signal.c
> index ad06783..178a97c 100644
> --- a/checkpoint/signal.c
> +++ b/checkpoint/signal.c
> @@ -315,11 +315,12 @@ static int checkpoint_signal(struct ckpt_ctx *ctx, struct task_struct *t)
>  	struct ckpt_hdr_signal *h;
>  	struct signal_struct *signal;
>  	struct sigpending shared_pending;
> +	struct tty_struct *tty = NULL;
>  	struct rlimit *rlim;
>  	struct timeval tval;
>  	cputime_t cputime;
>  	unsigned long flags;
> -	int i, ret;
> +	int i, ret = 0;
> 
>  	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_SIGNAL);
>  	if (!h)
> @@ -398,9 +399,34 @@ static int checkpoint_signal(struct ckpt_ctx *ctx, struct task_struct *t)
>  	cputime_to_timeval(signal->it_prof_incr, &tval);
>  	h->it_prof_incr = timeval_to_ns(&tval);
> 
> +	/* tty */
> +	if (signal->leader) {
> +		h->tty_old_pgrp = ckpt_pid_nr(ctx, signal->tty_old_pgrp);
> +		tty = tty_kref_get(signal->tty);
> +		if (tty) {
> +			/* irq is already disabled */
> +			spin_lock(&tty->ctrl_lock);
> +			h->tty_pgrp = ckpt_pid_nr(ctx, tty->pgrp);
> +			spin_unlock(&tty->ctrl_lock);
> +			tty_kref_put(tty);
> +		}
> +	}
> +
>  	unlock_task_sighand(t, &flags);
> 
> -	ret = ckpt_write_obj(ctx, &h->h);
> +	/*
> +	 * If the session is in an ancestor namespace, skip this tty
> +	 * and set tty_objref = 0. It will not be explicitly restored,
> +	 * but rather inherited from parent pid-ns at restart time.
> +	 */
> +	if (tty && ckpt_pid_nr(ctx, tty->session) > 0) {
> +		h->tty_objref = checkpoint_obj(ctx, tty, CKPT_OBJ_TTY);
> +		if (h->tty_objref < 0)
> +			ret = h->tty_objref;
> +	}
> +
> +	if (!ret)
> +		ret = ckpt_write_obj(ctx, &h->h);
>  	if (!ret)
>  		ret = checkpoint_sigpending(ctx, &shared_pending);
> 
> @@ -471,8 +497,10 @@ static int restore_signal(struct ckpt_ctx *ctx)
>  	struct ckpt_hdr_signal *h;
>  	struct sigpending new_pending;
>  	struct sigpending *pending;
> +	struct tty_struct *tty = NULL;
>  	struct itimerval itimer;
>  	struct rlimit rlim;
> +	struct pid *pgrp;
>  	int i, ret;
> 
>  	h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_SIGNAL);
> @@ -492,6 +520,36 @@ static int restore_signal(struct ckpt_ctx *ctx)
>  	if (ret < 0)
>  		goto out;
> 
> +	/* tty - session */
> +	if (h->tty_objref) {
> +		tty = ckpt_obj_fetch(ctx, h->tty_objref, CKPT_OBJ_TTY);
> +		if (IS_ERR(tty)) {
> +			ret = PTR_ERR(tty);
> +			goto out;
> +		}
> +		/* this will fail unless we're the session leader */
> +		ret = tiocsctty(tty, 0);
> +		if (ret < 0)
> +			goto out;
> +		/* now restore the foreground group (job control) */
> +		if (h->tty_pgrp) {
> +			ret = do_tiocspgrp(tty, tty_pair_get_tty(tty),
> +					   h->tty_pgrp);
> +			if (ret < 0)
> +				goto out;
> +		}
> +	} else {
> +		/*
> +		 * If tty_objref isn't set, we _keep_ whatever tty we
> +		 * already have as a ctty. Why does this make sense ?
> +		 * - If our session is "within" the restart context,
> +		 * then that session has no controlling terminal.
> +		 * - If out session is "outside" the restart context,
> +                 * then we're like to keep whatever we inherit from
> +                 * the parent pid-ns.
> +		 */
> +	}
> +
>  	/*
>  	 * Reset real/virt/prof itimer (in case they were set), to
>  	 * prevent unwanted signals after flushing current signals
> @@ -503,7 +561,23 @@ static int restore_signal(struct ckpt_ctx *ctx)
>  	do_setitimer(ITIMER_VIRTUAL, &itimer, NULL);
>  	do_setitimer(ITIMER_PROF, &itimer, NULL);
> 
> +	/* tty - tty_old_pgrp */
> +	if (h->tty_old_pgrp) {
> +		ret = -EINVAL;
> +		if (!current->signal->leader)
> +			goto out;
> +		rcu_read_lock();
> +		pgrp = get_pid(_ckpt_find_pgrp(ctx, h->tty_old_pgrp));
> +		rcu_read_unlock();
> +		if (!pgrp)
> +			goto out;
> +	}
> +
>  	spin_lock_irq(&current->sighand->siglock);
> +	/* tty - tty_old_pgrp */
> +	put_pid(current->signal->tty_old_pgrp);
> +	current->signal->tty_old_pgrp = pgrp;
> +	/* pending signals */
>  	pending = &current->signal->shared_pending;
>  	flush_sigqueue(pending);
>  	pending->signal = new_pending.signal;
> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index b8f8d79..0206300 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -2132,7 +2132,7 @@ static int fionbio(struct file *file, int __user *p)
>   *		Takes ->siglock() when updating signal->tty
>   */
> 
> -static int tiocsctty(struct tty_struct *tty, int arg)
> +int tiocsctty(struct tty_struct *tty, int arg)
>  {
>  	int ret = 0;
>  	if (current->signal->leader && (task_session(current) == tty->session))
> @@ -2221,10 +2221,10 @@ static int tiocgpgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t
>  }
> 
>  /**
> - *	tiocspgrp		-	attempt to set process group
> + *	do_tiocspgrp		-	attempt to set process group
>   *	@tty: tty passed by user
>   *	@real_tty: tty side device matching tty passed by user
> - *	@p: pid pointer
> + *	@pid: pgrp_nr
>   *
>   *	Set the process group of the tty to the session passed. Only
>   *	permitted where the tty session is our session.
> @@ -2232,10 +2232,10 @@ static int tiocgpgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t
>   *	Locking: RCU, ctrl lock
>   */
> 
> -static int tiocspgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t __user *p)
> +int do_tiocspgrp(struct tty_struct *tty,
> +		 struct tty_struct *real_tty, pid_t pgrp_nr)
>  {
>  	struct pid *pgrp;
> -	pid_t pgrp_nr;
>  	int retval = tty_check_change(real_tty);
>  	unsigned long flags;
> 
> @@ -2247,8 +2247,6 @@ static int tiocspgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t
>  	    (current->signal->tty != real_tty) ||
>  	    (real_tty->session != task_session(current)))
>  		return -ENOTTY;
> -	if (get_user(pgrp_nr, p))
> -		return -EFAULT;
>  	if (pgrp_nr < 0)
>  		return -EINVAL;
>  	rcu_read_lock();
> @@ -2270,6 +2268,27 @@ out_unlock:
>  }
> 
>  /**
> + *	tiocspgrp		-	attempt to set process group
> + *	@tty: tty passed by user
> + *	@real_tty: tty side device matching tty passed by user
> + *	@p: pid pointer
> + *
> + *	Set the process group of the tty to the session passed. Only
> + *	permitted where the tty session is our session.
> + *
> + *	Locking: RCU, ctrl lock
> + */
> +
> +static int tiocspgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t __user *p)
> +{
> +	pid_t pgrp_nr;
> +
> +	if (get_user(pgrp_nr, p))
> +		return -EFAULT;
> +	return do_tiocspgrp(tty, real_tty, pgrp_nr);
> +}
> +
> +/**
>   *	tiocgsid		-	get session id
>   *	@tty: tty passed by user
>   *	@real_tty: tty side of the tty pased by the user if a pty else the tty
> diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
> index 628d6f6..434c6b1 100644
> --- a/include/linux/checkpoint.h
> +++ b/include/linux/checkpoint.h
> @@ -87,6 +87,7 @@ extern char *ckpt_fill_fname(struct path *path, struct path *root,
> 
>  /* pids */
>  extern pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid);
> +extern struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid);
> 
>  /* socket functions */
>  extern int ckpt_sock_getnames(struct ckpt_ctx *ctx,
> diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
> index c9ded68..398d5f7 100644
> --- a/include/linux/checkpoint_hdr.h
> +++ b/include/linux/checkpoint_hdr.h
> @@ -555,13 +555,19 @@ struct ckpt_rlimit {
> 
>  struct ckpt_hdr_signal {
>  	struct ckpt_hdr h;
> +	/* rlimit */
>  	struct ckpt_rlimit rlim[CKPT_RLIM_NLIMITS];
> +	/* itimer */
>  	__u64 it_real_value;
>  	__u64 it_real_incr;
>  	__u64 it_virt_value;
>  	__u64 it_virt_incr;
>  	__u64 it_prof_value;
>  	__u64 it_prof_incr;
> +	/* tty */
> +	__s32 tty_objref;
> +	__s32 tty_pgrp;
> +	__s32 tty_old_pgrp;
>  } __attribute__((aligned(8)));
> 
>  struct ckpt_hdr_signal_task {
> diff --git a/include/linux/tty.h b/include/linux/tty.h
> index fd77894..ee49d97 100644
> --- a/include/linux/tty.h
> +++ b/include/linux/tty.h
> @@ -467,6 +467,11 @@ extern void tty_ldisc_begin(void);
>  /* This last one is just for the tty layer internals and shouldn't be used elsewhere */
>  extern void tty_ldisc_enable(struct tty_struct *tty);
> 
> +/* These are for checkpoint/restart */
> +extern int tiocsctty(struct tty_struct *tty, int arg);
> +extern int do_tiocspgrp(struct tty_struct *tty,
> +			struct tty_struct *real_tty, pid_t pgrp_nr);
> +
>  #ifdef CONFIG_CHECKPOINT
>  struct ckpt_ctx;
>  struct ckpt_hdr_file;
> -- 
> 1.6.0.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found]         ` <20090904152644.GA15253-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-09-04 16:08           ` Oren Laadan
@ 2009-09-08  8:09           ` Louis Rilling
       [not found]             ` <20090908080944.GP12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Louis Rilling @ 2009-09-08  8:09 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


[-- Attachment #1.1: Type: text/plain, Size: 2066 bytes --]

On 04/09/09 10:26 -0500, Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> > During restart, we need to allocate pty slaves with the same
> > identifiers as recorded during checkpoint. Modify the allocation code
> > to allow an in-kernel caller to request a specific slave identifier.
> > 
> > For this, add a new field to task_struct - 'required_id'. It will
> > hold the desired identifier when restoring a (master) pty.
> > 
> > The code in ptmx_open() will use this value only for tasks that try to
> > open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
> > isn't CKPT_REQUIRED_NONE (-1).
> 
> So noone has indicated any preference for this versus the ptmx_create()
> approach...
> 
> I'm satisfied knowing we have a working fallback in case task->required_id
> is deemed unacceptable.
> 
> However I'd like to not have linux-kernel folks think us morons for not
> having considered that.  Can you add a message to the changelog saying
> why we're going with this approach (namely, that it lets us re-use
> filp_open() instead of having to do a custom alloc_file in a new code-path,
> which introduces maintenance duplication for file permission checking
> paths)?

As far as I am concerned, I do have a preference for the ptmx_create()
approach. This task->required_id field reminds me the former approach taken for
restarting pids and (and SYSV IPC ids IIRC) from userspace, that was proposed
last year and actually deemed unacceptable [ IIRC, this was an argument in favor
of a restart() syscall ]. I know that it's not the same since ->required_id is
not set from userspace and used in a later syscall, but still ...

Moreover I see no reason whey there would be no way to refactorize ptmx code and
have less duplicated code with the ptmx_create() approach.

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals
       [not found]     ` <1252074054-22241-4-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  2009-09-04 16:21       ` Serge E. Hallyn
@ 2009-09-08  8:50       ` Louis Rilling
       [not found]         ` <20090908085013.GQ12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 22+ messages in thread
From: Louis Rilling @ 2009-09-08  8:50 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


[-- Attachment #1.1: Type: text/plain, Size: 2757 bytes --]

On 04/09/09 10:20 -0400, Oren Laadan wrote:
> This patch adds support for checkpoint and restart of pseudo terminals
> (PTYs). Since PTYs are shared (pointed to by file, and signal), they
> are managed via objhash.
> 
> PTYs are master/slave pairs; The code arranges for the master to
> always be checkpointed first, followed by the slave. This is important
> since during restart both ends are created when restoring the master.
> 
> In this patch only UNIX98 style PTYs are supported.
> 
> Currently only PTYs that are referenced by open files are handled.
> Thus PTYs checkpoint starts with a file in tty_file_checkpoint(). It
> will first checkpoint the master and slave PTYs via tty_checkpoint(),
> and then complete the saving of the file descriptor. This means that
> in the image file, the order of objects is: master-tty, slave-tty,
> file-desc.
> 
> During restart, to restore the master side, we open the /dev/ptmx
> device and get a file handle. But at this point we don't know the
> designated objref for this file, because the file is due later on in
> the image stream. On the other hand, we can't just fput() the file
> because it will close the PTY too.
> 
> Instead, when we checkpoint the master PTY, we _reserve_ an objref
> for the file (which won't be further used in checkpoint). Then at
> restart, use it to insert the file to objhash.
> 
> TODO:
> 
> * Better sanitize input from checkpoint image on restore
> * Check the locking when saving/restoring tty_struct state
> * Echo position/buffer isn't saved (is it needed ?)
> * Handle multiple devpts mounts (namespaces)
> * Paths of ptmx and slaves are hard coded (/dev/ptmx, /dev/pts/...)
> 
> Changelog[v1]:
>   - Adjust include/asm/checkpoint_hdr.h for s390 architecture
>   - Add NCC to kernel constants header (ckpt_hdr_const)
>   - [Serge Hallyn] fix calculation of canon_datalen
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

[...]

> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index a3afa0c..b8f8d79 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -570,7 +582,9 @@ static void do_tty_hangup(struct work_struct *work)
>  	set_bit(TTY_HUPPED, &tty->flags);
>  	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
>  
> -	/* Account for the p->signal references we killed */
> +	/* Account
> +
> +	   for the p->signal references we killed */
>  	while (refs--)
>  		tty_kref_put(tty);
>  

Nit: this hunk is certainly not needed ;)

[...]

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found]             ` <20090908080944.GP12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
@ 2009-09-08 13:19               ` Oren Laadan
       [not found]                 ` <4AA659C7.5020700-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Oren Laadan @ 2009-09-08 13:19 UTC (permalink / raw)
  To: Serge E. Hallyn, Oren Laadan,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



Louis Rilling wrote:
> On 04/09/09 10:26 -0500, Serge E. Hallyn wrote:
>> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>>> During restart, we need to allocate pty slaves with the same
>>> identifiers as recorded during checkpoint. Modify the allocation code
>>> to allow an in-kernel caller to request a specific slave identifier.
>>>
>>> For this, add a new field to task_struct - 'required_id'. It will
>>> hold the desired identifier when restoring a (master) pty.
>>>
>>> The code in ptmx_open() will use this value only for tasks that try to
>>> open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
>>> isn't CKPT_REQUIRED_NONE (-1).
>> So noone has indicated any preference for this versus the ptmx_create()
>> approach...
>>
>> I'm satisfied knowing we have a working fallback in case task->required_id
>> is deemed unacceptable.
>>
>> However I'd like to not have linux-kernel folks think us morons for not
>> having considered that.  Can you add a message to the changelog saying
>> why we're going with this approach (namely, that it lets us re-use
>> filp_open() instead of having to do a custom alloc_file in a new code-path,
>> which introduces maintenance duplication for file permission checking
>> paths)?
> 
> As far as I am concerned, I do have a preference for the ptmx_create()
> approach. This task->required_id field reminds me the former approach taken for
> restarting pids and (and SYSV IPC ids IIRC) from userspace, that was proposed
> last year and actually deemed unacceptable [ IIRC, this was an argument in favor
> of a restart() syscall ]. I know that it's not the same since ->required_id is
> not set from userspace and used in a later syscall, but still ...
> 
> Moreover I see no reason whey there would be no way to refactorize ptmx code and
> have less duplicated code with the ptmx_create() approach.

I basically agree - I simply took the easiest/fastest path; if the ptmx
code is properly refactored, we should use that instead.

Did you have a chance to look at Serge's attempt to do exactly that ?
https://lists.linux-foundation.org/pipermail/containers/2009-September/020363.html

Thanks,

Oren.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals
       [not found]         ` <20090908085013.GQ12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
@ 2009-09-08 13:19           ` Oren Laadan
  0 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-08 13:19 UTC (permalink / raw)
  To: Oren Laadan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



Louis Rilling wrote:
> On 04/09/09 10:20 -0400, Oren Laadan wrote:
>> This patch adds support for checkpoint and restart of pseudo terminals
>> (PTYs). Since PTYs are shared (pointed to by file, and signal), they
>> are managed via objhash.
>>
>> PTYs are master/slave pairs; The code arranges for the master to
>> always be checkpointed first, followed by the slave. This is important
>> since during restart both ends are created when restoring the master.
>>
>> In this patch only UNIX98 style PTYs are supported.
>>
>> Currently only PTYs that are referenced by open files are handled.
>> Thus PTYs checkpoint starts with a file in tty_file_checkpoint(). It
>> will first checkpoint the master and slave PTYs via tty_checkpoint(),
>> and then complete the saving of the file descriptor. This means that
>> in the image file, the order of objects is: master-tty, slave-tty,
>> file-desc.
>>
>> During restart, to restore the master side, we open the /dev/ptmx
>> device and get a file handle. But at this point we don't know the
>> designated objref for this file, because the file is due later on in
>> the image stream. On the other hand, we can't just fput() the file
>> because it will close the PTY too.
>>
>> Instead, when we checkpoint the master PTY, we _reserve_ an objref
>> for the file (which won't be further used in checkpoint). Then at
>> restart, use it to insert the file to objhash.
>>
>> TODO:
>>
>> * Better sanitize input from checkpoint image on restore
>> * Check the locking when saving/restoring tty_struct state
>> * Echo position/buffer isn't saved (is it needed ?)
>> * Handle multiple devpts mounts (namespaces)
>> * Paths of ptmx and slaves are hard coded (/dev/ptmx, /dev/pts/...)
>>
>> Changelog[v1]:
>>   - Adjust include/asm/checkpoint_hdr.h for s390 architecture
>>   - Add NCC to kernel constants header (ckpt_hdr_const)
>>   - [Serge Hallyn] fix calculation of canon_datalen
>>
>> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> 
> [...]
> 
>> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
>> index a3afa0c..b8f8d79 100644
>> --- a/drivers/char/tty_io.c
>> +++ b/drivers/char/tty_io.c
>> @@ -570,7 +582,9 @@ static void do_tty_hangup(struct work_struct *work)
>>  	set_bit(TTY_HUPPED, &tty->flags);
>>  	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
>>  
>> -	/* Account for the p->signal references we killed */
>> +	/* Account
>> +
>> +	   for the p->signal references we killed */
>>  	while (refs--)
>>  		tty_kref_put(tty);
>>  
> 
> Nit: this hunk is certainly not needed ;)

Hmm... lemme think about it ... :p

Oren.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] c/r: correctly restore pgid
       [not found]       ` <20090908030445.GB16460-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-09-08 13:33         ` Oren Laadan
       [not found]           ` <4AA65D10.6000702-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Oren Laadan @ 2009-09-08 13:33 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Oren Laadan



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>> The main challenge with restoring the pgid of tasks is that the
>> original "owner" (the process with that pid) might have exited
>> already. I call these "ghost" pgids. 'mktree' does create these
>> processes, but they then exit without participating in the restart.
>>
>> To solve this, this patch introduces a RESTART_GHOST flag, used for
>> "ghost" owners that are created only to pass their pgid to other
>> tasks. ('mktree' now makes them call restart(2) instead of exiting).
>>
>> When a "ghost" task calls restart(2), it will be placed on a wait
>> queue until the restart completes and then exit. This guarantees that
>> the pgid that it owns remains available for all (regular) restarting
>> tasks for when they need it.
>>
>> Regular tasks perform the restart as before, except that they also
>> now restore their old pgrp, which is guaranteed to exist.
>>
>> Changelog [v1]:
>>   - Verify that pgid owner is a thread-group-leader.
>>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
>>
>> Signed-off-by: Oren Laadan <orenl-3n9FKfXVdfcggxPgtCmjjw@public.gmane.org>
>> ---
>>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
>>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
>>  checkpoint/sys.c                 |    3 +-
>>  include/linux/checkpoint.h       |   11 ++-
>>  include/linux/checkpoint_hdr.h   |    3 +
>>  include/linux/checkpoint_types.h |    6 +-
>>  6 files changed, 230 insertions(+), 57 deletions(-)
>>
>> diff --git a/checkpoint/process.c b/checkpoint/process.c
>> index 40b2580..5d6bdb9 100644
>> --- a/checkpoint/process.c
>> +++ b/checkpoint/process.c
>> @@ -23,6 +23,57 @@
>>  #include <linux/syscalls.h>
>>
>>
>> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
>> +{
>> +	return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
>> +}
>> +
>> +/* must be called with tasklist_lock or rcu_read_lock() held */
>> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
>> +{
>> +	struct task_struct *p;
>> +	struct pid *pgrp;
>> +
>> +	if (pgid == 0) {
>> +		/*
>> +		 * At checkpoint the pgid owner lived in an ancestor
>> +		 * pid-ns. The best we can do (sanely and safely) is
>> +		 * to examine the parent of this restart's root: if in
>> +		 * a distinct pid-ns, use its pgrp; otherwise fail.
>> +		 */
>> +		p = ctx->root_task->real_parent;
>> +		if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
>> +			return NULL;
>> +		pgrp = task_pgrp(p);
>> +	} else {
>> +		/*
>> +		 * Find the owner process of this pgid (it must exist
>> +		 * if pgrp exists). It must be a thread group leader.
>> +		 */
>> +		pgrp = find_vpid(pgid);
>> +		p = pid_task(pgrp, PIDTYPE_PID);
>> +		if (!p || !thread_group_leader(p))
>> +			return NULL;
>> +		/*
>> +		 * The pgrp must "belong" to our restart tree (compare
>> +		 * p->checkpoint_ctx to ours). This prevents malicious
>> +		 * input from (guessing and) using unrelated pgrps. If
>> +		 * the owner is dead, then it doesn't have a context,
>> +		 * so instead compare against its (real) parent's.
>> +		 */
>> +		if (p->exit_state == EXIT_ZOMBIE)
>> +			p = p->real_parent;
>> +		if (p->checkpoint_ctx != ctx)
>> +			return NULL;
>> +	}
>> +
>> +	if (task_session(current) != task_session(p))
>> +		return NULL;
>> +
>> +	return pgrp;
>> +}
>> +
>> +
>>  #ifdef CONFIG_FUTEX
>>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
>>  					struct task_struct *t)
>> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>>  		h->exit_signal = t->exit_signal;
>>  		h->pdeath_signal = t->pdeath_signal;
>>
>> -		h->set_child_tid = t->set_child_tid;
>> -		h->clear_child_tid = t->clear_child_tid;
>> +		h->set_child_tid = (unsigned long) t->set_child_tid;
> 
> note that set_child_tid is an int (signed), not a long.  Same on
> x86, but not on other arches.  Shouldn't lose info so could be worse.

{set,clear}_child_tid are both pointers to user space: it's an address
in userspace, so we save it as 'unsigned long'.

{clear,set}_child_tid is defined in include/linux/sched.h ... how can
it differ for different archs ?

> 
> On the whole,
> 
> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Thanks. I got a few fixes for the code piles up and now c/r of 'screen'
with a couple of shells is working :)

Oren.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] c/r: correctly restore pgid
       [not found]           ` <4AA65D10.6000702-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-08 14:00             ` Serge E. Hallyn
       [not found]               ` <20090908140056.GA873-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Serge E. Hallyn @ 2009-09-08 14:00 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Oren Laadan

Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> >> The main challenge with restoring the pgid of tasks is that the
> >> original "owner" (the process with that pid) might have exited
> >> already. I call these "ghost" pgids. 'mktree' does create these
> >> processes, but they then exit without participating in the restart.
> >>
> >> To solve this, this patch introduces a RESTART_GHOST flag, used for
> >> "ghost" owners that are created only to pass their pgid to other
> >> tasks. ('mktree' now makes them call restart(2) instead of exiting).
> >>
> >> When a "ghost" task calls restart(2), it will be placed on a wait
> >> queue until the restart completes and then exit. This guarantees that
> >> the pgid that it owns remains available for all (regular) restarting
> >> tasks for when they need it.
> >>
> >> Regular tasks perform the restart as before, except that they also
> >> now restore their old pgrp, which is guaranteed to exist.
> >>
> >> Changelog [v1]:
> >>   - Verify that pgid owner is a thread-group-leader.
> >>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
> >>
> >> Signed-off-by: Oren Laadan <orenl-3n9FKfXVdfcggxPgtCmjjw@public.gmane.org>
> >> ---
> >>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
> >>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
> >>  checkpoint/sys.c                 |    3 +-
> >>  include/linux/checkpoint.h       |   11 ++-
> >>  include/linux/checkpoint_hdr.h   |    3 +
> >>  include/linux/checkpoint_types.h |    6 +-
> >>  6 files changed, 230 insertions(+), 57 deletions(-)
> >>
> >> diff --git a/checkpoint/process.c b/checkpoint/process.c
> >> index 40b2580..5d6bdb9 100644
> >> --- a/checkpoint/process.c
> >> +++ b/checkpoint/process.c
> >> @@ -23,6 +23,57 @@
> >>  #include <linux/syscalls.h>
> >>
> >>
> >> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
> >> +{
> >> +	return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
> >> +}
> >> +
> >> +/* must be called with tasklist_lock or rcu_read_lock() held */
> >> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
> >> +{
> >> +	struct task_struct *p;
> >> +	struct pid *pgrp;
> >> +
> >> +	if (pgid == 0) {
> >> +		/*
> >> +		 * At checkpoint the pgid owner lived in an ancestor
> >> +		 * pid-ns. The best we can do (sanely and safely) is
> >> +		 * to examine the parent of this restart's root: if in
> >> +		 * a distinct pid-ns, use its pgrp; otherwise fail.
> >> +		 */
> >> +		p = ctx->root_task->real_parent;
> >> +		if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
> >> +			return NULL;
> >> +		pgrp = task_pgrp(p);
> >> +	} else {
> >> +		/*
> >> +		 * Find the owner process of this pgid (it must exist
> >> +		 * if pgrp exists). It must be a thread group leader.
> >> +		 */
> >> +		pgrp = find_vpid(pgid);
> >> +		p = pid_task(pgrp, PIDTYPE_PID);
> >> +		if (!p || !thread_group_leader(p))
> >> +			return NULL;
> >> +		/*
> >> +		 * The pgrp must "belong" to our restart tree (compare
> >> +		 * p->checkpoint_ctx to ours). This prevents malicious
> >> +		 * input from (guessing and) using unrelated pgrps. If
> >> +		 * the owner is dead, then it doesn't have a context,
> >> +		 * so instead compare against its (real) parent's.
> >> +		 */
> >> +		if (p->exit_state == EXIT_ZOMBIE)
> >> +			p = p->real_parent;
> >> +		if (p->checkpoint_ctx != ctx)
> >> +			return NULL;
> >> +	}
> >> +
> >> +	if (task_session(current) != task_session(p))
> >> +		return NULL;
> >> +
> >> +	return pgrp;
> >> +}
> >> +
> >> +
> >>  #ifdef CONFIG_FUTEX
> >>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
> >>  					struct task_struct *t)
> >> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
> >>  		h->exit_signal = t->exit_signal;
> >>  		h->pdeath_signal = t->pdeath_signal;
> >>
> >> -		h->set_child_tid = t->set_child_tid;
> >> -		h->clear_child_tid = t->clear_child_tid;
> >> +		h->set_child_tid = (unsigned long) t->set_child_tid;
> > 
> > note that set_child_tid is an int (signed), not a long.  Same on
> > x86, but not on other arches.  Shouldn't lose info so could be worse.
> 
> {set,clear}_child_tid are both pointers to user space: it's an address
> in userspace, so we save it as 'unsigned long'.
> 
> {clear,set}_child_tid is defined in include/linux/sched.h ... how can
> it differ for different archs ?

sizeof long differs for different archs.  Not the type of x_child_tid.

> > 
> > On the whole,
> > 
> > Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
> Thanks. I got a few fixes for the code piles up and now c/r of 'screen'
> with a couple of shells is working :)

Cool!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave
       [not found]                 ` <4AA659C7.5020700-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
@ 2009-09-08 14:04                   ` Louis Rilling
  0 siblings, 0 replies; 22+ messages in thread
From: Louis Rilling @ 2009-09-08 14:04 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


[-- Attachment #1.1: Type: text/plain, Size: 2636 bytes --]

On 08/09/09  9:19 -0400, Oren Laadan wrote:
> 
> 
> Louis Rilling wrote:
> > On 04/09/09 10:26 -0500, Serge E. Hallyn wrote:
> >> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
> >>> During restart, we need to allocate pty slaves with the same
> >>> identifiers as recorded during checkpoint. Modify the allocation code
> >>> to allow an in-kernel caller to request a specific slave identifier.
> >>>
> >>> For this, add a new field to task_struct - 'required_id'. It will
> >>> hold the desired identifier when restoring a (master) pty.
> >>>
> >>> The code in ptmx_open() will use this value only for tasks that try to
> >>> open /dev/ptmx that are restarting (PF_RESTARTING), and if the value
> >>> isn't CKPT_REQUIRED_NONE (-1).
> >> So noone has indicated any preference for this versus the ptmx_create()
> >> approach...
> >>
> >> I'm satisfied knowing we have a working fallback in case task->required_id
> >> is deemed unacceptable.
> >>
> >> However I'd like to not have linux-kernel folks think us morons for not
> >> having considered that.  Can you add a message to the changelog saying
> >> why we're going with this approach (namely, that it lets us re-use
> >> filp_open() instead of having to do a custom alloc_file in a new code-path,
> >> which introduces maintenance duplication for file permission checking
> >> paths)?
> > 
> > As far as I am concerned, I do have a preference for the ptmx_create()
> > approach. This task->required_id field reminds me the former approach taken for
> > restarting pids and (and SYSV IPC ids IIRC) from userspace, that was proposed
> > last year and actually deemed unacceptable [ IIRC, this was an argument in favor
> > of a restart() syscall ]. I know that it's not the same since ->required_id is
> > not set from userspace and used in a later syscall, but still ...
> > 
> > Moreover I see no reason whey there would be no way to refactorize ptmx code and
> > have less duplicated code with the ptmx_create() approach.
> 
> I basically agree - I simply took the easiest/fastest path; if the ptmx
> code is properly refactored, we should use that instead.
> 
> Did you have a chance to look at Serge's attempt to do exactly that ?
> https://lists.linux-foundation.org/pipermail/containers/2009-September/020363.html

I only had a quick look. Will try to look closer, but don't rely on me being
quick at this.

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] c/r: correctly restore pgid
       [not found]               ` <20090908140056.GA873-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-09-08 16:40                 ` Oren Laadan
  0 siblings, 0 replies; 22+ messages in thread
From: Oren Laadan @ 2009-09-08 16:40 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>>>> The main challenge with restoring the pgid of tasks is that the
>>>> original "owner" (the process with that pid) might have exited
>>>> already. I call these "ghost" pgids. 'mktree' does create these
>>>> processes, but they then exit without participating in the restart.
>>>>
>>>> To solve this, this patch introduces a RESTART_GHOST flag, used for
>>>> "ghost" owners that are created only to pass their pgid to other
>>>> tasks. ('mktree' now makes them call restart(2) instead of exiting).
>>>>
>>>> When a "ghost" task calls restart(2), it will be placed on a wait
>>>> queue until the restart completes and then exit. This guarantees that
>>>> the pgid that it owns remains available for all (regular) restarting
>>>> tasks for when they need it.
>>>>
>>>> Regular tasks perform the restart as before, except that they also
>>>> now restore their old pgrp, which is guaranteed to exist.
>>>>
>>>> Changelog [v1]:
>>>>   - Verify that pgid owner is a thread-group-leader.
>>>>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
>>>>
>>>> Signed-off-by: Oren Laadan <orenl-3n9FKfXVdfcggxPgtCmjjw@public.gmane.org>
>>>> ---
>>>>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
>>>>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
>>>>  checkpoint/sys.c                 |    3 +-
>>>>  include/linux/checkpoint.h       |   11 ++-
>>>>  include/linux/checkpoint_hdr.h   |    3 +
>>>>  include/linux/checkpoint_types.h |    6 +-
>>>>  6 files changed, 230 insertions(+), 57 deletions(-)
>>>>
>>>> diff --git a/checkpoint/process.c b/checkpoint/process.c
>>>> index 40b2580..5d6bdb9 100644
>>>> --- a/checkpoint/process.c
>>>> +++ b/checkpoint/process.c
>>>> @@ -23,6 +23,57 @@
>>>>  #include <linux/syscalls.h>
>>>>
>>>>
>>>> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
>>>> +{
>>>> +	return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
>>>> +}
>>>> +
>>>> +/* must be called with tasklist_lock or rcu_read_lock() held */
>>>> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
>>>> +{
>>>> +	struct task_struct *p;
>>>> +	struct pid *pgrp;
>>>> +
>>>> +	if (pgid == 0) {
>>>> +		/*
>>>> +		 * At checkpoint the pgid owner lived in an ancestor
>>>> +		 * pid-ns. The best we can do (sanely and safely) is
>>>> +		 * to examine the parent of this restart's root: if in
>>>> +		 * a distinct pid-ns, use its pgrp; otherwise fail.
>>>> +		 */
>>>> +		p = ctx->root_task->real_parent;
>>>> +		if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
>>>> +			return NULL;
>>>> +		pgrp = task_pgrp(p);
>>>> +	} else {
>>>> +		/*
>>>> +		 * Find the owner process of this pgid (it must exist
>>>> +		 * if pgrp exists). It must be a thread group leader.
>>>> +		 */
>>>> +		pgrp = find_vpid(pgid);
>>>> +		p = pid_task(pgrp, PIDTYPE_PID);
>>>> +		if (!p || !thread_group_leader(p))
>>>> +			return NULL;
>>>> +		/*
>>>> +		 * The pgrp must "belong" to our restart tree (compare
>>>> +		 * p->checkpoint_ctx to ours). This prevents malicious
>>>> +		 * input from (guessing and) using unrelated pgrps. If
>>>> +		 * the owner is dead, then it doesn't have a context,
>>>> +		 * so instead compare against its (real) parent's.
>>>> +		 */
>>>> +		if (p->exit_state == EXIT_ZOMBIE)
>>>> +			p = p->real_parent;
>>>> +		if (p->checkpoint_ctx != ctx)
>>>> +			return NULL;
>>>> +	}
>>>> +
>>>> +	if (task_session(current) != task_session(p))
>>>> +		return NULL;
>>>> +
>>>> +	return pgrp;
>>>> +}
>>>> +
>>>> +
>>>>  #ifdef CONFIG_FUTEX
>>>>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
>>>>  					struct task_struct *t)
>>>> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>>>>  		h->exit_signal = t->exit_signal;
>>>>  		h->pdeath_signal = t->pdeath_signal;
>>>>
>>>> -		h->set_child_tid = t->set_child_tid;
>>>> -		h->clear_child_tid = t->clear_child_tid;
>>>> +		h->set_child_tid = (unsigned long) t->set_child_tid;
>>> note that set_child_tid is an int (signed), not a long.  Same on
>>> x86, but not on other arches.  Shouldn't lose info so could be worse.
>> {set,clear}_child_tid are both pointers to user space: it's an address
>> in userspace, so we save it as 'unsigned long'.
>>
>> {clear,set}_child_tid is defined in include/linux/sched.h ... how can
>> it differ for different archs ?
> 
> sizeof long differs for different archs.  Not the type of x_child_tid.

Sure. In all ckpt headers, all pointers allway get a __u64 regardless
of arch, to cover both 32- and 64-bit.

Oren.

> 
>>> On the whole,
>>>
>>> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>> Thanks. I got a few fixes for the code piles up and now c/r of 'screen'
>> with a couple of shells is working :)
> 
> Cool!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-09-08 16:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-04 14:20 Checkpoint/restart of ptys, pgids, and controlling tty Oren Laadan
     [not found] ` <1252074054-22241-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-04 14:20   ` [PATCH 1/6] c/r: [objhash] add ckpt_obj_reserve() Oren Laadan
2009-09-04 14:20   ` [PATCH 2/6] c/r: [pty 1/2] allow allocation of desired pty slave Oren Laadan
     [not found]     ` <1252074054-22241-3-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-04 15:26       ` Serge E. Hallyn
     [not found]         ` <20090904152644.GA15253-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-04 16:08           ` Oren Laadan
2009-09-08  8:09           ` Louis Rilling
     [not found]             ` <20090908080944.GP12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2009-09-08 13:19               ` Oren Laadan
     [not found]                 ` <4AA659C7.5020700-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-08 14:04                   ` Louis Rilling
2009-09-04 14:20   ` [PATCH 3/6] c/r: [pty 2/2] support for pseudo terminals Oren Laadan
     [not found]     ` <1252074054-22241-4-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-04 16:21       ` Serge E. Hallyn
2009-09-08  8:50       ` Louis Rilling
     [not found]         ` <20090908085013.GQ12824-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2009-09-08 13:19           ` Oren Laadan
2009-09-04 14:20   ` [PATCH 4/6] c/r: tighten logic to protect against bogus pids in input Oren Laadan
     [not found]     ` <1252074054-22241-5-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-08  1:09       ` Serge E. Hallyn
2009-09-04 14:20   ` [PATCH 1/3] mktree: add support to ghost tasks (TASK_GHOST) Oren Laadan
2009-09-04 14:20   ` [PATCH 2/3] test/pgrp.c: add test case for process-groups Oren Laadan
2009-09-04 14:20   ` [PATCH 3/3] restart: correctly handle pgid/ppid/sid = 0 Oren Laadan
     [not found] ` <1252074054-22241-6-git-send-email-orenl@librato.com>
     [not found]   ` <1252074054-22241-6-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-08  3:04     ` [PATCH 5/6] c/r: correctly restore pgid Serge E. Hallyn
     [not found]       ` <20090908030445.GB16460-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-08 13:33         ` Oren Laadan
     [not found]           ` <4AA65D10.6000702-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-08 14:00             ` Serge E. Hallyn
     [not found]               ` <20090908140056.GA873-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-09-08 16:40                 ` Oren Laadan
     [not found] ` <1252074054-22241-7-git-send-email-orenl@librato.com>
     [not found]   ` <1252074054-22241-7-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-09-08  3:21     ` [PATCH 6/6] c/r: support for controlling terminal and job control Serge E. Hallyn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.