linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
@ 2012-10-24 15:34 Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 1/5] ipc: remove forced assignment of selected message Stanislav Kinsbursky
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:34 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

v8:
This respin of the patch set was significantly reworked. Most part of new API
was replaced by sysctls (by one per messages, semaphores and shared memory),
allowing to preset desired id for next new IPC object.

This patch set is aimed to provide additional functionality for all IPC
objects, which is required for migration of these objects by user-space
checkpoint/restore utils (CRIU).

The main problem here was impossibility to set up object id. This patch set
solves the problem by adding new sysctls for preset of desired id for new IPC
object.

Another problem was to peek messages from queues without deleting them.
This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
MSG_COPY flag is set, then msgtyp is interpreted as message number.

The following series implements...

---

Stanislav Kinsbursky (5):
      ipc: remove forced assignment of selected message
      ipc: add sysctl to specify desired next object id
      ipc: message queue receive cleanup
      ipc: message queue copy feature introduced
      test: IPC message queue copy feture test


 include/linux/ipc_namespace.h        |    1 
 include/linux/msg.h                  |    5 -
 include/uapi/linux/msg.h             |    1 
 ipc/compat.c                         |   45 +++----
 ipc/ipc_sysctl.c                     |   28 ++++
 ipc/msg.c                            |   99 +++++++++++----
 ipc/msgutil.c                        |   38 ++++++
 ipc/util.c                           |   16 ++
 ipc/util.h                           |    2 
 tools/testing/selftests/ipc/Makefile |   25 ++++
 tools/testing/selftests/ipc/msgque.c |  231 ++++++++++++++++++++++++++++++++++
 11 files changed, 432 insertions(+), 59 deletions(-)
 create mode 100644 tools/testing/selftests/ipc/Makefile
 create mode 100644 tools/testing/selftests/ipc/msgque.c


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v8 1/5] ipc: remove forced assignment of selected message
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
@ 2012-10-24 15:35 ` Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 2/5] ipc: add sysctl to specify desired next object id Stanislav Kinsbursky
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:35 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

This is a cleanup patch. The assignment is redundant.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 ipc/msg.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index a71af5a..2f272fa 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -793,12 +793,9 @@ long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
 				msg = walk_msg;
 				if (mode == SEARCH_LESSEQUAL &&
 						walk_msg->m_type != 1) {
-					msg = walk_msg;
 					msgtyp = walk_msg->m_type - 1;
-				} else {
-					msg = walk_msg;
+				} else
 					break;
-				}
 			}
 			tmp = tmp->next;
 		}


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v8 2/5] ipc: add sysctl to specify desired next object id
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 1/5] ipc: remove forced assignment of selected message Stanislav Kinsbursky
@ 2012-10-24 15:35 ` Stanislav Kinsbursky
  2012-10-24 21:41   ` Andrew Morton
  2012-10-24 15:35 ` [PATCH v8 3/5] ipc: message queue receive cleanup Stanislav Kinsbursky
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:35 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

This patch adds 3 new variables and sysctls to tune them (by one "next_id"
variable for messages, semaphores and shared memory respectively).
This variable can be used to set desired id for next allocated IPC object.
By default it's equal to -1 and old behaviour is preserved.
If this variable is non-negative, then desired idr will be extracted from it
and used as a start value to search for free IDR slot.

Notes:
1) this patch doesn't garantee, that new object will have desired id. So it's
up to user space how to handle new object with wrong id.
2) After sucessfull id allocation attempt, "next_id" will be set back to -1
(if it was non-negative).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 include/linux/ipc_namespace.h |    1 +
 ipc/ipc_sysctl.c              |   28 ++++++++++++++++++++++++++++
 ipc/util.c                    |   16 ++++++++++++----
 ipc/util.h                    |    1 +
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 5499c92..8704e40 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -24,6 +24,7 @@ struct ipc_ids {
 	unsigned short seq_max;
 	struct rw_semaphore rw_mutex;
 	struct idr ipcs_idr;
+	int next_id;
 };
 
 struct ipc_namespace {
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 00fba2b..d06d77a 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -158,6 +158,7 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
 
 static int zero;
 static int one = 1;
+static int int_max = INT_MAX;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -227,6 +228,33 @@ static struct ctl_table ipc_kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "sem_next_id",
+		.data		= &init_ipc_ns.ids[IPC_SEM_IDS].next_id,
+		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SEM_IDS].next_id),
+		.mode		= 0644,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &int_max,
+	},
+	{
+		.procname	= "msg_next_id",
+		.data		= &init_ipc_ns.ids[IPC_MSG_IDS].next_id,
+		.maxlen		= sizeof(init_ipc_ns.ids[IPC_MSG_IDS].next_id),
+		.mode		= 0644,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &int_max,
+	},
+	{
+		.procname	= "shm_next_id",
+		.data		= &init_ipc_ns.ids[IPC_SHM_IDS].next_id,
+		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SHM_IDS].next_id),
+		.mode		= 0644,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &int_max,
+	},
 	{}
 };
 
diff --git a/ipc/util.c b/ipc/util.c
index 72fd078..a961e46 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -122,6 +122,7 @@ void ipc_init_ids(struct ipc_ids *ids)
 
 	ids->in_use = 0;
 	ids->seq = 0;
+	ids->next_id = -1;
 	{
 		int seq_limit = INT_MAX/SEQ_MULTIPLIER;
 		if (seq_limit > USHRT_MAX)
@@ -252,6 +253,7 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	kuid_t euid;
 	kgid_t egid;
 	int id, err;
+	int next_id = ids->next_id;
 
 	if (size > IPCMNI)
 		size = IPCMNI;
@@ -264,7 +266,8 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	rcu_read_lock();
 	spin_lock(&new->lock);
 
-	err = idr_get_new(&ids->ipcs_idr, new, &id);
+	err = idr_get_new_above(&ids->ipcs_idr, new,
+				(next_id < 0) ? 0 : ipcid_to_idx(next_id), &id);
 	if (err) {
 		spin_unlock(&new->lock);
 		rcu_read_unlock();
@@ -277,9 +280,14 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	new->cuid = new->uid = euid;
 	new->gid = new->cgid = egid;
 
-	new->seq = ids->seq++;
-	if(ids->seq > ids->seq_max)
-		ids->seq = 0;
+	if (next_id < 0) {
+		new->seq = ids->seq++;
+		if(ids->seq > ids->seq_max)
+			ids->seq = 0;
+	} else {
+		new->seq = ipcid_to_seqx(next_id);
+		ids->next_id = -1;
+	}
 
 	new->id = ipc_buildid(id, new->seq);
 	return id;
diff --git a/ipc/util.h b/ipc/util.h
index c8fe2f7..a61e0ca 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -92,6 +92,7 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_SHM_IDS	2
 
 #define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
+#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
 
 /* must be called with ids->rw_mutex acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v8 3/5] ipc: message queue receive cleanup
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 1/5] ipc: remove forced assignment of selected message Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 2/5] ipc: add sysctl to specify desired next object id Stanislav Kinsbursky
@ 2012-10-24 15:35 ` Stanislav Kinsbursky
  2012-10-24 15:35 ` [PATCH v8 4/5] ipc: message queue copy feature introduced Stanislav Kinsbursky
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:35 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

This patch moves all message related manipulation into one function msg_fill().
Actually, two functions because of the compat one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 include/linux/msg.h |    5 +++--
 ipc/compat.c        |   45 +++++++++++++++++++--------------------------
 ipc/msg.c           |   44 +++++++++++++++++++++++---------------------
 3 files changed, 45 insertions(+), 49 deletions(-)

diff --git a/include/linux/msg.h b/include/linux/msg.h
index 7a4b9e9..f38edba 100644
--- a/include/linux/msg.h
+++ b/include/linux/msg.h
@@ -34,7 +34,8 @@ struct msg_queue {
 /* Helper routines for sys_msgsnd and sys_msgrcv */
 extern long do_msgsnd(int msqid, long mtype, void __user *mtext,
 			size_t msgsz, int msgflg);
-extern long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
-			size_t msgsz, long msgtyp, int msgflg);
+extern long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
+		      int msgflg,
+		      long (*msg_fill)(void __user *, struct msg_msg *, size_t ));
 
 #endif /* _LINUX_MSG_H */
diff --git a/ipc/compat.c b/ipc/compat.c
index ad9518e..eb3ea16 100644
--- a/ipc/compat.c
+++ b/ipc/compat.c
@@ -306,6 +306,20 @@ static long do_compat_semctl(int first, int second, int third, u32 pad)
 	return err;
 }
 
+long compat_do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz)
+{
+	struct compat_msgbuf __user *msgp = dest;
+	size_t msgsz;
+
+	if (put_user(msg->m_type, &msgp->mtype))
+		return -EFAULT;
+
+	msgsz = (bufsz > msg->m_ts) ? msg->m_ts : bufsz;
+	if (store_msg(msgp->mtext, msg, msgsz))
+		return -EFAULT;
+	return msgsz;
+}
+
 #ifdef CONFIG_ARCH_WANT_OLD_COMPAT_IPC
 long compat_sys_semctl(int first, int second, int third, void __user *uptr)
 {
@@ -337,10 +351,6 @@ long compat_sys_msgsnd(int first, int second, int third, void __user *uptr)
 long compat_sys_msgrcv(int first, int second, int msgtyp, int third,
 			   int version, void __user *uptr)
 {
-	struct compat_msgbuf __user *up;
-	long type;
-	int err;
-
 	if (first < 0)
 		return -EINVAL;
 	if (second < 0)
@@ -348,23 +358,14 @@ long compat_sys_msgrcv(int first, int second, int msgtyp, int third,
 
 	if (!version) {
 		struct compat_ipc_kludge ipck;
-		err = -EINVAL;
 		if (!uptr)
-			goto out;
-		err = -EFAULT;
+			return -EINVAL;
 		if (copy_from_user (&ipck, uptr, sizeof(ipck)))
-			goto out;
+			return -EFAULT;
 		uptr = compat_ptr(ipck.msgp);
 		msgtyp = ipck.msgtyp;
 	}
-	up = uptr;
-	err = do_msgrcv(first, &type, up->mtext, second, msgtyp, third);
-	if (err < 0)
-		goto out;
-	if (put_user(type, &up->mtype))
-		err = -EFAULT;
-out:
-	return err;
+	return do_msgrcv(first, uptr, second, msgtyp, third, compat_do_msg_fill);
 }
 #else
 long compat_sys_semctl(int semid, int semnum, int cmd, int arg)
@@ -385,16 +386,8 @@ long compat_sys_msgsnd(int msqid, struct compat_msgbuf __user *msgp,
 long compat_sys_msgrcv(int msqid, struct compat_msgbuf __user *msgp,
 		       compat_ssize_t msgsz, long msgtyp, int msgflg)
 {
-	long err, mtype;
-
-	err =  do_msgrcv(msqid, &mtype, msgp->mtext, (ssize_t)msgsz, msgtyp, msgflg);
-	if (err < 0)
-		goto out;
-
-	if (put_user(mtype, &msgp->mtype))
-		err = -EFAULT;
- out:
-	return err;
+	return do_msgrcv(msqid, msgp, (ssize_t)msgsz, msgtyp, msgflg,
+			 compat_do_msg_fill);
 }
 #endif
 
diff --git a/ipc/msg.c b/ipc/msg.c
index 2f272fa..532ebc3 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -755,15 +755,30 @@ static inline int convert_mode(long *msgtyp, int msgflg)
 	return SEARCH_EQUAL;
 }
 
-long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
-		size_t msgsz, long msgtyp, int msgflg)
+static long do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz)
+{
+	struct msgbuf __user *msgp = dest;
+	size_t msgsz;
+
+	if (put_user(msg->m_type, &msgp->mtype))
+		return -EFAULT;
+
+	msgsz = (bufsz > msg->m_ts) ? msg->m_ts : bufsz;
+	if (store_msg(msgp->mtext, msg, msgsz))
+		return -EFAULT;
+	return msgsz;
+}
+
+long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
+	       int msgflg,
+	       long (*msg_handler)(void __user *, struct msg_msg *, size_t ))
 {
 	struct msg_queue *msq;
 	struct msg_msg *msg;
 	int mode;
 	struct ipc_namespace *ns;
 
-	if (msqid < 0 || (long) msgsz < 0)
+	if (msqid < 0 || (long) bufsz < 0)
 		return -EINVAL;
 	mode = convert_mode(&msgtyp, msgflg);
 	ns = current->nsproxy->ipc_ns;
@@ -804,7 +819,7 @@ long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
 			 * Found a suitable message.
 			 * Unlink it from the queue.
 			 */
-			if ((msgsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
+			if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
 				msg = ERR_PTR(-E2BIG);
 				goto out_unlock;
 			}
@@ -831,7 +846,7 @@ long do_msgrcv(int msqid, long *pmtype, void __user *mtext,
 		if (msgflg & MSG_NOERROR)
 			msr_d.r_maxsize = INT_MAX;
 		else
-			msr_d.r_maxsize = msgsz;
+			msr_d.r_maxsize = bufsz;
 		msr_d.r_msg = ERR_PTR(-EAGAIN);
 		current->state = TASK_INTERRUPTIBLE;
 		msg_unlock(msq);
@@ -894,29 +909,16 @@ out_unlock:
 	if (IS_ERR(msg))
 		return PTR_ERR(msg);
 
-	msgsz = (msgsz > msg->m_ts) ? msg->m_ts : msgsz;
-	*pmtype = msg->m_type;
-	if (store_msg(mtext, msg, msgsz))
-		msgsz = -EFAULT;
-
+	bufsz = msg_handler(buf, msg, bufsz);
 	free_msg(msg);
 
-	return msgsz;
+	return bufsz;
 }
 
 SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
 		long, msgtyp, int, msgflg)
 {
-	long err, mtype;
-
-	err =  do_msgrcv(msqid, &mtype, msgp->mtext, msgsz, msgtyp, msgflg);
-	if (err < 0)
-		goto out;
-
-	if (put_user(mtype, &msgp->mtype))
-		err = -EFAULT;
-out:
-	return err;
+	return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
 }
 
 #ifdef CONFIG_PROC_FS


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v8 4/5] ipc: message queue copy feature introduced
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
                   ` (2 preceding siblings ...)
  2012-10-24 15:35 ` [PATCH v8 3/5] ipc: message queue receive cleanup Stanislav Kinsbursky
@ 2012-10-24 15:35 ` Stanislav Kinsbursky
  2012-10-24 21:41   ` Andrew Morton
  2012-10-24 15:35 ` [PATCH v8 5/5] test: IPC message queue copy feture test Stanislav Kinsbursky
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:35 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

This patch is required for checkpoint/restore in userspace.
IOW, c/r requires some way to get all pending IPC messages without deleting
them from the queue (checkpoint can fail and in this case tasks will be resumed,
so queue have to be valid).
To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was
introduced. If this flag was specified, then mtype is interpreted as number of
the message to copy.
If MSG_COPY is set, then kernel will allocate dummy message with passed size,
and then use new copy_msg() helper function to copy desired message (instead of
unlinking it from the queue).

Notes:
1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is
not set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 include/uapi/linux/msg.h |    1 +
 ipc/msg.c                |   50 ++++++++++++++++++++++++++++++++++++++++++++--
 ipc/msgutil.c            |   38 +++++++++++++++++++++++++++++++++++
 ipc/util.h               |    1 +
 4 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/msg.h b/include/uapi/linux/msg.h
index 78dbd2f..22d95c6 100644
--- a/include/uapi/linux/msg.h
+++ b/include/uapi/linux/msg.h
@@ -10,6 +10,7 @@
 /* msgrcv options */
 #define MSG_NOERROR     010000  /* no error if message is too big */
 #define MSG_EXCEPT      020000  /* recv any msg except of specified type.*/
+#define MSG_COPY        040000  /* copy (not remove) all queue messages */
 
 /* Obsolete, used only for backwards compatibility and libc5 compiles */
 struct msqid_ds {
diff --git a/ipc/msg.c b/ipc/msg.c
index 532ebc3..28320ab 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -777,19 +777,48 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 	struct msg_msg *msg;
 	int mode;
 	struct ipc_namespace *ns;
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	struct msg_msg *copy = NULL;
+	unsigned long copy_number = 0;
+#endif
 
 	if (msqid < 0 || (long) bufsz < 0)
 		return -EINVAL;
+	if (msgflg & MSG_COPY) {
+#ifdef CONFIG_CHECKPOINT_RESTORE
+
+		if (msgflg & MSG_COPY) {
+			copy_number = msgtyp;
+			msgtyp = 0;
+		}
+
+		/*
+		 * Create dummy message to copy real message to.
+		 */
+		copy = load_msg(buf, bufsz);
+		if (IS_ERR(copy))
+			return PTR_ERR(copy);
+		copy->m_ts = bufsz;
+#else
+		return -ENOSYS;
+#endif
+	}
 	mode = convert_mode(&msgtyp, msgflg);
 	ns = current->nsproxy->ipc_ns;
 
 	msq = msg_lock_check(ns, msqid);
-	if (IS_ERR(msq))
+	if (IS_ERR(msq)) {
+#ifdef CONFIG_CHECKPOINT_RESTORE
+		if (msgflg & MSG_COPY)
+			free_msg(copy);
+#endif
 		return PTR_ERR(msq);
+	}
 
 	for (;;) {
 		struct msg_receiver msr_d;
 		struct list_head *tmp;
+		long msg_counter = 0;
 
 		msg = ERR_PTR(-EACCES);
 		if (ipcperms(ns, &msq->q_perm, S_IRUGO))
@@ -809,8 +838,16 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 				if (mode == SEARCH_LESSEQUAL &&
 						walk_msg->m_type != 1) {
 					msgtyp = walk_msg->m_type - 1;
+#ifdef CONFIG_CHECKPOINT_RESTORE
+				} else if (msgflg & MSG_COPY) {
+					if (copy_number == msg_counter) {
+						msg = copy_msg(walk_msg, copy);
+						break;
+					}
+#endif
 				} else
 					break;
+				msg_counter++;
 			}
 			tmp = tmp->next;
 		}
@@ -823,6 +860,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 				msg = ERR_PTR(-E2BIG);
 				goto out_unlock;
 			}
+#ifdef CONFIG_CHECKPOINT_RESTORE
+			if (msgflg & MSG_COPY)
+				goto out_unlock;
+#endif
 			list_del(&msg->m_list);
 			msq->q_qnum--;
 			msq->q_rtime = get_seconds();
@@ -906,8 +947,13 @@ out_unlock:
 			break;
 		}
 	}
-	if (IS_ERR(msg))
+	if (IS_ERR(msg)) {
+#ifdef CONFIG_CHECKPOINT_RESTORE
+		if (msgflg & MSG_COPY)
+			free_msg(copy);
+#endif
 		return PTR_ERR(msg);
+	}
 
 	bufsz = msg_handler(buf, msg, bufsz);
 	free_msg(msg);
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index 26143d3..b281f5c 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -100,7 +100,45 @@ out_err:
 	free_msg(msg);
 	return ERR_PTR(err);
 }
+#ifdef CONFIG_CHECKPOINT_RESTORE
+struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
+{
+	struct msg_msgseg *dst_pseg, *src_pseg;
+	int len = src->m_ts;
+	int alen;
+
+	BUG_ON(dst == NULL);
+	if (src->m_ts > dst->m_ts)
+		return ERR_PTR(-EINVAL);
+
+	alen = len;
+	if (alen > DATALEN_MSG)
+		alen = DATALEN_MSG;
+
+	dst->next = NULL;
+	dst->security = NULL;
 
+	memcpy(dst + 1, src + 1, alen);
+
+	len -= alen;
+	dst_pseg = dst->next;
+	src_pseg = src->next;
+	while (len > 0) {
+		alen = len;
+		if (alen > DATALEN_SEG)
+			alen = DATALEN_SEG;
+		memcpy(dst_pseg + 1, src_pseg + 1, alen);
+		dst_pseg = dst_pseg->next;
+		len -= alen;
+		src_pseg = src_pseg->next;
+	}
+
+	dst->m_type = src->m_type;
+	dst->m_ts = src->m_ts;
+
+	return dst;
+}
+#endif
 int store_msg(void __user *dest, struct msg_msg *msg, int len)
 {
 	int alen;
diff --git a/ipc/util.h b/ipc/util.h
index a61e0ca..eeb79a1 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -140,6 +140,7 @@ int ipc_parse_version (int *cmd);
 
 extern void free_msg(struct msg_msg *msg);
 extern struct msg_msg *load_msg(const void __user *src, int len);
+extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
 extern int store_msg(void __user *dest, struct msg_msg *msg, int len);
 
 extern void recompute_msgmni(struct ipc_namespace *);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v8 5/5] test: IPC message queue copy feture test
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
                   ` (3 preceding siblings ...)
  2012-10-24 15:35 ` [PATCH v8 4/5] ipc: message queue copy feature introduced Stanislav Kinsbursky
@ 2012-10-24 15:35 ` Stanislav Kinsbursky
  2012-10-24 21:42 ` [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Andrew Morton
  2012-12-18 20:36 ` Andrew Morton
  6 siblings, 0 replies; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-24 15:35 UTC (permalink / raw)
  To: akpm
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

This test can be used to check wheither kernel supports IPC message queue copy
and restore features (required by CRIU project).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
 tools/testing/selftests/ipc/Makefile |   25 ++++
 tools/testing/selftests/ipc/msgque.c |  231 ++++++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100644 tools/testing/selftests/ipc/Makefile
 create mode 100644 tools/testing/selftests/ipc/msgque.c

diff --git a/tools/testing/selftests/ipc/Makefile b/tools/testing/selftests/ipc/Makefile
new file mode 100644
index 0000000..5386fd7
--- /dev/null
+++ b/tools/testing/selftests/ipc/Makefile
@@ -0,0 +1,25 @@
+uname_M := $(shell uname -m 2>/dev/null || echo not)
+ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
+ifeq ($(ARCH),i386)
+        ARCH := X86
+	CFLAGS := -DCONFIG_X86_32 -D__i386__
+endif
+ifeq ($(ARCH),x86_64)
+	ARCH := X86
+	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
+endif
+
+CFLAGS += -I../../../../usr/include/
+
+all:
+ifeq ($(ARCH),X86)
+	gcc $(CFLAGS) msgque.c -o msgque_test
+else
+	echo "Not an x86 target, can't build msgque selftest"
+endif
+
+run_tests: all
+	./msgque_test
+
+clean:
+	rm -fr ./msgque_test
diff --git a/tools/testing/selftests/ipc/msgque.c b/tools/testing/selftests/ipc/msgque.c
new file mode 100644
index 0000000..e2d6a64
--- /dev/null
+++ b/tools/testing/selftests/ipc/msgque.c
@@ -0,0 +1,231 @@
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <linux/msg.h>
+
+#define MAX_MSG_SIZE		32
+
+struct msg1 {
+	int msize;
+	long mtype;
+	char mtext[MAX_MSG_SIZE];
+};
+
+#define TEST_STRING "Test sysv5 msg"
+#define MSG_TYPE 1
+
+#define ANOTHER_TEST_STRING "Yet another test sysv5 msg"
+#define ANOTHER_MSG_TYPE 26538
+
+struct msgque_data {
+	int msq_id;
+	int qbytes;
+	int kern_id;
+	int qnum;
+	int mode;
+	struct msg1 *messages;
+};
+
+int restore_queue(struct msgque_data *msgque)
+{
+	struct msqid_ds ds;
+	int id, i;
+
+	id = msgget(msgque->msq_id,
+		     msgque->mode | IPC_CREAT | IPC_EXCL | IPC_PRESET);
+	if (id == -1) {
+		printf("Failed to create queue\n");
+		return -errno;
+	}
+
+	if (id != msgque->msq_id) {
+		printf("Failed to preset id (%d instead of %d)\n",
+							id, msgque->msq_id);
+		return -EFAULT;
+	}
+
+	if (msgctl(id, MSG_STAT, &ds) < 0) {
+		printf("Failed to stat queue\n");
+		return -errno;
+	}
+
+	ds.msg_perm.key = msgque->msq_id;
+	ds.msg_qbytes = msgque->qbytes;
+	if (msgctl(id, MSG_SET, &ds) < 0) {
+		printf("Failed to update message key\n");
+		return -errno;
+	}
+
+	for (i = 0; i < msgque->qnum; i++) {
+		if (msgsnd(msgque->msq_id, &msgque->messages[i].mtype, msgque->messages[i].msize, IPC_NOWAIT) != 0) {
+			printf("msgsnd failed (%m)\n");
+			return -errno;
+		};
+	}
+	return 0;
+}
+
+int check_and_destroy_queue(struct msgque_data *msgque)
+{
+	struct msg1 message;
+	int cnt = 0, ret;
+
+	while (1) {
+		ret = msgrcv(msgque->msq_id, &message.mtype, MAX_MSG_SIZE, 0, IPC_NOWAIT);
+		if (ret < 0) {
+			if (errno == ENOMSG)
+				break;
+			printf("Failed to read IPC message: %m\n");
+			ret = -errno;
+			goto err;
+		}
+		if (ret != msgque->messages[cnt].msize) {
+			printf("Wrong message size: %d (expected %d)\n", ret, msgque->messages[cnt].msize);
+			ret = -EINVAL;
+			goto err;
+		}
+		if (message.mtype != msgque->messages[cnt].mtype) {
+			printf("Wrong message type\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		if (memcmp(message.mtext, msgque->messages[cnt].mtext, ret)) {
+			printf("Wrong message content\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		cnt++;
+	}
+
+	if (cnt != msgque->qnum) {
+		printf("Wrong message number\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = 0;
+err:
+	if (msgctl(msgque->msq_id, IPC_RMID, 0)) {
+		printf("Failed to destroy queue: %d\n", -errno);
+		return -errno;
+	}
+	return ret;
+}
+
+int dump_queue(struct msgque_data *msgque)
+{
+	struct msqid_ds ds;
+	int i, ret;
+
+	for (msgque->kern_id = 0; msgque->kern_id < 256; msgque->kern_id++) {
+		ret = msgctl(msgque->kern_id, MSG_STAT, &ds);
+		if (ret < 0) {
+			if (errno == -EINVAL)
+				continue;
+			printf("Failed to get stats for IPC queue with id %d\n", msgque->kern_id);
+			return -errno;
+		}
+
+		if (ret == msgque->msq_id)
+			break;
+	}
+
+	msgque->messages = malloc(sizeof(struct msg1) * ds.msg_qnum);
+	if (msgque->messages == NULL) {
+		printf("Failed to get stats for IPC queue\n");
+		return -ENOMEM;
+	}
+
+	msgque->qnum = ds.msg_qnum;
+	msgque->mode = ds.msg_perm.mode;
+	msgque->qbytes = ds.msg_qbytes;
+
+	for (i = 0; i < msgque->qnum; i++) {
+		ret = msgrcv(msgque->msq_id, &msgque->messages[i].mtype, MAX_MSG_SIZE, i, IPC_NOWAIT | MSG_COPY);
+		if (ret < 0) {
+			printf("Failed to copy IPC message: %m (%d)\n", errno);
+			return -errno;
+		}
+		msgque->messages[i].msize = ret;
+	}
+	return 0;
+}
+
+int fill_msgque(struct msgque_data *msgque)
+{
+	struct msg1 msgbuf;
+
+	msgbuf.mtype = MSG_TYPE;
+	memcpy(msgbuf.mtext, TEST_STRING, sizeof(TEST_STRING));
+	if (msgsnd(msgque->msq_id, &msgbuf.mtype, sizeof(TEST_STRING), IPC_NOWAIT) != 0) {
+		printf("First message send failed (%m)\n");
+		return -errno;
+	};
+
+	msgbuf.mtype = ANOTHER_MSG_TYPE;
+	memcpy(msgbuf.mtext, ANOTHER_TEST_STRING, sizeof(ANOTHER_TEST_STRING));
+	if (msgsnd(msgque->msq_id, &msgbuf.mtype, sizeof(ANOTHER_TEST_STRING), IPC_NOWAIT) != 0) {
+		printf("Second message send failed (%m)\n");
+		return -errno;
+	};
+	return 0;
+}
+
+int main (int argc, char **argv)
+{
+	key_t key;
+	int msg, pid, err;
+	struct msgque_data msgque;
+
+	key = ftok(argv[0], 822155650);
+	if (key == -1) {
+		printf("Can't make key\n");
+		return -errno;
+	}
+
+	msgque.msq_id = msgget(key, IPC_CREAT | IPC_EXCL | 0666);
+	if (msgque.msq_id == -1) {
+		printf("Can't create queue\n");
+		goto err_out;
+	}
+
+	err = fill_msgque(&msgque);
+	if (err) {
+		printf("Failed to fill queue\n");
+		goto err_destroy;
+	}
+
+	err = dump_queue(&msgque);
+	if (err) {
+		printf("Failed to dump queue\n");
+		goto err_destroy;
+	}
+
+	err = check_and_destroy_queue(&msgque);
+	if (err) {
+		printf("Failed to check and destroy queue\n");
+		goto err_out;
+	}
+
+	err = restore_queue(&msgque);
+	if (err) {
+		printf("Failed to restore queue\n");
+		goto err_destroy;
+	}
+
+	err = check_and_destroy_queue(&msgque);
+	if (err) {
+		printf("Failed to test queue\n");
+		goto err_out;
+	}
+	return 0;
+
+err_destroy:
+	if (msgctl(msgque.msq_id, IPC_RMID, 0)) {
+		printf("Failed to destroy queue: %d\n", -errno);
+		return -errno;
+	}
+err_out:
+	return err;
+}


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v8 2/5] ipc: add sysctl to specify desired next object id
  2012-10-24 15:35 ` [PATCH v8 2/5] ipc: add sysctl to specify desired next object id Stanislav Kinsbursky
@ 2012-10-24 21:41   ` Andrew Morton
  2012-10-25  7:53     ` Stanislav Kinsbursky
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2012-10-24 21:41 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

On Wed, 24 Oct 2012 19:35:09 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> This patch adds 3 new variables and sysctls to tune them (by one "next_id"
> variable for messages, semaphores and shared memory respectively).
> This variable can be used to set desired id for next allocated IPC object.
> By default it's equal to -1 and old behaviour is preserved.
> If this variable is non-negative, then desired idr will be extracted from it
> and used as a start value to search for free IDR slot.
> 
> Notes:
> 1) this patch doesn't garantee, that new object will have desired id. So it's
> up to user space how to handle new object with wrong id.
> 2) After sucessfull id allocation attempt, "next_id" will be set back to -1
> (if it was non-negative).
> 
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -158,6 +158,7 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
>  
>  static int zero;
>  static int one = 1;
> +static int int_max = INT_MAX;
>  
>  static struct ctl_table ipc_kern_table[] = {
>  	{
> @@ -227,6 +228,33 @@ static struct ctl_table ipc_kern_table[] = {
>  		.extra1		= &zero,
>  		.extra2		= &one,
>  	},
> +	{
> +		.procname	= "sem_next_id",
> +		.data		= &init_ipc_ns.ids[IPC_SEM_IDS].next_id,
> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SEM_IDS].next_id),
> +		.mode		= 0644,
> +		.proc_handler	= proc_ipc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &int_max,
> +	},
> +	{
> +		.procname	= "msg_next_id",
> +		.data		= &init_ipc_ns.ids[IPC_MSG_IDS].next_id,
> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_MSG_IDS].next_id),
> +		.mode		= 0644,
> +		.proc_handler	= proc_ipc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &int_max,
> +	},
> +	{
> +		.procname	= "shm_next_id",
> +		.data		= &init_ipc_ns.ids[IPC_SHM_IDS].next_id,
> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SHM_IDS].next_id),
> +		.mode		= 0644,
> +		.proc_handler	= proc_ipc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &int_max,
> +	},
>  	{}
>  };

ipc_kern_table[] is (badly) documented in
Documentation/sysctl/kernel.txt.  Can we at least mention these
controls in there?  Better, create a new way of properly documenting
each control and document these three in that manner?  Better still,
document all the other ones as well ;)

The patch adds these controls to CONFIG_CHECKPOINT_RESTORE=n kernels. 
Why is this?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v8 4/5] ipc: message queue copy feature introduced
  2012-10-24 15:35 ` [PATCH v8 4/5] ipc: message queue copy feature introduced Stanislav Kinsbursky
@ 2012-10-24 21:41   ` Andrew Morton
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2012-10-24 21:41 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

On Wed, 24 Oct 2012 19:35:20 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> This patch is required for checkpoint/restore in userspace.
> IOW, c/r requires some way to get all pending IPC messages without deleting
> them from the queue (checkpoint can fail and in this case tasks will be resumed,
> so queue have to be valid).
> To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was
> introduced. If this flag was specified, then mtype is interpreted as number of
> the message to copy.
> If MSG_COPY is set, then kernel will allocate dummy message with passed size,
> and then use new copy_msg() helper function to copy desired message (instead of
> unlinking it from the queue).
> 
> ...
>
> @@ -777,19 +777,48 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
>  	struct msg_msg *msg;
>  	int mode;
>  	struct ipc_namespace *ns;
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +	struct msg_msg *copy = NULL;
> +	unsigned long copy_number = 0;
> +#endif
>  
>  	if (msqid < 0 || (long) bufsz < 0)
>  		return -EINVAL;
> +	if (msgflg & MSG_COPY) {
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +
> +		if (msgflg & MSG_COPY) {

This test is't needed.

> +			copy_number = msgtyp;
> +			msgtyp = 0;
> +		}
> +
> +		/*
> +		 * Create dummy message to copy real message to.
> +		 */
> +		copy = load_msg(buf, bufsz);
> +		if (IS_ERR(copy))
> +			return PTR_ERR(copy);
> +		copy->m_ts = bufsz;
> +#else
> +		return -ENOSYS;
> +#endif
> +	}
>  	mode = convert_mode(&msgtyp, msgflg);
>  	ns = current->nsproxy->ipc_ns;
>  
>  	msq = msg_lock_check(ns, msqid);
> -	if (IS_ERR(msq))
> +	if (IS_ERR(msq)) {
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +		if (msgflg & MSG_COPY)
> +			free_msg(copy);
> +#endif
>  		return PTR_ERR(msq);
> +	}
>  
>  	for (;;) {
>  		struct msg_receiver msr_d;
>  		struct list_head *tmp;
> +		long msg_counter = 0;
>  
>  		msg = ERR_PTR(-EACCES);
>  		if (ipcperms(ns, &msq->q_perm, S_IRUGO))
> @@ -809,8 +838,16 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
>  				if (mode == SEARCH_LESSEQUAL &&
>  						walk_msg->m_type != 1) {
>  					msgtyp = walk_msg->m_type - 1;
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +				} else if (msgflg & MSG_COPY) {
> +					if (copy_number == msg_counter) {
> +						msg = copy_msg(walk_msg, copy);
> +						break;
> +					}
> +#endif
>  				} else
>  					break;
> +				msg_counter++;
>  			}
>  			tmp = tmp->next;
>  		}
> @@ -823,6 +860,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
>  				msg = ERR_PTR(-E2BIG);
>  				goto out_unlock;
>  			}
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +			if (msgflg & MSG_COPY)
> +				goto out_unlock;
> +#endif
>  			list_del(&msg->m_list);
>  			msq->q_qnum--;
>  			msq->q_rtime = get_seconds();
> @@ -906,8 +947,13 @@ out_unlock:
>  			break;
>  		}
>  	}
> -	if (IS_ERR(msg))
> +	if (IS_ERR(msg)) {
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +		if (msgflg & MSG_COPY)
> +			free_msg(copy);
> +#endif
>  		return PTR_ERR(msg);
> +	}
>  
>  	bufsz = msg_handler(buf, msg, bufsz);
>  	free_msg(msg);

It's all a bit ugly, but I don't really see much we can practically do
about that.

You could add something like

#ifdef CONFIG_CHECKPOINT_RESTORE
static inline void free_copy(void *p, int msgflg, struct msg_msg *copy)
{
	if (IS_ERR(p) && (msgflg & MSG_COPY))
		free_msg(copy);
}
#else
/* As a macro because `copy' will be undefined */
#define free_copy(p, msgflg, copy) do {} while (0)
#endif

and use that in a couple of places.  But that won't help much.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
                   ` (4 preceding siblings ...)
  2012-10-24 15:35 ` [PATCH v8 5/5] test: IPC message queue copy feture test Stanislav Kinsbursky
@ 2012-10-24 21:42 ` Andrew Morton
  2012-12-18 20:36 ` Andrew Morton
  6 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2012-10-24 21:42 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages

On Wed, 24 Oct 2012 19:34:51 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> This respin of the patch set was significantly reworked. Most part of new API
> was replaced by sysctls (by one per messages, semaphores and shared memory),
> allowing to preset desired id for next new IPC object.
> 
> This patch set is aimed to provide additional functionality for all IPC
> objects, which is required for migration of these objects by user-space
> checkpoint/restore utils (CRIU).
> 
> The main problem here was impossibility to set up object id. This patch set
> solves the problem by adding new sysctls for preset of desired id for new IPC
> object.
> 
> Another problem was to peek messages from queues without deleting them.
> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
> MSG_COPY flag is set, then msgtyp is interpreted as message number.

OK, thanks, I grabbed these.  They're not the worst c/r patches I've
ever seen ;)

I'm not seeing any evidence that anyone else has reviewed or tested
these?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v8 2/5] ipc: add sysctl to specify desired next object id
  2012-10-24 21:41   ` Andrew Morton
@ 2012-10-25  7:53     ` Stanislav Kinsbursky
  0 siblings, 0 replies; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-10-25  7:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: serge.hallyn, ebiederm, linux-kernel, Pavel Emelianov,
	catalin.marinas, will.deacon, jmorris, cmetcalf, joe.korty,
	dhowells, dledford, viro, kosaki.motohiro, linux-api, serue,
	tglx, paulmck, devel, mtk.manpages

25.10.2012 01:41, Andrew Morton пишет:
> On Wed, 24 Oct 2012 19:35:09 +0400
> Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:
>
>> This patch adds 3 new variables and sysctls to tune them (by one "next_id"
>> variable for messages, semaphores and shared memory respectively).
>> This variable can be used to set desired id for next allocated IPC object.
>> By default it's equal to -1 and old behaviour is preserved.
>> If this variable is non-negative, then desired idr will be extracted from it
>> and used as a start value to search for free IDR slot.
>>
>> Notes:
>> 1) this patch doesn't garantee, that new object will have desired id. So it's
>> up to user space how to handle new object with wrong id.
>> 2) After sucessfull id allocation attempt, "next_id" will be set back to -1
>> (if it was non-negative).
>>
>> --- a/ipc/ipc_sysctl.c
>> +++ b/ipc/ipc_sysctl.c
>> @@ -158,6 +158,7 @@ static int proc_ipcauto_dointvec_minmax(ctl_table *table, int write,
>>
>>   static int zero;
>>   static int one = 1;
>> +static int int_max = INT_MAX;
>>
>>   static struct ctl_table ipc_kern_table[] = {
>>   	{
>> @@ -227,6 +228,33 @@ static struct ctl_table ipc_kern_table[] = {
>>   		.extra1		= &zero,
>>   		.extra2		= &one,
>>   	},
>> +	{
>> +		.procname	= "sem_next_id",
>> +		.data		= &init_ipc_ns.ids[IPC_SEM_IDS].next_id,
>> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SEM_IDS].next_id),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_ipc_dointvec_minmax,
>> +		.extra1		= &zero,
>> +		.extra2		= &int_max,
>> +	},
>> +	{
>> +		.procname	= "msg_next_id",
>> +		.data		= &init_ipc_ns.ids[IPC_MSG_IDS].next_id,
>> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_MSG_IDS].next_id),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_ipc_dointvec_minmax,
>> +		.extra1		= &zero,
>> +		.extra2		= &int_max,
>> +	},
>> +	{
>> +		.procname	= "shm_next_id",
>> +		.data		= &init_ipc_ns.ids[IPC_SHM_IDS].next_id,
>> +		.maxlen		= sizeof(init_ipc_ns.ids[IPC_SHM_IDS].next_id),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_ipc_dointvec_minmax,
>> +		.extra1		= &zero,
>> +		.extra2		= &int_max,
>> +	},
>>   	{}
>>   };
>
> ipc_kern_table[] is (badly) documented in
> Documentation/sysctl/kernel.txt.  Can we at least mention these
> controls in there?  Better, create a new way of properly documenting
> each control and document these three in that manner?  Better still,
> document all the other ones as well ;)
>

Yes, sure. I'll do my best.

> The patch adds these controls to CONFIG_CHECKPOINT_RESTORE=n kernels.
> Why is this?
>

I'll fix this.

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
                   ` (5 preceding siblings ...)
  2012-10-24 21:42 ` [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Andrew Morton
@ 2012-12-18 20:36 ` Andrew Morton
  2012-12-20  4:06   ` Stanislav Kinsbursky
  6 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2012-12-18 20:36 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages, Sasha Levin, Wu Fengguang

On Wed, 24 Oct 2012 19:34:51 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> This respin of the patch set was significantly reworked. Most part of new API
> was replaced by sysctls (by one per messages, semaphores and shared memory),
> allowing to preset desired id for next new IPC object.
> 
> This patch set is aimed to provide additional functionality for all IPC
> objects, which is required for migration of these objects by user-space
> checkpoint/restore utils (CRIU).
> 
> The main problem here was impossibility to set up object id. This patch set
> solves the problem by adding new sysctls for preset of desired id for new IPC
> object.
> 
> Another problem was to peek messages from queues without deleting them.
> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
> MSG_COPY flag is set, then msgtyp is interpreted as message number.

According to my extensive records, Sasha hit a bug in
ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
bug in
ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch

It's not obvious (to me) that these things have been identified and
fixed.  What's the status, please?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-18 20:36 ` Andrew Morton
@ 2012-12-20  4:06   ` Stanislav Kinsbursky
  2012-12-20 20:47     ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-20  4:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages, Sasha Levin, Wu Fengguang

19.12.2012 00:36, Andrew Morton пишет:
> On Wed, 24 Oct 2012 19:34:51 +0400
> Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:
>
>> This respin of the patch set was significantly reworked. Most part of new API
>> was replaced by sysctls (by one per messages, semaphores and shared memory),
>> allowing to preset desired id for next new IPC object.
>>
>> This patch set is aimed to provide additional functionality for all IPC
>> objects, which is required for migration of these objects by user-space
>> checkpoint/restore utils (CRIU).
>>
>> The main problem here was impossibility to set up object id. This patch set
>> solves the problem by adding new sysctls for preset of desired id for new IPC
>> object.
>>
>> Another problem was to peek messages from queues without deleting them.
>> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
> According to my extensive records, Sasha hit a bug in
> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
> bug in
> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>
> It's not obvious (to me) that these things have been identified and
> fixed.  What's the status, please?

Hello, Andrew.
Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
But I can't find Sasha's issue. As I remember, there was some problem in 
early
version of the patch set. But I believe its fixed now.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-20  4:06   ` Stanislav Kinsbursky
@ 2012-12-20 20:47     ` Andrew Morton
  2012-12-21 20:46       ` Stanislav Kinsbursky
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2012-12-20 20:47 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages, Sasha Levin, Wu Fengguang

On Thu, 20 Dec 2012 08:06:32 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> 19.12.2012 00:36, Andrew Morton __________:
> > On Wed, 24 Oct 2012 19:34:51 +0400
> > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:
> >
> >> This respin of the patch set was significantly reworked. Most part of new API
> >> was replaced by sysctls (by one per messages, semaphores and shared memory),
> >> allowing to preset desired id for next new IPC object.
> >>
> >> This patch set is aimed to provide additional functionality for all IPC
> >> objects, which is required for migration of these objects by user-space
> >> checkpoint/restore utils (CRIU).
> >>
> >> The main problem here was impossibility to set up object id. This patch set
> >> solves the problem by adding new sysctls for preset of desired id for new IPC
> >> object.
> >>
> >> Another problem was to peek messages from queues without deleting them.
> >> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
> >> MSG_COPY flag is set, then msgtyp is interpreted as message number.
> > According to my extensive records, Sasha hit a bug in
> > ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
> > bug in
> > ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
> >
> > It's not obvious (to me) that these things have been identified and
> > fixed.  What's the status, please?
> 
> Hello, Andrew.
> Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
> But I can't find Sasha's issue. As I remember, there was some problem in 
> early
> version of the patch set. But I believe its fixed now.

http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html

Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-20 20:47     ` Andrew Morton
@ 2012-12-21 20:46       ` Stanislav Kinsbursky
  2012-12-21 21:57         ` Sasha Levin
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Kinsbursky @ 2012-12-21 20:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: serge.hallyn, ebiederm, linux-kernel, xemul, catalin.marinas,
	will.deacon, jmorris, cmetcalf, joe.korty, dhowells, dledford,
	viro, kosaki.motohiro, linux-api, serue, tglx, paulmck, devel,
	mtk.manpages, Sasha Levin, Wu Fengguang

21.12.2012 00:47, Andrew Morton пишет:
> On Thu, 20 Dec 2012 08:06:32 +0400
> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>
>> 19.12.2012 00:36, Andrew Morton __________:
>>> On Wed, 24 Oct 2012 19:34:51 +0400
>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>
>>>> This respin of the patch set was significantly reworked. Most part of new API
>>>> was replaced by sysctls (by one per messages, semaphores and shared memory),
>>>> allowing to preset desired id for next new IPC object.
>>>>
>>>> This patch set is aimed to provide additional functionality for all IPC
>>>> objects, which is required for migration of these objects by user-space
>>>> checkpoint/restore utils (CRIU).
>>>>
>>>> The main problem here was impossibility to set up object id. This patch set
>>>> solves the problem by adding new sysctls for preset of desired id for new IPC
>>>> object.
>>>>
>>>> Another problem was to peek messages from queues without deleting them.
>>>> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
>>>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
>>> According to my extensive records, Sasha hit a bug in
>>> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
>>> bug in
>>> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>>>
>>> It's not obvious (to me) that these things have been identified and
>>> fixed.  What's the status, please?
>> Hello, Andrew.
>> Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
>> But I can't find Sasha's issue. As I remember, there was some problem in
>> early
>> version of the patch set. But I believe its fixed now.
> http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html
>
> Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"

Ah, yes. Thanks.
Hi found it in initial version of code, which was significantly changed 
(or cleaned and simplified) by further patch series.
And I cant find out, how this can happen, because this patch he bisect 
to do not modify the queue itself, while he found the problem in testmsg.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-21 20:46       ` Stanislav Kinsbursky
@ 2012-12-21 21:57         ` Sasha Levin
  2012-12-22 15:43           ` Sasha Levin
  0 siblings, 1 reply; 18+ messages in thread
From: Sasha Levin @ 2012-12-21 21:57 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Andrew Morton, serge.hallyn, ebiederm, linux-kernel, xemul,
	catalin.marinas, will.deacon, jmorris, cmetcalf, joe.korty,
	dhowells, dledford, viro, kosaki.motohiro, linux-api, serue,
	tglx, paulmck, devel, mtk.manpages, Wu Fengguang

On 12/21/2012 03:46 PM, Stanislav Kinsbursky wrote:
> 21.12.2012 00:47, Andrew Morton пишет:
>> On Thu, 20 Dec 2012 08:06:32 +0400
>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>
>>> 19.12.2012 00:36, Andrew Morton __________:
>>>> On Wed, 24 Oct 2012 19:34:51 +0400
>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>
>>>>> This respin of the patch set was significantly reworked. Most part of new API
>>>>> was replaced by sysctls (by one per messages, semaphores and shared memory),
>>>>> allowing to preset desired id for next new IPC object.
>>>>>
>>>>> This patch set is aimed to provide additional functionality for all IPC
>>>>> objects, which is required for migration of these objects by user-space
>>>>> checkpoint/restore utils (CRIU).
>>>>>
>>>>> The main problem here was impossibility to set up object id. This patch set
>>>>> solves the problem by adding new sysctls for preset of desired id for new IPC
>>>>> object.
>>>>>
>>>>> Another problem was to peek messages from queues without deleting them.
>>>>> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
>>>>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
>>>> According to my extensive records, Sasha hit a bug in
>>>> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
>>>> bug in
>>>> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>>>>
>>>> It's not obvious (to me) that these things have been identified and
>>>> fixed.  What's the status, please?
>>> Hello, Andrew.
>>> Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
>>> But I can't find Sasha's issue. As I remember, there was some problem in
>>> early
>>> version of the patch set. But I believe its fixed now.
>> http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html
>>
>> Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"
> 
> Ah, yes. Thanks.
> Hi found it in initial version of code, which was significantly changed (or cleaned and simplified) by further patch series.
> And I cant find out, how this can happen, because this patch he bisect to do not modify the queue itself, while he found the
> problem in testmsg.

I actually can't reproduce it on the latest -next.

I was reverting the IPC changes in the past couple of weeks so that I could test the
rest of the IPC code with the fuzzer, and when I added them back in again I can't
reproduce the issue I've reported earlier.

We can probably figure out where it got fixed by bisecting between -next trees if anyone
is interested in that.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-21 21:57         ` Sasha Levin
@ 2012-12-22 15:43           ` Sasha Levin
  2013-01-09  8:24             ` Stanislav Kinsbursky
  0 siblings, 1 reply; 18+ messages in thread
From: Sasha Levin @ 2012-12-22 15:43 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Andrew Morton, serge.hallyn, ebiederm, linux-kernel, xemul,
	catalin.marinas, will.deacon, jmorris, cmetcalf, joe.korty,
	dhowells, dledford, viro, kosaki.motohiro, linux-api, serue,
	tglx, paulmck, devel, mtk.manpages, Wu Fengguang

On 12/21/2012 04:57 PM, Sasha Levin wrote:
> On 12/21/2012 03:46 PM, Stanislav Kinsbursky wrote:
>> 21.12.2012 00:47, Andrew Morton пишет:
>>> On Thu, 20 Dec 2012 08:06:32 +0400
>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>
>>>> 19.12.2012 00:36, Andrew Morton __________:
>>>>> On Wed, 24 Oct 2012 19:34:51 +0400
>>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>>
>>>>>> This respin of the patch set was significantly reworked. Most part of new API
>>>>>> was replaced by sysctls (by one per messages, semaphores and shared memory),
>>>>>> allowing to preset desired id for next new IPC object.
>>>>>>
>>>>>> This patch set is aimed to provide additional functionality for all IPC
>>>>>> objects, which is required for migration of these objects by user-space
>>>>>> checkpoint/restore utils (CRIU).
>>>>>>
>>>>>> The main problem here was impossibility to set up object id. This patch set
>>>>>> solves the problem by adding new sysctls for preset of desired id for new IPC
>>>>>> object.
>>>>>>
>>>>>> Another problem was to peek messages from queues without deleting them.
>>>>>> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
>>>>>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
>>>>> According to my extensive records, Sasha hit a bug in
>>>>> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
>>>>> bug in
>>>>> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>>>>>
>>>>> It's not obvious (to me) that these things have been identified and
>>>>> fixed.  What's the status, please?
>>>> Hello, Andrew.
>>>> Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
>>>> But I can't find Sasha's issue. As I remember, there was some problem in
>>>> early
>>>> version of the patch set. But I believe its fixed now.
>>> http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html
>>>
>>> Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"
>>
>> Ah, yes. Thanks.
>> Hi found it in initial version of code, which was significantly changed (or cleaned and simplified) by further patch series.
>> And I cant find out, how this can happen, because this patch he bisect to do not modify the queue itself, while he found the
>> problem in testmsg.
> 
> I actually can't reproduce it on the latest -next.
> 
> I was reverting the IPC changes in the past couple of weeks so that I could test the
> rest of the IPC code with the fuzzer, and when I added them back in again I can't
> reproduce the issue I've reported earlier.
> 
> We can probably figure out where it got fixed by bisecting between -next trees if anyone
> is interested in that.

Ignore that. It just took more fuzzing to stumble on it again:

[  103.164594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  103.168159] IP: [<ffffffff81937155>] do_msgrcv+0x205/0x540
[  103.170031] PGD c7cd067 PUD d274067 PMD 0
[  103.170031] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  103.170031] Dumping ftrace buffer:
[  103.170031]    (ftrace buffer empty)
[  103.170031] CPU 4
[  103.170031] Pid: 7056, comm: trinity Tainted: G        W    3.7.0-next-20121221-sasha-00014-g339890c #229
[  103.170031] RIP: 0010:[<ffffffff81937155>]  [<ffffffff81937155>] do_msgrcv+0x205/0x540
[  103.170031] RSP: 0018:ffff88000c7cfe88  EFLAGS: 00010246
[  103.170031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  103.170031] RDX: ffff880013681f00 RSI: 0000000000000124 RDI: ffff8800075a7810
[  103.170031] RBP: ffff88000c7cff68 R08: 0000000000000000 R09: 0000000000000000
[  103.170031] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000002
[  103.170031] R13: ffff8800075a78c0 R14: 7fffffff00000000 R15: ffff8800075a7810
[  103.170031] FS:  00007ffa529ae700(0000) GS:ffff880013c00000(0000) knlGS:0000000000000000
[  103.170031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.170031] CR2: 0000000000000010 CR3: 000000000c7cc000 CR4: 00000000000406e0
[  103.170031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  103.170031] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  103.170031] Process trinity (pid: 7056, threadinfo ffff88000c7ce000, task ffff88000c020000)
[  103.170031] Stack:
[  103.170031]  ffff88000c7cfea8 ffff88000c020000 ffff88000c020000 ffff88000c020000
[  103.170031]  0000000000000000 ffffffff81935e50 0000000000000008 0000000000000000
[  103.170031]  ffffffff858e91e0 0000000000000000 0000000000001001 ffff88000c020000
[  103.170031] Call Trace:
[  103.170031]  [<ffffffff81935e50>] ? load_msg+0x170/0x170
[  103.170031]  [<ffffffff8107e8c4>] ? syscall_trace_enter+0x24/0x2e0
[  103.170031]  [<ffffffff81184678>] ? trace_hardirqs_on_caller+0x118/0x140
[  103.170031]  [<ffffffff819374a0>] sys_msgrcv+0x10/0x20
[  103.170031]  [<ffffffff83cdf798>] tracesys+0xe1/0xe6
[  103.170031] Code: 80 f5 ff ff ff 90 41 83 fc 03 74 32 41 83 fc 04 74 0c 41 83 fc 02 75 2c eb 11 0f 1f 40 00 4c 3b 73 10 7d 20
66 90 e9 94 00 00 00 <4c> 39 73 10 0f 85 8a 00 00 00 90 eb 0c 66 0f 1f 44 00 00 4c 39
[  103.170031] RIP  [<ffffffff81937155>] do_msgrcv+0x205/0x540
[  103.170031]  RSP <ffff88000c7cfe88>
[  103.170031] CR2: 0000000000000010
[  103.228270] ---[ end trace ddc37199fdad82b0 ]---


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2012-12-22 15:43           ` Sasha Levin
@ 2013-01-09  8:24             ` Stanislav Kinsbursky
  2013-01-14  6:31               ` Sasha Levin
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-09  8:24 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrew Morton, serge.hallyn, ebiederm, linux-kernel, xemul,
	catalin.marinas, will.deacon, jmorris, cmetcalf, joe.korty,
	dhowells, dledford, viro, kosaki.motohiro, linux-api, serue,
	tglx, paulmck, devel, mtk.manpages, Wu Fengguang

22.12.2012 19:43, Sasha Levin пишет:
> On 12/21/2012 04:57 PM, Sasha Levin wrote:
>> On 12/21/2012 03:46 PM, Stanislav Kinsbursky wrote:
>>> 21.12.2012 00:47, Andrew Morton пишет:
>>>> On Thu, 20 Dec 2012 08:06:32 +0400
>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>
>>>>> 19.12.2012 00:36, Andrew Morton __________:
>>>>>> On Wed, 24 Oct 2012 19:34:51 +0400
>>>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>>>
>>>>>>> This respin of the patch set was significantly reworked. Most part of new API
>>>>>>> was replaced by sysctls (by one per messages, semaphores and shared memory),
>>>>>>> allowing to preset desired id for next new IPC object.
>>>>>>>
>>>>>>> This patch set is aimed to provide additional functionality for all IPC
>>>>>>> objects, which is required for migration of these objects by user-space
>>>>>>> checkpoint/restore utils (CRIU).
>>>>>>>
>>>>>>> The main problem here was impossibility to set up object id. This patch set
>>>>>>> solves the problem by adding new sysctls for preset of desired id for new IPC
>>>>>>> object.
>>>>>>>
>>>>>>> Another problem was to peek messages from queues without deleting them.
>>>>>>> This was achived by introducing of new MSG_COPY flag for sys_msgrcv(). If
>>>>>>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
>>>>>> According to my extensive records, Sasha hit a bug in
>>>>>> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
>>>>>> bug in
>>>>>> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>>>>>>
>>>>>> It's not obvious (to me) that these things have been identified and
>>>>>> fixed.  What's the status, please?
>>>>> Hello, Andrew.
>>>>> Fengguang's issue was solved by "ipc: simplify message copying" I sent you.
>>>>> But I can't find Sasha's issue. As I remember, there was some problem in
>>>>> early
>>>>> version of the patch set. But I believe its fixed now.
>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html
>>>>
>>>> Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"
>>>
>>> Ah, yes. Thanks.
>>> Hi found it in initial version of code, which was significantly changed (or cleaned and simplified) by further patch series.
>>> And I cant find out, how this can happen, because this patch he bisect to do not modify the queue itself, while he found the
>>> problem in testmsg.
>>
>> I actually can't reproduce it on the latest -next.
>>
>> I was reverting the IPC changes in the past couple of weeks so that I could test the
>> rest of the IPC code with the fuzzer, and when I added them back in again I can't
>> reproduce the issue I've reported earlier.
>>
>> We can probably figure out where it got fixed by bisecting between -next trees if anyone
>> is interested in that.
>
> Ignore that. It just took more fuzzing to stumble on it again:
>

Hello, Sasha!
Thanks!
But I still can't understand, how this can happen... And I can't reproduce.
Could you specify your load? I.e. how do you stumble on this panic?
Looks like you don't use new "copy" feature.

> [  103.164594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [  103.168159] IP: [<ffffffff81937155>] do_msgrcv+0x205/0x540
> [  103.170031] PGD c7cd067 PUD d274067 PMD 0
> [  103.170031] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  103.170031] Dumping ftrace buffer:
> [  103.170031]    (ftrace buffer empty)
> [  103.170031] CPU 4
> [  103.170031] Pid: 7056, comm: trinity Tainted: G        W    3.7.0-next-20121221-sasha-00014-g339890c #229
> [  103.170031] RIP: 0010:[<ffffffff81937155>]  [<ffffffff81937155>] do_msgrcv+0x205/0x540
> [  103.170031] RSP: 0018:ffff88000c7cfe88  EFLAGS: 00010246
> [  103.170031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [  103.170031] RDX: ffff880013681f00 RSI: 0000000000000124 RDI: ffff8800075a7810
> [  103.170031] RBP: ffff88000c7cff68 R08: 0000000000000000 R09: 0000000000000000
> [  103.170031] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000002
> [  103.170031] R13: ffff8800075a78c0 R14: 7fffffff00000000 R15: ffff8800075a7810
> [  103.170031] FS:  00007ffa529ae700(0000) GS:ffff880013c00000(0000) knlGS:0000000000000000
> [  103.170031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  103.170031] CR2: 0000000000000010 CR3: 000000000c7cc000 CR4: 00000000000406e0
> [  103.170031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  103.170031] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  103.170031] Process trinity (pid: 7056, threadinfo ffff88000c7ce000, task ffff88000c020000)
> [  103.170031] Stack:
> [  103.170031]  ffff88000c7cfea8 ffff88000c020000 ffff88000c020000 ffff88000c020000
> [  103.170031]  0000000000000000 ffffffff81935e50 0000000000000008 0000000000000000
> [  103.170031]  ffffffff858e91e0 0000000000000000 0000000000001001 ffff88000c020000
> [  103.170031] Call Trace:
> [  103.170031]  [<ffffffff81935e50>] ? load_msg+0x170/0x170
> [  103.170031]  [<ffffffff8107e8c4>] ? syscall_trace_enter+0x24/0x2e0
> [  103.170031]  [<ffffffff81184678>] ? trace_hardirqs_on_caller+0x118/0x140
> [  103.170031]  [<ffffffff819374a0>] sys_msgrcv+0x10/0x20
> [  103.170031]  [<ffffffff83cdf798>] tracesys+0xe1/0xe6
> [  103.170031] Code: 80 f5 ff ff ff 90 41 83 fc 03 74 32 41 83 fc 04 74 0c 41 83 fc 02 75 2c eb 11 0f 1f 40 00 4c 3b 73 10 7d 20
> 66 90 e9 94 00 00 00 <4c> 39 73 10 0f 85 8a 00 00 00 90 eb 0c 66 0f 1f 44 00 00 4c 39
> [  103.170031] RIP  [<ffffffff81937155>] do_msgrcv+0x205/0x540
> [  103.170031]  RSP <ffff88000c7cfe88>
> [  103.170031] CR2: 0000000000000010
> [  103.228270] ---[ end trace ddc37199fdad82b0 ]---
>
>
> Thanks,
> Sasha
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements
  2013-01-09  8:24             ` Stanislav Kinsbursky
@ 2013-01-14  6:31               ` Sasha Levin
  0 siblings, 0 replies; 18+ messages in thread
From: Sasha Levin @ 2013-01-14  6:31 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Sasha Levin, Andrew Morton, serge.hallyn, ebiederm, linux-kernel,
	xemul, catalin.marinas, will.deacon, jmorris, cmetcalf,
	joe.korty, dhowells, dledford, viro, kosaki.motohiro, linux-api,
	serue, tglx, paulmck, devel, mtk.manpages, Wu Fengguang

On Wed, Jan 9, 2013 at 3:24 AM, Stanislav Kinsbursky
<skinsbursky@parallels.com> wrote:
> 22.12.2012 19:43, Sasha Levin пишет:
>
>> On 12/21/2012 04:57 PM, Sasha Levin wrote:
>>>
>>> On 12/21/2012 03:46 PM, Stanislav Kinsbursky wrote:
>>>>
>>>> 21.12.2012 00:47, Andrew Morton пишет:
>>>>>
>>>>> On Thu, 20 Dec 2012 08:06:32 +0400
>>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>>
>>>>>> 19.12.2012 00:36, Andrew Morton __________:
>>>>>>>
>>>>>>> On Wed, 24 Oct 2012 19:34:51 +0400
>>>>>>> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>>>>>>>
>>>>>>>> This respin of the patch set was significantly reworked. Most part
>>>>>>>> of new API
>>>>>>>> was replaced by sysctls (by one per messages, semaphores and shared
>>>>>>>> memory),
>>>>>>>> allowing to preset desired id for next new IPC object.
>>>>>>>>
>>>>>>>> This patch set is aimed to provide additional functionality for all
>>>>>>>> IPC
>>>>>>>> objects, which is required for migration of these objects by
>>>>>>>> user-space
>>>>>>>> checkpoint/restore utils (CRIU).
>>>>>>>>
>>>>>>>> The main problem here was impossibility to set up object id. This
>>>>>>>> patch set
>>>>>>>> solves the problem by adding new sysctls for preset of desired id
>>>>>>>> for new IPC
>>>>>>>> object.
>>>>>>>>
>>>>>>>> Another problem was to peek messages from queues without deleting
>>>>>>>> them.
>>>>>>>> This was achived by introducing of new MSG_COPY flag for
>>>>>>>> sys_msgrcv(). If
>>>>>>>> MSG_COPY flag is set, then msgtyp is interpreted as message number.
>>>>>>>
>>>>>>> According to my extensive records, Sasha hit a bug in
>>>>>>> ipc-message-queue-copy-feature-introduced.patch and Fengguang found a
>>>>>>> bug in
>>>>>>>
>>>>>>> ipc-message-queue-copy-feature-introduced-cleanup-do_msgrcv-aroung-msg_copy-feature.patch
>>>>>>>
>>>>>>> It's not obvious (to me) that these things have been identified and
>>>>>>> fixed.  What's the status, please?
>>>>>>
>>>>>> Hello, Andrew.
>>>>>> Fengguang's issue was solved by "ipc: simplify message copying" I sent
>>>>>> you.
>>>>>> But I can't find Sasha's issue. As I remember, there was some problem
>>>>>> in
>>>>>> early
>>>>>> version of the patch set. But I believe its fixed now.
>>>>>
>>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/01710.html
>>>>>
>>>>> Subject: "ipc, msgqueue: NULL ptr deref in msgrcv"
>>>>
>>>>
>>>> Ah, yes. Thanks.
>>>> Hi found it in initial version of code, which was significantly changed
>>>> (or cleaned and simplified) by further patch series.
>>>> And I cant find out, how this can happen, because this patch he bisect
>>>> to do not modify the queue itself, while he found the
>>>> problem in testmsg.
>>>
>>>
>>> I actually can't reproduce it on the latest -next.
>>>
>>> I was reverting the IPC changes in the past couple of weeks so that I
>>> could test the
>>> rest of the IPC code with the fuzzer, and when I added them back in again
>>> I can't
>>> reproduce the issue I've reported earlier.
>>>
>>> We can probably figure out where it got fixed by bisecting between -next
>>> trees if anyone
>>> is interested in that.
>>
>>
>> Ignore that. It just took more fuzzing to stumble on it again:
>>
>
> Hello, Sasha!
> Thanks!
> But I still can't understand, how this can happen... And I can't reproduce.
> Could you specify your load? I.e. how do you stumble on this panic?
> Looks like you don't use new "copy" feature.

I just fuzz with trinity
(http://git.codemonkey.org.uk/?p=trinity.git;a=tree) inside a pretty
basic KVM guest.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-01-14  6:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-24 15:34 [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Stanislav Kinsbursky
2012-10-24 15:35 ` [PATCH v8 1/5] ipc: remove forced assignment of selected message Stanislav Kinsbursky
2012-10-24 15:35 ` [PATCH v8 2/5] ipc: add sysctl to specify desired next object id Stanislav Kinsbursky
2012-10-24 21:41   ` Andrew Morton
2012-10-25  7:53     ` Stanislav Kinsbursky
2012-10-24 15:35 ` [PATCH v8 3/5] ipc: message queue receive cleanup Stanislav Kinsbursky
2012-10-24 15:35 ` [PATCH v8 4/5] ipc: message queue copy feature introduced Stanislav Kinsbursky
2012-10-24 21:41   ` Andrew Morton
2012-10-24 15:35 ` [PATCH v8 5/5] test: IPC message queue copy feture test Stanislav Kinsbursky
2012-10-24 21:42 ` [RFC PATCH v8 0/5] IPC: checkpoint/restore in userspace enhancements Andrew Morton
2012-12-18 20:36 ` Andrew Morton
2012-12-20  4:06   ` Stanislav Kinsbursky
2012-12-20 20:47     ` Andrew Morton
2012-12-21 20:46       ` Stanislav Kinsbursky
2012-12-21 21:57         ` Sasha Levin
2012-12-22 15:43           ` Sasha Levin
2013-01-09  8:24             ` Stanislav Kinsbursky
2013-01-14  6:31               ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).