linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next 0/5] ipc,msg: fixes and updates
@ 2014-05-13 20:27 Davidlohr Bueso
  2014-05-13 20:27 ` [PATCH 1/5] ipc,msg: use current->state helpers Davidlohr Bueso
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

This patchset is a mix of misc updates for sysv msg queues, including
some cleanups, fixes and general updates.

Patches 1,2 are cleanups.
Patch 3 fixes how receivers deal with MSG_NOERROR flag.
Patch 4 improves lockless recv documentation.
Patch 5 (perhaps more RFC) loosens how we determine a full queue.

This applies on top of linux-next-20140513. Furthermore, it passes
all LTP test cases, please consider for 3.16.

Thanks!

Davidlohr Bueso (5):
  ipc,msg: use current->state helpers
  ipc,msg: move some msgq ns code around
  ipc,msg: always respect MSG_NOERROR
  ipc,msg: document volatile r_msg
  ipc,msg: loosen check for full queue

 ipc/msg.c | 217 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 110 insertions(+), 107 deletions(-)

-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] ipc,msg: use current->state helpers
  2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
@ 2014-05-13 20:27 ` Davidlohr Bueso
  2014-05-17 17:55   ` Manfred Spraul
  2014-05-13 20:27 ` [PATCH 2/5] ipc,msg: move some msgq ns code around Davidlohr Bueso
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

Call __set_current_state() instead of assigning
the new state directly.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 ipc/msg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 7ed1ef3..5a8489b 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -227,7 +227,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss)
 {
 	mss->tsk = current;
-	current->state = TASK_INTERRUPTIBLE;
+	__set_current_state(TASK_INTERRUPTIBLE);
 	list_add_tail(&mss->list, &msq->q_senders);
 }
 
@@ -976,7 +976,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
 		else
 			msr_d.r_maxsize = bufsz;
 		msr_d.r_msg = ERR_PTR(-EAGAIN);
-		current->state = TASK_INTERRUPTIBLE;
+		__set_current_state(TASK_INTERRUPTIBLE);
 
 		ipc_unlock_object(&msq->q_perm);
 		rcu_read_unlock();
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] ipc,msg: move some msgq ns code around
  2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
  2014-05-13 20:27 ` [PATCH 1/5] ipc,msg: use current->state helpers Davidlohr Bueso
@ 2014-05-13 20:27 ` Davidlohr Bueso
  2014-05-17 17:57   ` Manfred Spraul
  2014-05-13 20:27 ` [PATCH 3/5] ipc,msg: always respect MSG_NOERROR Davidlohr Bueso
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

Nothing big and no logical changes, just get rid of some
redundant function declarations. Move msg_[init/exit]_ns
down the end of the file.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 ipc/msg.c | 132 ++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 63 insertions(+), 69 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 5a8489b..6d33e30 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -70,75 +70,6 @@ struct msg_sender {
 
 #define msg_ids(ns)	((ns)->ids[IPC_MSG_IDS])
 
-static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
-static int newque(struct ipc_namespace *, struct ipc_params *);
-#ifdef CONFIG_PROC_FS
-static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
-#endif
-
-/*
- * Scale msgmni with the available lowmem size: the memory dedicated to msg
- * queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
- * Also take into account the number of nsproxies created so far.
- * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
- */
-void recompute_msgmni(struct ipc_namespace *ns)
-{
-	struct sysinfo i;
-	unsigned long allowed;
-	int nb_ns;
-
-	si_meminfo(&i);
-	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
-		/ MSGMNB;
-	nb_ns = atomic_read(&nr_ipc_ns);
-	allowed /= nb_ns;
-
-	if (allowed < MSGMNI) {
-		ns->msg_ctlmni = MSGMNI;
-		return;
-	}
-
-	if (allowed > IPCMNI / nb_ns) {
-		ns->msg_ctlmni = IPCMNI / nb_ns;
-		return;
-	}
-
-	ns->msg_ctlmni = allowed;
-}
-
-void msg_init_ns(struct ipc_namespace *ns)
-{
-	ns->msg_ctlmax = MSGMAX;
-	ns->msg_ctlmnb = MSGMNB;
-
-	recompute_msgmni(ns);
-
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
-	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
-}
-
-#ifdef CONFIG_IPC_NS
-void msg_exit_ns(struct ipc_namespace *ns)
-{
-	free_ipcs(ns, &msg_ids(ns), freeque);
-	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
-}
-#endif
-
-void __init msg_init(void)
-{
-	msg_init_ns(&init_ipc_ns);
-
-	printk(KERN_INFO "msgmni has been set to %d\n",
-		init_ipc_ns.msg_ctlmni);
-
-	ipc_init_proc_interface("sysvipc/msg",
-				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
-				IPC_MSG_IDS, sysvipc_msg_proc_show);
-}
-
 static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
 {
 	struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
@@ -1054,6 +985,57 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
 	return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
 }
 
+/*
+ * Scale msgmni with the available lowmem size: the memory dedicated to msg
+ * queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
+ * Also take into account the number of nsproxies created so far.
+ * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
+ */
+void recompute_msgmni(struct ipc_namespace *ns)
+{
+	struct sysinfo i;
+	unsigned long allowed;
+	int nb_ns;
+
+	si_meminfo(&i);
+	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
+		/ MSGMNB;
+	nb_ns = atomic_read(&nr_ipc_ns);
+	allowed /= nb_ns;
+
+	if (allowed < MSGMNI) {
+		ns->msg_ctlmni = MSGMNI;
+		return;
+	}
+
+	if (allowed > IPCMNI / nb_ns) {
+		ns->msg_ctlmni = IPCMNI / nb_ns;
+		return;
+	}
+
+	ns->msg_ctlmni = allowed;
+}
+
+void msg_init_ns(struct ipc_namespace *ns)
+{
+	ns->msg_ctlmax = MSGMAX;
+	ns->msg_ctlmnb = MSGMNB;
+
+	recompute_msgmni(ns);
+
+	atomic_set(&ns->msg_bytes, 0);
+	atomic_set(&ns->msg_hdrs, 0);
+	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+}
+
+#ifdef CONFIG_IPC_NS
+void msg_exit_ns(struct ipc_namespace *ns)
+{
+	free_ipcs(ns, &msg_ids(ns), freeque);
+	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
+}
+#endif
+
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
 {
@@ -1078,3 +1060,15 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
 			msq->q_ctime);
 }
 #endif
+
+void __init msg_init(void)
+{
+	msg_init_ns(&init_ipc_ns);
+
+	printk(KERN_INFO "msgmni has been set to %d\n",
+		init_ipc_ns.msg_ctlmni);
+
+	ipc_init_proc_interface("sysvipc/msg",
+				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
+				IPC_MSG_IDS, sysvipc_msg_proc_show);
+}
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] ipc,msg: always respect MSG_NOERROR
  2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
  2014-05-13 20:27 ` [PATCH 1/5] ipc,msg: use current->state helpers Davidlohr Bueso
  2014-05-13 20:27 ` [PATCH 2/5] ipc,msg: move some msgq ns code around Davidlohr Bueso
@ 2014-05-13 20:27 ` Davidlohr Bueso
  2014-05-18  5:53   ` Manfred Spraul
  2014-05-13 20:27 ` [PATCH 4/5] ipc,msg: document volatile r_msg Davidlohr Bueso
  2014-05-13 20:27 ` [PATCH 5/5] ipc,msg: loosen check for full queue Davidlohr Bueso
  4 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

When specifying the MSG_NOERROR flag, receivers can avoid returning
error (E2BIG) and just truncate the message text, if it is too large.

Currently, this logic is only respected when there are already pending
messages in the queue. Fix this for the case when there are only
receivers waiting for a msg to be sent. In order for this to work, save
the flags in the msg_receiver struct as it must be used later when
doing the pipeline send.

Also do some pipeline_send() cleanups while at it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 ipc/msg.c | 65 +++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 6d33e30..c2cdb5b 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -50,6 +50,7 @@ struct msg_receiver {
 	struct task_struct	*r_tsk;
 
 	int			r_mode;
+	int                     r_msgflg;
 	long			r_msgtype;
 	long			r_maxsize;
 
@@ -562,42 +563,47 @@ static int testmsg(struct msg_msg *msg, long type, int mode)
 	return 0;
 }
 
-static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
+static inline bool pipelined_send(struct msg_queue *msq, struct msg_msg *msg)
 {
 	struct msg_receiver *msr, *t;
 
 	list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
-		if (testmsg(msg, msr->r_msgtype, msr->r_mode) &&
-		    !security_msg_queue_msgrcv(msq, msg, msr->r_tsk,
-					       msr->r_msgtype, msr->r_mode)) {
-
-			list_del(&msr->r_list);
-			if (msr->r_maxsize < msg->m_ts) {
-				/* initialize pipelined send ordering */
-				msr->r_msg = NULL;
-				wake_up_process(msr->r_tsk);
-				smp_mb(); /* see barrier comment below */
-				msr->r_msg = ERR_PTR(-E2BIG);
-			} else {
-				msr->r_msg = NULL;
-				msq->q_lrpid = task_pid_vnr(msr->r_tsk);
-				msq->q_rtime = get_seconds();
-				wake_up_process(msr->r_tsk);
-				/*
-				 * Ensure that the wakeup is visible before
-				 * setting r_msg, as the receiving end depends
-				 * on it. See lockless receive part 1 and 2 in
-				 * do_msgrcv().
-				 */
-				smp_mb();
-				msr->r_msg = msg;
-
-				return 1;
-			}
+		if (!testmsg(msg, msr->r_msgtype, msr->r_mode))
+			continue;
+		if (security_msg_queue_msgrcv(msq, msg, msr->r_tsk,
+					      msr->r_msgtype, msr->r_mode))
+			continue;
+
+		/* found a suitable receiver, time to dequeue and wake */
+		list_del(&msr->r_list);
+
+		/* initialize pipelined send ordering */
+		msr->r_msg = NULL;
+
+		if (msr->r_maxsize < msg->m_ts &&
+		    !(msr->r_msgflg & MSG_NOERROR)) {
+			wake_up_process(msr->r_tsk);
+			smp_mb(); /* see barrier comment below */
+			msr->r_msg = ERR_PTR(-E2BIG);
+		} else {
+			msq->q_lrpid = task_pid_vnr(msr->r_tsk);
+			msq->q_rtime = get_seconds();
+			wake_up_process(msr->r_tsk);
+
+			/*
+			 * Ensure that the wakeup is visible before
+			 * setting r_msg, as the receiving end depends
+			 * on it. See lockless receive part 1 and 2 in
+			 * do_msgrcv().
+			 */
+			smp_mb();
+			msr->r_msg = msg;
+
+			return true;
 		}
 	}
 
-	return 0;
+	return false; /* no receivers on the other side */
 }
 
 long do_msgsnd(int msqid, long mtype, void __user *mtext,
@@ -901,6 +907,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
 		list_add_tail(&msr_d.r_list, &msq->q_receivers);
 		msr_d.r_tsk = current;
 		msr_d.r_msgtype = msgtyp;
+		msr_d.r_msgflg = msgflg;
 		msr_d.r_mode = mode;
 		if (msgflg & MSG_NOERROR)
 			msr_d.r_maxsize = INT_MAX;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] ipc,msg: document volatile r_msg
  2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2014-05-13 20:27 ` [PATCH 3/5] ipc,msg: always respect MSG_NOERROR Davidlohr Bueso
@ 2014-05-13 20:27 ` Davidlohr Bueso
  2014-05-18  6:08   ` Manfred Spraul
  2014-05-13 20:27 ` [PATCH 5/5] ipc,msg: loosen check for full queue Davidlohr Bueso
  4 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

The need for volatile is not obvious, document it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 ipc/msg.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index c2cdb5b..956cd65 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -42,9 +42,7 @@
 #include <linux/uaccess.h>
 #include "util.h"
 
-/*
- * one msg_receiver structure for each sleeping receiver:
- */
+/* one msg_receiver structure for each sleeping receiver */
 struct msg_receiver {
 	struct list_head	r_list;
 	struct task_struct	*r_tsk;
@@ -54,6 +52,12 @@ struct msg_receiver {
 	long			r_msgtype;
 	long			r_maxsize;
 
+	/*
+	 * Mark r_msg volatile so that the compiler
+	 * does not try to get smart and optimize
+	 * it. We rely on this for the lockless
+	 * receive algorithm.
+	 */
 	struct msg_msg		*volatile r_msg;
 };
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2014-05-13 20:27 ` [PATCH 4/5] ipc,msg: document volatile r_msg Davidlohr Bueso
@ 2014-05-13 20:27 ` Davidlohr Bueso
  2014-05-14 18:00   ` Manfred Spraul
  4 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-13 20:27 UTC (permalink / raw)
  To: manfred, akpm; +Cc: davidlohr, aswin, linux-kernel

When sending a message, we must guarantee that there will be
enough room in the queue to add it, otherwise wait until space
becomes available -- which requires blocking the calling task.
Currently, this relies meeting both of the following conditions:

(i) The new msg size + current size is lower than the queue's max size.
(ii) The current amount of messages in the queue + 1 (this msg) is lower
     than the queue's max size.

While (i) is obvious and well documented in the msgsnd(2) manpage, the
second one is far more ambiguous and does not serve the purpose, as we do
not control the amount of messages in the queue (unlike posix queues do).
Thus remove (ii) and loosen how we check for space.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 ipc/msg.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 956cd65..7d267d0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -658,10 +658,8 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		if (err)
 			goto out_unlock0;
 
-		if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
-				1 + msq->q_qnum <= msq->q_qbytes) {
-			break;
-		}
+		if (msgsz + msq->q_cbytes <= msq->q_qbytes)
+			break; /* there is space in the queue for this msg */
 
 		/* queue full, wait: */
 		if (msgflg & IPC_NOWAIT) {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-13 20:27 ` [PATCH 5/5] ipc,msg: loosen check for full queue Davidlohr Bueso
@ 2014-05-14 18:00   ` Manfred Spraul
  2014-05-14 19:50     ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2014-05-14 18:00 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

Hi Davidlohr, Hi Andrew,

On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> When sending a message, we must guarantee that there will be
> enough room in the queue to add it, otherwise wait until space
> becomes available -- which requires blocking the calling task.
> Currently, this relies meeting both of the following conditions:
>
> (i) The new msg size + current size is lower than the queue's max size.
> (ii) The current amount of messages in the queue + 1 (this msg) is lower
>       than the queue's max size.
>
> While (i) is obvious and well documented in the msgsnd(2) manpage, the
> second one is far more ambiguous and does not serve the purpose, as we do
> not control the amount of messages in the queue (unlike posix queues do).
> Thus remove (ii) and loosen how we check for space.
I know that (ii) is undocumented and not part of SUS, but I think it is 
required:
Otherwise it would be possible to use up an infinite amount of locked 
memory by sending 0-byte messages.

I added it some long time ago, and at that time I didn't check what the 
other sysv msg implementation were doing.

 From a quick search:

FreeBSD:
- Optional: RACCT_MSGQQUEUED
- Always: The number of messages is limited due to a global array 
(limited to MSGTQL entries: array msghdrs, free list  free_msghdrs)

http://fxr.watson.org/fxr/source/kern/sysv_msg.c

Opensolaris:
- Also a limit of the number of messages:
http://fxr.watson.org/fxr/source/common/os/msg.c?v=OPENSOLARIS;im=bigexcerpts#L1166

- it appears that it is related to MSGTQL:
http://fxr.watson.org/fxr/source/common/os/msg.c?v=OPENSOLARIS;im=bigexcerpts#L652

I.e.: We cannot remove the check unless we add another mechanism that 
limits the number of messages.

--
     Manfred

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-14 18:00   ` Manfred Spraul
@ 2014-05-14 19:50     ` Davidlohr Bueso
  2014-05-15  4:20       ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-14 19:50 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: akpm, aswin, linux-kernel

Hi Manfred,

On Wed, 2014-05-14 at 20:00 +0200, Manfred Spraul wrote:
> Hi Davidlohr, Hi Andrew,
> 
> On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> > When sending a message, we must guarantee that there will be
> > enough room in the queue to add it, otherwise wait until space
> > becomes available -- which requires blocking the calling task.
> > Currently, this relies meeting both of the following conditions:
> >
> > (i) The new msg size + current size is lower than the queue's max size.
> > (ii) The current amount of messages in the queue + 1 (this msg) is lower
> >       than the queue's max size.
> >
> > While (i) is obvious and well documented in the msgsnd(2) manpage, the
> > second one is far more ambiguous and does not serve the purpose, as we do
> > not control the amount of messages in the queue (unlike posix queues do).
> > Thus remove (ii) and loosen how we check for space.
> I know that (ii) is undocumented and not part of SUS, but I think it is 
> required:

Yeah, I was kind of expecting something to come up, which is why I
suggested it as an RFC in the cover letter.

> Otherwise it would be possible to use up an infinite amount of locked 
> memory by sending 0-byte messages.

Locked as in unswapable used by the kernel internally, right? I agree,
that would be a bad thing to do.

> I added it some long time ago, and at that time I didn't check what the 
> other sysv msg implementation were doing.
> 
>  From a quick search:
> 
> FreeBSD:
> - Optional: RACCT_MSGQQUEUED
> - Always: The number of messages is limited due to a global array 
> (limited to MSGTQL entries: array msghdrs, free list  free_msghdrs)
> 
> http://fxr.watson.org/fxr/source/kern/sysv_msg.c
> 
> Opensolaris:
> - Also a limit of the number of messages:
> http://fxr.watson.org/fxr/source/common/os/msg.c?v=OPENSOLARIS;im=bigexcerpts#L1166
> 
> - it appears that it is related to MSGTQL:
> http://fxr.watson.org/fxr/source/common/os/msg.c?v=OPENSOLARIS;im=bigexcerpts#L652
> 
> I.e.: We cannot remove the check unless we add another mechanism that 
> limits the number of messages.

Looking further, HP-UX also implements this, so it seems to be quite
common in Unix, ie: http://nixdoc.net/man-pages/HP-UX/man5/msgtql.5.html

I am very aware that sysv ipc is no longer as sexy and popular as it
once was (before my time :-), but I'm wondering if we should actually
implement MSGTQL. The thing is, I really hate how we're mixing things
here, ie: number of msg in a queue vs the size of the queue. That seems
simply wrong and misleading, and I wouldn't mind changing that. On the
other hand, I am not aware of programs using msgq sleeping frequently
because of this.

Do you have any preferences? I can cook up a patch if you think that
this merits Linux having MSGTQL.

Worst case scenario, we should update the msgsnd(2) manpage and document
this unique Linux behavior.

It seems that Linux 3.16 will be the "lets dig around the long forgotten
corners of sysv ipc" release :)

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-14 19:50     ` Davidlohr Bueso
@ 2014-05-15  4:20       ` Manfred Spraul
  2014-05-15 15:46         ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2014-05-15  4:20 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: akpm, aswin, linux-kernel

Hi Davidlohr,

On 05/14/2014 09:50 PM, Davidlohr Bueso wrote:
> Do you have any preferences? I can cook up a patch if you think that
> this merits Linux having MSGTQL.
MSGTQL means a global counter - therefore zero scalability. That's why I 
didn't implement it when I noticed the issue with 0-byte messages.
> Worst case scenario, we should update the msgsnd(2) manpage and document
> this unique Linux behavior.
I would document the current behavior.

--
     Manfred


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-15  4:20       ` Manfred Spraul
@ 2014-05-15 15:46         ` Davidlohr Bueso
  2014-05-16 12:47           ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-15 15:46 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: akpm, aswin, linux-kernel, Michael Kerrisk (man-pages)

On Thu, 2014-05-15 at 06:20 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
> 
> On 05/14/2014 09:50 PM, Davidlohr Bueso wrote:
> > Do you have any preferences? I can cook up a patch if you think that
> > this merits Linux having MSGTQL.
> MSGTQL means a global counter - therefore zero scalability. That's why I 
> didn't implement it when I noticed the issue with 0-byte messages.

Hmmm so I was actually thinking of calculating it on demand, but after a
closer look, we don't track each queue in the system, which would have
made this rather trivial.

We do however have plenty of similar counters in the kernel, and the
natural way of dealing with the scalability issue is using percpu
counters. But I won't argue much if we leave it as it is, it's been like
that since almost forever and no one is complaining but me.

Andrew, could you please drop this patch from -mm, I'll send another one
to add a comment instead.

> > Worst case scenario, we should update the msgsnd(2) manpage and document
> > this unique Linux behavior.
> I would document the current behavior.

Cc'ing Michael. Here is a vague attempt to update our manpage, feel free
to update it to your taste.

Thanks,
Davidlohr

8<------------------------------------------------------------
From: Davidlohr Bueso <davidlohr@hp.com>
Subject: [PATCH] msgop.2: Document full queue criteria

Explicitly mention the two conditions we rely on when
checking if a message queue is full when calling msgsnd(2).

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 man2/msgop.2 | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/man2/msgop.2 b/man2/msgop.2
index 3f5bc36..b4c8c04 100644
--- a/man2/msgop.2
+++ b/man2/msgop.2
@@ -105,13 +105,29 @@ by
 If sufficient space is available in the queue,
 .BR msgsnd ()
 succeeds immediately.
-(The queue capacity is defined by the
-.I msg_qbytes
+The queue capacity is governed by the
+.I msg_qbytes 
 field in the associated data structure for the message queue.
 During queue creation this field is initialized to
 .B MSGMNB
 bytes, but this limit can be modified using
-.BR msgctl (2).)
+.BR msgctl (2).
+A full queue is defined by two factors :
+.IP * 2
+The new msg size + current size of the queue is greater than the 
+queue's maximum size (the
+.I msg_qbytes
+field).
+.IP *
+The current amount of messages in the queue + 1 (the new msg) is 
+greater than the queue's maximum size (the
+.I msg_qbytes
+field). This is necessary to prevent users from using infinite 
+amounts of locked memory (used by the kernel for headers) by 
+sending 0-byte messages. This is equivalent to the traditional
+MSGTQL parameter present in many Unix systems. This behavior
+is unique to Linux.
+.PP
 If insufficient space is available in the queue, then the default
 behavior of
 .BR msgsnd ()
-- 
1.8.1.4





^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-15 15:46         ` Davidlohr Bueso
@ 2014-05-16 12:47           ` Michael Kerrisk (man-pages)
  2014-05-18 18:16             ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-16 12:47 UTC (permalink / raw)
  To: Davidlohr Bueso, Manfred Spraul; +Cc: mtk.manpages, akpm, aswin, linux-kernel


Hi Davidlohr,

On 05/15/2014 05:46 PM, Davidlohr Bueso wrote:
> On Thu, 2014-05-15 at 06:20 +0200, Manfred Spraul wrote:
>> Hi Davidlohr,
>>
>> On 05/14/2014 09:50 PM, Davidlohr Bueso wrote:
>>> Do you have any preferences? I can cook up a patch if you think that
>>> this merits Linux having MSGTQL.
>> MSGTQL means a global counter - therefore zero scalability. That's why I 
>> didn't implement it when I noticed the issue with 0-byte messages.
> 
> Hmmm so I was actually thinking of calculating it on demand, but after a
> closer look, we don't track each queue in the system, which would have
> made this rather trivial.
> 
> We do however have plenty of similar counters in the kernel, and the
> natural way of dealing with the scalability issue is using percpu
> counters. But I won't argue much if we leave it as it is, it's been like
> that since almost forever and no one is complaining but me.
> 
> Andrew, could you please drop this patch from -mm, I'll send another one
> to add a comment instead.
> 
>>> Worst case scenario, we should update the msgsnd(2) manpage and document
>>> this unique Linux behavior.
>> I would document the current behavior.
> 
> Cc'ing Michael. Here is a vague attempt to update our manpage, feel free
> to update it to your taste.
> 
> Thanks,
> Davidlohr
> 
> 8<------------------------------------------------------------
> From: Davidlohr Bueso <davidlohr@hp.com>
> Subject: [PATCH] msgop.2: Document full queue criteria
> 
> Explicitly mention the two conditions we rely on when
> checking if a message queue is full when calling msgsnd(2).
> 
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  man2/msgop.2 | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/man2/msgop.2 b/man2/msgop.2
> index 3f5bc36..b4c8c04 100644
> --- a/man2/msgop.2
> +++ b/man2/msgop.2
> @@ -105,13 +105,29 @@ by
>  If sufficient space is available in the queue,
>  .BR msgsnd ()
>  succeeds immediately.
> -(The queue capacity is defined by the
> -.I msg_qbytes
> +The queue capacity is governed by the
> +.I msg_qbytes 
>  field in the associated data structure for the message queue.
>  During queue creation this field is initialized to
>  .B MSGMNB
>  bytes, but this limit can be modified using
> -.BR msgctl (2).)
> +.BR msgctl (2).
> +A full queue is defined by two factors :
> +.IP * 2
> +The new msg size + current size of the queue is greater than the 
> +queue's maximum size (the
> +.I msg_qbytes
> +field).
> +.IP *
> +The current amount of messages in the queue + 1 (the new msg) is 
> +greater than the queue's maximum size (the
> +.I msg_qbytes
> +field). This is necessary to prevent users from using infinite 
> +amounts of locked memory (used by the kernel for headers) by 
> +sending 0-byte messages. This is equivalent to the traditional
> +MSGTQL parameter present in many Unix systems. This behavior
> +is unique to Linux.
> +.PP
>  If insufficient space is available in the queue, then the default
>  behavior of
>  .BR msgsnd ()


I applied, and reworded a little. Also: I dropped the piece about
MSGTQL, since it is not quite right. As noted elsewhere on the page
MSGTQL is a *system-wide* (not per-queue) limit on the number of
messages in all MQs. Also: I dropped the piece saying this is unique
to Linux. I believe that's true, but it implies there's a lot
more standardization around these limits than there actually is
in my observation.

Thanks for the patch!

Cheers,

Michael

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] ipc,msg: use current->state helpers
  2014-05-13 20:27 ` [PATCH 1/5] ipc,msg: use current->state helpers Davidlohr Bueso
@ 2014-05-17 17:55   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2014-05-17 17:55 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> Call __set_current_state() instead of assigning
> the new state directly.
>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullif.com>

> ---
>   ipc/msg.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/ipc/msg.c b/ipc/msg.c
> index 7ed1ef3..5a8489b 100644
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -227,7 +227,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
>   static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss)
>   {
>   	mss->tsk = current;
> -	current->state = TASK_INTERRUPTIBLE;
> +	__set_current_state(TASK_INTERRUPTIBLE);
>   	list_add_tail(&mss->list, &msq->q_senders);
>   }
>   
> @@ -976,7 +976,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
>   		else
>   			msr_d.r_maxsize = bufsz;
>   		msr_d.r_msg = ERR_PTR(-EAGAIN);
> -		current->state = TASK_INTERRUPTIBLE;
> +		__set_current_state(TASK_INTERRUPTIBLE);
>   
>   		ipc_unlock_object(&msq->q_perm);
>   		rcu_read_unlock();


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] ipc,msg: move some msgq ns code around
  2014-05-13 20:27 ` [PATCH 2/5] ipc,msg: move some msgq ns code around Davidlohr Bueso
@ 2014-05-17 17:57   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2014-05-17 17:57 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> Nothing big and no logical changes, just get rid of some
> redundant function declarations. Move msg_[init/exit]_ns
> down the end of the file.
>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>

> ---
>   ipc/msg.c | 132 ++++++++++++++++++++++++++++++--------------------------------
>   1 file changed, 63 insertions(+), 69 deletions(-)
>
> diff --git a/ipc/msg.c b/ipc/msg.c
> index 5a8489b..6d33e30 100644
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -70,75 +70,6 @@ struct msg_sender {
>   
>   #define msg_ids(ns)	((ns)->ids[IPC_MSG_IDS])
>   
> -static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
> -static int newque(struct ipc_namespace *, struct ipc_params *);
> -#ifdef CONFIG_PROC_FS
> -static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
> -#endif
> -
> -/*
> - * Scale msgmni with the available lowmem size: the memory dedicated to msg
> - * queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
> - * Also take into account the number of nsproxies created so far.
> - * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
> - */
> -void recompute_msgmni(struct ipc_namespace *ns)
> -{
> -	struct sysinfo i;
> -	unsigned long allowed;
> -	int nb_ns;
> -
> -	si_meminfo(&i);
> -	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
> -		/ MSGMNB;
> -	nb_ns = atomic_read(&nr_ipc_ns);
> -	allowed /= nb_ns;
> -
> -	if (allowed < MSGMNI) {
> -		ns->msg_ctlmni = MSGMNI;
> -		return;
> -	}
> -
> -	if (allowed > IPCMNI / nb_ns) {
> -		ns->msg_ctlmni = IPCMNI / nb_ns;
> -		return;
> -	}
> -
> -	ns->msg_ctlmni = allowed;
> -}
> -
> -void msg_init_ns(struct ipc_namespace *ns)
> -{
> -	ns->msg_ctlmax = MSGMAX;
> -	ns->msg_ctlmnb = MSGMNB;
> -
> -	recompute_msgmni(ns);
> -
> -	atomic_set(&ns->msg_bytes, 0);
> -	atomic_set(&ns->msg_hdrs, 0);
> -	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
> -}
> -
> -#ifdef CONFIG_IPC_NS
> -void msg_exit_ns(struct ipc_namespace *ns)
> -{
> -	free_ipcs(ns, &msg_ids(ns), freeque);
> -	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
> -}
> -#endif
> -
> -void __init msg_init(void)
> -{
> -	msg_init_ns(&init_ipc_ns);
> -
> -	printk(KERN_INFO "msgmni has been set to %d\n",
> -		init_ipc_ns.msg_ctlmni);
> -
> -	ipc_init_proc_interface("sysvipc/msg",
> -				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
> -				IPC_MSG_IDS, sysvipc_msg_proc_show);
> -}
> -
>   static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
>   {
>   	struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
> @@ -1054,6 +985,57 @@ SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
>   	return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
>   }
>   
> +/*
> + * Scale msgmni with the available lowmem size: the memory dedicated to msg
> + * queues should occupy at most 1/MSG_MEM_SCALE of lowmem.
> + * Also take into account the number of nsproxies created so far.
> + * This should be done staying within the (MSGMNI , IPCMNI/nr_ipc_ns) range.
> + */
> +void recompute_msgmni(struct ipc_namespace *ns)
> +{
> +	struct sysinfo i;
> +	unsigned long allowed;
> +	int nb_ns;
> +
> +	si_meminfo(&i);
> +	allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
> +		/ MSGMNB;
> +	nb_ns = atomic_read(&nr_ipc_ns);
> +	allowed /= nb_ns;
> +
> +	if (allowed < MSGMNI) {
> +		ns->msg_ctlmni = MSGMNI;
> +		return;
> +	}
> +
> +	if (allowed > IPCMNI / nb_ns) {
> +		ns->msg_ctlmni = IPCMNI / nb_ns;
> +		return;
> +	}
> +
> +	ns->msg_ctlmni = allowed;
> +}
> +
> +void msg_init_ns(struct ipc_namespace *ns)
> +{
> +	ns->msg_ctlmax = MSGMAX;
> +	ns->msg_ctlmnb = MSGMNB;
> +
> +	recompute_msgmni(ns);
> +
> +	atomic_set(&ns->msg_bytes, 0);
> +	atomic_set(&ns->msg_hdrs, 0);
> +	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
> +}
> +
> +#ifdef CONFIG_IPC_NS
> +void msg_exit_ns(struct ipc_namespace *ns)
> +{
> +	free_ipcs(ns, &msg_ids(ns), freeque);
> +	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
> +}
> +#endif
> +
>   #ifdef CONFIG_PROC_FS
>   static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
>   {
> @@ -1078,3 +1060,15 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
>   			msq->q_ctime);
>   }
>   #endif
> +
> +void __init msg_init(void)
> +{
> +	msg_init_ns(&init_ipc_ns);
> +
> +	printk(KERN_INFO "msgmni has been set to %d\n",
> +		init_ipc_ns.msg_ctlmni);
> +
> +	ipc_init_proc_interface("sysvipc/msg",
> +				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
> +				IPC_MSG_IDS, sysvipc_msg_proc_show);
> +}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] ipc,msg: always respect MSG_NOERROR
  2014-05-13 20:27 ` [PATCH 3/5] ipc,msg: always respect MSG_NOERROR Davidlohr Bueso
@ 2014-05-18  5:53   ` Manfred Spraul
  2014-05-18 18:01     ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2014-05-18  5:53 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> When specifying the MSG_NOERROR flag, receivers can avoid returning
> error (E2BIG) and just truncate the message text, if it is too large.
>
> Currently, this logic is only respected when there are already pending
> messages in the queue.
Do you have a test case? The code should handle that
(See below)
>   Fix this for the case when there are only
> receivers waiting for a msg to be sent. In order for this to work, save
> the flags in the msg_receiver struct as it must be used later when
> doing the pipeline send.
No, it is sufficient to set the message size to infinity.

> Also do some pipeline_send() cleanups while at it.
No - please don't mix cleanups with bugfixes.

>   
>   long do_msgsnd(int msqid, long mtype, void __user *mtext,
> @@ -901,6 +907,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
>   		list_add_tail(&msr_d.r_list, &msq->q_receivers);
>   		msr_d.r_tsk = current;
>   		msr_d.r_msgtype = msgtyp;
> +		msr_d.r_msgflg = msgflg;
>   		msr_d.r_mode = mode;
>   		if (msgflg & MSG_NOERROR)
>   			msr_d.r_maxsize = INT_MAX;
    ^^^^^^
This code should handle MSG_NOERROR:
If MSG_NOERROR is set, then maxsize is set to INT_MAX, therefore -E2BIG 
should never be returned.

--
     Manfred

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] ipc,msg: document volatile r_msg
  2014-05-13 20:27 ` [PATCH 4/5] ipc,msg: document volatile r_msg Davidlohr Bueso
@ 2014-05-18  6:08   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2014-05-18  6:08 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

Hi Davidlohr,

On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> The need for volatile is not obvious, document it.
>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>

In the long run, it would be great if the logic from sem.c would be 
moved to one central place.
- wake_up_sem_queue_prepare()
- wake_up_sem_queue_do()
- get_queue_result()
- the realtime compatible implementation that uses a completion instead 
of the busy loop in get_queue_result()

The code more or less just a reliable wait queue, nothing is specific to 
sysv:
- the woken up task knows if it was woken up due to a signal or due to 
wake_up_process()
- wake_up_process() out of line, after dropping all locks
- only two lock operation (one for: check that task must wait and go to 
sleep, one for: identify the task that must be woken up).

With a "normal" wait queue, there are three lock operation (notice that 
task was woken up, cleanup)


> ---
>   ipc/msg.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/ipc/msg.c b/ipc/msg.c
> index c2cdb5b..956cd65 100644
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -42,9 +42,7 @@
>   #include <linux/uaccess.h>
>   #include "util.h"
>   
> -/*
> - * one msg_receiver structure for each sleeping receiver:
> - */
> +/* one msg_receiver structure for each sleeping receiver */
>   struct msg_receiver {
>   	struct list_head	r_list;
>   	struct task_struct	*r_tsk;
> @@ -54,6 +52,12 @@ struct msg_receiver {
>   	long			r_msgtype;
>   	long			r_maxsize;
>   
> +	/*
> +	 * Mark r_msg volatile so that the compiler
> +	 * does not try to get smart and optimize
> +	 * it. We rely on this for the lockless
> +	 * receive algorithm.
> +	 */
>   	struct msg_msg		*volatile r_msg;
>   };
>   


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] ipc,msg: always respect MSG_NOERROR
  2014-05-18  5:53   ` Manfred Spraul
@ 2014-05-18 18:01     ` Davidlohr Bueso
  2014-05-19 18:01       ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-18 18:01 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: akpm, aswin, linux-kernel

On Sun, 2014-05-18 at 07:53 +0200, Manfred Spraul wrote:
> On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
> > When specifying the MSG_NOERROR flag, receivers can avoid returning
> > error (E2BIG) and just truncate the message text, if it is too large.
> >
> > Currently, this logic is only respected when there are already pending
> > messages in the queue.
> Do you have a test case? The code should handle that
> (See below)
> >   Fix this for the case when there are only
> > receivers waiting for a msg to be sent. In order for this to work, save
> > the flags in the msg_receiver struct as it must be used later when
> > doing the pipeline send.
> No, it is sufficient to set the message size to infinity.
> 
> > Also do some pipeline_send() cleanups while at it.
> No - please don't mix cleanups with bugfixes.
> 
> >   
> >   long do_msgsnd(int msqid, long mtype, void __user *mtext,
> > @@ -901,6 +907,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
> >   		list_add_tail(&msr_d.r_list, &msq->q_receivers);
> >   		msr_d.r_tsk = current;
> >   		msr_d.r_msgtype = msgtyp;
> > +		msr_d.r_msgflg = msgflg;
> >   		msr_d.r_mode = mode;
> >   		if (msgflg & MSG_NOERROR)
> >   			msr_d.r_maxsize = INT_MAX;
>     ^^^^^^
> This code should handle MSG_NOERROR:
> If MSG_NOERROR is set, then maxsize is set to INT_MAX, therefore -E2BIG 
> should never be returned.

Yeah, I noticed that, but I'd still prefer keeping the check, even if
redundant. It's free and by keeping both scenarios where there are and
aren't msg waiting in the queue the code becomes easier to read.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] ipc,msg: loosen check for full queue
  2014-05-16 12:47           ` Michael Kerrisk (man-pages)
@ 2014-05-18 18:16             ` Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2014-05-18 18:16 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Manfred Spraul, akpm, aswin, linux-kernel

On Fri, 2014-05-16 at 14:47 +0200, Michael Kerrisk (man-pages) wrote:
> Hi Davidlohr,
> 
> On 05/15/2014 05:46 PM, Davidlohr Bueso wrote:
> > On Thu, 2014-05-15 at 06:20 +0200, Manfred Spraul wrote:
> >> Hi Davidlohr,
> >>
> >> On 05/14/2014 09:50 PM, Davidlohr Bueso wrote:
> >>> Do you have any preferences? I can cook up a patch if you think that
> >>> this merits Linux having MSGTQL.
> >> MSGTQL means a global counter - therefore zero scalability. That's why I 
> >> didn't implement it when I noticed the issue with 0-byte messages.
> > 
> > Hmmm so I was actually thinking of calculating it on demand, but after a
> > closer look, we don't track each queue in the system, which would have
> > made this rather trivial.
> > 
> > We do however have plenty of similar counters in the kernel, and the
> > natural way of dealing with the scalability issue is using percpu
> > counters. But I won't argue much if we leave it as it is, it's been like
> > that since almost forever and no one is complaining but me.
> > 
> > Andrew, could you please drop this patch from -mm, I'll send another one
> > to add a comment instead.
> > 
> >>> Worst case scenario, we should update the msgsnd(2) manpage and document
> >>> this unique Linux behavior.
> >> I would document the current behavior.
> > 
> > Cc'ing Michael. Here is a vague attempt to update our manpage, feel free
> > to update it to your taste.
> > 
> > Thanks,
> > Davidlohr
> > 
> > 8<------------------------------------------------------------
> > From: Davidlohr Bueso <davidlohr@hp.com>
> > Subject: [PATCH] msgop.2: Document full queue criteria
> > 
> > Explicitly mention the two conditions we rely on when
> > checking if a message queue is full when calling msgsnd(2).
> > 
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  man2/msgop.2 | 22 +++++++++++++++++++---
> >  1 file changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/man2/msgop.2 b/man2/msgop.2
> > index 3f5bc36..b4c8c04 100644
> > --- a/man2/msgop.2
> > +++ b/man2/msgop.2
> > @@ -105,13 +105,29 @@ by
> >  If sufficient space is available in the queue,
> >  .BR msgsnd ()
> >  succeeds immediately.
> > -(The queue capacity is defined by the
> > -.I msg_qbytes
> > +The queue capacity is governed by the
> > +.I msg_qbytes 
> >  field in the associated data structure for the message queue.
> >  During queue creation this field is initialized to
> >  .B MSGMNB
> >  bytes, but this limit can be modified using
> > -.BR msgctl (2).)
> > +.BR msgctl (2).
> > +A full queue is defined by two factors :
> > +.IP * 2
> > +The new msg size + current size of the queue is greater than the 
> > +queue's maximum size (the
> > +.I msg_qbytes
> > +field).
> > +.IP *
> > +The current amount of messages in the queue + 1 (the new msg) is 
> > +greater than the queue's maximum size (the
> > +.I msg_qbytes
> > +field). This is necessary to prevent users from using infinite 
> > +amounts of locked memory (used by the kernel for headers) by 
> > +sending 0-byte messages. This is equivalent to the traditional
> > +MSGTQL parameter present in many Unix systems. This behavior
> > +is unique to Linux.
> > +.PP
> >  If insufficient space is available in the queue, then the default
> >  behavior of
> >  .BR msgsnd ()
> 
> 
> I applied, and reworded a little. Also: I dropped the piece about
> MSGTQL, since it is not quite right. As noted elsewhere on the page
> MSGTQL is a *system-wide* (not per-queue) limit on the number of
> messages in all MQs. Also: I dropped the piece saying this is unique
> to Linux. I believe that's true, but it implies there's a lot
> more standardization around these limits than there actually is
> in my observation.
> 
> Thanks for the patch!

Looks good, thanks Michael.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] ipc,msg: always respect MSG_NOERROR
  2014-05-18 18:01     ` Davidlohr Bueso
@ 2014-05-19 18:01       ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2014-05-19 18:01 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm; +Cc: aswin, linux-kernel

Hi Davidlohr, Hi Andrew,

On 05/18/2014 08:01 PM, Davidlohr Bueso wrote:
> On Sun, 2014-05-18 at 07:53 +0200, Manfred Spraul wrote:
>> On 05/13/2014 10:27 PM, Davidlohr Bueso wrote:
>>> When specifying the MSG_NOERROR flag, receivers can avoid returning
>>> error (E2BIG) and just truncate the message text, if it is too large.
>>>
>>> Currently, this logic is only respected when there are already pending
>>> messages in the queue.
>> Do you have a test case? The code should handle that
>> (See below)
Below you admit that the code works flawlessly.
I.e. The changelog misrepresents the patch as a bugfix, whereas it 
actually is a cleanup.

The code is more or less 15 years old, cleanups are a good thing, but 
NOT WITH A MISLEADING CHANGELOG!
(and unfortunately not the first time, you didn't mention that your 
patch to modify SHMMAX also disabled SHMMIN).


>>> @@ -901,6 +907,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
>>>    		list_add_tail(&msr_d.r_list, &msq->q_receivers);
>>>    		msr_d.r_tsk = current;
>>>    		msr_d.r_msgtype = msgtyp;
>>> +		msr_d.r_msgflg = msgflg;
>>>    		msr_d.r_mode = mode;
>>>    		if (msgflg & MSG_NOERROR)
>>>    			msr_d.r_maxsize = INT_MAX;
>>      ^^^^^^
>> This code should handle MSG_NOERROR:
>> If MSG_NOERROR is set, then maxsize is set to INT_MAX, therefore -E2BIG
>> should never be returned.
> Yeah, I noticed that, but I'd still prefer keeping the check, even if
> redundant. It's free and by keeping both scenarios where there are and
> aren't msg waiting in the queue the code becomes easier to read.
Why?
Preparation for porting to a noisy quantum computer, where we must check 
for conditions twice?

Either you can pass the flags, or you treat MSG_NOERROR as "accept 
messages up to size INT_MAX.
But not both.

A cleanup should *remove* code duplication and other complex parts, not 
add additional code duplication.

Andrew:
Could you please drop the patch?

- The change log must be updated.
- The introduction of r_msgflg is questionable. r_maxsize=INT_MAX is 
simpler than r_msgflg&MSG_NOERROR.
- The cleanup should be thoroughly tested, to verify that it doesn't 
break something that is not tested by LTP.

--
     Manfred

P.S.:
> +			wake_up_process(msr->r_tsk);
>   
> -				return 1;
> -			}
> +			/*
> +			 * Ensure that the wakeup is visible before
> +			 * setting r_msg, as the receiving end depends
> +			 * on it. See lockless receive part 1 and 2 in
> +			 * do_msgrcv().
> +			 */
> +			smp_mb();
> +			msr->r_msg = msg;
No, it's not about visibility.

The issue is use-after-free:
- immediately after msr->r_msg = msg, the other task can return to user 
space, call sys_exit() and then release the task struct
- wake_up_process(msr->r_tsk) then accesses already released memory.

On a virtualized hardware, this can happen (and did happen - either with 
ipc/sem.c or ipc/msg.c, with S390. It was documented, it seems it got lost).

So a hands-off is required:
Writing r_msg must be the last write operation.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-05-19 18:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-13 20:27 [PATCH -next 0/5] ipc,msg: fixes and updates Davidlohr Bueso
2014-05-13 20:27 ` [PATCH 1/5] ipc,msg: use current->state helpers Davidlohr Bueso
2014-05-17 17:55   ` Manfred Spraul
2014-05-13 20:27 ` [PATCH 2/5] ipc,msg: move some msgq ns code around Davidlohr Bueso
2014-05-17 17:57   ` Manfred Spraul
2014-05-13 20:27 ` [PATCH 3/5] ipc,msg: always respect MSG_NOERROR Davidlohr Bueso
2014-05-18  5:53   ` Manfred Spraul
2014-05-18 18:01     ` Davidlohr Bueso
2014-05-19 18:01       ` Manfred Spraul
2014-05-13 20:27 ` [PATCH 4/5] ipc,msg: document volatile r_msg Davidlohr Bueso
2014-05-18  6:08   ` Manfred Spraul
2014-05-13 20:27 ` [PATCH 5/5] ipc,msg: loosen check for full queue Davidlohr Bueso
2014-05-14 18:00   ` Manfred Spraul
2014-05-14 19:50     ` Davidlohr Bueso
2014-05-15  4:20       ` Manfred Spraul
2014-05-15 15:46         ` Davidlohr Bueso
2014-05-16 12:47           ` Michael Kerrisk (man-pages)
2014-05-18 18:16             ` Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).