linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
@ 2015-06-22 22:25 Marcus Gelderie
  2015-06-25  5:47 ` Davidlohr Bueso
  0 siblings, 1 reply; 17+ messages in thread
From: Marcus Gelderie @ 2015-06-22 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mtk.manpages, dhowells, viro, dledford, John Duffy,
	Arto Bendiken, linux-api, Davidlohr Bueso, redmnic

A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages (commit
d6629859b36). The patch introducing the improved kernel handling of
message queues (using btrees) has, as a by-product, changed the
meaning of the QSIZE field in the pseudo-file created for the queue.
Before, this field reflected the size of the user-data in the queue.
Since, it also takes kernel data structures into account. For
example, if 13 bytes of user data are in the queue, on my machine the
file reports a size of 61 bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115). Commenting on the lkml, Michael
Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

Reporting the size of the message queue in kernel has its merits, but
doing so in the QSIZE field of the pseudo file corresponding to the
queue is a breaking change, as mentioned above. This patch therefore
returns the QSIZE  field to its original meaning. At the same time,
it introduces a new field QKERSIZE that reflects the size of the queue
in kernel (user data + kernel data).

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation) and is therefore not affected by this change, nor by
the previous one changing the way of accounting.

Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
---
 ipc/mqueue.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3aaea7f..7d4c464 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -41,7 +41,7 @@
 
 #define MQUEUE_MAGIC	0x19800202
 #define DIRENT_SIZE	20
-#define FILENT_SIZE	80
+#define FILENT_SIZE	90
 
 #define SEND		0
 #define RECV		1
@@ -82,8 +82,12 @@ struct mqueue_inode_info {
 	/* for tasks waiting for free space and messages, respectively */
 	struct ext_wait_queue e_wait_q[2];
 
-	unsigned long qsize; /* size of queue in memory (sum of all msgs) */
-};
+	/* size of queue in memory (sum of all msgs plus kernel
+	 * data structures) */
+	unsigned long qsize;
+
+	/* size of user data in the queue (sum of all msgs) */
+	unsigned long q_usersize; };
 
 static const struct inode_operations mqueue_dir_inode_operations;
 static const struct file_operations mqueue_file_operations;
@@ -151,6 +155,7 @@ static int msg_insert(struct msg_msg *msg, struct mqueue_inode_info *info)
 insert_msg:
 	info->attr.mq_curmsgs++;
 	info->qsize += msg->m_ts;
+	info->q_usersize += msg->m_ts;
 	list_add_tail(&msg->m_list, &leaf->msg_list);
 	return 0;
 }
@@ -210,6 +215,7 @@ try_again:
 	}
 	info->attr.mq_curmsgs--;
 	info->qsize -= msg->m_ts;
+	info->q_usersize -= msg->m_ts;
 	return msg;
 }
 
@@ -246,6 +252,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 		info->notify_owner = NULL;
 		info->notify_user_ns = NULL;
 		info->qsize = 0;
+		info->q_usersize = 0;
 		info->user = NULL;	/* set when all is ok */
 		info->msg_tree = RB_ROOT;
 		info->node_cache = NULL;
@@ -491,13 +498,14 @@ static ssize_t mqueue_read_file(struct file *filp, char __user *u_data,
 
 	spin_lock(&info->lock);
 	snprintf(buffer, sizeof(buffer),
-			"QSIZE:%-10lu NOTIFY:%-5d SIGNO:%-5d NOTIFY_PID:%-6d\n",
-			info->qsize,
+			"QSIZE:%-10lu NOTIFY:%-5d SIGNO:%-5d NOTIFY_PID:%-6d QKERSIZE:%-10lu\n",
+			info->q_usersize,
 			info->notify_owner ? info->notify.sigev_notify : 0,
 			(info->notify_owner &&
 			 info->notify.sigev_notify == SIGEV_SIGNAL) ?
 				info->notify.sigev_signo : 0,
-			pid_vnr(info->notify_owner));
+			pid_vnr(info->notify_owner),
+			info->qsize);
 	spin_unlock(&info->lock);
 	buffer[sizeof(buffer)-1] = '\0';
 
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
  2015-06-22 22:25 [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
@ 2015-06-25  5:47 ` Davidlohr Bueso
  2015-06-25  7:23   ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2015-06-25  5:47 UTC (permalink / raw)
  To: Marcus Gelderie
  Cc: linux-kernel, mtk.manpages, dhowells, viro, dledford, John Duffy,
	Arto Bendiken, linux-api, redmnic

On Tue, 2015-06-23 at 00:25 +0200, Marcus Gelderie wrote:
> A while back, the message queue implementation in the kernel was
> improved to use btrees to speed up retrieval of messages (commit
> d6629859b36). The patch introducing the improved kernel handling of
> message queues (using btrees) has, as a by-product, changed the
> meaning of the QSIZE field in the pseudo-file created for the queue.
> Before, this field reflected the size of the user-data in the queue.
> Since, it also takes kernel data structures into account. For
> example, if 13 bytes of user data are in the queue, on my machine the
> file reports a size of 61 bytes.

Good catch, and a nice opportunity to make the mq manpage more specific
wrt to queue sizes.

[...]

> Reporting the size of the message queue in kernel has its merits, but
> doing so in the QSIZE field of the pseudo file corresponding to the
> queue is a breaking change, as mentioned above. This patch therefore
> returns the QSIZE  field to its original meaning. At the same time,
> it introduces a new field QKERSIZE that reflects the size of the queue
> in kernel (user data + kernel data).

Hmmm I'm not sure about this. What are the specific benefits of having
QKERSIZE? We don't export in-kernel data like this in any other ipc
(posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
structures like this, as we would break userspace if later we change
posix_msg_tree_node. So NAK to this.

I would just remove the extra
+       info->qsize += sizeof(struct posix_msg_tree_node);

bits from d6629859b36 (along with -stable v3.5), plus a patch updating
the manpage that this field only reflects user data.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
  2015-06-25  5:47 ` Davidlohr Bueso
@ 2015-06-25  7:23   ` Michael Kerrisk (man-pages)
  2015-06-25 18:21     ` Davidlohr Bueso
  2015-06-25 18:50     ` [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-06-25  7:23 UTC (permalink / raw)
  To: Davidlohr Bueso, Doug Ledford
  Cc: Marcus Gelderie, lkml, David Howells, Alexander Viro, John Duffy,
	Arto Bendiken, Linux API, Marcus Gelderie

On 25 June 2015 at 07:47, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Tue, 2015-06-23 at 00:25 +0200, Marcus Gelderie wrote:
>> A while back, the message queue implementation in the kernel was
>> improved to use btrees to speed up retrieval of messages (commit
>> d6629859b36). The patch introducing the improved kernel handling of
>> message queues (using btrees) has, as a by-product, changed the
>> meaning of the QSIZE field in the pseudo-file created for the queue.
>> Before, this field reflected the size of the user-data in the queue.
>> Since, it also takes kernel data structures into account. For
>> example, if 13 bytes of user data are in the queue, on my machine the
>> file reports a size of 61 bytes.
>
> Good catch, and a nice opportunity to make the mq manpage more specific
> wrt to queue sizes.
>
> [...]
>
>> Reporting the size of the message queue in kernel has its merits, but
>> doing so in the QSIZE field of the pseudo file corresponding to the
>> queue is a breaking change, as mentioned above. This patch therefore
>> returns the QSIZE  field to its original meaning. At the same time,
>> it introduces a new field QKERSIZE that reflects the size of the queue
>> in kernel (user data + kernel data).
>
> Hmmm I'm not sure about this. What are the specific benefits of having
> QKERSIZE? We don't export in-kernel data like this in any other ipc
> (posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
> structures like this, as we would break userspace if later we change
> posix_msg_tree_node. So NAK to this.
>
> I would just remove the extra
> +       info->qsize += sizeof(struct posix_msg_tree_node);
>
> bits from d6629859b36 (along with -stable v3.5), plus a patch updating
> the manpage that this field only reflects user data.

I've been hoping that Doug would jump into this discussion...

If I recall/understand Doug correctly (see
http://thread.gmane.org/gmane.linux.man/7050/focus=1797645 ), his
rationale for the QSIZE change was that it then revealed a value that
was closer to what was being used to account against the
RLIMIT_MSGQUEUE resource limit. (Even with these changes, the QSIZE
value was not 100% accurate for accounting against RLIMIT_MSGQUEUE,
since some pieces of kernel overhead were still not being accounted
for. Nevertheless, it's much closer than the old (pre 3.5) QSIZE for
some corner cases.) Thus, Marcus's rationale for preserving this info
as QKERSIZE.

Now whether QKERSIZE is actually useful or used by anyone is another
question. As far as I know, there was no user request that drove the
change. But Doug can perhaps say something to this. QSIZE should I
think definitely be fixed (reverted to pre-3.5 behavior). I'm agnostic
about QKERSIZE.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
  2015-06-25  7:23   ` Michael Kerrisk (man-pages)
@ 2015-06-25 18:21     ` Davidlohr Bueso
  2015-07-06 15:49       ` [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account Marcus Gelderie
  2015-06-25 18:50     ` [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
  1 sibling, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2015-06-25 18:21 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Doug Ledford, Marcus Gelderie, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API,
	Marcus Gelderie, akpm

[CC'ing akpm as he handles such changes]

On Thu, 2015-06-25 at 09:23 +0200, Michael Kerrisk (man-pages) wrote:
> On 25 June 2015 at 07:47, Davidlohr Bueso <dave@stgolabs.net> wrote:
> > On Tue, 2015-06-23 at 00:25 +0200, Marcus Gelderie wrote:
> >> A while back, the message queue implementation in the kernel was
> >> improved to use btrees to speed up retrieval of messages (commit
> >> d6629859b36). The patch introducing the improved kernel handling of
> >> message queues (using btrees) has, as a by-product, changed the
> >> meaning of the QSIZE field in the pseudo-file created for the queue.
> >> Before, this field reflected the size of the user-data in the queue.
> >> Since, it also takes kernel data structures into account. For
> >> example, if 13 bytes of user data are in the queue, on my machine the
> >> file reports a size of 61 bytes.
> >
> > Good catch, and a nice opportunity to make the mq manpage more specific
> > wrt to queue sizes.
> >
> > [...]
> >
> >> Reporting the size of the message queue in kernel has its merits, but
> >> doing so in the QSIZE field of the pseudo file corresponding to the
> >> queue is a breaking change, as mentioned above. This patch therefore
> >> returns the QSIZE  field to its original meaning. At the same time,
> >> it introduces a new field QKERSIZE that reflects the size of the queue
> >> in kernel (user data + kernel data).
> >
> > Hmmm I'm not sure about this. What are the specific benefits of having
> > QKERSIZE? We don't export in-kernel data like this in any other ipc
> > (posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
> > structures like this, as we would break userspace if later we change
> > posix_msg_tree_node. So NAK to this.
> >
> > I would just remove the extra
> > +       info->qsize += sizeof(struct posix_msg_tree_node);
> >
> > bits from d6629859b36 (along with -stable v3.5), plus a patch updating
> > the manpage that this field only reflects user data.
> 
> I've been hoping that Doug would jump into this discussion...
> 
> If I recall/understand Doug correctly (see
> http://thread.gmane.org/gmane.linux.man/7050/focus=1797645 ),

Ah so we _had_ this conversation in the past.

>  his
> rationale for the QSIZE change was that it then revealed a value that
> was closer to what was being used to account against the
> RLIMIT_MSGQUEUE resource limit. (Even with these changes, the QSIZE
> value was not 100% accurate for accounting against RLIMIT_MSGQUEUE,
> since some pieces of kernel overhead were still not being accounted
> for. Nevertheless, it's much closer than the old (pre 3.5) QSIZE for
> some corner cases.) Thus, Marcus's rationale for preserving this info
> as QKERSIZE.
> 
> Now whether QKERSIZE is actually useful or used by anyone is another
> question. As far as I know, there was no user request that drove the
> change. But Doug can perhaps say something to this. QSIZE should I
> think definitely be fixed (reverted to pre-3.5 behavior). I'm agnostic
> about QKERSIZE.

Right, and we seemed to have agreed in the previous thread that the
QSIZE changes should be reverted back to its original values. We also
agree on the main reason why: it exposes unnecessarily kernel
implementation details to userland -- and as such I at least still don't
buy much into the idea of QKERSIZE either.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
  2015-06-25  7:23   ` Michael Kerrisk (man-pages)
  2015-06-25 18:21     ` Davidlohr Bueso
@ 2015-06-25 18:50     ` Marcus Gelderie
  2015-07-07 18:49       ` Doug Ledford
  1 sibling, 1 reply; 17+ messages in thread
From: Marcus Gelderie @ 2015-06-25 18:50 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Davidlohr Bueso, Doug Ledford, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API

Hey all,

answers in text below... 

TLDR: I can remove the QKERSIZE field, as I believe that it does not affect he
RLIMIT accounting (details [=code] below). Question is: Should it?

Before I provide another patch, I would appreciate another round of feedback, though.

So...

On Thu, Jun 25, 2015 at 09:23:33AM +0200, Michael Kerrisk (man-pages) wrote:
> On 25 June 2015 at 07:47, Davidlohr Bueso <dave@stgolabs.net> wrote:
> > Good catch, and a nice opportunity to make the mq manpage more specific
> > wrt to queue sizes.
> >
> > [...]
> >
ACK, I can write a patch for the manpage once we figure out what to do
here.
> >> Reporting the size of the message queue in kernel has its merits, but
> >> doing so in the QSIZE field of the pseudo file corresponding to the
> >> queue is a breaking change, as mentioned above. This patch therefore
> >> returns the QSIZE  field to its original meaning. At the same time,
> >> it introduces a new field QKERSIZE that reflects the size of the queue
> >> in kernel (user data + kernel data).
> >
> > Hmmm I'm not sure about this. What are the specific benefits of having
> > QKERSIZE? We don't export in-kernel data like this in any other ipc
> > (posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
> > structures like this, as we would break userspace if later we change
> > posix_msg_tree_node. So NAK to this.
> >
> > I would just remove the extra
> > +       info->qsize += sizeof(struct posix_msg_tree_node);
> >
> > bits from d6629859b36 (along with -stable v3.5), plus a patch updating
> > the manpage that this field only reflects user data.
> 
This can be done if the RLIMIT accounting is not done against that
value (more below).

> I've been hoping that Doug would jump into this discussion...
> 
> If I recall/understand Doug correctly (see
> http://thread.gmane.org/gmane.linux.man/7050/focus=1797645 ), his
> rationale for the QSIZE change was that it then revealed a value that
> was closer to what was being used to account against the
> RLIMIT_MSGQUEUE resource limit. (Even with these changes, the QSIZE
> value was not 100% accurate for accounting against RLIMIT_MSGQUEUE,
> since some pieces of kernel overhead were still not being accounted
> for. Nevertheless, it's much closer than the old (pre 3.5) QSIZE for
> some corner cases.) Thus, Marcus's rationale for preserving this info
> as QKERSIZE.
> 

Actually, looking at the code, it seems to me that the QKERSIZE
(a.k.a. the current qsize field) is actually not used for the purpose of
accounting at all. From ipc/mqueue.c (line 281 ff, comments added by me):
		
		/* worst case estimate for the btree in memory */
		
		mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
			min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
			sizeof(struct posix_msg_tree_node);

		/* worst case estimate for the user data in the queue */
		
		mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
					  info->attr.mq_msgsize);

		spin_lock(&mq_lock);
		    
		    /* acutal accounting; u->mq_bytes is the other
		     * message queus for this user (computed in the way
		     * above (i.e. worst-case estimates) 
		     */

		if (u->mq_bytes + mq_bytes < u->mq_bytes ||
		    u->mq_bytes + mq_bytes > rlimit(RLIMIT_MSGQUEUE)) {
			[....]
			ret = -EMFILE;
			[.....]

So, I think that if the RLIMIT should take the actual size (not the
worst case) into account, then we have a bug. I can write a patch, but that
should be separate from this... meaning I'll turn this into a patchset.

However, we should decide what we want first: If I read the manpage (mq_overview), I
am inclined to think that the RLIMIT should use the acutal amount of
data occupied by the queue (or at least the user data). But it does not
necessarily rule out an accounting against worst-case estimation (to me
at least).


> Now whether QKERSIZE is actually useful or used by anyone is another
> question. As far as I know, there was no user request that drove the
> change. But Doug can perhaps say something to this. QSIZE should I
> think definitely be fixed (reverted to pre-3.5 behavior). I'm agnostic
> about QKERSIZE.
> 
> Cheers,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-06-25 18:21     ` Davidlohr Bueso
@ 2015-07-06 15:49       ` Marcus Gelderie
  2015-07-07  5:16         ` Davidlohr Bueso
  2015-07-11  0:48         ` [PATCH 2/1] ipc,mqueue: Delete bogus overflow check Davidlohr Bueso
  0 siblings, 2 replies; 17+ messages in thread
From: Marcus Gelderie @ 2015-07-06 15:49 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: mtk.manpages, Doug Ledford, redmnic, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API, akpm

A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages (commit
d6629859b36). The patch introducing the improved kernel handling of
message queues (using btrees) has, as a by-product, changed the
meaning of the QSIZE field in the pseudo-file created for the queue.
Before, this field reflected the size of the user-data in the queue.
Since, it also takes kernel data structures into account. For
example, if 13 bytes of user data are in the queue, on my machine the
file reports a size of 61 bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

This patch removes the accounting of kernel data structures in the
queue. Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above). Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation). Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.

Signed-off-by: Marcus Gelderie <redmnic@gmail.com>

v3 Changes: Revert QSIZE to old meaning and remove QKERSIZE field, because the rlimit accounting does not take runtime kernel overhead into account (it is a worst case measure).

---
 ipc/mqueue.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3aaea7f..c3fc5c2 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -143,7 +143,6 @@ static int msg_insert(struct msg_msg *msg, struct mqueue_inode_info *info)
 		if (!leaf)
 			return -ENOMEM;
 		INIT_LIST_HEAD(&leaf->msg_list);
-		info->qsize += sizeof(*leaf);
 	}
 	leaf->priority = msg->m_type;
 	rb_link_node(&leaf->rb_node, parent, p);
@@ -188,7 +187,6 @@ try_again:
 			     "lazy leaf delete!\n");
 		rb_erase(&leaf->rb_node, &info->msg_tree);
 		if (info->node_cache) {
-			info->qsize -= sizeof(*leaf);
 			kfree(leaf);
 		} else {
 			info->node_cache = leaf;
@@ -201,7 +199,6 @@ try_again:
 		if (list_empty(&leaf->msg_list)) {
 			rb_erase(&leaf->rb_node, &info->msg_tree);
 			if (info->node_cache) {
-				info->qsize -= sizeof(*leaf);
 				kfree(leaf);
 			} else {
 				info->node_cache = leaf;
@@ -1026,7 +1023,6 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
 		/* Save our speculative allocation into the cache */
 		INIT_LIST_HEAD(&new_leaf->msg_list);
 		info->node_cache = new_leaf;
-		info->qsize += sizeof(*new_leaf);
 		new_leaf = NULL;
 	} else {
 		kfree(new_leaf);
@@ -1133,7 +1129,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 		/* Save our speculative allocation into the cache */
 		INIT_LIST_HEAD(&new_leaf->msg_list);
 		info->node_cache = new_leaf;
-		info->qsize += sizeof(*new_leaf);
 	} else {
 		kfree(new_leaf);
 	}
-- 
2.4.5



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-06 15:49       ` [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account Marcus Gelderie
@ 2015-07-07  5:16         ` Davidlohr Bueso
  2015-07-07 13:01           ` Michael Kerrisk (man-pages)
  2015-07-11  0:48         ` [PATCH 2/1] ipc,mqueue: Delete bogus overflow check Davidlohr Bueso
  1 sibling, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2015-07-07  5:16 UTC (permalink / raw)
  To: Marcus Gelderie
  Cc: mtk.manpages, Doug Ledford, lkml, David Howells, Alexander Viro,
	John Duffy, Arto Bendiken, Linux API, akpm

On Mon, 2015-07-06 at 17:49 +0200, Marcus Gelderie wrote:
> A while back, the message queue implementation in the kernel was
> improved to use btrees to speed up retrieval of messages (commit
> d6629859b36). The patch introducing the improved kernel handling of
> message queues (using btrees) has, as a by-product, changed the
> meaning of the QSIZE field in the pseudo-file created for the queue.
> Before, this field reflected the size of the user-data in the queue.
> Since, it also takes kernel data structures into account. For
> example, if 13 bytes of user data are in the queue, on my machine the
> file reports a size of 61 bytes.
> 
> There was some discussion on this topic before (for example
> https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
> Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):
> 
>     The pseudofiles in the mqueue filesystem (usually mounted at
>     /dev/mqueue) expose fields with metadata describing a message
>     queue. One of these fields, QSIZE, as originally implemented,
>     showed the total number of bytes of user data in all messages in
>     the message queue, and this feature was documented from the
>     beginning in the mq_overview(7) page. In 3.5, some other (useful)
>     work happened to break the user-space API in a couple of places,
>     including the value exposed via QSIZE, which now includes a measure
>     of kernel overhead bytes for the queue, a figure that renders QSIZE
>     useless for its original purpose, since there's no way to deduce
>     the number of overhead bytes consumed by the implementation.
>     (The other user-space breakage was subsequently fixed.)

Michael, this breakage was never finally documented in the manpage,
right? I took a look and there is no mention, but it was a quick look.
It's just that if this patch goes in, I'd hate ending up with something
like this in the manpage:

as of 3.5
  <accounts for kernel overhead>

as of 4.3
  <behavior reverted back to not include kernel overhead... *sigh*>

If there are changes to be made to the manpage, it should probably be
posted with this patch, methinks.

> 
> This patch removes the accounting of kernel data structures in the
> queue. Reporting the size of these data-structures in the QSIZE field
> was a breaking change (see Michael's comment above). Without the QSIZE
> field reporting the total size of user-data in the queue, there is no
> way to deduce this number.
> 
> It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
> against the worst-case size of the queue (in both the old and the new
> implementation). Therefore, the kernel overhead accounting in QSIZE is
> not necessary to help the user understand the limitations RLIMIT imposes
> on the processes.

Also, I would suggest adding some comment in struct mqueue_inode_info 
for future reference, ie:

-       unsigned long qsize; /* size of queue in memory (sum of all msgs) */
+       /*
+        * Size of queue in memory (sum of all msgs). Accounts for
+        * only userspace overhead; ignoring any in-kernel rbtree nodes.
+        */
+       unsigned long qsize;

But no big deal in any case.

I think this is the right approach, but would still like to know if Doug
has any concerns about it.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-07  5:16         ` Davidlohr Bueso
@ 2015-07-07 13:01           ` Michael Kerrisk (man-pages)
  2015-07-08 19:17             ` Doug Ledford
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-07-07 13:01 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Marcus Gelderie, Doug Ledford, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API,
	Andrew Morton

Hi David,

On 7 July 2015 at 07:16, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Mon, 2015-07-06 at 17:49 +0200, Marcus Gelderie wrote:
>> A while back, the message queue implementation in the kernel was
>> improved to use btrees to speed up retrieval of messages (commit
>> d6629859b36). The patch introducing the improved kernel handling of
>> message queues (using btrees) has, as a by-product, changed the
>> meaning of the QSIZE field in the pseudo-file created for the queue.
>> Before, this field reflected the size of the user-data in the queue.
>> Since, it also takes kernel data structures into account. For
>> example, if 13 bytes of user data are in the queue, on my machine the
>> file reports a size of 61 bytes.
>>
>> There was some discussion on this topic before (for example
>> https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
>> Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):
>>
>>     The pseudofiles in the mqueue filesystem (usually mounted at
>>     /dev/mqueue) expose fields with metadata describing a message
>>     queue. One of these fields, QSIZE, as originally implemented,
>>     showed the total number of bytes of user data in all messages in
>>     the message queue, and this feature was documented from the
>>     beginning in the mq_overview(7) page. In 3.5, some other (useful)
>>     work happened to break the user-space API in a couple of places,
>>     including the value exposed via QSIZE, which now includes a measure
>>     of kernel overhead bytes for the queue, a figure that renders QSIZE
>>     useless for its original purpose, since there's no way to deduce
>>     the number of overhead bytes consumed by the implementation.
>>     (The other user-space breakage was subsequently fixed.)
>
> Michael, this breakage was never finally documented in the manpage,
> right?

Right.

> I took a look and there is no mention, but it was a quick look.
> It's just that if this patch goes in, I'd hate ending up with something
> like this in the manpage:
>
> as of 3.5
>   <accounts for kernel overhead>
>
> as of 4.3
>   <behavior reverted back to not include kernel overhead... *sigh*>
>
> If there are changes to be made to the manpage, it should probably be
> posted with this patch, methinks.

I think something like the above will have to end up in the man page.
The only thing I'd add is that the fix also has gone to -stable
kernels. At least: I think this patch should also be marked for
-stable. I'll take care of the man page updates as the patch goes
through.

>> This patch removes the accounting of kernel data structures in the
>> queue. Reporting the size of these data-structures in the QSIZE field
>> was a breaking change (see Michael's comment above). Without the QSIZE
>> field reporting the total size of user-data in the queue, there is no
>> way to deduce this number.
>>
>> It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
>> against the worst-case size of the queue (in both the old and the new
>> implementation). Therefore, the kernel overhead accounting in QSIZE is
>> not necessary to help the user understand the limitations RLIMIT imposes
>> on the processes.
>
> Also, I would suggest adding some comment in struct mqueue_inode_info
> for future reference, ie:
>
> -       unsigned long qsize; /* size of queue in memory (sum of all msgs) */
> +       /*
> +        * Size of queue in memory (sum of all msgs). Accounts for
> +        * only userspace overhead; ignoring any in-kernel rbtree nodes.
> +        */
> +       unsigned long qsize;
>
> But no big deal in any case.
>
> I think this is the right approach,

Me too.

> but would still like to know if Doug
> has any concerns about it.

Looking on gmane, Doug does not appear to have been active on any
lists since late May! Not sure why.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data
  2015-06-25 18:50     ` [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
@ 2015-07-07 18:49       ` Doug Ledford
  0 siblings, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2015-07-07 18:49 UTC (permalink / raw)
  To: Marcus Gelderie
  Cc: Michael Kerrisk (man-pages),
	Davidlohr Bueso, linux-kernel, David Howells, Alexander Viro,
	John Duffy, Arto Bendiken, Linux API

[-- Attachment #1: Type: text/plain, Size: 9134 bytes --]


> On Jun 25, 2015, at 2:50 PM, Marcus Gelderie <redmnic@gmail.com> wrote:
> 
> Hey all,
> 
> answers in text below...
> 
> TLDR: I can remove the QKERSIZE field, as I believe that it does not affect he
> RLIMIT accounting (details [=code] below). Question is: Should it?
> 
> Before I provide another patch, I would appreciate another round of feedback, though.
> 
> So…
> On Thu, Jun 25, 2015 at 09:23:33AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 25 June 2015 at 07:47, Davidlohr Bueso <dave@stgolabs.net> wrote:
>>> Good catch, and a nice opportunity to make the mq manpage more specific
>>> wrt to queue sizes.
>>> 
>>> [...]
>>> 
> ACK, I can write a patch for the manpage once we figure out what to do
> here.
>>>> Reporting the size of the message queue in kernel has its merits, but
>>>> doing so in the QSIZE field of the pseudo file corresponding to the
>>>> queue is a breaking change, as mentioned above. This patch therefore
>>>> returns the QSIZE  field to its original meaning. At the same time,
>>>> it introduces a new field QKERSIZE that reflects the size of the queue
>>>> in kernel (user data + kernel data).
>>> 
>>> Hmmm I'm not sure about this. What are the specific benefits of having
>>> QKERSIZE? We don't export in-kernel data like this in any other ipc
>>> (posix or sysv) mechanism, afaik. Plus, we do not compromise kernel data
>>> structures like this, as we would break userspace if later we change
>>> posix_msg_tree_node. So NAK to this.
>>> 
>>> I would just remove the extra
>>> +       info->qsize += sizeof(struct posix_msg_tree_node);
>>> 
>>> bits from d6629859b36 (along with -stable v3.5), plus a patch updating
>>> the manpage that this field only reflects user data.
>> 
> This can be done if the RLIMIT accounting is not done against that
> value (more below).
> 
>> I've been hoping that Doug would jump into this discussion…

Sorry to have been quiet so far.  PTO interfered.

>> If I recall/understand Doug correctly (see
>> http://thread.gmane.org/gmane.linux.man/7050/focus=1797645 ), his
>> rationale for the QSIZE change was that it then revealed a value that
>> was closer to what was being used to account against the
>> RLIMIT_MSGQUEUE resource limit. (Even with these changes, the QSIZE
>> value was not 100% accurate for accounting against RLIMIT_MSGQUEUE,
>> since some pieces of kernel overhead were still not being accounted
>> for. Nevertheless, it's much closer than the old (pre 3.5) QSIZE for
>> some corner cases.) Thus, Marcus's rationale for preserving this info
>> as QKERSIZE.

Yes.  A bit more rationale was that the change to use an rbtree for priorities made the worst case scenario for queue size grow quite a lot.  This is especially true when dealing with queues with a very small message size.  If we didn’t account for some of this growth in size, we opened ourselves up to a DDoS type attack where a malicious user could create lots of queues with very small message sizes and lots of messages and then just create enough messages of different priorities to force allocating all of these structs.  Since they would be allowed to create 1 byte messages in the queue, and could easily create a 1024 element queue, that would be a 1k accounting for user data that in worst case scenario used up something like 96k of kernel memory.  With the default RLIMIT for msg queues being something like 800K, let’s just assume one user id can create 800 * 96k of kernel memory allocations, so 72MB of kernel memory used per user id.  Given a few user ids, it can actually be significant.  And all the while the sysadmin is scratching his head going “where the hell is all my memory going?” because there is no clear indication that a 1 byte message in the POSIX mqueue subsystem is consuming 96bytes or so of kernel memory.

So, just to be clear, my rationale here is “Breaking expectations is bad, but leaving a possible DDoS style exploit is worse”.  That’s why I changed the accounting.  I had one or two users complain that they had to administratively adjust the RLIMIT_MSGQUEUE setting on their servers to compensate, but that’s as it should be.  Now they *know* where their memory is being used instead of having a lurking memory black hole in their system.  For those users that use large message sizes, they didn’t even notice the change, it was just lost in the noise.

> 
> Actually, looking at the code, it seems to me that the QKERSIZE
> (a.k.a. the current qsize field) is actually not used for the purpose of
> accounting at all. From ipc/mqueue.c (line 281 ff, comments added by me):
> 
> 		/* worst case estimate for the btree in memory */
> 
> 		mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
> 			min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
> 			sizeof(struct posix_msg_tree_node);
> 
> 		/* worst case estimate for the user data in the queue */
> 
> 		mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
> 					  info->attr.mq_msgsize);
> 
> 		spin_lock(&mq_lock);
> 
> 		    /* acutal accounting; u->mq_bytes is the other
> 		     * message queus for this user (computed in the way
> 		     * above (i.e. worst-case estimates)
> 		     */
> 
> 		if (u->mq_bytes + mq_bytes < u->mq_bytes ||
> 		    u->mq_bytes + mq_bytes > rlimit(RLIMIT_MSGQUEUE)) {
> 			[....]
> 			ret = -EMFILE;
> 			[.....]
> 
> So, I think that if the RLIMIT should take the actual size (not the
> worst case) into account, then we have a bug. I can write a patch, but that
> should be separate from this... meaning I'll turn this into a patchset.
> 
> However, we should decide what we want first: If I read the manpage (mq_overview), I
> am inclined to think that the RLIMIT should use the acutal amount of
> data occupied by the queue (or at least the user data). But it does not
> necessarily rule out an accounting against worst-case estimation (to me
> at least).

You have to use worst case.  You don’t have a choice.  The reason is that the check is only performed at queue creation time (and the user API expects this), so once the queue is created and the user is allowed to use it, failure due to RLIMIT issues is no longer allowed.  We therefore must use whatever worst case scenario we wish to use when we test the size during creation and then we have to be OK with the variance we might get from that during real world usage.  My tests/maths were intended to be “close enough” that they couldn’t be fooled with small message sizes and large queue sizes, but not intended to be perfect.

>> Now whether QKERSIZE is actually useful or used by anyone is another
>> question. As far as I know, there was no user request that drove the
>> change. But Doug can perhaps say something to this. QSIZE should I
>> think definitely be fixed (reverted to pre-3.5 behavior). I'm agnostic
>> about QKERSIZE.

Here’s what I would prefer to see happen:

1) Keep the maths as they are.  We need to account worst case because apps don’t expect run time checks/failures.
2) Keep accounting for kernel data size as part of queue size and adding both before checking it against the rlimit for the user.
3) Create a new mq_create call that take a new mq_attr struct.  This struct should add a field for max_priorities.  For existing apps, the current default of 32,768 priorities can be used.  For other apps, let them select their requested number of priorities.  By selecting lower numbers of priorities, that worst case scenario gets much better (as long as their queue size is larger than their number of priorities).

Or, as an alternative:

1) Change the maths to account for the msg struct, but ignore the rbtree structs.  This is still different from when I made my change in that the maths used to only account for a msg struct *.  This would put the overall total change somewhere in between real memory usage and memory queue size.  For instance, if you had a 1k queue of 1 byte messages, here’s how it lays out:

Pre-3.5 math: 1k * (1byte + (sizeof *)) = 5k or 9k (32/64bit arch)
Pre-3.5 reality: 1k * (1byte + (sizeof *) + sizeof (struct msg_msg) + unaccounted stuff) = 50ishk

Post-3.5 math: 1k * (1byte + sizeof (struct msg_msg) + sizeof (struct msg_node)) = 90ishk
Post-3.5 reality: 90ishk

Proposed math: 1k * (qbyte + sizeof (struc msg_msg)) = 50ishk
Proposed reality: 90ishk

The reason I suggest this is that the real world usage I’ve seen does not use lots of priorities.  So, even though the *worst case* reality is 90k, the common case will be in the 50k range like the math.  Given that we are less than a 2 to 1 ratio of worst case to math, a DDoS is unlikely to go undetected.  The only real problem here is that if you have an admin that wants to create lots of queues and use as much ram as possible, and they use lots of priorities, then they might get confused when their calculated sizes of mqueues fills RAM faster than they expected and the actually run out of RAM in use.

—
Doug Ledford <dledford@redhat.com>
	GPG Key ID: 0E572FDD






[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-07 13:01           ` Michael Kerrisk (man-pages)
@ 2015-07-08 19:17             ` Doug Ledford
  2015-07-08 19:53               ` Michael Kerrisk (man-pages)
                                 ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Doug Ledford @ 2015-07-08 19:17 UTC (permalink / raw)
  To: mtk.manpages, Davidlohr Bueso
  Cc: Marcus Gelderie, lkml, David Howells, Alexander Viro, John Duffy,
	Arto Bendiken, Linux API, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 4914 bytes --]

On 07/07/2015 09:01 AM, Michael Kerrisk (man-pages) wrote:
> Hi David,
> 
> On 7 July 2015 at 07:16, Davidlohr Bueso <dave@stgolabs.net> wrote:
>> On Mon, 2015-07-06 at 17:49 +0200, Marcus Gelderie wrote:
>>> A while back, the message queue implementation in the kernel was
>>> improved to use btrees to speed up retrieval of messages (commit
>>> d6629859b36). The patch introducing the improved kernel handling of
>>> message queues (using btrees) has, as a by-product, changed the
>>> meaning of the QSIZE field in the pseudo-file created for the queue.
>>> Before, this field reflected the size of the user-data in the queue.
>>> Since, it also takes kernel data structures into account. For
>>> example, if 13 bytes of user data are in the queue, on my machine the
>>> file reports a size of 61 bytes.
>>>
>>> There was some discussion on this topic before (for example
>>> https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
>>> Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):
>>>
>>>     The pseudofiles in the mqueue filesystem (usually mounted at
>>>     /dev/mqueue) expose fields with metadata describing a message
>>>     queue. One of these fields, QSIZE, as originally implemented,
>>>     showed the total number of bytes of user data in all messages in
>>>     the message queue, and this feature was documented from the
>>>     beginning in the mq_overview(7) page. In 3.5, some other (useful)
>>>     work happened to break the user-space API in a couple of places,
>>>     including the value exposed via QSIZE, which now includes a measure
>>>     of kernel overhead bytes for the queue, a figure that renders QSIZE
>>>     useless for its original purpose, since there's no way to deduce
>>>     the number of overhead bytes consumed by the implementation.
>>>     (The other user-space breakage was subsequently fixed.)
>>
>> Michael, this breakage was never finally documented in the manpage,
>> right?
> 
> Right.
> 
>> I took a look and there is no mention, but it was a quick look.
>> It's just that if this patch goes in, I'd hate ending up with something
>> like this in the manpage:
>>
>> as of 3.5
>>   <accounts for kernel overhead>
>>
>> as of 4.3
>>   <behavior reverted back to not include kernel overhead... *sigh*>
>>
>> If there are changes to be made to the manpage, it should probably be
>> posted with this patch, methinks.
> 
> I think something like the above will have to end up in the man page.
> The only thing I'd add is that the fix also has gone to -stable
> kernels. At least: I think this patch should also be marked for
> -stable. I'll take care of the man page updates as the patch goes
> through.
> 
>>> This patch removes the accounting of kernel data structures in the
>>> queue. Reporting the size of these data-structures in the QSIZE field
>>> was a breaking change (see Michael's comment above). Without the QSIZE
>>> field reporting the total size of user-data in the queue, there is no
>>> way to deduce this number.
>>>
>>> It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
>>> against the worst-case size of the queue (in both the old and the new
>>> implementation). Therefore, the kernel overhead accounting in QSIZE is
>>> not necessary to help the user understand the limitations RLIMIT imposes
>>> on the processes.
>>
>> Also, I would suggest adding some comment in struct mqueue_inode_info
>> for future reference, ie:
>>
>> -       unsigned long qsize; /* size of queue in memory (sum of all msgs) */
>> +       /*
>> +        * Size of queue in memory (sum of all msgs). Accounts for
>> +        * only userspace overhead; ignoring any in-kernel rbtree nodes.
>> +        */
>> +       unsigned long qsize;
>>
>> But no big deal in any case.
>>
>> I think this is the right approach,
> 
> Me too.
> 
>> but would still like to know if Doug
>> has any concerns about it.
> 
> Looking on gmane, Doug does not appear to have been active on any
> lists since late May! Not sure why.

I responded yesterday in the v2 patch thread I believe.  In any case, I
think this patch is fine, and can go to stable.  This patch doesn't
actually change the math related to the rlimit checks (which is the main
thing I wanted to correct in my original patches), instead it corrects a
mistake I made.  At the time, I mistakenly thought that the qsize
included the current message data total + the struct msg_msg size total.
 It didn't, it was just the current user data total.  I added the rbtree
nodes in order to keep the total accurate but I shouldn't have added the
rbtree nodes to this total because none of the other kernel usage was
previously included.

Acked-by: Doug Ledford <dledford@redhat.com>



-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-08 19:17             ` Doug Ledford
@ 2015-07-08 19:53               ` Michael Kerrisk (man-pages)
  2015-07-08 21:49               ` Davidlohr Bueso
  2015-07-10  0:00               ` Davidlohr Bueso
  2 siblings, 0 replies; 17+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-07-08 19:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Davidlohr Bueso, Marcus Gelderie, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API,
	Andrew Morton

On 8 July 2015 at 21:17, Doug Ledford <dledford@redhat.com> wrote:
> On 07/07/2015 09:01 AM, Michael Kerrisk (man-pages) wrote:
>> Hi David,
>>
>> On 7 July 2015 at 07:16, Davidlohr Bueso <dave@stgolabs.net> wrote:
>>> On Mon, 2015-07-06 at 17:49 +0200, Marcus Gelderie wrote:
>>>> A while back, the message queue implementation in the kernel was
>>>> improved to use btrees to speed up retrieval of messages (commit
>>>> d6629859b36). The patch introducing the improved kernel handling of
>>>> message queues (using btrees) has, as a by-product, changed the
>>>> meaning of the QSIZE field in the pseudo-file created for the queue.
>>>> Before, this field reflected the size of the user-data in the queue.
>>>> Since, it also takes kernel data structures into account. For
>>>> example, if 13 bytes of user data are in the queue, on my machine the
>>>> file reports a size of 61 bytes.
>>>>
>>>> There was some discussion on this topic before (for example
>>>> https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
>>>> Kerrisk gave the following background (https://lkml.org/lkml/2015/6/16/74):
>>>>
>>>>     The pseudofiles in the mqueue filesystem (usually mounted at
>>>>     /dev/mqueue) expose fields with metadata describing a message
>>>>     queue. One of these fields, QSIZE, as originally implemented,
>>>>     showed the total number of bytes of user data in all messages in
>>>>     the message queue, and this feature was documented from the
>>>>     beginning in the mq_overview(7) page. In 3.5, some other (useful)
>>>>     work happened to break the user-space API in a couple of places,
>>>>     including the value exposed via QSIZE, which now includes a measure
>>>>     of kernel overhead bytes for the queue, a figure that renders QSIZE
>>>>     useless for its original purpose, since there's no way to deduce
>>>>     the number of overhead bytes consumed by the implementation.
>>>>     (The other user-space breakage was subsequently fixed.)
>>>
>>> Michael, this breakage was never finally documented in the manpage,
>>> right?
>>
>> Right.
>>
>>> I took a look and there is no mention, but it was a quick look.
>>> It's just that if this patch goes in, I'd hate ending up with something
>>> like this in the manpage:
>>>
>>> as of 3.5
>>>   <accounts for kernel overhead>
>>>
>>> as of 4.3
>>>   <behavior reverted back to not include kernel overhead... *sigh*>
>>>
>>> If there are changes to be made to the manpage, it should probably be
>>> posted with this patch, methinks.
>>
>> I think something like the above will have to end up in the man page.
>> The only thing I'd add is that the fix also has gone to -stable
>> kernels. At least: I think this patch should also be marked for
>> -stable. I'll take care of the man page updates as the patch goes
>> through.
>>
>>>> This patch removes the accounting of kernel data structures in the
>>>> queue. Reporting the size of these data-structures in the QSIZE field
>>>> was a breaking change (see Michael's comment above). Without the QSIZE
>>>> field reporting the total size of user-data in the queue, there is no
>>>> way to deduce this number.
>>>>
>>>> It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
>>>> against the worst-case size of the queue (in both the old and the new
>>>> implementation). Therefore, the kernel overhead accounting in QSIZE is
>>>> not necessary to help the user understand the limitations RLIMIT imposes
>>>> on the processes.
>>>
>>> Also, I would suggest adding some comment in struct mqueue_inode_info
>>> for future reference, ie:
>>>
>>> -       unsigned long qsize; /* size of queue in memory (sum of all msgs) */
>>> +       /*
>>> +        * Size of queue in memory (sum of all msgs). Accounts for
>>> +        * only userspace overhead; ignoring any in-kernel rbtree nodes.
>>> +        */
>>> +       unsigned long qsize;
>>>
>>> But no big deal in any case.
>>>
>>> I think this is the right approach,
>>
>> Me too.
>>
>>> but would still like to know if Doug
>>> has any concerns about it.
>>
>> Looking on gmane, Doug does not appear to have been active on any
>> lists since late May! Not sure why.
>
> I responded yesterday in the v2 patch thread I believe.  In any case, I
> think this patch is fine, and can go to stable.  This patch doesn't
> actually change the math related to the rlimit checks (which is the main
> thing I wanted to correct in my original patches), instead it corrects a
> mistake I made.  At the time, I mistakenly thought that the qsize
> included the current message data total + the struct msg_msg size total.
>  It didn't, it was just the current user data total.  I added the rbtree
> nodes in order to keep the total accurate but I shouldn't have added the
> rbtree nodes to this total because none of the other kernel usage was
> previously included.
>
> Acked-by: Doug Ledford <dledford@redhat.com>

+
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-08 19:17             ` Doug Ledford
  2015-07-08 19:53               ` Michael Kerrisk (man-pages)
@ 2015-07-08 21:49               ` Davidlohr Bueso
  2015-07-10  0:00               ` Davidlohr Bueso
  2 siblings, 0 replies; 17+ messages in thread
From: Davidlohr Bueso @ 2015-07-08 21:49 UTC (permalink / raw)
  To: Doug Ledford
  Cc: mtk.manpages, Marcus Gelderie, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API,
	Andrew Morton

On Wed, 2015-07-08 at 15:17 -0400, Doug Ledford wrote:
> I responded yesterday in the v2 patch thread I believe.  In any case, I
> think this patch is fine, and can go to stable.  This patch doesn't
> actually change the math related to the rlimit checks (which is the main
> thing I wanted to correct in my original patches), instead it corrects a
> mistake I made.  At the time, I mistakenly thought that the qsize
> included the current message data total + the struct msg_msg size total.
>  It didn't, it was just the current user data total.  I added the rbtree
> nodes in order to keep the total accurate but I shouldn't have added the
> rbtree nodes to this total because none of the other kernel usage was
> previously included.

Exactly, this is what I was referring to when I suggested staying clear
from rlimit modifications. It just makes 0 sense to account for each
rbtree.

Thanks for taking a look!

> 
> Acked-by: Doug Ledford <dledford@redhat.com>

Acked-by: Davidlohr Bueso <dbueso@suse.de>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account
  2015-07-08 19:17             ` Doug Ledford
  2015-07-08 19:53               ` Michael Kerrisk (man-pages)
  2015-07-08 21:49               ` Davidlohr Bueso
@ 2015-07-10  0:00               ` Davidlohr Bueso
  2 siblings, 0 replies; 17+ messages in thread
From: Davidlohr Bueso @ 2015-07-10  0:00 UTC (permalink / raw)
  To: Doug Ledford
  Cc: mtk.manpages, Marcus Gelderie, lkml, David Howells,
	Alexander Viro, John Duffy, Arto Bendiken, Linux API,
	Andrew Morton

While on the subject, after taking another look at where and how we
calculate the treesize, I think we can remove this check when checking
attributes (mqueue_attr_ok -- which itself could use some simplifying):

	mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
		min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
		sizeof(struct posix_msg_tree_node);

	total_size = attr->mq_maxmsg * attr->mq_msgsize;

	if (total_size + mq_treesize < total_size)
		return -EOVERFLOW;
		^^^ this condition is never true, so we never return EOVERFLOW.
	return 0;

The below gets rid of this bogus check and refactors similar code,
introducing a mqueue_sizeof() helper to do the job of calculating total
mqueue overhead.

Thoughts?

Thanks,
Davidlohr

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 161a180..a5d0c9e 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -209,6 +209,31 @@ try_again:
 	return msg;
 }
 
+/*
+ * We used to allocate a static array of pointers and account
+ * the size of that array as well as one msg_msg struct per
+ * possible message into the queue size. That's no longer
+ * accurate as the queue is now an rbtree and will grow and
+ * shrink depending on usage patterns.  We can, however, still
+ * account one msg_msg struct per message, but the nodes are
+ * allocated depending on priority usage, and most programs
+ * only use one, or a handful, of priorities.  However, since
+ * this is pinned memory, we need to assume worst case, so
+ * that means the min(mq_maxmsg, max_priorities) * struct
+ * posix_msg_tree_node.
+ */
+static inline unsigned long mqueue_sizeof(struct mqueue_inode_info *info)
+{
+	unsigned long mq_treesize, mq_max_msgsize;
+
+	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
+		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
+		sizeof(struct posix_msg_tree_node);
+
+	mq_max_msgsize = info->attr.mq_maxmsg * info->attr.mq_msgsize;
+	return mq_treesize + mq_max_msgsize; /* bytes */
+}
+
 static struct inode *mqueue_get_inode(struct super_block *sb,
 		struct ipc_namespace *ipc_ns, umode_t mode,
 		struct mq_attr *attr)
@@ -229,7 +254,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 
 	if (S_ISREG(mode)) {
 		struct mqueue_inode_info *info;
-		unsigned long mq_bytes, mq_treesize;
+		unsigned long mq_bytes;
 
 		inode->i_fop = &mqueue_file_operations;
 		inode->i_size = FILENT_SIZE;
@@ -254,25 +279,8 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 			info->attr.mq_maxmsg = attr->mq_maxmsg;
 			info->attr.mq_msgsize = attr->mq_msgsize;
 		}
-		/*
-		 * We used to allocate a static array of pointers and account
-		 * the size of that array as well as one msg_msg struct per
-		 * possible message into the queue size. That's no longer
-		 * accurate as the queue is now an rbtree and will grow and
-		 * shrink depending on usage patterns.  We can, however, still
-		 * account one msg_msg struct per message, but the nodes are
-		 * allocated depending on priority usage, and most programs
-		 * only use one, or a handful, of priorities.  However, since
-		 * this is pinned memory, we need to assume worst case, so
-		 * that means the min(mq_maxmsg, max_priorities) * struct
-		 * posix_msg_tree_node.
-		 */
-		mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
-			min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
-			sizeof(struct posix_msg_tree_node);
 
-		mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
-					  info->attr.mq_msgsize);
+		mq_bytes = mqueue_sizeof(info);
 
 		spin_lock(&mq_lock);
 		if (u->mq_bytes + mq_bytes < u->mq_bytes ||
@@ -371,7 +379,7 @@ static void mqueue_evict_inode(struct inode *inode)
 {
 	struct mqueue_inode_info *info;
 	struct user_struct *user;
-	unsigned long mq_bytes, mq_treesize;
+	unsigned long mq_bytes;
 	struct ipc_namespace *ipc_ns;
 	struct msg_msg *msg;
 
@@ -388,13 +396,7 @@ static void mqueue_evict_inode(struct inode *inode)
 	kfree(info->node_cache);
 	spin_unlock(&info->lock);
 
-	/* Total amount of bytes accounted for the mqueue */
-	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
-		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
-		sizeof(struct posix_msg_tree_node);
-
-	mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
-				  info->attr.mq_msgsize);
+	mq_bytes = mqueue_sizeof(info);
 
 	user = info->user;
 	if (user) {
@@ -692,9 +694,6 @@ static void remove_notification(struct mqueue_inode_info *info)
 
 static int mq_attr_ok(struct ipc_namespace *ipc_ns, struct mq_attr *attr)
 {
-	int mq_treesize;
-	unsigned long total_size;
-
 	if (attr->mq_maxmsg <= 0 || attr->mq_msgsize <= 0)
 		return -EINVAL;
 	if (capable(CAP_SYS_RESOURCE)) {
@@ -709,12 +708,6 @@ static int mq_attr_ok(struct ipc_namespace *ipc_ns, struct mq_attr *attr)
 	/* check for overflow */
 	if (attr->mq_msgsize > ULONG_MAX/attr->mq_maxmsg)
 		return -EOVERFLOW;
-	mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
-		min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
-		sizeof(struct posix_msg_tree_node);
-	total_size = attr->mq_maxmsg * attr->mq_msgsize;
-	if (total_size + mq_treesize < total_size)
-		return -EOVERFLOW;
 	return 0;
 }
 



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/1] ipc,mqueue: Delete bogus overflow check
  2015-07-06 15:49       ` [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account Marcus Gelderie
  2015-07-07  5:16         ` Davidlohr Bueso
@ 2015-07-11  0:48         ` Davidlohr Bueso
  2015-07-11  2:03           ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2015-07-11  0:48 UTC (permalink / raw)
  To: Marcus Gelderie
  Cc: mtk.manpages, Doug Ledford, lkml, David Howells, Alexander Viro,
	John Duffy, Arto Bendiken, Linux API, akpm

Mathematically, returning -EOVERFLOW in mq_attr_ok()
cannot occur under this condition:

       mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
	       min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
	       sizeof(struct posix_msg_tree_node);
       total_size = attr->mq_maxmsg * attr->mq_msgsize;
       if (total_size + mq_treesize < total_size)
	       return -EOVERFLOW;

Thus remove the check and simplify code around calculating
total queue overhead by introducing a mqueue_sizeof() helper.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
Passes ipc stresser and ltp tests.

 ipc/mqueue.c | 65 +++++++++++++++++++++++++++---------------------------------
 1 file changed, 29 insertions(+), 36 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 161a180..a5d0c9e 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -209,6 +209,31 @@ try_again:
 	return msg;
 }
 
+/*
+ * We used to allocate a static array of pointers and account
+ * the size of that array as well as one msg_msg struct per
+ * possible message into the queue size. That's no longer
+ * accurate as the queue is now an rbtree and will grow and
+ * shrink depending on usage patterns.  We can, however, still
+ * account one msg_msg struct per message, but the nodes are
+ * allocated depending on priority usage, and most programs
+ * only use one, or a handful, of priorities.  However, since
+ * this is pinned memory, we need to assume worst case, so
+ * that means the min(mq_maxmsg, max_priorities) * struct
+ * posix_msg_tree_node.
+ */
+static inline unsigned long mqueue_sizeof(struct mqueue_inode_info *info)
+{
+	unsigned long mq_treesize, mq_max_msgsize;
+
+	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
+		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
+		sizeof(struct posix_msg_tree_node);
+
+	mq_max_msgsize = info->attr.mq_maxmsg * info->attr.mq_msgsize;
+	return mq_treesize + mq_max_msgsize; /* bytes */
+}
+
 static struct inode *mqueue_get_inode(struct super_block *sb,
 		struct ipc_namespace *ipc_ns, umode_t mode,
 		struct mq_attr *attr)
@@ -229,7 +254,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 
 	if (S_ISREG(mode)) {
 		struct mqueue_inode_info *info;
-		unsigned long mq_bytes, mq_treesize;
+		unsigned long mq_bytes;
 
 		inode->i_fop = &mqueue_file_operations;
 		inode->i_size = FILENT_SIZE;
@@ -254,25 +279,8 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 			info->attr.mq_maxmsg = attr->mq_maxmsg;
 			info->attr.mq_msgsize = attr->mq_msgsize;
 		}
-		/*
-		 * We used to allocate a static array of pointers and account
-		 * the size of that array as well as one msg_msg struct per
-		 * possible message into the queue size. That's no longer
-		 * accurate as the queue is now an rbtree and will grow and
-		 * shrink depending on usage patterns.  We can, however, still
-		 * account one msg_msg struct per message, but the nodes are
-		 * allocated depending on priority usage, and most programs
-		 * only use one, or a handful, of priorities.  However, since
-		 * this is pinned memory, we need to assume worst case, so
-		 * that means the min(mq_maxmsg, max_priorities) * struct
-		 * posix_msg_tree_node.
-		 */
-		mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
-			min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
-			sizeof(struct posix_msg_tree_node);
 
-		mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
-					  info->attr.mq_msgsize);
+		mq_bytes = mqueue_sizeof(info);
 
 		spin_lock(&mq_lock);
 		if (u->mq_bytes + mq_bytes < u->mq_bytes ||
@@ -371,7 +379,7 @@ static void mqueue_evict_inode(struct inode *inode)
 {
 	struct mqueue_inode_info *info;
 	struct user_struct *user;
-	unsigned long mq_bytes, mq_treesize;
+	unsigned long mq_bytes;
 	struct ipc_namespace *ipc_ns;
 	struct msg_msg *msg;
 
@@ -388,13 +396,7 @@ static void mqueue_evict_inode(struct inode *inode)
 	kfree(info->node_cache);
 	spin_unlock(&info->lock);
 
-	/* Total amount of bytes accounted for the mqueue */
-	mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
-		min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
-		sizeof(struct posix_msg_tree_node);
-
-	mq_bytes = mq_treesize + (info->attr.mq_maxmsg *
-				  info->attr.mq_msgsize);
+	mq_bytes = mqueue_sizeof(info);
 
 	user = info->user;
 	if (user) {
@@ -692,9 +694,6 @@ static void remove_notification(struct mqueue_inode_info *info)
 
 static int mq_attr_ok(struct ipc_namespace *ipc_ns, struct mq_attr *attr)
 {
-	int mq_treesize;
-	unsigned long total_size;
-
 	if (attr->mq_maxmsg <= 0 || attr->mq_msgsize <= 0)
 		return -EINVAL;
 	if (capable(CAP_SYS_RESOURCE)) {
@@ -709,12 +708,6 @@ static int mq_attr_ok(struct ipc_namespace *ipc_ns, struct mq_attr *attr)
 	/* check for overflow */
 	if (attr->mq_msgsize > ULONG_MAX/attr->mq_maxmsg)
 		return -EOVERFLOW;
-	mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
-		min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
-		sizeof(struct posix_msg_tree_node);
-	total_size = attr->mq_maxmsg * attr->mq_msgsize;
-	if (total_size + mq_treesize < total_size)
-		return -EOVERFLOW;
 	return 0;
 }
 
-- 
2.1.4




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/1] ipc,mqueue: Delete bogus overflow check
  2015-07-11  0:48         ` [PATCH 2/1] ipc,mqueue: Delete bogus overflow check Davidlohr Bueso
@ 2015-07-11  2:03           ` Al Viro
  2015-07-11  2:59             ` Doug Ledford
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2015-07-11  2:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Marcus Gelderie, mtk.manpages, Doug Ledford, lkml, David Howells,
	John Duffy, Arto Bendiken, Linux API, akpm

On Fri, Jul 10, 2015 at 05:48:11PM -0700, Davidlohr Bueso wrote:
> Mathematically, returning -EOVERFLOW in mq_attr_ok()
> cannot occur under this condition:
> 
>        mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
> 	       min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
> 	       sizeof(struct posix_msg_tree_node);
>        total_size = attr->mq_maxmsg * attr->mq_msgsize;
>        if (total_size + mq_treesize < total_size)
> 	       return -EOVERFLOW;

A proof would be nice.  More detailed one than "cannot occur", that is.

	Condition in question is basically mq_treesize < 0 or
total_size + mq_treesize (in natural numbers) > 2^BITS_PER_LONG.
Now, the maximal values of ->mq_maxmsg and ->mq_msgsize are 2^16 and
2^24 resp. and we are guaranteed that their product is below 2^BITS_PER_LONG.
For mq_treesize we are guaranteed that it's below 2^31.  Now, on a 64bit
box that would suffice to avoid overflow - the product is at most 2^40 and
its sum with mq_treesize can't wrap around.

For 32bit system, though...  Suppose attr->mq_maxmsg == 65535 and
attr->mq_msgsize == 65537.  Their product *is* below 2^BITS_PER_LONG - it's
exactly 1 less than that.  _Any_ non-zero value for mq_tresize (and it
will be non-zero in the above) will lead to wraparound.

Looks like a counterexample to your assertion above...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/1] ipc,mqueue: Delete bogus overflow check
  2015-07-11  2:03           ` Al Viro
@ 2015-07-11  2:59             ` Doug Ledford
  2015-07-14 16:11               ` Marcus Gelderie
  0 siblings, 1 reply; 17+ messages in thread
From: Doug Ledford @ 2015-07-11  2:59 UTC (permalink / raw)
  To: Al Viro, Davidlohr Bueso
  Cc: Marcus Gelderie, mtk.manpages, lkml, David Howells, John Duffy,
	Arto Bendiken, Linux API, akpm

[-- Attachment #1: Type: text/plain, Size: 1843 bytes --]

On 07/10/2015 10:03 PM, Al Viro wrote:
> On Fri, Jul 10, 2015 at 05:48:11PM -0700, Davidlohr Bueso wrote:
>> Mathematically, returning -EOVERFLOW in mq_attr_ok()
>> cannot occur under this condition:
>>
>>        mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
>> 	       min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
>> 	       sizeof(struct posix_msg_tree_node);
>>        total_size = attr->mq_maxmsg * attr->mq_msgsize;
>>        if (total_size + mq_treesize < total_size)
>> 	       return -EOVERFLOW;
> 
> A proof would be nice.  More detailed one than "cannot occur", that is.
> 
> 	Condition in question is basically mq_treesize < 0 or
> total_size + mq_treesize (in natural numbers) > 2^BITS_PER_LONG.
> Now, the maximal values of ->mq_maxmsg and ->mq_msgsize are 2^16 and
> 2^24 resp. and we are guaranteed that their product is below 2^BITS_PER_LONG.
> For mq_treesize we are guaranteed that it's below 2^31.  Now, on a 64bit
> box that would suffice to avoid overflow - the product is at most 2^40 and
> its sum with mq_treesize can't wrap around.
> 
> For 32bit system, though...  Suppose attr->mq_maxmsg == 65535 and
> attr->mq_msgsize == 65537.  Their product *is* below 2^BITS_PER_LONG - it's
> exactly 1 less than that.  _Any_ non-zero value for mq_tresize (and it
> will be non-zero in the above) will lead to wraparound.
> 
> Looks like a counterexample to your assertion above...
> 

I'm pretty sure you're right.  The above looks like an example of "Gee,
we need to protect against signed wrap around.  Wait, it's unsigned, no
worries." when in fact unsigned will wrap around too if the total
exceeds the maximum (it just wraps to a small positive value instead of
a large negative value).

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/1] ipc,mqueue: Delete bogus overflow check
  2015-07-11  2:59             ` Doug Ledford
@ 2015-07-14 16:11               ` Marcus Gelderie
  0 siblings, 0 replies; 17+ messages in thread
From: Marcus Gelderie @ 2015-07-14 16:11 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Al Viro, Davidlohr Bueso, mtk.manpages, lkml, David Howells,
	John Duffy, Arto Bendiken, Linux API, akpm

Hey,

I think Davidlohr has a point though on the computation of mq_treesize
being redundant code.

So even though the overflow check turns out to be necessary, it wouldn't 
hurt to do some refactoring of the computations, especially, because
they occur in mutiple places as copy-paste code.


@Davidlohr: Will there be a v2 of your patch?

On Fri, Jul 10, 2015 at 10:59:06PM -0400, Doug Ledford wrote:
> On 07/10/2015 10:03 PM, Al Viro wrote:
> > On Fri, Jul 10, 2015 at 05:48:11PM -0700, Davidlohr Bueso wrote:
> >> Mathematically, returning -EOVERFLOW in mq_attr_ok()
> >> cannot occur under this condition:
> >>
> >>        mq_treesize = attr->mq_maxmsg * sizeof(struct msg_msg) +
> >> 	       min_t(unsigned int, attr->mq_maxmsg, MQ_PRIO_MAX) *
> >> 	       sizeof(struct posix_msg_tree_node);
> >>        total_size = attr->mq_maxmsg * attr->mq_msgsize;
> >>        if (total_size + mq_treesize < total_size)
> >> 	       return -EOVERFLOW;
> > 
> > A proof would be nice.  More detailed one than "cannot occur", that is.
> > 
> > 	Condition in question is basically mq_treesize < 0 or
> > total_size + mq_treesize (in natural numbers) > 2^BITS_PER_LONG.
> > Now, the maximal values of ->mq_maxmsg and ->mq_msgsize are 2^16 and
> > 2^24 resp. and we are guaranteed that their product is below 2^BITS_PER_LONG.
> > For mq_treesize we are guaranteed that it's below 2^31.  Now, on a 64bit
> > box that would suffice to avoid overflow - the product is at most 2^40 and
> > its sum with mq_treesize can't wrap around.
> > 
> > For 32bit system, though...  Suppose attr->mq_maxmsg == 65535 and
> > attr->mq_msgsize == 65537.  Their product *is* below 2^BITS_PER_LONG - it's
> > exactly 1 less than that.  _Any_ non-zero value for mq_tresize (and it
> > will be non-zero in the above) will lead to wraparound.
> > 
> > Looks like a counterexample to your assertion above...
> > 
> 
> I'm pretty sure you're right.  The above looks like an example of "Gee,
> we need to protect against signed wrap around.  Wait, it's unsigned, no
> worries." when in fact unsigned will wrap around too if the total
> exceeds the maximum (it just wraps to a small positive value instead of
> a large negative value).
> 
> -- 
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
> 
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-14 16:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 22:25 [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
2015-06-25  5:47 ` Davidlohr Bueso
2015-06-25  7:23   ` Michael Kerrisk (man-pages)
2015-06-25 18:21     ` Davidlohr Bueso
2015-07-06 15:49       ` [PATCH v3] ipc: Modify message queue accounting to not take kernel data structures into account Marcus Gelderie
2015-07-07  5:16         ` Davidlohr Bueso
2015-07-07 13:01           ` Michael Kerrisk (man-pages)
2015-07-08 19:17             ` Doug Ledford
2015-07-08 19:53               ` Michael Kerrisk (man-pages)
2015-07-08 21:49               ` Davidlohr Bueso
2015-07-10  0:00               ` Davidlohr Bueso
2015-07-11  0:48         ` [PATCH 2/1] ipc,mqueue: Delete bogus overflow check Davidlohr Bueso
2015-07-11  2:03           ` Al Viro
2015-07-11  2:59             ` Doug Ledford
2015-07-14 16:11               ` Marcus Gelderie
2015-06-25 18:50     ` [PATCH v2] ipc: Modify message queue accounting to reflect both total user data and auxiliary kernel data Marcus Gelderie
2015-07-07 18:49       ` Doug Ledford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).