All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: dhowells@redhat.com, torvalds@linux-foundation.org,
	Casey Schaufler <casey@schaufler-ca.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	nicolas.dichtel@6wind.com, raven@themaw.net,
	Christian Brauner <christian@brauner.io>,
	keyrings@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-block@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 03/21] pipe: Use head and tail pointers for the ring, not cursor and length
Date: Thu, 17 Oct 2019 11:53:19 +0100	[thread overview]
Message-ID: <8694.1571309599@warthog.procyon.org.uk> (raw)
In-Reply-To: <b8799179-d389-8005-4f6d-845febc3bb23@rasmusvillemoes.dk>

Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> >  (6) The number of free slots in the ring is "(tail + pipe->ring_size) -
> >      head".
> 
> Seems an odd way of writing pipe->ring_size - (head - tail) ; i.e.
> obviously #free slots is #size minus #occupancy.

Perhaps so.  The way I was looking at it is the window into which things can
be written is tail...tail+ring_size; the number of free slots is the distance
from head to the end of the window.

Anyway, I now have a helper that does it your way.

> >  (7) The ring is full if "head >= (tail + pipe->ring_size)", which can also
> >      be written as "head - tail >= pipe->ring_size".
> >
> 
> No it cannot, it _must_ be written in the latter form.

Ah, you're right.  I have a helper now for that too.

> head-tail == pipe_size or head-tail >= pipe_size

In general, I'd prefer ">=" just in case tail gets in front of head.

Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> > Also split pipe->buffers into pipe->ring_size (which indicates the size of
> > the ring) and pipe->max_usage (which restricts the amount of ring that
> > write() is allowed to fill).  This allows for a pipe that is both writable
> > by the kernel notification facility and by userspace, allowing plenty of
> > ring space for notifications to be added whilst preventing userspace from
> > being able to use up too much buffer space.
> 
> That seems like something that should be added in a separate patch -
> adding ->max_usage and switching appropriate users of ->ring_size over,
> so it's more clear where you're using one or the other.

Okay.

> > +		ibuf = &pipe->bufs[tail];
> 
> I don't see where tail gets masked between tail = pipe->tail;

Yeah - I missed that one.

> In any case, how about seeding head and tail with something like 1<<20 when
> creating the pipe so bugs like that are hit more quickly.

That's sounds like a good idea.

> > +			while (tail < head) {
> > +				count += pipe->bufs[tail & mask].len;
> > +				tail++;
> >  			}
> 
> This is broken if head has wrapped but tail has not. It has to be "while
> (head - tail)" or perhaps just "while (tail != head)" or something along
> those lines.

Yeah...  It's just too easy to overlook this and use ordinary comparisons.
I've switched to "while (tail != head)".

> > +	mask = pipe->ring_size - 1;
> > +	head = pipe->head & mask;
> > +	tail = pipe->tail & mask;
> > +	n = pipe->head - pipe->tail;
> 
> I think it's confusing to "premask" head and tail here. Can you either
> drop that (pipe_set_size should hardly be a hot path?), or perhaps call
> them something else to avoid a future reader seeing an unmasked
> bufs[head] and thinking that's a bug?

I've made it now do the masking right before doing the memcpy calls and used
different variable names for it:

	if (n > 0) {
		unsigned int h = head & mask;
		unsigned int t = tail & mask;
		if (h > t) {
			memcpy(bufs, &pipe->bufs + t,
			       n * sizeof(struct pipe_buffer));
		} else {
			unsigned int tsize = pipe->ring_size - t;
			if (h > 0)
				memcpy(bufs + tsize, pipe->bufs,
				       h * sizeof(struct pipe_buffer));
			memcpy(bufs, pipe->bufs + t,
			       tsize * sizeof(struct pipe_buffer));
		}

> > -	data_start(i, &idx, start);
> > -	/* some of this one + all after this one */
> > -	npages = ((i->pipe->curbuf - idx - 1) & (i->pipe->buffers - 1)) + 1;
> > -	capacity = min(npages,maxpages) * PAGE_SIZE - *start;
> > +	data_start(i, &i_head, start);
> > +	p_tail = i->pipe->tail;
> > +	/* Amount of free space: some of this one + all after this one */
> > +	npages = (p_tail + i->pipe->ring_size) - i_head;
> 
> Hm, it's not clear that this is equivalent to the old computation. Since
> it seems repeated in a few places, could it be factored to a little
> helper (before this patch) and the "some of this one + all after this
> one" comment perhaps expanded to explain what is going on?

Yeah...  It's a bit weird, even before my changes.

However, looking at it again, it seems data_start() does the appropriate
calculations.  If there's space in the current head buffer, it returns the
offset to that and the head of that buffer, otherwise it advances the head
pointer and sets the offset to 0.

So I think the comment may actually be retrospective - referring to the state
that data_start() has given us, rather than talking about the next bit of
code.

I also wonder if pipe_get_pages_alloc() is broken.  It doesn't check to see
whether the buffer is full at the point it calls data_start().  However,
data_start() doesn't check either and, without this patch, will simply advance
and mask off the ring index - which may wrap.

The maths in the unpatched version is pretty icky and I'm not convinced it's
correct.

David

WARNING: multiple messages have this Message-ID (diff)
From: David Howells <dhowells@redhat.com>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: dhowells@redhat.com, torvalds@linux-foundation.org,
	Casey Schaufler <casey@schaufler-ca.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	nicolas.dichtel@6wind.com, raven@themaw.net,
	Christian Brauner <christian@brauner.io>,
	keyrings@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-block@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 03/21] pipe: Use head and tail pointers for the ring, not cursor and length
Date: Thu, 17 Oct 2019 10:53:19 +0000	[thread overview]
Message-ID: <8694.1571309599@warthog.procyon.org.uk> (raw)
In-Reply-To: <b8799179-d389-8005-4f6d-845febc3bb23@rasmusvillemoes.dk>

Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> >  (6) The number of free slots in the ring is "(tail + pipe->ring_size) -
> >      head".
> 
> Seems an odd way of writing pipe->ring_size - (head - tail) ; i.e.
> obviously #free slots is #size minus #occupancy.

Perhaps so.  The way I was looking at it is the window into which things can
be written is tail...tail+ring_size; the number of free slots is the distance
from head to the end of the window.

Anyway, I now have a helper that does it your way.

> >  (7) The ring is full if "head >= (tail + pipe->ring_size)", which can also
> >      be written as "head - tail >= pipe->ring_size".
> >
> 
> No it cannot, it _must_ be written in the latter form.

Ah, you're right.  I have a helper now for that too.

> head-tail = pipe_size or head-tail >= pipe_size

In general, I'd prefer ">=" just in case tail gets in front of head.

Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> > Also split pipe->buffers into pipe->ring_size (which indicates the size of
> > the ring) and pipe->max_usage (which restricts the amount of ring that
> > write() is allowed to fill).  This allows for a pipe that is both writable
> > by the kernel notification facility and by userspace, allowing plenty of
> > ring space for notifications to be added whilst preventing userspace from
> > being able to use up too much buffer space.
> 
> That seems like something that should be added in a separate patch -
> adding ->max_usage and switching appropriate users of ->ring_size over,
> so it's more clear where you're using one or the other.

Okay.

> > +		ibuf = &pipe->bufs[tail];
> 
> I don't see where tail gets masked between tail = pipe->tail;

Yeah - I missed that one.

> In any case, how about seeding head and tail with something like 1<<20 when
> creating the pipe so bugs like that are hit more quickly.

That's sounds like a good idea.

> > +			while (tail < head) {
> > +				count += pipe->bufs[tail & mask].len;
> > +				tail++;
> >  			}
> 
> This is broken if head has wrapped but tail has not. It has to be "while
> (head - tail)" or perhaps just "while (tail != head)" or something along
> those lines.

Yeah...  It's just too easy to overlook this and use ordinary comparisons.
I've switched to "while (tail != head)".

> > +	mask = pipe->ring_size - 1;
> > +	head = pipe->head & mask;
> > +	tail = pipe->tail & mask;
> > +	n = pipe->head - pipe->tail;
> 
> I think it's confusing to "premask" head and tail here. Can you either
> drop that (pipe_set_size should hardly be a hot path?), or perhaps call
> them something else to avoid a future reader seeing an unmasked
> bufs[head] and thinking that's a bug?

I've made it now do the masking right before doing the memcpy calls and used
different variable names for it:

	if (n > 0) {
		unsigned int h = head & mask;
		unsigned int t = tail & mask;
		if (h > t) {
			memcpy(bufs, &pipe->bufs + t,
			       n * sizeof(struct pipe_buffer));
		} else {
			unsigned int tsize = pipe->ring_size - t;
			if (h > 0)
				memcpy(bufs + tsize, pipe->bufs,
				       h * sizeof(struct pipe_buffer));
			memcpy(bufs, pipe->bufs + t,
			       tsize * sizeof(struct pipe_buffer));
		}

> > -	data_start(i, &idx, start);
> > -	/* some of this one + all after this one */
> > -	npages = ((i->pipe->curbuf - idx - 1) & (i->pipe->buffers - 1)) + 1;
> > -	capacity = min(npages,maxpages) * PAGE_SIZE - *start;
> > +	data_start(i, &i_head, start);
> > +	p_tail = i->pipe->tail;
> > +	/* Amount of free space: some of this one + all after this one */
> > +	npages = (p_tail + i->pipe->ring_size) - i_head;
> 
> Hm, it's not clear that this is equivalent to the old computation. Since
> it seems repeated in a few places, could it be factored to a little
> helper (before this patch) and the "some of this one + all after this
> one" comment perhaps expanded to explain what is going on?

Yeah...  It's a bit weird, even before my changes.

However, looking at it again, it seems data_start() does the appropriate
calculations.  If there's space in the current head buffer, it returns the
offset to that and the head of that buffer, otherwise it advances the head
pointer and sets the offset to 0.

So I think the comment may actually be retrospective - referring to the state
that data_start() has given us, rather than talking about the next bit of
code.

I also wonder if pipe_get_pages_alloc() is broken.  It doesn't check to see
whether the buffer is full at the point it calls data_start().  However,
data_start() doesn't check either and, without this patch, will simply advance
and mask off the ring index - which may wrap.

The maths in the unpatched version is pretty icky and I'm not convinced it's
correct.

David

  parent reply	other threads:[~2019-10-17 10:53 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-15 21:47 [RFC PATCH 00/21] pipe: Keyrings, Block and USB notifications David Howells
2019-10-15 21:47 ` David Howells
2019-10-15 21:47 ` David Howells
2019-10-15 21:47 ` [RFC PATCH 01/21] pipe: Reduce #inclusion of pipe_fs_i.h David Howells
2019-10-15 21:47   ` David Howells
2019-10-15 21:47   ` David Howells
2019-10-15 21:48 ` [RFC PATCH 02/21] Add a prelocked wake-up David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 22:14   ` Linus Torvalds
2019-10-15 22:14     ` Linus Torvalds
2019-10-16 17:02     ` Tim Chen
2019-10-16 17:02       ` Tim Chen
2019-10-15 22:33   ` David Howells
2019-10-15 22:33     ` David Howells
2019-10-16 14:26   ` David Howells
2019-10-16 14:26     ` David Howells
2019-10-16 15:31     ` Linus Torvalds
2019-10-16 15:31       ` Linus Torvalds
2019-10-15 21:48 ` [RFC PATCH 03/21] pipe: Use head and tail pointers for the ring, not cursor and length David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-16  7:46   ` Rasmus Villemoes
2019-10-16  7:46     ` Rasmus Villemoes
2019-10-17 10:53   ` David Howells [this message]
2019-10-17 10:53     ` David Howells
2019-10-15 21:48 ` [RFC PATCH 04/21] pipe: Advance tail pointer inside of wait spinlock in pipe_read() David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48 ` [RFC PATCH 05/21] pipe: Conditionalise wakeup " David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48 ` [RFC PATCH 06/21] pipe: Rearrange sequence in pipe_write() to preallocate slot David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48 ` [RFC PATCH 07/21] pipe: Remove redundant wakeup from pipe_write() David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:48   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 08/21] pipe: Check for ring full inside of the spinlock in pipe_write() David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 22:20   ` Linus Torvalds
2019-10-15 22:20     ` Linus Torvalds
2019-10-15 21:49 ` [RFC PATCH 09/21] uapi: General notification queue definitions David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 10/21] security: Add hooks to rule on setting a watch David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 11/21] security: Add a hook for the point of notification insertion David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 12/21] pipe: Add general notification queue support David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 13/21] keys: Add a notification facility David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49 ` [RFC PATCH 14/21] Add sample notification program David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:49   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 15/21] pipe: Allow buffers to be marked read-whole-or-error for notifications David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 16/21] pipe: Add notification lossage handling David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 17/21] Add a general, global device notification watch list David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 18/21] block: Add block layer notifications David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 19/21] usb: Add USB subsystem notifications David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 20/21] selinux: Implement the watch_key security hook David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50 ` [RFC PATCH 21/21] smack: Implement the watch_key and post_notification hooks David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 21:50   ` David Howells
2019-10-15 22:11 ` [RFC PATCH 00/21] pipe: Keyrings, Block and USB notifications James Morris
2019-10-15 22:11   ` James Morris
2019-10-15 22:11   ` James Morris
2019-10-15 22:32 ` Linus Torvalds
2019-10-15 22:32   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8694.1571309599@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=casey@schaufler-ca.com \
    --cc=christian@brauner.io \
    --cc=gregkh@linuxfoundation.org \
    --cc=keyrings@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=nicolas.dichtel@6wind.com \
    --cc=raven@themaw.net \
    --cc=sds@tycho.nsa.gov \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.