linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: torvalds@linux-foundation.org
Cc: dhowells@redhat.com, Casey Schaufler <casey@schaufler-ca.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	nicolas.dichtel@6wind.com, raven@themaw.net,
	Christian Brauner <christian@brauner.io>,
	dhowells@redhat.com, keyrings@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-block@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH 06/21] pipe: Rearrange sequence in pipe_write() to preallocate slot
Date: Tue, 15 Oct 2019 22:48:43 +0100	[thread overview]
Message-ID: <157117612345.15019.10880434751559609346.stgit@warthog.procyon.org.uk> (raw)
In-Reply-To: <157117606853.15019.15459271147790470307.stgit@warthog.procyon.org.uk>

Rearrange the sequence in pipe_write() so that the allocation of the new
buffer, the allocation of a ring slot and the attachment to the ring is
done under the pipe wait spinlock and then the lock is dropped and the
buffer can be filled.

The data copy needs to be done with the spinlock unheld and irqs enabled,
so the lock needs to be dropped first.  However, the reader can't progress
as we're holding pipe->mutex.

We also need to drop the lock as that would impact others looking at the
pipe waitqueue, such as poll(), the consumer and a future kernel message
writer.

We just abandon the preallocated slot if we get a copy error.  Future
writes may continue it and a future read will eventually recycle it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/pipe.c |   53 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 0d25cb090a03..5a199b249191 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -386,7 +386,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *filp = iocb->ki_filp;
 	struct pipe_inode_info *pipe = filp->private_data;
-	unsigned int head, tail, buffers, mask;
+	unsigned int head, buffers, mask;
 	ssize_t ret = 0;
 	int do_wakeup = 0;
 	size_t total_len = iov_iter_count(from);
@@ -404,14 +404,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 		goto out;
 	}
 
-	tail = pipe->tail;
 	head = pipe->head;
 	buffers = pipe->max_usage;
 	mask = pipe->ring_size - 1;
 
 	/* We try to merge small writes */
 	chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */
-	if (head != tail && chars != 0) {
+	if (head != pipe->tail && chars != 0) {
 		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
 		int offset = buf->offset + buf->len;
 
@@ -440,9 +439,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 			break;
 		}
 
-		tail = pipe->tail;
-		if (head - tail < buffers) {
-			struct pipe_buffer *buf = &pipe->bufs[head & mask];
+		head = pipe->head;
+		if (head - pipe->tail < buffers) {
+			struct pipe_buffer *buf;
 			struct page *page = pipe->tmp_page;
 			int copied;
 
@@ -454,40 +453,56 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 				}
 				pipe->tmp_page = page;
 			}
+
+			/* Allocate a slot in the ring in advance and attach an
+			 * empty buffer.  If we fault or otherwise fail to use
+			 * it, either the reader will consume it or it'll still
+			 * be there for the next write.
+			 */
+			spin_lock_irq(&pipe->wait.lock);
+
+			head = pipe->head;
+			pipe_commit_write(pipe, head + 1);
+
 			/* Always wake up, even if the copy fails. Otherwise
 			 * we lock up (O_NONBLOCK-)readers that sleep due to
 			 * syscall merging.
 			 * FIXME! Is this really true?
 			 */
-			do_wakeup = 1;
-			copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
-			if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) {
-				if (!ret)
-					ret = -EFAULT;
-				break;
-			}
-			ret += copied;
+			prelocked_wake_up_interruptible_sync_poll(
+				&pipe->wait, EPOLLIN | EPOLLRDNORM);
+
+			spin_unlock_irq(&pipe->wait.lock);
+			kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 
 			/* Insert it into the buffer array */
+			buf = &pipe->bufs[head & mask];
 			buf->page = page;
 			buf->ops = &anon_pipe_buf_ops;
 			buf->offset = 0;
-			buf->len = copied;
+			buf->len = 0;
 			buf->flags = 0;
 			if (is_packetized(filp)) {
 				buf->ops = &packet_pipe_buf_ops;
 				buf->flags = PIPE_BUF_FLAG_PACKET;
 			}
-
-			head++;
-			pipe_commit_write(pipe, head);
 			pipe->tmp_page = NULL;
 
+			copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
+			if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) {
+				if (!ret)
+					ret = -EFAULT;
+				break;
+			}
+			ret += copied;
+			buf->offset = 0;
+			buf->len = copied;
+
 			if (!iov_iter_count(from))
 				break;
 		}
 
-		if (head - tail < buffers)
+		if (pipe->head - pipe->tail < buffers)
 			continue;
 
 		/* Wait for buffer space to become available. */


  parent reply	other threads:[~2019-10-15 21:48 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-15 21:47 [RFC PATCH 00/21] pipe: Keyrings, Block and USB notifications David Howells
2019-10-15 21:47 ` [RFC PATCH 01/21] pipe: Reduce #inclusion of pipe_fs_i.h David Howells
2019-10-15 21:48 ` [RFC PATCH 02/21] Add a prelocked wake-up David Howells
2019-10-15 22:14   ` Linus Torvalds
2019-10-16 17:02     ` Tim Chen
2019-10-15 22:33   ` David Howells
2019-10-16 14:26   ` David Howells
2019-10-16 15:31     ` Linus Torvalds
2019-10-15 21:48 ` [RFC PATCH 03/21] pipe: Use head and tail pointers for the ring, not cursor and length David Howells
2019-10-16  7:46   ` Rasmus Villemoes
2019-10-17 10:53   ` David Howells
2019-10-15 21:48 ` [RFC PATCH 04/21] pipe: Advance tail pointer inside of wait spinlock in pipe_read() David Howells
2019-10-15 21:48 ` [RFC PATCH 05/21] pipe: Conditionalise wakeup " David Howells
2019-10-15 21:48 ` David Howells [this message]
2019-10-15 21:48 ` [RFC PATCH 07/21] pipe: Remove redundant wakeup from pipe_write() David Howells
2019-10-15 21:49 ` [RFC PATCH 08/21] pipe: Check for ring full inside of the spinlock in pipe_write() David Howells
2019-10-15 22:20   ` Linus Torvalds
2019-10-15 21:49 ` [RFC PATCH 09/21] uapi: General notification queue definitions David Howells
2019-10-15 21:49 ` [RFC PATCH 10/21] security: Add hooks to rule on setting a watch David Howells
2019-10-15 21:49 ` [RFC PATCH 11/21] security: Add a hook for the point of notification insertion David Howells
2019-10-15 21:49 ` [RFC PATCH 12/21] pipe: Add general notification queue support David Howells
2019-10-15 21:49 ` [RFC PATCH 13/21] keys: Add a notification facility David Howells
2019-10-15 21:49 ` [RFC PATCH 14/21] Add sample notification program David Howells
2019-10-15 21:50 ` [RFC PATCH 15/21] pipe: Allow buffers to be marked read-whole-or-error for notifications David Howells
2019-10-15 21:50 ` [RFC PATCH 16/21] pipe: Add notification lossage handling David Howells
2019-10-15 21:50 ` [RFC PATCH 17/21] Add a general, global device notification watch list David Howells
2019-10-15 21:50 ` [RFC PATCH 18/21] block: Add block layer notifications David Howells
2019-10-15 21:50 ` [RFC PATCH 19/21] usb: Add USB subsystem notifications David Howells
2019-10-15 21:50 ` [RFC PATCH 20/21] selinux: Implement the watch_key security hook David Howells
2019-10-15 21:50 ` [RFC PATCH 21/21] smack: Implement the watch_key and post_notification hooks David Howells
2019-10-15 22:11 ` [RFC PATCH 00/21] pipe: Keyrings, Block and USB notifications James Morris
2019-10-15 22:32 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=157117612345.15019.10880434751559609346.stgit@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=casey@schaufler-ca.com \
    --cc=christian@brauner.io \
    --cc=gregkh@linuxfoundation.org \
    --cc=keyrings@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    --cc=raven@themaw.net \
    --cc=sds@tycho.nsa.gov \
    --cc=torvalds@linux-foundation.org \
    --subject='Re: [RFC PATCH 06/21] pipe: Rearrange sequence in pipe_write() to preallocate slot' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).