Linux-Fsdevel Archive on lore.kernel.org
* [PATCH 0/2] Killable synchronous BIO submission
@ 2020-10-16 16:14 Matthew Wilcox (Oracle)
  2020-10-16 16:14 ` [PATCH 1/2] block: Add submit_bio_killable Matthew Wilcox (Oracle)
  2020-10-16 16:14 ` [PATCH 2/2] fs: Make mpage_readpage synchronous Matthew Wilcox (Oracle)
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
  To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)

It would be nice to be able to report the actual errors from block devices
instead of the default -EIO.  To do that, we need to execute the BIO
synchronously, since the only way to get the error back to the caller is
to return it from readpage() -- we can't store it in the struct page.

But we need to be able to respond to a fatal signal, as we do today with
lock_page_killable().  This turns out to be quite hard.  The solution
I settled on is that the caller must pass in an alternate end_io to be
called asynchronously if a fatal signal arrives.

I believe the synchronize_rcu() call is sufficient to ensure that the
old bi_end_io() will not be called.  If bi_end_io() can be invoked from
BH-enabled regions, it may not be!  Perhaps we could add a warning in
bio_endio() to verify that assumption?

Matthew Wilcox (Oracle) (2):
  block: Add submit_bio_killable
  fs: Make mpage_readpage synchronous

 block/bio.c                | 87 +++++++++++++++++++++++++++++---------
 fs/mpage.c                 | 25 +++++++++--
 include/linux/bio.h        |  1 +
 include/linux/completion.h |  1 +
 kernel/sched/completion.c  |  9 ++--
 5 files changed, 97 insertions(+), 26 deletions(-)

-- 
2.28.0



* [PATCH 1/2] block: Add submit_bio_killable
  2020-10-16 16:14 [PATCH 0/2] Killable synchronous BIO submission Matthew Wilcox (Oracle)
@ 2020-10-16 16:14 ` Matthew Wilcox (Oracle)
  2020-10-16 16:14 ` [PATCH 2/2] fs: Make mpage_readpage synchronous Matthew Wilcox (Oracle)
  1 sibling, 0 replies; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
  To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)

This new function lets the caller interrupt the I/O wait with a fatal
signal, provided the caller supplies an alternative completion function
to clean up once the I/O eventually completes.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/bio.c                | 87 +++++++++++++++++++++++++++++---------
 include/linux/bio.h        |  1 +
 include/linux/completion.h |  1 +
 kernel/sched/completion.c  |  9 ++--
 4 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..83e1d22c053e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1129,39 +1129,88 @@ static void submit_bio_wait_endio(struct bio *bio)
 	complete(bio->bi_private);
 }
 
+static
+int __submit_bio_wait(struct bio *bio, struct completion *done, int state)
+{
+	long timeout;
+	long ret = 0;
+
+	bio->bi_private = done;
+	bio->bi_end_io = submit_bio_wait_endio;
+	bio->bi_opf |= REQ_SYNC;
+	submit_bio(bio);
+
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	timeout = sysctl_hung_task_timeout_secs * (HZ / 2);
+	if (!timeout)
+		timeout = MAX_SCHEDULE_TIMEOUT;
+	while (ret == 0)
+		ret = __wait_for_completion_io(done, timeout, state);
+
+	if (ret < 0)
+		return ret;
+
+	return blk_status_to_errno(bio->bi_status);
+}
+
 /**
- * submit_bio_wait - submit a bio, and wait until it completes
+ * submit_bio_wait - Submit a bio, and wait for it to complete.
  * @bio: The &struct bio which describes the I/O
  *
  * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
  * bio_endio() on failure.
  *
- * WARNING: Unlike to how submit_bio() is usually used, this function does not
- * result in bio reference to be consumed. The caller must drop the reference
- * on his own.
+ * WARNING: Unlike how submit_bio() is usually used, this function does
+ * not result in a bio reference being consumed.  The caller must drop
+ * the reference.
  */
 int submit_bio_wait(struct bio *bio)
 {
 	DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
-	unsigned long hang_check;
+	return __submit_bio_wait(bio, &done, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(submit_bio_wait);
 
-	bio->bi_private = &done;
-	bio->bi_end_io = submit_bio_wait_endio;
-	bio->bi_opf |= REQ_SYNC;
-	submit_bio(bio);
+/**
+ * submit_bio_killable - Submit a bio, and wait for it to complete.
+ * @bio: The &struct bio which describes the I/O
+ * @async_end_io: Callback for interrupted waits
+ *
+ * Submits the BIO to the block device and waits for it to complete.
+ * If the wait is interrupted by a fatal signal, the @async_end_io
+ * will be called instead.  Unlike submit_bio_wait(), the bio will
+ * have its reference count consumed by this call (unless we get a
+ * fatal signal; in which case the reference count should be
+ * consumed by @async_end_io).
+ *
+ * Return: 0 if the bio completed successfully, -ERESTARTSYS if we received
+ * a fatal signal, or a different errno if the bio could not complete.
+ */
+int submit_bio_killable(struct bio *bio, bio_end_io_t async_end_io)
+{
+	DECLARE_COMPLETION_ONSTACK_MAP(cmpl, bio->bi_disk->lockdep_map);
+	int err = __submit_bio_wait(bio, &cmpl, TASK_KILLABLE);
 
-	/* Prevent hang_check timer from firing at us during very long I/O */
-	hang_check = sysctl_hung_task_timeout_secs;
-	if (hang_check)
-		while (!wait_for_completion_io_timeout(&done,
-					hang_check * (HZ/2)))
-			;
-	else
-		wait_for_completion_io(&done);
+	if (likely(err != -ERESTARTSYS))
+		goto completed;
 
-	return blk_status_to_errno(bio->bi_status);
+	bio->bi_end_io = async_end_io;
+	synchronize_rcu();
+	/*
+	 * Nobody can touch the completion now, but it may have been
+	 * completed while we waited.  It doesn't really matter what
+	 * error we return since the task is about to die, but we need
+	 * to not leak the bio.
+	 */
+	if (!cmpl.done)
+		return err;
+
+	err = blk_status_to_errno(bio->bi_status);
+completed:
+	bio_put(bio);
+	return err;
 }
-EXPORT_SYMBOL(submit_bio_wait);
+EXPORT_SYMBOL(submit_bio_killable);
 
 /**
  * bio_advance - increment/complete a bio by some number of bytes
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..f254bc79bb3a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -431,6 +431,7 @@ static inline void bio_wouldblock_error(struct bio *bio)
 struct request_queue;
 
 extern int submit_bio_wait(struct bio *bio);
+extern int submit_bio_killable(struct bio *bio, bio_end_io_t async_end_io);
 extern void bio_advance(struct bio *, unsigned);
 
 extern void bio_init(struct bio *bio, struct bio_vec *table,
diff --git a/include/linux/completion.h b/include/linux/completion.h
index bf8e77001f18..c7d557d5639c 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -108,6 +108,7 @@ extern unsigned long wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
 extern unsigned long wait_for_completion_io_timeout(struct completion *x,
 						    unsigned long timeout);
+long __wait_for_completion_io(struct completion *x, long timeout, int state);
 extern long wait_for_completion_interruptible_timeout(
 	struct completion *x, unsigned long timeout);
 extern long wait_for_completion_killable_timeout(
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index a778554f9dad..3cb97b754b44 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -117,11 +117,12 @@ wait_for_common(struct completion *x, long timeout, int state)
 	return __wait_for_common(x, schedule_timeout, timeout, state);
 }
 
-static long __sched
-wait_for_common_io(struct completion *x, long timeout, int state)
+long __sched
+__wait_for_completion_io(struct completion *x, long timeout, int state)
 {
 	return __wait_for_common(x, io_schedule_timeout, timeout, state);
 }
+EXPORT_SYMBOL_GPL(__wait_for_completion_io);
 
 /**
  * wait_for_completion: - waits for completion of a task
@@ -168,7 +169,7 @@ EXPORT_SYMBOL(wait_for_completion_timeout);
  */
 void __sched wait_for_completion_io(struct completion *x)
 {
-	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
+	__wait_for_completion_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(wait_for_completion_io);
 
@@ -188,7 +189,7 @@ EXPORT_SYMBOL(wait_for_completion_io);
 unsigned long __sched
 wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
 {
-	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
+	return __wait_for_completion_io(x, timeout, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(wait_for_completion_io_timeout);
 
-- 
2.28.0



* [PATCH 2/2] fs: Make mpage_readpage synchronous
  2020-10-16 16:14 [PATCH 0/2] Killable synchronous BIO submission Matthew Wilcox (Oracle)
  2020-10-16 16:14 ` [PATCH 1/2] block: Add submit_bio_killable Matthew Wilcox (Oracle)
@ 2020-10-16 16:14 ` Matthew Wilcox (Oracle)
  2020-10-16 16:56   ` Keith Busch
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
  To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)

A synchronous readpage lets us report the actual errno instead of
ineffectively setting PageError.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/mpage.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 830e6cc2a9e7..88aba79a1861 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -406,11 +406,30 @@ int mpage_readpage(struct page *page, get_block_t get_block)
 		.nr_pages = 1,
 		.get_block = get_block,
 	};
+	int err;
 
 	args.bio = do_mpage_readpage(&args);
-	if (args.bio)
-		mpage_bio_submit(REQ_OP_READ, 0, args.bio);
-	return 0;
+	/*
+	 * XXX: We can't tell the difference between "fell back to
+	 * block_read_full_page" and "this was a hole".  If we could,
+	 * we'd avoid unlocking the page in do_mpage_readpage() and
+	 * return AOP_UPDATED_PAGE here.
+	 */
+	if (!args.bio)
+		return 0;
+	bio_set_op_attrs(args.bio, REQ_OP_READ, 0);
+	guard_bio_eod(args.bio);
+	err = submit_bio_killable(args.bio, mpage_end_io);
+	if (unlikely(err))
+		goto err;
+
+	SetPageUptodate(page);
+	return AOP_UPDATED_PAGE;
+
+err:
+	if (err != -ERESTARTSYS)
+		unlock_page(page);
+	return err;
 }
 EXPORT_SYMBOL(mpage_readpage);
 
-- 
2.28.0



* Re: [PATCH 2/2] fs: Make mpage_readpage synchronous
  2020-10-16 16:14 ` [PATCH 2/2] fs: Make mpage_readpage synchronous Matthew Wilcox (Oracle)
@ 2020-10-16 16:56   ` Keith Busch
  2020-10-16 16:59     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2020-10-16 16:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-block, linux-fsdevel

On Fri, Oct 16, 2020 at 05:14:26PM +0100, Matthew Wilcox (Oracle) wrote:
> +	err = submit_bio_killable(args.bio, mpage_end_io);
> +	if (unlikely(err))
> +		goto err;
> +
> +	SetPageUptodate(page);

On success, isn't the page already set up to date through the
mpage_end_io() callback?


* Re: [PATCH 2/2] fs: Make mpage_readpage synchronous
  2020-10-16 16:56   ` Keith Busch
@ 2020-10-16 16:59     ` Matthew Wilcox
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2020-10-16 16:59 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, linux-fsdevel

On Fri, Oct 16, 2020 at 09:56:08AM -0700, Keith Busch wrote:
> On Fri, Oct 16, 2020 at 05:14:26PM +0100, Matthew Wilcox (Oracle) wrote:
> > +	err = submit_bio_killable(args.bio, mpage_end_io);
> > +	if (unlikely(err))
> > +		goto err;
> > +
> > +	SetPageUptodate(page);
> 
> On success, isn't the page already set up to date through the
> mpage_end_io() callback?

On success, mpage_end_io() isn't called.  And we don't want it to be
because we don't want the page to be unlocked.  I should probably have
included a reference to
https://lore.kernel.org/linux-fsdevel/20201016160443.18685-1-willy@infradead.org/

