* [PATCH 0/2] Killable synchronous BIO submission
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)
It would be nice to be able to report the actual errors from block devices
instead of the default -EIO. In order to do that, we need to execute the
BIO synchronously as the only way to get the error back to the caller is
by returning it from readpage() -- we can't store it in the struct page.
But we need to be able to respond to a fatal signal, as we do today with
lock_page_killable(). This turns out to be quite hard. The solution
I settled on is that the caller must pass in an alternate end_io to be
called asynchronously if a fatal signal arrives.
I believe the synchronize_rcu() call is sufficient to ensure that the
old bi_end_io() will not be called. If there are callers of bio_endio()
from BH-enabled regions, it may not be! Perhaps we could add a warning
to bio_endio() to verify that assumption?
Matthew Wilcox (Oracle) (2):
block: Add submit_bio_killable
fs: Make mpage_readpage synchronous
block/bio.c | 87 +++++++++++++++++++++++++++++---------
fs/mpage.c | 25 +++++++++--
include/linux/bio.h | 1 +
include/linux/completion.h | 1 +
kernel/sched/completion.c | 9 ++--
5 files changed, 97 insertions(+), 26 deletions(-)
--
2.28.0
* [PATCH 1/2] block: Add submit_bio_killable
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)
This new function allows the caller to interrupt the I/O wait with a
fatal signal, as long as they provide an alternative function to clean
up once the I/O eventually completes.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
block/bio.c | 87 +++++++++++++++++++++++++++++---------
include/linux/bio.h | 1 +
include/linux/completion.h | 1 +
kernel/sched/completion.c | 9 ++--
4 files changed, 75 insertions(+), 23 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..83e1d22c053e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1129,39 +1129,88 @@ static void submit_bio_wait_endio(struct bio *bio)
complete(bio->bi_private);
}
+static
+int __submit_bio_wait(struct bio *bio, struct completion *done, int state)
+{
+ long timeout;
+ long ret = 0;
+
+ bio->bi_private = done;
+ bio->bi_end_io = submit_bio_wait_endio;
+ bio->bi_opf |= REQ_SYNC;
+ submit_bio(bio);
+
+ /* Prevent hang_check timer from firing at us during very long I/O */
+ timeout = sysctl_hung_task_timeout_secs * (HZ / 2);
+ if (!timeout)
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ while (ret == 0)
+ ret = __wait_for_completion_io(done, timeout, state);
+
+ if (ret < 0)
+ return ret;
+
+ return blk_status_to_errno(bio->bi_status);
+}
+
/**
- * submit_bio_wait - submit a bio, and wait until it completes
+ * submit_bio_wait - Submit a bio, and wait for it to complete.
* @bio: The &struct bio which describes the I/O
*
* Simple wrapper around submit_bio(). Returns 0 on success, or the error from
* bio_endio() on failure.
*
- * WARNING: Unlike to how submit_bio() is usually used, this function does not
- * result in bio reference to be consumed. The caller must drop the reference
- * on his own.
+ * WARNING: Unlike how submit_bio() is usually used, this function does
+ * not result in a bio reference being consumed. The caller must drop
+ * the reference.
*/
int submit_bio_wait(struct bio *bio)
{
DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
- unsigned long hang_check;
+ return __submit_bio_wait(bio, &done, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(submit_bio_wait);
- bio->bi_private = &done;
- bio->bi_end_io = submit_bio_wait_endio;
- bio->bi_opf |= REQ_SYNC;
- submit_bio(bio);
+/**
+ * submit_bio_killable - Submit a bio, and wait for it to complete.
+ * @bio: The &struct bio which describes the I/O
+ * @async_end_io: Callback for interrupted waits
+ *
+ * Submits the BIO to the block device and waits for it to complete.
+ * If the wait is interrupted by a fatal signal, the @async_end_io
+ * will be called instead. Unlike submit_bio_wait(), the bio will
+ * have its reference count consumed by this call (unless we get a
+ * fatal signal; in which case the reference count should be
+ * consumed by @async_end_io).
+ *
+ * Return: 0 if the bio completed successfully, -ERESTARTSYS if we received
+ * a fatal signal, or a different errno if the bio could not complete.
+ */
+int submit_bio_killable(struct bio *bio, bio_end_io_t async_end_io)
+{
+ DECLARE_COMPLETION_ONSTACK_MAP(cmpl, bio->bi_disk->lockdep_map);
+ int err = __submit_bio_wait(bio, &cmpl, TASK_KILLABLE);
- /* Prevent hang_check timer from firing at us during very long I/O */
- hang_check = sysctl_hung_task_timeout_secs;
- if (hang_check)
- while (!wait_for_completion_io_timeout(&done,
- hang_check * (HZ/2)))
- ;
- else
- wait_for_completion_io(&done);
+ if (likely(err != -ERESTARTSYS))
+ goto completed;
- return blk_status_to_errno(bio->bi_status);
+ bio->bi_end_io = async_end_io;
+ synchronize_rcu();
+ /*
+ * Nobody can touch the completion now, but it may have been
+ * completed while we waited. It doesn't really matter what
+ * error we return since the task is about to die, but we need
+ * to not leak the bio.
+ */
+ if (!cmpl.done)
+ return err;
+
+ err = blk_status_to_errno(bio->bi_status);
+completed:
+ bio_put(bio);
+ return err;
}
-EXPORT_SYMBOL(submit_bio_wait);
+EXPORT_SYMBOL(submit_bio_killable);
/**
* bio_advance - increment/complete a bio by some number of bytes
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..f254bc79bb3a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -431,6 +431,7 @@ static inline void bio_wouldblock_error(struct bio *bio)
struct request_queue;
extern int submit_bio_wait(struct bio *bio);
+extern int submit_bio_killable(struct bio *bio, bio_end_io_t async_end_io);
extern void bio_advance(struct bio *, unsigned);
extern void bio_init(struct bio *bio, struct bio_vec *table,
diff --git a/include/linux/completion.h b/include/linux/completion.h
index bf8e77001f18..c7d557d5639c 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -108,6 +108,7 @@ extern unsigned long wait_for_completion_timeout(struct completion *x,
unsigned long timeout);
extern unsigned long wait_for_completion_io_timeout(struct completion *x,
unsigned long timeout);
+long __wait_for_completion_io(struct completion *x, long timeout, int state);
extern long wait_for_completion_interruptible_timeout(
struct completion *x, unsigned long timeout);
extern long wait_for_completion_killable_timeout(
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index a778554f9dad..3cb97b754b44 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -117,11 +117,12 @@ wait_for_common(struct completion *x, long timeout, int state)
return __wait_for_common(x, schedule_timeout, timeout, state);
}
-static long __sched
-wait_for_common_io(struct completion *x, long timeout, int state)
+long __sched
+__wait_for_completion_io(struct completion *x, long timeout, int state)
{
return __wait_for_common(x, io_schedule_timeout, timeout, state);
}
+EXPORT_SYMBOL_GPL(__wait_for_completion_io);
/**
* wait_for_completion: - waits for completion of a task
@@ -168,7 +169,7 @@ EXPORT_SYMBOL(wait_for_completion_timeout);
*/
void __sched wait_for_completion_io(struct completion *x)
{
- wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
+ __wait_for_completion_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(wait_for_completion_io);
@@ -188,7 +189,7 @@ EXPORT_SYMBOL(wait_for_completion_io);
unsigned long __sched
wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
{
- return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
+ return __wait_for_completion_io(x, timeout, TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(wait_for_completion_io_timeout);
--
2.28.0
* [PATCH 2/2] fs: Make mpage_readpage synchronous
From: Matthew Wilcox (Oracle) @ 2020-10-16 16:14 UTC (permalink / raw)
To: linux-block, linux-fsdevel; +Cc: Matthew Wilcox (Oracle)
A synchronous readpage lets us report the actual errno instead of
ineffectively setting PageError.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/mpage.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/fs/mpage.c b/fs/mpage.c
index 830e6cc2a9e7..88aba79a1861 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -406,11 +406,30 @@ int mpage_readpage(struct page *page, get_block_t get_block)
.nr_pages = 1,
.get_block = get_block,
};
+ int err;
args.bio = do_mpage_readpage(&args);
- if (args.bio)
- mpage_bio_submit(REQ_OP_READ, 0, args.bio);
- return 0;
+ /*
+ * XXX: We can't tell the difference between "fell back to
+ * block_read_full_page" and "this was a hole". If we could,
+ * we'd avoid unlocking the page in do_mpage_readpage() and
+ * return AOP_UPDATED_PAGE here.
+ */
+ if (!args.bio)
+ return 0;
+ bio_set_op_attrs(args.bio, REQ_OP_READ, 0);
+ guard_bio_eod(args.bio);
+ err = submit_bio_killable(args.bio, mpage_end_io);
+ if (unlikely(err))
+ goto err;
+
+ SetPageUptodate(page);
+ return AOP_UPDATED_PAGE;
+
+err:
+ if (err != -ERESTARTSYS)
+ unlock_page(page);
+ return err;
}
EXPORT_SYMBOL(mpage_readpage);
--
2.28.0
* Re: [PATCH 2/2] fs: Make mpage_readpage synchronous
From: Keith Busch @ 2020-10-16 16:56 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: linux-block, linux-fsdevel
On Fri, Oct 16, 2020 at 05:14:26PM +0100, Matthew Wilcox (Oracle) wrote:
> + err = submit_bio_killable(args.bio, mpage_end_io);
> + if (unlikely(err))
> + goto err;
> +
> + SetPageUptodate(page);
On success, isn't the page already set up to date through the
mpage_end_io() callback?
* Re: [PATCH 2/2] fs: Make mpage_readpage synchronous
From: Matthew Wilcox @ 2020-10-16 16:59 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-block, linux-fsdevel
On Fri, Oct 16, 2020 at 09:56:08AM -0700, Keith Busch wrote:
> On Fri, Oct 16, 2020 at 05:14:26PM +0100, Matthew Wilcox (Oracle) wrote:
> > + err = submit_bio_killable(args.bio, mpage_end_io);
> > + if (unlikely(err))
> > + goto err;
> > +
> > + SetPageUptodate(page);
>
> On success, isn't the page already set up to date through the
> mpage_end_io() callback?
On success, mpage_end_io() isn't called. And we don't want it to be
because we don't want the page to be unlocked. I should probably have
included a reference to
https://lore.kernel.org/linux-fsdevel/20201016160443.18685-1-willy@infradead.org/