* [PATCHSET v3 0/3] Improve IOCB_NOWAIT O_DIRECT reads
@ 2021-02-24 16:44 Jens Axboe
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Jens Axboe @ 2021-02-24 16:44 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: akpm
Hi,
For v1, see:
https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/
tldr; don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
entries for the given range. This causes unnecessary work from the callers
side, when the IO could have been issued totally fine without blocking on
writeback when there is none.
fs/iomap/direct-io.c | 24 ++++++++++++++--------
include/linux/fs.h | 2 ++
mm/filemap.c | 47 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 63 insertions(+), 10 deletions(-)
Andrew, any chance you can pick this up for 5.12?
Since v2:
- Drop overly long line (hch)
- Rebase to master, iomap changed flags to iomap_flags
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-24 16:44 [PATCHSET v3 0/3] Improve IOCB_NOWAIT O_DIRECT reads Jens Axboe
@ 2021-02-24 16:44 ` Jens Axboe
2021-02-24 16:50 ` Matthew Wilcox
2021-02-24 17:22 ` Jan Kara
2021-02-24 16:44 ` [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads Jens Axboe
2021-02-24 16:44 ` [PATCH 3/3] iomap: " Jens Axboe
2 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2021-02-24 16:44 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: akpm, Jens Axboe
For O_DIRECT reads/writes, we check if we need to issue a call to
filemap_write_and_wait_range() to issue and/or wait for writeback for any
page in the given range. The existing mechanism just checks for a page in
the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the
slow path (and needing retry) if there's just a clean page cache page in
the range.
Provide filemap_range_needs_writeback() which tries a little harder to
check if we actually need to issue and/or wait for writeback in the
range.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/fs.h | 2 ++
mm/filemap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6d8b1e7337e4..4925275e6365 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
loff_t lend);
+extern bool filemap_range_needs_writeback(struct address_space *,
+ loff_t lstart, loff_t lend);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
extern int __filemap_fdatawrite_range(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 6ff2a3fb0dc7..13338f877677 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -635,6 +635,49 @@ static bool mapping_needs_writeback(struct address_space *mapping)
return mapping->nrpages;
}
+/**
+ * filemap_range_needs_writeback - check if range potentially needs writeback
+ * @mapping: address space within which to check
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback. Used by O_DIRECT
+ * read/write with IOCB_NOWAIT, to see if the caller needs to do
+ * filemap_write_and_wait_range() before proceeding.
+ *
+ * Return: %true if the caller should do filemap_write_and_wait_range() before
+ * doing O_DIRECT to a page in this range, %false otherwise.
+ */
+bool filemap_range_needs_writeback(struct address_space *mapping,
+ loff_t start_byte, loff_t end_byte)
+{
+ XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
+ pgoff_t max = end_byte >> PAGE_SHIFT;
+ struct page *page;
+
+ if (!mapping_needs_writeback(mapping))
+ return false;
+ if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
+ !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ return false;
+ if (end_byte < start_byte)
+ return false;
+
+ rcu_read_lock();
+ xas_for_each(&xas, page, max) {
+ if (xas_retry(&xas, page))
+ continue;
+ if (xa_is_value(page))
+ continue;
+ if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
+ break;
+ }
+ rcu_read_unlock();
+ return page != NULL;
+}
+EXPORT_SYMBOL_GPL(filemap_range_needs_writeback);
+
/**
* filemap_write_and_wait_range - write out & wait on a file range
* @mapping: the address_space for the pages
--
2.30.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
@ 2021-02-24 16:50 ` Matthew Wilcox
2021-02-24 16:53 ` Jens Axboe
2021-02-24 17:22 ` Jan Kara
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2021-02-24 16:50 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, akpm
On Wed, Feb 24, 2021 at 09:44:53AM -0700, Jens Axboe wrote:
> +++ b/include/linux/fs.h
> @@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
>
> extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
> loff_t lend);
> +extern bool filemap_range_needs_writeback(struct address_space *,
> + loff_t lstart, loff_t lend);
> extern int filemap_write_and_wait_range(struct address_space *mapping,
> loff_t lstart, loff_t lend);
> extern int __filemap_fdatawrite_range(struct address_space *mapping,
These prototypes should all be in pagemap.h, not fs.h. I can do
a followup patch if you don't want to do it as part of this set.
Also we're deprecating the use of 'extern' for prototypes.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-24 16:50 ` Matthew Wilcox
@ 2021-02-24 16:53 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-02-24 16:53 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, akpm
On 2/24/21 9:50 AM, Matthew Wilcox wrote:
> On Wed, Feb 24, 2021 at 09:44:53AM -0700, Jens Axboe wrote:
>> +++ b/include/linux/fs.h
>> @@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
>>
>> extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
>> loff_t lend);
>> +extern bool filemap_range_needs_writeback(struct address_space *,
>> + loff_t lstart, loff_t lend);
>> extern int filemap_write_and_wait_range(struct address_space *mapping,
>> loff_t lstart, loff_t lend);
>> extern int __filemap_fdatawrite_range(struct address_space *mapping,
>
> These prototypes should all be in pagemap.h, not fs.h. I can do
> a followup patch if you don't want to do it as part of this set.
> Also we're deprecating the use of 'extern' for prototypes.
Agree on both of those, but that should probably be a separate patch
for both of those. extern is just following the existing style, and
fs.h is the existing location...
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Thanks!
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
2021-02-24 16:50 ` Matthew Wilcox
@ 2021-02-24 17:22 ` Jan Kara
1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-24 17:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, akpm
On Wed 24-02-21 09:44:53, Jens Axboe wrote:
> For O_DIRECT reads/writes, we check if we need to issue a call to
> filemap_write_and_wait_range() to issue and/or wait for writeback for any
> page in the given range. The existing mechanism just checks for a page in
> the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the
> slow path (and needing retry) if there's just a clean page cache page in
> the range.
>
> Provide filemap_range_needs_writeback() which tries a little harder to
> check if we actually need to issue and/or wait for writeback in the
> range.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> include/linux/fs.h | 2 ++
> mm/filemap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6d8b1e7337e4..4925275e6365 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
>
> extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
> loff_t lend);
> +extern bool filemap_range_needs_writeback(struct address_space *,
> + loff_t lstart, loff_t lend);
> extern int filemap_write_and_wait_range(struct address_space *mapping,
> loff_t lstart, loff_t lend);
> extern int __filemap_fdatawrite_range(struct address_space *mapping,
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6ff2a3fb0dc7..13338f877677 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -635,6 +635,49 @@ static bool mapping_needs_writeback(struct address_space *mapping)
> return mapping->nrpages;
> }
>
> +/**
> + * filemap_range_needs_writeback - check if range potentially needs writeback
> + * @mapping: address space within which to check
> + * @start_byte: offset in bytes where the range starts
> + * @end_byte: offset in bytes where the range ends (inclusive)
> + *
> + * Find at least one page in the range supplied, usually used to check if
> + * direct writing in this range will trigger a writeback. Used by O_DIRECT
> + * read/write with IOCB_NOWAIT, to see if the caller needs to do
> + * filemap_write_and_wait_range() before proceeding.
> + *
> + * Return: %true if the caller should do filemap_write_and_wait_range() before
> + * doing O_DIRECT to a page in this range, %false otherwise.
> + */
> +bool filemap_range_needs_writeback(struct address_space *mapping,
> + loff_t start_byte, loff_t end_byte)
> +{
> + XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
> + pgoff_t max = end_byte >> PAGE_SHIFT;
> + struct page *page;
> +
> + if (!mapping_needs_writeback(mapping))
> + return false;
> + if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
> + !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> + return false;
> + if (end_byte < start_byte)
> + return false;
> +
> + rcu_read_lock();
> + xas_for_each(&xas, page, max) {
> + if (xas_retry(&xas, page))
> + continue;
> + if (xa_is_value(page))
> + continue;
> + if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
> + break;
> + }
> + rcu_read_unlock();
> + return page != NULL;
> +}
> +EXPORT_SYMBOL_GPL(filemap_range_needs_writeback);
> +
> /**
> * filemap_write_and_wait_range - write out & wait on a file range
> * @mapping: the address_space for the pages
> --
> 2.30.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads
2021-02-24 16:44 [PATCHSET v3 0/3] Improve IOCB_NOWAIT O_DIRECT reads Jens Axboe
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
@ 2021-02-24 16:44 ` Jens Axboe
2021-02-24 16:57 ` Matthew Wilcox
2021-02-24 17:23 ` Jan Kara
2021-02-24 16:44 ` [PATCH 3/3] iomap: " Jens Axboe
2 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2021-02-24 16:44 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: akpm, Jens Axboe
For the generic page cache read helper, use the better variant of checking
for the need to call filemap_write_and_wait_range() when doing O_DIRECT
reads. This avoids falling back to the slow path for IOCB_NOWAIT, if there
are no pages to wait for (or write out).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
mm/filemap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 13338f877677..77f1b527541e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2645,8 +2645,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
size = i_size_read(inode);
if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_has_page(mapping, iocb->ki_pos,
- iocb->ki_pos + count - 1))
+ if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
+ iocb->ki_pos + count - 1))
return -EAGAIN;
} else {
retval = filemap_write_and_wait_range(mapping,
--
2.30.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads
2021-02-24 16:44 ` [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads Jens Axboe
@ 2021-02-24 16:57 ` Matthew Wilcox
2021-02-24 17:23 ` Jan Kara
1 sibling, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2021-02-24 16:57 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, akpm
On Wed, Feb 24, 2021 at 09:44:54AM -0700, Jens Axboe wrote:
> For the generic page cache read helper, use the better variant of checking
> for the need to call filemap_write_and_wait_range() when doing O_DIRECT
> reads. This avoids falling back to the slow path for IOCB_NOWAIT, if there
> are no pages to wait for (or write out).
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads
2021-02-24 16:44 ` [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads Jens Axboe
2021-02-24 16:57 ` Matthew Wilcox
@ 2021-02-24 17:23 ` Jan Kara
1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-24 17:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, akpm
On Wed 24-02-21 09:44:54, Jens Axboe wrote:
> For the generic page cache read helper, use the better variant of checking
> for the need to call filemap_write_and_wait_range() when doing O_DIRECT
> reads. This avoids falling back to the slow path for IOCB_NOWAIT, if there
> are no pages to wait for (or write out).
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> mm/filemap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 13338f877677..77f1b527541e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2645,8 +2645,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>
> size = i_size_read(inode);
> if (iocb->ki_flags & IOCB_NOWAIT) {
> - if (filemap_range_has_page(mapping, iocb->ki_pos,
> - iocb->ki_pos + count - 1))
> + if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
> + iocb->ki_pos + count - 1))
> return -EAGAIN;
> } else {
> retval = filemap_write_and_wait_range(mapping,
> --
> 2.30.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/3] iomap: use filemap_range_needs_writeback() for O_DIRECT reads
2021-02-24 16:44 [PATCHSET v3 0/3] Improve IOCB_NOWAIT O_DIRECT reads Jens Axboe
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
2021-02-24 16:44 ` [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads Jens Axboe
@ 2021-02-24 16:44 ` Jens Axboe
2021-02-24 17:25 ` Jan Kara
2 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-02-24 16:44 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: akpm, Jens Axboe
For reads, use the better variant of checking for the need to call
filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling
back to the slow path for IOCB_NOWAIT, if there are no pages to wait for
(or write out).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
fs/iomap/direct-io.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e2c4991833b8..8c35a0041f0f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -487,12 +487,28 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (pos >= dio->i_size)
goto out_free_dio;
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (filemap_range_needs_writeback(mapping, pos, end)) {
+ ret = -EAGAIN;
+ goto out_free_dio;
+ }
+ iomap_flags |= IOMAP_NOWAIT;
+ }
+
if (iter_is_iovec(iter))
dio->flags |= IOMAP_DIO_DIRTY;
} else {
iomap_flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (filemap_range_has_page(mapping, pos, end)) {
+ ret = -EAGAIN;
+ goto out_free_dio;
+ }
+ iomap_flags |= IOMAP_NOWAIT;
+ }
+
/* for data sync or sync, we need sync completion processing */
if (iocb->ki_flags & IOCB_DSYNC)
dio->flags |= IOMAP_DIO_NEED_SYNC;
@@ -507,14 +523,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_has_page(mapping, pos, end)) {
- ret = -EAGAIN;
- goto out_free_dio;
- }
- iomap_flags |= IOMAP_NOWAIT;
- }
-
if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
ret = -EAGAIN;
if (pos >= dio->i_size || pos + count > dio->i_size)
--
2.30.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] iomap: use filemap_range_needs_writeback() for O_DIRECT reads
2021-02-24 16:44 ` [PATCH 3/3] iomap: " Jens Axboe
@ 2021-02-24 17:25 ` Jan Kara
0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2021-02-24 17:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, akpm
On Wed 24-02-21 09:44:55, Jens Axboe wrote:
> For reads, use the better variant of checking for the need to call
> filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling
> back to the slow path for IOCB_NOWAIT, if there are no pages to wait for
> (or write out).
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/iomap/direct-io.c | 24 ++++++++++++++++--------
> 1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index e2c4991833b8..8c35a0041f0f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -487,12 +487,28 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> if (pos >= dio->i_size)
> goto out_free_dio;
>
> + if (iocb->ki_flags & IOCB_NOWAIT) {
> + if (filemap_range_needs_writeback(mapping, pos, end)) {
> + ret = -EAGAIN;
> + goto out_free_dio;
> + }
> + iomap_flags |= IOMAP_NOWAIT;
> + }
> +
> if (iter_is_iovec(iter))
> dio->flags |= IOMAP_DIO_DIRTY;
> } else {
> iomap_flags |= IOMAP_WRITE;
> dio->flags |= IOMAP_DIO_WRITE;
>
> + if (iocb->ki_flags & IOCB_NOWAIT) {
> + if (filemap_range_has_page(mapping, pos, end)) {
> + ret = -EAGAIN;
> + goto out_free_dio;
> + }
> + iomap_flags |= IOMAP_NOWAIT;
> + }
> +
> /* for data sync or sync, we need sync completion processing */
> if (iocb->ki_flags & IOCB_DSYNC)
> dio->flags |= IOMAP_DIO_NEED_SYNC;
> @@ -507,14 +523,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> dio->flags |= IOMAP_DIO_WRITE_FUA;
> }
>
> - if (iocb->ki_flags & IOCB_NOWAIT) {
> - if (filemap_range_has_page(mapping, pos, end)) {
> - ret = -EAGAIN;
> - goto out_free_dio;
> - }
> - iomap_flags |= IOMAP_NOWAIT;
> - }
> -
> if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
> ret = -EAGAIN;
> if (pos >= dio->i_size || pos + count > dio->i_size)
> --
> 2.30.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCHSET v2 0/3] Improve IOCB_NOWAIT O_DIRECT reads
@ 2021-02-09 2:30 Jens Axboe
2021-02-09 2:30 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-02-09 2:30 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: hch, akpm
Hi,
For v1, see:
https://lore.kernel.org/linux-fsdevel/20210208221829.17247-1-axboe@kernel.dk/
tldr; don't -EAGAIN IOCB_NOWAIT dio reads just because we have page cache
entries for the given range. This causes unnecessary work from the callers
side, when the IO could have been issued totally fine without blocking on
writeback when there is none.
fs/iomap/direct-io.c | 23 ++++++++++++++--------
include/linux/fs.h | 2 ++
mm/filemap.c | 47 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 62 insertions(+), 10 deletions(-)
Since v1:
- Simplify the filemap_range_needs_writeback() loop (Willy)
- Drop the write side (Chinner)
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-09 2:30 [PATCHSET v2 0/3] Improve IOCB_NOWAIT " Jens Axboe
@ 2021-02-09 2:30 ` Jens Axboe
2021-02-09 7:47 ` Christoph Hellwig
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-02-09 2:30 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: hch, akpm, Jens Axboe
For O_DIRECT reads/writes, we check if we need to issue a call to
filemap_write_and_wait_range() to issue and/or wait for writeback for any
page in the given range. The existing mechanism just checks for a page in
the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the
slow path (and needing retry) if there's just a clean page cache page in
the range.
Provide filemap_range_needs_writeback() which tries a little harder to
check if we actually need to issue and/or wait for writeback in the
range.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/fs.h | 2 ++
mm/filemap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c17..def89222dfe9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
loff_t lend);
+extern bool filemap_range_needs_writeback(struct address_space *,
+ loff_t lstart, loff_t lend);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
extern int __filemap_fdatawrite_range(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index aa0e0fb04670..6a58d50fbd31 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -633,6 +633,49 @@ static bool mapping_needs_writeback(struct address_space *mapping)
return mapping->nrpages;
}
+/**
+ * filemap_range_needs_writeback - check if range potentially needs writeback
+ * @mapping: address space within which to check
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback. Used by O_DIRECT
+ * read/write with IOCB_NOWAIT, to see if the caller needs to do
+ * filemap_write_and_wait_range() before proceeding.
+ *
+ * Return: %true if the caller should do filemap_write_and_wait_range() before
+ * doing O_DIRECT to a page in this range, %false otherwise.
+ */
+bool filemap_range_needs_writeback(struct address_space *mapping,
+ loff_t start_byte, loff_t end_byte)
+{
+ XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
+ pgoff_t max = end_byte >> PAGE_SHIFT;
+ struct page *page;
+
+ if (!mapping_needs_writeback(mapping))
+ return false;
+ if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
+ !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ return false;
+ if (end_byte < start_byte)
+ return false;
+
+ rcu_read_lock();
+ xas_for_each(&xas, page, max) {
+ if (xas_retry(&xas, page))
+ continue;
+ if (xa_is_value(page))
+ continue;
+ if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
+ break;
+ }
+ rcu_read_unlock();
+ return page != NULL;
+}
+EXPORT_SYMBOL_GPL(filemap_range_needs_writeback);
+
/**
* filemap_write_and_wait_range - write out & wait on a file range
* @mapping: the address_space for the pages
--
2.30.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCHSET 0/3] Improve IOCB_NOWAIT O_DIRECT
@ 2021-02-08 22:18 Jens Axboe
2021-02-08 22:18 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-02-08 22:18 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: hch, akpm
Hi,
Ran into an issue with IOCB_NOWAIT and O_DIRECT, which causes a rather
serious performance issue. If IOCB_NOWAIT is set, the generic/iomap
iterators check for page cache presence in the given range, and return
-EAGAIN if any is there. This is rather simplistic and looks like
something that was never really finished. For !IOCB_NOWAIT, we simply
call filemap_write_and_wait_range() to issue (if any) and wait on the
range. The fact that we have page cache entries for this range does
not mean that we cannot safely do O_DIRECT IO to/from it.
This series adds filemap_range_needs_writeback(), which checks if
we have pages in the range that do require us to call
filemap_write_and_wait_range(). If we don't, then we can proceed just
fine with IOCB_NOWAIT.
The problem manifested itself in a production environment, where someone
is doing O_DIRECT on a raw block device. Due to other circumstances,
blkid was triggered on this device periodically, and blkid very helpfully
does a number of page cache reads on the device. Now the mapping has
page cache entries, and performance falls to pieces because we can no
longer reliably do IOCB_NOWAIT O_DIRECT.
Patch 1 adds the helper, patch 2 uses it for the generic iterators, and
patch 3 applies the same to the iomap direct-io code.
fs/iomap/direct-io.c | 10 ++++-----
include/linux/fs.h | 2 ++
mm/filemap.c | 52 +++++++++++++++++++++++++++++++++++++++++---
3 files changed, 56 insertions(+), 8 deletions(-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-08 22:18 [PATCHSET 0/3] Improve IOCB_NOWAIT O_DIRECT Jens Axboe
@ 2021-02-08 22:18 ` Jens Axboe
2021-02-08 23:02 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-02-08 22:18 UTC (permalink / raw)
To: linux-fsdevel, linux-mm; +Cc: hch, akpm, Jens Axboe
For O_DIRECT reads/writes, we check if we need to issue a call to
filemap_write_and_wait_range() to issue and/or wait for writeback for any
page in the given range. The existing mechanism just checks for a page in
the range, which is suboptimal for IOCB_NOWAIT as we'll fallback to the
slow path (and needing retry) if there's just a clean page cache page in
the range.
Provide filemap_range_needs_writeback() which tries a little harder to
check if we actually need to issue and/or wait for writeback in the
range.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/fs.h | 2 ++
mm/filemap.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c17..def89222dfe9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2633,6 +2633,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
loff_t lend);
+extern bool filemap_range_needs_writeback(struct address_space *,
+ loff_t lstart, loff_t lend);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
extern int __filemap_fdatawrite_range(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index aa0e0fb04670..1ed7acac8a1b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -633,6 +633,52 @@ static bool mapping_needs_writeback(struct address_space *mapping)
return mapping->nrpages;
}
+/**
+ * filemap_range_needs_writeback - check if range potentially needs writeback
+ * @mapping: address space within which to check
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Find at least one page in the range supplied, usually used to check if
+ * direct writing in this range will trigger a writeback. Used by O_DIRECT
+ * read/write with IOCB_NOWAIT, to see if the caller needs to do
+ * filemap_write_and_wait_range() before proceeding.
+ *
+ * Return: %true if the caller should do filemap_write_and_wait_range() before
+ * doing O_DIRECT to a page in this range, %false otherwise.
+ */
+bool filemap_range_needs_writeback(struct address_space *mapping,
+ loff_t start_byte, loff_t end_byte)
+{
+ struct page *page = NULL, *head;
+ XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
+ pgoff_t max = end_byte >> PAGE_SHIFT;
+
+ if (!mapping_needs_writeback(mapping))
+ return false;
+ if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
+ !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ return false;
+ if (end_byte < start_byte)
+ return false;
+
+ rcu_read_lock();
+ xas_for_each(&xas, head, max) {
+ if (xas_retry(&xas, head))
+ continue;
+ if (xa_is_value(head))
+ continue;
+ page = find_subpage(head, xas.xa_index);
+ if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
+ break;
+ page = NULL;
+ }
+ rcu_read_unlock();
+
+ return page != NULL;
+}
+EXPORT_SYMBOL_GPL(filemap_range_needs_writeback);
+
/**
* filemap_write_and_wait_range - write out & wait on a file range
* @mapping: the address_space for the pages
--
2.30.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-08 22:18 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
@ 2021-02-08 23:02 ` Matthew Wilcox
2021-02-08 23:21 ` Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2021-02-08 23:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, linux-mm, hch, akpm
On Mon, Feb 08, 2021 at 03:18:27PM -0700, Jens Axboe wrote:
> + rcu_read_lock();
> + xas_for_each(&xas, head, max) {
> + if (xas_retry(&xas, head))
> + continue;
> + if (xa_is_value(head))
> + continue;
> + page = find_subpage(head, xas.xa_index);
> + if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
> + break;
> + page = NULL;
> + }
> + rcu_read_unlock();
There's no need to find the sub-page for any of these three conditions --
the bit will be set on the head page.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper
2021-02-08 23:02 ` Matthew Wilcox
@ 2021-02-08 23:21 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-02-08 23:21 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, hch, akpm
On 2/8/21 4:02 PM, Matthew Wilcox wrote:
> On Mon, Feb 08, 2021 at 03:18:27PM -0700, Jens Axboe wrote:
>> + rcu_read_lock();
>> + xas_for_each(&xas, head, max) {
>> + if (xas_retry(&xas, head))
>> + continue;
>> + if (xa_is_value(head))
>> + continue;
>> + page = find_subpage(head, xas.xa_index);
>> + if (PageDirty(page) || PageLocked(page) || PageWriteback(page))
>> + break;
>> + page = NULL;
>> + }
>> + rcu_read_unlock();
>
> There's no need to find the sub-page for any of these three conditions --
> the bit will be set on the head page.
Gotcha, that makes it simpler. I'll make the edit.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-02-24 17:26 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-24 16:44 [PATCHSET v3 0/3] Improve IOCB_NOWAIT O_DIRECT reads Jens Axboe
2021-02-24 16:44 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
2021-02-24 16:50 ` Matthew Wilcox
2021-02-24 16:53 ` Jens Axboe
2021-02-24 17:22 ` Jan Kara
2021-02-24 16:44 ` [PATCH 2/3] mm: use filemap_range_needs_writeback() for O_DIRECT reads Jens Axboe
2021-02-24 16:57 ` Matthew Wilcox
2021-02-24 17:23 ` Jan Kara
2021-02-24 16:44 ` [PATCH 3/3] iomap: " Jens Axboe
2021-02-24 17:25 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2021-02-09 2:30 [PATCHSET v2 0/3] Improve IOCB_NOWAIT " Jens Axboe
2021-02-09 2:30 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
2021-02-09 7:47 ` Christoph Hellwig
2021-02-09 14:30 ` Jens Axboe
2021-02-08 22:18 [PATCHSET 0/3] Improve IOCB_NOWAIT O_DIRECT Jens Axboe
2021-02-08 22:18 ` [PATCH 1/3] mm: provide filemap_range_needs_writeback() helper Jens Axboe
2021-02-08 23:02 ` Matthew Wilcox
2021-02-08 23:21 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.