linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization
@ 2018-12-13 22:25 Eric Sandeen
  2018-12-13 22:25 ` [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page Eric Sandeen
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Eric Sandeen @ 2018-12-13 22:25 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Christoph, Hellwig, hch

Patch 1 is trivial, to use the SECTOR_SIZE macro instead of open-coding.
Patch 2 is the one that matters: iomap_is_partially_uptodate was doing invalid memory accesses past the end of the iomap_page->uptodate bitmap.
Patch3 is a small optimization (?) based on what the non-iomap code does, i.e. don't bother checking each block if the "partial" range covers the whole page

These survived an xfstests -g auto run on 4k and 1k block size filesystems
on x86_64.  Careful review for off by ones is appreciated in any case. ;)

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-13 22:25 [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
@ 2018-12-13 22:25 ` Eric Sandeen
  2018-12-15 10:51   ` Christoph Hellwig
  2018-12-13 22:25 ` [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate Eric Sandeen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2018-12-13 22:25 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Christoph, Hellwig, hch

From: Eric Sandeen <sandeen@redhat.com>

because iomap_page_create initializes the uptodate bitmap using the macro,
not the open-coded value.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---
 include/linux/iomap.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9a42581..edcdb3a 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -4,6 +4,7 @@
 
 #include <linux/atomic.h>
 #include <linux/bitmap.h>
+#include <linux/blkdev.h>
 #include <linux/mm.h>
 #include <linux/types.h>
 #include <linux/mm_types.h>
@@ -104,12 +105,13 @@ struct iomap_ops {
 
 /*
  * Structure allocate for each page when block size < PAGE_SIZE to track
- * sub-page uptodate status and I/O completions.
+ * sub-page uptodate status and I/O completions.  Allocate bitmap to
+ * accomodate blocks as small as SECTOR_SIZE
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	DECLARE_BITMAP(uptodate, PAGE_SIZE / SECTOR_SIZE);
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate
  2018-12-13 22:25 [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
  2018-12-13 22:25 ` [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page Eric Sandeen
@ 2018-12-13 22:25 ` Eric Sandeen
  2018-12-14  2:57   ` Dave Chinner
  2018-12-14 22:14   ` [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate Eric Sandeen
  2018-12-13 22:25 ` [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range Eric Sandeen
  2018-12-13 22:27 ` [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
  3 siblings, 2 replies; 17+ messages in thread
From: Eric Sandeen @ 2018-12-13 22:25 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Christoph, Hellwig, hch

From: Eric Sandeen <sandeen@redhat.com>

iomap_is_partially_uptodate() is intended to check wither blocks within
the selected range of a not-uptodate page are uptodate; if the range we
care about is up to date, it's an optimization.

However, the iomap implementation continues to check all blocks up to
from+count, which is beyond the page, and can even be well beyond the
iop->uptodate bitmap.

I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization.  Still, it's clearly an invalid
memory access that must be fixed.

So: fix this by limiting the search to within the page as is done in the
non-iomap variant, block_is_partially_uptodate().

Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
page system:

 BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
 Read of size 8 at addr ffff800120c3a318 by task fsstress/22337

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---
 fs/iomap.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index d6bc98a..ce837d9 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -492,16 +492,29 @@ struct iomap_readpage_ctx {
 }
 EXPORT_SYMBOL_GPL(iomap_readpages);
 
+/*
+ * iomap_is_partially_uptodate checks whether blocks within a page are
+ * uptodate or not.
+ *
+ * Returns true if all blocks which correspond to a file portion
+ * we want to read within the page are uptodate.
+ */
 int
 iomap_is_partially_uptodate(struct page *page, unsigned long from,
 		unsigned long count)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 	struct inode *inode = page->mapping->host;
-	unsigned first = from >> inode->i_blkbits;
-	unsigned last = (from + count - 1) >> inode->i_blkbits;
+	unsigned len, first, last;
 	unsigned i;
 
+	/* Limit range to one page */
+	len = min_t(unsigned, PAGE_SIZE - from, count);
+
+	/* First and last blocks in range within page */
+	first = from >> inode->i_blkbits;
+	last = (from + len - 1) >> inode->i_blkbits;
+
 	if (iop) {
 		for (i = first; i <= last; i++)
 			if (!test_bit(i, iop->uptodate))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range
  2018-12-13 22:25 [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
  2018-12-13 22:25 ` [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page Eric Sandeen
  2018-12-13 22:25 ` [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate Eric Sandeen
@ 2018-12-13 22:25 ` Eric Sandeen
  2018-12-14  3:05   ` Dave Chinner
  2018-12-13 22:27 ` [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
  3 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2018-12-13 22:25 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Christoph, Hellwig, hch

From: Eric Sandeen <sandeen@redhat.com>

We only call ->is_partially_uptodate to look for an uptodate range
within a not-uptodate page.

If the range covers all blocks in the page, there is no point to checking
each block individually - if the whole range (i.e. the whole page) were
uptodate, the page would be uptodate as well.  Hence in this case, we
can return early and skip the loop.

This is similar to what is done in block_is_partially_uptodate().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---
 fs/iomap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index ce837d9..7d7d985 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -515,6 +515,10 @@ struct iomap_readpage_ctx {
 	first = from >> inode->i_blkbits;
 	last = (from + len - 1) >> inode->i_blkbits;
 
+	/* If page wasn't uptodate and range covers all blocks: no partial */
+	if (first == 0 && last == (PAGE_SIZE - 1) >> inode->i_blkbits)
+		return 0;
+
 	if (iop) {
 		for (i = first; i <= last; i++)
 			if (!test_bit(i, iop->uptodate))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization
  2018-12-13 22:25 [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
                   ` (2 preceding siblings ...)
  2018-12-13 22:25 ` [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range Eric Sandeen
@ 2018-12-13 22:27 ` Eric Sandeen
  3 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2018-12-13 22:27 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: hch

On 12/13/18 4:25 PM, Eric Sandeen wrote:
> Patch 1 is trivial, to use the SECTOR_SIZE macro instead of open-coding.
> Patch 2 is the one that matters: iomap_is_partially_uptodate was doing invalid memory accesses past the end of the iomap_page->uptodate bitmap.
> Patch3 is a small optimization (?) based on what the non-iomap code does, i.e. don't bother checking each block if the "partial" range covers the whole page
> 
> These survived an xfstests -g auto run on 4k and 1k block size filesystems
> on x86_64.  Careful review for off by ones is appreciated in any case. ;)

Ugh, I fatfingered a cc (or maybe I'll blame guilt) - careful on reply-all,
sorry about that.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate
  2018-12-13 22:25 ` [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate Eric Sandeen
@ 2018-12-14  2:57   ` Dave Chinner
  2018-12-14 13:48     ` Eric Sandeen
  2018-12-14 22:14   ` [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate Eric Sandeen
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-12-14  2:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, linux-fsdevel, Christoph, Hellwig, hch

On Thu, Dec 13, 2018 at 04:25:28PM -0600, Eric Sandeen wrote:
> From: Eric Sandeen <sandeen@redhat.com>
> 
> iomap_is_partially_uptodate() is intended to check wither blocks within
> the selected range of a not-uptodate page are uptodate; if the range we
> care about is up to date, it's an optimization.
> 
> However, the iomap implementation continues to check all blocks up to
> from+count, which is beyond the page, and can even be well beyond the
> iop->uptodate bitmap.

The issue is that the caller does not limit the range to the page it
is checking to see if it is up to date. Hence we have to clamp it.

> I think the worst that will happen is that we may eventually find a zero
> bit and return "not partially uptodate" when it would have otherwise
> returned true, and skip the optimization.  Still, it's clearly an invalid
> memory access that must be fixed.

It depends on how far beyond the page the count extends. :P

> So: fix this by limiting the search to within the page as is done in the
> non-iomap variant, block_is_partially_uptodate().
> 
> Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
> page system:
> 
>  BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
>  Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

SOB issue. :)

But hte code is the same as the patch I wrote yesterday for this and
tested overnight, so

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range
  2018-12-13 22:25 ` [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range Eric Sandeen
@ 2018-12-14  3:05   ` Dave Chinner
  2018-12-14 14:10     ` Eric Sandeen
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2018-12-14  3:05 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, linux-fsdevel, Christoph, Hellwig, hch

On Thu, Dec 13, 2018 at 04:25:29PM -0600, Eric Sandeen wrote:
> From: Eric Sandeen <sandeen@redhat.com>
> 
> We only call ->is_partially_uptodate to look for an uptodate range
> within a not-uptodate page.
> 
> If the range covers all blocks in the page, there is no point to checking
> each block individually - if the whole range (i.e. the whole page) were
> uptodate, the page would be uptodate as well.  Hence in this case, we
> can return early and skip the loop.
> 
> This is similar to what is done in block_is_partially_uptodate().
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> ---
>  fs/iomap.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index ce837d9..7d7d985 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -515,6 +515,10 @@ struct iomap_readpage_ctx {
>  	first = from >> inode->i_blkbits;
>  	last = (from + len - 1) >> inode->i_blkbits;
>  
> +	/* If page wasn't uptodate and range covers all blocks: no partial */
> +	if (first == 0 && last == (PAGE_SIZE - 1) >> inode->i_blkbits)
> +		return 0;
> +

Even if the range covers the entire page, we still have to check
that all the individual blocks in the page are up to date or not.
Hence I think this is wrong....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate
  2018-12-14  2:57   ` Dave Chinner
@ 2018-12-14 13:48     ` Eric Sandeen
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2018-12-14 13:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel, hch

On 12/13/18 8:57 PM, Dave Chinner wrote:
> On Thu, Dec 13, 2018 at 04:25:28PM -0600, Eric Sandeen wrote:
>> From: Eric Sandeen <sandeen@redhat.com>
>>
>> iomap_is_partially_uptodate() is intended to check wither blocks within
>> the selected range of a not-uptodate page are uptodate; if the range we
>> care about is up to date, it's an optimization.
>>
>> However, the iomap implementation continues to check all blocks up to
>> from+count, which is beyond the page, and can even be well beyond the
>> iop->uptodate bitmap.
> 
> The issue is that the caller does not limit the range to the page it
> is checking to see if it is up to date. Hence we have to clamp it.

yes.

(It occurs to me that maybe we should fix/change this in the caller instead
so the implementations don't all have to do the same thing?)

>> I think the worst that will happen is that we may eventually find a zero
>> bit and return "not partially uptodate" when it would have otherwise
>> returned true, and skip the optimization.  Still, it's clearly an invalid
>> memory access that must be fixed.
> 
> It depends on how far beyond the page the count extends. :P
> 
>> So: fix this by limiting the search to within the page as is done in the
>> non-iomap variant, block_is_partially_uptodate().
>>
>> Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
>> page system:
>>
>>  BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
>>  Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
>>
>> Reported-by: Zorro Lang <zlang@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> 
> SOB issue. :)

yeah :/

thanks for the review,
-Eric

> But hte code is the same as the patch I wrote yesterday for this and
> tested overnight, so
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> -Dave.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range
  2018-12-14  3:05   ` Dave Chinner
@ 2018-12-14 14:10     ` Eric Sandeen
  2018-12-17 23:21       ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2018-12-14 14:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel, hch

On 12/13/18 9:05 PM, Dave Chinner wrote:
> On Thu, Dec 13, 2018 at 04:25:29PM -0600, Eric Sandeen wrote:
>> From: Eric Sandeen <sandeen@redhat.com>
>>
>> We only call ->is_partially_uptodate to look for an uptodate range
>> within a not-uptodate page.
>>
>> If the range covers all blocks in the page, there is no point to checking
>> each block individually - if the whole range (i.e. the whole page) were
>> uptodate, the page would be uptodate as well.  Hence in this case, we
>> can return early and skip the loop.
>>
>> This is similar to what is done in block_is_partially_uptodate().
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
>> ---
>>  fs/iomap.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/iomap.c b/fs/iomap.c
>> index ce837d9..7d7d985 100644
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -515,6 +515,10 @@ struct iomap_readpage_ctx {
>>  	first = from >> inode->i_blkbits;
>>  	last = (from + len - 1) >> inode->i_blkbits;
>>  
>> +	/* If page wasn't uptodate and range covers all blocks: no partial */
>> +	if (first == 0 && last == (PAGE_SIZE - 1) >> inode->i_blkbits)
>> +		return 0;
>> +
> 
> Even if the range covers the entire page, we still have to check
> that all the individual blocks in the page are up to date or not.
> Hence I think this is wrong....

Didn't PageUptodate(page) answer that question already in the caller?

The caller does:

                        if (PageUptodate(page))
                                goto page_ok;
...
                        if (!mapping->a_ops->is_partially_uptodate(page,
                                                        offset, iter->count))
                                goto page_not_up_to_date_locked;


So if it was found to be uptodate, we never called this op.  If we /were/
called (page was not uptodate) that can only be because at least one block in
the page is not uptodate.  So what's the need for checking them all again?

(unless it's because we're now doing it under the page lock?)

block_is_partially_uptodate does the same thing:

        if (from < blocksize && to > PAGE_SIZE - blocksize)
                return 0;

Still, this also seems like maybe something the caller should do instead.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate
  2018-12-13 22:25 ` [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate Eric Sandeen
  2018-12-14  2:57   ` Dave Chinner
@ 2018-12-14 22:14   ` Eric Sandeen
  2018-12-15 10:49     ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2018-12-14 22:14 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: hch

When generic_file_buffered_read() calls ->is_partially_uptodate() on
a page today, it passes in iov->count as the length to check,
which may extend past the end of the page.  The original
implementation, block_is_partially_uptodate(), clamps the length
to the end of the page.

However, the iomap implementation does not do so, and tries to check
all blocks to the end of the iovec, which may be beyond the page,
and may be beyond the iop->uptodate bitmap as well.

I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization.  Still, it's clearly an
invalid memory access that must be fixed.

Zorro noticed this when KASAN went off for 512 byte blocks on a 64k
page system:

 BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
 Read of size 8 at addr ffff800120c3a318 by task fsstress/22337

and dchinner pointed out the underlying flaw.

Fix this by limiting the range we ask for in the caller
(generic_file_buffered_read()) so that offset+len doesn't extend past
the page in question, and remove the clamping from the block variant.

This also hoists the "don't bother checking each block if we span
the whole page" optimization from the block implementation.  It
changes the arg types too, because we no longer need a long to hold
offsets or lengths that will fit within a page.

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---

V2: Move the common code (clamping & optimizations) to the caller,
rather than re-implementing in the iomap implementation.

This makes my patch 3/3 unnecessary

Dave, if you've already done and tested the same thing, happy
to just use your patch if it's in better shape.

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index efea228c..b7cc610 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -208,7 +208,7 @@ prototypes:
 	int (*migratepage)(struct address_space *, struct page *, struct page *);
 	void (*putback_page) (struct page *);
 	int (*launder_page)(struct page *);
-	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
+	int (*is_partially_uptodate)(struct page *, unsigned, unsigned);
 	int (*error_remove_page)(struct address_space *, struct page *);
 	int (*swap_activate)(struct file *);
 	int (*swap_deactivate)(struct file *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5f71a25..ed6d581 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -648,8 +648,7 @@ struct address_space_operations {
 	void (*putback_page) (struct page *);
 	int (*launder_page) (struct page *);
 
-	int (*is_partially_uptodate) (struct page *, unsigned long,
-					unsigned long);
+	int (*is_partially_uptodate) (struct page *, unsigned, unsigned);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
 	int (*error_remove_page) (struct mapping *mapping, struct page *page);
 	int (*swap_activate)(struct file *);
diff --git a/fs/buffer.c b/fs/buffer.c
index 1286c2b..b8605e5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2170,8 +2170,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
  * Returns true if all buffers which correspond to a file portion
  * we want to read are uptodate.
  */
-int block_is_partially_uptodate(struct page *page, unsigned long from,
-					unsigned long count)
+int block_is_partially_uptodate(struct page *page, unsigned int from,
+					unsigned int count)
 {
 	unsigned block_start, block_end, blocksize;
 	unsigned to;
@@ -2183,10 +2183,7 @@ int block_is_partially_uptodate(struct page *page, unsigned long from,
 
 	head = page_buffers(page);
 	blocksize = head->b_size;
-	to = min_t(unsigned, PAGE_SIZE - from, count);
-	to = from + to;
-	if (from < blocksize && to > PAGE_SIZE - blocksize)
-		return 0;
+	to = from + count;
 
 	bh = head;
 	block_start = 0;
diff --git a/fs/iomap.c b/fs/iomap.c
index d6bc98a..6fec633 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -493,8 +493,7 @@ struct iomap_readpage_ctx {
 EXPORT_SYMBOL_GPL(iomap_readpages);
 
 int
-iomap_is_partially_uptodate(struct page *page, unsigned long from,
-		unsigned long count)
+iomap_is_partially_uptodate(struct page *page, unsigned from, unsigned count)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 	struct inode *inode = page->mapping->host;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 7b73ef7..0fef1e5 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -220,8 +220,8 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
 int block_read_full_page(struct page*, get_block_t*);
-int block_is_partially_uptodate(struct page *page, unsigned long from,
-				unsigned long count);
+int block_is_partially_uptodate(struct page *page, unsigned from,
+				unsigned count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 		unsigned flags, struct page **pagep, get_block_t *get_block);
 int __block_write_begin(struct page *page, loff_t pos, unsigned len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c080..95a0ec6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -378,8 +378,7 @@ struct address_space_operations {
 	bool (*isolate_page)(struct page *, isolate_mode_t);
 	void (*putback_page)(struct page *);
 	int (*launder_page) (struct page *);
-	int (*is_partially_uptodate) (struct page *, unsigned long,
-					unsigned long);
+	int (*is_partially_uptodate) (struct page *, unsigned, unsigned);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
 	int (*error_remove_page)(struct address_space *, struct page *);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index edcdb3a..d59e986 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -127,8 +127,8 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
 int iomap_set_page_dirty(struct page *page);
-int iomap_is_partially_uptodate(struct page *page, unsigned long from,
-		unsigned long count);
+int iomap_is_partially_uptodate(struct page *page, unsigned from,
+		unsigned count);
 int iomap_releasepage(struct page *page, gfp_t gfp_mask);
 void iomap_invalidatepage(struct page *page, unsigned int offset,
 		unsigned int len);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8..0fab63f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1985,7 +1985,8 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 	pgoff_t index;
 	pgoff_t last_index;
 	pgoff_t prev_index;
-	unsigned long offset;      /* offset into pagecache page */
+	unsigned offset, count;		/* offset into pagecache page */
+	unsigned blocksize;
 	unsigned int prev_offset;
 	int error = 0;
 
@@ -2045,9 +2046,21 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 			if (PageUptodate(page))
 				goto page_ok;
 
-			if (inode->i_blkbits == PAGE_SHIFT ||
+			blocksize = i_blocksize(inode);
+			/* There can be no partial for page-sized blocks */
+			if (blocksize == PAGE_SIZE ||
 					!mapping->a_ops->is_partially_uptodate)
 				goto page_not_up_to_date;
+			/* Limit query to within a single page */
+			count = min_t(unsigned, PAGE_SIZE - offset,
+							iter->count);
+			/*
+			 * If the range spans all blocks in the page, and the
+			 * page is not uptodate, we can't be partially uptodate.
+			 */
+			if (offset < blocksize &&
+			    offset + count > PAGE_SIZE - blocksize)
+				goto page_not_up_to_date;
 			/* pipes can't handle partially uptodate pages */
 			if (unlikely(iov_iter_is_pipe(iter)))
 				goto page_not_up_to_date;
@@ -2057,7 +2070,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 			if (!page->mapping)
 				goto page_not_up_to_date_locked;
 			if (!mapping->a_ops->is_partially_uptodate(page,
-							offset, iter->count))
+							offset, count))
 				goto page_not_up_to_date_locked;
 			unlock_page(page);
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate
  2018-12-14 22:14   ` [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate Eric Sandeen
@ 2018-12-15 10:49     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2018-12-15 10:49 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, linux-fsdevel, hch

> Fix this by limiting the range we ask for in the caller
> (generic_file_buffered_read()) so that offset+len doesn't extend past
> the page in question, and remove the clamping from the block variant.

Yes the current prototype is a bit odd..

> +	int (*is_partially_uptodate)(struct page *, unsigned, unsigned);

Can you also make it return a bool while you are at it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-13 22:25 ` [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page Eric Sandeen
@ 2018-12-15 10:51   ` Christoph Hellwig
  2018-12-17 23:45     ` Eric Sandeen
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-12-15 10:51 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, linux-fsdevel, Christoph, Hellwig, hch

On Thu, Dec 13, 2018 at 04:25:27PM -0600, Eric Sandeen wrote:
> From: Eric Sandeen <sandeen@redhat.com>
> 
> because iomap_page_create initializes the uptodate bitmap using the macro,
> not the open-coded value.

I find the use of SECTOR_SIZE rather confusing, especially as often
the sector size might actually be something else.  It also forces
us to pull in another huge header file.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range
  2018-12-14 14:10     ` Eric Sandeen
@ 2018-12-17 23:21       ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2018-12-17 23:21 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, linux-fsdevel, hch

On Fri, Dec 14, 2018 at 08:10:19AM -0600, Eric Sandeen wrote:
> On 12/13/18 9:05 PM, Dave Chinner wrote:
> > On Thu, Dec 13, 2018 at 04:25:29PM -0600, Eric Sandeen wrote:
> >> From: Eric Sandeen <sandeen@redhat.com>
> >>
> >> We only call ->is_partially_uptodate to look for an uptodate range
> >> within a not-uptodate page.
> >>
> >> If the range covers all blocks in the page, there is no point to checking
> >> each block individually - if the whole range (i.e. the whole page) were
> >> uptodate, the page would be uptodate as well.  Hence in this case, we
> >> can return early and skip the loop.
> >>
> >> This is similar to what is done in block_is_partially_uptodate().
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> >> ---
> >>  fs/iomap.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/fs/iomap.c b/fs/iomap.c
> >> index ce837d9..7d7d985 100644
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -515,6 +515,10 @@ struct iomap_readpage_ctx {
> >>  	first = from >> inode->i_blkbits;
> >>  	last = (from + len - 1) >> inode->i_blkbits;
> >>  
> >> +	/* If page wasn't uptodate and range covers all blocks: no partial */
> >> +	if (first == 0 && last == (PAGE_SIZE - 1) >> inode->i_blkbits)
> >> +		return 0;
> >> +
> > 
> > Even if the range covers the entire page, we still have to check
> > that all the individual blocks in the page are up to date or not.
> > Hence I think this is wrong....
> 
> Didn't PageUptodate(page) answer that question already in the caller?
> 
> The caller does:
> 
>                         if (PageUptodate(page))
>                                 goto page_ok;
> ...
>                         if (!mapping->a_ops->is_partially_uptodate(page,
>                                                         offset, iter->count))
>                                 goto page_not_up_to_date_locked;
> 
> 
> So if it was found to be uptodate, we never called this op.  If we /were/
> called (page was not uptodate) that can only be because at least one block in
> the page is not uptodate.  So what's the need for checking them all again?

AFAIA, ->is_partially_uptodate() is supposed to use page private
data to determine if the page is up to date, not assume that because
the range spans the entire page it's not uptodate.

i.e. if the method relies on the caller to perform certain
-undocumented- actions to ensure that the method does the right
thing, then the API and method is broken. To then make undocumented
optimisations in the method implementations to avoid checking
private state based on the undocumented assumption that callers have
all done some certain undocumented things, well, that's a pretty
awful API and even worse documentation...

> block_is_partially_uptodate does the same thing:
> 
>         if (from < blocksize && to > PAGE_SIZE - blocksize)
>                 return 0;

Which, again, assumes that the caller has done certain things and
that there are specific, fixed, undocumented access and page private
state initialisation assumptions made by the callers of
->is_partially_uptodate().

> Still, this also seems like maybe something the caller should do instead.

*nod*

IMO all the block size vs page size, page locking, truncation
checks, etc should be wrapped up in the ->is_partially_uptodate()
method. i.e. it's really a ->page_range_contains_data() check, and
really has nothing to do with the overall PageUptodate() flag state.

FWIW, if you look at how page_seek_hole_data() uses
->is_partially_uptodate - it calls it multiple times on the same
page asking whether each individual block on the page is up to date.
It doesn't even check PageUptodate() for block size < page size, so
assumptions in ->is_partially_uptodate that PageUptodate() has
already been checked are clearly invalid...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-15 10:51   ` Christoph Hellwig
@ 2018-12-17 23:45     ` Eric Sandeen
  2018-12-18 18:06       ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2018-12-17 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel



On 12/15/18 4:51 AM, Christoph Hellwig wrote:
> On Thu, Dec 13, 2018 at 04:25:27PM -0600, Eric Sandeen wrote:
>> From: Eric Sandeen <sandeen@redhat.com>
>>
>> because iomap_page_create initializes the uptodate bitmap using the macro,
>> not the open-coded value.
> 
> I find the use of SECTOR_SIZE rather confusing, especially as often
> the sector size might actually be something else.  It also forces
> us to pull in another huge header file.

Then we should hard code "512" in iomap_page_create I guess.  Just need
consistency.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-17 23:45     ` Eric Sandeen
@ 2018-12-18 18:06       ` Christoph Hellwig
  2018-12-18 18:19         ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2018-12-18 18:06 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel

On Mon, Dec 17, 2018 at 05:45:10PM -0600, Eric Sandeen wrote:
> Then we should hard code "512" in iomap_page_create I guess.  Just need
> consistency.

Fine with me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-18 18:06       ` Christoph Hellwig
@ 2018-12-18 18:19         ` Darrick J. Wong
  2018-12-18 18:20           ` Eric Sandeen
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2018-12-18 18:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, linux-xfs, linux-fsdevel

On Tue, Dec 18, 2018 at 07:06:17PM +0100, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 05:45:10PM -0600, Eric Sandeen wrote:
> > Then we should hard code "512" in iomap_page_create I guess.  Just need
> > consistency.
> 
> Fine with me.

Please don't just hardcode 512 here.  AFAICT the usage in iomap.c seems
to be "minimum expected fs block size" so that the iop's uptodate bitmap
is sized to handle the worst case blocks-per-page.

Can we please have a "#define IOMAP_MIN_FS_BLOCKSIZE SECTOR_SIZE" to
capture the intent behind the 512?  Or, if you don't want to require all
includers of iomap.h to also have to include blkdev.h, define it to 512
and have a BUILD_BUG_ON somewhere so that we don't leave a subtle bug if
we ever change SECTOR_SIZE?

--D

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page
  2018-12-18 18:19         ` Darrick J. Wong
@ 2018-12-18 18:20           ` Eric Sandeen
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2018-12-18 18:20 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel



On 12/18/18 12:19 PM, Darrick J. Wong wrote:
> On Tue, Dec 18, 2018 at 07:06:17PM +0100, Christoph Hellwig wrote:
>> On Mon, Dec 17, 2018 at 05:45:10PM -0600, Eric Sandeen wrote:
>>> Then we should hard code "512" in iomap_page_create I guess.  Just need
>>> consistency.
>>
>> Fine with me.
> 
> Please don't just hardcode 512 here.  AFAICT the usage in iomap.c seems
> to be "minimum expected fs block size" so that the iop's uptodate bitmap
> is sized to handle the worst case blocks-per-page.
> 
> Can we please have a "#define IOMAP_MIN_FS_BLOCKSIZE SECTOR_SIZE" to
> capture the intent behind the 512?  Or, if you don't want to require all
> includers of iomap.h to also have to include blkdev.h, define it to 512
> and have a BUILD_BUG_ON somewhere so that we don't leave a subtle bug if
> we ever change SECTOR_SIZE?

Right, seems like something needs to tie this into the rest of reality
and not just blaze past it if we somehow ever encounter a block size
< 512.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-12-18 18:20 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-13 22:25 [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen
2018-12-13 22:25 ` [PATCH 1/3] iomap: use SECTOR_SIZE instead of 512 in iomap_page Eric Sandeen
2018-12-15 10:51   ` Christoph Hellwig
2018-12-17 23:45     ` Eric Sandeen
2018-12-18 18:06       ` Christoph Hellwig
2018-12-18 18:19         ` Darrick J. Wong
2018-12-18 18:20           ` Eric Sandeen
2018-12-13 22:25 ` [PATCH 2/3] iomap: don't search past page end in iomap_is_partially_uptodate Eric Sandeen
2018-12-14  2:57   ` Dave Chinner
2018-12-14 13:48     ` Eric Sandeen
2018-12-14 22:14   ` [PATCH 2/3 V2] mm: don't search past page end in is_partially_uptodate Eric Sandeen
2018-12-15 10:49     ` Christoph Hellwig
2018-12-13 22:25 ` [PATCH 3/3] iomap: optimize iomap_is_partially_uptodate for full page range Eric Sandeen
2018-12-14  3:05   ` Dave Chinner
2018-12-14 14:10     ` Eric Sandeen
2018-12-17 23:21       ` Dave Chinner
2018-12-13 22:27 ` [PATCH 0/3] iomap: 1 cleanup, 1 fix, 1 optimization Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).