* [PATCH next 2/3] shmem: Fix data loss when folio truncated
From: Hugh Dickins @ 2022-01-03 1:34 UTC
To: Matthew Wilcox
Cc: Andrew Morton, Christoph Hellwig, Jan Kara, William Kucharski,
linux-fsdevel, linux-mm
xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
three of those _require_odirect, enabled by a shmem_direct_IO() stub),
but still fail even with the partial_end fix.

generic/098 output mismatch shows actual data loss:
--- tests/generic/098.out
+++ /home/hughd/xfstests/results//generic/098.out.bad
@@ -4,9 +4,7 @@
wrote 32768/32768 bytes at offset 262144
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
File content after remount:
-0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
-*
-0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
The problem here is that shmem_getpage(,,,SGP_READ) intentionally
supplies NULL page beyond EOF, and truncation and eviction intentionally
lower i_size before shmem_undo_range() is called: so a whole folio got
truncated instead of being treated partially.

That could be solved by adding yet another SGP_mode to select the
required behaviour, but it's cleaner just to handle cache and then swap
in shmem_get_folio() - renamed here to shmem_get_partial_folio(), given
an easier interface, and moved next to its sole user, shmem_undo_range().

We certainly do not want to read data back from swap when evicting an
inode: i_size preset to 0 still ensures that. Nor do we want to zero
folio data when evicting: truncate_inode_partial_folio()'s check for
length == folio_size(folio) already ensures that.
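
To make that last check concrete, here is a minimal user-space sketch of
the offset/length arithmetic. It assumes the arithmetic mirrors the start
of truncate_inode_partial_folio() in mm/truncate.c at the time of this
series. The model_ names, the fixed 2MiB folio size and the self-check
are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FOLIO_SIZE (2u * 1024 * 1024)	/* assumed huge folio size */

/*
 * Hypothetical model: does the byte range [start, end] cover the whole
 * folio that begins at file offset pos?  end == -1 means "to EOF",
 * which is what eviction and truncation pass down.
 */
static bool model_covers_whole_folio(int64_t pos, int64_t start, int64_t end)
{
	uint64_t size = MODEL_FOLIO_SIZE;
	uint64_t offset = pos < start ? (uint64_t)(start - pos) : 0;
	uint64_t length;

	if ((uint64_t)pos + size <= (uint64_t)end)
		length = size - offset;		/* range runs past the folio */
	else
		length = (uint64_t)end + 1 - (uint64_t)pos - offset;

	/* Only a full-length match is dropped whole, with no zeroing. */
	return length == size;
}

static void model_self_check(void)
{
	/* Eviction: start 0, end -1 covers the first folio entirely. */
	assert(model_covers_whole_folio(0, 0, -1));
	/* Truncate to 256KiB: the first folio is only partially covered. */
	assert(!model_covers_whole_folio(0, 262144, -1));
	/* Punching just the first 4KiB is also partial. */
	assert(!model_covers_whole_folio(0, 0, 4095));
	/* A folio wholly beyond the new size is covered entirely. */
	assert(model_covers_whole_folio(MODEL_FOLIO_SIZE, 262144, -1));
}
```

In this model only the partial cases would go on to zero part of the
folio. The eviction case returns true, so nothing is zeroed and nothing
needs reading back.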
Fixes: 8842c9c23524 ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/shmem.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
--- hughd1/mm/shmem.c
+++ hughd2/mm/shmem.c
@@ -151,19 +151,6 @@ int shmem_getpage(struct inode *inode, p
mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
}
-static int shmem_get_folio(struct inode *inode, pgoff_t index,
- struct folio **foliop, enum sgp_type sgp)
-{
- struct page *page = NULL;
- int ret = shmem_getpage(inode, index, &page, sgp);
-
- if (page)
- *foliop = page_folio(page);
- else
- *foliop = NULL;
- return ret;
-}
-
static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
{
return sb->s_fs_info;
@@ -894,6 +881,28 @@ void shmem_unlock_mapping(struct address
}
}
+static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
+{
+ struct folio *folio;
+ struct page *page;
+
+ /*
+ * At first avoid shmem_getpage(,,,SGP_READ): that fails
+ * beyond i_size, and reports fallocated pages as holes.
+ */
+ folio = __filemap_get_folio(inode->i_mapping, index,
+ FGP_ENTRY | FGP_LOCK, 0);
+ if (!folio || !xa_is_value(folio))
+ return folio;
+ /*
+ * But read a page back from swap if any of it is within i_size
+ * (although in some cases this is just a waste of time).
+ */
+ page = NULL;
+ shmem_getpage(inode, index, &page, SGP_READ);
+ return page ? page_folio(page) : NULL;
+}
+
/*
* Remove range of pages and swap entries from page cache, and free them.
* If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -948,7 +957,7 @@ static void shmem_undo_range(struct inod
}
same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
- shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_READ);
+ folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
if (folio) {
same_folio = lend < folio_pos(folio) + folio_size(folio);
folio_mark_dirty(folio);
@@ -963,7 +972,7 @@ static void shmem_undo_range(struct inod
}
if (!same_folio)
- shmem_get_folio(inode, lend >> PAGE_SHIFT, &folio, SGP_READ);
+ folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
if (folio) {
folio_mark_dirty(folio);
if (!truncate_inode_partial_folio(folio, lstart, lend))
* Re: [PATCH next 2/3] shmem: Fix data loss when folio truncated
From: Matthew Wilcox @ 2022-01-07 15:53 UTC
To: Hugh Dickins
Cc: Andrew Morton, Christoph Hellwig, Jan Kara, William Kucharski,
linux-fsdevel, linux-mm
On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> but still fail even with the partial_end fix.
>
> generic/098 output mismatch shows actual data loss:
> --- tests/generic/098.out
> +++ /home/hughd/xfstests/results//generic/098.out.bad
> @@ -4,9 +4,7 @@
> wrote 32768/32768 bytes at offset 262144
> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> File content after remount:
> -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> -*
> -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ...
generic/098 is passing for me ;-( I'm using 'always' for THPs.
I'll have to try harder. Regardless, I think your fix is good ...
> +static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
Love the better calling convention.
> + folio = __filemap_get_folio(inode->i_mapping, index,
> + FGP_ENTRY | FGP_LOCK, 0);
> + if (!folio || !xa_is_value(folio))
> + return folio;
That first '!folio' is redundant. xa_is_value(NULL) is false.
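
A small user-space sketch of why the two forms of the test agree,
assuming xa_is_value() looks only at the low bit of the entry pointer, as
in include/linux/xarray.h. The model_ helpers and the self-check are
hypothetical stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of xa_is_value(): a value entry has bit 0 set. */
static bool model_xa_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Hypothetical model of xa_mk_value(): shift the integer up, tag bit 0. */
static void *model_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* The early-return test as posted in the patch. */
static bool model_early_return_posted(const void *folio)
{
	return !folio || !model_xa_is_value(folio);
}

/* The same test with the '!folio' dropped. */
static bool model_early_return_short(const void *folio)
{
	return !model_xa_is_value(folio);
}

static void model_self_check(void)
{
	static long aligned_object;	/* stands in for a real, aligned folio */
	const void *cases[] = {
		NULL,				/* nothing in the page cache */
		&aligned_object,		/* a real folio pointer */
		model_xa_mk_value(42),		/* a swap (value) entry */
	};

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		assert(model_early_return_posted(cases[i]) ==
		       model_early_return_short(cases[i]));
}
```

Both forms return early for NULL and for a real folio, and fall through
to the swap path only for a value entry.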
* Re: [PATCH next 2/3] shmem: Fix data loss when folio truncated
From: Hugh Dickins @ 2022-01-08 17:12 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, Andrew Morton, Christoph Hellwig, Jan Kara,
William Kucharski, linux-fsdevel, linux-mm
On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > but still fail even with the partial_end fix.
> >
> > generic/098 output mismatch shows actual data loss:
> > --- tests/generic/098.out
> > +++ /home/hughd/xfstests/results//generic/098.out.bad
> > @@ -4,9 +4,7 @@
> > wrote 32768/32768 bytes at offset 262144
> > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > File content after remount:
> > -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> > -*
> > -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ...
>
> generic/098 is passing for me ;-( I'm using 'always' for THPs.
> I'll have to try harder. Regardless, I think your fix is good ...
Worrying that the test behaves differently. Your 'always':
you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
That should be enough, but I admit to belt and braces by also
echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
Hugh
* Re: [PATCH next 2/3] shmem: Fix data loss when folio truncated
From: Matthew Wilcox @ 2022-01-08 21:25 UTC
To: Hugh Dickins
Cc: Andrew Morton, Christoph Hellwig, Jan Kara, William Kucharski,
linux-fsdevel, linux-mm
On Sat, Jan 08, 2022 at 09:12:08AM -0800, Hugh Dickins wrote:
> On Fri, 7 Jan 2022, Matthew Wilcox wrote:
> > On Sun, Jan 02, 2022 at 05:34:05PM -0800, Hugh Dickins wrote:
> > > xfstests generic 098 214 263 286 412 used to pass on huge tmpfs (well,
> > > three of those _require_odirect, enabled by a shmem_direct_IO() stub),
> > > but still fail even with the partial_end fix.
> > >
> > > generic/098 output mismatch shows actual data loss:
> > > --- tests/generic/098.out
> > > +++ /home/hughd/xfstests/results//generic/098.out.bad
> > > @@ -4,9 +4,7 @@
> > > wrote 32768/32768 bytes at offset 262144
> > > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > > File content after remount:
> > > -0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> > > -*
> > > -0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > +0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > ...
> >
> > generic/098 is passing for me ;-( I'm using 'always' for THPs.
> > I'll have to try harder. Regardless, I think your fix is good ...
>
> Worrying that the test behaves differently. Your 'always':
> you have '-o huge=always' in the exported TMPFS_MOUNT_OPTIONS?
> That should be enough, but I admit to belt and braces by also
> echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
Ah, I hadn't done TMPFS_MOUNT_OPTIONS, just the
echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled
Adding TMPFS_MOUNT_OPTIONS and retrying with what I originally posted
reproduces the bug. Retrying with the current for-next branch doesn't.
So now I can confirm that there was a bug and your patch fixed it.
And maybe I can avoid introducing more bugs of this nature in the future.