linux-fsdevel.vger.kernel.org archive mirror
* page split failures in truncate_inode_pages_range
@ 2021-06-23 18:51 Matthew Wilcox
  2021-06-24 14:32 ` Matthew Wilcox
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Wilcox @ 2021-06-23 18:51 UTC (permalink / raw)
  To: linux-fsdevel

When we have large pages in the page cache, we can end up in
truncate_inode_pages_range() with an 'lstart' that is in the middle of
a tail page.  My approach has generally been to split the large page,
and that works except when split_huge_page() fails, which it can do at
random due to a racing access having the page refcount elevated.
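The failure mode above can be modeled in a few lines. This is a toy userspace sketch with invented names (not the kernel implementation): split_huge_page() can only succeed when the caller holds the only pins on the page, so any racing access that temporarily elevates the refcount makes the split fail.

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical names, not kernel code): a split requires
 * that the would-be splitter account for every reference on the page.
 * A racing access -- a speculative page-cache lookup, GUP, etc. --
 * that briefly raises the refcount makes the split fail, so callers
 * must be prepared for failure at any time.
 */
struct huge_page_model {
	int refcount;	/* total references currently held */
	int our_pins;	/* references held by the would-be splitter */
};

/* Returns 0 on success, -1 (standing in for -EBUSY) on failure. */
static int split_huge_page_model(const struct huge_page_model *p)
{
	if (p->refcount != p->our_pins)
		return -1;	/* someone else pinned the page: can't split */
	return 0;		/* sole owner of all references: safe to split */
}
```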

I've been simulating split_huge_page() failures, and found a problem
I don't know how to solve.  truncate_inode_pages_range() is called
by COLLAPSE_RANGE in order to evict the part of the page cache after
the start of the range being collapsed (any part of the page cache
remaining would now have data for the wrong part of the file in it).
xfs_flush_unmap_range() (and, I presume, the other filesystems which
support COLLAPSE_RANGE) calls filemap_write_and_wait_range() first,
so we can just drop the partial large page if the split doesn't succeed.

But truncate_inode_pages_range() is also called by, for example,
truncate().  In that case, nobody calls filemap_write_and_wait_range(),
so we can't discard the page because it might still be dirty.
Is that an acceptable way to choose behaviour -- if the split fails,
discard the page if it's clean and keep it if it's dirty?  I'll
put a great big comment on it, because it's not entirely obvious.
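The proposed fallback can be sketched as a small decision function. This is a hypothetical userspace model with invented names, not the kernel API: if the split fails, a clean page can be dropped wholesale (its data is already on disk), but a dirty page must be kept or unwritten data would be lost.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the proposed policy for a large page that
 * straddles the truncation point (names invented for illustration).
 */
struct folio_model {
	bool multi;	/* page is a large (multi-page) folio */
	bool split_ok;	/* would split_huge_page() succeed? */
	bool dirty;	/* folio has data not yet written back */
	bool discarded;	/* set when we drop the whole folio */
};

/* Returns true if the partial truncate was handled. */
static bool truncate_partial_model(struct folio_model *f)
{
	if (!f->multi)
		return true;		/* order-0 page: nothing to split */
	if (f->split_ok)
		return true;		/* split worked; truncate the tail */
	if (f->dirty)
		return false;		/* can't discard: data would be lost */
	f->discarded = true;		/* clean: safe to drop the whole folio */
	return true;
}
```

In the COLLAPSE_RANGE case the preceding filemap_write_and_wait_range() guarantees the dirty branch is never taken, which is why discarding is always safe there.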


* Re: page split failures in truncate_inode_pages_range
  2021-06-23 18:51 page split failures in truncate_inode_pages_range Matthew Wilcox
@ 2021-06-24 14:32 ` Matthew Wilcox
  0 siblings, 0 replies; 2+ messages in thread
From: Matthew Wilcox @ 2021-06-24 14:32 UTC (permalink / raw)
  To: linux-fsdevel

On Wed, Jun 23, 2021 at 07:51:18PM +0100, Matthew Wilcox wrote:
> But truncate_inode_pages_range() is also called by, for example,
> truncate().  In that case, nobody calls filemap_write_and_wait_range(),
> so we can't discard the page because it might still be dirty.
> Is that an acceptable way to choose behaviour -- if the split fails,
> discard the page if it's clean and keep it if it's dirty?  I'll
> put a great big comment on it, because it's not entirely obvious.

It seems to be working ...

        if (!folio_multi(folio))
                return true;
//      if (split_huge_page(&folio->page) == 0)
//              return true;
        if (folio_dirty(folio))
                return false;
        truncate_inode_folio(mapping, folio);
        return true;

No additional xfstests failures from commenting out those two lines.

