linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: Splitting a THP beyond EOF
Date: Tue, 20 Oct 2020 15:59:28 +1100	[thread overview]
Message-ID: <20201020045928.GO7391@dread.disaster.area> (raw)
In-Reply-To: <20201020014357.GW20115@casper.infradead.org>

On Tue, Oct 20, 2020 at 02:43:57AM +0100, Matthew Wilcox wrote:
> This is a weird one ... which is good because it means the obvious
> ones have been fixed and now I'm just tripping over the weird cases.
> And fortunately, xfstests exercises the weird cases.
> 
> 1. The file is 0x3d000 bytes long.
> 2. A readahead allocates an order-2 THP for 0x3c000-0x3ffff
> 3. We simulate a read error for 0x3c000-0x3cfff
> 4. Userspace writes to 0x3d697 to 0x3dfaa

So this is a write() beyond EOF, yes?

If yes, then we first go through this path:

	xfs_file_buffered_aio_write()
	  xfs_file_aio_write_checks()
	    iomap_zero_range(isize, pos - isize)

To zero the region between the current EOF and where the new write
starts. i.e. from 0x3d000 to 0x3d696.

> 5. iomap_write_begin() gets the 0x3c page, sees it's THP and !Uptodate
>    so it calls iomap_split_page() (passing page 0x3d)

Splitting the page because it's !Uptodate seems rather drastic to
me.  Why does it need to split the page here?

Also, this concerns me: if we are exposing the cached EOF page via
mmap, it needs to contain only zeroes in the region beyond EOF so
that we don't expose stale data to userspace. Hence when a THP that
contains EOF is instantiated, we have to ensure that the region
beyond EOF is compeltely zeroed. It then follows that if we read all
the data in that THP up to EOF, then the page is actually up to
date...

And hence, I don't see why we'd need to split it just because we
got a small, unaligned write beyond EOF....

> 6. iomap_split_page() calls split_huge_page()
> 7. split_huge_page() sees that page 0x3d is beyond EOF, so it removes it
>    from i_pages
> 8. iomap_write_actor() copies the data into page 0x3d
> 9. The write is lost.
> 
> Trying to persuade XFS to update i_size before calling
> iomap_file_buffered_write() seems like a bad idea.

Agreed, I can't see anything good coming from doing that...

> Changing split_huge_page() to disregard i_size() is something I kind
> of want to be able to do long-term in order to make hole-punch more
> efficient, but that seems like a lot of work right now.
> 
> I think the easiest way to fix this is to decline to allocate readahead
> pages beyond EOF.  That is, if we have a file which is, say, 61 pages
> long, read the last 5 pages into an order-2 THP and an order-0 THP
> instead of allocating an order-3 THP and zeroing the last three pages.

I think you need to consider zeroing the THP range beyond EOF when
doing readahead....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


  reply	other threads:[~2020-10-20  4:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-20  1:43 Splitting a THP beyond EOF Matthew Wilcox
2020-10-20  4:59 ` Dave Chinner [this message]
2020-10-20 11:21   ` Matthew Wilcox
2020-10-20 21:16     ` Dave Chinner
2020-10-20 22:53       ` Matthew Wilcox
2020-10-21 22:14         ` Dave Chinner
2020-10-21 23:04           ` Matthew Wilcox
2020-10-27  5:31             ` Dave Chinner
2020-10-27 12:14               ` Matthew Wilcox
2020-10-20 14:26 ` Matthew Wilcox
2020-10-20 14:32 ` Chris Mason
2020-10-21  0:23   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201020045928.GO7391@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).