linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] huge tmpfs: try to split_huge_page() when punching hole
Date: Thu, 27 Feb 2020 20:26:46 -0800	[thread overview]
Message-ID: <20200228042646.GF29971@bombadil.infradead.org> (raw)
In-Reply-To: <alpine.LSU.2.11.2002271909250.2026@eggly.anvils>

On Thu, Feb 27, 2020 at 08:04:21PM -0800, Hugh Dickins wrote:
> It's good to consider the implications for hole-punch on a persistent
> filesystem cached with THPs (or lower order compound pages); but I
> disagree that they should behave differently from this patch.
> 
> The hole-punch is fundamentally directed at freeing up the storage, yes;
> but its page cache must also be removed, otherwise you have the user
> writing into cache which is not backed by storage, and potentially losing
> the data later.  So a hole must be punched in the compound page in that
> case too: in fact, it's then much more important that split_huge_page()
> succeeds - not obvious what the fallback should be if it fails (perhaps
> in that case the compound page must be kept, but all its pmds removed,
> and info on holes kept in spare fields of the compound page, to prevent
> writes and write faults without calling back into the filesystem:
> soluble, but more work than tmpfs needs today)(and perhaps when that
> extra work is done, we would choose to rely on it rather than
> immediately splitting; but it will involve discounting the holes).

Ooh, a topic that reasonable people can disagree on!

The current prototype I have will allocate (huge) pages and then
ask the filesystem to fill them.  The filesystem may well find that
the extent is a hole, and if it is, it will fill the page with zeroes.
Then, the application may write to those pages, and if it does, the
filesystem will be notified to create an on-disk extent for that write.

I haven't looked at the hole-punch path in detail, but presumably it
notifies the filesystem to create a hole extent and zeroes out the
pagecache for that range (possibly by removing entire pages, and with
memset for partial pages).  Then a subsequent write to the hole will
cause the filesystem to allocate a new non-hole extent, just like the
previous case.

I think it's reasonable for the page cache to interpret a hole-punch
request as being a hint that the hole is unlikely to be accessed again,
so allocating new smaller pages for that region of the file (or just
writing back & dropping the covering page altogether) would seem like
a reasonable implementation decision.

However, it also seems reasonable that just memset() of the affected
region and leaving the page intact would also be an acceptable
implementation.  As long as writes to the newly-created hole cause the
page to become dirtied and thus writeback to be in effect.  It probably
wouldn't be as good an implementation, but it shouldn't lose writes as
you suggest above.

I'm not sure I'd choose to split a large page into smaller pages.  I think
I'd prefer to allocate lower-order pages and memcpy() the data over.
Again, that's an implementation choice, and not something that should
be visible outside the implementation.

[1] http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-pagecache

  reply	other threads:[~2020-02-28  4:27 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-27  4:06 [PATCH] huge tmpfs: try to split_huge_page() when punching hole Hugh Dickins
2020-02-27  8:47 ` Kirill A. Shutemov
2020-02-28  4:04   ` Hugh Dickins
2020-02-28  4:26     ` Matthew Wilcox [this message]
2020-02-28 12:54       ` Kirill A. Shutemov
2020-02-28 22:28         ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200228042646.GF29971@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mst@redhat.com \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).