From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: Issue with 8K folio size in __filemap_get_folio()
Date: Sun, 3 Dec 2023 21:57:38 -0800 (PST)
Message-ID: <90344ea7-4eec-47ee-5996-0c22f42d6a6a@google.com>
In-Reply-To: <ZW0LQptvuFT9R4bw@casper.infradead.org>

On Sun, 3 Dec 2023, Matthew Wilcox wrote:
> On Sun, Dec 03, 2023 at 09:27:57PM +0000, Matthew Wilcox wrote:
> > I was talking with Darrick on Friday and he convinced me that this is
> > something we're going to need to fix sooner rather than later for the
> > benefit of devices with block size 8kB.  So it's definitely on my todo
> > list, but I haven't investigated in any detail yet.
> 
> OK, here's my initial analysis of just not putting order-1 folios
> on the deferred split list.  folio->_deferred_list is only used in
> mm/huge_memory.c, which makes this a nice simple analysis.
> 
>  - folio_prep_large_rmappable() initialises the list_head.  No problem,
>    just don't do that for order-1 folios.
>  - split_huge_page_to_list() will remove the folio from the split queue.
>    No problem, just don't do that.
>  - folio_undo_large_rmappable() removes it from the list if it's
>    on the list.  Again, no problem, don't do that for order-1 folios.
>  - deferred_split_scan() walks the list, it won't find any order-1
>    folios.
> 
>  - deferred_split_folio() will add the folio to the list.  Returning
>    here will avoid adding the folio to the list.  But what consequences
>    will that have?  Ah.  There's only one caller of
>    deferred_split_folio() and it's in page_remove_rmap() ... and it's
>    only called for anon folios anyway.

Yes, the deferred split business is quite nicely localized,
which makes it easy to avoid.

> 
> So it looks like we can support order-1 folios in the page cache without
> any change in behaviour since file-backed folios were never added to
> the deferred split list.

Yes, you just have to code in a few "don't do that"s as above.

And I think it would be reasonable to allow even anon order-1 folios,
but they'd just be prevented from participating in the deferred split
business.

Allowing a PMD to be split at any time is necessary for correctness;
then allowing the underlying folio to be split is a nice-to-have when
trying to meet competition for memory, but it's not a correctness issue.
And the smaller the folio to be split, the less the saving: a lot of
unneeded order-1s could still waste a lot of memory, but not so
seriously as the order-9s.

> 
> Now, is this the right direction?  Is it a bug that we never called
> deferred_split_folio() for pagecache folios?  I would defer to Hugh
> or Kirill on this.  Ccs added.

No, not a bug.

The thing with anon folios is that they only have value while mapped
(let's ignore swap and GUP for simplicity), so when page_remove_rmap()
is called in the unmapping, that's a good hint that the hugeness of
the folio is no longer worthwhile; but it's a bad moment for splitting
because of locks held, and quite possibly a stupid time for splitting
too (it may be just one part of unmapping a large range, when splitting
would merely slow it all down).  Hence anon's deferred split queue.

But for file pages, the file contents must retain their value whether
mapped or not.  The moment to consider splitting the folio itself is
when truncating or punching a hole; and it happens that there is not
a locking problem then, nor any overhead added to eviction.  Hence no
deferred split queue for file.

Shmem does have its per-superblock shmem_unused_huge_shrink(), for
splitting huge folios at EOF when under memory pressure (and it would
be better if it were not restricted to EOF, but could retry folios
which had been EBUSY or EAGAIN at the time of hole-punch).  Maybe
other large folio filesystems should also have such a shrinker.

Hugh


Thread overview: 7+ messages
     [not found] <B467D07C-00D2-47C6-A034-2D88FE88A092@dubeyko.com>
2023-12-03 21:27 ` Issue with 8K folio size in __filemap_get_folio() Matthew Wilcox
2023-12-03 23:12   ` Matthew Wilcox
2023-12-04  5:57     ` Hugh Dickins [this message]
2023-12-04 15:09     ` David Hildenbrand
2023-12-04 16:51       ` David Hildenbrand
2023-12-04 17:17       ` Matthew Wilcox
2023-12-04 17:22         ` David Hildenbrand
