All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [GIT PULL] Memory folios for v5.15
Date: Fri, 27 Aug 2021 10:07:16 -0400	[thread overview]
Message-ID: <YSjxlNl9jeEX2Yff@cmpxchg.org> (raw)
In-Reply-To: <20210826004555.GF12597@magnolia>

On Wed, Aug 25, 2021 at 05:45:55PM -0700, Darrick J. Wong wrote:
> Pardon my ignorance, but ... how would adding yet another layer help a
> filesystem?  No matter how the software is structured, we have to set up
> and manage the (hardware) page state for programs, and we must keep that
> coherent with the file space mappings that we maintain.  I already know
> how to deal with pages and dealing with "folios" seems about the same.
> Adding another layer of caching structures just adds another layer of
> cra^Wcoherency management for a filesystem to screw up.
> 
> The folios change management of memory pages enough to disentangle the
> page/compound page confusion that exists now, and it seems like a
> reasonable means to supporting unreasonable things like copy on write
> storage for filesystems with a 56k block size.
> 
> (And I'm sure I'll get tons of blowback for this, but XFS can manage
> space in weird units like that (configure the rt volume, set a 56k rt
> extent size, and all the allocations are multiples of 56k); if we ever
> wanted to support reflink on /that/ hot mess, it would be awesome to be
> able to say that we're only going to do 56k folios in the page cache for
> those files instead of the crazy writeback games that the prototype
> patchset does now.)

I'm guessing the reason you want 56k blocks is because with a larger
filesystems and faster drives it would be a more reasonable unit for
managing this amount of data than 4k would be.

We have the same thoughts in MM and growing memory sizes. The DAX
stuff said from the start it won't be built on linear struct page
mappings anymore because we expect the memory modules to be too big to
manage them with such fine-grained granularity. But in practice, this
is more and more becoming true for DRAM as well. We don't want to
allocate gigabytes of struct page when on our servers only a very
small share of overall memory needs to be managed at this granularity.

Folio perpetuates the problem of the base page being the floor for
cache granularity, and so from an MM POV it doesn't allow us to scale
up to current memory sizes without horribly regressing certain
filesystem workloads that still need us to be able to scale down.


But there is something more important that I wish more MM people would
engage on: When you ask for 56k/2M/whatever buffers, the MM has to be
able to *allocate* them.

I'm assuming that while you certainly have preferences, you don't rely
too much on whether that memory is composed of a contiguous chunk of
4k pages, a single 56k page, a part of a 2M page, or maybe even
discontig 4k chunks with an SG API. You want to manage your disk space
one way, but you could afford the MM some flexibility to do the right
thing under different levels of memory load, and allow it to scale in
the direction it needs for its own purposes.

But if folios are also the low-level compound pages used throughout
the MM code, we're tying these fs allocations to the requirement of
being physically contiguous. This is a much more difficult allocation
problem. And from the MM side, we have a pretty poor track record of
serving contiguous memory larger than the base page size.

Since forever have non-MM people assumed that because the page
allocator takes an order argument you could make arbitrary 2^n
requests. When they inevitably complain that it doesn't work, even
under light loads, we tell them "lol order-0 or good luck".

Compaction has improved our ability to serve these requests, but only
*if you bring the time for defragmentation*. Many allocations
don't. THP has been around for years, but honestly it doesn't really
work in general purpose environments. Yeah if you have some HPC number
cruncher that allocates all the anon at startup and then runs for
hours, it's fine. But in a more dynamic environment after some uptime,
the MM code just isn't able to produce these larger pages reliably and
within a reasonable deadline. I'm assuming filesystem workloads won't
bring the necessary patience for this either.

We've effectively declared bankruptcy on this already. Many requests
have been replaced with kvmalloc(), and THP has been mostly relegated
to the optimistic background tinkering of khugepaged. You can't rely
on it, so you need to structure your expectations around it, and
perform well when it isn't. This will apply to filesystems as well.

I really don't think it makes sense to discuss folios as the means for
enabling huge pages in the page cache, without also taking a long hard
look at the allocation model that is supposed to back them. Because
you can't make it happen without that. And this part isn't looking so
hot to me, tbh.

Willy says he has future ideas to make compound pages scale. But we
have years of history saying this is incredibly hard to achieve - and
it certainly wasn't for a lack of constant trying.

Decoupling the filesystems from struct page is a necessary step. I can
also see an argument for abstracting away compound pages to clean up
the compound_head() mess in all the helpers (although I'm still not
convinced the wholesale replacement of the page concept is the best
way to achieve this). But combining the two objectives, and making
compound pages the basis for huge page cache - after everything we
know about higher-order allocs - seems like a stretch to me.

  reply	other threads:[~2021-08-27 14:05 UTC|newest]

Thread overview: 175+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23 19:01 [GIT PULL] Memory folios for v5.15 Matthew Wilcox
2021-08-23 21:26 ` Johannes Weiner
2021-08-23 22:06   ` Linus Torvalds
2021-08-23 22:06     ` Linus Torvalds
2021-08-24  2:20     ` Matthew Wilcox
2021-08-24 13:04     ` Matthew Wilcox
2021-08-23 22:15   ` Matthew Wilcox
2021-08-24 18:32     ` Johannes Weiner
2021-08-24 18:59       ` Linus Torvalds
2021-08-24 18:59         ` Linus Torvalds
2021-08-25  6:39         ` Christoph Hellwig
2021-08-24 19:44       ` Matthew Wilcox
2021-08-25 15:13         ` Johannes Weiner
2021-08-26  0:45           ` Darrick J. Wong
2021-08-27 14:07             ` Johannes Weiner [this message]
2021-08-27 18:44               ` Matthew Wilcox
2021-08-27 21:41                 ` Dan Williams
2021-08-27 21:41                   ` Dan Williams
2021-08-27 21:49                   ` Matthew Wilcox
2021-08-30 17:32                 ` Johannes Weiner
2021-08-30 18:22                   ` Matthew Wilcox
2021-08-30 20:27                     ` Johannes Weiner
2021-08-30 21:38                       ` Matthew Wilcox
2021-08-31 17:40                         ` Vlastimil Babka
2021-09-01 17:43                         ` Johannes Weiner
2021-09-02 15:13                           ` Zi Yan
2021-09-06 14:00                             ` Vlastimil Babka
2021-08-31 18:50                       ` Eric W. Biederman
2021-08-31 18:50                         ` Eric W. Biederman
2021-08-26  8:58         ` David Howells
2021-08-27 10:03           ` Johannes Weiner
2021-08-27 12:05             ` Matthew Wilcox
2021-08-27 10:49           ` David Howells
2021-08-24 15:54   ` David Howells
2021-08-24 17:56     ` Matthew Wilcox
2021-08-24 18:26       ` Linus Torvalds
2021-08-24 18:26         ` Linus Torvalds
2021-08-24 18:29         ` Linus Torvalds
2021-08-24 18:29           ` Linus Torvalds
2021-08-24 19:26           ` Theodore Ts'o
2021-08-24 19:34           ` David Howells
2021-08-24 20:02             ` Theodore Ts'o
2021-08-24 21:32             ` David Howells
2021-08-25 12:08               ` Jeff Layton
2021-08-24 19:01         ` Matthew Wilcox
2021-08-24 19:11           ` Linus Torvalds
2021-08-24 19:11             ` Linus Torvalds
2021-08-24 19:23             ` Matthew Wilcox
2021-08-24 19:44               ` Theodore Ts'o
2021-08-24 20:00                 ` Matthew Wilcox
2021-08-25  6:32                 ` Christoph Hellwig
2021-08-25  9:01                   ` Rasmus Villemoes
2021-08-26  6:32                     ` Amir Goldstein
2021-08-26  6:32                       ` Amir Goldstein
2021-08-25 12:03                   ` Jeff Layton
2021-08-25 12:03                     ` Jeff Layton
2021-08-26  0:59                     ` Darrick J. Wong
2021-08-26  4:02                   ` Nicholas Piggin
2021-09-01 12:58                 ` Mike Rapoport
2021-08-24 19:35             ` David Howells
2021-08-24 20:35               ` Vlastimil Babka
2021-08-24 20:40                 ` Vlastimil Babka
2021-08-24 19:11         ` David Howells
2021-08-24 19:25           ` Linus Torvalds
2021-08-24 19:25             ` Linus Torvalds
2021-08-24 19:38             ` Linus Torvalds
2021-08-24 19:38               ` Linus Torvalds
2021-08-24 19:48               ` Linus Torvalds
2021-08-24 19:48                 ` Linus Torvalds
2021-08-26 17:18                 ` Matthew Wilcox
2021-08-24 19:59             ` David Howells
2021-10-05 13:52   ` Matthew Wilcox
2021-10-05 17:29     ` Johannes Weiner
2021-10-05 17:32       ` David Hildenbrand
2021-10-05 18:30       ` Matthew Wilcox
2021-10-05 19:56         ` Jason Gunthorpe
2021-08-28  3:29 ` Matthew Wilcox
2021-09-09 12:43 ` Christoph Hellwig
2021-09-09 13:56   ` Vlastimil Babka
2021-09-09 18:16     ` Johannes Weiner
2021-09-09 18:44       ` Matthew Wilcox
2021-09-09 22:03         ` Johannes Weiner
2021-09-09 22:48           ` Matthew Wilcox
2021-09-09 19:17     ` John Hubbard
2021-09-09 19:23       ` Matthew Wilcox
2021-09-10 20:16 ` Folio discussion recap Kent Overstreet
2021-09-11  1:23   ` Kirill A. Shutemov
2021-09-13 11:32     ` Michal Hocko
2021-09-13 18:12       ` Johannes Weiner
2021-09-15 15:40   ` Johannes Weiner
2021-09-15 17:55     ` Damian Tometzki
2021-09-16  2:58     ` Darrick J. Wong
2021-09-16 16:54       ` Johannes Weiner
2021-09-17  5:24         ` Dave Chinner
2021-09-17  7:18           ` Christoph Hellwig
2021-09-17 16:31           ` Johannes Weiner
2021-09-17 20:57             ` Kirill A. Shutemov
2021-09-17 21:17               ` Kent Overstreet
2021-09-17 22:02                 ` Kirill A. Shutemov
2021-09-17 22:21                   ` Kent Overstreet
2021-09-17 23:15               ` Johannes Weiner
2021-09-20 10:03                 ` Kirill A. Shutemov
2021-09-17 21:13             ` Kent Overstreet
2021-09-17 22:25               ` Theodore Ts'o
2021-09-17 23:35                 ` Josef Bacik
2021-09-18  1:04             ` Dave Chinner
2021-09-18  4:51               ` Kent Overstreet
2021-09-20  1:04                 ` Dave Chinner
2021-09-16 21:58       ` David Howells
2021-09-20  2:17   ` Matthew Wilcox
2021-09-21 19:47     ` Johannes Weiner
2021-09-21 20:38       ` Matthew Wilcox
2021-09-21 21:11         ` Kent Overstreet
2021-09-21 21:22           ` Folios for 5.15 request - Was: re: Folio discussion recap - Kent Overstreet
2021-09-22 15:08             ` Johannes Weiner
2021-09-22 15:46               ` Kent Overstreet
2021-09-22 16:26                 ` Matthew Wilcox
2021-09-22 16:56                   ` Chris Mason
2021-09-22 19:54                     ` Matthew Wilcox
2021-09-22 20:15                       ` Kent Overstreet
2021-09-22 20:21                       ` Linus Torvalds
2021-09-22 20:21                         ` Linus Torvalds
2021-09-23  5:42               ` Kent Overstreet
2021-09-23 18:00                 ` Johannes Weiner
2021-09-23 19:31                   ` Matthew Wilcox
2021-09-23 20:20                   ` Kent Overstreet
2021-10-16  3:28               ` Matthew Wilcox
2021-10-18 16:47                 ` Johannes Weiner
2021-10-18 18:12                   ` Kent Overstreet
2021-10-18 20:45                     ` Johannes Weiner
2021-10-19 16:11                       ` Splitting struct page into multiple types " Kent Overstreet
2021-10-19 17:06                         ` Gao Xiang
2021-10-19 17:34                           ` Matthew Wilcox
2021-10-19 17:54                             ` Gao Xiang
2021-10-20 17:46                               ` Kent Overstreet
2021-10-19 17:37                         ` Jason Gunthorpe
2021-10-19 21:14                       ` David Howells
2021-10-18 18:28                   ` Folios for 5.15 request " Matthew Wilcox
2021-10-18 21:56                     ` Johannes Weiner
2021-10-18 23:16                       ` Kirill A. Shutemov
2021-10-19 15:16                         ` Johannes Weiner
2021-10-20  3:19                           ` Matthew Wilcox
2021-10-20  7:50                           ` David Hildenbrand
2021-10-20 17:26                             ` Matthew Wilcox
2021-10-20 18:04                               ` David Hildenbrand
2021-10-21  6:51                                 ` Christoph Hellwig
2021-10-21  7:21                                   ` David Hildenbrand
2021-10-21 12:03                                     ` Kent Overstreet
2021-10-21 12:35                                       ` David Hildenbrand
2021-10-21 12:38                                         ` Christoph Hellwig
2021-10-21 13:00                                           ` David Hildenbrand
2021-10-21 12:41                                         ` Matthew Wilcox
2021-10-20 17:39                           ` Kent Overstreet
2021-10-21 21:37                             ` Johannes Weiner
2021-10-22  1:52                               ` Matthew Wilcox
2021-10-22  7:59                                 ` David Hildenbrand
2021-10-22 13:01                                   ` Matthew Wilcox
2021-10-22 14:40                                     ` David Hildenbrand
2021-10-23  2:22                                       ` Matthew Wilcox
2021-10-23  5:02                                         ` Christoph Hellwig
2021-10-23  9:58                                         ` David Hildenbrand
2021-10-23 16:00                                           ` Kent Overstreet
2021-10-23 21:41                                             ` Matthew Wilcox
2021-10-23 22:23                                               ` Kent Overstreet
2021-10-25 15:35                                 ` Johannes Weiner
2021-10-25 15:52                                   ` Matthew Wilcox
2021-10-25 16:05                                   ` Kent Overstreet
2021-10-16 19:07               ` Matthew Wilcox
2021-10-18 17:25                 ` Johannes Weiner
2021-09-21 22:18           ` Folio discussion recap Matthew Wilcox
2021-09-23  0:45             ` Ira Weiny
2021-09-23  3:41               ` Matthew Wilcox
2021-09-23 22:12                 ` Ira Weiny
2021-09-29 15:24                   ` Matthew Wilcox
2021-09-21 21:59         ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YSjxlNl9jeEX2Yff@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=djwong@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.