From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	David Howells <dhowells@redhat.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: Folios for 5.15 request - Was: re: Folio discussion recap -
Date: Fri, 22 Oct 2021 09:59:05 +0200
Message-ID: <326b5796-6ef9-a08f-a671-4da4b04a2b4f@redhat.com>
In-Reply-To: <YXIZX0truEBv2YSz@casper.infradead.org>

On 22.10.21 03:52, Matthew Wilcox wrote:
> On Thu, Oct 21, 2021 at 05:37:41PM -0400, Johannes Weiner wrote:
>> Here is my summary of the discussion, and my conclusion:
> 
> Thank you for this.  It's the clearest, most useful post on this thread,
> including my own.  It really highlights the substantial points that
> should be discussed.
> 
>> The premise of the folio was initially to simply be a type that says:
>> I'm the headpage for one or more pages. Never a tailpage. Cool.
>>
>> However, after we talked about what that actually means, we seem to
>> have some consensus on the following:
>>
>> 	1) If folio is to be a generic headpage, it'll be the new
>> 	   dumping ground for slab, network, drivers etc. Nobody is
>> 	   psyched about this, hence the idea to split the page into
>> 	   subtypes which already resulted in the struct slab patches.
>>
>> 	2) If higher-order allocations are going to be the norm, it's
>> 	   wasteful to statically allocate full descriptors at a 4k
>> 	   granularity. Hence the push to eliminate overloading and do
>> 	   on-demand allocation of necessary descriptor space.
>>
>> I think that's accurate, but for the record: is there anybody who
>> disagrees with this and insists that struct folio should continue to
>> be the dumping ground for all kinds of memory types?
> 
> I think there's a useful distinction to be drawn between "where we're
> going with this patchset", "where we're going in the next six-twelve
> months" and "where we're going eventually".  I think we have minor
> differences of opinion on the answers to those questions, and they can
> be resolved as we go, instead of up-front.
> 
> My answer to that question is that, while this full conversion is not
> part of this patch, struct folio is logically:
> 
> struct folio {
> 	... almost everything that's currently in struct page ...
> };
> 
> struct page {
>     unsigned long flags;
>     unsigned long compound_head;
>     union {
>         struct { /* First tail page only */
>             unsigned char compound_dtor;
>             unsigned char compound_order;
>             atomic_t compound_mapcount;
>             unsigned int compound_nr;
>         };
>         struct { /* Second tail page only */
>             atomic_t hpage_pinned_refcount;
>             struct list_head deferred_list;
>         };
>         unsigned long padding1[4];
>     };
>     unsigned int padding2[2];
> #ifdef CONFIG_MEMCG
>     unsigned long padding3;
> #endif
> #ifdef WANT_PAGE_VIRTUAL
>     void *virtual;
> #endif
> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
>     int _last_cpupid;
> #endif
> };
> 
> (I'm open to being told I have some of that wrong, eg maybe _last_cpupid
> is actually part of struct folio and isn't a per-page property at all)
> 
> I'd like to get there in the next year.  I think dynamically allocating
> memory descriptors is more than a year out.
> 
> Now, as far as struct folio being a dumping ground, I would like to
> split other things out from struct folio.  Let me address that below.
> 
>> Let's assume the answer is "no" for now and move on.
>>
>> If folios are NOT the common headpage type, it begs two questions:
>>
>> 	1) What subtype(s) of page SHOULD it represent?
>>
>> 	   This is somewhat unclear at this time. Some say file+anon.
>> 	   It's also been suggested everything userspace-mappable, but
>> 	   that would again bring back major type punning. Who knows?
>>
>> 	   Vocal proponents of the folio type have made conflicting
>> 	   statements on this, which certainly gives me pause.
>>
>> 	2) What IS the common type used for attributes and code shared
>> 	   between subtypes?
>>
>> 	   For example: if a folio is anon+file, then the code that
>>            maps memory to userspace needs a generic type in order to
>>            map both folios and network pages. Same as the page table
>>            walkers, and things like GUP.
>>
>> 	   Will this common type be struct page? Something new? Are we
>> 	   going to duplicate the implementation for each subtype?
>>
>> 	   Another example: GUP can return tailpages. I don't see how
>> 	   it could return folio with even its most generic definition
>> 	   of "headpage".
>>
>> (But bottomline, it's not clear how folio can be the universal
>> headpage type and simultaneously avoid being the type dumping ground
>> that the page was. Maybe I'm not creative enough?)
> 
> This whole section is predicated on "If it is NOT the headpage type",
> but I think this is a great list of why it _should_ be the generic
> headpage type.
> 
> To answer a question in here: GUP should continue to return precise
> pages because that's what its callers expect.  But we should have a
> better interface than GUP which returns a rather more compressed list
> (something like today's biovec).
> 
>> Anyway. I can even be convinced that we can figure out the exact fault
>> lines along which we split the page down the road.
>>
>> My worry is more about 2). A shared type and generic code is likely to
>> emerge regardless of how we split it. Think about it, the only world
>> in which that isn't true would be one in which either
>>
>> 	a) page subtypes are all the same, or
>> 	b) the subtypes have nothing in common
>>
>> and both are clearly bogus.
> 
> Amen!
> 
> I'm convinced that pgtable, slab and zsmalloc uses of struct page can all
> be split out into their own types instead of being folios.  They have
> little-to-nothing in common with anon+file; they can't be mapped into
> userspace and they can't be on the LRU.  The only situation you can find
> them in is something like compaction which walks PFNs.
> 
> I don't think we can split out ZONE_DEVICE and netpool into their own
> types.  While they can't be on the LRU, they can be mapped to userspace,
> like random device drivers.  So they can be found by GUP, and we want
> (need) to be able to go to folio from there in order to get, lock and
> set a folio as dirty.  Also, they have a mapcount as well as a refcount.
> 
> The real question, I think, is whether it's worth splitting anon & file
> pages out from generic pages.  I can see arguments for it, but I can also
> see arguments against it (whether it's two types: lru_mem and folio,
> three types: anon_mem, file_mem and folio or even four types: ksm_mem,
> anon_mem and file_mem).  I don't think a compelling argument has been
> made either way.
> 
> Perhaps you could comment on how you'd see separate anon_mem and
> file_mem types working for the memcg code?  Would you want to have
> separate lock_anon_memcg() and lock_file_memcg(), or would you want
> them to be cast to a common type like lock_folio_memcg()?

FWIW,

something like this would roughly express what I've been mumbling about:

anon_mem    file_mem
   |            |
   ------|------
      lru_mem       slab
         |           |
         -------------
               |
              page

I wouldn't include folios in this picture, because IMHO folios as of now
are actually what we want to be "lru_mem", just with a much clearer
name+description (again, IMHO).

Going from file_mem -> page is easy, just casting pointers.
Going from page -> file_mem requires going to the head page if it's a
compound page.
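
As a rough illustration of those two conversions, here is a userspace
toy, not kernel code. The struct page layout, the plain "head" pointer
(the kernel encodes this differently in compound_head) and the helper
names file_mem_page()/page_file_mem() are all assumptions made up for
this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout. */
struct page {
	unsigned long flags;
	struct page *head;	/* assumption: NULL unless this is a tail page */
};

/* Hypothetical leaf type that only ever describes a head page. */
struct file_mem {
	struct page page;
};

/* file_mem -> page: the first member shares the address, so no lookup. */
static inline struct page *file_mem_page(struct file_mem *fm)
{
	return &fm->page;
}

/* page -> file_mem: resolve a tail page to its head before converting. */
static inline struct file_mem *page_file_mem(struct page *page)
{
	struct page *head = page->head ? page->head : page;

	assert(head->head == NULL);	/* a head page is never a tail page */
	return (struct file_mem *)head;
}
```

The asymmetry is the point: the downward conversion is free, while the
upward one needs the head lookup.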

But we expect most interfaces to pass around a proper type (e.g.,
lru_mem) instead of a page, which avoids having to look up the compound
head page. And each function can express which type it actually wants to
consume. The filemap API wants to consume file_mem, so it should use that.
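
A minimal standalone sketch of what "each function expresses the type it
consumes" could look like, again as a userspace toy. The nesting of
lru_mem inside file_mem/anon_mem, the TOY_PG_* flag values and all
function names are hypothetical and only mirror the hierarchy drawn
above; none of this is an existing kernel API:

```c
#include <assert.h>

/* Made-up flag bits for the toy model. */
#define TOY_PG_REFERENCED	(1UL << 0)
#define TOY_PG_DIRTY		(1UL << 1)

struct page { unsigned long flags; };

/* Hypothetical shared type for everything that can sit on the LRU. */
struct lru_mem { struct page page; };

/* Hypothetical leaf types, each embedding the shared type first. */
struct file_mem { struct lru_mem lru; };
struct anon_mem { struct lru_mem lru; };

/* Leaf -> shared type: address of the first member, no head lookup. */
static inline struct lru_mem *file_mem_lru(struct file_mem *fm)
{
	return &fm->lru;
}

static inline struct lru_mem *anon_mem_lru(struct anon_mem *am)
{
	return &am->lru;
}

/* Generic LRU code consumes lru_mem and works for both leaves. */
static inline void lru_mem_mark_accessed(struct lru_mem *lm)
{
	lm->page.flags |= TOY_PG_REFERENCED;
}

/* Filemap-style code consumes file_mem only. */
static inline void file_mem_mark_dirty(struct file_mem *fm)
{
	fm->lru.page.flags |= TOY_PG_DIRTY;
}
```

With this shape, passing an anon_mem to file_mem_mark_dirty() is a
compile-time type error (at least a diagnostic) rather than a runtime
surprise.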

And IMHO, with something like the above in mind, and without a clue which
additional layers we'll really need or which additional leaves we want
to have, we would start with the leaves (e.g., file_mem, anon_mem, slab)
and work our way towards the root. Just like we already started with slab.

Maybe that makes sense.

-- 
Thanks,

David / dhildenb

