From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: Struct page proposal
Date: Tue, 28 Sep 2021 04:19:17 +0100
Message-ID: <YVKJtQZFql8yTiyy@casper.infradead.org>
In-Reply-To: <YVIKlcgvN19BSZsu@moria.home.lan>

On Mon, Sep 27, 2021 at 02:16:53PM -0400, Kent Overstreet wrote:
> On Mon, Sep 27, 2021 at 07:12:19PM +0100, Matthew Wilcox wrote:
> > On Mon, Sep 27, 2021 at 02:09:49PM -0400, Kent Overstreet wrote:
> > > On Mon, Sep 27, 2021 at 07:05:26PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Sep 27, 2021 at 07:48:15PM +0200, Vlastimil Babka wrote:
> > > > > On 9/23/21 03:21, Kent Overstreet wrote:
> > > > > > So if we have this:
> > > > > > 
> > > > > > struct page {
> > > > > > 	unsigned long	allocator;
> > > > > > 	unsigned long	allocatee;
> > > > > > };
> > > > > > 
> > > > > > The allocator field would be used for either a pointer to slab/slub's state, if
> > > > > > it's a slab page, or if it's a buddy allocator page it'd encode the order of the
> > > > > > allocation - like compound order today, and probably whether or not the
> > > > > > (compound group of) pages is free.
> > > > > 
> > > > > The "free page in buddy allocator" case will be interesting to implement.
> > > > > What the buddy allocator uses today is:
> > > > > 
> > > > > - PageBuddy - determine if page is free; a page_type (part of mapcount
> > > > > field) today, could be a bit in "allocator" field that would have to be 0 in
> > > > > all other "page is allocated" contexts.
> > > > > > - nid/zid - to prevent merging across node/zone boundaries, now part of
> > > > > page flags
> > > > > - buddy order
> > > > > - a list_head (reusing the "lru") to hold the struct page on the appropriate
> > > > > free list, which has to be double-linked so page can be taken from the
> > > > > middle of the list instantly
> > > > > 
> > > > > > Won't be easy to cram all that into two unsigned longs, or even a single
> > > > > one. We should avoid storing anything in the free page itself. Allocating
> > > > > some external structures to track free pages is going to have funny
> > > > > bootstrap problems. Probably a major redesign would be needed...
> > > > 
> > > > Wait, why do we want to avoid using the memory that we're allocating?
> > > 
> > > The issue is where to stick the state for free pages. If that doesn't fit in two
> > > ulongs, then we'd need a separate allocation, which means slab needs to be up
> > > and running before free pages are initialized.
> > 
> > But the thing we're allocating is at least PAGE_SIZE bytes in size.
> > Why is "We should avoid storing anything in the free page itself" true?
> 
> Good point!
> 
> Highmem and dax do complicate things though - would they make it too much of a
> hassle? You want to get rid of struct page for dax (what's the right term for
> that kind of memory?), but we're not there yet, right?

DAX is used for persistent memory, often abbreviated to pmem.

"Getting there" involves rooting out struct page from all kinds of data
structures.  sg lists are the obvious place to start so that we can do
I/O to memory that's not backed by a struct page.  Get that working and
the rent-a-VM companies will love you.  Right now, we either pay the 1.6%
tax twice (once for the struct pages in the host, and once in the guest),
or we have horrendous hacks to create struct pages on the fly so the
host can do I/O to the guest's memory.
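
For a rough sense of where a figure like 1.6% can come from, here is a
minimal sketch of the arithmetic.  The sizes are assumptions, not something
stated in this thread: a 64-byte struct page and 4096-byte base pages, as on
a typical x86-64 configuration.  The function names are hypothetical and
exist only for this illustration.

```c
#include <stdio.h>

/* Assumptions (not stated in the thread): struct page is 64 bytes and the
 * base page size is 4096 bytes, as on a typical x86-64 configuration. */
#define ASSUMED_STRUCT_PAGE_BYTES 64.0
#define ASSUMED_PAGE_BYTES        4096.0

/* Per-page metadata cost as a percentage of the memory it describes. */
double overhead_percent(double meta_bytes, double page_bytes)
{
	return 100.0 * meta_bytes / page_bytes;
}

/* Cost when the same memory is described once by the host and once by the
 * guest: the two per-page costs simply add. */
double double_tax_percent(double meta_bytes, double page_bytes)
{
	return 2.0 * overhead_percent(meta_bytes, page_bytes);
}

/* Hypothetical helper that prints the figures under the assumed sizes. */
void print_overheads(void)
{
	/* The two-ulong layout quoted earlier in this thread. */
	double two_ulongs = 2.0 * sizeof(unsigned long);

	printf("struct page today:  %.4f%%\n",
	       overhead_percent(ASSUMED_STRUCT_PAGE_BYTES, ASSUMED_PAGE_BYTES));
	printf("host + guest:       %.4f%%\n",
	       double_tax_percent(ASSUMED_STRUCT_PAGE_BYTES, ASSUMED_PAGE_BYTES));
	printf("two unsigned longs: %.4f%%\n",
	       overhead_percent(two_ulongs, ASSUMED_PAGE_BYTES));
}
```

Under those assumptions the single-level cost is 64/4096, or about 1.56%,
which rounds to the 1.6% above, and paying it in both host and guest comes to
about 3.1% of the guest's memory.  Calling print_overheads() from a main()
prints the same figures, plus the cost of a two-ulong struct page for
comparison.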
