* State of the Page (August 2022)
@ 2022-08-11 21:31 Matthew Wilcox
  2022-08-12 10:16 ` Kirill A. Shutemov
  2022-08-13 15:21 ` Mike Rapoport
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2022-08-11 21:31 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, linux-fsdevel

==============================
State Of The Page, August 2022
==============================

I thought I'd write down where we are with struct page and where
we're going, just to make sure we're all (still?) pulling in a similar
direction.

Destination
===========

For some users, the size of struct page is simply too large.  At 64
bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
which is an acceptable overhead.

   struct page {
      unsigned long mem_desc;
   };

Types of memdesc
----------------

This is very much subject to change as new users present themselves.
Here are the current ones in-plan:

 - Undescribed.  Instead of the rest of the word being a pointer,
   there are 2^28 subtypes available:
   - Unmappable.  Typically device drivers allocating private memory.
   - Reserved.  These pages are not allocatable.
   - HWPoison
   - Offline (eg balloon)
   - Guard (see debug_pagealloc)
 - Slab
 - Anon Folio
 - File Folio
 - Buddy (ie free -- also for PCP?)
 - Page Table
 - Vmalloc
 - Net Pool
 - Zsmalloc
 - Z3Fold
 - Mappable.  Typically device drivers mapping memory to userspace

That implies 4 bits needed for the tag, so all memdesc allocations
must be 16-byte aligned.  That is not an undue burden.  Memdescs
must also be TYPESAFE_BY_RCU if they are mappable to userspace or
can be stored in a file's address_space.
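
To make the encoding concrete, here is a minimal sketch of how the type
and pointer might be extracted from the word (helper names illustrative,
not settled):

   #define MEMDESC_TYPE_MASK	0xfUL

   /* The low 4 bits select one of the types listed above. */
   static inline unsigned int memdesc_type(const struct page *page)
   {
           return page->mem_desc & MEMDESC_TYPE_MASK;
   }

   /* For pointer-carrying types, the 16-byte alignment of every
    * memdesc allocation guarantees the low 4 bits are zero and
    * thus free to hold the tag. */
   static inline void *memdesc_ptr(const struct page *page)
   {
           return (void *)(page->mem_desc & ~MEMDESC_TYPE_MASK);
   }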

It may be worth distinguishing between vmalloc-mappable and
vmalloc-unmappable to prevent some things being mapped to userspace
inadvertently.

Contents of a memdesc
---------------------

At least initially, the first word of a memdesc must be identical to the
current page flags.  That allows various functions (eg set_page_dirty())
to work on any kind of page without needing to know whether it's a device
driver page, a vmalloc page, anon or file folio.
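
As a sketch (reusing the hypothetical memdesc_ptr() from above, and
assuming the type in question carries a pointer), such a generic helper
might look like:

   static inline void memdesc_set_dirty(struct page *page)
   {
           /* The first word of every memdesc is the flags word. */
           unsigned long *flags = memdesc_ptr(page);

           set_bit(PG_dirty, flags);
   }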

Similarly, both anon and file folios must have the list_head in the
same place so they can be placed on the same LRU list.  Whether anon
and file folios become separate types is still unclear to me.

Mappable
--------

All pages mapped to userspace must have:

 - A refcount
 - A mapcount

Preferably in the same place in the memdesc so we can handle them without
having separate cases for each type of memdesc.  It would be nice to have
a pincount as well, but that's already an optional feature.

I propose:

   struct mappable {
       unsigned long flags;	/* contains dirty flag */
       atomic_t _refcount;
       atomic_t _mapcount;
   };

   struct folio {
      union {
         unsigned long flags;
         struct mappable m;
      };
      ...
   };

Memdescs which should never be mapped to userspace (eg slab, page tables,
zsmalloc) do not need to contain such a struct.
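
One thing this layout buys us, sketched below: a single speculative
reference helper works for every mappable memdesc, whatever its actual
type (the helper name is illustrative):

   static inline bool mappable_get_unless_zero(struct mappable *m)
   {
           return atomic_add_unless(&m->_refcount, 1, 0);
   }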

Mapcount
--------

While discussed above, handling mapcount is tricky enough to need its
own section.  Since folios can be mapped unaligned, we may need to
increment mapcount once per page table entry that refers to it.  This
is different from how THPs are handled today (one mapcount per page
plus a compound_mapcount for how many times the entire THP is mapped).
So splitting a PMD entry results in incrementing mapcount by
(PTRS_PER_PMD - 1).

If the mapcount is raised to dangerously high levels, we can split
the page.  This should not happen in normal operation.
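
As a worked example (a sketch using the struct mappable proposed above):
on x86-64, PTRS_PER_PMD is 512, so splitting a PMD-mapped folio adds 511
to the mapcount; the PMD mapping itself already contributed one:

   /* One PMD mapping becomes PTRS_PER_PMD PTE mappings. */
   atomic_add(PTRS_PER_PMD - 1, &folio->m._mapcount);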

Extended Memdescs
-----------------

One of the things we're considering is that maybe a filesystem will
want to have private data allocated with its folios.  Instead of hanging
extra stuff off folio->private, they could embed a struct folio inside
a struct ext4_folio.
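
Something like the following, purely hypothetical, sketch (ext4 has no
such struct today):

   struct ext4_folio {
           struct folio folio;     /* must be the first member */
           unsigned long state;    /* fs-private data */
   };

   static inline struct ext4_folio *EXT4_F(struct folio *folio)
   {
           return container_of(folio, struct ext4_folio, folio);
   }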

Buddy memdesc
-------------

I need to firm up a plan for this.  Allocating memory in order to free
memory is generally a bad idea, so we either have to coopt the contents
of other memdescs (and some allocations don't have memdescs!) or we
need to store everything we need in the remainder of the unsigned long.
I'm not yet familiar enough with the page allocator to have a clear
picture of what is needed.

Where are we?
=============

v5.17:

 - Slab was broken out from struct page in 5.17 (thanks to Vlastimil).
 - XFS & iomap mostly converted from pages to folios
 - Block & page cache mostly have the folio interfaces in place

v5.18:

 - Large folio (multiple page) support added for filesystems that opt in
 - File truncation converted to folios
 - address_space_operations (aops) ->set_page_dirty converted to ->dirty_folio
 - Much of get_user_page() converted to folios
 - rmap_walk() converted to folios

v5.19:

 - Most aops now converted to folios
 - More folio conversions in migration, shmem, swap, vmscan 

v6.0:

 - aops->migratepage became migrate_folio
 - isolate_page and putback_page removed from aops
 - More folio conversions in migration, shmem, swap, vmscan 

Todo
====

Well, most of the above!

 - Individual filesystems need converting from pages to folios
 - Zsmalloc, z3fold, page tables, netpools need to be split from
   struct page into their own types
 - Anywhere referring to page->... needs to be converted to folio
   or some other type.

Help with any of this gratefully appreciated.  Especially if you're the
maintainer of a thing and want to convert it yourself.  I'd rather help
explain the subtleties of folios / mappables / ... to you than try
to figure out the details of your code to convert it myself (and get
it wrong).  Please contact me to avoid multiple people working on
the same thing.


* Re: State of the Page (August 2022)
  2022-08-11 21:31 State of the Page (August 2022) Matthew Wilcox
@ 2022-08-12 10:16 ` Kirill A. Shutemov
  2022-08-12 13:34   ` Matthew Wilcox
  2022-08-13 15:21 ` Mike Rapoport
  1 sibling, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2022-08-12 10:16 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, linux-fsdevel

On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> ==============================
> State Of The Page, August 2022
> ==============================
> 
> I thought I'd write down where we are with struct page and where
> we're going, just to make sure we're all (still?) pulling in a similar
> direction.
> 
> Destination
> ===========
> 
> For some users, the size of struct page is simply too large.  At 64
> bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> which is an acceptable overhead.

Right. This is attractive. But it brings the cost of indirection.

It can be especially painful for physical memory scanning. I guess we can
derive some info from the memdesc type itself, like whether it can be
movable. But it still looks like an expensive change.

Do you have any estimate of how much CPU time we will pay to reduce
memory (and cache) overhead? RAM sizes tend to grow faster than IPC.
We need to make sure this is the right direction.

>    struct page {
>       unsigned long mem_desc;
>    };
> 
> Types of memdesc
> ----------------
> 
> This is very much subject to change as new users present themselves.
> Here are the current ones in-plan:
> 
>  - Undescribed.  Instead of the rest of the word being a pointer,
>    there are 2^28 subtypes available:
>    - Unmappable.  Typically device drivers allocating private memory.
>    - Reserved.  These pages are not allocatable.
>    - HWPoison
>    - Offline (eg balloon)
>    - Guard (see debug_pagealloc)
>  - Slab
>  - Anon Folio
>  - File Folio
>  - Buddy (ie free -- also for PCP?)
>  - Page Table
>  - Vmalloc
>  - Net Pool
>  - Zsmalloc
>  - Z3Fold
>  - Mappable.  Typically device drivers mapping memory to userspace
> 
> That implies 4 bits needed for the tag, so all memdesc allocations
> must be 16-byte aligned.  That is not an undue burden.  Memdescs
> must also be TYPESAFE_BY_RCU if they are mappable to userspace or
> can be stored in a file's address_space.
> 
> It may be worth distinguishing between vmalloc-mappable and
> vmalloc-unmappable to prevent some things being mapped to userspace
> inadvertently.

Given that memdescs represent Slab too, how do we allocate them?

> 
> Contents of a memdesc
> ---------------------
> 
> At least initially, the first word of a memdesc must be identical to the
> current page flags.  That allows various functions (eg set_page_dirty())
> to work on any kind of page without needing to know whether it's a device
> driver page, a vmalloc page, anon or file folio.
> 
> Similarly, both anon and file folios must have the list_head in the
> same place so they can be placed on the same LRU list.  Whether anon
> and file folios become separate types is still unclear to me.
> 
> Mappable
> --------
> 
> All pages mapped to userspace must have:
> 
>  - A refcount
>  - A mapcount
> 
> Preferably in the same place in the memdesc so we can handle them without
> having separate cases for each type of memdesc.  It would be nice to have
> a pincount as well, but that's already an optional feature.
> 
> I propose:
> 
>    struct mappable {
>        unsigned long flags;	/* contains dirty flag */
>        atomic_t _refcount;
>        atomic_t _mapcount;
>    };
> 
>    struct folio {
>       union {
>          unsigned long flags;
>          struct mappable m;
>       };
>       ...
>    };

Hm. What would lockless page cache lookup look like in this case?

Currently it relies on get_page_unless_zero(), and to keep that working
there should be a guarantee that nothing else is allocated where the
mappable memdesc was before. Would it require some RCU tricks on memdesc
free?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: State of the Page (August 2022)
  2022-08-12 10:16 ` Kirill A. Shutemov
@ 2022-08-12 13:34   ` Matthew Wilcox
  2022-08-12 14:33     ` Kirill A. Shutemov
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2022-08-12 13:34 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm, linux-kernel, linux-fsdevel

On Fri, Aug 12, 2022 at 01:16:39PM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> > ==============================
> > State Of The Page, August 2022
> > ==============================
> > 
> > I thought I'd write down where we are with struct page and where
> > we're going, just to make sure we're all (still?) pulling in a similar
> > direction.
> > 
> > Destination
> > ===========
> > 
> > For some users, the size of struct page is simply too large.  At 64
> > bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> > struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> > which is an acceptable overhead.
> 
> Right. This is attractive. But it brings the cost of indirection.

It does, but it also crams 8 pages into a single cacheline instead of
occupying one cacheline per page.

> It can be especially painful for physical memory scanning. I guess we can
> derive some info from the memdesc type itself, like whether it can be
> movable. But it still looks like an expensive change.

I just don't think of physical memory scanning as something we do
often, or in a performance-sensitive path.  I'm OK with slowing down
kcompactd if it makes walking the LRU list faster.

> Do you have any estimate of how much CPU time we will pay to reduce
> memory (and cache) overhead? RAM sizes tend to grow faster than IPC.
> We need to make sure this is the right direction.

I don't.  I've heard colourful metaphors from the hyperscale crowd about
how many more VMs they could sell, usually in terms of putting pallets
of money in the parking lot and setting them on fire.  But IPC isn't the
right metric either, CPU performance is all about cache misses these days.

> > That implies 4 bits needed for the tag, so all memdesc allocations
> > must be 16-byte aligned.  That is not an undue burden.  Memdescs
> > must also be TYPESAFE_BY_RCU if they are mappable to userspace or
> > can be stored in a file's address_space.
> > 
> > It may be worth distinguishing between vmalloc-mappable and
> > vmalloc-unmappable to prevent some things being mapped to userspace
> > inadvertently.
> 
> Given that memdescs represent Slab too, how do we allocate them?

First, we separate out allocating pages from allocating their memdesc.  ie:

struct folio *folio_alloc(u8 order, gfp_t gfp)
{
	struct folio *folio = kmem_cache_alloc(folio_cache, gfp);

	if (!folio)
		return NULL;
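	/* page_alloc_desc() (hypothetical) allocates 2^order pages and
	 * points each page's memdesc word at the new folio. */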
	if (page_alloc_desc(order, folio, gfp))
		return folio;
	kmem_cache_free(folio_cache, folio);
	return NULL;
}

That can't work for slab because we might recurse for ever.  So we
have to do it backwards:

struct slab *slab_alloc(size_t size, u8 order, gfp_t gfp)
{
	struct slab *slab;
	struct page *page = page_alloc(order, gfp);

	if (!page)
		return NULL;
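	/*
	 * Bootstrap: when this allocation is itself a slab of struct
	 * slabs, recursing into kmem_cache_alloc() would never
	 * terminate, so the descriptor lives in the page it describes.
	 */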
	if (sizeof(*slab) == size) {
		slab = page_address(page);
		slab_init(slab, 1);
	} else {
		slab = kmem_cache_alloc(slab_cache, gfp);
		if (!slab) {
			page_free(page, order);
			return NULL;
		}
	}
	page_set_memdesc(page, order, slab);
	return slab;
}

So there is mutual recursion between kmem_cache_alloc() and
slab_alloc(), but it stops after one round.  (obviously this is
just a sketch of a solution)

folio_alloc()
  kmem_cache_alloc(folio)
    page_alloc(folio)
      kmem_cache_alloc(slab)
        page_alloc(slab)
  page_alloc_desc() 

Slab then has to be taught that a slab with a single object allocated
(ie itself) is actually free and can be released back to the pool,
but that seems like a SMOP.

> > Mappable
> > --------
> > 
> > All pages mapped to userspace must have:
> > 
> >  - A refcount
> >  - A mapcount
> > 
> > Preferably in the same place in the memdesc so we can handle them without
> > having separate cases for each type of memdesc.  It would be nice to have
> > a pincount as well, but that's already an optional feature.
> > 
> > I propose:
> > 
> >    struct mappable {
> >        unsigned long flags;	/* contains dirty flag */
> >        atomic_t _refcount;
> >        atomic_t _mapcount;
> >    };
> > 
> >    struct folio {
> >       union {
> >          unsigned long flags;
> >          struct mappable m;
> >       };
> >       ...
> >    };
> 
> Hm. What would lockless page cache lookup look like in this case?
> 
> Currently it relies on get_page_unless_zero(), and to keep that working
> there should be a guarantee that nothing else is allocated where the
> mappable memdesc was before. Would it require some RCU tricks on memdesc
> free?

An earlier paragraph has:

> > That implies 4 bits needed for the tag, so all memdesc allocations
> > must be 16-byte aligned.  That is not an undue burden.  Memdescs
> > must also be TYPESAFE_BY_RCU if they are mappable to userspace or
> > can be stored in a file's address_space.

so yes, I agree, we need this RCU trick to make sure the memdesc remains a
memdesc of the right type, even if it's no longer attached to the right
chunk of memory.
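
Concretely, something like this sketch.  The cache would be created with
the real SLAB_TYPESAFE_BY_RCU flag:

	folio_cache = kmem_cache_create("folio", sizeof(struct folio),
					16, SLAB_TYPESAFE_BY_RCU, NULL);

and the lockless lookup keeps the shape it has in today's page cache:
take the reference speculatively, then recheck that the folio is still
the one the address_space points to:

	rcu_read_lock();
repeat:
	folio = xas_load(&xas);
	if (folio && !folio_try_get_rcu(folio))
		goto repeat;	/* refcount was zero; it is being freed */
	if (folio && unlikely(folio != xas_reload(&xas))) {
		/* TYPESAFE_BY_RCU: still a folio, but a recycled one */
		folio_put(folio);
		goto repeat;
	}
	rcu_read_unlock();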


* Re: State of the Page (August 2022)
  2022-08-12 13:34   ` Matthew Wilcox
@ 2022-08-12 14:33     ` Kirill A. Shutemov
  2022-08-12 14:39       ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2022-08-12 14:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, linux-fsdevel

On Fri, Aug 12, 2022 at 02:34:53PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 12, 2022 at 01:16:39PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> > > ==============================
> > > State Of The Page, August 2022
> > > ==============================
> > > 
> > > I thought I'd write down where we are with struct page and where
> > > we're going, just to make sure we're all (still?) pulling in a similar
> > > direction.
> > > 
> > > Destination
> > > ===========
> > > 
> > > For some users, the size of struct page is simply too large.  At 64
> > > bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> > > struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> > > which is an acceptable overhead.
> > 
> > Right. This is attractive. But it brings the cost of indirection.
> 
> It does, but it also crams 8 pages into a single cacheline instead of
> occupying one cacheline per page.

If you really need info about these pages and reference their memdescs,
it is likely to be 9 cache lines scattered across memory instead of 8
cache lines next to each other in the same page.

And it's going to be two cachelines instead of one if we need info about
a single page, which I think is the most common case.

Initially, I thought we could offset the cost by caching memdescs instead
of struct page/folio, e.g. having the page cache store memdescs. But that
would require memdesc_to_pfn(), which is not possible unless we store the
pfn explicitly in the memdesc.

I don't want to be a buzzkill, I like the idea a lot, but abstractions
are often costly. Getting this upstream without noticeable performance
regressions is going to be a challenge.

> > It can be especially painful for physical memory scanning. I guess we can
> > derive some info from the memdesc type itself, like whether it can be
> > movable. But it still looks like an expensive change.
> 
> I just don't think of physical memory scanning as something we do
> often, or in a performance-sensitive path.  I'm OK with slowing down
> kcompactd if it makes walking the LRU list faster.
> 
> > Do you have any estimate of how much CPU time we will pay to reduce
> > memory (and cache) overhead? RAM sizes tend to grow faster than IPC.
> > We need to make sure this is the right direction.
> 
> I don't.  I've heard colourful metaphors from the hyperscale crowd about
> how many more VMs they could sell, usually in terms of putting pallets
> of money in the parking lot and setting them on fire.  But IPC isn't the
> right metric either, CPU performance is all about cache misses these days.

As I said above, I don't expect the new scheme to be cache-friendly
either.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: State of the Page (August 2022)
  2022-08-12 14:33     ` Kirill A. Shutemov
@ 2022-08-12 14:39       ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2022-08-12 14:39 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm, linux-kernel, linux-fsdevel

On Fri, Aug 12, 2022 at 05:33:56PM +0300, Kirill A. Shutemov wrote:
> If you really need info about these pages and reference their memdescs,
> it is likely to be 9 cache lines scattered across memory instead of 8
> cache lines next to each other in the same page.

Well, hopefully not.  Most allocations should be multiple pages.  That's
already true for slab, netpool and file (for xfs anyway), and hopefully
soon for anon.

> Initially, I thought we could offset the cost by caching memdescs instead
> of struct page/folio, e.g. having the page cache store memdescs. But that
> would require memdesc_to_pfn(), which is not possible unless we store the
> pfn explicitly in the memdesc.

I think we do, at least for some memdescs.  File folios definitely want
to store the pfn, but I don't think getting the PFN for a slab is a
common operation (although we'll still need to store the pointer to
the struct page, so it's equivalent).

> I don't want to be a buzzkill, I like the idea a lot, but abstractions
> are often costly. Getting this upstream without noticeable performance
> regressions is going to be a challenge.

I don't think there's a way to find out whether it'll be a performance
win without actually doing it.  Fortunately, the steps to get to this
point are mostly good cleanups anyway.



* Re: State of the Page (August 2022)
  2022-08-11 21:31 State of the Page (August 2022) Matthew Wilcox
  2022-08-12 10:16 ` Kirill A. Shutemov
@ 2022-08-13 15:21 ` Mike Rapoport
  2022-08-14 10:57   ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Rapoport @ 2022-08-13 15:21 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, linux-fsdevel

Hi,

On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> ==============================
> State Of The Page, August 2022
> ==============================
> 
> I thought I'd write down where we are with struct page and where
> we're going, just to make sure we're all (still?) pulling in a similar
> direction.
> 
> Destination
> ===========
> 
> For some users, the size of struct page is simply too large.  At 64
> bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> which is an acceptable overhead.
> 
>    struct page {
>       unsigned long mem_desc;
>    };

This is 0.2% for a system that does not have any actual memdescs.

Do you have an estimate of how much memory will be used by the memdescs,
at least for some use cases?

Another thing: we are very strict about keeping struct page at its
current size. Don't you think it will be much more tempting to grow the
various memdescs, so that for some use cases the overhead will be at
least as big as it is now?
 
> Types of memdesc
> ----------------
> 
> This is very much subject to change as new users present themselves.
> Here are the current ones in-plan:
> 
>  - Undescribed.  Instead of the rest of the word being a pointer,
>    there are 2^28 subtypes available:
>    - Unmappable.  Typically device drivers allocating private memory.
>    - Reserved.  These pages are not allocatable.
>    - HWPoison
>    - Offline (eg balloon)
>    - Guard (see debug_pagealloc)
>  - Slab
>  - Anon Folio
>  - File Folio
>  - Buddy (ie free -- also for PCP?)
>  - Page Table
>  - Vmalloc
>  - Net Pool
>  - Zsmalloc
>  - Z3Fold
>  - Mappable.  Typically device drivers mapping memory to userspace
> 
> That implies 4 bits needed for the tag, so all memdesc allocations
> must be 16-byte aligned.  That is not an undue burden.  Memdescs
> must also be TYPESAFE_BY_RCU if they are mappable to userspace or
> can be stored in a file's address_space.
> 
> It may be worth distinguishing between vmalloc-mappable and
> vmalloc-unmappable to prevent some things being mapped to userspace
> inadvertently.

-- 
Sincerely yours,
Mike.


* Re: State of the Page (August 2022)
  2022-08-13 15:21 ` Mike Rapoport
@ 2022-08-14 10:57   ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2022-08-14 10:57 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, linux-kernel, linux-fsdevel

On Sat, Aug 13, 2022 at 06:21:12PM +0300, Mike Rapoport wrote:
> > For some users, the size of struct page is simply too large.  At 64
> > bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> > struct page down to an 8 byte tagged pointer, it will be 0.2% of memory,
> > which is an acceptable overhead.
> > 
> >    struct page {
> >       unsigned long mem_desc;
> >    };
> 
> This is 0.2% for a system that does not have any actual memdescs.
> 
> Do you have an estimate of how much memory will be used by the memdescs,
> at least for some use cases?

Sure.  For SLUB, we can see it today,

struct slab {
        unsigned long __page_flags;
        union {
                struct list_head slab_list;
                struct rcu_head rcu_head;
        };
        struct kmem_cache *slab_cache;
        /* Double-word boundary */
        void *freelist;         /* first free object */
        union {
                unsigned long counters;
                struct {
                        unsigned inuse:16;
                        unsigned objects:15;
                        unsigned frozen:1;
                };
        };
        unsigned int __unused;
        atomic_t __page_refcount;
#ifdef CONFIG_MEMCG
        unsigned long memcg_data;
#endif
};

That's 8 words on 64-bit, or 64 bytes.  We'll get to remove __unused and
__page_refcount which brings us back down to 56 bytes, but we'll need to
add a pointer to struct page, bringing us back up to 64 bytes.  Note
that this is per-allocation, so to calculate the amount of space used on
your system, you need to take each line like this:

radix_tree_node   189800 278348    584   28    4 : tunables    0    0    0 : slabdata   9941   9941      0

The last number before the first colon is the number of pages per slab,
and the first number after "slabdata" is the number of slabs, so my
system has currently allocated 9941 slabs, each with 4 pages in it.
Current memory consumption is 64 * 4 * 9941 = ~2.5MB.  With
separately allocated memdescs, it's 8 * 4 * 9941 + 64 * 9941, or just
under 1MB.  Would need to repeat this calculation for each line of
slabinfo.


For other users, it depends how they evolve.  In my quick sketch, I
decided that adding pfn to struct folio was a good idea, but adding
a pointer to the page wasn't needed (for the few times it's needed,
we can call pfn_to_page()).  So struct folio will grow from 64 bytes
to 72 in order to add the pfn.  We'll also need to include the size
of subsequent fields currently stored in page[1], so dtor, order,
total_mapcount and pincount, bumping large folios up to 88 bytes.
If the mean size of a folio is 2 pages, then it's 88 + 2 * 8 = 104 bytes
per allocation instead of the current 128 bytes.  So it's still a win,
as long as we don't cache a lot of files less than 4kB.

> Another thing: we are very strict about keeping struct page at its
> current size. Don't you think it will be much more tempting to grow the
> various memdescs, so that for some use cases the overhead will be at
> least as big as it is now?

Possibly!  But we get to make that choice.  If the networking people want
to grow the size of the netpool memdesc, you and I don't need to care.
They don't need to negotiate with the MM people about the tradeoffs
involved, they can just do it, benchmark, and decide whether it makes
sense to them.

This is more of an opportunity than a potential downside.  Maybe we can
get rid of page_ext.  Yes, people who enable the features in page_ext
will see their memdescs grow, but they get rid of the separate page_ext array
in the process.

Thanks for the feedback.
