* A two-bit folio_mapcount
@ 2022-01-27 21:57 Matthew Wilcox
  2022-01-28  3:05 ` John Hubbard
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Wilcox @ 2022-01-27 21:57 UTC (permalink / raw)
  To: linux-mm

As promised, here's a half-baked proposal for making folio_mapcount()
significantly cheaper at the cost of making it less precise.
I appreciate that folio_mapcount() is not upstream yet, so take a look
at total_mapcount() if you want to understand what I'm talking about.

For a 2MB folio on a 4k architecture, you have to check 512 cachelines
to determine how many times a folio is mapped.  That's 32kB of memory,
which is a good chunk of your L1 cache.  The problem is that every PTE
mapping increments the ->mapcount of each individual page (and the number
of PMD mappings is stored separately).  To find out how many times the
entire folio is mapped, you've got to look at each constituent page.
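
To put numbers on the paragraph above, here is a rough userspace sketch.
It is not the kernel's total_mapcount(): the struct names, the 64-byte
cacheline size and the "one struct page per cacheline" layout are all
assumptions made for illustration, and the real function has extra cases
(hugetlb, file pages, PageDoubleMap) that are left out.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE     4096u
#define TOY_FOLIO_SIZE    (2u * 1024u * 1024u)
#define TOY_NR_PAGES      (TOY_FOLIO_SIZE / TOY_PAGE_SIZE)   /* 512 */
#define TOY_CACHELINE     64u                                /* assumed */

/* Hypothetical stand-in for struct page, padded to one cacheline. */
struct toy_page {
	int mapcount;                           /* PTE mappings of this page */
	char pad[TOY_CACHELINE - sizeof(int)];
};

/* Hypothetical stand-in for a 2MB compound page / folio. */
struct toy_folio {
	int pmd_mapcount;                       /* PMD mappings, stored separately */
	struct toy_page pages[TOY_NR_PAGES];
};

/* Simplified total: PMD mappings plus every page's own mapcount.
 * The loop reads one field from each of the 512 constituent pages. */
static long toy_total_mapcount(const struct toy_folio *folio)
{
	long total = folio->pmd_mapcount;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		total += folio->pages[i].mapcount;
	return total;
}

/* Bytes of cache pulled in by that loop: one cacheline per page. */
static size_t toy_bytes_touched(void)
{
	return (size_t)TOY_NR_PAGES * TOY_CACHELINE;
}
```

With these assumed sizes, toy_bytes_touched() works out to 512 * 64 =
32768 bytes, which is the 32kB figure quoted above.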

Added to that, each increment of any page's ->mapcount bumps the
refcount on the head page.  That's a lot of atomic ops, and we've had
some problems where the page refcount has been attacked resulting in
overflow.

I would like to start counting folio mapcounts in a more Discworld Troll
manner.  Zero, One, Two, Many.  That limits the total number of refcount
increments to 3.  Once you reach "Many", you've essentially lost count,
and you need to walk the interval tree to figure out exactly how many
mappings there are (this means we can no longer use mapcount to decide to
stop walking the rmap, but I think that's OK?)  You can decrement from
Two to One and One to Zero, but you can't decrement from Many to Two.
If you walk the rmap and discover there are less than Many mappings,
you can set mapcount to Two, One or Zero (adjusting page refcount at
the same time).
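
The counting rules above could be sketched roughly as follows.  This is a
single-threaded userspace toy, not proposed kernel code: every name in it
(troll_folio, troll_map(), troll_unmap(), troll_recount()) is made up for
illustration, the fields are plain integers rather than atomics, and the
rmap walk itself is assumed to have happened elsewhere and to have
produced the exact count passed to troll_recount().

```c
#include <stdbool.h>

/* The four states of the counter: Zero, One, Two, Many. */
enum troll_count { TROLL_ZERO, TROLL_ONE, TROLL_TWO, TROLL_MANY };

/* Hypothetical folio stand-in; a real version would need atomics. */
struct troll_folio {
	enum troll_count mapcount;
	unsigned int refcount;
};

/* Add a mapping.  The counter saturates at Many, so mapping can bump
 * the refcount at most three times in total. */
static void troll_map(struct troll_folio *f)
{
	if (f->mapcount == TROLL_MANY)
		return;                         /* already lost count */
	f->mapcount = (enum troll_count)(f->mapcount + 1);
	f->refcount++;
}

/* Remove a mapping.  Two -> One and One -> Zero are allowed; from Many
 * (or Zero) nothing changes and false is returned. */
static bool troll_unmap(struct troll_folio *f)
{
	if (f->mapcount == TROLL_MANY || f->mapcount == TROLL_ZERO)
		return false;
	f->mapcount = (enum troll_count)(f->mapcount - 1);
	f->refcount--;
	return true;
}

/* Called after an rmap walk found 'exact' mappings.  If that is fewer
 * than Many, leave the Many state and drop the surplus references. */
static void troll_recount(struct troll_folio *f, unsigned int exact)
{
	if (f->mapcount != TROLL_MANY || exact >= TROLL_MANY)
		return;                         /* nothing to correct */
	f->refcount -= TROLL_MANY - exact;
	f->mapcount = (enum troll_count)exact;
}
```

In this toy, ten calls to troll_map() on a fresh folio leave it at Many
with only three extra references, and the only way back down is
troll_recount() with the result of a walk.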

The mapcount would also no longer count the number of individual PTE or
PMD mappings.  Instead, it would be the number of VMAs which contain at
least one page table reference to this folio.
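
A small sketch of the difference between the two definitions, assuming a
made-up input array that records how many page table entries in each VMA
point into the folio (both function names are hypothetical):

```c
#include <stddef.h>

/* Old-style meaning: total number of page table references to the
 * folio, summed over all VMAs. */
static unsigned long count_page_table_refs(const unsigned int *refs_per_vma,
					   size_t nr_vmas)
{
	unsigned long total = 0;

	for (size_t i = 0; i < nr_vmas; i++)
		total += refs_per_vma[i];
	return total;
}

/* Proposed meaning: number of VMAs that contain at least one page
 * table reference to the folio, however many references each has. */
static unsigned long count_mapping_vmas(const unsigned int *refs_per_vma,
					size_t nr_vmas)
{
	unsigned long vmas = 0;

	for (size_t i = 0; i < nr_vmas; i++)
		if (refs_per_vma[i] > 0)
			vmas++;
	return vmas;
}
```

For an input of { 512, 0, 3, 1 } the first function gives 516 and the
second gives 3; only the second kind of number would be tracked.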

One advantage to this scheme is that it makes something like 30 bits
available in struct page.  I'm sure we'll be able to think of some good
uses for them.  PageDoubleMap also goes away (because we no longer care
whether the folio is mapped with PMDs or PTEs).
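
The "something like 30 bits" figure follows if the count lives in a
32-bit word and only needs two bits of it.  A minimal sketch of that
split, with hypothetical helper names and an arbitrary choice of the low
two bits for the count:

```c
#include <stdint.h>

#define TROLL_BITS   2u
#define TROLL_MASK   ((1u << TROLL_BITS) - 1u)   /* 0x3: Zero..Many */
#define SPARE_BITS   (32u - TROLL_BITS)          /* 30 bits left over */

/* Extract the two-bit count from the low bits of the word. */
static inline uint32_t word_count(uint32_t w)
{
	return w & TROLL_MASK;
}

/* Extract the remaining 30 bits, free for some other use. */
static inline uint32_t word_spare(uint32_t w)
{
	return w >> TROLL_BITS;
}

/* Combine a count (0..3) and a 30-bit spare value into one word. */
static inline uint32_t word_pack(uint32_t count, uint32_t spare)
{
	return (spare << TROLL_BITS) | (count & TROLL_MASK);
}
```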

So ... what's going to be made catastrophically slower by this scheme?
Maybe something involving anonymous pages?  Those tend to be my blind
spot.



* Re: A two-bit folio_mapcount
  2022-01-27 21:57 A two-bit folio_mapcount Matthew Wilcox
@ 2022-01-28  3:05 ` John Hubbard
  0 siblings, 0 replies; 2+ messages in thread
From: John Hubbard @ 2022-01-28  3:05 UTC (permalink / raw)
  To: Matthew Wilcox, linux-mm

On 1/27/22 13:57, Matthew Wilcox wrote:
> As promised, here's a half-baked proposal for making folio_mapcount()
> significantly cheaper at the cost of making it less precise.
> I appreciate that folio_mapcount() is not upstream yet, so take a look
> at total_mapcount() if you want to understand what I'm talking about.
> 
> For a 2MB folio on a 4k architecture, you have to check 512 cachelines
> to determine how many times a folio is mapped.  That's 32kB of memory,
> which is a good chunk of your L1 cache.  The problem is that every PTE
> mapping increments the ->mapcount of each individual page (and the number
> of PMD mappings is stored separately).  To find out how many times the
> entire folio is mapped, you've got to look at each constituent page.
> 
> Added to that, each increment of any of the ->mapcount bumps the
> refcount on the head page.  That's a lot of atomic ops, and we've had
> some problems where the page refcount has been attacked resulting in
> overflow.
> 
> I would like to start counting folio mapcounts in a more Discworld Troll
> manner.  Zero, One, Two, Many.  That limits the total number of refcount
> increments to 3.  Once you reach "Many", you've essentially lost count,
> and you need to walk the interval tree to figure out exactly how many
> mappings there are (this means we can no longer use mapcount to decide to
> stop walking the rmap, but I think that's OK?)  You can decrement from
> Two to One and One to Zero, but you can't decrement from Many to Two.
> If you walk the rmap and discover there are less than Many mappings,
> you can set mapcount to Two, One or Zero (adjusting page refcount at
> the same time).
> 
> The mapcount would also no longer count the number of individual PTE or
> PMD mappings.  Instead, it would be the number of VMAs which contain at
> least one page table reference to this folio.
> 
> One advantage to this scheme is that it makes something like 30 bits
> available in struct page.  I'm sure we'll be able to think of some good
> uses for them.  PageDoubleMap also goes away (because we no longer care

Such as upgrading from:
	page_maybe_dma_pinned(),
to:
	oh_yes_page_is_most_definitely_dma_pinned() !  :)

...I just can't let that idea go. haha.

thanks,
-- 
John Hubbard
NVIDIA

> whether the folio is mapped with PMDs or PTEs).
> 
> So ... what's going to be made catastrophically slower by this scheme?
> Maybe something involving anonymous pages?  Those tend to be my blind
> spot.
> 


