linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Free up a page flag
@ 2022-03-09 20:50 Matthew Wilcox
  2022-03-09 22:07 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2022-03-09 20:50 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Linus Torvalds

We're always running out of page flags.  Here's an attempt to free one
up for the next time somebody wants one.

It's based on the idea that neither PageReserved nor PageSlab need
(most) other flags in the flags word.  So instead of encoding PG_slab and
PG_reserved as their own flags, we can use a magic flag (tentatively named
xyzzy because I suck at naming) to indicate that it's one of these two,
and then reuse some other flags to specify which one it is.

Here's patch 1/2 which just converts PG_slab.  There should be another
one which converts PG_reserved, but it's a bit of work and I thought it
best to get feedback on this before spending more time on it.

It's "wrong" in a number of ways, including the fact that slab doesn't
actually need the atomic versions of Set and Clear; it always uses
__folio_set_slab() and __folio_clear_slab(), but I thought it was a good
idea to illustrate how one could do the atomic flag updates if necessary.
I'm tempted to say "Oh, you shouldn't", but PG_reserved is set in an
atomic manner today, even though it probably doesn't need to be.

I don't think there are any other existing flags that we can reclaim
using this technique, but maybe there are others who would love their
own flag that can be used in this manner.

It compiles, but I didn't boot it.

From 62c98e780d4929f1a51165562c955cfdee6cfa01 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Wed, 9 Mar 2022 14:48:23 -0500
Subject: [PATCH] PG_xyzzy

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/page-flags.h | 51 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1c3b6e5c8bfd..15ba61996a78 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -107,7 +107,7 @@ enum pageflags {
 	PG_workingset,
 	PG_waiters,		/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
 	PG_error,
-	PG_slab,
+	PG_xyzzy,		/* Magic.  The other flags change meaning */
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
 	PG_arch_1,
 	PG_reserved,
@@ -140,6 +140,9 @@ enum pageflags {
 #endif
 	__NR_PAGEFLAGS,
 
+	/* xyzzy flags */
+	PG_slab = PG_owner_priv_1,
+
 	PG_readahead = PG_reclaim,
 
 	/* Filesystems */
@@ -425,10 +428,54 @@ PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
 	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
-__PAGEFLAG(Slab, slab, PF_NO_TAIL)
 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
 
+#define XYZZY(name)		((1UL << PG_xyzzy) | (1UL << PG_##name))
+
+static __always_inline bool folio_test_slab(struct folio *folio)
+{
+	return (*folio_flags(folio, 0) & XYZZY(slab)) == XYZZY(slab);
+}
+
+static __always_inline bool PageSlab(struct page *page)
+{
+	return folio_test_slab(page_folio(page));
+}
+
+static __always_inline void folio_set_slab(struct folio *folio)
+{
+	unsigned long flags, *p = folio_flags(folio, 0);
+
+	do {
+		flags = READ_ONCE(*p);
+	} while (cmpxchg(p, flags, flags | XYZZY(slab)) != flags);
+}
+
+static __always_inline void folio_clear_slab(struct folio *folio)
+{
+	unsigned long flags, *p = folio_flags(folio, 0);
+
+	do {
+		flags = READ_ONCE(*p);
+	} while (cmpxchg(p, flags, flags & ~XYZZY(slab)) != flags);
+}
+
+static __always_inline void __folio_set_slab(struct folio *folio)
+{
+	*folio_flags(folio, 0) |= XYZZY(slab);
+}
+
+static __always_inline void __SetPageSlab(struct page *page)
+{
+	page->flags |= XYZZY(slab);
+}
+
+static __always_inline void __folio_clear_slab(struct folio *folio)
+{
+	*folio_flags(folio, 0) &= ~XYZZY(slab);
+}
+
 /* Xen */
 PAGEFLAG(Pinned, pinned, PF_NO_COMPOUND)
 	TESTSCFLAG(Pinned, pinned, PF_NO_COMPOUND)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC] Free up a page flag
  2022-03-09 20:50 [RFC] Free up a page flag Matthew Wilcox
@ 2022-03-09 22:07 ` Linus Torvalds
  2022-03-10  8:36   ` David Hildenbrand
  2022-03-10 15:39   ` Matthew Wilcox
  0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2022-03-09 22:07 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Linux-MM, Linux Kernel Mailing List

On Wed, Mar 9, 2022 at 12:50 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> We're always running out of page flags.  Here's an attempt to free one
> up for the next time somebody wants one.

Ugh. This is too ugly for words.

I wouldn't mind something along the conceptual lines of "these bits
are only used for this type", but I think it would need to be much
more organized and explicit, not this kind of randomness.

For example, quite a few of the page bits really only make sense for
the "page cache and anonymous pages" kind.

I think this includes some really fundamental bits like the lock bit
(and the associated waiters bit), along with a lot of the "owner" aka
"this can be used by the filesystem" bits.

I think it _also_ includes all the LRU and workingset bits etc.

So if we consider that kind of case the "normal" case, the not-normal
case is likely (a) slab, (b) reserved pages and (c) zspages.,

Which is pretty close to your "xyzzy" bit (I think you came to the
same list of "slab or reserved" conclusion because of the fundamental
issues above), but my point is that I think this approach is
acceptable if we make it much less random, and make it a lot more
explicit and thought through.

And we'd probably need to actually *verify* that we don't do things
like lock (or LRU) those non-normal pages.

We already have some page flag bits that are only used for those kinds
of odd pages: the page_flags field is used only for zspages, but other
pages can (misuse) that field for PG_buddy/offline/etc. That whole
thing is particularly ugly in how it tries to make sure there are is
no mapcount use of it.

So I do think something like your "xyzzy" bit can work, but I'd really
want it to be a lot more explicit and a lot less random than "let's
encode two special bits this way".

                   Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] Free up a page flag
  2022-03-09 22:07 ` Linus Torvalds
@ 2022-03-10  8:36   ` David Hildenbrand
  2022-03-10 15:39   ` Matthew Wilcox
  1 sibling, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2022-03-10  8:36 UTC (permalink / raw)
  To: Linus Torvalds, Matthew Wilcox; +Cc: Linux-MM, Linux Kernel Mailing List

On 09.03.22 23:07, Linus Torvalds wrote:
> On Wed, Mar 9, 2022 at 12:50 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> We're always running out of page flags.  Here's an attempt to free one
>> up for the next time somebody wants one.
> 
> Ugh. This is too ugly for words.
> 
> I wouldn't mind something along the conceptual lines of "these bits
> are only used for this type", but I think it would need to be much
> more organized and explicit, not this kind of randomness.
> 
> For example, quite a few of the page bits really only make sense for
> the "page cache and anonymous pages" kind.
> 
> I think this includes some really fundamental bits like the lock bit
> (and the associated waiters bit), along with a lot of the "owner" aka
> "this can be used by the filesystem" bits.
> 
> I think it _also_ includes all the LRU and workingset bits etc.
> 
> So if we consider that kind of case the "normal" case, the not-normal
> case is likely (a) slab, (b) reserved pages and (c) zspages.,
> 
> Which is pretty close to your "xyzzy" bit (I think you came to the
> same list of "slab or reserved" conclusion because of the fundamental
> issues above), but my point is that I think this approach is
> acceptable if we make it much less random, and make it a lot more
> explicit and thought through.
> 
> And we'd probably need to actually *verify* that we don't do things
> like lock (or LRU) those non-normal pages.
> 

Looking at isolate_movable_page(), I think we can easily end up locking
random pages temporarily. And especially non-lru page migration makes
use of it as well.

Personally, I'd not try restricting PG_lock and PG_waiter for specific
page types, it IMHO a way to generic infrastructure.

Other page flags are different and we already reuse them in different
context: e.g., reusing PG_uptodate for buddy pages via PG_reported.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] Free up a page flag
  2022-03-09 22:07 ` Linus Torvalds
  2022-03-10  8:36   ` David Hildenbrand
@ 2022-03-10 15:39   ` Matthew Wilcox
  1 sibling, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2022-03-10 15:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-MM, Linux Kernel Mailing List

On Wed, Mar 09, 2022 at 02:07:50PM -0800, Linus Torvalds wrote:
> On Wed, Mar 9, 2022 at 12:50 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > We're always running out of page flags.  Here's an attempt to free one
> > up for the next time somebody wants one.
> 
> Ugh. This is too ugly for words.
> 
> I wouldn't mind something along the conceptual lines of "these bits
> are only used for this type", but I think it would need to be much
> more organized and explicit, not this kind of randomness.

OK.  Serves me right for trying to do this quickly instead of doing it
right.

> For example, quite a few of the page bits really only make sense for
> the "page cache and anonymous pages" kind.
> 
> I think this includes some really fundamental bits like the lock bit
> (and the associated waiters bit), along with a lot of the "owner" aka
> "this can be used by the filesystem" bits.
> 
> I think it _also_ includes all the LRU and workingset bits etc.
> 
> So if we consider that kind of case the "normal" case, the not-normal
> case is likely (a) slab, (b) reserved pages and (c) zspages.,

There's always more things that people allocate pages for than you think.
I have a (presumably incomplete) list here:
https://kernelnewbies.org/MemoryTypes

As I wrote there (and promptly forgot, so it's a good thing I wrote
it down), any page that gets mapped to userspace needs both the locked
and dirty bits.  And random device drivers allocate pages and map them
to userspace, so those bits need to be available, even for pages that
a device driver has decided to mark as Reserved (a hang-over from
when we used to require that for ioremap?)

So I think the flags end up looking like:

0	Locked
1	Writeback
2	Dirty
3	Head
4	Type (new name for xyzzy)
	(if type == 0)			(if type == 1)
5	Referenced			Slab
6	Active				Buddy
7	Waiters				Waiters
8	LRU				VMalloc
9	Workingset			Offline
10	Error				Table
11	OwnerPriv1			Guard
12	Private				Reserved
13	Private2
14	Uptodate/Reported
15	Arch1
16	MappedToDisk
17	Reclaim/Readahead/Isolated
18	Swapbacked
19	Unevictable
20	MLocked*
21	Uncached*
22	HWPoison*
23	Young*
24	Idle*
25	Arch2*
(* Kconfig dependent)

I might want to do another pass on this list to sort the flags
for all-LRU-pages before which ones change meaning depending on
file-vs-anon.  And maybe Reported shouldn't share a bit with Uptodate,
perhaps it's just part of the OwnerPriv1/Private/Private2 mess.

It does end up getting rid of the PageType mechanism, and letting us
type pages allocated to VMalloc.

> We already have some page flag bits that are only used for those kinds
> of odd pages: the page_flags field is used only for zspages, but other
> pages can (misuse) that field for PG_buddy/offline/etc. That whole
> thing is particularly ugly in how it tries to make sure there are is
> no mapcount use of it.

I think you meant page_type here?  I was proud of how much better I made
it than it was (people used to _both_ set bits in that field _and_
map pages with those bits set into userspace ... fortunately nobody
assigned a CVE to that problem).  But it's a little bit too clever,
and I'd love to be rid of it.  And it failed to accomplish my original
goal of being able to mark pages as being allocated by vmalloc.

I think zspages using ->page_type is probably wrong.  zspages should be
using its own type like slab, but I haven't done the work to split it
out yet.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-03-10 15:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-09 20:50 [RFC] Free up a page flag Matthew Wilcox
2022-03-09 22:07 ` Linus Torvalds
2022-03-10  8:36   ` David Hildenbrand
2022-03-10 15:39   ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).