* Freeing page flags
@ 2022-05-12 20:54 Matthew Wilcox
  2022-05-13  2:41 ` Josef Bacik
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Matthew Wilcox @ 2022-05-12 20:54 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm

The LWN writeup [1] on merging the MGLRU reminded me that I need to send
out a plan for removing page flags that we can do without.

1. PG_error.  It's basically useless.  If the page was read successfully,
PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
doesn't use PG_error.  Some filesystems do, and we need to transition
them away from using it.

2. PG_private.  This tells us whether we have anything stored at
page->private.  We can just check if page->private is NULL or not.
No need to have this extra bit.  Again, there may be some filesystems
that are a bit wonky here, but I'm sure they're fixable.

3. PG_mappedtodisk.  This is really only used by the buffer cache.
Once the filesystems that use bufferheads have been converted, this can
go away.

4. I think I can also consolidate PG_slab and PG_reserved into a "single
bit" (not really, but change the encoding so that effectively they only
take a single bit).

That gives us 4 bits back, which should relieve the pressure on page flag
bits for a while.  I have Thoughts on PG_private_2 and PG_owner_priv_1,
as well as a suspicion that not all combinations of referenced, lru,
active, workingset, reclaim and unevictable are possible, and there
might be scope for a better encoding.  But I don't know that we need to
do that work; gaining back 4 bits is already a Big Deal.

I'm slowly doing the PG_private transition as part of the folio work.
For example, eagle eyed reviewers may have spotted that there is no
folio_has_buffers().  Converted code calls folio_buffers() and checks
if it's NULL.  Help from filesystem maintainers on removing the uses of
PG_error gratefully appreciated.

[1] https://lwn.net/Articles/894859/

* Re: Freeing page flags
  2022-05-12 20:54 Freeing page flags Matthew Wilcox
@ 2022-05-13  2:41 ` Josef Bacik
  2022-05-13  3:39   ` Matthew Wilcox
  2022-05-13  3:46 ` Yu Zhao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2022-05-13  2:41 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> out a plan for removing page flags that we can do without.
> 
> 1. PG_error.  It's basically useless.  If the page was read successfully,
> PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> doesn't use PG_error.  Some filesystems do, and we need to transition
> them away from using it.
>

What about writes?  A cursory look shows we don't clear Uptodate if we fail to
write, which is correct I think.  The only way to indicate we had a write error
to check later is the page error.

> 2. PG_private.  This tells us whether we have anything stored at
> page->private.  We can just check if page->private is NULL or not.
> No need to have this extra bit.  Again, there may be some filesystems
> that are a bit wonky here, but I'm sure they're fixable.
> 

At least for Btrfs we serialize the page->private with the private_lock, so we
could probably just drop PG_private, but it's kind of nice to check first before
we have to take the spin lock.  I suppose we can just do

if (page->private)
	// do lock and check thingy

Thanks,

Josef

* Re: Freeing page flags
  2022-05-13  2:41 ` Josef Bacik
@ 2022-05-13  3:39   ` Matthew Wilcox
  2022-05-13  9:40     ` Luís Henriques
  2022-05-13 13:17     ` Josef Bacik
  0 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2022-05-13  3:39 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-fsdevel, linux-mm

On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> > out a plan for removing page flags that we can do without.
> > 
> > 1. PG_error.  It's basically useless.  If the page was read successfully,
> > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> > doesn't use PG_error.  Some filesystems do, and we need to transition
> > them away from using it.
> >
> 
> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> write, which is correct I think.  The only way to indicate we had a write error
> to check later is the page error.

On encountering a write error, we're supposed to call mapping_set_error(),
not SetPageError().
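
(i.e. the write-completion path should end up looking roughly like

	if (err)
		mapping_set_error(mapping, err);
	end_page_writeback(page);

and the error is then reported at fsync() time via the errseq_t in
mapping->wb_err; no page flag needed.)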

> > 2. PG_private.  This tells us whether we have anything stored at
> > page->private.  We can just check if page->private is NULL or not.
> > No need to have this extra bit.  Again, there may be some filesystems
> > that are a bit wonky here, but I'm sure they're fixable.
> > 
> 
> At least for Btrfs we serialize the page->private with the private_lock, so we
> could probably just drop PG_private, but it's kind of nice to check first before
> we have to take the spin lock.  I suppose we can just do
> 
> if (page->private)
> 	// do lock and check thingy

That's my hope!  I think btrfs is already using folio_attach_private() /
attach_page_private(), which makes everything easier.  Some filesystems
still manipulate page->private and PagePrivate by hand.
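
(For anyone following along, attach_page_private() keeps the refcount,
the ->private field and the flag in sync, i.e. it is roughly the
open-coded sequence

	get_page(page);
	set_page_private(page, (unsigned long)data);
	SetPagePrivate(page);

so once every filesystem goes through attach/detach, dropping the flag
only means changing those helpers.)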

* Re: Freeing page flags
  2022-05-12 20:54 Freeing page flags Matthew Wilcox
  2022-05-13  2:41 ` Josef Bacik
@ 2022-05-13  3:46 ` Yu Zhao
  2022-05-14  6:10 ` Eric Biggers
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Yu Zhao @ 2022-05-13  3:46 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Linux-MM

On Thu, May 12, 2022 at 2:55 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> out a plan for removing page flags that we can do without.

Much appreciated.

> 1. PG_error.  It's basically useless.  If the page was read successfully,
> PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> doesn't use PG_error.  Some filesystems do, and we need to transition
> them away from using it.
>
> 2. PG_private.  This tells us whether we have anything stored at
> page->private.  We can just check if page->private is NULL or not.
> No need to have this extra bit.  Again, there may be some filesystems
> that are a bit wonky here, but I'm sure they're fixable.
>
> 3. PG_mappedtodisk.  This is really only used by the buffer cache.
> Once the filesystems that use bufferheads have been converted, this can
> go away.
>
> 4. I think I can also consolidate PG_slab and PG_reserved into a "single
> bit" (not really, but change the encoding so that effectively they only
> take a single bit).
>
> That gives us 4 bits back, which should relieve the pressure on page flag
> bits for a while.  I have Thoughts on PG_private_2 and PG_owner_priv_1,
> as well as a suspicion that not all combinations of referenced, lru,
> active, workingset, reclaim and unevictable are possible, and there
> might be scope for a better encoding.  But I don't know that we need to
> do that work; gaining back 4 bits is already a Big Deal.

PG_active, PG_unevictable and PG_swapbacked seem to be the low-hanging
fruit among those you mentioned above. They indicate which LRU list a
folio is currently on, was deleted from, or should be added to. We
should be able to use the spare bits in folio->lru for this purpose.
WDYT?

> I'm slowly doing the PG_private transition as part of the folio work.
> For example, eagle eyed reviewers may have spotted that there is no
> folio_has_buffers().  Converted code calls folio_buffers() and checks
> if it's NULL.  Help from filesystem maintainers on removing the uses of
> PG_error gratefully appreciated.
>
> [1] https://lwn.net/Articles/894859/

We'd be very happy to help with testing and reviewing, etc.

* Re: Freeing page flags
  2022-05-13  3:39   ` Matthew Wilcox
@ 2022-05-13  9:40     ` Luís Henriques
  2022-05-13 12:53       ` Matthew Wilcox
  2022-05-13 13:17     ` Josef Bacik
  1 sibling, 1 reply; 16+ messages in thread
From: Luís Henriques @ 2022-05-13  9:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiubo Li, Jeff Layton, Josef Bacik, linux-fsdevel, linux-mm

Matthew Wilcox <willy@infradead.org> writes:

> On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
>> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
>> > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
>> > out a plan for removing page flags that we can do without.
>> > 
>> > 1. PG_error.  It's basically useless.  If the page was read successfully,
>> > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
>> > doesn't use PG_error.  Some filesystems do, and we need to transition
>> > them away from using it.
>> >
>> 
>> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
>> write, which is correct I think.  The only way to indicate we had a write error
>> to check later is the page error.
>
> On encountering a write error, we're supposed to call mapping_set_error(),
> not SetPageError().
>
>> > 2. PG_private.  This tells us whether we have anything stored at
>> > page->private.  We can just check if page->private is NULL or not.
>> > No need to have this extra bit.  Again, there may be some filesystems
>> > that are a bit wonky here, but I'm sure they're fixable.
>> > 
>> 
>> At least for Btrfs we serialize the page->private with the private_lock, so we
>> could probably just drop PG_private, but it's kind of nice to check first before
>> we have to take the spin lock.  I suppose we can just do
>> 
>> if (page->private)
>> 	// do lock and check thingy
>
> That's my hope!  I think btrfs is already using folio_attach_private() /
> attach_page_private(), which makes everything easier.  Some filesystems
> still manipulate page->private and PagePrivate by hand.

In ceph we've recently [1] spent a bit of time debugging a bug related
to ->private not being NULL even though we expected it to be.  The
solution found was to replace the check for NULL with
folio_test_private(), but we _may_ not have figured the whole thing out.
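
The change was along these lines (just the shape of it, not the literal
patch):

-	if (folio->private)
+	if (folio_test_private(folio))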

We assumed that folios were being recycled and not cleaned up.  The values
we were seeing in ->private looked like some sort of flags, as only a few
bits were set (e.g. 0x0200000):

[ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
[ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
[ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
[ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
[ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
[ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)

[1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/

Cheers,
-- 
Luís

* Re: Freeing page flags
  2022-05-13  9:40     ` Luís Henriques
@ 2022-05-13 12:53       ` Matthew Wilcox
  2022-05-13 13:18         ` Luís Henriques
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Matthew Wilcox @ 2022-05-13 12:53 UTC (permalink / raw)
  To: Luís Henriques
  Cc: Xiubo Li, Jeff Layton, Josef Bacik, linux-fsdevel, linux-mm

On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> >> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> >> > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> >> > out a plan for removing page flags that we can do without.
> >> > 
> >> > 1. PG_error.  It's basically useless.  If the page was read successfully,
> >> > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> >> > doesn't use PG_error.  Some filesystems do, and we need to transition
> >> > them away from using it.
> >> >
> >> 
> >> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> >> write, which is correct I think.  The only way to indicate we had a write error
> >> to check later is the page error.
> >
> > On encountering a write error, we're supposed to call mapping_set_error(),
> > not SetPageError().
> >
> >> > 2. PG_private.  This tells us whether we have anything stored at
> >> > page->private.  We can just check if page->private is NULL or not.
> >> > No need to have this extra bit.  Again, there may be some filesystems
> >> > that are a bit wonky here, but I'm sure they're fixable.
> >> > 
> >> 
> >> At least for Btrfs we serialize the page->private with the private_lock, so we
> >> could probably just drop PG_private, but it's kind of nice to check first before
> >> we have to take the spin lock.  I suppose we can just do
> >> 
> >> if (page->private)
> >> 	// do lock and check thingy
> >
> > That's my hope!  I think btrfs is already using folio_attach_private() /
> > attach_page_private(), which makes everything easier.  Some filesystems
> > still manipulate page->private and PagePrivate by hand.
> 
> In ceph we've recently [1] spent a bit of time debugging a bug related
> with ->private not being NULL even though we expected it to be.  The
> solution found was to replace the check for NULL and use
> folio_test_private() instead, but we _may_ have not figured the whole
> thing out.
> 
> We assumed that folios were being recycled and not cleaned-up.  The values
> we were seeing in ->private looked like they were some sort of flags as
> only a few bits were set (e.g. 0x0200000):
> 
> [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
> [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
> [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
> [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
> [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
> [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
> 
> [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/

I remember Jeff asking me about this problem a few days ago.  A folio
passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
your filesystem.  Nobody else should be storing to the ->private field;
there's no race that could lead to it being freed while you see it.
There may, of course, be bugs that are overwriting folio->private, but
it's definitely not supposed to happen.  I agree with you that it looks
like a bit has been set (is it possibly bad RAM?)

We do use page->private in the buddy allocator, but that stores the order
of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
which seems like a weird one to be setting in the wrong field, so probably
not that.
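
(For reference, the buddy allocator helper is roughly

	set_page_private(page, order);
	__SetPageBuddy(page);

so the values stored there are small integers, nothing like 0x200000.)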

Is it always bit 21 that gets set?

* Re: Freeing page flags
  2022-05-13  3:39   ` Matthew Wilcox
  2022-05-13  9:40     ` Luís Henriques
@ 2022-05-13 13:17     ` Josef Bacik
  1 sibling, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2022-05-13 13:17 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On Fri, May 13, 2022 at 04:39:28AM +0100, Matthew Wilcox wrote:
> On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> > On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> > > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> > > out a plan for removing page flags that we can do without.
> > > 
> > > 1. PG_error.  It's basically useless.  If the page was read successfully,
> > > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> > > doesn't use PG_error.  Some filesystems do, and we need to transition
> > > them away from using it.
> > >
> > 
> > What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> > write, which is correct I think.  The only way to indicate we had a write error
> > to check later is the page error.
> 
> On encountering a write error, we're supposed to call mapping_set_error(),
> not SetPageError().
> 

Yup, I can't read; the places I was looking at did mapping_set_error() in a
different area from SetPageError(), so I got confused.  This can be ripped
out of btrfs with no problems.  Thanks,

Josef

* Re: Freeing page flags
  2022-05-13 12:53       ` Matthew Wilcox
@ 2022-05-13 13:18         ` Luís Henriques
  2022-05-13 13:21         ` Jeff Layton
  2022-05-17  0:34         ` Xiubo Li
  2 siblings, 0 replies; 16+ messages in thread
From: Luís Henriques @ 2022-05-13 13:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiubo Li, Jeff Layton, Josef Bacik, linux-fsdevel, linux-mm

Matthew Wilcox <willy@infradead.org> writes:

> On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> 
>> > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
>> >> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
>> >> > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
>> >> > out a plan for removing page flags that we can do without.
>> >> > 
>> >> > 1. PG_error.  It's basically useless.  If the page was read successfully,
>> >> > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
>> >> > doesn't use PG_error.  Some filesystems do, and we need to transition
>> >> > them away from using it.
>> >> >
>> >> 
>> >> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
>> >> write, which is correct I think.  The only way to indicate we had a write error
>> >> to check later is the page error.
>> >
>> > On encountering a write error, we're supposed to call mapping_set_error(),
>> > not SetPageError().
>> >
>> >> > 2. PG_private.  This tells us whether we have anything stored at
>> >> > page->private.  We can just check if page->private is NULL or not.
>> >> > No need to have this extra bit.  Again, there may be some filesystems
>> >> > that are a bit wonky here, but I'm sure they're fixable.
>> >> > 
>> >> 
>> >> At least for Btrfs we serialize the page->private with the private_lock, so we
>> >> could probably just drop PG_private, but it's kind of nice to check first before
>> >> we have to take the spin lock.  I suppose we can just do
>> >> 
>> >> if (page->private)
>> >> 	// do lock and check thingy
>> >
>> > That's my hope!  I think btrfs is already using folio_attach_private() /
>> > attach_page_private(), which makes everything easier.  Some filesystems
>> > still manipulate page->private and PagePrivate by hand.
>> 
>> In ceph we've recently [1] spent a bit of time debugging a bug related
>> with ->private not being NULL even though we expected it to be.  The
>> solution found was to replace the check for NULL and use
>> folio_test_private() instead, but we _may_ have not figured the whole
>> thing out.
>> 
>> We assumed that folios were being recycled and not cleaned-up.  The values
>> we were seeing in ->private looked like they were some sort of flags as
>> only a few bits were set (e.g. 0x0200000):
>> 
>> [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
>> [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
>> [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
>> [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
>> [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
>> [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
>> 
>> [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/
>
> I remember Jeff asking me about this problem a few days ago.  A folio
> passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
> your filesystem.  Nobody else should be storing to the ->private field;
> there's no race that could lead to it being freed while you see it.

Right, I would assume so, obviously.  Our question was more about re-using
folios: is it guaranteed that this field will always be set to NULL, or is
it just the flag that is cleared?

> There may, of course, be bugs that are overwriting folio->private, but
> it's definitely not supposed to happen.  I agree with you that it looks
> like a bit has been set (is it possibly bad RAM?)

No, I don't think it's bad RAM.  Both Xiubo and I were able to reproduce
it (although it took a while, at least for me) on different boxes (VMs).

> We do use page->private in the buddy allocator, but that stores the order
> of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
> which seems like a weird one to be setting in the wrong field, so probably
> not that.
>
> Is it always bit 21 that gets set?

No, it wasn't always the same bits and it wasn't always a single bit.  The
only examples I have here with me are 0x0200000 and 0x0d0000, but I'm sure
we've seen other similar values.

So, from what you stated above, it looks like we'll need to revisit this and
do some more debugging.

Cheers,
-- 
Luís

* Re: Freeing page flags
  2022-05-13 12:53       ` Matthew Wilcox
  2022-05-13 13:18         ` Luís Henriques
@ 2022-05-13 13:21         ` Jeff Layton
  2022-05-13 13:38           ` Matthew Wilcox
  2022-05-17  0:34         ` Xiubo Li
  2 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2022-05-13 13:21 UTC (permalink / raw)
  To: Matthew Wilcox, Luís Henriques
  Cc: Xiubo Li, Josef Bacik, linux-fsdevel, linux-mm

On Fri, 2022-05-13 at 13:53 +0100, Matthew Wilcox wrote:
> On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
> > Matthew Wilcox <willy@infradead.org> writes:
> > 
> > > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> > > > On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> > > > > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> > > > > out a plan for removing page flags that we can do without.
> > > > > 
> > > > > 1. PG_error.  It's basically useless.  If the page was read successfully,
> > > > > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> > > > > doesn't use PG_error.  Some filesystems do, and we need to transition
> > > > > them away from using it.
> > > > > 
> > > > 
> > > > What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> > > > write, which is correct I think.  The only way to indicate we had a write error
> > > > to check later is the page error.
> > > 
> > > On encountering a write error, we're supposed to call mapping_set_error(),
> > > not SetPageError().
> > > 
> > > > > 2. PG_private.  This tells us whether we have anything stored at
> > > > > page->private.  We can just check if page->private is NULL or not.
> > > > > No need to have this extra bit.  Again, there may be some filesystems
> > > > > that are a bit wonky here, but I'm sure they're fixable.
> > > > > 
> > > > 
> > > > At least for Btrfs we serialize the page->private with the private_lock, so we
> > > > could probably just drop PG_private, but it's kind of nice to check first before
> > > > we have to take the spin lock.  I suppose we can just do
> > > > 
> > > > if (page->private)
> > > > 	// do lock and check thingy
> > > 
> > > That's my hope!  I think btrfs is already using folio_attach_private() /
> > > attach_page_private(), which makes everything easier.  Some filesystems
> > > still manipulate page->private and PagePrivate by hand.
> > 
> > In ceph we've recently [1] spent a bit of time debugging a bug related
> > with ->private not being NULL even though we expected it to be.  The
> > solution found was to replace the check for NULL and use
> > folio_test_private() instead, but we _may_ have not figured the whole
> > thing out.
> > 
> > We assumed that folios were being recycled and not cleaned-up.  The values
> > we were seeing in ->private looked like they were some sort of flags as
> > only a few bits were set (e.g. 0x0200000):
> > 
> > [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
> > [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
> > [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
> > [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
> > [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
> > [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
> > 
> > [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/
> 
> I remember Jeff asking me about this problem a few days ago.  A folio
> passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
> your filesystem.  Nobody else should be storing to the ->private field;
> there's no race that could lead to it being freed while you see it.
> There may, of course, be bugs that are overwriting folio->private, but
> it's definitely not supposed to happen.  I agree with you that it looks
> like a bit has been set (is it possibly bad RAM?)
> 
> We do use page->private in the buddy allocator, but that stores the order
> of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
> which seems like a weird one to be setting in the wrong field, so probably
> not that.
> 
> Is it always bit 21 that gets set?

No, it varies, but it was always just a few bits in the field that end
up being set. I was never able to reproduce it locally, but saw it in a
run in ceph's teuthology lab a few times. Xiubo did the most digging
here, so he may be able to add more info.

Basically though, we call __filemap_get_folio in netfs_write_begin and
it will sometimes give us a folio that has PG_private clear, but the
->private field has just a few bits that aren't zeroed out. I'm pretty
sure we zero out that field in ceph, so the theory was that the page was
traveling through some other subsystem before coming to us.

He wasn't able to ascertain the cause, and just decided to check for
PG_private instead since you (presumably) shouldn't trust ->private
unless that's set anyway.
-- 
Jeff Layton <jlayton@kernel.org>

* Re: Freeing page flags
  2022-05-13 13:21         ` Jeff Layton
@ 2022-05-13 13:38           ` Matthew Wilcox
  2022-05-13 13:57             ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2022-05-13 13:38 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Luís Henriques, Xiubo Li, Josef Bacik, linux-fsdevel, linux-mm

On Fri, May 13, 2022 at 09:21:11AM -0400, Jeff Layton wrote:
> On Fri, 2022-05-13 at 13:53 +0100, Matthew Wilcox wrote:
> > On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
> > > Matthew Wilcox <willy@infradead.org> writes:
> > > 
> > > > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> > > > > On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> > > > > > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> > > > > > out a plan for removing page flags that we can do without.
> > > > > > 
> > > > > > 1. PG_error.  It's basically useless.  If the page was read successfully,
> > > > > > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> > > > > > doesn't use PG_error.  Some filesystems do, and we need to transition
> > > > > > them away from using it.
> > > > > > 
> > > > > 
> > > > > What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> > > > > write, which is correct I think.  The only way to indicate we had a write error
> > > > > to check later is the page error.
> > > > 
> > > > On encountering a write error, we're supposed to call mapping_set_error(),
> > > > not SetPageError().
> > > > 
> > > > > > 2. PG_private.  This tells us whether we have anything stored at
> > > > > > page->private.  We can just check if page->private is NULL or not.
> > > > > > No need to have this extra bit.  Again, there may be some filesystems
> > > > > > that are a bit wonky here, but I'm sure they're fixable.
> > > > > > 
> > > > > 
> > > > > At least for Btrfs we serialize the page->private with the private_lock, so we
> > > > > could probably just drop PG_private, but it's kind of nice to check first before
> > > > > we have to take the spin lock.  I suppose we can just do
> > > > > 
> > > > > if (page->private)
> > > > > 	// do lock and check thingy
> > > > 
> > > > That's my hope!  I think btrfs is already using folio_attach_private() /
> > > > attach_page_private(), which makes everything easier.  Some filesystems
> > > > still manipulate page->private and PagePrivate by hand.
> > > 
> > > In ceph we've recently [1] spent a bit of time debugging a bug related
> > > with ->private not being NULL even though we expected it to be.  The
> > > solution found was to replace the check for NULL and use
> > > folio_test_private() instead, but we _may_ have not figured the whole
> > > thing out.
> > > 
> > > We assumed that folios were being recycled and not cleaned-up.  The values
> > > we were seeing in ->private looked like they were some sort of flags as
> > > only a few bits were set (e.g. 0x0200000):
> > > 
> > > [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
> > > [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
> > > [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
> > > [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
> > > [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
> > > [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
> > > 
> > > [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/
> > 
> > I remember Jeff asking me about this problem a few days ago.  A folio
> > passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
> > your filesystem.  Nobody else should be storing to the ->private field;
> > there's no race that could lead to it being freed while you see it.
> > There may, of course, be bugs that are overwriting folio->private, but
> > it's definitely not supposed to happen.  I agree with you that it looks
> > like a bit has been set (is it possibly bad RAM?)
> > 
> > We do use page->private in the buddy allocator, but that stores the order
> > of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
> > which seems like a weird one to be setting in the wrong field, so probably
> > not that.
> > 
> > Is it always bit 21 that gets set?
> 
> No, it varies, but it was always just a few bits in the field that end
> up being set. I was never able to reproduce it locally, but saw it in a
> run in ceph's teuthology lab a few times. Xiubo did the most digging
> here, so he may be able to add more info.
> 
> Basically though, we call __filemap_get_folio in netfs_write_begin and
> it will sometimes give us a folio that has PG_private clear, but the
> ->private field has just a few bits that aren't zeroed out. I'm pretty
> sure we zero out that field in ceph, so the theory was that the page was
> traveling through some other subsystem before coming to us.

It _shouldn't_ be.  __filemap_get_folio() may return a page that was
already in the page cache (and so may have had page->private set by
the filesystem originally), or it may allocate a fresh page in
filemap_alloc_folio() which _should_ come with page->private clear.
Adding an assert that this is true might be a good debugging tactic:

+++ b/mm/filemap.c
@@ -2008,6 +2008,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
                                goto repeat;
                }

+VM_BUG_ON_FOLIO(folio->private, folio);
                /*
                 * filemap_add_folio locks the page, and for mmap
                 * we expect an unlocked page.

> He wasn't able to ascertain the cause, and just decided to check for
> PG_private instead since you (presumably) shouldn't trust ->private
> unless that's set anyway.

They are usually in sync ... which means we can reclaim the flag ;-)

* Re: Freeing page flags
  2022-05-13 13:38           ` Matthew Wilcox
@ 2022-05-13 13:57             ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2022-05-13 13:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luís Henriques, Xiubo Li, Josef Bacik, linux-fsdevel, linux-mm

On Fri, 2022-05-13 at 14:38 +0100, Matthew Wilcox wrote:
> On Fri, May 13, 2022 at 09:21:11AM -0400, Jeff Layton wrote:
> > On Fri, 2022-05-13 at 13:53 +0100, Matthew Wilcox wrote:
> > > On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
> > > > Matthew Wilcox <willy@infradead.org> writes:
> > > > 
> > > > > On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
> > > > > > On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> > > > > > > The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> > > > > > > out a plan for removing page flags that we can do without.
> > > > > > > 
> > > > > > > 1. PG_error.  It's basically useless.  If the page was read successfully,
> > > > > > > PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> > > > > > > doesn't use PG_error.  Some filesystems do, and we need to transition
> > > > > > > them away from using it.
> > > > > > > 
> > > > > > 
> > > > > > What about writes?  A cursory look shows we don't clear Uptodate if we fail to
> > > > > > write, which is correct I think.  The only way to indicate we had a write error
> > > > > > to check later is the page error.
> > > > > 
> > > > > On encountering a write error, we're supposed to call mapping_set_error(),
> > > > > not SetPageError().
> > > > > 
> > > > > > > 2. PG_private.  This tells us whether we have anything stored at
> > > > > > > page->private.  We can just check if page->private is NULL or not.
> > > > > > > No need to have this extra bit.  Again, there may be some filesystems
> > > > > > > that are a bit wonky here, but I'm sure they're fixable.
> > > > > > > 
> > > > > > 
> > > > > > At least for Btrfs we serialize the page->private with the private_lock, so we
> > > > > > could probably just drop PG_private, but it's kind of nice to check first before
> > > > > > we have to take the spin lock.  I suppose we can just do
> > > > > > 
> > > > > > if (page->private)
> > > > > > 	// do lock and check thingy
> > > > > 
> > > > > That's my hope!  I think btrfs is already using folio_attach_private() /
> > > > > attach_page_private(), which makes everything easier.  Some filesystems
> > > > > still manipulate page->private and PagePrivate by hand.
> > > > 
> > > > In ceph we've recently [1] spent a bit of time debugging a bug related
> > > > with ->private not being NULL even though we expected it to be.  The
> > > > solution found was to replace the check for NULL and use
> > > > folio_test_private() instead, but we _may_ have not figured the whole
> > > > thing out.
> > > > 
> > > > We assumed that folios were being recycled and not cleaned-up.  The values
> > > > we were seeing in ->private looked like they were some sort of flags as
> > > > only a few bits were set (e.g. 0x0200000):
> > > > 
> > > > [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
> > > > [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
> > > > [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
> > > > [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
> > > > [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
> > > > [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
> > > > 
> > > > [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/
> > > 
> > > I remember Jeff asking me about this problem a few days ago.  A folio
> > > passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
> > > your filesystem.  Nobody else should be storing to the ->private field;
> > > there's no race that could lead to it being freed while you see it.
> > > There may, of course, be bugs that are overwriting folio->private, but
> > > it's definitely not supposed to happen.  I agree with you that it looks
> > > like a bit has been set (is it possibly bad RAM?)
> > > 
> > > We do use page->private in the buddy allocator, but that stores the order
> > > of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
> > > which seems like a weird one to be setting in the wrong field, so probably
> > > not that.
> > > 
> > > Is it always bit 21 that gets set?
> > 
> > No, it varies, but it was always just a few bits in the field that end
> > up being set. I was never able to reproduce it locally, but saw it in a
> > run in ceph's teuthology lab a few times. Xiubo did the most digging
> > here, so he may be able to add more info.
> > 
> > Basically though, we call __filemap_get_folio in netfs_write_begin and
> > it will sometimes give us a folio that has PG_private clear, but the
> > ->private field has just a few bits that aren't zeroed out. I'm pretty
> > sure we zero out that field in ceph, so the theory was that the page was
> > traveling through some other subsystem before coming to us.
> 
> It _shouldn't_ be.  __filemap_get_folio() may return a page that was
> already in the page cache (and so may have had page->private set by
> the filesystem originally), or it may allocate a fresh page in
> filemap_alloc_folio() which _should_ come with page->private clear.
> Adding an assert that is true might be a good debugging tactic:
> 
> +++ b/mm/filemap.c
> @@ -2008,6 +2008,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>                                 goto repeat;
>                 }
> 
> +VM_BUG_ON_FOLIO(folio->private, folio);
>                 /*
>                  * filemap_add_folio locks the page, and for mmap
>                  * we expect an unlocked page.
> 
> > He wasn't able to ascertain the cause, and just decided to check for
> > PG_private instead since you (presumably) shouldn't trust ->private
> > unless that's set anyway.
> 
> They are usually in sync ... which means we can reclaim the flag ;-)

Agreed. I'm all for freeing up a page bit and PG_private seems like
belt-and-suspenders stuff. There are some details in this tracker
ticket.

This note in particular seems to indicate that ->private is not always
coming back as zero:

    https://tracker.ceph.com/issues/55421#note-20

-- 
Jeff Layton <jlayton@kernel.org>

* Re: Freeing page flags
  2022-05-12 20:54 Freeing page flags Matthew Wilcox
  2022-05-13  2:41 ` Josef Bacik
  2022-05-13  3:46 ` Yu Zhao
@ 2022-05-14  6:10 ` Eric Biggers
  2022-05-23  6:38 ` Mike Rapoport
  2022-05-30 16:54 ` David Hildenbrand
  4 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2022-05-14  6:10 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> out a plan for removing page flags that we can do without.
> 
> 1. PG_error.  It's basically useless.  If the page was read successfully,
> PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> doesn't use PG_error.  Some filesystems do, and we need to transition
> them away from using it.

One of the uses of PG_error is in ext4 and f2fs to keep track of decryption
(fscrypt) and verity (fs-verity) failures on pages in a bio being read.
Decryption and verity happen in separate phases, so PG_error is used to record
which subset of pages a decryption error occurred on.  Similarly, PG_error is
also used to record which subset of pages a verity error occurred on, which the
filesystem uses to decide whether to set PG_uptodate afterwards.

If we had to get rid of it, one idea would be:

* Don't keep track of decryption failures on a per-page basis; just fail the
  whole bio if one occurs.  That's how the block layer works, after all; you
  just get one ->bi_status and not an individual status for each page.  Also,
  decryption failures never really happen in practice anyway.

* Also merge together the verity step and the setting-uptodate+unlocking step,
  so that PG_error isn't needed to propagate the page status between these.
  This should work since verity only happens at the end anyway.
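
Roughly, those two ideas together would turn the post-read completion
path into something like the sketch below (names and structure here are
purely illustrative, not the actual ext4/f2fs code):

	static void post_read_done(struct bio *bio)
	{
		struct bio_vec *bv;
		struct bvec_iter_all iter_all;
		int err = blk_status_to_errno(bio->bi_status);

		bio_for_each_segment_all(bv, bio, iter_all) {
			/* one failure fails the whole bio, like ->bi_status */
			if (!err && fscrypt_decrypt_pagecache_blocks(bv->bv_page,
						bv->bv_len, bv->bv_offset))
				err = -EIO;
		}

		bio_for_each_segment_all(bv, bio, iter_all) {
			struct page *page = bv->bv_page;

			/* verity + uptodate + unlock in one step, no PG_error */
			if (!err && fsverity_verify_page(page))
				SetPageUptodate(page);
			unlock_page(page);
		}
	}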

I was also thinking about just unlocking pages as errors occur on them.  I think
that wouldn't work, because the filesystem wouldn't be able to distinguish
between pages it still has locked and pages that got re-locked by someone else.

- Eric

* Re: Freeing page flags
  2022-05-13 12:53       ` Matthew Wilcox
  2022-05-13 13:18         ` Luís Henriques
  2022-05-13 13:21         ` Jeff Layton
@ 2022-05-17  0:34         ` Xiubo Li
  2 siblings, 0 replies; 16+ messages in thread
From: Xiubo Li @ 2022-05-17  0:34 UTC (permalink / raw)
  To: Matthew Wilcox, Luís Henriques
  Cc: Jeff Layton, Josef Bacik, linux-fsdevel, linux-mm


On 5/13/22 8:53 PM, Matthew Wilcox wrote:
> On Fri, May 13, 2022 at 10:40:05AM +0100, Luís Henriques wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>>
>>> On Thu, May 12, 2022 at 10:41:41PM -0400, Josef Bacik wrote:
>>>> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
>>>>> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
>>>>> out a plan for removing page flags that we can do without.
>>>>>
>>>>> 1. PG_error.  It's basically useless.  If the page was read successfully,
>>>>> PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
>>>>> doesn't use PG_error.  Some filesystems do, and we need to transition
>>>>> them away from using it.
>>>>>
>>>> What about writes?  A cursory look shows we don't clear Uptodate if we fail to
>>>> write, which is correct I think.  The only way to indicate we had a write error
>>>> to check later is the page error.
>>> On encountering a write error, we're supposed to call mapping_set_error(),
>>> not SetPageError().
>>>
>>>>> 2. PG_private.  This tells us whether we have anything stored at
>>>>> page->private.  We can just check if page->private is NULL or not.
>>>>> No need to have this extra bit.  Again, there may be some filesystems
>>>>> that are a bit wonky here, but I'm sure they're fixable.
>>>>>
>>>> At least for Btrfs we serialize the page->private with the private_lock, so we
>>>> could probably just drop PG_private, but it's kind of nice to check first before
>>>> we have to take the spin lock.  I suppose we can just do
>>>>
>>>> if (page->private)
>>>> 	// do lock and check thingy
>>> That's my hope!  I think btrfs is already using folio_attach_private() /
>>> attach_page_private(), which makes everything easier.  Some filesystems
>>> still manipulate page->private and PagePrivate by hand.
>> In ceph we've recently [1] spent a bit of time debugging a bug related
>> with ->private not being NULL even though we expected it to be.  The
>> solution found was to replace the check for NULL and use
>> folio_test_private() instead, but we _may_ have not figured the whole
>> thing out.
>>
>> We assumed that folios were being recycled and not cleaned-up.  The values
>> we were seeing in ->private looked like they were some sort of flags as
>> only a few bits were set (e.g. 0x0200000):
>>
>> [ 1672.578313] page:00000000e23868c1 refcount:2 mapcount:0 mapping:0000000022e0d3b4 index:0xd8 pfn:0x74e83
>> [ 1672.581934] aops:ceph_aops [ceph] ino:10000016c9e dentry name:"faed"
>> [ 1672.584457] flags: 0x4000000000000015(locked|uptodate|lru|zone=1)
>> [ 1672.586878] raw: 4000000000000015 ffffea0001d3a108 ffffea0001d3a088 ffff888003491948
>> [ 1672.589894] raw: 00000000000000d8 0000000000200000 00000002ffffffff 0000000000000000
>> [ 1672.592935] page dumped because: VM_BUG_ON_FOLIO(1)
>>
>> [1] https://lore.kernel.org/all/20220508061543.318394-1-xiubli@redhat.com/
> I remember Jeff asking me about this problem a few days ago.  A folio
> passed to you in ->dirty_folio() or ->invalidate_folio() belongs to
> your filesystem.  Nobody else should be storing to the ->private field;
> there's no race that could lead to it being freed while you see it.
> There may, of course, be bugs that are overwriting folio->private, but
> it's definitely not supposed to happen.  I agree with you that it looks
> like a bit has been set (is it possibly bad RAM?)

I don't think so.

Please see the values I saw from my tests locally below.

>
> We do use page->private in the buddy allocator, but that stores the order
> of the page; it wouldn't be storing 1<<21.  PG flag 21 is PG_mlocked,
> which seems like a weird one to be setting in the wrong field, so probably
> not that.
>
> Is it always bit 21 that gets set?
>
No, from my tests I can reproduce it locally very easily, and almost every
time the values were different, seemingly random values, just like:

0000000000040000,0000000000070000,000000000002e0000...

For more detail, please see https://tracker.ceph.com/issues/55421.

I am sure that for all the non-zero values the lower 16 bits were 0000.

-- Xiubo



* Re: Freeing page flags
  2022-05-12 20:54 Freeing page flags Matthew Wilcox
                   ` (2 preceding siblings ...)
  2022-05-14  6:10 ` Eric Biggers
@ 2022-05-23  6:38 ` Mike Rapoport
  2022-06-10 17:39   ` David Hildenbrand
  2022-05-30 16:54 ` David Hildenbrand
  4 siblings, 1 reply; 16+ messages in thread
From: Mike Rapoport @ 2022-05-23  6:38 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> out a plan for removing page flags that we can do without.
> 
> 4. I think I can also consolidate PG_slab and PG_reserved into a "single
> bit" (not really, but change the encoding so that effectively they only
> take a single bit).

PG_reserved could be a PageType; AFAIR no reserved pages are ever mapped to
userspace.
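
Page types live in the _mapcount/page_type word rather than in
page->flags, so it wouldn't cost a flags bit at all.  A purely
illustrative sketch (made-up value and naming):

	#define PG_reserved_pgtype	0x00000800
	PAGE_TYPE_OPS(ReservedType, reserved_pgtype)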
 
> That gives us 4 bits back, which should relieve the pressure on page flag
> bits for a while.  I have Thoughts on PG_private_2 and PG_owner_priv_1,
> as well as a suspicion that not all combinations of referenced, lru,
> active, workingset, reclaim and unevictable are possible, and there
> might be scope for a better encoding.  But I don't know that we need to
> do that work; gaining back 4 bits is already a Big Deal.
> 
> I'm slowly doing the PG_private transition as part of the folio work.
> For example, eagle eyed reviewers may have spotted that there is no
> folio_has_buffers().  Converted code calls folio_buffers() and checks
> if it's NULL.  Help from filesystem maintainers on removing the uses of
> PG_error gratefully appreciated.
> 
> [1] https://lwn.net/Articles/894859/
> 

-- 
Sincerely yours,
Mike.

* Re: Freeing page flags
  2022-05-12 20:54 Freeing page flags Matthew Wilcox
                   ` (3 preceding siblings ...)
  2022-05-23  6:38 ` Mike Rapoport
@ 2022-05-30 16:54 ` David Hildenbrand
  4 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2022-05-30 16:54 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel, linux-mm

On 12.05.22 22:54, Matthew Wilcox wrote:
> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
> out a plan for removing page flags that we can do without.
> 
> 1. PG_error.  It's basically useless.  If the page was read successfully,
> PG_uptodate is set.  If not, PG_uptodate is clear.  The page cache
> doesn't use PG_error.  Some filesystems do, and we need to transition
> them away from using it.
> 
> 2. PG_private.  This tells us whether we have anything stored at
> page->private.  We can just check if page->private is NULL or not.
> No need to have this extra bit.  Again, there may be some filesystems
> that are a bit wonky here, but I'm sure they're fixable.
> 
> 3. PG_mappedtodisk.  This is really only used by the buffer cache.
> Once the filesystems that use bufferheads have been converted, this can
> go away.

Nowadays (upstream) this doubles as PG_anon_exclusive for anonymous memory.


-- 
Thanks,

David / dhildenb


* Re: Freeing page flags
  2022-05-23  6:38 ` Mike Rapoport
@ 2022-06-10 17:39   ` David Hildenbrand
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2022-06-10 17:39 UTC (permalink / raw)
  To: Mike Rapoport, Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On 23.05.22 08:38, Mike Rapoport wrote:
> On Thu, May 12, 2022 at 09:54:59PM +0100, Matthew Wilcox wrote:
>> The LWN writeup [1] on merging the MGLRU reminded me that I need to send
>> out a plan for removing page flags that we can do without.
>>
>> 4. I think I can also consolidate PG_slab and PG_reserved into a "single
>> bit" (not really, but change the encoding so that effectively they only
>> take a single bit).
> 
> PG_reserved could be a PageType, AFAIR no reserved pages are ever mapped to
> userspace

include/linux/page-flags.h documents for PG_reserved:

"Pages part of the kernel image (including vDSO)"

Further, vm_normal_page() contains a comment

"NOTE! We still have PageReserved() pages in the page tables. eg. VDSO
mappings can cause them to exist."

-- 
Thanks,

David / dhildenb

