linux-mm.kvack.org archive mirror
* Page zapping and page table reclaim
@ 2021-03-11 18:14 David Hildenbrand
  2021-03-11 21:26 ` Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-11 18:14 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
	Andrea Arcangeli, Peter Xu

Hi folks,

I was wondering, is there any mechanism that reclaims basically empty 
page tables in a running process?

Like: When I MADV_DONTNEED a huge range, there could be plenty of 
basically empty (e.g., all entries invalid) page tables we could 
reclaim. As soon as we zap a complete PMD we could reclaim (depending on 
the architecture) a whole page.

Zapping at the PMD level would probably have the most impact, I guess.

For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we 
need a total of 2 MB for the lowest level page tables (PTE).

OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB 
would mean we can free up another 4MB - rather a corner case and we can 
live with that.
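
A quick sanity check of the numbers above, as a minimal sketch. It assumes 
4k base pages, 8-byte entries (so 512 entries per table) and a mapping that 
is aligned to the table coverage; the helper names are made up for this 
illustration and are not kernel interfaces.

```python
PAGE_SIZE = 4096                             # 4k base pages, as assumed above
ENTRY_SIZE = 8                               # bytes per page table entry (assumed)
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE  # 512 entries fit in one 4k table

GiB = 1 << 30
TiB = 1 << 40


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def pte_table_bytes(mapping_bytes):
    """Memory consumed by the lowest-level (PTE) tables for a fully populated mapping."""
    pages = mapping_bytes // PAGE_SIZE
    return ceil_div(pages, ENTRIES_PER_TABLE) * PAGE_SIZE


def pmd_table_bytes(mapping_bytes):
    """Memory consumed by the PMD tables that point at those PTE tables."""
    pte_tables = ceil_div(mapping_bytes // PAGE_SIZE, ENTRIES_PER_TABLE)
    return ceil_div(pte_tables, ENTRIES_PER_TABLE) * PAGE_SIZE


print(pte_table_bytes(GiB) // (1 << 20), "MiB of PTE tables per GiB mapped")
print(pmd_table_bytes(GiB), "bytes of PMD tables per GiB mapped")
print(pmd_table_bytes(TiB) // (1 << 20), "MiB of PMD tables per TiB mapped")
```

So the PTE level dominates: roughly 512 times more memory sits in PTE 
tables than in the PMD tables above them.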


Of course, the same might apply to other cases where we can restore all 
page table content from the VMA again. One example would be after 
MADV_FREE zapped a whole range of entries we marked.

Looks like if we happen to zap a THP, we should already get what we want 
(no page table, nothing to remove)

I haven't immediately stumbled over anything, but it could be that I am 
missing the obvious. I guess what would need some thought is concurrent 
discards/page faults - but it feels similar to collapsing/splitting a 
THP while there is other system activity.

Maybe there is already something and I am just not aware of it.

Thanks!

-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-11 18:14 Page zapping and page table reclaim David Hildenbrand
@ 2021-03-11 21:26 ` Peter Xu
  2021-03-11 21:35   ` David Hildenbrand
  2021-03-18 16:57 ` Vlastimil Babka
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2021-03-11 21:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
	Rik van Riel, Michal Hocko, Andrea Arcangeli

On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
> I was wondering, is there any mechanism that reclaims basically empty page
> tables in a running process?

Would munmap() count? :)

-- 
Peter Xu




* Re: Page zapping and page table reclaim
  2021-03-11 21:26 ` Peter Xu
@ 2021-03-11 21:35   ` David Hildenbrand
  2021-03-19 17:04     ` Yang Shi
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2021-03-11 21:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
	Rik van Riel, Michal Hocko, Andrea Arcangeli

On 11.03.21 22:26, Peter Xu wrote:
> On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
>> I was wondering, is there any mechanism that reclaims basically empty page
>> tables in a running process?
> 
> Would munmap() count? :)

Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)

As so often lately, the use case is sparse memory mappings where we

a) may want to reuse the area later.
b) don't want to hold the mmap lock in write while optimizing
c) don't want to create a lot of individual mappings that we might not 
be able to merge again.


-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-11 18:14 Page zapping and page table reclaim David Hildenbrand
  2021-03-11 21:26 ` Peter Xu
@ 2021-03-18 16:57 ` Vlastimil Babka
  2021-03-18 23:53   ` Balbir Singh
  2021-03-18 18:03 ` Rik van Riel
  2021-03-24  9:55 ` David Hildenbrand
  3 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2021-03-18 16:57 UTC (permalink / raw)
  To: David Hildenbrand, Linux Memory Management List
  Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
	Andrea Arcangeli, Peter Xu

On 3/11/21 7:14 PM, David Hildenbrand wrote:
> Hi folks,
> 
> I was wondering, is there any mechanism that reclaims basically empty page
> tables in a running process?
> 
> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
> 
> Zapping on the PMD level might make most impact I guess.
> 
> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> total of 2 MB for the lowest level page tables (PTE).
> 
> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> we can free up another 4MB - rather a corner case and we can live with that.
> 
> 
> Of course, the same might apply to other cases where we can restore all page
> table content from the VMA again. One example would be after MADV_FREE zapped a
> whole range of entries we marked.

I don't think we have such mechanism, but IIRC I've heard the idea mentioned
before, probably from Michal Hocko. Definitely an interesting research project
idea to evaluate the cost vs benefits of that.

> Looks like if we happen to zap a THP, we should already get what we want (no
> page table, nothing to remove)
> 
> I haven't immediately stumbled over anything, but could be I am missing the
> obvious. I guess what would need some thought is concurrent discards/pagefaults
> - but it feels like being similar to collapsing/splitting a THP while there is
> other system activity.
> 
> Maybe there is already something and I am just not aware of it.
> 
> Thanks!
> 




* Re: Page zapping and page table reclaim
  2021-03-11 18:14 Page zapping and page table reclaim David Hildenbrand
  2021-03-11 21:26 ` Peter Xu
  2021-03-18 16:57 ` Vlastimil Babka
@ 2021-03-18 18:03 ` Rik van Riel
  2021-03-18 18:15   ` David Hildenbrand
  2021-03-24  9:55 ` David Hildenbrand
  3 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2021-03-18 18:03 UTC (permalink / raw)
  To: David Hildenbrand, Linux Memory Management List
  Cc: Minchan Kim, Matthew Wilcox, Michal Hocko, Andrea Arcangeli, Peter Xu

On Thu, 2021-03-11 at 19:14 +0100, David Hildenbrand wrote:
> Hi folks,
> 
> I was wondering, is there any mechanism that reclaims basically
> empty 
> page tables in a running process?

Currently we have do_munmap -> unmap_region -> free_pgtables,
which is hooked up only to sys_munmap.

We don't seem to have an equivalent for the various MADV_
options that lead to freed memory.

-- 
All Rights Reversed.


* Re: Page zapping and page table reclaim
  2021-03-18 18:03 ` Rik van Riel
@ 2021-03-18 18:15   ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-18 18:15 UTC (permalink / raw)
  To: Rik van Riel, Linux Memory Management List
  Cc: Minchan Kim, Matthew Wilcox, Michal Hocko, Andrea Arcangeli, Peter Xu

On 18.03.21 19:03, Rik van Riel wrote:
> On Thu, 2021-03-11 at 19:14 +0100, David Hildenbrand wrote:
>> Hi folks,
>>
>> I was wondering, is there any mechanism that reclaims basically
>> empty
>> page tables in a running process?
> 
> Currently we have do_munmap -> unmap_region -> free_pgtables,
> which is hooked up only to sys_munmap.
> 
> We don't seem to have an equivalent for the various MADV_
> options that lead to freed memory.
> 

The other path I am interested in is doing an 
fallocate(FALLOC_FL_PUNCH_HOLE)/MADV_REMOVE on a shared file/mapping, 
resulting in the same situation AFAIKS.

-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-18 16:57 ` Vlastimil Babka
@ 2021-03-18 23:53   ` Balbir Singh
  2021-03-19 12:44     ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2021-03-18 23:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, Linux Memory Management List, Minchan Kim,
	Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
	Peter Xu

On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
> On 3/11/21 7:14 PM, David Hildenbrand wrote:
> > Hi folks,
> > 
> > I was wondering, is there any mechanism that reclaims basically empty page
> > tables in a running process?
> > 
> > Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> > empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> > zap a complete PMD we could reclaim (depending on the architecture) a whole page.
> > 
> > Zapping on the PMD level might make most impact I guess.
> > 
> > For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> > total of 2 MB for the lowest level page tables (PTE).
> > 
> > OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> > we can free up another 4MB - rather a corner case and we can live with that.
> > 
> > 
> > Of course, the same might apply to other cases where we can restore all page
> > table content from the VMA again. One example would be after MADV_FREE zapped a
> > whole range of entries we marked.
> 
> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
> before, probably from Michal Hocko. Definitely an interesting research project
> idea to evaluate the cost vs benefits of that.
>

It might lead to interesting interactions with lockless page table walking
with implications on the mmap_lock as well.

Balbir




* Re: Page zapping and page table reclaim
  2021-03-18 23:53   ` Balbir Singh
@ 2021-03-19 12:44     ` David Hildenbrand
  2021-03-20  1:56       ` Balbir Singh
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2021-03-19 12:44 UTC (permalink / raw)
  To: Balbir Singh, Vlastimil Babka
  Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
	Rik van Riel, Michal Hocko, Andrea Arcangeli, Peter Xu

On 19.03.21 00:53, Balbir Singh wrote:
> On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
>> On 3/11/21 7:14 PM, David Hildenbrand wrote:
>>> Hi folks,
>>>
>>> I was wondering, is there any mechanism that reclaims basically empty page
>>> tables in a running process?
>>>
>>> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
>>> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
>>> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
>>>
>>> Zapping on the PMD level might make most impact I guess.
>>>
>>> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
>>> total of 2 MB for the lowest level page tables (PTE).
>>>
>>> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
>>> we can free up another 4MB - rather a corner case and we can live with that.
>>>
>>>
>>> Of course, the same might apply to other cases where we can restore all page
>>> table content from the VMA again. One example would be after MADV_FREE zapped a
>>> whole range of entries we marked.
>>
>> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
>> before, probably from Michal Hocko. Definitely an interesting research project
>> idea to evaluate the cost vs benefits of that.
>>
> 
> It might lead to interesting interactions with lockless page table walking
> with implications on the mmap_lock as well.
>

I think if lockless page table walks have to be able to cope with THP 
code swapping populated page tables for a PMD back and forth, swapping 
an unpopulated page table for an invalid PMD entry might be quite 
similar. At least it feels like both approaches would rely on similar 
mechanisms / locking. :)

I'm planning on looking into this, but not sure when I'll have time to 
prototype something up.

> Balbir
> 
> 


-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-11 21:35   ` David Hildenbrand
@ 2021-03-19 17:04     ` Yang Shi
  2021-03-22  9:34       ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Yang Shi @ 2021-03-19 17:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Xu, Linux Memory Management List, Minchan Kim,
	Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli

On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 11.03.21 22:26, Peter Xu wrote:
> > On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
> >> I was wondering, is there any mechanism that reclaims basically empty page
> >> tables in a running process?
> >
> > Would munmap() count? :)
>
> Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
>
> As so often lately, the use case is sparse memory mappings where we
>
> a) may want to reuse the area later.
> b) don't want to hold the mmap lock in write while optimizing
> c) don't want to create a lot of individual mappings that we might not
> be able to merge again.

Will the below work for you?

1. acquire write mmap lock
2. unlink vmas from the list and rbtree (so the vmas won't be visible
to any concurrent readers/writers)
3. downgrade write lock to read lock
4. zap page tables and free page tables
5. upgrade to write lock
6. relink vmas back to list and rbtree

Actually the current implementation of munmap() does the first 4 steps.

But there is no rwsem upgrade function, so you may have to release the
lock then reacquire the write lock.

>
>
> --
> Thanks,
>
> David / dhildenb
>
>



* Re: Page zapping and page table reclaim
  2021-03-19 12:44     ` David Hildenbrand
@ 2021-03-20  1:56       ` Balbir Singh
  2021-03-22  9:19         ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Balbir Singh @ 2021-03-20  1:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Vlastimil Babka, Linux Memory Management List, Minchan Kim,
	Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
	Peter Xu

On Fri, Mar 19, 2021 at 01:44:55PM +0100, David Hildenbrand wrote:
> On 19.03.21 00:53, Balbir Singh wrote:
> > On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
> > > On 3/11/21 7:14 PM, David Hildenbrand wrote:
> > > > Hi folks,
> > > > 
> > > > I was wondering, is there any mechanism that reclaims basically empty page
> > > > tables in a running process?
> > > > 
> > > > Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> > > > empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> > > > zap a complete PMD we could reclaim (depending on the architecture) a whole page.
> > > > 
> > > > Zapping on the PMD level might make most impact I guess.
> > > > 
> > > > For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> > > > total of 2 MB for the lowest level page tables (PTE).
> > > > 
> > > > OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> > > > we can free up another 4MB - rather a corner case and we can live with that.
> > > > 
> > > > 
> > > > Of course, the same might apply to other cases where we can restore all page
> > > > table content from the VMA again. One example would be after MADV_FREE zapped a
> > > > whole range of entries we marked.
> > > 
> > > I don't think we have such mechanism, but IIRC I've heard the idea mentioned
> > > before, probably from Michal Hocko. Definitely an interesting research project
> > > idea to evaluate the cost vs benefits of that.
> > > 
> > 
> > It might lead to interesting interactions with lockless page table walking
> > with implications on the mmap_lock as well.
> > 
> 
> I think if lockless page table walks have to be able with THP code swapping
> populated page tables by a PMD back and forth, swapping an unpopulated page
> table by an invalid PMD entry might be quite similar. At least it feels like
> both approaches would rely on similar mechanisms / locking. :)
>

Yes, but then I suspect you always need to destroy page tables via RCU.
 
> I'm planning on looking into this, but not sure when I'll have time to
> prototype something up.
> 
>

Thanks,
Balbir Singh. 



* Re: Page zapping and page table reclaim
  2021-03-20  1:56       ` Balbir Singh
@ 2021-03-22  9:19         ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-22  9:19 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Vlastimil Babka, Linux Memory Management List, Minchan Kim,
	Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
	Peter Xu

On 20.03.21 02:56, Balbir Singh wrote:
> On Fri, Mar 19, 2021 at 01:44:55PM +0100, David Hildenbrand wrote:
>> On 19.03.21 00:53, Balbir Singh wrote:
>>> On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
>>>> On 3/11/21 7:14 PM, David Hildenbrand wrote:
>>>>> Hi folks,
>>>>>
>>>>> I was wondering, is there any mechanism that reclaims basically empty page
>>>>> tables in a running process?
>>>>>
>>>>> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
>>>>> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
>>>>> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
>>>>>
>>>>> Zapping on the PMD level might make most impact I guess.
>>>>>
>>>>> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
>>>>> total of 2 MB for the lowest level page tables (PTE).
>>>>>
>>>>> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
>>>>> we can free up another 4MB - rather a corner case and we can live with that.
>>>>>
>>>>>
>>>>> Of course, the same might apply to other cases where we can restore all page
>>>>> table content from the VMA again. One example would be after MADV_FREE zapped a
>>>>> whole range of entries we marked.
>>>>
>>>> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
>>>> before, probably from Michal Hocko. Definitely an interesting research project
>>>> idea to evaluate the cost vs benefits of that.
>>>>
>>>
>>> It might lead to interesting interactions with lockless page table walking
>>> with implications on the mmap_lock as well.
>>>
>>
>> I think if lockless page table walks have to be able with THP code swapping
>> populated page tables by a PMD back and forth, swapping an unpopulated page
>> table by an invalid PMD entry might be quite similar. At least it feels like
>> both approaches would rely on similar mechanisms / locking. :)
>>
> 
> Yes, but then I suspect you always need destruct page tables by RCU.

I envision page table reclaim happening asynchronously, so that 
shouldn't be an issue. We can just collect a bunch of reclaimed page 
tables and then wait for an RCU grace period before handing them back 
to the buddy allocator.


-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-19 17:04     ` Yang Shi
@ 2021-03-22  9:34       ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-22  9:34 UTC (permalink / raw)
  To: Yang Shi
  Cc: Peter Xu, Linux Memory Management List, Minchan Kim,
	Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli

On 19.03.21 18:04, Yang Shi wrote:
> On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 11.03.21 22:26, Peter Xu wrote:
>>> On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
>>>> I was wondering, is there any mechanism that reclaims basically empty page
>>>> tables in a running process?
>>>
>>> Would munmap() count? :)
>>
>> Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
>>
>> As so often lately, the use case is sparse memory mappings where we
>>
>> a) may want to reuse the area later.
>> b) don't want to hold the mmap lock in write while optimizing
>> c) don't want to create a lot of individual mappings that we might not
>> be able to merge again.
> 
> Will the below work for you?
> 
> 1. acquire write mmap lock
> 2. unlink vmas from the list and rbtree (so the vmas won't be visible
> to any concurrent readers/writers)
> 3. downgrade write lock to read lock
> 4. zap page tables and free page tables
> 5. upgrade to write lock
> 6. relink vmas back to list and rbtree
> 
> Actually the current implementation of munmap() does the first 4 steps.

That's almost mmap(MAP_FIXED) for the cases where we can merge VMAs. But 
I don't think this is actually what we want. We don't want to do such 
optimizations while we're in mmap-read-locked MADV_DONTNEED etc.


Simple example: QEMU implements memory ballooning for its VMs via 
virtio-balloon. When the guest inflates/deflates 4k pages and we're 
using anonymous memory, we issue madvise(MADV_DONTNEED) syscalls for 
each 4k page. At some point, we might be able to reclaim page tables - 
but we don't want to suddenly take the mmap lock in write during 
madvise() when there is no actual memory pressure, or scan for 
optimization opportunities during every syscall. User space pretty much 
relies on madvise(MADV_DONTNEED) being fast and minimally intrusive.
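
To illustrate the per-page discard pattern, here is a minimal user space 
sketch. It is not QEMU's code; it only assumes Linux, Python 3.8+ (for 
mmap.madvise()) and a private anonymous mapping. The function name is made 
up for this example.

```python
import mmap
import sys

PAGE_SIZE = mmap.PAGESIZE


def discard_one_page():
    """Fill a private anonymous mapping, discard its second page and
    return (value before discard, value after discard, neighbouring page).

    Returns None where MADV_DONTNEED with Linux semantics is unavailable.
    """
    if not sys.platform.startswith("linux") or not hasattr(mmap, "MADV_DONTNEED"):
        return None
    length = 4 * PAGE_SIZE
    # Private anonymous memory, comparable to anonymous guest RAM.
    m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        m[:] = b"\xaa" * length                # populate all four pages
        before = m[PAGE_SIZE]                  # first byte of the second page
        # Discard exactly one page, like a balloon inflating by 4k.
        m.madvise(mmap.MADV_DONTNEED, PAGE_SIZE, PAGE_SIZE)
        after = m[PAGE_SIZE]                   # re-faulted, reads as zero
        neighbour = m[0]                       # other pages are untouched
        return before, after, neighbour
    finally:
        m.close()


print(discard_one_page())
```

The mapping stays valid and can be reused right after the discard, which 
is the property we want to keep. What the call does not do is give back 
the page table that covered the discarded page.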

I think there might be other cases where we can reclaim page tables as 
well, not necessarily triggered by user space. For example, after we 
wrote back/evicted a sequence of file-mapped pages, I would assume that 
we might also be able to reclaim page tables, but I haven't looked into 
it yet. For now, I mostly care about page table reclaim for the cases 
where we discard pages from page tables completely (MADV_DONTNEED, 
MADV_FREE, MADV_REMOVE, fallocate(PUNCH_HOLE)).


I envision page table reclaim happening asynchronously: periodically, 
under memory pressure, or once there is sufficient evidence that reclaim 
might make sense. There, similarly to khugepaged, we might have to 
temporarily take the mmap lock in write mode for a short period of time, 
but I'll have to look into the details first.

-- 
Thanks,

David / dhildenb




* Re: Page zapping and page table reclaim
  2021-03-11 18:14 Page zapping and page table reclaim David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-03-18 18:03 ` Rik van Riel
@ 2021-03-24  9:55 ` David Hildenbrand
  3 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-24  9:55 UTC (permalink / raw)
  To: Linux Memory Management List
  Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
	Andrea Arcangeli, Peter Xu, Vlastimil Babka, Yang Shi,
	Balbir Singh

On 11.03.21 19:14, David Hildenbrand wrote:
> Hi folks,
> 
> I was wondering, is there any mechanism that reclaims basically empty
> page tables in a running process?
> 
> Like: When I MADV_DONTNEED a huge range, there could be plenty of
> basically empty (e.g., all entries invalid) page tables we could
> reclaim. As soon as we zap a complete PMD we could reclaim (depending on
> the architecture) a whole page.
> 
> Zapping on the PMD level might make most impact I guess.
> 
> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we
> need a total of 2 MB for the lowest level page tables (PTE).
> 
> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB
> would mean we can free up another 4MB - rather a corner case and we can
> live with that.
> 
> 
> Of course, the same might apply to other cases where we can restore all
> page table content from the VMA again. One example would be after
> MADV_FREE zapped a whole range of entries we marked.
> 
> Looks like if we happen to zap a THP, we should already get what we want
> (no page table, nothing to remove)
> 
> I haven't immediately stumbled over anything, but could be I am missing
> the obvious. I guess what would need some thought is concurrent
> discards/pagefaults - but it feels like being similar to
> collapsing/splitting a THP while there is other system activity.
> 
> Maybe there is already something and I am just not aware of it.
> 
> Thanks!

Thanks for the feedback so far. I just did a very simple experiment:

1. Start a VM (QEMU) with 60 GB and populate/preallocate all page tables.
2. Inflate the memory balloon (virtio-balloon) in the VM to 58 GB
3. Wait until fully inflated

Before inflating the balloon: PageTables:       131760 kB
After inflating the balloon: No real change
Shutting down the VM: PageTables:         8064 kB

In comparison, starting a 2 GB VM and preallocating/populating all 
memory: PageTables:        12660 kB


So in this case, there is quite some room for improvement (> 100 MiB). 
virtio-balloon discards at 4k granularity, which means that we never 
get to zap whole THPs (the first discard breaks up the THP) and 
therefore never remove any page tables.
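
For reference, the estimate can be reproduced from the PageTables lines 
quoted above. This is a rough sketch: it assumes the lines use the 
/proc/meminfo format and treats the 2 GB VM as the target footprint for a 
60 GB VM whose balloon is inflated to 58 GB. The function name is only for 
illustration.

```python
import re


def page_tables_kb(meminfo_text):
    """Extract the PageTables value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^PageTables:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no PageTables line found")
    return int(match.group(1))


# Values copied from the experiment above.
vm_60g_kb = page_tables_kb("PageTables:       131760 kB\n")
vm_2g_kb = page_tables_kb("PageTables:        12660 kB\n")

# Upper bound on what page table reclaim could give back in this setup.
reclaimable_kb = vm_60g_kb - vm_2g_kb
print(f"potentially reclaimable: {reclaimable_kb} kB "
      f"(~{reclaimable_kb / 1024:.0f} MiB)")
```

On a live system the same helper could be fed the contents of 
/proc/meminfo before and after inflating the balloon.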

I'll try identifying other workloads/cases where such an optimization 
is applicable and work on asynchronous page table reclaim. Thanks!

-- 
Thanks,

David / dhildenb



