* Page zapping and page table reclaim
@ 2021-03-11 18:14 David Hildenbrand
2021-03-11 21:26 ` Peter Xu
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-03-11 18:14 UTC (permalink / raw)
To: Linux Memory Management List
Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
Andrea Arcangeli, Peter Xu
Hi folks,
I was wondering, is there any mechanism that reclaims basically empty
page tables in a running process?
Like: When I MADV_DONTNEED a huge range, there could be plenty of
basically empty (i.e., all entries invalid) page tables we could
reclaim. As soon as we zap the complete range covered by a PMD entry, we
could reclaim (depending on the architecture) a whole page.
Zapping at the PMD level might have the most impact, I guess.
For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we
need a total of 2 MB for the lowest level page tables (PTE).
OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB
would mean we can free up another 4 MB of PMD tables - rather a corner
case and we can live with that.
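The arithmetic above can be checked with a few lines of C. This is a sketch that assumes x86-64-style paging: 4 KiB base pages, 8-byte entries, and 512 entries per table, so that one PMD entry maps 2 MiB. The helper and macro names are made up for illustration. PTE tables cost 1/512 of the mapped size. A fully populated 60 GB guest therefore needs roughly 120 MB of them, which fits the PageTables numbers reported later in this thread.

```c
#include <stdint.h>

/* Assumed x86-64-style paging parameters (illustrative). */
#define PAGE_BYTES     4096ULL                 /* base page size */
#define ENTRY_BYTES    8ULL                    /* size of one PTE or PMD entry */
#define PMD_SPAN_BYTES (512ULL * PAGE_BYTES)   /* memory mapped by one PMD entry: 2 MiB */

/* Bytes of PTE tables needed to map 'mapped' bytes with base pages. */
uint64_t pte_table_bytes(uint64_t mapped)
{
    return mapped / PAGE_BYTES * ENTRY_BYTES;
}

/* Bytes of PMD tables needed to map 'mapped' bytes. */
uint64_t pmd_table_bytes(uint64_t mapped)
{
    return mapped / PMD_SPAN_BYTES * ENTRY_BYTES;
}
```

For example, `pte_table_bytes(1ULL << 30)` gives 2 MiB for 1 GiB, and `pmd_table_bytes(1ULL << 40)` gives 4 MiB for 1 TiB.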
Of course, the same might apply to other cases where we can restore all
page table content from the VMA again. One example would be after
MADV_FREE zapped a whole range of entries we marked.
Looks like if we happen to zap a THP, we should already get what we want
(no page table, nothing to remove).
I haven't immediately stumbled over anything, but it could be that I am
missing the obvious. I guess what would need some thought is concurrent
discards/pagefaults - but it feels similar to collapsing/splitting a THP
while there is other system activity.
Maybe there is already something and I am just not aware of it.
Thanks!
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: Peter Xu @ 2021-03-11 21:26 UTC (permalink / raw)
To: David Hildenbrand
Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
Rik van Riel, Michal Hocko, Andrea Arcangeli
On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
> I was wondering, is there any mechanism that reclaims basically empty page
> tables in a running process?
Would munmap() count? :)
--
Peter Xu
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-11 21:35 UTC (permalink / raw)
To: Peter Xu
Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
Rik van Riel, Michal Hocko, Andrea Arcangeli
On 11.03.21 22:26, Peter Xu wrote:
> On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
>> I was wondering, is there any mechanism that reclaims basically empty page
>> tables in a running process?
>
> Would munmap() count? :)
Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
As so often lately, the use case is sparse memory mappings where we
a) may want to reuse the area later.
b) don't want to hold the mmap lock in write while optimizing
c) don't want to create a lot of individual mappings that we might not
be able to merge again.
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: Vlastimil Babka @ 2021-03-18 16:57 UTC (permalink / raw)
To: David Hildenbrand, Linux Memory Management List
Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
Andrea Arcangeli, Peter Xu
On 3/11/21 7:14 PM, David Hildenbrand wrote:
> Hi folks,
>
> I was wondering, is there any mechanism that reclaims basically empty page
> tables in a running process?
>
> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
>
> Zapping on the PMD level might make most impact I guess.
>
> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> total of 2 MB for the lowest level page tables (PTE).
>
> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> we can free up another 4MB - rather a corner case and we can live with that.
>
>
> Of course, the same might apply to other cases where we can restore all page
> table content from the VMA again. One example would be after MADV_FREE zapped a
> whole range of entries we marked.
I don't think we have such mechanism, but IIRC I've heard the idea mentioned
before, probably from Michal Hocko. Definitely an interesting research project
idea to evaluate the cost vs benefits of that.
> Looks like if we happen to zap a THP, we should already get what we want (no
> page table, nothing to remove)
>
> I haven't immediately stumbled over anything, but could be I am missing the
> obvious. I guess what would need some thought is concurrent discards/pagefaults
> - but it feels like being similar to collapsing/splitting a THP while there is
> other system activity.
>
> Maybe there is already something and I am just not aware of it.
>
> Thanks!
>
* Re: Page zapping and page table reclaim
From: Rik van Riel @ 2021-03-18 18:03 UTC (permalink / raw)
To: David Hildenbrand, Linux Memory Management List
Cc: Minchan Kim, Matthew Wilcox, Michal Hocko, Andrea Arcangeli, Peter Xu
On Thu, 2021-03-11 at 19:14 +0100, David Hildenbrand wrote:
> Hi folks,
>
> I was wondering, is there any mechanism that reclaims basically
> empty
> page tables in a running process?
Currently we have do_munmap -> unmap_region -> free_pgtables,
which is hooked up only to sys_munmap.
We don't seem to have an equivalent for the various MADV_
options that lead to freed memory.
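The toy model below makes the gap Rik points out concrete. It is user-space C, not kernel code, and all names are hypothetical. It has one PMD table whose 512 slots point to PTE tables. `toy_zap()` clears entries the way an MADV_DONTNEED-style zap clears PTEs, and it leaves the emptied tables allocated. `toy_reclaim_empty()` stands in for the proposed reclaim pass, which would free PTE tables that have no valid entry left. Per Rik, today only the munmap path (free_pgtables()) frees them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 512   /* entries per table, assuming 4 KiB tables of 8-byte entries */

/* One PMD table: each slot points to a PTE table or is NULL ("none"). */
struct toy_pmd {
    uint64_t *pte_table[TOY_ENTRIES];
};

/* Map base page number 'page' (0 .. 512*512-1), allocating the PTE table on demand. */
bool toy_populate(struct toy_pmd *pmd, size_t page)
{
    size_t slot = page / TOY_ENTRIES;

    if (slot >= TOY_ENTRIES)
        return false;
    if (!pmd->pte_table[slot]) {
        pmd->pte_table[slot] = calloc(TOY_ENTRIES, sizeof(uint64_t));
        if (!pmd->pte_table[slot])
            return false;
    }
    pmd->pte_table[slot][page % TOY_ENTRIES] = 1;   /* mark "present" */
    return true;
}

/* MADV_DONTNEED-like zap: clear the entries but keep the PTE tables. */
void toy_zap(struct toy_pmd *pmd, size_t first, size_t nr)
{
    for (size_t page = first; page < first + nr; page++) {
        size_t slot = page / TOY_ENTRIES;

        if (slot < TOY_ENTRIES && pmd->pte_table[slot])
            pmd->pte_table[slot][page % TOY_ENTRIES] = 0;
    }
}

/* Proposed reclaim: free every PTE table that has no valid entry left. */
size_t toy_reclaim_empty(struct toy_pmd *pmd)
{
    size_t freed = 0;

    for (size_t slot = 0; slot < TOY_ENTRIES; slot++) {
        uint64_t *table = pmd->pte_table[slot];
        bool empty = table != NULL;

        for (size_t i = 0; empty && i < TOY_ENTRIES; i++)
            empty = table[i] == 0;
        if (empty) {
            free(table);
            pmd->pte_table[slot] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Number of PTE tables currently allocated. */
size_t toy_count_tables(const struct toy_pmd *pmd)
{
    size_t n = 0;

    for (size_t slot = 0; slot < TOY_ENTRIES; slot++)
        n += pmd->pte_table[slot] != NULL;
    return n;
}
```

Suppose you populate pages 0..1024 and then zap pages 0..1023. All three PTE tables survive the zap. The reclaim pass frees only the two fully empty ones, because a single remaining valid entry keeps its table alive.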
--
All Rights Reversed.
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-18 18:15 UTC (permalink / raw)
To: Rik van Riel, Linux Memory Management List
Cc: Minchan Kim, Matthew Wilcox, Michal Hocko, Andrea Arcangeli, Peter Xu
On 18.03.21 19:03, Rik van Riel wrote:
> On Thu, 2021-03-11 at 19:14 +0100, David Hildenbrand wrote:
>> Hi folks,
>>
>> I was wondering, is there any mechanism that reclaims basically
>> empty
>> page tables in a running process?
>
> Currently we have do_munmap -> unmap_region -> free_pgtables,
> which is hooked up only to sys_munmap.
>
> We don't seem to have an equivalent for the various MADV_
> options that lead to freed memory.
>
The other path I am interested in is doing a
fallocate(FALLOC_FL_PUNCH_HOLE)/MADV_REMOVE on a shared file/mapping,
resulting in the same situation AFAIKS.
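Here is a minimal user-space sketch of the shared-memory path. It is Linux-only, and the helper names are hypothetical. MADV_REMOVE on a shared anonymous (shmem-backed) mapping frees the backing pages. madvise(2) documents this as equivalent to punching a hole in the backing store with fallocate(2). Later reads see zeroes, but the PTE tables covering the range stay allocated, which is the situation discussed here.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and MADV_REMOVE */
#include <stddef.h>
#include <sys/mman.h>

/* Map 'len' bytes of shared anonymous (shmem-backed) memory. */
void *map_shared_anon(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

/* Punch a hole into the backing shmem object for [addr, addr + len).
 * 'addr' and 'len' must be page aligned; returns 0 on success. */
int punch_range(void *addr, size_t len)
{
    return madvise(addr, len, MADV_REMOVE);
}
```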
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: Balbir Singh @ 2021-03-18 23:53 UTC (permalink / raw)
To: Vlastimil Babka
Cc: David Hildenbrand, Linux Memory Management List, Minchan Kim,
Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
Peter Xu
On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
> On 3/11/21 7:14 PM, David Hildenbrand wrote:
> > Hi folks,
> >
> > I was wondering, is there any mechanism that reclaims basically empty page
> > tables in a running process?
> >
> > Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> > empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> > zap a complete PMD we could reclaim (depending on the architecture) a whole page.
> >
> > Zapping on the PMD level might make most impact I guess.
> >
> > For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> > total of 2 MB for the lowest level page tables (PTE).
> >
> > OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> > we can free up another 4MB - rather a corner case and we can live with that.
> >
> >
> > Of course, the same might apply to other cases where we can restore all page
> > table content from the VMA again. One example would be after MADV_FREE zapped a
> > whole range of entries we marked.
>
> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
> before, probably from Michal Hocko. Definitely an interesting research project
> idea to evaluate the cost vs benefits of that.
>
It might lead to interesting interactions with lockless page table walking
with implications on the mmap_lock as well.
Balbir
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-19 12:44 UTC (permalink / raw)
To: Balbir Singh, Vlastimil Babka
Cc: Linux Memory Management List, Minchan Kim, Matthew Wilcox,
Rik van Riel, Michal Hocko, Andrea Arcangeli, Peter Xu
On 19.03.21 00:53, Balbir Singh wrote:
> On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
>> On 3/11/21 7:14 PM, David Hildenbrand wrote:
>>> Hi folks,
>>>
>>> I was wondering, is there any mechanism that reclaims basically empty page
>>> tables in a running process?
>>>
>>> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
>>> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
>>> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
>>>
>>> Zapping on the PMD level might make most impact I guess.
>>>
>>> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
>>> total of 2 MB for the lowest level page tables (PTE).
>>>
>>> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
>>> we can free up another 4MB - rather a corner case and we can live with that.
>>>
>>>
>>> Of course, the same might apply to other cases where we can restore all page
>>> table content from the VMA again. One example would be after MADV_FREE zapped a
>>> whole range of entries we marked.
>>
>> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
>> before, probably from Michal Hocko. Definitely an interesting research project
>> idea to evaluate the cost vs benefits of that.
>>
>
> It might lead to interesting interactions with lockless page table walking
> with implications on the mmap_lock as well.
>
I think if lockless page table walks already have to cope with THP code
replacing populated page tables by a huge PMD (and back), replacing an
unpopulated page table by an invalid PMD entry might be quite similar.
At least it feels like both approaches would rely on similar mechanisms
/ locking. :)
I'm planning on looking into this, but not sure when I'll have time to
prototype something up.
> Balbir
>
>
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: Yang Shi @ 2021-03-19 17:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: Peter Xu, Linux Memory Management List, Minchan Kim,
Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli
On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 11.03.21 22:26, Peter Xu wrote:
> > On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
> >> I was wondering, is there any mechanism that reclaims basically empty page
> >> tables in a running process?
> >
> > Would munmap() count? :)
>
> Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
>
> As so often lately, the use case is sparse memory mappings where we
>
> a) may want to reuse the area later.
> b) don't want to hold the mmap lock in write while optimizing
> c) don't want to create a lot of individual mappings that we might not
> be able to merge again.
Will the below work for you?
1. acquire write mmap lock
2. unlink vmas from the list and rbtree (so the vmas won't be visible
to any concurrent readers/writers)
3. downgrade write lock to read lock
4. zap page tables and free page tables
5. upgrade to write lock
6. relink vmas back to list and rbtree
Actually the current implementation of munmap() does the first 4 steps.
But there is no rwsem upgrade function, so you may have to release the
lock then reacquire the write lock.
>
>
> --
> Thanks,
>
> David / dhildenb
>
>
* Re: Page zapping and page table reclaim
From: Balbir Singh @ 2021-03-20 1:56 UTC (permalink / raw)
To: David Hildenbrand
Cc: Vlastimil Babka, Linux Memory Management List, Minchan Kim,
Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
Peter Xu
On Fri, Mar 19, 2021 at 01:44:55PM +0100, David Hildenbrand wrote:
> On 19.03.21 00:53, Balbir Singh wrote:
> > On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
> > > On 3/11/21 7:14 PM, David Hildenbrand wrote:
> > > > Hi folks,
> > > >
> > > > I was wondering, is there any mechanism that reclaims basically empty page
> > > > tables in a running process?
> > > >
> > > > Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
> > > > empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
> > > > zap a complete PMD we could reclaim (depending on the architecture) a whole page.
> > > >
> > > > Zapping on the PMD level might make most impact I guess.
> > > >
> > > > For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
> > > > total of 2 MB for the lowest level page tables (PTE).
> > > >
> > > > OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
> > > > we can free up another 4MB - rather a corner case and we can live with that.
> > > >
> > > >
> > > > Of course, the same might apply to other cases where we can restore all page
> > > > table content from the VMA again. One example would be after MADV_FREE zapped a
> > > > whole range of entries we marked.
> > >
> > > I don't think we have such mechanism, but IIRC I've heard the idea mentioned
> > > before, probably from Michal Hocko. Definitely an interesting research project
> > > idea to evaluate the cost vs benefits of that.
> > >
> >
> > It might lead to interesting interactions with lockless page table walking
> > with implications on the mmap_lock as well.
> >
>
> I think if lockless page table walks have to be able with THP code swapping
> populated page tables by a PMD back and forth, swapping an unpopulated page
> table by an invalid PMD entry might be quite similar. At least it feels like
> both approaches would rely on similar mechanisms / locking. :)
>
Yes, but then I suspect you would always need to destroy page tables via RCU.
> I'm planning on looking into this, but not sure when I'll have time to
> prototype something up.
>
>
Thanks,
Balbir Singh.
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-22 9:19 UTC (permalink / raw)
To: Balbir Singh
Cc: Vlastimil Babka, Linux Memory Management List, Minchan Kim,
Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli,
Peter Xu
On 20.03.21 02:56, Balbir Singh wrote:
> On Fri, Mar 19, 2021 at 01:44:55PM +0100, David Hildenbrand wrote:
>> On 19.03.21 00:53, Balbir Singh wrote:
>>> On Thu, Mar 18, 2021 at 05:57:06PM +0100, Vlastimil Babka wrote:
>>>> On 3/11/21 7:14 PM, David Hildenbrand wrote:
>>>>> Hi folks,
>>>>>
>>>>> I was wondering, is there any mechanism that reclaims basically empty page
>>>>> tables in a running process?
>>>>>
>>>>> Like: When I MADV_DONTNEED a huge range, there could be plenty of basically
>>>>> empty (e.g., all entries invalid) page tables we could reclaim. As soon as we
>>>>> zap a complete PMD we could reclaim (depending on the architecture) a whole page.
>>>>>
>>>>> Zapping on the PMD level might make most impact I guess.
>>>>>
>>>>> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we need a
>>>>> total of 2 MB for the lowest level page tables (PTE).
>>>>>
>>>>> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB would mean
>>>>> we can free up another 4MB - rather a corner case and we can live with that.
>>>>>
>>>>>
>>>>> Of course, the same might apply to other cases where we can restore all page
>>>>> table content from the VMA again. One example would be after MADV_FREE zapped a
>>>>> whole range of entries we marked.
>>>>
>>>> I don't think we have such mechanism, but IIRC I've heard the idea mentioned
>>>> before, probably from Michal Hocko. Definitely an interesting research project
>>>> idea to evaluate the cost vs benefits of that.
>>>>
>>>
>>> It might lead to interesting interactions with lockless page table walking
>>> with implications on the mmap_lock as well.
>>>
>>
>> I think if lockless page table walks have to be able with THP code swapping
>> populated page tables by a PMD back and forth, swapping an unpopulated page
>> table by an invalid PMD entry might be quite similar. At least it feels like
>> both approaches would rely on similar mechanisms / locking. :)
>>
>
> Yes, but then I suspect you always need destruct page tables by RCU.
I envision page table reclaim happening asynchronously, so that
shouldn't be an issue. We can just collect a bunch of reclaimed page
tables and then wait for an RCU grace period (synchronize_rcu()) before
handing them back to the buddy allocator.
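Below is a user-space sketch of this batching idea, under stated assumptions. C11 atomics stand in for the kernel's page-table update primitives. The `grace_periods` counter marks where a real implementation would wait for RCU readers, for example with synchronize_rcu(). All names are hypothetical. Clearing the slot first means new lockless walkers see an empty entry. The free is deferred because a walker that loaded the old pointer earlier may still be reading the table. Batching amortizes one grace period over many tables.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BATCH_MAX 64

/* A PMD slot that lockless walkers read with atomic_load(). */
typedef _Atomic(void *) toy_pmd_slot;

struct toy_free_batch {
    void *tables[TOY_BATCH_MAX];  /* detached PTE tables awaiting free */
    size_t nr;
    unsigned long grace_periods;  /* grace periods waited for so far */
};

/* Detach the PTE table in 'slot' and queue it; returns -1 if the batch is full. */
int toy_detach(toy_pmd_slot *slot, struct toy_free_batch *b)
{
    void *old;

    if (b->nr == TOY_BATCH_MAX)
        return -1;
    old = atomic_exchange(slot, NULL);   /* new walkers now see "none" */
    if (old)
        b->tables[b->nr++] = old;
    return 0;
}

/* Wait once for all pre-existing walkers, then free the whole batch. */
size_t toy_flush(struct toy_free_batch *b)
{
    size_t n = b->nr;

    if (n == 0)
        return 0;
    b->grace_periods++;   /* a kernel version would call synchronize_rcu() here */
    for (size_t i = 0; i < n; i++)
        free(b->tables[i]);
    b->nr = 0;
    return n;
}
```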
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-22 9:34 UTC (permalink / raw)
To: Yang Shi
Cc: Peter Xu, Linux Memory Management List, Minchan Kim,
Matthew Wilcox, Rik van Riel, Michal Hocko, Andrea Arcangeli
On 19.03.21 18:04, Yang Shi wrote:
> On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 11.03.21 22:26, Peter Xu wrote:
>>> On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
>>>> I was wondering, is there any mechanism that reclaims basically empty page
>>>> tables in a running process?
>>>
>>> Would munmap() count? :)
>>
>> Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
>>
>> As so often lately, the use case is sparse memory mappings where we
>>
>> a) may want to reuse the area later.
>> b) don't want to hold the mmap lock in write while optimizing
>> c) don't want to create a lot of individual mappings that we might not
>> be able to merge again.
>
> Will the below work for you?
>
> 1. acquire write mmap lock
> 2. unlink vmas from the list and rbtree (so the vmas won't be visible
> to any concurrent readers/writers)
> 3. downgrade write lock to read lock
> 4. zap page tables and free page tables
> 5. upgrade to write lock
> 6. relink vmas back to list and rbtree
>
> Actually the current implementation of munmap() does the first 4 steps.
That's almost mmap(MAP_FIXED) for the cases where we can merge VMAs. But
I don't think this is actually what we want. We don't want to do such
optimizations while we're in mmap-read-locked MADV_DONTNEED etc.
Simple example: QEMU implements memory ballooning for its VMs via
virtio-balloon. When the guest inflates/deflates 4k pages and we're
using anonymous memory, we issue madvise(MADV_DONTNEED) syscalls for
each 4k page. At some point, we might be able to reclaim page tables -
but we don't want to suddenly take the mmap lock in write during
madvise() when there is no actual memory pressure, or scan for
optimization opportunities during every syscall. User space pretty much
relies on madvise(MADV_DONTNEED) being fast and minimally intrusive.
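The access pattern described above can be sketched in user space as one madvise(MADV_DONTNEED) call per base page. This is not QEMU code, and `discard_per_page` is a hypothetical helper. Each call zaps a single PTE. Once all 512 entries of a PTE table have been discarded this way, the table is empty, yet on the kernels discussed in this thread it stays allocated.

```c
#define _DEFAULT_SOURCE   /* for MADV_DONTNEED and MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Discard [addr, addr + len) with one madvise() call per base page.
 * 'addr' must be page aligned; a trailing partial page is skipped.
 * Returns 0 on success, -1 on the first failure. */
int discard_per_page(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;
    for (size_t off = 0; off + (size_t)page <= len; off += (size_t)page)
        if (madvise((char *)addr + off, (size_t)page, MADV_DONTNEED) != 0)
            return -1;
    return 0;
}
```

On Linux, private anonymous memory discarded this way reads back as zeroes on the next access.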
I think there might be other cases where we can reclaim page tables as
well, not necessarily triggered by user space. For example, after we
wrote back/evicted a sequence of file-mapped pages, I would assume that
we might also be able to reclaim page tables, but I haven't looked into
it yet. For now, I mostly care about page table reclaim for the cases
where we discard pages from page tables completely (MADV_DONTNEED,
MADV_FREE, MADV_REMOVE, fallocate(PUNCH_HOLE)).
I envision page table reclaim happening asynchronously, either
periodically under memory pressure or once there is sufficient evidence
that reclaim might make sense. There, similarly to khugepaged, we might
have to take the mmap lock in write mode for a short period of time,
but I'll have to look into the details first.
--
Thanks,
David / dhildenb
* Re: Page zapping and page table reclaim
From: David Hildenbrand @ 2021-03-24 9:55 UTC (permalink / raw)
To: Linux Memory Management List
Cc: Minchan Kim, Matthew Wilcox, Rik van Riel, Michal Hocko,
Andrea Arcangeli, Peter Xu, Vlastimil Babka, Yang Shi,
Balbir Singh
On 11.03.21 19:14, David Hildenbrand wrote:
> Hi folks,
>
> I was wondering, is there any mechanism that reclaims basically empty
> page tables in a running process?
>
> Like: When I MADV_DONTNEED a huge range, there could be plenty of
> basically empty (e.g., all entries invalid) page tables we could
> reclaim. As soon as we zap a complete PMD we could reclaim (depending on
> the architecture) a whole page.
>
> Zapping on the PMD level might make most impact I guess.
>
> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we
> need a total of 2 MB for the lowest level page tables (PTE).
>
> OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB
> would mean we can free up another 4MB - rather a corner case and we can
> live with that.
>
>
> Of course, the same might apply to other cases where we can restore all
> page table content from the VMA again. One example would be after
> MADV_FREE zapped a whole range of entries we marked.
>
> Looks like if we happen to zap a THP, we should already get what we want
> (no page table, nothing to remove)
>
> I haven't immediately stumbled over anything, but could be I am missing
> the obvious. I guess what would need some thought is concurrent
> discards/pagefaults - but it feels like being similar to
> collapsing/splitting a THP while there is other system activity.
>
> Maybe there is already something and I am just not aware of it.
>
> Thanks!
Thanks for the feedback so far. I just did a very simple experiment:
1. Start a VM (QEMU) with 60 GB and populate/preallocate all page tables.
2. Inflate the memory balloon (virtio-balloon) in the VM to 58 GB
3. Wait until fully inflated
Before inflating the balloon: PageTables: 131760 kB
After inflating the balloon: No real change
Shutting down the VM: PageTables: 8064 kB
In comparison, starting a 2 GB VM and preallocating/populating all
memory: PageTables: 12660 kB
So in this case, there is quite some room for improvement (> 100 MiB).
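The PageTables figures above use the /proc/meminfo field format. The hypothetical helper below extracts such a field from a meminfo snapshot, so that before/after numbers can be compared programmatically. Reading the file itself (for example with fopen()/fread()) is left out. Applied to the numbers above, 131760 kB - 12660 kB = 119100 kB, or about 116 MiB.

```c
#include <stdlib.h>
#include <string.h>

/* Return the value of 'field' (e.g. "PageTables") from text in
 * /proc/meminfo format, in kB, or -1 if the field is not present. */
long meminfo_kb(const char *text, const char *field)
{
    size_t n = strlen(field);

    for (const char *line = text; line != NULL && *line != '\0'; ) {
        if (strncmp(line, field, n) == 0 && line[n] == ':')
            return strtol(line + n + 1, NULL, 10);   /* skips leading spaces */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```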
virtio-balloon discards at 4k granularity, which means that we never
get to zap whole THPs (the first discard breaks up the THP) and,
therefore, never remove any page tables.
I'll try identifying other workloads/cases where such an optimization
is applicable and work on asynchronous page table reclaim. Thanks!
--
Thanks,
David / dhildenb