From: David Hildenbrand <david@redhat.com>
To: Linux Memory Management List <linux-mm@kvack.org>
Cc: Minchan Kim <minchan@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Rik van Riel <riel@surriel.com>, Michal Hocko <mhocko@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: Page zapping and page table reclaim
Date: Wed, 24 Mar 2021 10:55:36 +0100
Message-ID: <53e72516-2e38-f490-4d1f-709291140e2f@redhat.com>
In-Reply-To: <bae8b967-c206-819d-774c-f57b94c4b362@redhat.com>

On 11.03.21 19:14, David Hildenbrand wrote:
> Hi folks,
> 
> I was wondering, is there any mechanism that reclaims basically empty
> page tables in a running process?
> 
> Like: When I MADV_DONTNEED a huge range, there could be plenty of
> basically empty (e.g., all entries invalid) page tables we could
> reclaim. As soon as we zap a complete PMD we could reclaim (depending on
> the architecture) a whole page.
> 
> Zapping on the PMD level might have the most impact, I guess.
> 
> For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we
> need a total of 2 MB for the lowest level page tables (PTE tables).
> 
> OTOH, we would need only 512 PMD entries - a single 4k page. Zapping
> 1 TB would mean we could free up another 4 MB of PMD tables - rather a
> corner case, and we can live with that.
> 
> 
> Of course, the same might apply to other cases where we can restore all
> page table content from the VMA again. One example would be after
> MADV_FREE zapped a whole range of entries we marked.
> 
> Looks like if we happen to zap a THP, we should already get what we want
> (no page table underneath, so nothing to remove).
> 
> I haven't immediately stumbled over anything, but could be I am missing
> the obvious. I guess what would need some thought is concurrent
> discards/pagefaults - but it feels like being similar to
> collapsing/splitting a THP while there is other system activity.
> 
> Maybe there is already something and I am just not aware of it.
> 
> Thanks!
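
To restate the scenario above with a concrete sketch (purely
illustrative, not from the original discussion; sizes picked
arbitrarily): touching 1 GiB of anonymous memory allocates 262144 PTEs,
i.e. ~2 MiB of PTE tables, and MADV_DONTNEED over the whole range frees
the pages but leaves those now-empty page tables in place.

/*
 * Illustrative only: force 4k mappings, populate, then zap.  The
 * anonymous pages are freed by MADV_DONTNEED, but the (now empty)
 * PTE tables stay allocated -- compare PageTables: in /proc/meminfo
 * before and after.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define GiB (1024UL * 1024 * 1024)

int main(void)
{
	size_t len = 1 * GiB;	/* 262144 4k pages -> ~2 MiB of PTE tables */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	madvise(p, len, MADV_NOHUGEPAGE);	/* keep it PTE-mapped */
	memset(p, 1, len);			/* populate: PTE tables get allocated */

	if (madvise(p, len, MADV_DONTNEED))	/* zap: pages freed, tables kept */
		perror("madvise");

	getchar();	/* keep the process alive for inspection */
	return 0;
}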

Thanks for the feedback so far. I just did a very simple experiment:

1. Start a VM (QEMU) with 60 GB and populate/preallocate all page tables.
2. Inflate the memory balloon (virtio-balloon) in the VM to 58 GB
3. Wait until fully inflated

Before inflating the balloon:  PageTables:       131760 kB
After inflating the balloon:   no real change
After shutting down the VM:    PageTables:         8064 kB

For comparison, starting a 2 GB VM and preallocating/populating all
memory: PageTables:        12660 kB
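
The PageTables values are the ones reported by /proc/meminfo; a minimal
sketch (added for illustration, not part of the original mail) to sample
that counter before/after inflating the balloon:

/*
 * Illustrative only: print the PageTables: line from /proc/meminfo,
 * e.g. once before and once after inflating the balloon.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[128];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "PageTables:", 11)) {
			fputs(line, stdout);
			break;
		}
	}
	fclose(f);
	return 0;
}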


So in this case, there is quite some room for improvement (> 100 MiB).
virtio-balloon discards memory in 4k granularity, which means we never
get to zap whole THPs (the first discard already breaks up the THP) and
consequently never remove any page tables.
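
A rough userspace analogue of that effect (illustrative sketch with
assumptions: THP enabled, and the touched 2 MiB range actually backed
by a huge page): discarding a single 4k page out of a THP-backed region
splits the huge mapping, so a PTE table gets populated and then sticks
around:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL * 1024 * 1024)

int main(void)
{
	/* Over-allocate so a PMD-aligned 2 MiB chunk can be carved out. */
	char *raw = mmap(NULL, 2 * PMD_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *p;

	if (raw == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	p = (char *)(((uintptr_t)raw + PMD_SIZE - 1) & ~(PMD_SIZE - 1));

	madvise(p, PMD_SIZE, MADV_HUGEPAGE);	/* ask for a THP */
	memset(p, 1, PMD_SIZE);			/* fault it in */

	/*
	 * Discarding a single 4k page splits the huge PMD mapping: the
	 * other 511 pages are now mapped via a PTE table that stays
	 * allocated.
	 */
	if (madvise(p, 4096, MADV_DONTNEED))
		perror("madvise");

	getchar();	/* inspect AnonHugePages:/PageTables: while running */
	return 0;
}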

I'll try identifying other workloads/cases where such an optimization
is applicable and work on asynchronous page table reclaim. Thanks!

-- 
Thanks,

David / dhildenb



Thread overview: 13+ messages
2021-03-11 18:14 Page zapping and page table reclaim David Hildenbrand
2021-03-11 21:26 ` Peter Xu
2021-03-11 21:35   ` David Hildenbrand
2021-03-19 17:04     ` Yang Shi
2021-03-22  9:34       ` David Hildenbrand
2021-03-18 16:57 ` Vlastimil Babka
2021-03-18 23:53   ` Balbir Singh
2021-03-19 12:44     ` David Hildenbrand
2021-03-20  1:56       ` Balbir Singh
2021-03-22  9:19         ` David Hildenbrand
2021-03-18 18:03 ` Rik van Riel
2021-03-18 18:15   ` David Hildenbrand
2021-03-24  9:55 ` David Hildenbrand [this message]
