Re: [PATCH 1/2] mm/mprotect: use mmu_gather

From: Nadav Amit <nadav.amit@gmail.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	x86@kernel.org
Subject: Re: [PATCH 1/2] mm/mprotect: use mmu_gather
Date: Wed, 13 Oct 2021 08:59:04 -0700	[thread overview]
Message-ID: <09F31D01-E818-4538-A6E9-3E4779FC4B53@gmail.com> (raw)
In-Reply-To: <YWYYUOy0W8z5SVKh@t490s>

> On Oct 12, 2021, at 4:20 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> On Tue, Oct 12, 2021 at 10:31:45AM -0700, Nadav Amit wrote:
>> 
>> 
>>> On Oct 12, 2021, at 3:16 AM, Peter Xu <peterx@redhat.com> wrote:
>>> 
>>> On Sat, Sep 25, 2021 at 01:54:22PM -0700, Nadav Amit wrote:
>>>> @@ -338,25 +344,25 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
>>>> 	struct mm_struct *mm = vma->vm_mm;
>>>> 	pgd_t *pgd;
>>>> 	unsigned long next;
>>>> -	unsigned long start = addr;
>>>> 	unsigned long pages = 0;
>>>> +	struct mmu_gather tlb;
>>>> 
>>>> 	BUG_ON(addr >= end);
>>>> 	pgd = pgd_offset(mm, addr);
>>>> 	flush_cache_range(vma, addr, end);
>>>> 	inc_tlb_flush_pending(mm);
>>>> +	tlb_gather_mmu(&tlb, mm);
>>>> +	tlb_start_vma(&tlb, vma);
>>> 
>>> Pure question:
>>> 
>>> I actually have no idea why tlb_start_vma() is needed here, as protection range
>>> can be just a single page, but anyway.. I do see that tlb_start_vma() contains
>>> a whole-vma flush_cache_range() when the arch needs it, then does it mean that
>>> besides the inc_tlb_flush_pending() to be dropped, so as to the other call to
>>> flush_cache_range() above?
>> 
>> Good point.
>> 
>> tlb_start_vma() and tlb_end_vma() are required since some archs do not
>> batch TLB flushes across VMAs (e.g., ARM).
> 
> Sorry I didn't follow here - as change_protection() is per-vma anyway, so I
> don't see why it needs to consider vma crossing.
> 
> In all cases, it'll be great if you could add some explanation into commit
> message on why we need tlb_{start|end}_vma(), as I think it could not be
> obvious to all people.

tlb_start_vma() is required when we switch from flush_tlb_range() because
certain properties of the VMA (e.g., executable) are needed on certain
arch. That’s the reason flush_tlb_range() requires the VMA that is
invalidated to be provided.

Regardless, there is an interface and that is the way it is used. I am not
inclined to break it, even if it was possible, for unclear performance
benefits.

As I discussed offline with Andrea and David, switching to tlb_gather_mmu()
interface has additional advantages than batching and avoiding unnecessary
flushes on PTE permission promotion (as done in patch 2). If a single PTE
is updated out of a bigger range, currently flush_tlb_range() would flush
the whole range instead of the single page. In addition, once I fix this
patch-set, if you update a THP, you would (at least on x86) be able to
flush a single PTE instead of flushing 512 entries (which would actually
be done using a full TLB flush).

I would say that as I mentioned in a different thread, and was not
upfront about before, one of the motivations of mine behind this patch
is that I need a vectored UFFDIO_WRITEPROTECTV interface for performance.
Nevertheless, I think these two patches stand by themselves and have
independent value.