From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>, Nadav Amit <nadav.amit@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <namit@vmware.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
Date: Wed, 20 Jul 2022 11:39:23 +0200	[thread overview]
Message-ID: <017facf0-7ef8-3faf-138d-3013a20b37db@redhat.com> (raw)
In-Reply-To: <YtcYVMoSRVxRH70Z@xz-m1.local>

On 19.07.22 22:47, Peter Xu wrote:
> On Mon, Jul 18, 2022 at 05:01:59AM -0700, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>>
>> When userfaultfd makes a PTE writable, it can now change the PTE
>> directly, in some cases, without triggering a page-fault first.
>> Yet, doing so might leave the PTE that was write-unprotected as old and
>> clean. At least on x86, this would cause a >500-cycle overhead when the
>> PTE is first accessed.
>>
>> Use MM_CP_WILL_NEED to mark the PTE young and dirty when userfaultfd
>> gets a hint that the page is likely to be used. Do not mark the PTE
>> young and dirty in other cases, to avoid excessive writeback and
>> interference with the page reclamation logic.
>>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Nick Piggin <npiggin@gmail.com>
>> ---
>>  include/linux/mm.h | 2 ++
>>  mm/mprotect.c      | 9 ++++++++-
>>  mm/userfaultfd.c   | 8 ++++++--
>>  3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 9cc02a7e503b..4afd75ce5875 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1988,6 +1988,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
>>  /* Whether this change is for write protecting */
>>  #define  MM_CP_UFFD_WP                     (1UL << 2) /* do wp */
>>  #define  MM_CP_UFFD_WP_RESOLVE             (1UL << 3) /* Resolve wp */
>> +/* Whether to try to mark entries as dirty as they are to be written */
>> +#define  MM_CP_WILL_NEED		   (1UL << 4)
>>  #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
>>  					    MM_CP_UFFD_WP_RESOLVE)
>>  
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 996a97e213ad..34c2dfb68c42 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -82,6 +82,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>  	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
>>  	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
>>  	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
>> +	bool will_need = cp_flags & MM_CP_WILL_NEED;
>>  
>>  	tlb_change_page_size(tlb, PAGE_SIZE);
>>  
>> @@ -172,6 +173,9 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>  				ptent = pte_clear_uffd_wp(ptent);
>>  			}
>>  
>> +			if (will_need)
>> +				ptent = pte_mkyoung(ptent);
> 
> For the uffd path, UFFD_FLAGS_ACCESS_LIKELY|UFFD_FLAGS_WRITE_LIKELY are new
> internal flags used with or without the new feature bit set.  Does that mean
> that even with !ACCESS_HINT we'll start setting the young bit, while we used
> not to?  Isn't that a light ABI change?
> 
> I'd suggest we only set will_need if ACCESS_HINT is set.
> 
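Gating it on the hint should be a one-liner on the uffd side. Rough sketch
only (the flag names are the ones used in this thread; the uffd_flags
variable and the surrounding plumbing are placeholders, not the actual code
of the series):

  unsigned long cp_flags = enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE;

  /*
   * Sketch: only request the young/dirty treatment when userspace
   * actually passed the access/write hint, so existing users keep
   * seeing the old behavior.
   */
  if (uffd_flags & (UFFD_FLAGS_ACCESS_LIKELY | UFFD_FLAGS_WRITE_LIKELY))
          cp_flags |= MM_CP_WILL_NEED;
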
>> +
>>  			/*
>>  			 * In some writable, shared mappings, we might want
>>  			 * to catch actual write access -- see
>> @@ -187,8 +191,11 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>  			 */
>>  			if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>>  			    !pte_write(ptent) &&
>> -			    can_change_pte_writable(vma, addr, ptent))
>> +			    can_change_pte_writable(vma, addr, ptent)) {
>>  				ptent = pte_mkwrite(ptent);
>> +				if (will_need)
>> +					ptent = pte_mkdirty(ptent);
> 
> Can we make this unconditional?  IOW to cover both:
> 
>   (1) When will_need is not set, or
>   (2) mprotect() too
> 
> David's patch is good in that it merged the unprotect and CoW paths.  However,
> that's not complete, because the dirty bit handling is still missing.
> 
> Here IMHO we should have a standalone patch that just adds the dirty bit to
> this logic whenever we grant the write bit.  That would make the write+dirty
> bits coherent again in all paths.

I'm not sure I follow.

We *surely* don't want to dirty random pages (especially once they are in the
pagecache/swapcache) simply because we change protection.

Just like we don't set all pages write+dirty in a writable VMA on a read
fault.
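
The hint makes sense in the fault-driven case, where the monitor already
knows that the faulting write is about to be retried. Userspace sketch using
only the existing uffd UAPI (uffd and page_size are assumed to be set up
elsewhere; the new hint flag from this series is intentionally left out):

  /* Needs <linux/userfaultfd.h>, <sys/ioctl.h>, <unistd.h>. */
  struct uffd_msg msg;

  if (read(uffd, &msg, sizeof(msg)) == sizeof(msg) &&
      msg.event == UFFD_EVENT_PAGEFAULT &&
      (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
          struct uffdio_writeprotect wp = {
                  .range = {
                          .start = msg.arg.pagefault.address & ~(page_size - 1),
                          .len   = page_size,
                  },
                  .mode  = 0,     /* clear WP; the faulting write gets retried */
          };

          /*
           * Here a write is known to be imminent, so pre-setting young+dirty
           * (the will_need case above) saves the >500-cycle hit mentioned in
           * the commit message. A blanket mprotect()/un-protect has no such
           * knowledge, so it must not dirty anything.
           */
          ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
  }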

-- 
Thanks,

David / dhildenb


Thread overview: 37+ messages
2022-07-18 12:01 [RFC PATCH 00/14] mm: relaxed TLB flushes and other optimi Nadav Amit
2022-07-18 12:01 ` [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect Nadav Amit
2022-07-19 20:47   ` Peter Xu
2022-07-20  9:39     ` David Hildenbrand [this message]
2022-07-20 13:10       ` Peter Xu
2022-07-20 15:10         ` David Hildenbrand
2022-07-20 19:15           ` Peter Xu
2022-07-20 19:33             ` David Hildenbrand
2022-07-20 19:48               ` Peter Xu
2022-07-20 19:55                 ` David Hildenbrand
2022-07-20 20:22                   ` Nadav Amit
2022-07-20 20:38                     ` David Hildenbrand
2022-07-20 20:56                       ` Nadav Amit
2022-07-21  7:52                         ` David Hildenbrand
2022-07-21 14:10                           ` David Hildenbrand
2022-07-20  9:42   ` David Hildenbrand
2022-07-20 17:36     ` Nadav Amit
2022-07-20 18:00       ` David Hildenbrand
2022-07-20 18:09         ` Nadav Amit
2022-07-20 18:11           ` David Hildenbrand
2022-07-18 12:02 ` [RFC PATCH 02/14] userfaultfd: try to map write-unprotected pages Nadav Amit
2022-07-19 20:49   ` Peter Xu
2022-07-18 12:02 ` [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable Nadav Amit
2022-07-20 15:19   ` David Hildenbrand
2022-07-20 17:25     ` Nadav Amit
2022-07-21  7:45       ` David Hildenbrand
2022-07-18 12:02 ` [RFC PATCH 04/14] mm/mprotect: preserve write with MM_CP_TRY_CHANGE_WRITABLE Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 05/14] x86/mm: check exec permissions on fault Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 06/14] mm/rmap: avoid flushing on page_vma_mkclean_one() when possible Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 07/14] mm: do fix spurious page-faults for instruction faults Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 08/14] x86/mm: introduce flush_tlb_fix_spurious_fault Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 09/14] mm: introduce relaxed TLB flushes Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 10/14] x86/mm: " Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 11/14] x86/mm: use relaxed TLB flushes when protection is removed Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 12/14] x86/tlb: no flush on PTE change from RW->RO when PTE is clean Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 13/14] mm/mprotect: do not check flush type if a strict is needed Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 14/14] mm: conditional check of pfn in pte_flush_type Nadav Amit
