All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
Date: Wed, 20 Jul 2022 13:22:15 -0700	[thread overview]
Message-ID: <95320077-52CF-4CB0-92F9-523E1AE74A3D@gmail.com> (raw)
In-Reply-To: <4ad140b5-1d5b-2486-0893-7886a9cdfd76@redhat.com>

On Jul 20, 2022, at 12:55 PM, David Hildenbrand <david@redhat.com> wrote:

> On 20.07.22 21:48, Peter Xu wrote:
>> On Wed, Jul 20, 2022 at 09:33:35PM +0200, David Hildenbrand wrote:
>>> On 20.07.22 21:15, Peter Xu wrote:
>>>> On Wed, Jul 20, 2022 at 05:10:37PM +0200, David Hildenbrand wrote:
>>>>> For pagecache pages it may as well be *plain wrong* to bypass the write
>>>>> fault handler and simply mark pages dirty+map them writable.
>>>> 
>>>> Could you elaborate?
>>> 
>>> Write-fault handling for some filesystems (that even require this
>>> "slow path") is a bit special.
>>> 
>>> For example, do_shared_fault() might have to call page_mkwrite().
>>> 
>>> AFAIK file systems use that for lazy allocation of disk blocks.
>>> If you simply go ahead and map a !dirty pagecache page writable
>>> and mark it dirty, it will not trigger page_mkwrite() and you might
>>> end up corrupting data.
>>> 
>>> That's why we the old change_pte_range() code never touched
>>> anything if the pte wasn't already dirty.
>> 
>> I don't think that pte_dirty() check was for the pagecache code. For any fs
>> that has page_mkwrite() defined, it'll already have vma_wants_writenotify()
>> return 1, so we'll never try to add write bit, hence we'll never even try
>> to check pte_dirty().
> 
> I might be too tired, but the whole reason we had this magic before my
> commit in place was only for the pagecache.
> 
> With vma_wants_writenotify()=0 you can directly map the pages writable
> and don't have to do these advanced checks here. In a writable
> MAP_SHARED VMA you'll already have pte_write().
> 
> We only get !pte_write() in case we have vma_wants_writenotify()=1 ...
> 
>  try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot);
> 
> and that's the code that checked the dirty bit after all to decide --
> amongst other things -- if we can simply map it writable without going
> via the write fault handler and triggering do_shared_fault() .
> 
> See crazy/ugly FOLL_FORCE code in GUP that similarly checks the dirty bit.

I thought you want to get rid of it at least for anonymous pages. No?

> 
> But yeah, it's all confusing so I might just be wrong regarding
> pagecache pages.

Just to note: I am not very courageous and I did not intend to change
condition for when non-anonymous pages are set as writable. That’s the
reason I did not change the dirty for non-writable non-anonymous entries (as
Peter said). And that’s the reason that setting the dirty bit (at least as I
should have done it) is only performed after we made the decision on the
write-bit.

IOW, after you made your decision about the write-bit, then and only then
you may be able to set the dirty bit for writable entries. Since the entry
is already writeable (i.e., can be written without a fault later directly
from userspace), there should be no concern of correctness when you set it.


  reply	other threads:[~2022-07-20 20:22 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-18 12:01 [RFC PATCH 00/14] mm: relaxed TLB flushes and other optimi Nadav Amit
2022-07-18 12:01 ` [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect Nadav Amit
2022-07-19 20:47   ` Peter Xu
2022-07-20  9:39     ` David Hildenbrand
2022-07-20 13:10       ` Peter Xu
2022-07-20 15:10         ` David Hildenbrand
2022-07-20 19:15           ` Peter Xu
2022-07-20 19:33             ` David Hildenbrand
2022-07-20 19:48               ` Peter Xu
2022-07-20 19:55                 ` David Hildenbrand
2022-07-20 20:22                   ` Nadav Amit [this message]
2022-07-20 20:38                     ` David Hildenbrand
2022-07-20 20:56                       ` Nadav Amit
2022-07-21  7:52                         ` David Hildenbrand
2022-07-21 14:10                           ` David Hildenbrand
2022-07-20  9:42   ` David Hildenbrand
2022-07-20 17:36     ` Nadav Amit
2022-07-20 18:00       ` David Hildenbrand
2022-07-20 18:09         ` Nadav Amit
2022-07-20 18:11           ` David Hildenbrand
2022-07-18 12:02 ` [RFC PATCH 02/14] userfaultfd: try to map write-unprotected pages Nadav Amit
2022-07-19 20:49   ` Peter Xu
2022-07-18 12:02 ` [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable Nadav Amit
2022-07-20 15:19   ` David Hildenbrand
2022-07-20 17:25     ` Nadav Amit
2022-07-21  7:45       ` David Hildenbrand
2022-07-18 12:02 ` [RFC PATCH 04/14] mm/mprotect: preserve write with MM_CP_TRY_CHANGE_WRITABLE Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 05/14] x86/mm: check exec permissions on fault Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 06/14] mm/rmap: avoid flushing on page_vma_mkclean_one() when possible Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 07/14] mm: do fix spurious page-faults for instruction faults Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 08/14] x86/mm: introduce flush_tlb_fix_spurious_fault Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 09/14] mm: introduce relaxed TLB flushes Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 10/14] x86/mm: " Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 11/14] x86/mm: use relaxed TLB flushes when protection is removed Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 12/14] x86/tlb: no flush on PTE change from RW->RO when PTE is clean Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 13/14] mm/mprotect: do not check flush type if a strict is needed Nadav Amit
2022-07-18 12:02 ` [RFC PATCH 14/14] mm: conditional check of pfn in pte_flush_type Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=95320077-52CF-4CB0-92F9-523E1AE74A3D@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=axelrasmussen@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.