Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect

From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Xu <peterx@redhat.com>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
Date: Thu, 21 Jul 2022 09:52:39 +0200	[thread overview]
Message-ID: <8e2dcae3-5f50-cd59-7588-b8f566549ad4@redhat.com> (raw)
In-Reply-To: <F56D4014-7FE2-4C60-9CFA-A7E7B684098B@gmail.com>

>> Yes. Especially for any MAP_PRIVATE mappings.
>>
>> If you want to write to something that's not mapped writable in a
>> MAP_PRIVATE mapping it
>> a) Has to be an exclusive anonymous page
>> b) The pte has to be dirty
> 
> Do you need both conditions to be true? I thought (a) is sufficient (if
> the soft-dirty and related checks succeed).

If we force-write to a page, we need it to be dirty to tell reclaim code
that the content stale. We can either mark the pte dirty manually, or
just let the write fault handler deal with it to simplify GUP code. This
needs some more thought, but that's my understanding.

> 
>>
>> In any other case, you clearly missed to COW or the modifications might
>> get lost if the PTE is not dirty.
>>
>> MAP_SHARED is a bit more involved. But whether the pte is dirty might be
>> good enough ... but this needs a lot more care.
>>
>>>> But yeah, it's all confusing so I might just be wrong regarding
>>>> pagecache pages.
>>>
>>> Just to note: I am not very courageous and I did not intend to change
>>> condition for when non-anonymous pages are set as writable. That’s the
>>> reason I did not change the dirty for non-writable non-anonymous entries (as
>>> Peter said). And that’s the reason that setting the dirty bit (at least as I
>>> should have done it) is only performed after we made the decision on the
>>> write-bit.
>>
>> Good. As long as we stick to anonymous pages I roughly know what we we
>> can and cannot do at this point :)
>>
>>
>> The problem I see is that detection whether we can write requires the
>> dirty bit ... and whether to set the dirty bit requires the information
>> whether we can write.
>>
>> Again, for anonymous pages we should be able to relax the "dirty"
>> requirement when detecting whether we can write.
> 
> That’s all I wanted to do there.
> 
>>
>>> IOW, after you made your decision about the write-bit, then and only then
>>> you may be able to set the dirty bit for writable entries. Since the entry
>>> is already writeable (i.e., can be written without a fault later directly
>>> from userspace), there should be no concern of correctness when you set it.
>>
>> That makes sense to me. What keeps confusing me are architectures with
>> and without a hw-managed dirty bit ... :)
> 
> I don’t know which arch you have in your mind. But the moment a PTE is
> writable, then marking it logically/architecturally as dirty should be
> fine.
> 
> But… if the Exclusive check is not good enough for private+anon without
> the “logical” dirty bit, then there would be a problem. 

I think we are good for anonymous pages.

-- 
Thanks,

David / dhildenb