stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
@ 2022-02-17 21:16 Nadav Amit
  2022-02-17 21:28 ` Nadav Amit
  2022-02-18  1:58 ` Peter Xu
  0 siblings, 2 replies; 12+ messages in thread
From: Nadav Amit @ 2022-02-17 21:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Nadav Amit, Mike Rapoport, Andrea Arcangeli, Peter Xu, stable

From: Nadav Amit <namit@vmware.com>

When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has VM_WRITE flag
set. This seems incorrect or at least would be unexpected by the users.

Consider the following sequence of operations that are being performed
on a certain page:

	mprotect(PROT_READ)
	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
	mprotect(PROT_READ|PROT_WRITE)

At this point the user would expect to still get UFFD notification when
the page is accessed for write, but the user would not get one, since
the PTE was not marked as UFFD_WP during UFFDIO_COPY.

Fix it by always marking PTEs as UFFD_WP regardless on the
write-permission in the VMA flags.

Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 mm/userfaultfd.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0780c2a57ff1..885e5adb0168 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
-	if (writable) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
-	}
+
+	/*
+	 * Always mark a PTE as write-protected when needed, regardless of
+	 * VM_WRITE, which the user might change.
+	 */
+	if (wp_copy)
+		_dst_pte = pte_mkuffd_wp(_dst_pte);
+	else if (writable)
+		_dst_pte = pte_mkwrite(_dst_pte);
 
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit
@ 2022-02-17 21:28 ` Nadav Amit
  2022-02-18  1:58 ` Peter Xu
  1 sibling, 0 replies; 12+ messages in thread
From: Nadav Amit @ 2022-02-17 21:28 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli,
	Peter Xu, stable



> On Feb 17, 2022, at 1:16 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> From: Nadav Amit <namit@vmware.com>
> 
> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
> currently only marked as write-protected if the VMA has VM_WRITE flag
> set. This seems incorrect or at least would be unexpected by the users.

One more note - if you want you can try the following reproducer:

#define _GNU_SOURCE
#include <linux/userfaultfd.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <stdio.h>

volatile int ok;

static void * fault_handler_thread(void *arg)
{
	struct uffd_msg msg = {0};
	int nread;
	int uffd = (int)(long)arg;
	struct uffdio_writeprotect uffd_wp = {0};

	nread = read(uffd, &msg, sizeof(msg));
	if (nread == 0) {
		printf("EOF on userfaultfd!\n");
		exit(EXIT_FAILURE);
	}

	ok = 1;

	uffd_wp.range.start = msg.arg.pagefault.address & ~(4095ull);
	uffd_wp.range.len = 4096;

	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_wp) == -1) {
		perror("uffd-w-unprotect");
		exit(EXIT_FAILURE);
	}
	return NULL;
}

char page[4096];

int main(void)
{
	struct uffdio_api uffdio_api = {0};
	struct uffdio_register uffdio_register = {0};
	struct uffdio_writeprotect uffdio_wp = {0};
	struct uffdio_copy uffdio_copy = {0};
	volatile char *pc;
	pthread_t thr;
	int err;
	int uffd;
	void *p;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd == -1) {
		perror("userfaultfd");
		exit(-1);
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
		perror("api");
		exit(EXIT_FAILURE);
	}

	p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		exit(EXIT_FAILURE);
	}

	uffdio_register.range.start = (unsigned long)p;
	uffdio_register.range.len = 4096;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING|UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
		perror("register");
		exit(EXIT_FAILURE);
	}

	uffdio_copy.src = (unsigned long)page;
	uffdio_copy.dst = (unsigned long)p;
	uffdio_copy.len = 4096;
	uffdio_copy.mode = UFFDIO_COPY_MODE_WP;
	uffdio_copy.copy = 0;
	if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1) {
		perror("uffd-copy");
		exit(EXIT_FAILURE);
	}

	err = pthread_create(&thr, NULL, fault_handler_thread, (void *)(long)uffd);
	if (err != 0) {
		exit(EXIT_FAILURE);
	}

	if (mprotect(p, 4096, PROT_READ|PROT_WRITE) < 0) {
		perror("mprotect(PROT_READ|PROT_WRITE)");
		exit(EXIT_FAILURE);
	}

	pc = (volatile char *)p;
	*pc = 1;
	printf("%s\n", ok ? "ok" : "failed”);
	return 0;
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit
  2022-02-17 21:28 ` Nadav Amit
@ 2022-02-18  1:58 ` Peter Xu
  2022-02-18  2:23   ` Nadav Amit
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Xu @ 2022-02-18  1:58 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, linux-mm, Nadav Amit, Mike Rapoport,
	Andrea Arcangeli, stable

Hello, Nadav,

On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
> currently only marked as write-protected if the VMA has VM_WRITE flag
> set. This seems incorrect or at least would be unexpected by the users.
> 
> Consider the following sequence of operations that are being performed
> on a certain page:
> 
> 	mprotect(PROT_READ)
> 	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
> 	mprotect(PROT_READ|PROT_WRITE)

No objection to the patch, however I'm wondering why this is a valid use
case because mprotect seems to be conflict with uffd, because AFAICT
mprotect(PROT_READ|PROT_WRITE) can already grant write bit.

In change_pte_range():

        if (dirty_accountable && pte_dirty(ptent) &&
                        (pte_soft_dirty(ptent) ||
                                !(vma->vm_flags & VM_SOFTDIRTY))) {
                ptent = pte_mkwrite(ptent);
        }

PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:

        if (dirty_accountable && pte_dirty(ptent) &&
                        (pte_soft_dirty(ptent) ||
                        (vma->vm_flags & VM_SOFTDIRTY))) {
                ptent = pte_mkwrite(ptent);
        }

Because when VM_SOFTDIRTY is cleared it means soft dirty enabled.  I wanted
to post a patch but I never yet.

Could I ask why you need mprotect() with uffd?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  1:58 ` Peter Xu
@ 2022-02-18  2:23   ` Nadav Amit
  2022-02-18  3:56     ` Peter Xu
  2022-02-18  4:00     ` Nadav Amit
  0 siblings, 2 replies; 12+ messages in thread
From: Nadav Amit @ 2022-02-18  2:23 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable



> On Feb 17, 2022, at 5:58 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> Hello, Nadav,
> 
> On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
>> currently only marked as write-protected if the VMA has VM_WRITE flag
>> set. This seems incorrect or at least would be unexpected by the users.
>> 
>> Consider the following sequence of operations that are being performed
>> on a certain page:
>> 
>> 	mprotect(PROT_READ)
>> 	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
>> 	mprotect(PROT_READ|PROT_WRITE)
> 
> No objection to the patch, however I'm wondering why this is a valid use
> case because mprotect seems to be conflict with uffd, because AFAICT
> mprotect(PROT_READ|PROT_WRITE) can already grant write bit.
> 
> In change_pte_range():
> 
>        if (dirty_accountable && pte_dirty(ptent) &&
>                        (pte_soft_dirty(ptent) ||
>                                !(vma->vm_flags & VM_SOFTDIRTY))) {
>                ptent = pte_mkwrite(ptent);
>        }

I think you are right, and an additional patch is needed to prevent
mprotect() from making an entry writable if the PTE has _PAGE_UFFD_WP
set and uffd_wp_resolve was not provided. I missed that.

I’ll post another patch for this one.

> 
> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
> 
>        if (dirty_accountable && pte_dirty(ptent) &&
>                        (pte_soft_dirty(ptent) ||
>                        (vma->vm_flags & VM_SOFTDIRTY))) {
>                ptent = pte_mkwrite(ptent);
>        }
> 
> Because when VM_SOFTDIRTY is cleared it means soft dirty enabled.  I wanted
> to post a patch but I never yet.

Seems that you are right. Yet, having this wrong code around for
some time raises the concern whether something will break. By the
soft-dirty I saw so far, it seems that it is not commonly used.

> Could I ask why you need mprotect() with uffd?

Sure. I think I mentioned it before, that I want to use userfaultfd
for other processes [1], by having one monitor UFFD for multiple
processes that handles their swap/prefetch activities based on custom
policies.

I try to set the least amount of constraints on what these processes
might do, and mprotect() is something they are allowed to do.

I would hopefully send the patches that are required for all of that
and open source my code soon. In the meanwhile I try to upstream the
least controversial parts.

[1] https://lore.kernel.org/linux-mm/YWZCClDorCCM7KMG@t490s/t/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  2:23   ` Nadav Amit
@ 2022-02-18  3:56     ` Peter Xu
  2022-02-18  4:00     ` Nadav Amit
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Xu @ 2022-02-18  3:56 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable

On Thu, Feb 17, 2022 at 06:23:14PM -0800, Nadav Amit wrote:
> 
> 
> > On Feb 17, 2022, at 5:58 PM, Peter Xu <peterx@redhat.com> wrote:
> > 
> > Hello, Nadav,
> > 
> > On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote:
> >> From: Nadav Amit <namit@vmware.com>
> >> 
> >> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
> >> currently only marked as write-protected if the VMA has VM_WRITE flag
> >> set. This seems incorrect or at least would be unexpected by the users.
> >> 
> >> Consider the following sequence of operations that are being performed
> >> on a certain page:
> >> 
> >> 	mprotect(PROT_READ)
> >> 	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
> >> 	mprotect(PROT_READ|PROT_WRITE)
> > 
> > No objection to the patch, however I'm wondering why this is a valid use
> > case because mprotect seems to be conflict with uffd, because AFAICT
> > mprotect(PROT_READ|PROT_WRITE) can already grant write bit.
> > 
> > In change_pte_range():
> > 
> >        if (dirty_accountable && pte_dirty(ptent) &&
> >                        (pte_soft_dirty(ptent) ||
> >                                !(vma->vm_flags & VM_SOFTDIRTY))) {
> >                ptent = pte_mkwrite(ptent);
> >        }
> 
> I think you are right, and an additional patch is needed to prevent
> mprotect() from making an entry writable if the PTE has _PAGE_UFFD_WP
> set and uffd_wp_resolve was not provided. I missed that.

Perhaps we can simply make this "if" to be "else" so as to connect with the
previous "if"?  After all the three (wp, wp_resolv, dirty_acct) are never
used with more than one flag set.

> 
> I’ll post another patch for this one.
> 
> > 
> > PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
> > 
> >        if (dirty_accountable && pte_dirty(ptent) &&
> >                        (pte_soft_dirty(ptent) ||
> >                        (vma->vm_flags & VM_SOFTDIRTY))) {
> >                ptent = pte_mkwrite(ptent);
> >        }
> > 
> > Because when VM_SOFTDIRTY is cleared it means soft dirty enabled.  I wanted
> > to post a patch but I never yet.
> 
> Seems that you are right. Yet, having this wrong code around for
> some time raises the concern whether something will break. By the
> soft-dirty I saw so far, it seems that it is not commonly used.

I'll see whether I should prepare a patch and a test, maybe after yours.

> 
> > Could I ask why you need mprotect() with uffd?
> 
> Sure. I think I mentioned it before, that I want to use userfaultfd
> for other processes [1], by having one monitor UFFD for multiple
> processes that handles their swap/prefetch activities based on custom
> policies.
> 
> I try to set the least amount of constraints on what these processes
> might do, and mprotect() is something they are allowed to do.

I see.  I didn't expect mprotect() can work well with uffd, but it seems
fine at least in this case.

Have you thought about other use of mprotect() other than RO?  Say, I only
know a valid use case of PROT_NONE for region reservation purpose, which
normally will be followed up by a munmap() and remap on the same address.
That sounds okay.  But not sure whether this patch will cover all the
possible mprotect() uses in the tracee.

> 
> I would hopefully send the patches that are required for all of that
> and open source my code soon. In the meanwhile I try to upstream the
> least controversial parts.

Sure, I'm always happy to learn it.  Thanks,

> 
> [1] https://lore.kernel.org/linux-mm/YWZCClDorCCM7KMG@t490s/t/

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  2:23   ` Nadav Amit
  2022-02-18  3:56     ` Peter Xu
@ 2022-02-18  4:00     ` Nadav Amit
  2022-02-18  4:05       ` Nadav Amit
  2022-02-21  6:23       ` Peter Xu
  1 sibling, 2 replies; 12+ messages in thread
From: Nadav Amit @ 2022-02-18  4:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable


> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
>> 
>>       if (dirty_accountable && pte_dirty(ptent) &&
>>                       (pte_soft_dirty(ptent) ||
>>                       (vma->vm_flags & VM_SOFTDIRTY))) {
>>               ptent = pte_mkwrite(ptent);
>>       }

I know it is off-topic (not directly related to my patch), but
I tried to understand the logic - both of the existing code and of
your suggested change - and I failed.

IIUC dirty_accountable (whose value is taken from
vma_wants_writenotify()) means that the writes *should* be tracked,
and therefore the page should remain read-only.

So basically the condition should have been based on
!dirty_accountable, i.e. the inverted value of dirty_accountable.

The problem is that dirty_accountable also reflects VM_SOFTDIRTY
considerations, so looking on the PTE does not tell you whether
the PTE should remain write-protected even if it is dirty.

Am I missing something?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  4:00     ` Nadav Amit
@ 2022-02-18  4:05       ` Nadav Amit
  2022-03-16 22:05         ` Andrew Morton
  2022-02-21  6:23       ` Peter Xu
  1 sibling, 1 reply; 12+ messages in thread
From: Nadav Amit @ 2022-02-18  4:05 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable



> On Feb 17, 2022, at 8:00 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> 
>> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>> 
>>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
>>> 
>>>      if (dirty_accountable && pte_dirty(ptent) &&
>>>                      (pte_soft_dirty(ptent) ||
>>>                      (vma->vm_flags & VM_SOFTDIRTY))) {
>>>              ptent = pte_mkwrite(ptent);
>>>      }
> 
> I know it is off-topic (not directly related to my patch), but
> I tried to understand the logic - both of the existing code and of
> your suggested change - and I failed.
> 
> IIUC dirty_accountable (whose value is taken from
> vma_wants_writenotify()) means that the writes *should* be tracked,
> and therefore the page should remain read-only.
> 
> So basically the condition should have been based on
> !dirty_accountable, i.e. the inverted value of dirty_accountable.
> 
> The problem is that dirty_accountable also reflects VM_SOFTDIRTY
> considerations, so looking on the PTE does not tell you whether
> the PTE should remain write-protected even if it is dirty.

Just as I pressed send, I understood your suggestion.

It is still confusing for me how setting write-permissions for
dirty_accountable PTEs is safe (as long as they are dirty), and
yet in the general case it is not safe. I need to further think
of it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  4:00     ` Nadav Amit
  2022-02-18  4:05       ` Nadav Amit
@ 2022-02-21  6:23       ` Peter Xu
  2022-02-28 18:31         ` Nadav Amit
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Xu @ 2022-02-21  6:23 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable

On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote:
> 
> > On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> > 
> >> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
> >> 
> >>       if (dirty_accountable && pte_dirty(ptent) &&
> >>                       (pte_soft_dirty(ptent) ||
> >>                       (vma->vm_flags & VM_SOFTDIRTY))) {
> >>               ptent = pte_mkwrite(ptent);
> >>       }
> 
> I know it is off-topic (not directly related to my patch), but
> I tried to understand the logic - both of the existing code and of
> your suggested change - and I failed.
> 
> IIUC dirty_accountable (whose value is taken from
> vma_wants_writenotify()) means that the writes *should* be tracked,
> and therefore the page should remain read-only.

Right.

> 
> So basically the condition should have been based on
> !dirty_accountable, i.e. the inverted value of dirty_accountable.
> 
> The problem is that dirty_accountable also reflects VM_SOFTDIRTY
> considerations, so looking on the PTE does not tell you whether
> the PTE should remain write-protected even if it is dirty.

My understanding is that the dirty bits (especially if both set) means
we've tracked dirty on this pte already so we don't need to, hence we can
set the dirty bit here.  E.g., continuous mprotect(RO), mprotect(RW) upon a
full dirty pte.

When something wants to enable tracking again, it needs to clear the dirty
bit, either the real one or soft-dirty one.  So it's a pure performance
enhancement to conditionally set write bit here, when we're sure we won't
need any further tracking on this pte.

One thing to mention is that this path only applies to VM_SHARED|VM_WRITE,
because that's what checked the first in vma_wants_writenotify():

	/* If it was private or non-writable, the write bit is already clear */
	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
		return 0;

IOW private mappings are not optimized in current tree yet.

Peter Collingbourne proposed a patch some time ago to optimize it but it
didn't get merged somehow.  Meanwhile even with his latest version it
should still miss the thp case, so if to reference the private optimization
Andrea's tree would be the best:

https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6

Hope it clarifies things a bit.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-21  6:23       ` Peter Xu
@ 2022-02-28 18:31         ` Nadav Amit
  0 siblings, 0 replies; 12+ messages in thread
From: Nadav Amit @ 2022-02-28 18:31 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable



> On Feb 20, 2022, at 10:23 PM, Peter Xu <peterx@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote:
>> 
>>> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>>> 
>>>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
>>>> 
>>>>      if (dirty_accountable && pte_dirty(ptent) &&
>>>>                      (pte_soft_dirty(ptent) ||
>>>>                      (vma->vm_flags & VM_SOFTDIRTY))) {
>>>>              ptent = pte_mkwrite(ptent);
>>>>      }
>> 
>> I know it is off-topic (not directly related to my patch), but
>> I tried to understand the logic - both of the existing code and of
>> your suggested change - and I failed.
>> 
>> IIUC dirty_accountable (whose value is taken from
>> vma_wants_writenotify()) means that the writes *should* be tracked,
>> and therefore the page should remain read-only.
> 
> Right.
> 
>> 
>> So basically the condition should have been based on
>> !dirty_accountable, i.e. the inverted value of dirty_accountable.
>> 
>> The problem is that dirty_accountable also reflects VM_SOFTDIRTY
>> considerations, so looking on the PTE does not tell you whether
>> the PTE should remain write-protected even if it is dirty.
> 
> My understanding is that the dirty bits (especially if both set) means
> we've tracked dirty on this pte already so we don't need to, hence we can
> set the dirty bit here.  E.g., continuous mprotect(RO), mprotect(RW) upon a
> full dirty pte.
> 
> When something wants to enable tracking again, it needs to clear the dirty
> bit, either the real one or soft-dirty one.  So it's a pure performance
> enhancement to conditionally set write bit here, when we're sure we won't
> need any further tracking on this pte.
> 
> One thing to mention is that this path only applies to VM_SHARED|VM_WRITE,
> because that's what checked the first in vma_wants_writenotify():
> 
> 	/* If it was private or non-writable, the write bit is already clear */
> 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> 		return 0;
> 
> IOW private mappings are not optimized in current tree yet.
> 
> Peter Collingbourne proposed a patch some time ago to optimize it but it
> didn't get merged somehow.  Meanwhile even with his latest version it
> should still miss the thp case, so if to reference the private optimization
> Andrea's tree would be the best:
> 
> https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6
> 
> Hope it clarifies things a bit.  Thanks,

Thanks for the clarification. That’s what I suspected - I did not encounter
it since I only used private anonymous mappings. I will try to create a
test-case and send an additional fix for this issue.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-02-18  4:05       ` Nadav Amit
@ 2022-03-16 22:05         ` Andrew Morton
  2022-03-17  0:11           ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2022-03-16 22:05 UTC (permalink / raw)
  To: Nadav Amit; +Cc: Peter Xu, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable


As I understand it, this patch (below) is still good to merge upstream,
although Peter hasn't acked it (please).

And a whole bunch of followup patches are being thought about, but have
yet to eventuate.

Do we feel that this patch warrants the cc:stable?  I'm suspecting
"no", as it isn't clear that the use-case is really legitimate at this
time?

Thanks.


From: Nadav Amit <namit@vmware.com>
Subject: userfaultfd: mark uffd_wp regardless of VM_WRITE flag

When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has VM_WRITE flag set.
This seems incorrect or at least would be unexpected by the users.

Consider the following sequence of operations that are being performed on
a certain page:

	mprotect(PROT_READ)
	UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
	mprotect(PROT_READ|PROT_WRITE)

At this point the user would expect to still get UFFD notification when
the page is accessed for write, but the user would not get one, since the
PTE was not marked as UFFD_WP during UFFDIO_COPY.

Fix it by always marking PTEs as UFFD_WP regardless on the
write-permission in the VMA flags.

Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com
Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/userfaultfd.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- a/mm/userfaultfd.c~userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag
+++ a/mm/userfaultfd.c
@@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_s
 	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
-	if (writable) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
-	}
+
+	/*
+	 * Always mark a PTE as write-protected when needed, regardless of
+	 * VM_WRITE, which the user might change.
+	 */
+	if (wp_copy)
+		_dst_pte = pte_mkuffd_wp(_dst_pte);
+	else if (writable)
+		_dst_pte = pte_mkwrite(_dst_pte);
 
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
_


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-03-16 22:05         ` Andrew Morton
@ 2022-03-17  0:11           ` Peter Xu
  2022-03-17  0:20             ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2022-03-17  0:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nadav Amit, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable

Hi, Andrew,

On Wed, Mar 16, 2022 at 03:05:53PM -0700, Andrew Morton wrote:
> 
> As I understand it, this patch (below) is still good to merge upstream,
> although Peter hasn't acked it (please).

Thanks for asking.  I didn't ack because I saw that it's queued a long time
ago into -mm, and also it's in -next for a long time too (my new uffd-wp
patchset is rebased to this one already).

I normally assume that means you read and ack that patch already, so if I
didn't see anything obviously wrong I'll just keep silent. Please let me
know if that's not the expected behavior..

So here it is..

Acked-by: Peter Xu <peterx@redhat.com>

> 
> And a whole bunch of followup patches are being thought about, but have
> yet to eventuate.

Is there a pointer/subject?

> 
> Do we feel that this patch warrants the cc:stable?  I'm suspecting
> "no", as it isn't clear that the use-case is really legitimate at this
> time?

Right. Uffd-wp+mprotect usage is IMHO not a major use case for uffd-wp per
my knowledge. At least I didn't even expect both work together, not until
Nadav started working on some of the problems..

Said that, for this specific case it should be literally only changing the
behavior of anonymous UFFD-WP && !WRITE case but nothing else (please
correct me otherwise..), then IMHO it's pretty safe to copy stable too
especially for the cleanly applicable branches.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  2022-03-17  0:11           ` Peter Xu
@ 2022-03-17  0:20             ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2022-03-17  0:20 UTC (permalink / raw)
  To: Peter Xu; +Cc: Nadav Amit, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable

On Thu, 17 Mar 2022 08:11:44 +0800 Peter Xu <peterx@redhat.com> wrote:

> Hi, Andrew,
> 
> On Wed, Mar 16, 2022 at 03:05:53PM -0700, Andrew Morton wrote:
> > 
> > As I understand it, this patch (below) is still good to merge upstream,
> > although Peter hasn't acked it (please).
> 
> Thanks for asking.  I didn't ack because I saw that it's queued a long time
> ago into -mm, and also it's in -next for a long time too (my new uffd-wp
> patchset is rebased to this one already).
> 
> I normally assume that means you read and ack that patch already, so if I
> didn't see anything obviously wrong I'll just keep silent. Please let me
> know if that's not the expected behavior..

Acks and reviews are always welcome.  If they come in late, git tree
maintainer might not want to update and rebase, but it's still there in
the archives for people who click on the Link:.

> So here it is..
> 
> Acked-by: Peter Xu <peterx@redhat.com>

Thanks.

> > 
> > And a whole bunch of followup patches are being thought about, but have
> > yet to eventuate.
> 
> Is there a pointer/subject?

The messages in this thread.  Several followup patches were discussed.

> > 
> > Do we feel that this patch warrants the cc:stable?  I'm suspecting
> > "no", as it isn't clear that the use-case is really legitimate at this
> > time?
> 
> Right. Uffd-wp+mprotect usage is IMHO not a major use case for uffd-wp per
> my knowledge. At least I didn't even expect both work together, not until
> Nadav started working on some of the problems..
> 
> Said that, for this specific case it should be literally only changing the
> behavior of anonymous UFFD-WP && !WRITE case but nothing else (please
> correct me otherwise..), then IMHO it's pretty safe to copy stable too
> especially for the cleanly applicable branches.

OK, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-03-17  0:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit
2022-02-17 21:28 ` Nadav Amit
2022-02-18  1:58 ` Peter Xu
2022-02-18  2:23   ` Nadav Amit
2022-02-18  3:56     ` Peter Xu
2022-02-18  4:00     ` Nadav Amit
2022-02-18  4:05       ` Nadav Amit
2022-03-16 22:05         ` Andrew Morton
2022-03-17  0:11           ` Peter Xu
2022-03-17  0:20             ` Andrew Morton
2022-02-21  6:23       ` Peter Xu
2022-02-28 18:31         ` Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).