* [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag @ 2022-02-17 21:16 Nadav Amit 2022-02-17 21:28 ` Nadav Amit 2022-02-18 1:58 ` Peter Xu 0 siblings, 2 replies; 12+ messages in thread From: Nadav Amit @ 2022-02-17 21:16 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, Nadav Amit, Mike Rapoport, Andrea Arcangeli, Peter Xu, stable From: Nadav Amit <namit@vmware.com> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is currently only marked as write-protected if the VMA has VM_WRITE flag set. This seems incorrect or at least would be unexpected by the users. Consider the following sequence of operations that are being performed on a certain page: mprotect(PROT_READ) UFFDIO_COPY(UFFDIO_COPY_MODE_WP) mprotect(PROT_READ|PROT_WRITE) At this point the user would expect to still get UFFD notification when the page is accessed for write, but the user would not get one, since the PTE was not marked as UFFD_WP during UFFDIO_COPY. Fix it by always marking PTEs as UFFD_WP regardless on the write-permission in the VMA flags. Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> --- mm/userfaultfd.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 0780c2a57ff1..885e5adb0168 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, _dst_pte = pte_mkdirty(_dst_pte); if (page_in_cache && !vm_shared) writable = false; - if (writable) { - if (wp_copy) - _dst_pte = pte_mkuffd_wp(_dst_pte); - else - _dst_pte = pte_mkwrite(_dst_pte); - } + + /* + * Always mark a PTE as write-protected when needed, regardless of + * VM_WRITE, which the user might change. + */ + if (wp_copy) + _dst_pte = pte_mkuffd_wp(_dst_pte); + else if (writable) + _dst_pte = pte_mkwrite(_dst_pte); dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); -- 2.25.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit @ 2022-02-17 21:28 ` Nadav Amit 2022-02-18 1:58 ` Peter Xu 1 sibling, 0 replies; 12+ messages in thread From: Nadav Amit @ 2022-02-17 21:28 UTC (permalink / raw) To: Nadav Amit Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, Peter Xu, stable > On Feb 17, 2022, at 1:16 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > > From: Nadav Amit <namit@vmware.com> > > When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is > currently only marked as write-protected if the VMA has VM_WRITE flag > set. This seems incorrect or at least would be unexpected by the users. One more note - if you want you can try the following reproducer: #define _GNU_SOURCE #include <linux/userfaultfd.h> #include <errno.h> #include <unistd.h> #include <stdlib.h> #include <fcntl.h> #include <pthread.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/types.h> #include <stdio.h> volatile int ok; static void * fault_handler_thread(void *arg) { struct uffd_msg msg = {0}; int nread; int uffd = (int)(long)arg; struct uffdio_writeprotect uffd_wp = {0}; nread = read(uffd, &msg, sizeof(msg)); if (nread == 0) { printf("EOF on userfaultfd!\n"); exit(EXIT_FAILURE); } ok = 1; uffd_wp.range.start = msg.arg.pagefault.address & ~(4095ull); uffd_wp.range.len = 4096; if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_wp) == -1) { perror("uffd-w-unprotect"); exit(EXIT_FAILURE); } return NULL; } char page[4096]; int main(void) { struct uffdio_api uffdio_api = {0}; struct uffdio_register uffdio_register = {0}; struct uffdio_writeprotect uffdio_wp = {0}; struct uffdio_copy uffdio_copy = {0}; volatile char *pc; pthread_t thr; int err; int uffd; void *p; uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); if (uffd == -1) { perror("userfaultfd"); exit(-1); } uffdio_api.api = UFFD_API; uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP; if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) { perror("api"); exit(EXIT_FAILURE); } p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); } uffdio_register.range.start = (unsigned long)p; uffdio_register.range.len = 4096; uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING|UFFDIO_REGISTER_MODE_WP; if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) { perror("register"); exit(EXIT_FAILURE); } uffdio_copy.src = (unsigned long)page; uffdio_copy.dst = (unsigned long)p; uffdio_copy.len = 4096; uffdio_copy.mode = UFFDIO_COPY_MODE_WP; uffdio_copy.copy = 0; if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1) { perror("uffd-copy"); exit(EXIT_FAILURE); } err = pthread_create(&thr, NULL, fault_handler_thread, (void *)(long)uffd); if (err != 0) { exit(EXIT_FAILURE); } if (mprotect(p, 4096, PROT_READ|PROT_WRITE) < 0) { perror("mprotect(PROT_READ|PROT_WRITE)"); exit(EXIT_FAILURE); } pc = (volatile char *)p; *pc = 1; printf("%s\n", ok ? "ok" : "failed”); return 0; } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit 2022-02-17 21:28 ` Nadav Amit @ 2022-02-18 1:58 ` Peter Xu 2022-02-18 2:23 ` Nadav Amit 1 sibling, 1 reply; 12+ messages in thread From: Peter Xu @ 2022-02-18 1:58 UTC (permalink / raw) To: Nadav Amit Cc: Andrew Morton, linux-mm, Nadav Amit, Mike Rapoport, Andrea Arcangeli, stable Hello, Nadav, On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is > currently only marked as write-protected if the VMA has VM_WRITE flag > set. This seems incorrect or at least would be unexpected by the users. > > Consider the following sequence of operations that are being performed > on a certain page: > > mprotect(PROT_READ) > UFFDIO_COPY(UFFDIO_COPY_MODE_WP) > mprotect(PROT_READ|PROT_WRITE) No objection to the patch, however I'm wondering why this is a valid use case because mprotect seems to be conflict with uffd, because AFAICT mprotect(PROT_READ|PROT_WRITE) can already grant write bit. In change_pte_range(): if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY))) { ptent = pte_mkwrite(ptent); } PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || (vma->vm_flags & VM_SOFTDIRTY))) { ptent = pte_mkwrite(ptent); } Because when VM_SOFTDIRTY is cleared it means soft dirty enabled. I wanted to post a patch but I never yet. Could I ask why you need mprotect() with uffd? Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 1:58 ` Peter Xu @ 2022-02-18 2:23 ` Nadav Amit 2022-02-18 3:56 ` Peter Xu 2022-02-18 4:00 ` Nadav Amit 0 siblings, 2 replies; 12+ messages in thread From: Nadav Amit @ 2022-02-18 2:23 UTC (permalink / raw) To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable > On Feb 17, 2022, at 5:58 PM, Peter Xu <peterx@redhat.com> wrote: > > Hello, Nadav, > > On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote: >> From: Nadav Amit <namit@vmware.com> >> >> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is >> currently only marked as write-protected if the VMA has VM_WRITE flag >> set. This seems incorrect or at least would be unexpected by the users. >> >> Consider the following sequence of operations that are being performed >> on a certain page: >> >> mprotect(PROT_READ) >> UFFDIO_COPY(UFFDIO_COPY_MODE_WP) >> mprotect(PROT_READ|PROT_WRITE) > > No objection to the patch, however I'm wondering why this is a valid use > case because mprotect seems to be conflict with uffd, because AFAICT > mprotect(PROT_READ|PROT_WRITE) can already grant write bit. > > In change_pte_range(): > > if (dirty_accountable && pte_dirty(ptent) && > (pte_soft_dirty(ptent) || > !(vma->vm_flags & VM_SOFTDIRTY))) { > ptent = pte_mkwrite(ptent); > } I think you are right, and an additional patch is needed to prevent mprotect() from making an entry writable if the PTE has _PAGE_UFFD_WP set and uffd_wp_resolve was not provided. I missed that. I’ll post another patch for this one. > > PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: > > if (dirty_accountable && pte_dirty(ptent) && > (pte_soft_dirty(ptent) || > (vma->vm_flags & VM_SOFTDIRTY))) { > ptent = pte_mkwrite(ptent); > } > > Because when VM_SOFTDIRTY is cleared it means soft dirty enabled. I wanted > to post a patch but I never yet. Seems that you are right. Yet, having this wrong code around for some time raises the concern whether something will break. By the soft-dirty I saw so far, it seems that it is not commonly used. > Could I ask why you need mprotect() with uffd? Sure. I think I mentioned it before, that I want to use userfaultfd for other processes [1], by having one monitor UFFD for multiple processes that handles their swap/prefetch activities based on custom policies. I try to set the least amount of constraints on what these processes might do, and mprotect() is something they are allowed to do. I would hopefully send the patches that are required for all of that and open source my code soon. In the meanwhile I try to upstream the least controversial parts. [1] https://lore.kernel.org/linux-mm/YWZCClDorCCM7KMG@t490s/t/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 2:23 ` Nadav Amit @ 2022-02-18 3:56 ` Peter Xu 2022-02-18 4:00 ` Nadav Amit 1 sibling, 0 replies; 12+ messages in thread From: Peter Xu @ 2022-02-18 3:56 UTC (permalink / raw) To: Nadav Amit Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable On Thu, Feb 17, 2022 at 06:23:14PM -0800, Nadav Amit wrote: > > > > On Feb 17, 2022, at 5:58 PM, Peter Xu <peterx@redhat.com> wrote: > > > > Hello, Nadav, > > > > On Thu, Feb 17, 2022 at 09:16:02PM +0000, Nadav Amit wrote: > >> From: Nadav Amit <namit@vmware.com> > >> > >> When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is > >> currently only marked as write-protected if the VMA has VM_WRITE flag > >> set. This seems incorrect or at least would be unexpected by the users. > >> > >> Consider the following sequence of operations that are being performed > >> on a certain page: > >> > >> mprotect(PROT_READ) > >> UFFDIO_COPY(UFFDIO_COPY_MODE_WP) > >> mprotect(PROT_READ|PROT_WRITE) > > > > No objection to the patch, however I'm wondering why this is a valid use > > case because mprotect seems to be conflict with uffd, because AFAICT > > mprotect(PROT_READ|PROT_WRITE) can already grant write bit. > > > > In change_pte_range(): > > > > if (dirty_accountable && pte_dirty(ptent) && > > (pte_soft_dirty(ptent) || > > !(vma->vm_flags & VM_SOFTDIRTY))) { > > ptent = pte_mkwrite(ptent); > > } > > I think you are right, and an additional patch is needed to prevent > mprotect() from making an entry writable if the PTE has _PAGE_UFFD_WP > set and uffd_wp_resolve was not provided. I missed that. Perhaps we can simply make this "if" to be "else" so as to connect with the previous "if"? After all the three (wp, wp_resolv, dirty_acct) are never used with more than one flag set. > > I’ll post another patch for this one. > > > > > PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: > > > > if (dirty_accountable && pte_dirty(ptent) && > > (pte_soft_dirty(ptent) || > > (vma->vm_flags & VM_SOFTDIRTY))) { > > ptent = pte_mkwrite(ptent); > > } > > > > Because when VM_SOFTDIRTY is cleared it means soft dirty enabled. I wanted > > to post a patch but I never yet. > > Seems that you are right. Yet, having this wrong code around for > some time raises the concern whether something will break. By the > soft-dirty I saw so far, it seems that it is not commonly used. I'll see whether I should prepare a patch and a test, maybe after yours. > > > Could I ask why you need mprotect() with uffd? > > Sure. I think I mentioned it before, that I want to use userfaultfd > for other processes [1], by having one monitor UFFD for multiple > processes that handles their swap/prefetch activities based on custom > policies. > > I try to set the least amount of constraints on what these processes > might do, and mprotect() is something they are allowed to do. I see. I didn't expect mprotect() can work well with uffd, but it seems fine at least in this case. Have you thought about other use of mprotect() other than RO? Say, I only know a valid use case of PROT_NONE for region reservation purpose, which normally will be followed up by a munmap() and remap on the same address. That sounds okay. But not sure whether this patch will cover all the possible mprotect() uses in the tracee. > > I would hopefully send the patches that are required for all of that > and open source my code soon. In the meanwhile I try to upstream the > least controversial parts. Sure, I'm always happy to learn it. Thanks, > > [1] https://lore.kernel.org/linux-mm/YWZCClDorCCM7KMG@t490s/t/ -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 2:23 ` Nadav Amit 2022-02-18 3:56 ` Peter Xu @ 2022-02-18 4:00 ` Nadav Amit 2022-02-18 4:05 ` Nadav Amit 2022-02-21 6:23 ` Peter Xu 1 sibling, 2 replies; 12+ messages in thread From: Nadav Amit @ 2022-02-18 4:00 UTC (permalink / raw) To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable > On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > >> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: >> >> if (dirty_accountable && pte_dirty(ptent) && >> (pte_soft_dirty(ptent) || >> (vma->vm_flags & VM_SOFTDIRTY))) { >> ptent = pte_mkwrite(ptent); >> } I know it is off-topic (not directly related to my patch), but I tried to understand the logic - both of the existing code and of your suggested change - and I failed. IIUC dirty_accountable (whose value is taken from vma_wants_writenotify()) means that the writes *should* be tracked, and therefore the page should remain read-only. So basically the condition should have been based on !dirty_accountable, i.e. the inverted value of dirty_accountable. The problem is that dirty_accountable also reflects VM_SOFTDIRTY considerations, so looking on the PTE does not tell you whether the PTE should remain write-protected even if it is dirty. Am I missing something? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 4:00 ` Nadav Amit @ 2022-02-18 4:05 ` Nadav Amit 2022-03-16 22:05 ` Andrew Morton 2022-02-21 6:23 ` Peter Xu 1 sibling, 1 reply; 12+ messages in thread From: Nadav Amit @ 2022-02-18 4:05 UTC (permalink / raw) To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable > On Feb 17, 2022, at 8:00 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > > >> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote: >> >>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: >>> >>> if (dirty_accountable && pte_dirty(ptent) && >>> (pte_soft_dirty(ptent) || >>> (vma->vm_flags & VM_SOFTDIRTY))) { >>> ptent = pte_mkwrite(ptent); >>> } > > I know it is off-topic (not directly related to my patch), but > I tried to understand the logic - both of the existing code and of > your suggested change - and I failed. > > IIUC dirty_accountable (whose value is taken from > vma_wants_writenotify()) means that the writes *should* be tracked, > and therefore the page should remain read-only. > > So basically the condition should have been based on > !dirty_accountable, i.e. the inverted value of dirty_accountable. > > The problem is that dirty_accountable also reflects VM_SOFTDIRTY > considerations, so looking on the PTE does not tell you whether > the PTE should remain write-protected even if it is dirty. Just as I pressed send, I understood your suggestion. It is still confusing for me how setting write-permissions for dirty_accountable PTEs is safe (as long as they are dirty), and yet in the general case it is not safe. I need to further think of it. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 4:05 ` Nadav Amit @ 2022-03-16 22:05 ` Andrew Morton 2022-03-17 0:11 ` Peter Xu 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2022-03-16 22:05 UTC (permalink / raw) To: Nadav Amit; +Cc: Peter Xu, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable As I understand it, this patch (below) is still good to merge upstream, although Peter hasn't acked it (please). And a whole bunch of followup patches are being thought about, but have yet to eventuate. Do we feel that this patch warrants the cc:stable? I'm suspecting "no", as it isn't clear that the use-case is really legitimate at this time? Thanks. From: Nadav Amit <namit@vmware.com> Subject: userfaultfd: mark uffd_wp regardless of VM_WRITE flag When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is currently only marked as write-protected if the VMA has VM_WRITE flag set. This seems incorrect or at least would be unexpected by the users. Consider the following sequence of operations that are being performed on a certain page: mprotect(PROT_READ) UFFDIO_COPY(UFFDIO_COPY_MODE_WP) mprotect(PROT_READ|PROT_WRITE) At this point the user would expect to still get UFFD notification when the page is accessed for write, but the user would not get one, since the PTE was not marked as UFFD_WP during UFFDIO_COPY. Fix it by always marking PTEs as UFFD_WP regardless on the write-permission in the VMA flags. Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit") Signed-off-by: Nadav Amit <namit@vmware.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/userfaultfd.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) --- a/mm/userfaultfd.c~userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag +++ a/mm/userfaultfd.c @@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_s _dst_pte = pte_mkdirty(_dst_pte); if (page_in_cache && !vm_shared) writable = false; - if (writable) { - if (wp_copy) - _dst_pte = pte_mkuffd_wp(_dst_pte); - else - _dst_pte = pte_mkwrite(_dst_pte); - } + + /* + * Always mark a PTE as write-protected when needed, regardless of + * VM_WRITE, which the user might change. + */ + if (wp_copy) + _dst_pte = pte_mkuffd_wp(_dst_pte); + else if (writable) + _dst_pte = pte_mkwrite(_dst_pte); dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); _ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-03-16 22:05 ` Andrew Morton @ 2022-03-17 0:11 ` Peter Xu 2022-03-17 0:20 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: Peter Xu @ 2022-03-17 0:11 UTC (permalink / raw) To: Andrew Morton Cc: Nadav Amit, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable Hi, Andrew, On Wed, Mar 16, 2022 at 03:05:53PM -0700, Andrew Morton wrote: > > As I understand it, this patch (below) is still good to merge upstream, > although Peter hasn't acked it (please). Thanks for asking. I didn't ack because I saw that it's queued a long time ago into -mm, and also it's in -next for a long time too (my new uffd-wp patchset is rebased to this one already). I normally assume that means you read and ack that patch already, so if I didn't see anything obviously wrong I'll just keep silent. Please let me know if that's not the expected behavior.. So here it is.. Acked-by: Peter Xu <peterx@redhat.com> > > And a whole bunch of followup patches are being thought about, but have > yet to eventuate. Is there a pointer/subject? > > Do we feel that this patch warrants the cc:stable? I'm suspecting > "no", as it isn't clear that the use-case is really legitimate at this > time? Right. Uffd-wp+mprotect usage is IMHO not a major use case for uffd-wp per my knowledge. At least I didn't even expect both work together, not until Nadav started working on some of the problems.. Said that, for this specific case it should be literally only changing the behavior of anonymous UFFD-WP && !WRITE case but nothing else (please correct me otherwise..), then IMHO it's pretty safe to copy stable too especially for the cleanly applicable branches. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-03-17 0:11 ` Peter Xu @ 2022-03-17 0:20 ` Andrew Morton 0 siblings, 0 replies; 12+ messages in thread From: Andrew Morton @ 2022-03-17 0:20 UTC (permalink / raw) To: Peter Xu; +Cc: Nadav Amit, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable On Thu, 17 Mar 2022 08:11:44 +0800 Peter Xu <peterx@redhat.com> wrote: > Hi, Andrew, > > On Wed, Mar 16, 2022 at 03:05:53PM -0700, Andrew Morton wrote: > > > > As I understand it, this patch (below) is still good to merge upstream, > > although Peter hasn't acked it (please). > > Thanks for asking. I didn't ack because I saw that it's queued a long time > ago into -mm, and also it's in -next for a long time too (my new uffd-wp > patchset is rebased to this one already). > > I normally assume that means you read and ack that patch already, so if I > didn't see anything obviously wrong I'll just keep silent. Please let me > know if that's not the expected behavior.. Acks and reviews are always welcome. If they come in late, git tree maintainer might not want to update and rebase, but it's still there in the archives for people who click on the Link:. > So here it is.. > > Acked-by: Peter Xu <peterx@redhat.com> Thanks. > > > > And a whole bunch of followup patches are being thought about, but have > > yet to eventuate. > > Is there a pointer/subject? The messages in this thread. Several followup patches were discussed. > > > > Do we feel that this patch warrants the cc:stable? I'm suspecting > > "no", as it isn't clear that the use-case is really legitimate at this > > time? > > Right. Uffd-wp+mprotect usage is IMHO not a major use case for uffd-wp per > my knowledge. At least I didn't even expect both work together, not until > Nadav started working on some of the problems.. > > Said that, for this specific case it should be literally only changing the > behavior of anonymous UFFD-WP && !WRITE case but nothing else (please > correct me otherwise..), then IMHO it's pretty safe to copy stable too > especially for the cleanly applicable branches. OK, thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-18 4:00 ` Nadav Amit 2022-02-18 4:05 ` Nadav Amit @ 2022-02-21 6:23 ` Peter Xu 2022-02-28 18:31 ` Nadav Amit 1 sibling, 1 reply; 12+ messages in thread From: Peter Xu @ 2022-02-21 6:23 UTC (permalink / raw) To: Nadav Amit Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote: > > > On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote: > > > >> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: > >> > >> if (dirty_accountable && pte_dirty(ptent) && > >> (pte_soft_dirty(ptent) || > >> (vma->vm_flags & VM_SOFTDIRTY))) { > >> ptent = pte_mkwrite(ptent); > >> } > > I know it is off-topic (not directly related to my patch), but > I tried to understand the logic - both of the existing code and of > your suggested change - and I failed. > > IIUC dirty_accountable (whose value is taken from > vma_wants_writenotify()) means that the writes *should* be tracked, > and therefore the page should remain read-only. Right. > > So basically the condition should have been based on > !dirty_accountable, i.e. the inverted value of dirty_accountable. > > The problem is that dirty_accountable also reflects VM_SOFTDIRTY > considerations, so looking on the PTE does not tell you whether > the PTE should remain write-protected even if it is dirty. My understanding is that the dirty bits (especially if both set) means we've tracked dirty on this pte already so we don't need to, hence we can set the dirty bit here. E.g., continuous mprotect(RO), mprotect(RW) upon a full dirty pte. When something wants to enable tracking again, it needs to clear the dirty bit, either the real one or soft-dirty one. So it's a pure performance enhancement to conditionally set write bit here, when we're sure we won't need any further tracking on this pte. One thing to mention is that this path only applies to VM_SHARED|VM_WRITE, because that's what checked the first in vma_wants_writenotify(): /* If it was private or non-writable, the write bit is already clear */ if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) return 0; IOW private mappings are not optimized in current tree yet. Peter Collingbourne proposed a patch some time ago to optimize it but it didn't get merged somehow. Meanwhile even with his latest version it should still miss the thp case, so if to reference the private optimization Andrea's tree would be the best: https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6 Hope it clarifies things a bit. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag 2022-02-21 6:23 ` Peter Xu @ 2022-02-28 18:31 ` Nadav Amit 0 siblings, 0 replies; 12+ messages in thread From: Nadav Amit @ 2022-02-28 18:31 UTC (permalink / raw) To: Peter Xu; +Cc: Andrew Morton, Linux-MM, Mike Rapoport, Andrea Arcangeli, stable > On Feb 20, 2022, at 10:23 PM, Peter Xu <peterx@redhat.com> wrote: > > On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote: >> >>> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote: >>> >>>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be: >>>> >>>> if (dirty_accountable && pte_dirty(ptent) && >>>> (pte_soft_dirty(ptent) || >>>> (vma->vm_flags & VM_SOFTDIRTY))) { >>>> ptent = pte_mkwrite(ptent); >>>> } >> >> I know it is off-topic (not directly related to my patch), but >> I tried to understand the logic - both of the existing code and of >> your suggested change - and I failed. >> >> IIUC dirty_accountable (whose value is taken from >> vma_wants_writenotify()) means that the writes *should* be tracked, >> and therefore the page should remain read-only. > > Right. > >> >> So basically the condition should have been based on >> !dirty_accountable, i.e. the inverted value of dirty_accountable. >> >> The problem is that dirty_accountable also reflects VM_SOFTDIRTY >> considerations, so looking on the PTE does not tell you whether >> the PTE should remain write-protected even if it is dirty. > > My understanding is that the dirty bits (especially if both set) means > we've tracked dirty on this pte already so we don't need to, hence we can > set the dirty bit here. E.g., continuous mprotect(RO), mprotect(RW) upon a > full dirty pte. > > When something wants to enable tracking again, it needs to clear the dirty > bit, either the real one or soft-dirty one. So it's a pure performance > enhancement to conditionally set write bit here, when we're sure we won't > need any further tracking on this pte. > > One thing to mention is that this path only applies to VM_SHARED|VM_WRITE, > because that's what checked the first in vma_wants_writenotify(): > > /* If it was private or non-writable, the write bit is already clear */ > if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > return 0; > > IOW private mappings are not optimized in current tree yet. > > Peter Collingbourne proposed a patch some time ago to optimize it but it > didn't get merged somehow. Meanwhile even with his latest version it > should still miss the thp case, so if to reference the private optimization > Andrea's tree would be the best: > > https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6 > > Hope it clarifies things a bit. Thanks, Thanks for the clarification. That’s what I suspected - I did not encounter it since I only used private anonymous mappings. I will try to create a test-case and send an additional fix for this issue. Regards, Nadav ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-03-17 0:21 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-17 21:16 [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag Nadav Amit 2022-02-17 21:28 ` Nadav Amit 2022-02-18 1:58 ` Peter Xu 2022-02-18 2:23 ` Nadav Amit 2022-02-18 3:56 ` Peter Xu 2022-02-18 4:00 ` Nadav Amit 2022-02-18 4:05 ` Nadav Amit 2022-03-16 22:05 ` Andrew Morton 2022-03-17 0:11 ` Peter Xu 2022-03-17 0:20 ` Andrew Morton 2022-02-21 6:23 ` Peter Xu 2022-02-28 18:31 ` Nadav Amit
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).