Date: Mon, 21 Jan 2019 09:05:35 -0500
From: Jerome Glisse
Subject: Re: [PATCH RFC 06/24] userfaultfd: wp: support write protection for userfault vma range
Message-ID: <20190121140535.GD3344@redhat.com>
References: <20190121075722.7945-1-peterx@redhat.com>
 <20190121075722.7945-7-peterx@redhat.com>
In-Reply-To: <20190121075722.7945-7-peterx@redhat.com>
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hugh Dickins,
 Maya Gokhale, Johannes Weiner, Martin Cracauer, Denis Plotnikov,
 Shaohua Li, Andrea Arcangeli, Pavel Emelyanov, Mike Kravetz,
 Marty McFadden, Mike Rapoport, Mel Gorman, "Kirill A. Shutemov",
 "Dr. David Alan Gilbert", Rik van Riel

On Mon, Jan 21, 2019 at 03:57:04PM +0800, Peter Xu wrote:
> From: Shaohua Li
>
> Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> this doesn't split/merge vmas.

AFAICT it does not do that.

>
> Cc: Andrea Arcangeli
> Cc: Pavel Emelyanov
> Cc: Rik van Riel
> Cc: Kirill A. Shutemov
> Cc: Mel Gorman
> Cc: Hugh Dickins
> Cc: Johannes Weiner
> Signed-off-by: Shaohua Li
> Signed-off-by: Andrea Arcangeli
> Signed-off-by: Peter Xu
> ---
>  include/linux/userfaultfd_k.h |  2 ++
>  mm/userfaultfd.c              | 52 +++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 38f748e7186e..e82f3156f4e9 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -37,6 +37,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
>  			      unsigned long dst_start,
>  			      unsigned long len,
>  			      bool *mmap_changing);
> +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> +		unsigned long start, unsigned long len, bool enable_wp);
>
>  /* mm helpers */
>  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 458acda96f20..c38903f501c7 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -615,3 +615,55 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
>  {
>  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing);
>  }
> +
> +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> +			unsigned long len, bool enable_wp)
> +{
> +	struct vm_area_struct *dst_vma;
> +	pgprot_t newprot;
> +	int err;
> +
> +	/*
> +	 * Sanitize the command parameters:
> +	 */
> +	BUG_ON(start & ~PAGE_MASK);
> +	BUG_ON(len & ~PAGE_MASK);
> +
> +	/* Does the address range wrap, or is the span zero-sized? */
> +	BUG_ON(start + len <= start);
> +
> +	down_read(&dst_mm->mmap_sem);
> +
> +	/*
> +	 * Make sure the vma is not shared, that the dst range is
> +	 * both valid and fully within a single existing vma.
> +	 */
> +	err = -EINVAL;
> +	dst_vma = find_vma(dst_mm, start);
> +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> +		goto out_unlock;
> +	if (start < dst_vma->vm_start ||
> +	    start + len > dst_vma->vm_end)
> +		goto out_unlock;
> +
> +	if (!dst_vma->vm_userfaultfd_ctx.ctx)
> +		goto out_unlock;
> +	if (!userfaultfd_wp(dst_vma))
> +		goto out_unlock;
> +
> +	if (!vma_is_anonymous(dst_vma))
> +		goto out_unlock;
> +
> +	if (enable_wp)
> +		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
> +	else
> +		newprot = vm_get_page_prot(dst_vma->vm_flags);
> +
> +	change_protection(dst_vma, start, start + len, newprot,
> +			  !enable_wp, 0);

So setting dirty_accountable brings us to this code in mprotect.c:

	if (dirty_accountable && pte_dirty(ptent) &&
			(pte_soft_dirty(ptent) ||
			 !(vma->vm_flags & VM_SOFTDIRTY))) {
		ptent = pte_mkwrite(ptent);
	}

My understanding is that you want to set the write flag when enable_wp is
false, and that you want to set it unconditionally, right?

If so then you should really move the change_protection() flags patch
before this one and add a flag for setting the pte write bit. Otherwise
the above is broken, as it will only set the write bit for ptes that were
dirty, and I am guessing you have been lucky so far because the ptes were
all dirty (change_protection() preserves dirtiness) when you
write-protected them.

So I believe the above is broken, or at the very least unclear, unless
what you really want is to set the write bit only on ptes that have the
dirty bit set.

Cheers,
Jérôme

> +
> +	err = 0;
> +out_unlock:
> +	up_read(&dst_mm->mmap_sem);
> +	return err;
> +}
> --
> 2.17.1
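
To make the concern above concrete, here is a minimal sketch of the
alternative being suggested. The cp_flags parameter, the MM_CP_WP_RESOLVE
flag name and the helper are assumptions for illustration only, not code
from this series: with an explicit flag the write bit is restored for every
pte in the un-write-protected range, while the dirty_accountable path only
restores it for ptes that happen to be dirty already.

	#include <linux/mm.h>

	/* Hypothetical flag: remove (resolve) userfaultfd write protection. */
	#define MM_CP_WP_RESOLVE	(1UL << 0)

	/*
	 * Sketch of per-pte handling inside a change_protection()-style loop.
	 * With MM_CP_WP_RESOLVE the write bit comes back unconditionally; the
	 * dirty_accountable branch mirrors the mprotect.c snippet quoted above
	 * and only helps ptes that are already dirty.
	 */
	static pte_t wp_resolve_pte(pte_t ptent, struct vm_area_struct *vma,
				    unsigned long cp_flags, bool dirty_accountable)
	{
		if (cp_flags & MM_CP_WP_RESOLVE)
			return pte_mkwrite(ptent);

		if (dirty_accountable && pte_dirty(ptent) &&
		    (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY)))
			return pte_mkwrite(ptent);

		return ptent;
	}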