From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Brian Geffon, Pavel Emelyanov, Mike Kravetz, David Hildenbrand,
 peterx@redhat.com, Martin Cracauer, Andrea Arcangeli, Mel Gorman,
 Bobby Powers, Mike Rapoport, "Kirill A . Shutemov", Maya Gokhale,
 Johannes Weiner, Marty McFadden, Denis Plotnikov, Hugh Dickins,
 "Dr . David Alan Gilbert", Jerome Glisse
Subject: [PATCH v6 13/19] userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Date: Thu, 20 Feb 2020 11:31:06 -0500
Message-Id: <20200220163112.11409-14-peterx@redhat.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200220163112.11409-1-peterx@redhat.com>
References: <20200220163112.11409-1-peterx@redhat.com>

From: Andrea Arcangeli

v1: From: Shaohua Li

v2: cleanups, remove a branch.

[peterx writes up the commit message, as below...]

This patch introduces the new uffd-wp APIs for userspace.

Firstly, UFFDIO_REGISTER can now be called with write protection
tracking enabled via the new UFFDIO_REGISTER_MODE_WP flag.
Note that this flag can co-exist with the existing
UFFDIO_REGISTER_MODE_MISSING, in which case the userspace program can
not only resolve missing page faults but also track page data changes
along the way.

Secondly, we introduce the new UFFDIO_WRITEPROTECT API for page-level
write protection tracking.  Note that the memory region must be
registered with UFFDIO_REGISTER_MODE_WP beforehand.

Signed-off-by: Andrea Arcangeli
[peterx: remove useless block, write commit message, check against
 VM_MAYWRITE rather than VM_WRITE when registering]
Reviewed-by: Jerome Glisse
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c                 | 82 +++++++++++++++++++++++++-------
 include/uapi/linux/userfaultfd.h | 23 +++++++++
 2 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c49bef505775..59e9e399fddb 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -314,8 +314,11 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	if (!pmd_present(_pmd))
 		goto out;
 
-	if (pmd_trans_huge(_pmd))
+	if (pmd_trans_huge(_pmd)) {
+		if (!pmd_write(_pmd) && (reason & VM_UFFD_WP))
+			ret = true;
 		goto out;
+	}
 
 	/*
 	 * the pmd is stable (as in !pmd_trans_unstable) so we can re-read it
@@ -328,6 +331,8 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (pte_none(*pte))
 		ret = true;
+	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
+		ret = true;
 	pte_unmap(pte);
 
 out:
@@ -1287,10 +1292,13 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma)
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
 {
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-		vma_is_shmem(vma);
+	/* FIXME: add WP support to hugetlbfs and shmem */
+	return vma_is_anonymous(vma) ||
+		((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) &&
+		 !(vm_flags & VM_UFFD_WP));
 }
 
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
@@ -1322,15 +1330,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	vm_flags = 0;
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
 		vm_flags |= VM_UFFD_MISSING;
-	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
+	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)
 		vm_flags |= VM_UFFD_WP;
-		/*
-		 * FIXME: remove the below error constraint by
-		 * implementing the wprotect tracking mode.
-		 */
-		ret = -EINVAL;
-		goto out;
-	}
 
 	ret = validate_range(mm, &uffdio_register.range.start,
 			     uffdio_register.range.len);
@@ -1380,7 +1381,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 
 		/* check not compatible vmas */
 		ret = -EINVAL;
-		if (!vma_can_userfault(cur))
+		if (!vma_can_userfault(cur, vm_flags))
 			goto out_unlock;
 
 		/*
@@ -1408,6 +1409,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 			if (end & (vma_hpagesize - 1))
 				goto out_unlock;
 		}
+		if ((vm_flags & VM_UFFD_WP) && !(cur->vm_flags & VM_MAYWRITE))
+			goto out_unlock;
 
 		/*
 		 * Check that this vma isn't already owned by a
@@ -1437,7 +1440,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	do {
 		cond_resched();
 
-		BUG_ON(!vma_can_userfault(vma));
+		BUG_ON(!vma_can_userfault(vma, vm_flags));
 		BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
 		       vma->vm_userfaultfd_ctx.ctx != ctx);
 		WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
@@ -1575,7 +1578,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		 * provides for more strict behavior to notice
 		 * unregistration errors.
 		 */
-		if (!vma_can_userfault(cur))
+		if (!vma_can_userfault(cur, cur->vm_flags))
 			goto out_unlock;
 
 		found = true;
@@ -1589,7 +1592,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	do {
 		cond_resched();
 
-		BUG_ON(!vma_can_userfault(vma));
+		BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
 
 		/*
 		 * Nothing to do: this vma is already registered into this
@@ -1802,6 +1805,50 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
+				    unsigned long arg)
+{
+	int ret;
+	struct uffdio_writeprotect uffdio_wp;
+	struct uffdio_writeprotect __user *user_uffdio_wp;
+	struct userfaultfd_wake_range range;
+
+	if (READ_ONCE(ctx->mmap_changing))
+		return -EAGAIN;
+
+	user_uffdio_wp = (struct uffdio_writeprotect __user *) arg;
+
+	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
+			   sizeof(struct uffdio_writeprotect)))
+		return -EFAULT;
+
+	ret = validate_range(ctx->mm, &uffdio_wp.range.start,
+			     uffdio_wp.range.len);
+	if (ret)
+		return ret;
+
+	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
+			       UFFDIO_WRITEPROTECT_MODE_WP))
+		return -EINVAL;
+	if ((uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP) &&
+	    (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE))
+		return -EINVAL;
+
+	ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
+				  uffdio_wp.range.len, uffdio_wp.mode &
+				  UFFDIO_WRITEPROTECT_MODE_WP,
+				  &ctx->mmap_changing);
+	if (ret)
+		return ret;
+
+	if (!(uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE)) {
+		range.start = uffdio_wp.range.start;
+		range.len = uffdio_wp.range.len;
+		wake_userfault(ctx, &range);
+	}
+	return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
 	/*
@@ -1883,6 +1930,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_ZEROPAGE:
 		ret = userfaultfd_zeropage(ctx, arg);
 		break;
+	case UFFDIO_WRITEPROTECT:
+		ret = userfaultfd_writeprotect(ctx, arg);
+		break;
 	}
 	return ret;
 }
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 340f23bc251d..95c4a160e5f8 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -52,6 +52,7 @@
 #define _UFFDIO_WAKE			(0x02)
 #define _UFFDIO_COPY			(0x03)
 #define _UFFDIO_ZEROPAGE		(0x04)
+#define _UFFDIO_WRITEPROTECT		(0x06)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -68,6 +69,8 @@
 				      struct uffdio_copy)
 #define UFFDIO_ZEROPAGE		_IOWR(UFFDIO, _UFFDIO_ZEROPAGE,	\
 				      struct uffdio_zeropage)
+#define UFFDIO_WRITEPROTECT	_IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
+				      struct uffdio_writeprotect)
 
 /* read() structure */
 struct uffd_msg {
@@ -232,4 +235,24 @@ struct uffdio_zeropage {
 	__s64 zeropage;
 };
 
+struct uffdio_writeprotect {
+	struct uffdio_range range;
+/*
+ * UFFDIO_WRITEPROTECT_MODE_WP: set the flag to write protect a range,
+ * unset the flag to undo protection of a range which was previously
+ * write protected.
+ *
+ * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
+ * any wait thread after the operation succeeds.
+ *
+ * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
+ * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
+ * protection (WP=0) in response to a page fault wakes the faulting
+ * task unless DONTWAKE is set.
+ */
+#define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
+#define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
+	__u64 mode;
+};
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.24.1
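
As a reading aid for the mode rules in userfaultfd_writeprotect(), here
is a minimal userspace sketch of the two flag checks, so the accepted
combinations can be tried without a kernel that carries this series.
Everything named here is an assumption made for illustration:
check_wp_mode() is a hypothetical helper, and WP_MODE_WP and
WP_MODE_DONTWAKE are local stand-ins that copy the values of
UFFDIO_WRITEPROTECT_MODE_WP and UFFDIO_WRITEPROTECT_MODE_DONTWAKE from
the header hunk, so the snippet builds against headers that predate the
patch. It does not call the ioctl.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Local stand-ins (assumption) for the flag values this patch adds to
 * include/uapi/linux/userfaultfd.h.
 */
#define WP_MODE_WP		((uint64_t)1 << 0)
#define WP_MODE_DONTWAKE	((uint64_t)1 << 1)

/*
 * Hypothetical helper mirroring the mode validation done in
 * userfaultfd_writeprotect(): returns 0 when the mode would pass the
 * flag checks, -EINVAL otherwise.
 */
static int check_wp_mode(uint64_t mode)
{
	/* Any bit other than WP and DONTWAKE is rejected. */
	if (mode & ~(WP_MODE_DONTWAKE | WP_MODE_WP))
		return -EINVAL;

	/* DONTWAKE is meaningless when protecting, so WP|DONTWAKE is rejected. */
	if ((mode & WP_MODE_WP) && (mode & WP_MODE_DONTWAKE))
		return -EINVAL;

	return 0;
}
```

Under these rules, mode 0 (remove protection and wake waiters),
WP_MODE_WP alone (protect), and WP_MODE_DONTWAKE alone (remove
protection without waking) pass; WP_MODE_WP | WP_MODE_DONTWAKE and any
unknown bit return -EINVAL.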