From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DFBFC433E6 for ; Mon, 1 Feb 2021 22:12:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1E96E64EC1 for ; Mon, 1 Feb 2021 22:12:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1E96E64EC1 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A17DD6B0005; Mon, 1 Feb 2021 17:12:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9C6536B0006; Mon, 1 Feb 2021 17:12:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 842506B006E; Mon, 1 Feb 2021 17:12:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0038.hostedemail.com [216.40.44.38]) by kanga.kvack.org (Postfix) with ESMTP id 6B8676B0005 for ; Mon, 1 Feb 2021 17:12:33 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 2C2EA180AD802 for ; Mon, 1 Feb 2021 22:12:33 +0000 (UTC) X-FDA: 77771098986.16.meal80_3209e1c275c5 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 18722100E690C for ; Mon, 1 Feb 2021 22:12:33 +0000 (UTC) X-HE-Tag: meal80_3209e1c275c5 X-Filterd-Recvd-Size: 14281 Received: from mail-io1-f47.google.com (mail-io1-f47.google.com [209.85.166.47]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Mon, 1 Feb 2021 22:12:32 +0000 (UTC) Received: by mail-io1-f47.google.com with SMTP id u8so14058529ior.13 for ; Mon, 01 Feb 2021 14:12:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2WJoG60ceKTntaeB3RRnHJeVlEFtZwmWSC8LANiTw3Y=; b=PlDxAZzeq5kG3hS8Hcmc2CFLYpPCxSVc6Offv+qN4cdUjQ1Pyv3eJ501vrS93E48en hajpLNEm80yaVTKnIOe6yGnPWNJFD5Bkvbdk6NTaUi2PmEj7l8ZkiWMB9kRnk0OpvrIL uyZI2Y+46C+/W6HaydVBRHB2Q6i7EAo79dpNr0ipdha7cR4JEeFF6sXsiS02527YO4NG f+ZmO+IXUzRnih8uU19BN5LsojDZegLsew9zdB5l9PrTY4h3VZ0MN/lnLPqo41AquTWP 7b1BFjcCAVaFh1oyfWl4KTQSgv6xFWEEMHGniF7oa404hHJhE5zHIxPz2zu5AgvSAFR8 ccIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2WJoG60ceKTntaeB3RRnHJeVlEFtZwmWSC8LANiTw3Y=; b=rp0tnSZmObuEUJQIjEgvkyO1l3j4VJh7hPVqujPHKW9lSglV0KUXEY7QTNKKZG4sie mWMGm+p4n1rV6u5FwoxpG2jVcMoP4wFo0zZ5Ov7n0UmhJHPB0VpfDAR94oi3Bpz2WFj1 j7/mlbE0vyRg7jG5WeNGAYDp9hq+3rlKv2UhcQWgM/+/P4xxJvc06JqtnaODW/zTgDYz Zk/cRwbppSf7yj/s8E+JRtZTVeSwalbKpL0uXibAICXKW10EsZ4rrqSpOjlO+T5vioK2 I7awKc/e1IfHEEQJIvafnITUtsjGKtjYa9RIHGC6RcDNppO5Ba+phP7NC1VXad72tIbB Z0eA== X-Gm-Message-State: AOAM5326pRI2WTkDynR728iZp2Huw61PxhCxt1Q105Ui0NwoxaC06+Zd 9z/RbX3ovWyUgtYCcRYx4cLeVdLHLhKpvA9CSQI/Gw== X-Google-Smtp-Source: ABdhPJwPHLD4WbPAI5s4dJVj3b8xUybLTSiEtpQq3lluEC22rU7uaLERqtTz3Hcc+W3NQb5b0sv7VBcwwtvUOa7FvBk= X-Received: by 2002:a02:3844:: with SMTP id v4mr16268861jae.1.1612217551689; Mon, 01 Feb 2021 14:12:31 -0800 (PST) MIME-Version: 1.0 References: <20210128224819.2651899-1-axelrasmussen@google.com> <20210128224819.2651899-8-axelrasmussen@google.com> <20210201192120.GG260413@xz-x1> In-Reply-To: <20210201192120.GG260413@xz-x1> From: Axel Rasmussen Date: Mon, 1 Feb 2021 14:11:55 -0800 Message-ID: Subject: Re: [PATCH v3 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl To: Peter Xu Cc: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , =?UTF-8?Q?Michal_Koutn=C3=BD?= , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka , LKML , linux-fsdevel@vger.kernel.org, Linux MM , Adam Ruprecht , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Oliver Upton Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Feb 1, 2021 at 11:21 AM Peter Xu wrote: > > On Thu, Jan 28, 2021 at 02:48:17PM -0800, Axel Rasmussen wrote: > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index f94a35296618..79e1f0155afa 100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -135,11 +135,14 @@ void hugetlb_show_meminfo(void); > > unsigned long hugetlb_total_pages(void); > > vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > unsigned long address, unsigned int flags); > > +#ifdef CONFIG_USERFAULTFD > > I'm confused why this is needed.. hugetlb_mcopy_atomic_pte() should only be > called in userfaultfd.c, but if without uffd config set it won't compile > either: > > obj-$(CONFIG_USERFAULTFD) += userfaultfd.o With this series as-is, but *without* the #ifdef CONFIG_USERFAULTFD here, we introduce a bunch of build warnings like this: In file included from ./include/linux/migrate.h:8:0, from kernel/sched/sched.h:53, from kernel/sched/isolation.c:10: ./include/linux/hugetlb.h:143:12: warning: 'enum mcopy_atomic_mode' declared inside parameter list struct page **pagep); ^ ./include/linux/hugetlb.h:143:12: warning: its scope is only this definition or declaration, which is probably not what you want And similarly we get an error about the "mode" parameter having an incomplete type in hugetlb.c. This is because enum mcopy_atomic_mode is defined in userfaultfd_k.h, and that entire header is wrapped in a #ifdef CONFIG_USERFAULTFD. So we either need to define enum mcopy_atomic_mode unconditionally, or we need to #ifdef CONFIG_USERFAULTFD the references to it also. - I opted not to move it outside the #ifdef CONFIG_USERFAULTFD in userfaultfd_k.h (defining it unconditionally), because that seemed messy to me. - I opted not to define it unconditionally in hugetlb.h, because we'd have to move it to userfaultfd_k.h anyway when shmem or other users are introduced. I'm planning to send a series to add this a few days or so after this series is merged, so it seems churn-y to move it then. - It seemed optimal to not compile hugetlb_mcopy_atomic_pte anyway (even ignoring adding the continue ioctl), since as you point out userfaultfd is the only caller. Hopefully this clarifies this and the next two comments. Let me know if you still feel strongly, I don't hate any of the alternatives, just wanted to clarify that I had considered them and thought this approach was best. > > > int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, > > struct vm_area_struct *dst_vma, > > unsigned long dst_addr, > > unsigned long src_addr, > > + enum mcopy_atomic_mode mode, > > struct page **pagep); > > +#endif > > int hugetlb_reserve_pages(struct inode *inode, long from, long to, > > struct vm_area_struct *vma, > > vm_flags_t vm_flags); > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > index fb9abaeb4194..2fcb686211e8 100644 > > --- a/include/linux/userfaultfd_k.h > > +++ b/include/linux/userfaultfd_k.h > > @@ -37,6 +37,22 @@ extern int sysctl_unprivileged_userfaultfd; > > > > extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason); > > > > +/* > > + * The mode of operation for __mcopy_atomic and its helpers. > > + * > > + * This is almost an implementation detail (mcopy_atomic below doesn't take this > > + * as a parameter), but it's exposed here because memory-kind-specific > > + * implementations (e.g. hugetlbfs) need to know the mode of operation. > > + */ > > +enum mcopy_atomic_mode { > > + /* A normal copy_from_user into the destination range. */ > > + MCOPY_ATOMIC_NORMAL, > > + /* Don't copy; map the destination range to the zero page. */ > > + MCOPY_ATOMIC_ZEROPAGE, > > + /* Just setup the dst_vma, without modifying the underlying page(s). */ > > + MCOPY_ATOMIC_CONTINUE, > > +}; > > + > > Maybe better to keep this to where it's used, e.g. hugetlb.h where we've > defined hugetlb_mcopy_atomic_pte()? > > [...] > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 6f9d8349f818..3d318ef3d180 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -4647,6 +4647,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > return ret; > > } > > > > +#ifdef CONFIG_USERFAULTFD > > So I feel like you added the header ifdef for this. > > IMHO we can drop both since that's what we have had. I agree maybe it's better > to not compile that without CONFIG_USERFAULTFD but that may worth a standalone > patch anyways. > > > /* > > * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with > > * modifications for huge pages. > > @@ -4656,6 +4657,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > > struct vm_area_struct *dst_vma, > > unsigned long dst_addr, > > unsigned long src_addr, > > + enum mcopy_atomic_mode mode, > > struct page **pagep) > > { > > struct address_space *mapping; > > @@ -4668,7 +4670,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > > int ret; > > struct page *page; > > > > - if (!*pagep) { > > + mapping = dst_vma->vm_file->f_mapping; > > + idx = vma_hugecache_offset(h, dst_vma, dst_addr); > > + > > + if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) { > > ret = -ENOMEM; > > page = alloc_huge_page(dst_vma, dst_addr, 0); > > if (IS_ERR(page)) > > @@ -4685,6 +4690,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > > /* don't free the page */ > > goto out; > > } > > + } else if (mode == MCOPY_ATOMIC_CONTINUE) { > > + ret = -EFAULT; > > + page = find_lock_page(mapping, idx); > > + *pagep = NULL; > > + if (!page) > > + goto out; > > } else { > > page = *pagep; > > *pagep = NULL; > > I would write this as: > > if (mode == MCOPY_ATOMIC_CONTINUE) > ... > else if (!*pagep) > ... > else > ... > > No strong opinion, but that'll look slightly cleaner to me. Agreed, I like that better as well. :) > > [...] > > > @@ -408,7 +407,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, > > unsigned long dst_start, > > unsigned long src_start, > > unsigned long len, > > - bool zeropage); > > + enum mcopy_atomic_mode mode); > > #endif /* CONFIG_HUGETLB_PAGE */ > > > > static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > > @@ -417,7 +416,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > > unsigned long dst_addr, > > unsigned long src_addr, > > struct page **page, > > - bool zeropage, > > + enum mcopy_atomic_mode mode, > > bool wp_copy) > > { > > ssize_t err; > > @@ -433,22 +432,38 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > > * and not in the radix tree. > > */ > > if (!(dst_vma->vm_flags & VM_SHARED)) { > > - if (!zeropage) > > + switch (mode) { > > + case MCOPY_ATOMIC_NORMAL: > > err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, > > dst_addr, src_addr, page, > > wp_copy); > > - else > > + break; > > + case MCOPY_ATOMIC_ZEROPAGE: > > err = mfill_zeropage_pte(dst_mm, dst_pmd, > > dst_vma, dst_addr); > > + break; > > + /* It only makes sense to CONTINUE for shared memory. */ > > + case MCOPY_ATOMIC_CONTINUE: > > + err = -EINVAL; > > + break; > > + } > > } else { > > VM_WARN_ON_ONCE(wp_copy); > > - if (!zeropage) > > + switch (mode) { > > + case MCOPY_ATOMIC_NORMAL: > > err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, > > dst_vma, dst_addr, > > src_addr, page); > > - else > > + break; > > + case MCOPY_ATOMIC_ZEROPAGE: > > err = shmem_mfill_zeropage_pte(dst_mm, dst_pmd, > > dst_vma, dst_addr); > > + break; > > + case MCOPY_ATOMIC_CONTINUE: > > + /* FIXME: Add minor fault interception for shmem. */ > > + err = -EINVAL; > > + break; > > + } > > } > > > > return err; > > The whole chunk above is not needed for hugetlbfs it seems - I'd avoid touching > the anon/shmem code path until it's being supported. > > What you need is probably set zeropage as below in __mcopy_atomic(): > > zeropage = (mode == MCOPY_ATOMIC_ZEROPAGE); > > Before passing it over to mfill_atomic_pte(). As long as we reject > UFFDIO_CONTINUE with !hugetlbfs correctly that'll be enough iiuc. Seems reasonable to me, I'll make this change in v4. > > Thanks, > > -- > Peter Xu >