Date: Mon, 1 Feb 2021 18:21:55 -0500
From: Peter Xu
To: Mike Kravetz
Cc: Axel Rasmussen, Alexander Viro, Alexey Dobriyan, Andrea Arcangeli,
 Andrew Morton, Anshuman Khandual, Catalin Marinas, Chinwen Chang,
 Huang Ying, Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
 "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
 Michel Lespinasse, Mike Rapoport, Nicholas Piggin, Shaohua Li,
 Shawn Anastasio, Steven Rostedt, Steven Price, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Adam Ruprecht, Cannon Matthews, "Dr .
 David Alan Gilbert", David Rientjes, Oliver Upton
Subject: Re: [PATCH v3 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
Message-ID: <20210201232155.GL260413@xz-x1>
References: <20210128224819.2651899-1-axelrasmussen@google.com>
 <20210128224819.2651899-5-axelrasmussen@google.com>

On Mon, Feb 01, 2021 at 02:33:20PM -0800, Mike Kravetz wrote:
> On 1/28/21 2:48 PM, Axel Rasmussen wrote:
> > From: Peter Xu
> >
> > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> > userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
> >
> > Walk the hugetlb range and unshare all such mappings if there is, right before
> > UFFDIO_REGISTER will succeed and return to userspace.
> >
> > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> > is completely disabled for userfaultfd-wp registered range.
> >
> > Signed-off-by: Peter Xu
> > Signed-off-by: Axel Rasmussen
> > ---
> >  fs/userfaultfd.c             | 45 ++++++++++++++++++++++++++++++++++++
> >  include/linux/mmu_notifier.h |  1 +
> >  2 files changed, 46 insertions(+)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 894cc28142e7..2c6706ac2504 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -15,6 +15,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -1190,6 +1191,47 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
> >  	}
> >  }
> >
> > +/*
> > + * This function will unconditionally remove all the shared pmd pgtable entries
> > + * within the specific vma for a hugetlbfs memory range.
> > + */
> > +static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	struct hstate *h = hstate_vma(vma);
> > +	unsigned long sz = huge_page_size(h);
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct mmu_notifier_range range;
> > +	unsigned long address;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +
>
> Perhaps we should add a quick check to see if vma is sharable.  Might be as
> simple as !(vma->vm_flags & VM_MAYSHARE).  I see a comment/question in
> a later patch about only doing minor fault processing on shared mappings.

Yes, though that comment was mostly about shmem - I believe the shared case
should still be the main one, especially for hugetlbfs.

What I had in mind is something like this: one non-uffd process uses a shared
mapping of the file, while another uffd process uses a private mapping of the
same file.  When the uffd process accesses a page, it can fault in the page
cache and continue with UFFDIO_CONTINUE; when it writes, it will COW into
private pages.

Not sure whether that is useful, but I just don't see why we should block that
case.

>
> Code below looks fine, but it would be a waste to do all that for a vma
> that could not be shared.
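
The early exit suggested above could be sketched roughly as follows.  This is
a standalone userspace model, not the kernel patch: struct vma_model,
vma_may_share_pmds() and MODEL_VM_MAYSHARE are made-up names for illustration,
and the flag value is only a stand-in.  The real check would test
vma->vm_flags against VM_MAYSHARE at the top of hugetlb_unshare_all_pmds().

```c
#include <stdbool.h>

/* Stand-in for VM_MAYSHARE; the value is assumed for this model only. */
#define MODEL_VM_MAYSHARE 0x00000080UL

/* Hypothetical, minimal model of the single vma field the check reads. */
struct vma_model {
	unsigned long vm_flags;
};

/*
 * Return true only if the mapping could have shared pmds at all.
 * A caller would return early when this is false and skip the
 * walk over the hugetlb range entirely.
 */
static bool vma_may_share_pmds(const struct vma_model *vma)
{
	return (vma->vm_flags & MODEL_VM_MAYSHARE) != 0;
}
```

With a check like this in place, a private mapping never pays for the range
walk, since it has no shared pmds to remove.
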

Right, still better to check it.

Mike, I agree with all your comments on the initial 4 patches, thanks for the
input!  To make Axel's life easier, I've modified them locally and pushed,
since I'll be doing the same in my series anyway (I also picked up Mike's r-b
on patch 3):

  https://github.com/xzpeter/linux/commits/uffd-wp-shmem-hugetlbfs

Axel, feel free to fetch from it directly.

Thanks,

-- 
Peter Xu