Date: Thu, 18 Feb 2021 12:55:55 -0500
From: Peter Xu
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, "Kirill A . Shutemov", Andrew Morton
Subject: Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
Message-ID: <20210218175555.GC108961@xz-x1>
References: <20210217204418.54259-1-peterx@redhat.com> <20210217204619.54761-1-peterx@redhat.com> <20210217204619.54761-3-peterx@redhat.com>

On Wed, Feb 17, 2021 at 05:46:30PM -0800, Mike Kravetz wrote:
> On 2/17/21 12:46 PM, Peter Xu wrote:
> > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> > userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
> >
> > Walk the hugetlb range and unshare all such mappings if there is, right before
> > UFFDIO_REGISTER will succeed and return to userspace.
> >
> > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> > is completely disabled for userfaultfd-wp registered range.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  fs/userfaultfd.c        |  4 ++++
> >  include/linux/hugetlb.h |  1 +
> >  mm/hugetlb.c            | 51 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 56 insertions(+)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 894cc28142e7..e259318fcae1 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -15,6 +15,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >  		vma->vm_flags = new_flags;
> >  		vma->vm_userfaultfd_ctx.ctx = ctx;
> >
> > +		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
> > +			hugetlb_unshare_all_pmds(vma);
> > +
> >  	skip:
> >  		prev = vma;
> >  		start = vma->vm_end;
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 3b4104021dd3..97ecfd4c20b2 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> >  		unsigned long address, unsigned long end, pgprot_t newprot);
> >
> >  bool is_hugetlb_entry_migration(pte_t pte);
> > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
> >
> >  #else /* !CONFIG_HUGETLB_PAGE */
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f53a0b852ed8..83c006ea3ff9 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void)
> >  	pr_warn("hugetlb_cma: the option isn't supported by current arch\n");
> >  }
> >
> > +/*
> > + * This function will unconditionally remove all the shared pmd pgtable entries
> > + * within the specific vma for a hugetlbfs memory range.
> > + */
>
> Thanks for updating this!
>
> > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> > +{
> > +	struct hstate *h = hstate_vma(vma);
> > +	unsigned long sz = huge_page_size(h);
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct mmu_notifier_range range;
> > +	unsigned long address, start, end;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +
> > +	if (!(vma->vm_flags & VM_MAYSHARE))
> > +		return;
> > +
> > +	start = ALIGN(vma->vm_start, PUD_SIZE);
> > +	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
> > +
> > +	if (start >= end)
> > +		return;
> > +
> > +	/*
> > +	 * No need to call adjust_range_if_pmd_sharing_possible(), because
> > +	 * we're going to operate on the whole vma
>
> not necessary, but perhaps change to:
> 	 * we're going to operate on ever PUD_SIZE aligned sized range
> 	 * within the vma.
>
> > +	 * we're going to operate on the whole vma
> > +	 */
> > +	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> > +				vma->vm_start, vma->vm_end);
>
> Should we use start, end here instead of vma->vm_start, vma->vm_end ?
>
> > +	mmu_notifier_invalidate_range_start(&range);
> > +	i_mmap_lock_write(vma->vm_file->f_mapping);
> > +	for (address = start; address < end; address += PUD_SIZE) {
> > +		unsigned long tmp = address;
> > +
> > +		ptep = huge_pte_offset(mm, address, sz);
> > +		if (!ptep)
> > +			continue;
> > +		ptl = huge_pte_lock(h, mm, ptep);
> > +		/* We don't want 'address' to be changed */
> > +		huge_pmd_unshare(mm, vma, &tmp, ptep);
> > +		spin_unlock(ptl);
> > +	}
> > +	flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end);
>
> start, end ?

Right, we can even shrink the notifier; I'll respin shortly.

Thanks,

-- 
Peter Xu