From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Rapoport, peterx@redhat.com, Andrew Morton, Mike Kravetz,
    Jerome Glisse, Miaohe Lin, Nadav Amit, Hugh Dickins, Matthew Wilcox,
    Jason Gunthorpe, "Kirill A. Shutemov", Andrea Arcangeli, Axel Rasmussen
Subject: [PATCH v3 24/27] hugetlb/userfaultfd: Only drop uffd-wp special pte if required
Date: Thu, 27 May 2021 16:23:37 -0400
Message-Id: <20210527202337.32256-1-peterx@redhat.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527201927.29586-1-peterx@redhat.com>
References: <20210527201927.29586-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte
if unmapping an entire vma, or if the unmap is synchronized such that
faults cannot race with it.  This requires passing zap_flags all the way
down to the lowest-level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code pass the
ZAP_FLAG_DROP_FILE_UFFD_WP flag, as synchronization is in place to prevent
faults.  The exception is hole punch, which first unmaps without any
synchronization.  Later, when hole punch actually removes the page from
the file, it checks whether there was a subsequent fault and, if so, takes
the hugetlb fault mutex while unmapping again.  This second unmap passes
ZAP_FLAG_DROP_FILE_UFFD_WP.
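To illustrate that ordering, here is a simplified, non-compilable sketch of
the hole punch path (function names are taken from this series; locals such
as mapping, hole_start and hole_end come from the surrounding function;
locking and error handling are elided; the real code is in the
fs/hugetlbfs/inode.c hunks below):

	static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
	{
		...
		/* Pass 1: no fault synchronization -> keep uffd-wp special ptes. */
		hugetlb_vmdelete_list(&mapping->i_mmap,
				      hole_start >> PAGE_SHIFT,
				      hole_end >> PAGE_SHIFT, 0);

		/*
		 * Pass 2: any page that was faulted back in meanwhile is
		 * unmapped again by remove_inode_hugepages() under the
		 * hugetlb fault mutex, this time passing
		 * ZAP_FLAG_DROP_FILE_UFFD_WP, so the markers are dropped.
		 */
		remove_inode_hugepages(inode, hole_start, hole_end);
		...
	}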
The core justification for "whether to apply the ZAP_FLAG_DROP_FILE_UFFD_WP
flag when unmapping a hugetlb range" is (IMHO): we should never reach a
state in which a page fault could erroneously fault in, as writable, a
page-cache page that was write-protected, even for an extremely short
period.  That could happen if e.g. we passed ZAP_FLAG_DROP_FILE_UFFD_WP in
hugetlbfs_punch_hole() when calling hugetlb_vmdelete_list(): if a page
fault triggered after that call and before the remove_inode_hugepages()
right after it, the page cache could be mapped writable again within that
small window, which can cause data corruption.

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 27 +++++++++++++++++++++------
 mm/memory.c             |  5 ++++-
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 55efd3dd04f6..b917fb4c670e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -437,7 +438,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -515,7 +516,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 				index * pages_per_huge_page(h),
-				(index + 1) * pages_per_huge_page(h));
+				(index + 1) * pages_per_huge_page(h),
+				ZAP_FLAG_DROP_FILE_UFFD_WP);
 			i_mmap_unlock_write(mapping);
 		}
 
@@ -581,7 +583,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_FILE_UFFD_WP);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -614,8 +617,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-					hole_start >> PAGE_SHIFT,
-					hole_end >> PAGE_SHIFT);
+					hole_start >> PAGE_SHIFT,
+					hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3e4c5c64d867..d3e8b3b38ded 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -138,11 +138,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -377,7 +378,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c4dd0c531bb5..78675158911c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4274,7 +4274,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+			    struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4326,6 +4326,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			continue;
 		}
 
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * Only drop the special swap uffd-wp pte if
+			 * e.g. unmapping a vma or punching a hole (with proper
+			 * lock held so that concurrent page fault won't happen).
+			 */
+			if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+				huge_pte_clear(mm, address, ptep, sz);
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -4377,9 +4390,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -4395,12 +4409,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -4455,7 +4470,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index 8372b212993a..4427f48e446d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1607,8 +1607,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
-- 
2.31.1
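
For quick reference, an editor's recap (not part of the patch itself) of how
each touched call site ends up choosing zap_flags, drawn from the hunks
above:

	/*
	 * hugetlb_vmtruncate()      -> hugetlb_vmdelete_list(..., ZAP_FLAG_DROP_FILE_UFFD_WP)
	 * remove_inode_hugepages()  -> hugetlb_vmdelete_list(..., ZAP_FLAG_DROP_FILE_UFFD_WP)
	 *                              (called with the hugetlb fault mutex held)
	 * hugetlbfs_punch_hole()    -> hugetlb_vmdelete_list(..., 0), the unsynchronized first pass
	 * unmap_ref_private()       -> unmap_hugepage_range(..., 0)
	 * unmap_single_vma()        -> __unmap_hugepage_range_final(...,
	 *                                      details ? details->zap_flags : 0)
	 */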