Date: Tue, 1 Jun 2021 14:05:45 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Hugh Dickins, "Kirill A. Shutemov", Yang Shi, Wang Yugui,
    Matthew Wilcox, Naoya Horiguchi, Alistair Popple, Ralph Campbell,
    Zi Yan, Miaohe Lin, Minchan Kim, Jue Wang, Peter Xu, Jan Kara,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/7] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Stressing huge tmpfs page migration racing hole punch often crashed on
the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
migration entries in the non-anonymous case.
Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs,
so this commit brings the shmem and file THP handling into line.

Are there more places that need to be careful about pmd migration
entries?  None hit in practice, but several of those is_huge_zero_pmd()
tests were done without checking pmd_present() first: I believe a pmd
migration entry could end up satisfying that test.  Ah, the inversion of
swap offset, to protect against L1TF, makes that impossible on x86; but
other arches need the pmd_present() check, and even x86 ought not to
apply pmd_page() to a swap-like pmd.

Fix those instances; __split_huge_pmd_locked() was not wrong to be
checking with pmd_trans_huge() instead, but I think it's clearer to use
pmd_present() in each instance.

And while there: make it clearer to the eye that the !vma_is_anonymous()
and is_huge_zero_pmd() blocks make early returns (and don't return
void).
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins
Cc:
---
 mm/huge_memory.c     | 38 ++++++++++++++++++++++++--------------
 mm/pgtable-generic.c |  5 ++---
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 63ed6b25deaa..9fb7b47da87e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1587,9 +1587,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		goto out_unlocked;
 
 	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
 	if (unlikely(!pmd_present(orig_pmd))) {
 		VM_BUG_ON(thp_migration_supported() &&
 			  !is_pmd_migration_entry(orig_pmd));
@@ -1597,6 +1594,9 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 
 	page = pmd_page(orig_pmd);
+	if (is_huge_zero_page(page))
+		goto out;
+
 	/*
 	 * If other processes are mapping this page, we couldn't discard
 	 * the page unless they all do MADV_FREE so let's skip the page.
@@ -1676,7 +1676,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		if (is_huge_zero_pmd(orig_pmd))
 			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
-	} else if (is_huge_zero_pmd(orig_pmd)) {
+	} else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) {
 		zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
@@ -2044,7 +2044,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	count_vm_event(THP_SPLIT_PMD);
 
 	if (!vma_is_anonymous(vma)) {
-		_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+		old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
 		/*
 		 * We are going to unmap this huge page. So
 		 * just go ahead and zap it
@@ -2053,16 +2053,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			zap_deposited_table(mm, pmd);
 		if (vma_is_special_huge(vma))
 			return;
-		page = pmd_page(_pmd);
-		if (!PageDirty(page) && pmd_dirty(_pmd))
-			set_page_dirty(page);
-		if (!PageReferenced(page) && pmd_young(_pmd))
-			SetPageReferenced(page);
-		page_remove_rmap(page, true);
-		put_page(page);
+		if (unlikely(is_pmd_migration_entry(old_pmd))) {
+			swp_entry_t entry;
+
+			entry = pmd_to_swp_entry(old_pmd);
+			page = migration_entry_to_page(entry);
+		} else {
+			page = pmd_page(old_pmd);
+			if (!PageDirty(page) && pmd_dirty(old_pmd))
+				set_page_dirty(page);
+			if (!PageReferenced(page) && pmd_young(old_pmd))
+				SetPageReferenced(page);
+			page_remove_rmap(page, true);
+			put_page(page);
+		}
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+	}
+
+	if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
@@ -2072,7 +2081,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		 * small page also write protected so it does not seems useful
 		 * to invalidate secondary mmu at this time.
 		 */
-		return __split_huge_zero_page_pmd(vma, haddr, pmd);
+		__split_huge_zero_page_pmd(vma, haddr, pmd);
+		return;
 	}
 
 	/*
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index c2210e1cdb51..4e640baf9794 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -135,9 +135,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 {
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(!pmd_present(*pmdp));
-	/* Below assumes pmd_present() is true */
-	VM_BUG_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+		  !pmd_devmap(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog