From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yang Shi
Date: Wed, 9 Jun 2021 14:51:50 -0700
Subject: Re: [PATCH v3 06/10] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
To: Hugh Dickins
Cc: Andrew Morton, "Kirill A. Shutemov", Wang Yugui, Matthew Wilcox,
 Naoya Horiguchi, Alistair Popple, Ralph Campbell, Zi Yan, Miaohe Lin,
 Minchan Kim, Jue Wang, Peter Xu, Jan Kara, Shakeel Butt, Oscar Salvador,
 Linux MM, Linux Kernel Mailing List
References: <50eb41a-e6f8-e566-20ef-22fb63b0a529@google.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org
Shutemov" , Wang Yugui , Matthew Wilcox , Naoya Horiguchi , Alistair Popple , Ralph Campbell , Zi Yan , Miaohe Lin , Minchan Kim , Jue Wang , Peter Xu , Jan Kara , Shakeel Butt , Oscar Salvador , Linux MM , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 9, 2021 at 2:16 PM Hugh Dickins wrote: > > There is a race between THP unmapping and truncation, when truncate sees > pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, > but before its page_remove_rmap() gets to decrement compound_mapcount: > generating false "BUG: Bad page cache" reports that the page is still > mapped when deleted. This commit fixes that, but not in the way I hoped. > > The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK) > instead of unmap_mapping_range() in truncate_cleanup_page(): it has often > been an annoyance that we usually call unmap_mapping_range() with no pages > locked, but there apply it to a single locked page. try_to_unmap() looks > more suitable for a single locked page. > > However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): > it is used to insert THP migration entries, but not used to unmap THPs. > Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB > needs are different, I'm too ignorant of the DAX cases, and couldn't > decide how far to go for anon+swap. Set that aside. > > The second attempt took a different tack: make no change in truncate.c, > but modify zap_huge_pmd() to insert an invalidated huge pmd instead of > clearing it initially, then pmd_clear() between page_remove_rmap() and > unlocking at the end. Nice. But powerpc blows that approach out of the > water, with its serialize_against_pte_lookup(), and interesting pgtable > usage. It would need serious help to get working on powerpc (with a > minor optimization issue on s390 too). Set that aside. > > Just add an "if (page_mapped(page)) synchronize_rcu();" or other such > delay, after unmapping in truncate_cleanup_page()? Perhaps, but though > that's likely to reduce or eliminate the number of incidents, it would > give less assurance of whether we had identified the problem correctly. > > This successful iteration introduces "unmap_mapping_page(page)" instead > of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, > with an addition to details. Then zap_pmd_range() watches for this case, > and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() > now does in the PVMW_SYNC case. Not pretty, but safe. > > Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to > assert its interface; but currently that's only used to make sure that > page->mapping is stable, and zap_pmd_range() doesn't care if the page is > locked or not. Along these lines, in invalidate_inode_pages2_range() > move the initial unmap_mapping_range() out from under page lock, before > then calling unmap_mapping_page() under page lock if still mapped. > > Fixes: fc127da085c2 ("truncate: handle file thp") > Signed-off-by: Hugh Dickins > Acked-by: Kirill A. Shutemov > Cc: > --- > v3: expanded spin_unlock(pmd_lock()) like in 03/10, per Yang Shi Thanks. 
Reviewed-by: Yang Shi

>
>  include/linux/mm.h |  3 +++
>  mm/memory.c        | 41 +++++++++++++++++++++++++++++++++++++++++
>  mm/truncate.c      | 43 +++++++++++++++++++------------------------
>  3 files changed, 63 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c274f75efcf9..8ae31622deef 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1719,6 +1719,7 @@ struct zap_details {
>          struct address_space *check_mapping;    /* Check page->mapping if set */
>          pgoff_t first_index;                    /* Lowest page->index to unmap */
>          pgoff_t last_index;                     /* Highest page->index to unmap */
> +        struct page *single_page;               /* Locked page to be unmapped */
>  };
>
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> @@ -1766,6 +1767,7 @@ extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
>  extern int fixup_user_fault(struct mm_struct *mm,
>                              unsigned long address, unsigned int fault_flags,
>                              bool *unlocked);
> +void unmap_mapping_page(struct page *page);
>  void unmap_mapping_pages(struct address_space *mapping,
>                  pgoff_t start, pgoff_t nr, bool even_cows);
>  void unmap_mapping_range(struct address_space *mapping,
> @@ -1786,6 +1788,7 @@ static inline int fixup_user_fault(struct mm_struct *mm, unsigned long address,
>          BUG();
>          return -EFAULT;
>  }
> +static inline void unmap_mapping_page(struct page *page) { }
>  static inline void unmap_mapping_pages(struct address_space *mapping,
>                  pgoff_t start, pgoff_t nr, bool even_cows) { }
>  static inline void unmap_mapping_range(struct address_space *mapping,
> diff --git a/mm/memory.c b/mm/memory.c
> index f3ffab9b9e39..486f4a2874e7 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1361,7 +1361,18 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
>                          else if (zap_huge_pmd(tlb, vma, pmd, addr))
>                                  goto next;
>                          /* fall through */
> +                } else if (details && details->single_page &&
> +                           PageTransCompound(details->single_page) &&
> +                           next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
> +                        spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
> +                        /*
> +                         * Take and drop THP pmd lock so that we cannot return
> +                         * prematurely, while zap_huge_pmd() has cleared *pmd,
> +                         * but not yet decremented compound_mapcount().
> +                         */
> +                        spin_unlock(ptl);
>                  }
> +
>                  /*
>                   * Here there can be other concurrent MADV_DONTNEED or
>                   * trans huge page faults running, and if the pmd is
> @@ -3236,6 +3247,36 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
>          }
>  }
>
> +/**
> + * unmap_mapping_page() - Unmap single page from processes.
> + * @page: The locked page to be unmapped.
> + *
> + * Unmap this page from any userspace process which still has it mmaped.
> + * Typically, for efficiency, the range of nearby pages has already been
> + * unmapped by unmap_mapping_pages() or unmap_mapping_range(). But once
> + * truncation or invalidation holds the lock on a page, it may find that
> + * the page has been remapped again: and then uses unmap_mapping_page()
> + * to unmap it finally.
> + */
> +void unmap_mapping_page(struct page *page)
> +{
> +        struct address_space *mapping = page->mapping;
> +        struct zap_details details = { };
> +
> +        VM_BUG_ON(!PageLocked(page));
> +        VM_BUG_ON(PageTail(page));
> +
> +        details.check_mapping = mapping;
> +        details.first_index = page->index;
> +        details.last_index = page->index + thp_nr_pages(page) - 1;
> +        details.single_page = page;
> +
> +        i_mmap_lock_write(mapping);
> +        if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
> +                unmap_mapping_range_tree(&mapping->i_mmap, &details);
> +        i_mmap_unlock_write(mapping);
> +}
> +
>  /**
>   * unmap_mapping_pages() - Unmap pages from processes.
>   * @mapping: The address space containing pages to be unmapped.
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 95af244b112a..234ddd879caa 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -167,13 +167,10 @@ void do_invalidatepage(struct page *page, unsigned int offset,
>   * its lock, b) when a concurrent invalidate_mapping_pages got there first and
>   * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
>   */
> -static void
> -truncate_cleanup_page(struct address_space *mapping, struct page *page)
> +static void truncate_cleanup_page(struct page *page)
>  {
> -        if (page_mapped(page)) {
> -                unsigned int nr = thp_nr_pages(page);
> -                unmap_mapping_pages(mapping, page->index, nr, false);
> -        }
> +        if (page_mapped(page))
> +                unmap_mapping_page(page);
>
>          if (page_has_private(page))
>                  do_invalidatepage(page, 0, thp_size(page));
> @@ -218,7 +215,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
>          if (page->mapping != mapping)
>                  return -EIO;
>
> -        truncate_cleanup_page(mapping, page);
> +        truncate_cleanup_page(page);
>          delete_from_page_cache(page);
>          return 0;
>  }
> @@ -325,7 +322,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>                  index = indices[pagevec_count(&pvec) - 1] + 1;
>                  truncate_exceptional_pvec_entries(mapping, &pvec, indices);
>                  for (i = 0; i < pagevec_count(&pvec); i++)
> -                        truncate_cleanup_page(mapping, pvec.pages[i]);
> +                        truncate_cleanup_page(pvec.pages[i]);
>                  delete_from_page_cache_batch(mapping, &pvec);
>                  for (i = 0; i < pagevec_count(&pvec); i++)
>                          unlock_page(pvec.pages[i]);
> @@ -639,6 +636,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
>                                  continue;
>                          }
>
> +                        if (!did_range_unmap && page_mapped(page)) {
> +                                /*
> +                                 * If page is mapped, before taking its lock,
> +                                 * zap the rest of the file in one hit.
> +                                 */
> +                                unmap_mapping_pages(mapping, index,
> +                                                (1 + end - index), false);
> +                                did_range_unmap = 1;
> +                        }
> +
>                          lock_page(page);
>                          WARN_ON(page_to_index(page) != index);
>                          if (page->mapping != mapping) {
> @@ -646,23 +653,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
>                                  continue;
>                          }
>                          wait_on_page_writeback(page);
> -                        if (page_mapped(page)) {
> -                                if (!did_range_unmap) {
> -                                        /*
> -                                         * Zap the rest of the file in one hit.
> -                                         */
> -                                        unmap_mapping_pages(mapping, index,
> -                                                (1 + end - index), false);
> -                                        did_range_unmap = 1;
> -                                } else {
> -                                        /*
> -                                         * Just zap this page
> -                                         */
> -                                        unmap_mapping_pages(mapping, index,
> -                                                        1, false);
> -                                }
> -                        }
> +
> +                        if (page_mapped(page))
> +                                unmap_mapping_page(page);
>                          BUG_ON(page_mapped(page));
> +
>                          ret2 = do_launder_page(mapping, page);
>                          if (ret2 == 0) {
>                                  if (!invalidate_complete_page2(mapping, page))
> --
> 2.26.2
>
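
One more note for the archives: from the caller's side, the new helper's
contract reduces to roughly the pattern below (my paraphrase of the
truncate.c hunks above, not an additional change):

    /*
     * Page lock held, so page->mapping is stable; page is a head page,
     * never PageTail.  The surrounding range was already zapped earlier;
     * this re-unmaps just this (possibly compound) page.
     */
    if (page_mapped(page))
            unmap_mapping_page(page);
    BUG_ON(page_mapped(page));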