From: Alistair Popple <apopple@nvidia.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Naoya Horiguchi" <naoya.horiguchi@nec.com>,
	Ralph Campbell <rcampbell@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>, Jue Wang <juew@google.com>,
	Peter Xu <peterx@redhat.com>, Jan Kara <jack@suse.cz>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/7] mm/thp: try_to_unmap() use TTU_SYNC for safe DEBUG_VM splitting
Date: Wed, 2 Jun 2021 11:59:13 +1000
Message-ID: <5096506.aDjlqC2hHN@nvdebian>
In-Reply-To: <alpine.LSU.2.11.2106011405510.2148@eggly.anvils>

On Wednesday, 2 June 2021 7:07:53 AM AEST Hugh Dickins wrote:
> Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
> (!unmap_success): with dump_page() showing mapcount:1, but then its
> raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
> 
> And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
> it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
> and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
> all indicative of some mapcount difficulty in development here perhaps.
> But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
> 
> I believe the problem is that once a racing unmap has cleared pte or pmd,
> try_to_unmap_one() may skip taking the page table lock, and emerge from
> try_to_unmap() before the racing task has reached decrementing mapcount.
> 
> Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
> follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC
> to the options, and passing that from unmap_page() when CONFIG_DEBUG_VM=y.
> It could be passed in the non-debug case too, but that would sometimes add
> a little overhead, whereas it's rare for this race to result in failure.
> 
> mm/memory-failure.c:hwpoison_user_mappings() should probably use the new
> TTU_SYNC option too, just in case this race coincides with its attempts to
> unmap a failing page (THP or not); but this commit does not add that.
> 
> Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
>  include/linux/rmap.h |  3 ++-
>  mm/huge_memory.c     |  4 ++++
>  mm/page_vma_mapped.c |  8 ++++++++
>  mm/rmap.c            | 17 ++++++++++++++++-
>  4 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index def5c62c93b3..891599a4cb8d 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -97,7 +97,8 @@ enum ttu_flags {
>                                          * do a final flush if necessary */
>         TTU_RMAP_LOCKED         = 0x80, /* do not grab rmap lock:
>                                          * caller holds it */
> -       TTU_SPLIT_FREEZE        = 0x100,                /* freeze pte under splitting thp */
> +       TTU_SPLIT_FREEZE        = 0x100, /* freeze pte under splitting thp */
> +       TTU_SYNC                = 0x200, /* avoid racy checks with PVMW_SYNC */
>  };
> 
>  #ifdef CONFIG_MMU
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9fb7b47da87e..305f709a7aca 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2357,6 +2357,10 @@ static void unmap_page(struct page *page)
>         if (PageAnon(page))
>                 ttu_flags |= TTU_SPLIT_FREEZE;
> 
> +       /* Make sure that the BUGs will not bite */
> +       if (IS_ENABLED(CONFIG_DEBUG_VM))
> +               ttu_flags |= TTU_SYNC;
> +
>         unmap_success = try_to_unmap(page, ttu_flags);
>         VM_BUG_ON_PAGE(!unmap_success, page);
>  }
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2cf01d933f13..b45d22738b45 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -212,6 +212,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>                         pvmw->ptl = NULL;
>                 }
>         } else if (!pmd_present(pmde)) {
> +               /*
> +                * If PVMW_SYNC, take and drop THP pmd lock so that we
> +                * cannot return prematurely, while zap_huge_pmd() has
> +                * cleared *pmd but not decremented compound_mapcount().
> +                */
> +               if ((pvmw->flags & PVMW_SYNC) &&
> +                   PageTransCompound(pvmw->page))
> +                       spin_unlock(pmd_lock(mm, pvmw->pmd));
>                 return false;
>         }
>         if (!map_pte(pvmw))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 693a610e181d..07811b4ae793 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>         struct mmu_notifier_range range;
>         enum ttu_flags flags = (enum ttu_flags)(long)arg;
> 
> +       /*
> +        * When racing against e.g. zap_pte_range() on another cpu,
> +        * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +        * try_to_unmap() may return false when it is about to become true,
> +        * if page table locking is skipped: use TTU_SYNC to wait for that.
> +        */
> +       if (flags & TTU_SYNC)
> +               pvmw.flags = PVMW_SYNC;
> +

If this gets applied on top of my series then I think we would also need to
add this to the start of try_to_migrate_one(), as I assume you can hit this bug
whether we are unmapping the page or installing swap migration entries.
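
To illustrate the window I mean, here is a rough user-space sketch, not kernel
code. A pthread mutex stands in for the page table lock, and struct
fake_mapping, zapper(), walker_unmapped() and run_race() are all hypothetical
names made up for this example. The zapper clears the "pte" and only later
drops the mapcount, both under the lock; a walker that takes and drops the
lock first (the PVMW_SYNC idea) cannot observe the intermediate state.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a mapped page and its page table lock */
struct fake_mapping {
	pthread_mutex_t ptl;     /* plays the role of the page table lock */
	atomic_int pte_present;  /* 1 while the "pte" still maps the page */
	atomic_int mapcount;     /* plays the role of page_mapcount() */
};

/* Zapper: clears the pte first and drops mapcount later, all under ptl */
static void *zapper(void *arg)
{
	struct fake_mapping *m = arg;

	pthread_mutex_lock(&m->ptl);
	atomic_store(&m->pte_present, 0);   /* like ptep_get_and_clear_full() */
	for (int i = 0; i < 1000; i++)      /* widen the race window a little */
		sched_yield();
	atomic_fetch_sub(&m->mapcount, 1);  /* like page_remove_rmap() */
	pthread_mutex_unlock(&m->ptl);
	return NULL;
}

/* Walker: returns true if the page looks fully unmapped */
static bool walker_unmapped(struct fake_mapping *m, bool sync)
{
	/* Wait until the racing zapper has cleared the pte */
	while (atomic_load(&m->pte_present))
		sched_yield();

	if (sync) {
		/* Take and drop the lock: the zapper's critical section is over */
		pthread_mutex_lock(&m->ptl);
		pthread_mutex_unlock(&m->ptl);
	}
	return atomic_load(&m->mapcount) == 0;
}

/* Runs one race and reports what the walker concluded */
static bool run_race(bool sync)
{
	struct fake_mapping m = { .pte_present = 1, .mapcount = 1 };
	pthread_t t;
	bool unmapped;

	pthread_mutex_init(&m.ptl, NULL);
	pthread_create(&t, NULL, zapper, &m);
	unmapped = walker_unmapped(&m, sync);
	pthread_join(t, NULL);
	pthread_mutex_destroy(&m.ptl);
	return unmapped;
}
```

With sync set, run_race() always reports the page as unmapped, because the
walker cannot get the lock until the zapper has dropped the mapcount. Without
it, the result depends on timing and may be "still mapped".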

We would also need to update the flag check at the start of try_to_migrate() 
to allow passing TTU_SYNC.
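
As a rough sketch of what I mean by the flag check (user-space C, not the
actual kernel code): the TTU_RMAP_LOCKED, TTU_SPLIT_FREEZE and TTU_SYNC values
come from the rmap.h hunk quoted above. The TTU_SPLIT_HUGE_PMD and PVMW_SYNC
values, the allowed set itself, and the helper names migrate_flags_ok() and
pvmw_flags_for() are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Values taken from the quoted enum ttu_flags hunk, except where noted */
enum ttu_flags {
	TTU_SPLIT_HUGE_PMD = 0x4,    /* assumed value, not shown in the hunk */
	TTU_RMAP_LOCKED    = 0x80,
	TTU_SPLIT_FREEZE   = 0x100,
	TTU_SYNC           = 0x200,
};

/* Assumed value for the page_vma_mapped_walk() sync flag */
#define PVMW_SYNC (1 << 0)

/*
 * Hypothetical helper: would try_to_migrate() accept these flags?
 * The allowed set is an assumption, extended with TTU_SYNC.
 */
static bool migrate_flags_ok(unsigned int flags)
{
	const unsigned int allowed =
		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC;

	return (flags & ~allowed) == 0;
}

/*
 * Hypothetical helper mirroring the try_to_unmap_one() hunk:
 * translate TTU_SYNC into PVMW_SYNC for the page table walk.
 */
static unsigned int pvmw_flags_for(unsigned int ttu_flags)
{
	return (ttu_flags & TTU_SYNC) ? PVMW_SYNC : 0;
}
```

The same translation would sit at the start of try_to_migrate_one(), so that
the migration path waits out the racing unmap in the same way.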

>         /* munlock has nothing to gain from examining un-locked vmas */
>         if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
>                 return true;
> @@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
>         else
>                 rmap_walk(page, &rwc);
> 
> -       return !page_mapcount(page) ? true : false;
> +       /*
> +        * When racing against e.g. zap_pte_range() on another cpu,
> +        * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +        * try_to_unmap() may return false when it is about to become true,
> +        * if page table locking is skipped: use TTU_SYNC to wait for that.
> +        */
> +       return !page_mapcount(page);
>  }
> 
>  /**
> --
> 2.32.0.rc0.204.g9fa02ecfa5-goog






