All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting (fwd)
Date: Wed, 9 Jun 2021 09:57:58 -0700	[thread overview]
Message-ID: <CAHbLzkpCO-yN6dkM_JbCpK3FtBy_xu1Z4VCcVsSKqkV6nH5Mew@mail.gmail.com> (raw)
In-Reply-To: <6b2b6683-d9a7-b7d0-a3e5-425b96338d63@google.com>

On Tue, Jun 8, 2021 at 9:12 PM Hugh Dickins <hughd@google.com> wrote:
>
>
>
> ---------- Forwarded message ----------
> Date: Tue, 8 Jun 2021 21:10:19 -0700 (PDT)
> From: Hugh Dickins <hughd@google.com>
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>,
>     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>,
>     Yang Shi <shy828301@gmail.com>, Wang Yugui <wangyugui@e16-tech.com>,
>     Matthew Wilcox <willy@infradead.org>,
>     Naoya Horiguchi <naoya.horiguchi@nec.com>,
>     Alistair Popple <apopple@nvidia.com>, Ralph Campbell <rcampbell@nvidia.com>,
>     Zi Yan <ziy@nvidia.com>, Miaohe Lin <linmiaohe@huawei.com>,
>     Minchan Kim <minchan@kernel.org>, Jue Wang <juew@google.com>,
>     Peter Xu <peterx@redhat.com>, Jan Kara <jack@suse.cz>,
>     Shakeel Butt <shakeelb@google.com>, Oscar Salvador <osalvador@suse.de>
> Subject: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
>
> Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
> (!unmap_success): with dump_page() showing mapcount:1, but then its
> raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
>
> And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
> it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
> and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
> all indicative of some mapcount difficulty in development here perhaps.
> But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
>
> I believe the problem is that once a racing unmap has cleared pte or pmd,
> try_to_unmap_one() may skip taking the page table lock, and emerge from
> try_to_unmap() before the racing task has reached decrementing mapcount.
>
> Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
> follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC
> to the options, and passing that from unmap_page().
>
> When CONFIG_DEBUG_VM, or for non-debug too?  Consensus is to do the same
> for both: the slight overhead added should rarely matter, except perhaps
> if splitting sparsely-populated multiply-mapped shmem.  Once confident
> that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
>
> Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> v2: moved TTU_SYNC definition up, to avoid conflict with other patchset
>     use TTU_SYNC even when non-debug, per Peter Xu and Yang Shi
>     expanded PVMW_SYNC's spin_unlock(pmd_lock()), per Kirill and Peter

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
>  include/linux/rmap.h |  1 +
>  mm/huge_memory.c     |  2 +-
>  mm/page_vma_mapped.c | 11 +++++++++++
>  mm/rmap.c            | 17 ++++++++++++++++-
>  4 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index def5c62c93b3..8d04e7deedc6 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -91,6 +91,7 @@ enum ttu_flags {
>
>         TTU_SPLIT_HUGE_PMD      = 0x4,  /* split huge PMD if any */
>         TTU_IGNORE_MLOCK        = 0x8,  /* ignore mlock */
> +       TTU_SYNC                = 0x10, /* avoid racy checks with PVMW_SYNC */
>         TTU_IGNORE_HWPOISON     = 0x20, /* corrupted page is recoverable */
>         TTU_BATCH_FLUSH         = 0x40, /* Batch TLB flushes where possible
>                                          * and caller guarantees they will
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5885c5f5836f..84ab735139dc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>
>  static void unmap_page(struct page *page)
>  {
> -       enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> +       enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
>                 TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
>         bool unmap_success;
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2cf01d933f13..5b559967410e 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -212,6 +212,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>                         pvmw->ptl = NULL;
>                 }
>         } else if (!pmd_present(pmde)) {
> +               /*
> +                * If PVMW_SYNC, take and drop THP pmd lock so that we
> +                * cannot return prematurely, while zap_huge_pmd() has
> +                * cleared *pmd but not decremented compound_mapcount().
> +                */
> +               if ((pvmw->flags & PVMW_SYNC) &&
> +                   PageTransCompound(pvmw->page)) {
> +                       spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
> +
> +                       spin_unlock(ptl);
> +               }
>                 return false;
>         }
>         if (!map_pte(pvmw))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 693a610e181d..07811b4ae793 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>         struct mmu_notifier_range range;
>         enum ttu_flags flags = (enum ttu_flags)(long)arg;
>
> +       /*
> +        * When racing against e.g. zap_pte_range() on another cpu,
> +        * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +        * try_to_unmap() may return false when it is about to become true,
> +        * if page table locking is skipped: use TTU_SYNC to wait for that.
> +        */
> +       if (flags & TTU_SYNC)
> +               pvmw.flags = PVMW_SYNC;
> +
>         /* munlock has nothing to gain from examining un-locked vmas */
>         if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
>                 return true;
> @@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
>         else
>                 rmap_walk(page, &rwc);
>
> -       return !page_mapcount(page) ? true : false;
> +       /*
> +        * When racing against e.g. zap_pte_range() on another cpu,
> +        * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +        * try_to_unmap() may return false when it is about to become true,
> +        * if page table locking is skipped: use TTU_SYNC to wait for that.
> +        */
> +       return !page_mapcount(page);
>  }
>
>  /**
> --
> 2.26.2
>
>

  parent reply	other threads:[~2021-06-09 16:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09  4:12 [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting (fwd) Hugh Dickins
2021-06-09 10:25 ` Kirill A. Shutemov
2021-06-09 16:57 ` Yang Shi [this message]
2021-06-09 16:57   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkpCO-yN6dkM_JbCpK3FtBy_xu1Z4VCcVsSKqkV6nH5Mew@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.