From: Hillf Danton <hdanton@sina.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Suleiman Souhlal <suleiman@google.com>,
	Jann Horn <jannh@google.com>, Hillf Danton <hdanton@sina.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: thp: fix MADV_REMOVE deadlock on shmem THP
Date: Sun, 17 Jan 2021 11:48:03 +0800
Message-ID: <20210117034803.16648-1-hdanton@sina.com>
In-Reply-To: <alpine.LSU.2.11.2101161409470.2022@eggly.anvils>

On Sat, 16 Jan 2021 14:16:24 -0800 (PST) Hugh Dickins wrote:
> 
> Sergey reported deadlock between kswapd correctly doing its usual
> lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem),
> and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
> down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).
> 
> This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
> reaches zap_pmd_range()'s call to __split_huge_pmd().  The same deadlock
> could occur when partially truncating a mapped huge tmpfs file, or using
> fallocate(FALLOC_FL_PUNCH_HOLE) on it.
> 
> __split_huge_pmd()'s page lock was added in 5.8, to make sure that any
> concurrent use of reuse_swap_page() (holding page lock) could not catch
> the anon THP's mapcounts and swapcounts while they were being split.
> 
> Fortunately, reuse_swap_page() is never applied to a shmem or file THP
> (not even by khugepaged, which checks PageSwapCache before calling),
> and anonymous THPs are never created in shmem or file areas: so that
> __split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
> on which there is no risk of deadlock with i_mmap_rwsem.

	CPU0				CPU1
	----				----
	kswapd				madvise(MADV_REMOVE)
	lock_page(page)			down_write(i_mmap_rwsem)
	down_read(i_mmap_rwsem)		lock_page(page)

Given that nothing is wrong on the reclaimer's side, it is the reversed
locking order on CPU1 that leaves the door open for this deadlock to
persist, perhaps for a long time, if I don't misread you.
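
To make the ordering argument concrete, here is a rough user-space sketch
of the two lock orders, not kernel code. It steps two tasks in lockstep
and reports whether they end up in a circular wait. Every name in it
(struct task, step(), interleaving_deadlocks()) is hypothetical and made
up for this illustration, and i_mmap_rwsem is modelled as a plain
exclusive lock, which is enough for one writer against one reader.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two locks discussed in this thread. */
enum lock_id { PAGE_LOCK, I_MMAP_RWSEM, NR_LOCKS };

#define NO_OWNER (-1)

struct task {
	enum lock_id order[NR_LOCKS];	/* locks in the order the task takes them */
	int nr_wanted;			/* how many of them the task needs */
	int nr_held;			/* how many it has acquired so far */
	bool done;
};

/* Advance one task by one step; return true if it made progress. */
static bool step(struct task *t, int self, int owner[NR_LOCKS])
{
	if (t->done)
		return false;
	assert(t->nr_held <= t->nr_wanted);
	if (t->nr_held == t->nr_wanted) {
		/* Work finished: drop everything this task holds. */
		for (int i = 0; i < t->nr_held; i++)
			owner[t->order[i]] = NO_OWNER;
		t->done = true;
		return true;
	}
	enum lock_id want = t->order[t->nr_held];
	if (owner[want] != NO_OWNER)
		return false;		/* would block on the other task */
	owner[want] = self;
	t->nr_held++;
	return true;
}

/*
 * Run kswapd and madvise in lockstep. The flag selects whether the
 * split path on a file THP takes the page lock (the order shown in
 * the table above) or skips it (what the PageAnon() check achieves).
 */
static bool interleaving_deadlocks(bool split_locks_file_page)
{
	int owner[NR_LOCKS] = { NO_OWNER, NO_OWNER };
	struct task tasks[2] = {
		/* kswapd: lock_page, then down_read(i_mmap_rwsem) */
		{ { PAGE_LOCK, I_MMAP_RWSEM }, 2, 0, false },
		/* madvise: down_write(i_mmap_rwsem), then maybe lock_page */
		{ { I_MMAP_RWSEM, PAGE_LOCK },
		  split_locks_file_page ? 2 : 1, 0, false },
	};

	for (;;) {
		bool progress = false;

		for (int i = 0; i < 2; i++)
			if (step(&tasks[i], i, owner))
				progress = true;
		if (tasks[0].done && tasks[1].done)
			return false;	/* both ran to completion */
		if (!progress)
			return true;	/* each waits on the other */
	}
}
```

Under these assumptions, interleaving_deadlocks(true) ends with each task
holding one lock and waiting for the other, while
interleaving_deadlocks(false) lets madvise finish and release
i_mmap_rwsem so that kswapd can proceed.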


> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> Cc: stable@vger.kernel.org
> ---
> 
> The status of reuse_swap_page(), and its use on THPs, is currently under
> discussion, and may need to be changed: but this patch is a simple fix
> to the reported deadlock, which can go in now, and be easily backported
> to whichever stable and longterm releases took in 5.8's c444eb564fb1.
> 
>  mm/huge_memory.c |   37 +++++++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 14 deletions(-)
> 
> --- 5.11-rc3/mm/huge_memory.c	2020-12-27 20:39:37.667932292 -0800
> +++ linux/mm/huge_memory.c	2021-01-16 08:02:08.265551393 -0800
> @@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_str
>  {
>  	spinlock_t *ptl;
>  	struct mmu_notifier_range range;
> -	bool was_locked = false;
> +	bool do_unlock_page = false;
>  	pmd_t _pmd;
>  
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> @@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_str
>  	VM_BUG_ON(freeze && !page);
>  	if (page) {
>  		VM_WARN_ON_ONCE(!PageLocked(page));
> -		was_locked = true;
>  		if (page != pmd_page(*pmd))
>  			goto out;
>  	}
> @@ -2227,19 +2226,29 @@ repeat:
>  	if (pmd_trans_huge(*pmd)) {
>  		if (!page) {
>  			page = pmd_page(*pmd);
> -			if (unlikely(!trylock_page(page))) {
> -				get_page(page);
> -				_pmd = *pmd;
> -				spin_unlock(ptl);
> -				lock_page(page);
> -				spin_lock(ptl);
> -				if (unlikely(!pmd_same(*pmd, _pmd))) {
> -					unlock_page(page);
> +			/*
> +			 * An anonymous page must be locked, to ensure that a
> +			 * concurrent reuse_swap_page() sees stable mapcount;
> +			 * but reuse_swap_page() is not used on shmem or file,
> +			 * and page lock must not be taken when zap_pmd_range()
> +			 * calls __split_huge_pmd() while i_mmap_lock is held.
> +			 */
> +			if (PageAnon(page)) {
> +				if (unlikely(!trylock_page(page))) {
> +					get_page(page);
> +					_pmd = *pmd;
> +					spin_unlock(ptl);
> +					lock_page(page);
> +					spin_lock(ptl);
> +					if (unlikely(!pmd_same(*pmd, _pmd))) {
> +						unlock_page(page);
> +						put_page(page);
> +						page = NULL;
> +						goto repeat;
> +					}
>  					put_page(page);
> -					page = NULL;
> -					goto repeat;
>  				}
> -				put_page(page);
> +				do_unlock_page = true;
>  			}
>  		}
>  		if (PageMlocked(page))
> @@ -2249,7 +2258,7 @@ repeat:
>  	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
>  out:
>  	spin_unlock(ptl);
> -	if (!was_locked && page)
> +	if (do_unlock_page)
>  		unlock_page(page);
>  	/*
>  	 * No need to double call mmu_notifier->invalidate_range() callback.
> 

