mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, aquini@redhat.com,
	desmondcheongzx@gmail.com, linux-mm@kvack.org,
	mgorman@techsingularity.net, mm-commits@vger.kernel.org,
	Qiang.Zhang@windriver.com, skhan@linuxfoundation.org,
	torvalds@linux-foundation.org
Subject: [patch 05/13] mm/page_alloc: avoid page allocator recursion with pagesets.lock held
Date: Wed, 14 Jul 2021 21:26:46 -0700	[thread overview]
Message-ID: <20210715042646.glYIx91lD%akpm@linux-foundation.org> (raw)
In-Reply-To: <20210714212609.fad116e584ba1194981a6294@linux-foundation.org>

From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: avoid page allocator recursion with pagesets.lock held

Syzbot is reporting potential deadlocks due to pagesets.lock when
PAGE_OWNER is enabled.  One example from Desmond Cheong Zhi Xi is as
follows

  __alloc_pages_bulk()
    local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
    prep_new_page():
      post_alloc_hook():
        set_page_owner():
          __set_page_owner():
            save_stack():
              stack_depot_save():
                alloc_pages():
                  alloc_page_interleave():
                    __alloc_pages():
                      get_page_from_freelist():
                        rm_queue():
                          rm_queue_pcplist():
                            local_lock_irqsave(&pagesets.lock, flags);
                            *** DEADLOCK ***

Zhang, Qiang also reported

  BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
  .....
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
  ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
  prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
  __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
  alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
  alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
  stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
  save_stack+0x15e/0x1e0 mm/page_owner.c:120
  __set_page_owner+0x50/0x290 mm/page_owner.c:181
  prep_new_page mm/page_alloc.c:2445 [inline]
  __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
  alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
  vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
  __vmalloc_area_node mm/vmalloc.c:2845 [inline]
  __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
  __vmalloc_node mm/vmalloc.c:2996 [inline]
  vzalloc+0x67/0x80 mm/vmalloc.c:3066

There are a number of ways it could be fixed.  The page owner code could
be audited to strip GFP flags that allow sleeping but it'll impair the
functionality of PAGE_OWNER if allocations fail.  The bulk allocator could
add a special case to release/reacquire the lock for prep_new_page and
lookup PCP after the lock is reacquired at the cost of performance.  The
pages requiring prep could be tracked using the least significant bit and
looping through the array although it is more complicated for the list
interface.  The options are relatively complex and the second one still
incurs a performance penalty when PAGE_OWNER is active so this patch takes
the simple approach -- disable bulk allocation of PAGE_OWNER is active. 
The caller will be forced to allocate one page at a time incurring a
performance penalty but PAGE_OWNER is already a performance penalty.

Link: https://lkml.kernel.org/r/20210708081434.GV3840@techsingularity.net
Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reported-by: "Zhang, Qiang" <Qiang.Zhang@windriver.com>
Reported-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
Tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/mm/page_alloc.c~mm-page_alloc-avoid-page-allocator-recursion-with-pagesetslock-held
+++ a/mm/page_alloc.c
@@ -5239,6 +5239,18 @@ unsigned long __alloc_pages_bulk(gfp_t g
 	if (nr_pages - nr_populated == 1)
 		goto failed;
 
+#ifdef CONFIG_PAGE_OWNER
+	/*
+	 * PAGE_OWNER may recurse into the allocator to allocate space to
+	 * save the stack with pagesets.lock held. Releasing/reacquiring
+	 * removes much of the performance benefit of bulk allocation so
+	 * force the caller to allocate one page at a time as it'll have
+	 * similar performance to added complexity to the bulk allocator.
+	 */
+	if (static_branch_unlikely(&page_owner_inited))
+		goto failed;
+#endif
+
 	/* May set ALLOC_NOFRAGMENT, fragmentation will return 1 page. */
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
_

  parent reply	other threads:[~2021-07-15  4:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-15  4:26 incoming Andrew Morton
2021-07-15  4:26 ` [patch 01/13] mm: move helper to check slub_debug_enabled Andrew Morton
2021-07-15  4:26 ` [patch 02/13] kasan: add memzero init for unaligned size at DEBUG Andrew Morton
2021-07-15  4:26 ` [patch 03/13] kasan: fix build by including kernel.h Andrew Morton
2021-07-15  4:26 ` [patch 04/13] Revert "mm/page_alloc: make should_fail_alloc_page() static" Andrew Morton
2021-07-15  4:26 ` Andrew Morton [this message]
2021-07-15  4:26 ` [patch 06/13] mm/page_alloc: correct return value when failing at preparing Andrew Morton
2021-07-15  4:26 ` [patch 07/13] mm/page_alloc: further fix __alloc_pages_bulk() return value Andrew Morton
2021-07-15  4:26 ` [patch 08/13] mm: fix the try_to_unmap prototype for !CONFIG_MMU Andrew Morton
2021-07-15  4:26 ` [patch 09/13] lib/test_hmm: remove set but unused page variable Andrew Morton
2021-07-15  4:27 ` [patch 10/13] hfs: add missing clean-up in hfs_fill_super Andrew Morton
2021-07-15  4:27 ` [patch 11/13] hfs: fix high memory mapping in hfs_bnode_read Andrew Morton
2021-07-15  4:27 ` [patch 12/13] hfs: add lock nesting notation to hfs_find_init Andrew Morton
2021-07-15  4:27 ` [patch 13/13] mm/hugetlb: fix refs calculation from unaligned @vaddr Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210715042646.glYIx91lD%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=Qiang.Zhang@windriver.com \
    --cc=aquini@redhat.com \
    --cc=desmondcheongzx@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mm-commits@vger.kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).