linux-kernel.vger.kernel.org archive mirror
From: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, skhan@linuxfoundation.org,
	gregkh@linuxfoundation.org, Zqiang <qiang.zhang@windriver.com>,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
Subject: Re: [PATCH] mm/page_alloc: avoid deadlocks for &pagesets.lock
Date: Thu, 8 Jul 2021 09:48:30 +0800
Message-ID: <b97a32b7-8b63-9f34-395c-bf902aa8a6d2@gmail.com>
In-Reply-To: <20210707122530.GU3840@techsingularity.net>

On 7/7/21 8:25 pm, Mel Gorman wrote:
> On Wed, Jul 07, 2021 at 07:12:45PM +0800, Desmond Cheong Zhi Xi wrote:
>> Syzbot reports a number of potential deadlocks for &pagesets.lock. It
>> seems that this new lock is being used as both an inner and outer
>> lock, which makes it prone to creating circular dependencies.
>>
>> For example, one such call trace goes as follows:
>>    __alloc_pages_bulk()
>>      local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
>>      prep_new_page():
>>        post_alloc_hook():
>>          set_page_owner():
>>            __set_page_owner():
>>              save_stack():
>>                stack_depot_save():
>>                  alloc_pages():
>>                    alloc_page_interleave():
>>                      __alloc_pages():
>>                        get_page_from_freelist():
>>                          rm_queue():
>>                            rm_queue_pcplist():
>>                              local_lock_irqsave(&pagesets.lock, flags);
>>                              *** DEADLOCK ***
>>
>> The common culprit for the lockdep splats seems to be the call to
>> local_lock_irqsave(&pagesets.lock, flags) inside
>> __alloc_pages_bulk(). &pagesets.lock becomes an outer lock if it's
>> held during the call to prep_new_page().
>>
>> As the local lock is used to protect the PCP structure, we adjust the
>> locking in __alloc_pages_bulk so that only the necessary structures
>> are protected.
>>
>> Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
>> Reported-and-tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
>> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> 
> Hi Desmond,
> 
> Thanks for the report. Unfortunately, this patch incurs a performance
> penalty for the bulk allocator even if PAGE_OWNER is disabled. Can you
> try the following as an alternative please? It passed a build and boot
> test but I didn't try triggering the actual bug.
> 
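
A rough userspace model of the quoted call trace may make the inner/outer lock problem easier to follow. It is an illustrative sketch, not kernel code, and all of its names are hypothetical stand-ins:

- model_lock stands in for pagesets.lock.
- model_alloc_page() stands in for the path that ends in rmqueue_pcplist().
- model_prep_new_page() stands in for prep_new_page().
- model_alloc_pages_bulk() stands in for __alloc_pages_bulk().

The model lock counts a nested acquisition instead of deadlocking. For simplicity it also assumes that every page-owner stack save allocates a page, which the real stack depot does not do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU pagesets.lock (a local_lock_t in
 * the kernel). It only records whether it is held, so a nested acquisition
 * is counted instead of deadlocking. */
struct model_lock {
	bool held;
	int recursion_detected;	/* nested acquisitions attempted */
};

static struct model_lock pagesets_lock;
static bool page_owner_enabled;	/* models an active PAGE_OWNER */

static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held) {
		l->recursion_detected++;
		return false;	/* the real kernel would deadlock here */
	}
	l->held = true;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models __alloc_pages() reaching rmqueue_pcplist(), which takes the lock. */
static bool model_alloc_page(void)
{
	if (!model_lock_acquire(&pagesets_lock))
		return false;
	model_lock_release(&pagesets_lock);
	return true;
}

/* Models prep_new_page() -> ... -> stack_depot_save() allocating a page.
 * Simplification: every stack save allocates. */
static void model_prep_new_page(void)
{
	if (page_owner_enabled)
		(void)model_alloc_page();
}

/* Models __alloc_pages_bulk() before the fix: prep_new_page() runs while
 * pagesets.lock is held, which makes it an outer lock. */
static int model_alloc_pages_bulk(int nr_pages)
{
	int populated = 0;

	if (!model_lock_acquire(&pagesets_lock))
		return 0;
	while (populated < nr_pages) {
		model_prep_new_page();
		populated++;
	}
	model_lock_release(&pagesets_lock);
	return populated;
}
```

With page_owner_enabled set, each page prepared inside the bulk loop attempts a nested acquisition of the lock the loop already holds. This is the inner and outer use of the same lock that the lockdep splats point at. With it clear, no nested acquisition happens.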

Hi Mel,

Thanks for the feedback; I hadn't considered the performance penalty. I
think you're right that avoiding the recursive call to __set_page_owner
also avoids creating the circular lock hierarchy.

Your proposed patch passed the Syzbot repro test:

Reported-and-tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com

Best wishes,
Desmond
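
Mel's objection to the original patch, that it costs performance even without PAGE_OWNER, shows up in a similar model of a narrowed lock scope. This sketch assumes the narrowing takes and drops the lock once per page and calls prep_new_page() with the lock released. That is one plausible reading of "only the necessary structures are protected", not a copy of the original patch.

All names are hypothetical, and the acquisition counter is only a rough proxy for overhead. In the real kernel, dropping the lock also means the per-CPU pageset has to be looked up again after the lock is retaken. This model ignores that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of pagesets.lock that also counts acquisitions. */
struct model_lock {
	bool held;
	int acquisitions;	/* successful acquisitions: rough overhead proxy */
	int recursion_detected;	/* nested acquisitions attempted */
};

static struct model_lock pagesets_lock;
static bool page_owner_enabled;	/* models an active PAGE_OWNER */

static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held) {
		l->recursion_detected++;
		return false;
	}
	l->held = true;
	l->acquisitions++;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Nested single-page allocation made on behalf of page owner. */
static bool model_alloc_page(void)
{
	if (!model_lock_acquire(&pagesets_lock))
		return false;
	model_lock_release(&pagesets_lock);
	return true;
}

static void model_prep_new_page(void)
{
	if (page_owner_enabled)
		(void)model_alloc_page();
}

/* Hypothetical narrowed variant: the lock covers only taking one page off
 * the per-CPU list; prep_new_page() runs after the lock is released. */
static int model_alloc_pages_bulk_narrow(int nr_pages)
{
	int populated = 0;

	while (populated < nr_pages) {
		if (!model_lock_acquire(&pagesets_lock))
			break;
		/* (take one page off the per-CPU list here) */
		model_lock_release(&pagesets_lock);
		model_prep_new_page();	/* lock not held: nesting is safe */
		populated++;
	}
	return populated;
}
```

In this variant the nested allocation no longer finds the lock held. However, even with page_owner_enabled false, the loop takes the lock once per page instead of once per call. That is the cost paid whether or not PAGE_OWNER is in use.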

> --8<--
> mm/page_alloc: Avoid page allocator recursion with pagesets.lock held
> 
> Syzbot is reporting potential deadlocks due to pagesets.lock when
> PAGE_OWNER is enabled. One example from Desmond Cheong Zhi Xi is
> as follows:
> 
>    __alloc_pages_bulk()
>      local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
>      prep_new_page():
>        post_alloc_hook():
>          set_page_owner():
>            __set_page_owner():
>              save_stack():
>                stack_depot_save():
>                  alloc_pages():
>                    alloc_page_interleave():
>                      __alloc_pages():
>                        get_page_from_freelist():
>                          rm_queue():
>                            rm_queue_pcplist():
>                              local_lock_irqsave(&pagesets.lock, flags);
>                              *** DEADLOCK ***
> 
> Zhang, Qiang also reported:
> 
>    BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
>    in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
>    .....
>    __dump_stack lib/dump_stack.c:79 [inline]
>    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
>    ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
>    prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
>    __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
>    alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
>    alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
>    stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
>    save_stack+0x15e/0x1e0 mm/page_owner.c:120
>    __set_page_owner+0x50/0x290 mm/page_owner.c:181
>    prep_new_page mm/page_alloc.c:2445 [inline]
>    __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
>    alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
>    vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
>    __vmalloc_area_node mm/vmalloc.c:2845 [inline]
>    __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
>    __vmalloc_node mm/vmalloc.c:2996 [inline]
>    vzalloc+0x67/0x80 mm/vmalloc.c:3066
> 
> There are a number of ways it could be fixed. The page owner code could
> be audited to strip GFP flags that allow sleeping, but that would impair
> the functionality of PAGE_OWNER if allocations fail. The bulk allocator
> could add a special case to release/reacquire the lock for prep_new_page
> and look up the PCP after the lock is reacquired, at the cost of
> performance. Both options are relatively complex, and the second one
> still incurs a performance penalty when PAGE_OWNER is active, so this
> patch takes the simple approach: disable bulk allocation if PAGE_OWNER
> is active. The caller will be forced to allocate one page at a time,
> incurring a performance penalty, but PAGE_OWNER is already a
> performance penalty.
> 
> Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
> Reported-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> Reported-by: "Zhang, Qiang" <Qiang.Zhang@windriver.com>
> Reported-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>   mm/page_alloc.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3b97e17806be..6ef86f338151 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5239,6 +5239,18 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>   	if (nr_pages - nr_populated == 1)
>   		goto failed;
>   
> +#ifdef CONFIG_PAGE_OWNER
> +	/*
> +	 * PAGE_OWNER may recurse into the allocator to allocate space to
> +	 * save the stack with pagesets.lock held. Releasing/reacquiring
> +	 * removes much of the performance benefit of bulk allocation so
> +	 * force the caller to allocate one page at a time as it'll have
> +	 * similar performance to added complexity to the bulk allocator.
> +	 */
> +	if (static_branch_unlikely(&page_owner_inited))
> +		goto failed;
> +#endif
> +
>   	/* May set ALLOC_NOFRAGMENT, fragmentation will return 1 page. */
>   	gfp &= gfp_allowed_mask;
>   	alloc_gfp = gfp;
> 
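
The effect of the early bail-out in Mel's patch can be modelled the same way. The names are hypothetical throughout:

- page_owner_enabled stands in for static_branch_unlikely(&page_owner_inited).
- The failed label returns a single page, allocated on a path that drops the lock before prep_new_page().
- model_caller_fill() is a made-up caller that falls back to one page at a time.

As an assumption, the nested stack-storage allocation is not itself tracked by page owner, so it does not recurse further.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pagesets.lock; counts nested acquisitions. */
struct model_lock {
	bool held;
	int recursion_detected;
};

static struct model_lock pagesets_lock;
/* Stands in for static_branch_unlikely(&page_owner_inited). */
static bool page_owner_enabled;

static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held) {
		l->recursion_detected++;
		return false;
	}
	l->held = true;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Nested allocation for stack storage; assumed not tracked by page owner. */
static bool model_alloc_untracked(void)
{
	if (!model_lock_acquire(&pagesets_lock))
		return false;
	model_lock_release(&pagesets_lock);
	return true;
}

static void model_prep_new_page(void)
{
	if (page_owner_enabled)
		(void)model_alloc_untracked();
}

/* Single-page path: the lock is dropped before prep_new_page() runs. */
static bool model_alloc_one_page(void)
{
	if (!model_lock_acquire(&pagesets_lock))
		return false;
	model_lock_release(&pagesets_lock);
	model_prep_new_page();
	return true;
}

/* Models __alloc_pages_bulk() with the check added by the patch. */
static int model_alloc_pages_bulk_fixed(int nr_pages)
{
	int populated = 0;

	if (page_owner_enabled)
		goto failed;

	if (!model_lock_acquire(&pagesets_lock))
		return 0;
	while (populated < nr_pages) {
		model_prep_new_page();	/* page owner is off on this path */
		populated++;
	}
	model_lock_release(&pagesets_lock);
	return populated;

failed:
	/* Hand back a single page, allocated without pagesets.lock held. */
	if (model_alloc_one_page())
		populated++;
	return populated;
}

/* Hypothetical caller that tops up one page at a time after a short bulk
 * allocation. */
static int model_caller_fill(int nr_pages)
{
	int got = model_alloc_pages_bulk_fixed(nr_pages);

	while (got < nr_pages && model_alloc_one_page())
		got++;
	return got;
}
```

When page_owner_enabled is set, the bulk path returns one page, and every nested allocation happens with the lock released, so no recursion is recorded. The price is the one-page-at-a-time fallback that the patch description accepts.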



Thread overview: 3+ messages
2021-07-07 11:12 [PATCH] mm/page_alloc: avoid deadlocks for &pagesets.lock Desmond Cheong Zhi Xi
2021-07-07 12:25 ` Mel Gorman
2021-07-08  1:48   ` Desmond Cheong Zhi Xi [this message]
