linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: might_sleep warning
@ 2018-03-06 19:20 Pavel Tatashin
  2018-03-06 20:36 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Tatashin @ 2018-03-06 19:20 UTC
  To: steven.sistare, daniel.m.jordan, pasha.tatashin, m.mizuma, akpm,
	mhocko, catalin.marinas, takahiro.akashi, gi-oh.kim,
	heiko.carstens, baiyaowei, richard.weiyang, paul.burton,
	miles.chen, vbabka, mgorman, hannes, linux-kernel, linux-mm

Robot reported this issue:
https://lkml.org/lkml/2018/2/27/851

It was introduced by:
mm: initialize pages on demand during boot

The problem is caused by changing a static branch value while holding a
spinlock. A spinlock disables preemption, and changing a static branch
value takes a mutex in its path, and thus may sleep.

The fix is to add another boolean variable, so that the static branch no
longer needs to be changed under the spinlock.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b337a026007c..52edc6695b2b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1579,6 +1579,7 @@ static int __init deferred_init_memmap(void *data)
  * page_alloc_init_late() soon after smp_init() is complete.
  */
 static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
+static bool deferred_zone_grow __initdata = true;
 static DEFINE_STATIC_KEY_TRUE(deferred_pages);
 
 /*
@@ -1616,7 +1617,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 	 * Bail if we raced with another thread that disabled on demand
 	 * initialization.
 	 */
-	if (!static_branch_unlikely(&deferred_pages)) {
+	if (!static_branch_unlikely(&deferred_pages) || !deferred_zone_grow) {
 		spin_unlock_irqrestore(&deferred_zone_grow_lock, flags);
 		return false;
 	}
@@ -1683,10 +1684,15 @@ void __init page_alloc_init_late(void)
 	/*
 	 * We are about to initialize the rest of deferred pages, permanently
 	 * disable on-demand struct page initialization.
+	 *
+	 * Note: it is prohibited to modify static branches in non-preemptible
+	 * context. Since, spin_lock() disables preemption, we must use an
+	 * extra boolean deferred_zone_grow.
 	 */
 	spin_lock(&deferred_zone_grow_lock);
-	static_branch_disable(&deferred_pages);
+	deferred_zone_grow = false;
 	spin_unlock(&deferred_zone_grow_lock);
+	static_branch_disable(&deferred_pages);
 
 	/* There will be num_node_state(N_MEMORY) threads */
 	atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
-- 
2.16.2
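
The ordering in this patch can be sketched as a small userspace model.
This is an illustration only, not kernel code, and it assumes a single
thread of execution. Every name in it (model_spin_lock(),
model_key_disable(), late_init_buggy(), late_init_fixed(), may_grow(),
preempt_depth, sleep_warnings) is a hypothetical stand-in: the counter
preempt_depth plays the role of "preemption is disabled", and
model_key_disable() plays the role of static_branch_disable(), which may
sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static atomic_flag grow_lock = ATOMIC_FLAG_INIT; /* models deferred_zone_grow_lock */
static int preempt_depth;        /* > 0 while the modelled spinlock is held */
static int sleep_warnings;       /* counts modelled might_sleep() complaints */
static bool grow_enabled = true; /* models the new deferred_zone_grow flag */
static bool key_enabled = true;  /* models the deferred_pages static key */

static void model_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&grow_lock, memory_order_acquire))
		; /* busy-wait, as a spinlock would */
	preempt_depth++;
}

static void model_spin_unlock(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
	atomic_flag_clear_explicit(&grow_lock, memory_order_release);
}

/* Models a path that may sleep, such as changing a static branch. */
static void model_key_disable(void)
{
	if (preempt_depth > 0)
		sleep_warnings++; /* called where sleeping is not allowed */
	key_enabled = false;
}

/* Before the patch: the sleeping path runs under the spinlock. */
static void late_init_buggy(void)
{
	model_spin_lock();
	model_key_disable();
	model_spin_unlock();
}

/* After the patch: only the plain boolean changes under the lock. */
static void late_init_fixed(void)
{
	model_spin_lock();
	grow_enabled = false;
	model_spin_unlock();
	model_key_disable();
}

/* Allocation-side check: grow only while both switches are still on. */
static bool may_grow(void)
{
	bool ok;

	model_spin_lock();
	ok = key_enabled && grow_enabled;
	model_spin_unlock();
	return ok;
}

/* Put the model back into its boot-time state. */
static void model_reset(void)
{
	atomic_flag_clear(&grow_lock);
	preempt_depth = 0;
	sleep_warnings = 0;
	grow_enabled = true;
	key_enabled = true;
}
```

In this model, late_init_buggy() increments sleep_warnings, while
late_init_fixed() leaves it at zero and still makes may_grow() return
false from the moment the boolean is cleared.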

* Re: [PATCH] mm: might_sleep warning
  2018-03-06 19:20 [PATCH] mm: might_sleep warning Pavel Tatashin
@ 2018-03-06 20:36 ` Andrew Morton
       [not found]   ` <CAGM2rea1raxsXDkqZgmmdBiuywp1M3y1p++=J893VJDgGDWLnQ@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-03-06 20:36 UTC
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, m.mizuma, mhocko,
	catalin.marinas, takahiro.akashi, gi-oh.kim, heiko.carstens,
	baiyaowei, richard.weiyang, paul.burton, miles.chen, vbabka,
	mgorman, hannes, linux-kernel, linux-mm

On Tue,  6 Mar 2018 14:20:22 -0500 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> Robot reported this issue:
> https://lkml.org/lkml/2018/2/27/851
> 
> It was introduced by:
> mm: initialize pages on demand during boot
> 
> The problem is caused by changing a static branch value while holding a
> spinlock. A spinlock disables preemption, and changing a static branch
> value takes a mutex in its path, and thus may sleep.
> 
> The fix is to add another boolean variable, so that the static branch no
> longer needs to be changed under the spinlock.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1579,6 +1579,7 @@ static int __init deferred_init_memmap(void *data)
>   * page_alloc_init_late() soon after smp_init() is complete.
>   */
>  static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
> +static bool deferred_zone_grow __initdata = true;
>  static DEFINE_STATIC_KEY_TRUE(deferred_pages);
>  
>  /*
> @@ -1616,7 +1617,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
>  	 * Bail if we raced with another thread that disabled on demand
>  	 * initialization.
>  	 */
> -	if (!static_branch_unlikely(&deferred_pages)) {
> +	if (!static_branch_unlikely(&deferred_pages) || !deferred_zone_grow) {
>  		spin_unlock_irqrestore(&deferred_zone_grow_lock, flags);
>  		return false;
>  	}
> @@ -1683,10 +1684,15 @@ void __init page_alloc_init_late(void)
>  	/*
>  	 * We are about to initialize the rest of deferred pages, permanently
>  	 * disable on-demand struct page initialization.
> +	 *
> +	 * Note: it is prohibited to modify static branches in non-preemptible
> +	 * context. Since, spin_lock() disables preemption, we must use an
> +	 * extra boolean deferred_zone_grow.
>  	 */
>  	spin_lock(&deferred_zone_grow_lock);
> -	static_branch_disable(&deferred_pages);
> +	deferred_zone_grow = false;
>  	spin_unlock(&deferred_zone_grow_lock);
> +	static_branch_disable(&deferred_pages);
>  
>  	/* There will be num_node_state(N_MEMORY) threads */
>  	atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));

Kinda ugly, but I can see the logic behind the decisions.

Can we instead turn deferred_zone_grow_lock into a mutex?

* Re: [PATCH] mm: might_sleep warning
       [not found]   ` <CAGM2rea1raxsXDkqZgmmdBiuywp1M3y1p++=J893VJDgGDWLnQ@mail.gmail.com>
@ 2018-03-06 20:56     ` Andrew Morton
  2018-03-06 21:04       ` Pavel Tatashin
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-03-06 20:56 UTC
  To: Pavel Tatashin
  Cc: Steven Sistare, Daniel Jordan, Masayoshi Mizuma, Michal Hocko,
	Catalin Marinas, AKASHI Takahiro, Gioh Kim, Heiko Carstens,
	Yaowei Bai, Wei Yang, Paul Burton, Miles Chen, Vlastimil Babka,
	Mel Gorman, Johannes Weiner, LKML, linux-mm

On Tue, 6 Mar 2018 15:48:26 -0500 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> On Tue, Mar 6, 2018 at 3:36 PM, Andrew Morton <akpm@linux-foundation.org>
> wrote:
> 
> > On Tue,  6 Mar 2018 14:20:22 -0500 Pavel Tatashin <
> > pasha.tatashin@oracle.com> wrote:
> >
> > >       spin_lock(&deferred_zone_grow_lock);
> > > -     static_branch_disable(&deferred_pages);
> > > +     deferred_zone_grow = false;
> > >       spin_unlock(&deferred_zone_grow_lock);
> > > +     static_branch_disable(&deferred_pages);
> > >
> > >       /* There will be num_node_state(N_MEMORY) threads */
> > >       atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> >
> > Kinda ugly, but I can see the logic behind the decisions.
> >
> > Can we instead turn deferred_zone_grow_lock into a mutex?

(top-posting repaired.  Please don't top-post).

> [CCed everyone]
> 
> Hi Andrew,
> 
> I'm afraid we cannot change this spinlock to a mutex,
> because deferred_grow_zone() might be called from interrupt context if
> an interrupt thread needs to allocate memory.
> 

OK.  But if deferred_grow_zone() can be called from interrupt then
page_alloc_init_late() should be using spin_lock_irq(), shouldn't it? 
I'm surprised that lockdep didn't detect that.


--- a/mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix-4-fix
+++ a/mm/page_alloc.c
@@ -1689,9 +1689,9 @@ void __init page_alloc_init_late(void)
 	 * context. Since, spin_lock() disables preemption, we must use an
 	 * extra boolean deferred_zone_grow.
 	 */
-	spin_lock(&deferred_zone_grow_lock);
+	spin_lock_irq(&deferred_zone_grow_lock);
 	deferred_zone_grow = false;
-	spin_unlock(&deferred_zone_grow_lock);
+	spin_unlock_irq(&deferred_zone_grow_lock);
 	static_branch_disable(&deferred_pages);
 
 	/* There will be num_node_state(N_MEMORY) threads */
_

* Re: [PATCH] mm: might_sleep warning
  2018-03-06 20:56     ` Andrew Morton
@ 2018-03-06 21:04       ` Pavel Tatashin
  2018-03-06 21:21         ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Tatashin @ 2018-03-06 21:04 UTC
  To: Andrew Morton
  Cc: Steven Sistare, Daniel Jordan, Masayoshi Mizuma, Michal Hocko,
	Catalin Marinas, AKASHI Takahiro, Gioh Kim, Heiko Carstens,
	Yaowei Bai, Wei Yang, Paul Burton, Miles Chen, Vlastimil Babka,
	Mel Gorman, Johannes Weiner, LKML, linux-mm

> > > >       spin_lock(&deferred_zone_grow_lock);
> > > > -     static_branch_disable(&deferred_pages);
> > > > +     deferred_zone_grow = false;
> > > >       spin_unlock(&deferred_zone_grow_lock);
> > > > +     static_branch_disable(&deferred_pages);
> > > >
> > > >       /* There will be num_node_state(N_MEMORY) threads */
> > > >       atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> > >
> > > Kinda ugly, but I can see the logic behind the decisions.
> > >
> > > Can we instead turn deferred_zone_grow_lock into a mutex?
>
> (top-posting repaired.  Please don't top-post).
>
> > [CCed everyone]
> >
> > Hi Andrew,
> >
> > I'm afraid we cannot change this spinlock to a mutex,
> > because deferred_grow_zone() might be called from interrupt context if
> > an interrupt thread needs to allocate memory.
> >
>
> OK.  But if deferred_grow_zone() can be called from interrupt then
> page_alloc_init_late() should be using spin_lock_irq(), shouldn't it?
> I'm surprised that lockdep didn't detect that.

No, page_alloc_init_late() cannot be called from interrupt context; it is
called directly from kernel_init_freeable(). But I believe
deferred_grow_zone() can be, through this call chain:

get_page_from_freelist()
 _deferred_grow_zone()
   deferred_grow_zone()


>
>
>
> --- a/mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix-4-fix
> +++ a/mm/page_alloc.c
> @@ -1689,9 +1689,9 @@ void __init page_alloc_init_late(void)
>          * context. Since, spin_lock() disables preemption, we must use an
>          * extra boolean deferred_zone_grow.
>          */
> -       spin_lock(&deferred_zone_grow_lock);
> +       spin_lock_irq(&deferred_zone_grow_lock);
>         deferred_zone_grow = false;
> -       spin_unlock(&deferred_zone_grow_lock);
> +       spin_unlock_irq(&deferred_zone_grow_lock);
>         static_branch_disable(&deferred_pages);
>
>         /* There will be num_node_state(N_MEMORY) threads */
> _

* Re: [PATCH] mm: might_sleep warning
  2018-03-06 21:04       ` Pavel Tatashin
@ 2018-03-06 21:21         ` Andrew Morton
  2018-03-06 21:48           ` Pavel Tatashin
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-03-06 21:21 UTC
  To: Pavel Tatashin
  Cc: Steven Sistare, Daniel Jordan, Masayoshi Mizuma, Michal Hocko,
	Catalin Marinas, AKASHI Takahiro, Gioh Kim, Heiko Carstens,
	Yaowei Bai, Wei Yang, Paul Burton, Miles Chen, Vlastimil Babka,
	Mel Gorman, Johannes Weiner, LKML, linux-mm

On Tue, 6 Mar 2018 16:04:06 -0500 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> > > > >       spin_lock(&deferred_zone_grow_lock);
> > > > > -     static_branch_disable(&deferred_pages);
> > > > > +     deferred_zone_grow = false;
> > > > >       spin_unlock(&deferred_zone_grow_lock);
> > > > > +     static_branch_disable(&deferred_pages);
> > > > >
> > > > >       /* There will be num_node_state(N_MEMORY) threads */
> > > > >       atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> > > >
> > > > Kinda ugly, but I can see the logic behind the decisions.
> > > >
> > > > Can we instead turn deferred_zone_grow_lock into a mutex?
> >
> > (top-posting repaired.  Please don't top-post).
> >
> > > [CCed everyone]
> > >
> > > Hi Andrew,
> > >
> > > I'm afraid we cannot change this spinlock to a mutex,
> > > because deferred_grow_zone() might be called from interrupt context if
> > > an interrupt thread needs to allocate memory.
> > >
> >
> > OK.  But if deferred_grow_zone() can be called from interrupt then
> > page_alloc_init_late() should be using spin_lock_irq(), shouldn't it?
> > I'm surprised that lockdep didn't detect that.
> 
> No, page_alloc_init_late() cannot be called from interrupt context; it is
> called directly from kernel_init_freeable(). But I believe
> deferred_grow_zone() can be, through this call chain:
> 
> get_page_from_freelist()
>  _deferred_grow_zone()
>    deferred_grow_zone()

That's why page_alloc_init_late() needs spin_lock_irq().  If a CPU is
holding deferred_zone_grow_lock with enabled interrupts and an
interrupt comes in on that CPU and the CPU runs deferred_grow_zone() in
its interrupt handler, we deadlock.

lockdep knows about this bug and should have reported it.
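
The deadlock described above can be sketched with a single-threaded
userspace model. This is an illustration under stated assumptions, not
kernel code: one "CPU", an "interrupt" delivered as a plain function
call, and a handler that reports the would-be deadlock instead of
spinning forever. All names (irq_handler(), raise_irq(),
critical_section_plain(), critical_section_irq(), deadlocks_detected,
handler_runs) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names are kernel APIs. */
static atomic_flag lock = ATOMIC_FLAG_INIT; /* models deferred_zone_grow_lock */
static bool irqs_enabled = true;
static bool irq_pending;
static int deadlocks_detected; /* handler found the lock held by the code it interrupted */
static int handler_runs;       /* handler took the lock and finished */

/* Models an interrupt handler that ends up in deferred_grow_zone(). */
static void irq_handler(void)
{
	if (atomic_flag_test_and_set(&lock)) {
		/*
		 * A real spinlock would spin here forever: the holder is the
		 * context this handler interrupted, so it can never unlock.
		 */
		deadlocks_detected++;
		return;
	}
	handler_runs++;
	atomic_flag_clear(&lock);
}

/* Deliver the interrupt now if enabled, otherwise leave it pending. */
static void raise_irq(void)
{
	if (irqs_enabled)
		irq_handler();
	else
		irq_pending = true;
}

/* Re-enable interrupts and run anything that arrived meanwhile. */
static void model_irq_enable(void)
{
	irqs_enabled = true;
	if (irq_pending) {
		irq_pending = false;
		irq_handler();
	}
}

/* Models spin_lock()/spin_unlock(): interrupts stay enabled. */
static void critical_section_plain(void)
{
	bool was_locked = atomic_flag_test_and_set(&lock);

	assert(!was_locked);
	raise_irq(); /* interrupt arrives while the lock is held */
	atomic_flag_clear(&lock);
}

/* Models spin_lock_irq()/spin_unlock_irq(). */
static void critical_section_irq(void)
{
	bool was_locked;

	irqs_enabled = false;
	was_locked = atomic_flag_test_and_set(&lock);
	assert(!was_locked);
	raise_irq(); /* deferred until interrupts are re-enabled */
	atomic_flag_clear(&lock);
	model_irq_enable(); /* handler runs now, with the lock free */
}
```

In this model, critical_section_plain() records one would-be deadlock,
while critical_section_irq() lets the handler run to completion only
after the lock has been dropped.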

* Re: [PATCH] mm: might_sleep warning
  2018-03-06 21:21         ` Andrew Morton
@ 2018-03-06 21:48           ` Pavel Tatashin
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Tatashin @ 2018-03-06 21:48 UTC
  To: Andrew Morton
  Cc: Steven Sistare, Daniel Jordan, Masayoshi Mizuma, Michal Hocko,
	Catalin Marinas, AKASHI Takahiro, Gioh Kim, Heiko Carstens,
	Yaowei Bai, Wei Yang, Paul Burton, Miles Chen, Vlastimil Babka,
	Mel Gorman, Johannes Weiner, LKML, linux-mm

> That's why page_alloc_init_late() needs spin_lock_irq().  If a CPU is
> holding deferred_zone_grow_lock with enabled interrupts and an
> interrupt comes in on that CPU and the CPU runs deferred_grow_zone() in
> its interrupt handler, we deadlock.
> 
> lockdep knows about this bug and should have reported it.
> 

I see what you are saying. Yes, you are correct: we need spin_lock_irq()
in page_alloc_init_late(). I will update the patch. I am not sure why
lockdep has not reported it. Maybe it is initialized after this code is
executed?

Thank you,
Pavel
