From: Michal Hocko <mhocko@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Dmitry Vyukov <dvyukov@google.com>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	syzkaller <syzkaller@googlegroups.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
Date: Tue, 7 Feb 2017 13:37:08 +0100	[thread overview]
Message-ID: <20170207123708.GO5065@dhcp22.suse.cz> (raw)
In-Reply-To: <20170207114327.GI5065@dhcp22.suse.cz>

On Tue 07-02-17 12:43:27, Michal Hocko wrote:
> On Tue 07-02-17 11:34:35, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote:
> > > On Tue 07-02-17 10:28:09, Mel Gorman wrote:
> > > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote:
> > > > > On 02/07/2017 10:43 AM, Mel Gorman wrote:
> > > > > > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED
> > > > > > and unbound. A workqueue queued for draining gets migrated during hot-remove
> > > > > > and a drain operation will execute twice on a CPU -- one for what was
> > > > > > queued and a second time for the CPU it was migrated from. It should still
> > > > > > work with flush_work which doesn't appear to block forever if an item
> > > > > > got migrated to another workqueue. The actual drain workqueue function is
> > > > > > using the CPU ID it's currently running on so it shouldn't get confused.
> > > > > 
> > > > > Is the worker that will process this migrated workqueue also guaranteed
> > > > > to be pinned to a cpu for the whole work, though? drain_local_pages()
> > > > > needs that guarantee.
> > > > > 
> > > > 
> > > > It should be, by running in a workqueue handler bound to that CPU (queued
> > > > on wq->cpu_pwqs in __queue_work)
> > > 
> > > Are you sure? The comment in kernel/workqueue.c says
> > >          * While DISASSOCIATED, the cpu may be offline and all workers have
> > >          * %WORKER_UNBOUND set and concurrency management disabled, and may
> > >          * be executing on any CPU.  The pool behaves as an unbound one.
> > > 
> > > I might be misreading but an unbound pool can be handled by workers which
> > > are not pinned to any cpu AFAIU.
> > 
> > Right. The unbind operation can set a mask that is any allowable CPU and
> > the final process_work is not done in a context that prevents
> > preemption.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3b93879990fd..7af165d308c4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone)
> >  
> >  static void drain_local_pages_wq(struct work_struct *work)
> >  {
> > +	/*
> > +	 * Ordinarily a drain operation is bound to a CPU but may be unbound
> > +	 * after a CPU hotplug operation so it's necessary to disable
> > +	 * preemption for the drain to stabilise the CPU ID.
> > +	 */
> > +	preempt_disable();
> >  	drain_local_pages(NULL);
> > +	preempt_enable_no_resched();
> >  }
> >  
> >  /*
> [...]
> > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu)
> >  {
> >  
> >  	lru_add_drain_cpu(cpu);
> > +
> > +	/*
> > +	 * A per-cpu drain via a workqueue from drain_all_pages can be
> > +	 * rescheduled onto an unrelated CPU. That allows the hotplug
> > +	 * operation and the drain to potentially race on the same
> > +	 * CPU. Serialise hotplug versus drain using pcpu_drain_mutex
> > +	 */
> > +	mutex_lock(&pcpu_drain_mutex);
> >  	drain_pages(cpu);
> > +	mutex_unlock(&pcpu_drain_mutex);
> 
> You cannot put a sleepable lock inside the preempt disabled section...
> We can make it a spinlock, right?

Scratch that! For some reason I thought that cpu notifiers are run in an
atomic context. Now that I am checking the code again, it turns out I was
wrong. __cpu_notify uses __raw_notifier_call_chain so this is not an
atomic context. Anyway, shouldn't it be sufficient to disable preemption
in drain_local_pages_wq? The CPU hotplug callback will not preempt us,
and so we cannot be working on the same cpu at the same time, right?
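
To make the two sides concrete, here is a minimal sketch of what I have
in mind (simplified from the patch quoted above, not a tested
implementation):

/* Workqueue side: the worker may have become unbound after a hotplug
 * operation, and drain_local_pages() acts on smp_processor_id(), so the
 * CPU id is only stable while preemption is disabled. */
static void drain_local_pages_wq(struct work_struct *work)
{
	preempt_disable();
	drain_local_pages(NULL);
	preempt_enable();
}

/* Hotplug side: the callback runs in process context (the notifier chain
 * is not atomic) and drains the pcp lists of the CPU that went down. */
static int page_alloc_cpu_dead(unsigned int cpu)
{
	lru_add_drain_cpu(cpu);
	drain_pages(cpu);
	return 0;
}

The sketch only shows where each side picks up its CPU id; whether
disabling preemption in the worker is enough to close the race is exactly
the question above.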

-- 
Michal Hocko
SUSE Labs

