Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`

From: Peter Zijlstra <peterz@infradead.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Donald Buczek <buczek@molgen.mpg.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	dvteam@molgen.mpg.de, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`
Date: Wed, 30 Nov 2016 18:50:16 +0100
Message-ID: <20161130175015.GR3092@twins.programming.kicks-ass.net>
In-Reply-To: <20161130170557.GK18432@dhcp22.suse.cz>

On Wed, Nov 30, 2016 at 06:05:57PM +0100, Michal Hocko wrote:
> On Wed 30-11-16 17:38:20, Peter Zijlstra wrote:
> > On Wed, Nov 30, 2016 at 06:29:55AM -0800, Paul E. McKenney wrote:
> > > We can, and you are correct that cond_resched() does not unconditionally
> > > supply RCU quiescent states, and never has.  Last time I tried to add
> > > cond_resched_rcu_qs() semantics to cond_resched(), I got told "no",
> > > but perhaps it is time to try again.
> > 
> > Well, you got told: "ARRGH my benchmark goes all regress", or something
> > along those lines. Didn't we recently dig out those commits for some
> > reason or other?
> > 
> > Finding out what benchmark that was and running it against this patch
> > would make sense.

See commit:

  4a81e8328d37 ("rcu: Reduce overhead of cond_resched() checks for RCU")

Someone actually wrote down what the problem was.

> > Also, I seem to have missed, why are we going through this again?
> 
> Well, the reason I've brought that up is that having basically two
> APIs for cond_resched is more than confusing. Basically all longer
> in-kernel loops do cond_resched() but it seems that this will not help
> to silence the RCU lockup detector in rare cases where nothing really
> wants to schedule. I am really not sure whether we want to sprinkle
> cond_resched_rcu_qs() at random places just to silence the RCU detector...

Right.. now, this is obviously all PREEMPT=n code, which therefore also
implies this is rcu-sched.

Paul, now doesn't rcu-sched, when the grace-period has been long in
coming, try and force it? And doesn't that forcing include prodding CPUs
with resched_cpu() ?

I'm thinking not, because if it did, that would make cond_resched()
actually schedule, which would then call into rcu_note_context_switch()
which would then make RCU progress, no?
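
The reasoning above can be illustrated with a toy user-space model. This is
a sketch, not kernel code: every toy_* name is hypothetical, the need_resched
flag merely stands in for the real per-task flag, and qs_count stands in for
whatever RCU records when it is told about a quiescent state. It assumes that
cond_resched() only reaches the context-switch path when a reschedule has
been requested, and that cond_resched_rcu_qs() reports a quiescent state even
when no reschedule happens, which is how the thread describes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names here are hypothetical, not kernel API. */
struct toy_cpu {
	bool need_resched;      /* stands in for a pending reschedule request */
	unsigned long qs_count; /* quiescent states this CPU has reported */
};

/* Models rcu_note_context_switch(): a context switch counts as a
 * quiescent state for rcu-sched. */
static void toy_note_context_switch(struct toy_cpu *cpu)
{
	assert(cpu);
	cpu->qs_count++;
}

/* Models resched_cpu(): ask the CPU to reschedule at its next opportunity. */
static void toy_resched_cpu(struct toy_cpu *cpu)
{
	assert(cpu);
	cpu->need_resched = true;
}

/* Models cond_resched(): schedules only if a reschedule was requested.
 * Returns true if it "scheduled". With nothing requested, RCU learns
 * nothing from this call. */
static bool toy_cond_resched(struct toy_cpu *cpu)
{
	assert(cpu);
	if (!cpu->need_resched)
		return false;
	cpu->need_resched = false;
	toy_note_context_switch(cpu);
	return true;
}

/* Models cond_resched_rcu_qs(): if cond_resched() did not schedule,
 * report a quiescent state anyway. */
static bool toy_cond_resched_rcu_qs(struct toy_cpu *cpu)
{
	bool scheduled = toy_cond_resched(cpu);

	if (!scheduled)
		cpu->qs_count++;
	return scheduled;
}
```

In this model, a loop that calls toy_cond_resched() any number of times on an
otherwise idle toy_cpu leaves qs_count at zero, which corresponds to the stall
reported in this thread. Calling toy_resched_cpu() first makes the next
toy_cond_resched() report a quiescent state, which is the behaviour the
question to Paul hinges on; toy_cond_resched_rcu_qs() reports one either way.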
