linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Donald Buczek <buczek@molgen.mpg.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	dvteam@molgen.mpg.de, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	peterz@infradead.org
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`
Date: Wed, 30 Nov 2016 06:29:55 -0800	[thread overview]
Message-ID: <20161130142955.GS3924@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161130131910.GF18432@dhcp22.suse.cz>

On Wed, Nov 30, 2016 at 02:19:10PM +0100, Michal Hocko wrote:
> On Wed 30-11-16 03:53:20, Paul E. McKenney wrote:
> > On Wed, Nov 30, 2016 at 12:09:44PM +0100, Michal Hocko wrote:
> > > [CCing Paul]
> > > 
> > > On Wed 30-11-16 11:28:34, Donald Buczek wrote:
> > > [...]
> > > > shrink_active_list gets and releases the spinlock and calls cond_resched().
> > > > This should give other tasks a chance to run. Just as an experiment, I'm
> > > > trying
> > > > 
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -1921,7 +1921,7 @@ static void shrink_active_list(unsigned long
> > > > nr_to_scan,
> > > >         spin_unlock_irq(&pgdat->lru_lock);
> > > > 
> > > >         while (!list_empty(&l_hold)) {
> > > > -               cond_resched();
> > > > +               cond_resched_rcu_qs();
> > > >                 page = lru_to_page(&l_hold);
> > > >                 list_del(&page->lru);
> > > > 
> > > > and didn't hit a rcu_sched warning for >21 hours uptime now. We'll see.
> > > 
> > > This is really interesting! Is it possible that the RCU stall detector
> > > is somehow confused?
> > 
> > No, it is not confused.  Again, cond_resched() is not a quiescent
> > state unless it does a context switch.  Therefore, if the task running
> > in that loop was the only runnable task on its CPU, cond_resched()
> > would -never- provide RCU with a quiescent state.
> 
> Sorry for being dense here. But why cannot we hide the QS handling into
> cond_resched()? I mean doesn't every current usage of cond_resched
> suffer from the same problem wrt RCU stalls?

We can, and you are correct that cond_resched() does not unconditionally
supply RCU quiescent states, and never has.  Last time I tried to add
cond_resched_rcu_qs() semantics to cond_resched(), I got told "no",
but perhaps it is time to try again.

One of the challenges is that there are two different timeframes.
If we want CONFIG_PREEMPT=n kernels to have millisecond-level scheduling
latencies, we need a cond_resched() more than once per millisecond, and
the usual uncertainties will mean more like once per hundred microseconds
or so.  In contrast, the occasional 100-millisecond RCU grace period when
under heavy load is normally not considered to be a problem, which means
that a cond_resched_rcu_qs() every 10 milliseconds or so is just fine.

Which means that cond_resched() is much more sensitive to overhead
than is cond_resched_rcu_qs().

No reason not to give it another try, though!  (Adding Peter Zijlstra
to CC for his reactions.)

Right now, the added overhead is a function call, two tests of per-CPU
variables, one increment of a per-CPU variable, and a barrier() before
and after.  I could probably combine the tests, but I do need at least
one test.  I cannot see how I can eliminate either barrier().  I might
be able to pull the increment under the test.

The patch below is instead very straightforward, avoiding any
optimizations.  Untested, probably does not even build.

Failing this approach, the rule is as follows:

1.	Add cond_resched() to in-kernel loops that cause excessive
	scheduling latencies.

2.	Add cond_resched_rcu_qs() to in-kernel loops that cause
	RCU CPU stall warnings.

							Thanx, Paul

> > In contrast, cond_resched_rcu_qs() unconditionally provides RCU
> > with a quiescent state (hence the _rcu_qs in its name), regardless
> > of whether or not a context switch happens.
> > 
> > It is therefore expected behavior that this change might prevent
> > RCU CPU stall warnings.

------------------------------------------------------------------------

commit d7100358d066cd7d64301a2da161390e9f4aa63f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Nov 30 06:24:30 2016 -0800

    sched,rcu: Make cond_resched() provide RCU quiescent state
    
    There is some confusion as to which of cond_resched() or
    cond_resched_rcu_qs() should be added to long in-kernel loops.
    This commit therefore eliminates the decision by adding RCU
    quiescent states to cond_resched().
    
    Warning: This is a prototype.  For example, it does not correctly
    handle Tasks RCU.  Which is OK for the moment, given that no one
    actually uses Tasks RCU yet.
    
    Reported-by: Michal Hocko <mhocko@kernel.org>
    Not-yet-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b0ec92..ccdb6064884e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3308,10 +3308,11 @@ static inline int signal_pending_state(long state, struct task_struct *p)
  * cond_resched_lock() will drop the spinlock before scheduling,
  * cond_resched_softirq() will enable bhs before scheduling.
  */
+void rcu_all_qs(void);
 #ifndef CONFIG_PREEMPT
 extern int _cond_resched(void);
 #else
-static inline int _cond_resched(void) { return 0; }
+static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
 #endif
 
 #define cond_resched() ({			\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 94732d1ab00a..40b690813b80 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4906,6 +4906,7 @@ int __sched _cond_resched(void)
 		preempt_schedule_common();
 		return 1;
 	}
+	rcu_all_qs();
 	return 0;
 }
 EXPORT_SYMBOL(_cond_resched);

  reply	other threads:[~2016-11-30 14:30 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <24c226a5-1a4a-173e-8b4e-5107a2baac04@molgen.mpg.de>
2016-11-08 12:22 ` INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node` Paul Menzel
2016-11-08 17:03   ` Paul E. McKenney
2016-11-08 17:38     ` Paul Menzel
2016-11-08 18:39       ` Paul E. McKenney
2016-11-16 17:01         ` Paul Menzel
2016-11-16 17:30           ` Paul E. McKenney
2016-11-21 13:41             ` Michal Hocko
2016-11-21 14:01               ` Paul E. McKenney
2016-11-21 14:18                 ` Michal Hocko
2016-11-21 14:29                   ` Paul E. McKenney
2016-11-21 15:35                     ` Donald Buczek
2016-11-24 10:15                       ` Michal Hocko
2016-11-24 18:50                         ` Donald Buczek
2016-11-27  9:37                           ` Paul Menzel
2016-11-27  5:32                         ` Christopher S. Aker
2016-11-27  9:19                         ` Donald Buczek
2016-11-28 11:04                           ` Michal Hocko
2016-11-28 12:26                             ` Paul Menzel
2016-11-30 10:28                               ` Donald Buczek
2016-11-30 11:09                                 ` Michal Hocko
2016-11-30 11:43                                   ` Donald Buczek
2016-12-02  9:14                                     ` Donald Buczek
2016-12-06  8:32                                       ` Donald Buczek
2016-11-30 11:53                                   ` Paul E. McKenney
2016-11-30 11:54                                     ` Paul E. McKenney
2016-11-30 12:31                                       ` Paul Menzel
2016-11-30 14:31                                         ` Paul E. McKenney
2016-11-30 13:19                                     ` Michal Hocko
2016-11-30 14:29                                       ` Paul E. McKenney [this message]
2016-11-30 16:38                                         ` Peter Zijlstra
2016-11-30 17:02                                           ` Paul E. McKenney
2016-11-30 17:05                                           ` Michal Hocko
2016-11-30 17:23                                             ` Paul E. McKenney
2016-11-30 17:34                                               ` Michal Hocko
2016-11-30 17:50                                             ` Peter Zijlstra
2016-11-30 19:40                                               ` Paul E. McKenney
2016-12-01  5:30                                                 ` Peter Zijlstra
2016-12-01 12:40                                                   ` Paul E. McKenney
2016-12-01 16:36                                                     ` Peter Zijlstra
2016-12-01 16:59                                                       ` Paul E. McKenney
2016-12-01 18:09                                                         ` Peter Zijlstra
2016-12-01 18:42                                                           ` Paul E. McKenney
2016-12-01 18:49                                                             ` Peter Zijlstra
     [not found] <20161125212000.GI31360@linux.vnet.ibm.com>
     [not found] ` <20161128095825.GI14788@dhcp22.suse.cz>
     [not found]   ` <20161128105425.GY31360@linux.vnet.ibm.com>
     [not found]     ` <3a4242cb-0198-0a3b-97ae-536fb5ff83ec@kernelpanic.ru>
     [not found]       ` <20161128143435.GC3924@linux.vnet.ibm.com>
     [not found]         ` <eba1571e-f7a8-09b3-5516-c2bc35b38a83@kernelpanic.ru>
     [not found]           ` <20161128150509.GG3924@linux.vnet.ibm.com>
     [not found]             ` <66fd50e1-a922-846a-f427-7654795bd4b5@kernelpanic.ru>
     [not found]               ` <20161130174802.GM18432@dhcp22.suse.cz>
     [not found]                 ` <fd34243c-2ebf-c14b-55e6-684a9dc614e7@kernelpanic.ru>
     [not found]                   ` <20161130182552.GN18432@dhcp22.suse.cz>
2016-12-01 18:10                     ` Boris Zhmurov
2016-12-01 19:39                       ` Paul E. McKenney
2016-12-02  9:37                       ` Michal Hocko
2016-12-02 13:52                         ` Paul E. McKenney
2016-12-02 16:39                       ` Boris Zhmurov
2016-12-02 16:44                         ` Paul E. McKenney
2016-12-02 17:02                           ` Michal Hocko
2016-12-02 17:15                             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161130142955.GS3924@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=buczek@molgen.mpg.de \
    --cc=dvteam@molgen.mpg.de \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmenzel@molgen.mpg.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).