From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	dvteam@molgen.mpg.de
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`
Date: Mon, 21 Nov 2016 06:01:22 -0800
Message-ID: <20161121140122.GU3612@linux.vnet.ibm.com>
In-Reply-To: <20161121134130.GB18112@dhcp22.suse.cz>

On Mon, Nov 21, 2016 at 02:41:31PM +0100, Michal Hocko wrote:
> On Wed 16-11-16 09:30:36, Paul E. McKenney wrote:
> > On Wed, Nov 16, 2016 at 06:01:19PM +0100, Paul Menzel wrote:
> > > Dear Linux folks,
> > > 
> > > 
> > > On 11/08/16 19:39, Paul E. McKenney wrote:
> > > >On Tue, Nov 08, 2016 at 06:38:18PM +0100, Paul Menzel wrote:
> > > >>On 11/08/16 18:03, Paul E. McKenney wrote:
> > > >>>On Tue, Nov 08, 2016 at 01:22:28PM +0100, Paul Menzel wrote:
> > > >>
> > > >>>>Could you please help me shed some light on the messages below?
> > > >>>>
> > > >>>>With Linux 4.4.X, these messages were not seen. After updating to
> > > >>>>Linux 4.8.4 and Linux 4.8.6, they started to appear. In those
> > > >>>>versions, we enabled several CGROUP options.
> > > >>>>
> > > >>>>>$ dmesg -T
> > > >>>>>[…]
> > > >>>>>[Mon Nov  7 15:09:45 2016] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > >>>>>[Mon Nov  7 15:09:45 2016]     3-...: (493 ticks this GP) idle=515/140000000000000/0 softirq=5504423/5504423 fqs=13876
> > > >>>>>[Mon Nov  7 15:09:45 2016]     (detected by 5, t=60002 jiffies, g=1363193, c=1363192, q=268508)
> > > >>>>>[Mon Nov  7 15:09:45 2016] Task dump for CPU 3:
> > > >>>>>[Mon Nov  7 15:09:45 2016] kswapd1         R  running task        0    87      2 0x00000008
> > > >>>>>[Mon Nov  7 15:09:45 2016]  ffffffff81aabdfd ffff8810042a5cb8 ffff88080ad34000 ffff88080ad33dc8
> > > >>>>>[Mon Nov  7 15:09:45 2016]  ffff88080ad33d00 0000000000003501 0000000000000000 0000000000000000
> > > >>>>>[Mon Nov  7 15:09:45 2016]  0000000000000000 0000000000000000 0000000000022316 000000000002bc9f
> > > >>>>>[Mon Nov  7 15:09:45 2016] Call Trace:
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff81aabdfd>] ? __schedule+0x21d/0x5b0
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff81106dcf>] ? shrink_node+0xbf/0x1c0
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff81107865>] ? kswapd+0x315/0x5f0
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff81107550>] ? mem_cgroup_shrink_node+0x90/0x90
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff8106c614>] ? kthread+0xc4/0xe0
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff81aaf64f>] ? ret_from_fork+0x1f/0x40
> > > >>>>>[Mon Nov  7 15:09:45 2016]  [<ffffffff8106c550>] ? kthread_worker_fn+0x160/0x160
> > > >>>>
> > > >>>>Even after reading `stallwarn.txt` [1], I don’t know what could
> > > >>>>cause this. All items in the backtrace seem to belong to the Linux
> > > >>>>kernel.
> > > >>>>
> > > >>>>There is also nothing suspicious in the monitoring graphs during that time.
> > > >>>
> > > >>>If you let it be, do you get another stall warning a few minutes later?
> > > >>>If so, how does the stack trace compare?
> > > >>
> > > >>With Linux 4.8.6 this is the only occurrence since yesterday.
> > > >>
> > > >>With Linux 4.8.3, and 4.8.4 the following stack traces were seen.
> > > >
> > > >Looks to me like one or both of the loops in shrink_node() need
> > > >a cond_resched_rcu_qs().
> > > 
> > > Thank you for the pointer. I haven’t had time yet to look into it.
> > 
> > In theory, it is quite straightforward, as shown by the patch below.
> > In practice, the MM guys might wish to call cond_resched_rcu_qs() less
> > frequently, but I will leave that to their judgment.  My guess is that
> > the overhead of the cond_resched_rcu_qs() is way down in the noise,
> > but I have been surprised in the past.
> > 
> > Anyway, please give this patch a try and let me know how it goes.
> 
> I am not seeing the full thread in my inbox, but I am wondering what is
> actually going on here. The reclaim path (shrink_node_memcg and
> shrink_slab, respectively) should have preemption points, and not much
> is done beyond that except iterating over all memcgs. Are there
> gazillions of memcgs configured (most of them with the low limit
> configured)? In other words, is the system configured properly?
> 
> As for the patch: I cannot say I like it. cond_resched_rcu_qs sounds
> way too low-level for this usage. If anything, cond_resched somewhere inside
> mem_cgroup_iter would be more appropriate to me.

Like this?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae052b5e3315..81cb30d5b2fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -867,6 +867,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 out:
 	if (prev && prev != root)
 		css_put(&prev->css);
+	cond_resched_rcu_qs();
 
 	return memcg;
 }
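
For anyone decoding the "(detected by ...)" line in the stall report
quoted earlier in this message, here is a minimal userspace sketch that
pulls out its fields. The struct and function names are hypothetical
(they are not kernel APIs), the field meanings follow my reading of
stallwarn.txt, and the tick rate passed to jiffies_to_seconds() is an
assumption, since the thread does not state the reporter's CONFIG_HZ.

```c
#include <stdio.h>

/*
 * Fields of the "(detected by ...)" summary line of an RCU CPU stall
 * warning.  Names are illustrative only; nothing here is a kernel API.
 */
struct stall_summary {
	int detecting_cpu;       /* CPU that noticed the stall */
	unsigned long jiffies;   /* t=: jiffies since the grace period began */
	unsigned long gp;        /* g=: current grace-period number */
	unsigned long completed; /* c=: most recently completed grace period */
	unsigned long queued;    /* q=: RCU callbacks queued across all CPUs */
};

/* Parse one summary line; returns 0 on success, -1 if it does not match. */
int parse_stall_summary(const char *line, struct stall_summary *out)
{
	int n = sscanf(line,
		       " (detected by %d, t=%lu jiffies, g=%lu, c=%lu, q=%lu)",
		       &out->detecting_cpu, &out->jiffies, &out->gp,
		       &out->completed, &out->queued);

	return n == 5 ? 0 : -1;
}

/* Convert jiffies to whole seconds for an assumed tick rate (CONFIG_HZ). */
unsigned long jiffies_to_seconds(unsigned long jiffies, unsigned long hz)
{
	return jiffies / hz;
}
```

Fed the line from the report, this yields detecting CPU 5, t=60002,
g=1363193, c=1363192 and q=268508.  With g one ahead of c, a grace
period was in progress, and if the kernel was built with HZ=1000 (an
assumption), 60002 jiffies works out to about 60 seconds.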
