From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756962AbcK3NTP (ORCPT <rfc822;w@1wt.eu>);
        Wed, 30 Nov 2016 08:19:15 -0500
Received: from mail-wj0-f194.google.com ([209.85.210.194]:35197 "EHLO
        mail-wj0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755449AbcK3NTN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 30 Nov 2016 08:19:13 -0500
Date: Wed, 30 Nov 2016 14:19:10 +0100
From: Michal Hocko <mhocko@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Donald Buczek <buczek@molgen.mpg.de>,
        Paul Menzel <pmenzel@molgen.mpg.de>, dvteam@molgen.mpg.de,
        linux-mm@kvack.org, linux-kernel@vger.kernel.org,
        Josh Triplett <josh@joshtriplett.org>
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and
 `mem_cgroup_shrink_node`
Message-ID: <20161130131910.GF18432@dhcp22.suse.cz>
References: <20161121141818.GD18112@dhcp22.suse.cz>
 <20161121142901.GV3612@linux.vnet.ibm.com>
 <68025f6c-6801-ab46-b0fc-a9407353d8ce@molgen.mpg.de>
 <20161124101525.GB20668@dhcp22.suse.cz>
 <583AA50A.9010608@molgen.mpg.de>
 <20161128110449.GK14788@dhcp22.suse.cz>
 <109d5128-f3a4-4b6e-db17-7a1fcb953500@molgen.mpg.de>
 <29196f89-c35e-f79d-8e4d-2bf73fe930df@molgen.mpg.de>
 <20161130110944.GD18432@dhcp22.suse.cz>
 <20161130115320.GO3924@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161130115320.GO3924@linux.vnet.ibm.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 30-11-16 03:53:20, Paul E. McKenney wrote:
> On Wed, Nov 30, 2016 at 12:09:44PM +0100, Michal Hocko wrote:
> > [CCing Paul]
> > 
> > On Wed 30-11-16 11:28:34, Donald Buczek wrote:
> > [...]
> > > > shrink_active_list gets and releases the spinlock and calls cond_resched().
> > > > This should give other tasks a chance to run. Just as an experiment,
> > > > I'm trying
> > > 
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1921,7 +1921,7 @@ static void shrink_active_list(unsigned long
> > > nr_to_scan,
> > >         spin_unlock_irq(&pgdat->lru_lock);
> > > 
> > >         while (!list_empty(&l_hold)) {
> > > -               cond_resched();
> > > +               cond_resched_rcu_qs();
> > >                 page = lru_to_page(&l_hold);
> > >                 list_del(&page->lru);
> > > 
> > > and didn't hit a rcu_sched warning for >21 hours uptime now. We'll see.
> > 
> > This is really interesting! Is it possible that the RCU stall detector
> > is somehow confused?
> 
> No, it is not confused.  Again, cond_resched() is not a quiescent
> state unless it does a context switch.  Therefore, if the task running
> in that loop was the only runnable task on its CPU, cond_resched()
> would -never- provide RCU with a quiescent state.

Sorry for being dense here, but why can't we hide the QS handling inside
cond_resched()? Doesn't every current user of cond_resched() suffer
from the same problem wrt. RCU stalls?

> In contrast, cond_resched_rcu_qs() unconditionally provides RCU
> with a quiescent state (hence the _rcu_qs in its name), regardless
> of whether or not a context switch happens.
> 
> It is therefore expected behavior that this change might prevent
> RCU CPU stall warnings.
> 
> 							Thanx, Paul

-- 
Michal Hocko
SUSE Labs