From: Michal Hocko <mhocko@kernel.org>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Boris Zhmurov <bb@kernelpanic.ru>,
	"Christopher S. Aker" <caker@theshore.net>,
	Donald Buczek <buczek@molgen.mpg.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH] mm, vmscan: add cond_resched into shrink_node_memcg
Date: Mon, 5 Dec 2016 13:49:55 +0100	[thread overview]
Message-ID: <20161205124955.GG30758@dhcp22.suse.cz> (raw)
In-Reply-To: <CAKTCnz=K8QG69tKB8yStiZypBzcvnE=wW+25xuo9f_HZNzPtDg@mail.gmail.com>

[CC Paul - sorry I've tried to save you from more emails...]

On Mon 05-12-16 23:44:27, Balbir Singh wrote:
> >
> > Hi,
> > there were multiple reports of similar RCU stalls. Only Boris has
> > confirmed that this patch helps with his workload. Others might be seeing a
> > slightly different issue, which should be investigated if that is the
> > case. As pointed out by Paul [1], cond_resched might not be sufficient
> > to silence RCU stalls because that would require real scheduling.
> > This is a separate problem, though, and Paul is working with Peter [2]
> > to resolve it.
> >
> > Anyway, I believe that this patch is a good start because it
> > really seems that nr_taken=0 during LRU isolation can be triggered
> > in real life. All reporters agree that they started seeing this issue
> > when moving to the 4.8 kernel, which might be just a coincidence or a
> > change in the behavior of some subsystem. Well, MM has moved from zone to
> > node reclaim, but I couldn't find any direct relation to that
> > change.
> >
> > [1] http://lkml.kernel.org/r/20161130142955.GS3924@linux.vnet.ibm.com
> > [2] http://lkml.kernel.org/r/20161201124024.GB3924@linux.vnet.ibm.com
> >
> >  mm/vmscan.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c05f00042430..c4abf08861d2 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2362,6 +2362,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
> >                         }
> >                 }
> >
> > +               cond_resched();
> > +
> 
> I see a cond_resched_rcu_qs() as a part of linux next inside the while
> (nr[..]) loop.

This is a leftover from Paul's initial attempt to fix this issue. I
expect him to drop his patch from his tree. He considered it
experimental anyway.

> Do we need this as well?

Paul is working with Peter to make cond_resched more general, so that it
covers RCU stalls even when cond_resched doesn't schedule because there
is no runnable task.
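
For illustration only, here is a minimal user-space sketch in C of the
pattern under discussion: a scan loop that can make no progress
(nr_taken=0) for many iterations and therefore needs an explicit yield
point on every pass. This is not the code in mm/vmscan.c. The names
scan_with_yield, isolate_fn, scan_stats and isolate_nothing are
hypothetical, and sched_yield() is only a rough user-space stand-in for
the kernel-only cond_resched(). The sketch does not model RCU
quiescent-state reporting, which is the separate problem mentioned above.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for an LRU isolation step: returns pages taken. */
typedef unsigned long (*isolate_fn)(unsigned long nr_to_scan, void *ctx);

/* Hypothetical counters so the loop's behavior can be inspected. */
struct scan_stats {
	unsigned long iterations;	/* passes through the loop */
	unsigned long taken;		/* pages actually isolated */
	unsigned long yields;		/* times the yield point was reached */
};

/*
 * Scan nr pages in chunks of at most batch pages. The yield point sits at
 * the end of the loop body, so it is reached on every pass, including
 * passes where the isolation step takes nothing.
 */
static struct scan_stats scan_with_yield(unsigned long nr, unsigned long batch,
					 isolate_fn isolate, void *ctx)
{
	struct scan_stats st = { 0, 0, 0 };

	assert(batch > 0);	/* a zero batch would never finish */

	while (nr > 0) {
		unsigned long chunk = nr < batch ? nr : batch;

		st.taken += isolate(chunk, ctx);
		nr -= chunk;
		st.iterations++;

		/* User-space analogue of cond_resched(); an assumption. */
		sched_yield();
		st.yields++;
	}
	return st;
}

/* Models the reported case: every isolation attempt takes zero pages. */
static unsigned long isolate_nothing(unsigned long nr_to_scan, void *ctx)
{
	(void)nr_to_scan;
	(void)ctx;
	return 0;
}
```

Calling scan_with_yield(100, 32, isolate_nothing, NULL) walks the loop
four times without isolating anything, and still reaches the yield point
on each pass. That is the property the one-line patch is after.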

-- 
Michal Hocko
SUSE Labs
