From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Hugh Dickins <hughd@google.com>,
	Glauber Costa <glommer@parallels.com>,
	Michel Lespinasse <walken@google.com>,
	Greg Thelen <gthelen@google.com>, Tejun Heo <tj@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code
Date: Thu, 30 May 2013 10:45:56 +0200
Message-ID: <20130530084556.GC3582@dhcp22.suse.cz>
In-Reply-To: <20130529200154.GF15721@cmpxchg.org>

On Wed 29-05-13 16:01:54, Johannes Weiner wrote:
> On Wed, May 29, 2013 at 05:57:56PM +0200, Michal Hocko wrote:
> > On Wed 29-05-13 15:05:38, Michal Hocko wrote:
> > > On Mon 27-05-13 19:13:08, Michal Hocko wrote:
> > > [...]
> > > > Nevertheless I have encountered an issue while testing the huge number
> > > > of groups scenario. And the issue is unfortunately not limited to this
> > > > scenario. As memcg iterators use a per node-zone-priority
> > > > cache to prevent over-reclaim, it might quite easily happen that
> > > > the walk will not visit all groups and will either terminate the loop
> > > > prematurely or skip some groups. An example could be direct reclaim
> > > > racing with kswapd. This might cause the loop to miss over-limit
> > > > groups so no pages are scanned and so we will fall back to all groups
> > > > reclaim.
> > > 
> > > And after some more testing and head scratching it turned out that
> > > the fallbacks to pass#2 I was seeing are caused by something else. It is
> > > not a race between iterators but rather reclaiming from zone DMA, which
> > > has trouble scanning anything even though there are pages on the LRU,
> > > and so we fall back. I have to look into that more but whatever the
> > > issue is, it shouldn't be related to the patch series.
> > 
> > Think I know what is going on. get_scan_count sees a relatively small
> > number of pages on the lists (around 2k). This means that get_scan_count
> > will tell us to scan nothing for DEF_PRIORITY (as the DMA zone is usually
> > ~16M), so DEF_PRIORITY is basically a no-op and we have to wait and
> > fall down to a priority which actually lets us scan something.
> > 
> > Hmm, maybe ignoring soft reclaim for DMA zone would help to reduce
> > one pointless loop over groups.
> 
> If you have a small group in excess of its soft limit and bigger
> groups that are not, you may reclaim something in the regular reclaim
> cycle before reclaiming anything in the soft limit cycle with the way
> the code is structured.

Yes, the way get_scan_count works might really cause this. Although
targeted reclaim is protected from this, global reclaim can really
suffer from it. I am not sure this is necessarily a problem though. If
we are under global reclaim then a small group which doesn't have at
least 1<<DEF_PRIORITY pages probably doesn't matter that much. The soft
limit is not a guarantee anyway so we can sacrifice some pages from all
groups in such a case.
I also think that the force_scan logic should be enhanced a bit,
especially for cases like the DMA zone. The zone is clearly under its
watermarks but we have to wait a few priority cycles to reclaim something.
But this is a different issue, independent of the soft reclaim rework.
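
To make the arithmetic concrete, here is a minimal userspace sketch of the
shift that get_scan_count applies. It is a simplified model, not the kernel
code: the helper names scan_target and first_useful_priority are made up for
illustration, the fraction/denominator scaling is left out, and the 2000-page
LRU used in the note below is an assumed stand-in for "around 2k".

```c
#define DEF_PRIORITY      12    /* priority at which a reclaim cycle starts */
#define SWAP_CLUSTER_MAX  32UL  /* batch used when force_scan kicks in */

/*
 * Simplified scan target (assumed model): the LRU size shifted right by
 * the current priority. With force_scan set, a zero result is bumped to
 * SWAP_CLUSTER_MAX so that something is scanned.
 */
static unsigned long scan_target(unsigned long lru_pages, int priority,
				 int force_scan)
{
	unsigned long scan = lru_pages >> priority;

	if (!scan && force_scan)
		scan = SWAP_CLUSTER_MAX;
	return scan;
}

/*
 * Walk priorities from DEF_PRIORITY down to 0 and return the first one
 * at which a non-zero number of pages would be scanned without
 * force_scan, or -1 if the LRU is empty.
 */
static int first_useful_priority(unsigned long lru_pages)
{
	int prio;

	for (prio = DEF_PRIORITY; prio >= 0; prio--)
		if (scan_target(lru_pages, prio, 0))
			return prio;
	return -1;
}
```

With 2000 pages on the lists, scan_target(2000, DEF_PRIORITY, 0) is 0 and
first_useful_priority(2000) is 10, so priorities 12 and 11 are no-ops in this
model. Passing force_scan makes the same call return SWAP_CLUSTER_MAX.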
 
> The soft limit cycle probably needs to sit outside of the priority
> loop, not inside the loop, so that the soft limit reclaim cycle
> descends priority levels until it makes progress BEFORE it exits to
> the regular reclaim cycle.

I do not like this, to be honest. shrink_zone is an ideal place as it is
shared among all reclaimers and we really want to obey priority in
soft reclaim as well. The corner case mentioned above is probably
fixable at the get_scan_count layer and even if not, I wouldn't call
it a disaster.
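
For reference, the corner case can be reproduced with a toy model of the
two-pass walk done inside shrink_zone at a single priority. This is only a
sketch under assumptions: struct group, scan_pass and shrink_zone_model are
hypothetical names, and the per-group scan is reduced to lru_pages >> priority.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memcg as seen by the reclaim walk. */
struct group {
	unsigned long lru_pages;  /* pages on the group's LRU lists */
	int over_soft_limit;      /* non-zero if above its soft limit */
};

/*
 * One walk over all groups at the given priority. With soft_only set,
 * groups that are not over their soft limit are skipped.
 */
static unsigned long scan_pass(const struct group *g, size_t n,
			       int priority, int soft_only)
{
	unsigned long scanned = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (soft_only && !g[i].over_soft_limit)
			continue;
		scanned += g[i].lru_pages >> priority;
	}
	return scanned;
}

/*
 * Model of one shrink_zone call: try the soft limit pass first and fall
 * back to all groups if it scanned nothing. Returns the pass (1 or 2)
 * that produced *scanned.
 */
static int shrink_zone_model(const struct group *g, size_t n,
			     int priority, unsigned long *scanned)
{
	*scanned = scan_pass(g, n, priority, 1);
	if (*scanned)
		return 1;
	*scanned = scan_pass(g, n, priority, 0);
	return 2;
}
```

With a 2000-page group over its soft limit next to a 1000000-page group that
is not, priority 12 falls back to pass 2 and scans 244 pages from the big
group. The soft limit pass only starts making progress at priority 10.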

-- 
Michal Hocko
SUSE Labs


