All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Hugh Dickins <hughd@google.com>,
	Glauber Costa <glommer@parallels.com>,
	Michel Lespinasse <walken@google.com>,
	Greg Thelen <gthelen@google.com>, Tejun Heo <tj@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code
Date: Mon, 27 May 2013 19:13:08 +0200	[thread overview]
Message-ID: <1369674791-13861-1-git-send-email-mhocko@suse.cz> (raw)
In-Reply-To: <20130517160247.GA10023@cmpxchg.org>

Hi,
it took me a bit longer than I wanted but I was closed in a conference
room in the end of the last week so I didn't have much time.

On Mon 20-05-13 16:44:38, Michal Hocko wrote:
> On Fri 17-05-13 12:02:47, Johannes Weiner wrote:
> > On Mon, May 13, 2013 at 09:46:10AM +0200, Michal Hocko wrote:
[...]
> > > After this patch shrink_zone is done in 2 passes. First it tries to do the
> > > soft reclaim if appropriate (only for global reclaim for now to keep
> > > compatible with the original state) and fall back to ignoring soft limit
> > > if no group is eligible to soft reclaim or nothing has been scanned
> > > during the first pass. Only groups which are over their soft limit or
> > > any of their parents up the hierarchy is over the limit are considered
> > > eligible during the first pass.
> > 
> > There are setups with thousands of groups that do not even use soft
> > limits.  Having them pointlessly iterate over all of them for every
> > couple of pages reclaimed is just not acceptable.
> 
> OK, that is a fair point. The soft reclaim pass should not be done if
> we know that every group is below the limit. This can be fixed easily.
> mem_cgroup_should_soft_reclaim could check a counter of over limit
> groups. This still doesn't solve the problem if there are relatively few
> groups over the limit wrt. those that are under the limit.
> 
> You are proposing a simple list in the follow up email. I have
> considered this approach as well but then I decided not to go that
> way because the list iteration doesn't respect per-node-zone-priority
> tree walk which makes it tricky to prevent from over-reclaim with many
> parallel reclaimers. I rather wanted to integrate the soft reclaim into
> the reclaim tree walk. There is also a problem when all groups are in
> excess then the whole tree collapses into a linked list which is not
> nice either (hmm, this could be mitigated if only a group in excess
> which is highest in the hierarchy would be in the list)
> 
[...]
> I think that the numbers can be improved even without introducing
> the list of groups in excess. One way to go could be introducing a
> conditional (callback) to the memcg iterator so the groups under the
> limit would be excluded during the walk without playing with css
> references and other things. My quick and dirty patch shows that
> 4k-0-limit System time was reduced by 40% wrt. this patchset. With a
> proper tagging we can make the walk close to free.

And the following patchset implements that. My first numbers shown an
improvement (I will post some numbers later after I collect them).

Nevertheless I have encountered an issue while testing the huge number
of groups scenario. And the issue is not limitted to only to this
scenario unfortunately. As memcg iterators use per node-zone-priority
cache to prevent from over reclaim it might quite easily happen that
the walk will not visit all groups and will terminate the loop either
prematurely or skip some groups. An example could be the direct reclaim
racing with kswapd. This might cause that the loop misses over limit
groups so no pages are scanned and so we will fall back to all groups
reclaim.

Not good! But also not easy to fix without risking an over reclaim or
potential stalls. I was thinking about introducing something like:

bool should_soft_limit_reclaim_continue(struct mem_cgroup *root, int groups_reclaimed)
{
	if (!groups_reclaimed)
		return false;

	if (mem_cgroup_soft_reclaim_eligible(root, root) == VISIT)
		return true;
}

and loop again few times in __shrink_zone. I am not entirely thrilled
about this as the effectiveness depends on the number of parallel
reclaimers at the same priority but it should work at least somehow. If
anybody has a better idea I am all for it.

I will think about it some more.

Anyway I will post 3 patches which should mitigate the "too many groups"
issue as a reply to this email. See patches for details.

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Hugh Dickins <hughd@google.com>,
	Glauber Costa <glommer@parallels.com>,
	Michel Lespinasse <walken@google.com>,
	Greg Thelen <gthelen@google.com>, Tejun Heo <tj@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code
Date: Mon, 27 May 2013 19:13:08 +0200	[thread overview]
Message-ID: <1369674791-13861-1-git-send-email-mhocko@suse.cz> (raw)
In-Reply-To: <20130517160247.GA10023@cmpxchg.org>

Hi,
it took me a bit longer than I wanted but I was closed in a conference
room in the end of the last week so I didn't have much time.

On Mon 20-05-13 16:44:38, Michal Hocko wrote:
> On Fri 17-05-13 12:02:47, Johannes Weiner wrote:
> > On Mon, May 13, 2013 at 09:46:10AM +0200, Michal Hocko wrote:
[...]
> > > After this patch shrink_zone is done in 2 passes. First it tries to do the
> > > soft reclaim if appropriate (only for global reclaim for now to keep
> > > compatible with the original state) and fall back to ignoring soft limit
> > > if no group is eligible to soft reclaim or nothing has been scanned
> > > during the first pass. Only groups which are over their soft limit or
> > > any of their parents up the hierarchy is over the limit are considered
> > > eligible during the first pass.
> > 
> > There are setups with thousands of groups that do not even use soft
> > limits.  Having them pointlessly iterate over all of them for every
> > couple of pages reclaimed is just not acceptable.
> 
> OK, that is a fair point. The soft reclaim pass should not be done if
> we know that every group is below the limit. This can be fixed easily.
> mem_cgroup_should_soft_reclaim could check a counter of over limit
> groups. This still doesn't solve the problem if there are relatively few
> groups over the limit wrt. those that are under the limit.
> 
> You are proposing a simple list in the follow up email. I have
> considered this approach as well but then I decided not to go that
> way because the list iteration doesn't respect per-node-zone-priority
> tree walk which makes it tricky to prevent from over-reclaim with many
> parallel reclaimers. I rather wanted to integrate the soft reclaim into
> the reclaim tree walk. There is also a problem when all groups are in
> excess then the whole tree collapses into a linked list which is not
> nice either (hmm, this could be mitigated if only a group in excess
> which is highest in the hierarchy would be in the list)
> 
[...]
> I think that the numbers can be improved even without introducing
> the list of groups in excess. One way to go could be introducing a
> conditional (callback) to the memcg iterator so the groups under the
> limit would be excluded during the walk without playing with css
> references and other things. My quick and dirty patch shows that
> 4k-0-limit System time was reduced by 40% wrt. this patchset. With a
> proper tagging we can make the walk close to free.

And the following patchset implements that. My first numbers shown an
improvement (I will post some numbers later after I collect them).

Nevertheless I have encountered an issue while testing the huge number
of groups scenario. And the issue is not limitted to only to this
scenario unfortunately. As memcg iterators use per node-zone-priority
cache to prevent from over reclaim it might quite easily happen that
the walk will not visit all groups and will terminate the loop either
prematurely or skip some groups. An example could be the direct reclaim
racing with kswapd. This might cause that the loop misses over limit
groups so no pages are scanned and so we will fall back to all groups
reclaim.

Not good! But also not easy to fix without risking an over reclaim or
potential stalls. I was thinking about introducing something like:

bool should_soft_limit_reclaim_continue(struct mem_cgroup *root, int groups_reclaimed)
{
	if (!groups_reclaimed)
		return false;

	if (mem_cgroup_soft_reclaim_eligible(root, root) == VISIT)
		return true;
}

and loop again few times in __shrink_zone. I am not entirely thrilled
about this as the effectiveness depends on the number of parallel
reclaimers at the same priority but it should work at least somehow. If
anybody has a better idea I am all for it.

I will think about it some more.

Anyway I will post 3 patches which should mitigate the "too many groups"
issue as a reply to this email. See patches for details.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	KAMEZAWA Hiroyuki
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
	Michel Lespinasse
	<walken-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Balbir Singh
	<bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Subject: Re: [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code
Date: Mon, 27 May 2013 19:13:08 +0200	[thread overview]
Message-ID: <1369674791-13861-1-git-send-email-mhocko@suse.cz> (raw)
In-Reply-To: <20130517160247.GA10023-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

Hi,
it took me a bit longer than I wanted but I was closed in a conference
room in the end of the last week so I didn't have much time.

On Mon 20-05-13 16:44:38, Michal Hocko wrote:
> On Fri 17-05-13 12:02:47, Johannes Weiner wrote:
> > On Mon, May 13, 2013 at 09:46:10AM +0200, Michal Hocko wrote:
[...]
> > > After this patch shrink_zone is done in 2 passes. First it tries to do the
> > > soft reclaim if appropriate (only for global reclaim for now to keep
> > > compatible with the original state) and fall back to ignoring soft limit
> > > if no group is eligible to soft reclaim or nothing has been scanned
> > > during the first pass. Only groups which are over their soft limit or
> > > any of their parents up the hierarchy is over the limit are considered
> > > eligible during the first pass.
> > 
> > There are setups with thousands of groups that do not even use soft
> > limits.  Having them pointlessly iterate over all of them for every
> > couple of pages reclaimed is just not acceptable.
> 
> OK, that is a fair point. The soft reclaim pass should not be done if
> we know that every group is below the limit. This can be fixed easily.
> mem_cgroup_should_soft_reclaim could check a counter of over limit
> groups. This still doesn't solve the problem if there are relatively few
> groups over the limit wrt. those that are under the limit.
> 
> You are proposing a simple list in the follow up email. I have
> considered this approach as well but then I decided not to go that
> way because the list iteration doesn't respect per-node-zone-priority
> tree walk which makes it tricky to prevent from over-reclaim with many
> parallel reclaimers. I rather wanted to integrate the soft reclaim into
> the reclaim tree walk. There is also a problem when all groups are in
> excess then the whole tree collapses into a linked list which is not
> nice either (hmm, this could be mitigated if only a group in excess
> which is highest in the hierarchy would be in the list)
> 
[...]
> I think that the numbers can be improved even without introducing
> the list of groups in excess. One way to go could be introducing a
> conditional (callback) to the memcg iterator so the groups under the
> limit would be excluded during the walk without playing with css
> references and other things. My quick and dirty patch shows that
> 4k-0-limit System time was reduced by 40% wrt. this patchset. With a
> proper tagging we can make the walk close to free.

And the following patchset implements that. My first numbers shown an
improvement (I will post some numbers later after I collect them).

Nevertheless I have encountered an issue while testing the huge number
of groups scenario. And the issue is not limitted to only to this
scenario unfortunately. As memcg iterators use per node-zone-priority
cache to prevent from over reclaim it might quite easily happen that
the walk will not visit all groups and will terminate the loop either
prematurely or skip some groups. An example could be the direct reclaim
racing with kswapd. This might cause that the loop misses over limit
groups so no pages are scanned and so we will fall back to all groups
reclaim.

Not good! But also not easy to fix without risking an over reclaim or
potential stalls. I was thinking about introducing something like:

bool should_soft_limit_reclaim_continue(struct mem_cgroup *root, int groups_reclaimed)
{
	if (!groups_reclaimed)
		return false;

	if (mem_cgroup_soft_reclaim_eligible(root, root) == VISIT)
		return true;
}

and loop again few times in __shrink_zone. I am not entirely thrilled
about this as the effectiveness depends on the number of parallel
reclaimers at the same priority but it should work at least somehow. If
anybody has a better idea I am all for it.

I will think about it some more.

Anyway I will post 3 patches which should mitigate the "too many groups"
issue as a reply to this email. See patches for details.

  parent reply	other threads:[~2013-05-27 17:13 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-13  7:46 [patch v3 0/3 -mm] Soft limit rework Michal Hocko
2013-05-13  7:46 ` Michal Hocko
2013-05-13  7:46 ` Michal Hocko
2013-05-13  7:46 ` [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code Michal Hocko
2013-05-13  7:46   ` Michal Hocko
2013-05-15  8:34   ` Glauber Costa
2013-05-15  8:34     ` Glauber Costa
2013-05-16 22:12   ` Tejun Heo
2013-05-16 22:12     ` Tejun Heo
2013-05-16 22:12     ` Tejun Heo
2013-05-16 22:15     ` Tejun Heo
2013-05-16 22:15       ` Tejun Heo
2013-05-17  7:16       ` Michal Hocko
2013-05-17  7:16         ` Michal Hocko
2013-05-17  7:16         ` Michal Hocko
2013-05-17  7:12     ` Michal Hocko
2013-05-17  7:12       ` Michal Hocko
2013-05-17 16:02   ` Johannes Weiner
2013-05-17 16:02     ` Johannes Weiner
2013-05-17 16:57     ` Tejun Heo
2013-05-17 16:57       ` Tejun Heo
2013-05-17 17:27       ` Johannes Weiner
2013-05-17 17:27         ` Johannes Weiner
2013-05-17 17:45         ` Tejun Heo
2013-05-17 17:45           ` Tejun Heo
2013-05-17 17:45           ` Tejun Heo
2013-05-20 14:44     ` Michal Hocko
2013-05-20 14:44       ` Michal Hocko
2013-05-20 14:44       ` Michal Hocko
2013-05-21  6:53       ` Michal Hocko
2013-05-21  6:53         ` Michal Hocko
2013-05-27 17:13     ` Michal Hocko [this message]
2013-05-27 17:13       ` Michal Hocko
2013-05-27 17:13       ` Michal Hocko
2013-05-27 17:13       ` [PATCH 1/3] memcg: track children in soft limit excess to improve soft limit Michal Hocko
2013-05-27 17:13         ` Michal Hocko
2013-05-27 17:13       ` [PATCH 2/3] memcg, vmscan: Do not attempt soft limit reclaim if it would not scan anything Michal Hocko
2013-05-27 17:13         ` Michal Hocko
2013-05-27 17:13       ` [PATCH 3/3] memcg: Track all children over limit in the root Michal Hocko
2013-05-27 17:13         ` Michal Hocko
2013-05-27 17:20       ` [PATCH] memcg: enhance memcg iterator to support predicates Michal Hocko
2013-05-27 17:20         ` Michal Hocko
2013-05-27 17:20         ` Michal Hocko
2013-05-29 13:05       ` [patch v3 -mm 1/3] memcg: integrate soft reclaim tighter with zone shrinking code Michal Hocko
2013-05-29 13:05         ` Michal Hocko
2013-05-29 13:05         ` Michal Hocko
2013-05-29 15:57         ` Michal Hocko
2013-05-29 15:57           ` Michal Hocko
2013-05-29 20:01           ` Johannes Weiner
2013-05-29 20:01             ` Johannes Weiner
2013-05-30  8:45             ` Michal Hocko
2013-05-30  8:45               ` Michal Hocko
2013-05-29 14:54       ` Michal Hocko
2013-05-29 14:54         ` Michal Hocko
2013-05-30  8:36         ` Michal Hocko
2013-05-30  8:36           ` Michal Hocko
2013-05-13  7:46 ` [patch v3 -mm 2/3] memcg: Get rid of soft-limit tree infrastructure Michal Hocko
2013-05-13  7:46   ` Michal Hocko
2013-05-15  8:38   ` Glauber Costa
2013-05-15  8:38     ` Glauber Costa
2013-05-16 22:16   ` Tejun Heo
2013-05-16 22:16     ` Tejun Heo
2013-05-13  7:46 ` [patch v3 -mm 3/3] vmscan, memcg: Do softlimit reclaim also for targeted reclaim Michal Hocko
2013-05-13  7:46   ` Michal Hocko
2013-05-13  7:46   ` Michal Hocko
2013-05-15  8:42   ` Glauber Costa
2013-05-15  8:42     ` Glauber Costa
2013-05-17  7:50     ` Michal Hocko
2013-05-17  7:50       ` Michal Hocko
2013-05-17  7:50       ` Michal Hocko
2013-05-16 23:12   ` Tejun Heo
2013-05-16 23:12     ` Tejun Heo
2013-05-17  7:34     ` Michal Hocko
2013-05-17  7:34       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1369674791-13861-1-git-send-email-mhocko@suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=glommer@parallels.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    --cc=walken@google.com \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.