* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-19 22:33 ` Johannes Weiner
@ 2012-04-19 22:51 ` Johannes Weiner
2012-04-20 7:37 ` Ying Han
2012-04-20 8:28 ` Michal Hocko
2 siblings, 0 replies; 25+ messages in thread
From: Johannes Weiner @ 2012-04-19 22:51 UTC (permalink / raw)
To: Ying Han
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 12:33:18AM +0200, Johannes Weiner wrote:
> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
> > On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > > On Wed 18-04-12 11:00:40, Ying Han wrote:
> > >> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >> > On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
> > >> >> The "soft_limit" was introduced in memcg to support over-committing the
> > >> >> memory resource on the host. Each cgroup configures its "hard_limit" where
> > >> >> it will be throttled or OOM killed by going over the limit. However, the
> > >> >> cgroup can go above the "soft_limit" as long as there is no system-wide
> > >> >> memory contention. So, the "soft_limit" is the kernel mechanism for
> > >> >> re-distributing system spare memory among cgroups.
> > >> >>
> > >> >> This patch reworks the softlimit reclaim by hooking it into the new global
> > >> >> reclaim scheme. So the global reclaim path including direct reclaim and
> > >> >> background reclaim will respect the memcg softlimit.
> > >> >>
> > >> >> v3..v2:
> > >> >> 1. rebase the patch on 3.4-rc3
> > >> >> 2. squash the commits of replacing the old implementation with new
> > >> >> implementation into one commit. This is to make sure to leave the tree
> > >> >> in stable state between each commit.
> > >> >> 3. removed the commit which changes the nr_to_reclaim for global reclaim
> > >> >> case. The need of that patch is not obvious now.
> > >> >>
> > >> >> Note:
> > >> >> 1. the new implementation of softlimit reclaim is rather simple and first
> > >> >> step for further optimizations. there is no memory pressure balancing between
> > >> >> memcgs for each zone, and that is something we would like to add as follow-ups.
> > >> >>
> > >> >> 2. this patch is slightly different from the last one posted from Johannes
> > >> >> http://comments.gmane.org/gmane.linux.kernel.mm/72382
> > >> >> where his patch is closer to the reverted implementation by doing hierarchical
> > >> >> reclaim for each selected memcg. However, that is not expected behavior from
> > >> >> user perspective. Considering the following example:
> > >> >>
> > >> >> root (32G capacity)
> > >> >> --> A (hard limit 20G, soft limit 15G, usage 16G)
> > >> >> --> A1 (soft limit 5G, usage 4G)
> > >> >> --> A2 (soft limit 10G, usage 12G)
> > >> >> --> B (hard limit 20G, soft limit 10G, usage 16G)
> > >> >>
> > >> >> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
> > >> >> exceeds softlimit. This is what admin expects by setting softlimit to the
> > >> >> actual working set size and only reclaim pages under softlimit if system has
> > >> >> trouble to reclaim.
> > >> >
> > >> > Actually, this is exactly what the admin expects when creating a
> > >> > hierarchy, because she defines that A1 is a child of A and is
> > >> > responsible for the memory situation in its parent.
> > >
> > > Hmm, I guess that both approaches have cons and pros.
> > > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
> > > soft limit memcg
> > > + it is consistent with the hard limit reclaim
> > Not sure why we want them to be consistent. Soft_limit is serving
> > different purpose and the one of the main purpose is to preserve the
> > working set of the cgroup.
>
> I'd argue, given the history of cgroups, one of the main purposes is
> having a machine of containers where you overcommit their hard limit
> and set the soft limit accordingly to provide fairness.
>
> Yes, we don't want to reclaim hierarchies that are below their soft
> limit as long as there are some in excess, of course. This is a flaw
> and needs fixing. But it's something completely different than
> changing how the soft limit is defined and suddenly allow child
> groups, which you may not trust, to override rules defined by parental
> groups.
>
> It bothers me that we should add something that will almost certainly
> bite us in the future while we are discussing on the cgroups list what
> would stand in the way of getting sane hierarchy semantics across
> controllers to provide consistency, nesting, etc.
>
> To support a single use case, which I feel we still have not discussed
> nearly enough to justify this change.
>
> For example, I get that you want 'meta-groups' that group together
> subgroups for common accounting and hard limiting. But I don't see
> why such meta-groups have their own processes. Conceptually, I mean,
> how does a process fit into A? Is it superior to the tasks in A1 and
> A2? Why can't it live in A3?
>
> So here is a proposal:
>
> Would it make sense to try to keep those meta groups always free of
> their own memory so that they don't /need/ soft limits with weird
> semantics? E.g. immediately free the unused memory on rmdir, OR add
> mechanisms to migrate the memory to a dedicated group:
>
> A
> A1 (soft-limited)
> A2 (soft-limited)
> B
> unused (soft-limited)
>
> Move all leftover memory from finished jobs to this 'unused' group.
> You could set its soft limit to 0 so that it sticks around only until
> you actually need the memory for something else.
>
> Then you would get the benefits of accounting and limiting A1 and A2
> under a single umbrella without the need for a soft limit in A. We
> could keep the consistent semantics for soft limits, because you would
> only have to set it on leaf nodes.
>
> Wouldn't this work for you?
Or, if the frequency of job creation and completion permits, just keep
the original groups around after completion, set their soft limit to
0, put a watch ("threshold notification") on their usage and reap them
once global pressure has finally cleaned them out.
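A rough userspace sketch of that flow, assuming the cgroup v1 memory
controller files memory.soft_limit_in_bytes, memory.usage_in_bytes and
cgroup.event_control. The helper names (format_threshold_event,
write_cgroup_file, retire_group), struct group_watch, and the choice of
threshold are hypothetical, not taken from any existing job scheduler:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical handle for one armed watch; both fds belong to the caller. */
struct group_watch {
    int event_fd;   /* becomes readable when usage crosses the threshold */
    int usage_fd;   /* open fd of memory.usage_in_bytes */
};

/*
 * Build the registration line for cgroup.event_control:
 * "<eventfd> <fd of memory.usage_in_bytes> <threshold in bytes>".
 * Returns the line length, or -1 if it does not fit.
 */
static int format_threshold_event(char *buf, size_t len, int event_fd,
                                  int usage_fd, unsigned long long threshold)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%d %d %llu", event_fd, usage_fd, threshold);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write val to <dir>/<file>; returns 0 on success, -1 on error. */
static int write_cgroup_file(const char *dir, const char *file,
                             const char *val)
{
    char path[4096];
    size_t len = strlen(val);
    int fd, n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ok = write(fd, val, len) == (ssize_t)len;
    close(fd);
    return ok ? 0 : -1;
}

/*
 * Set the soft limit of a finished job's group to 0 and arm a usage
 * threshold on it. Returns 0 and fills *w on success, -1 on error.
 */
static int retire_group(const char *dir, unsigned long long threshold,
                        struct group_watch *w)
{
    char path[4096], line[64];
    int n;

    assert(dir != NULL && w != NULL);
    if (write_cgroup_file(dir, "memory.soft_limit_in_bytes", "0") < 0)
        return -1;
    n = snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    w->usage_fd = open(path, O_RDONLY);
    if (w->usage_fd < 0)
        return -1;
    w->event_fd = eventfd(0, 0);
    if (w->event_fd < 0) {
        close(w->usage_fd);
        return -1;
    }
    if (format_threshold_event(line, sizeof(line), w->event_fd,
                               w->usage_fd, threshold) < 0 ||
        write_cgroup_file(dir, "cgroup.event_control", line) < 0) {
        close(w->event_fd);
        close(w->usage_fd);
        return -1;
    }
    return 0;
}
```

A scheduler would then read() eight bytes from event_fd, re-check
memory.usage_in_bytes, and rmdir the group once it is empty. A small
non-zero threshold (one page, say) is an assumption of this sketch.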
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-19 22:33 ` Johannes Weiner
2012-04-19 22:51 ` Johannes Weiner
@ 2012-04-20 7:37 ` Ying Han
2012-04-20 8:21 ` KAMEZAWA Hiroyuki
2012-04-20 13:17 ` Johannes Weiner
2012-04-20 8:28 ` Michal Hocko
2 siblings, 2 replies; 25+ messages in thread
From: Ying Han @ 2012-04-20 7:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Thu, Apr 19, 2012 at 3:33 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
>> On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
>> > On Wed 18-04-12 11:00:40, Ying Han wrote:
>> >> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> >> > On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
>> >> >> The "soft_limit" was introduced in memcg to support over-committing the
>> >> >> memory resource on the host. Each cgroup configures its "hard_limit" where
>> >> >> it will be throttled or OOM killed by going over the limit. However, the
>> >> >> cgroup can go above the "soft_limit" as long as there is no system-wide
>> >> >> memory contention. So, the "soft_limit" is the kernel mechanism for
>> >> >> re-distributing system spare memory among cgroups.
>> >> >>
>> >> >> This patch reworks the softlimit reclaim by hooking it into the new global
>> >> >> reclaim scheme. So the global reclaim path including direct reclaim and
>> >> >> background reclaim will respect the memcg softlimit.
>> >> >>
>> >> >> v3..v2:
>> >> >> 1. rebase the patch on 3.4-rc3
>> >> >> 2. squash the commits of replacing the old implementation with new
>> >> >> implementation into one commit. This is to make sure to leave the tree
>> >> >> in stable state between each commit.
>> >> >> 3. removed the commit which changes the nr_to_reclaim for global reclaim
>> >> >> case. The need of that patch is not obvious now.
>> >> >>
>> >> >> Note:
>> >> >> 1. the new implementation of softlimit reclaim is rather simple and first
>> >> >> step for further optimizations. there is no memory pressure balancing between
>> >> >> memcgs for each zone, and that is something we would like to add as follow-ups.
>> >> >>
>> >> >> 2. this patch is slightly different from the last one posted from Johannes
>> >> >> http://comments.gmane.org/gmane.linux.kernel.mm/72382
>> >> >> where his patch is closer to the reverted implementation by doing hierarchical
>> >> >> reclaim for each selected memcg. However, that is not expected behavior from
>> >> >> user perspective. Considering the following example:
>> >> >>
>> >> >> root (32G capacity)
>> >> >> --> A (hard limit 20G, soft limit 15G, usage 16G)
>> >> >> --> A1 (soft limit 5G, usage 4G)
>> >> >> --> A2 (soft limit 10G, usage 12G)
>> >> >> --> B (hard limit 20G, soft limit 10G, usage 16G)
>> >> >>
>> >> >> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
>> >> >> exceeds softlimit. This is what admin expects by setting softlimit to the
>> >> >> actual working set size and only reclaim pages under softlimit if system has
>> >> >> trouble to reclaim.
>> >> >
>> >> > Actually, this is exactly what the admin expects when creating a
>> >> > hierarchy, because she defines that A1 is a child of A and is
>> >> > responsible for the memory situation in its parent.
>> >
>> > Hmm, I guess that both approaches have cons and pros.
>> > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
>> > soft limit memcg
>> > + it is consistent with the hard limit reclaim
>> Not sure why we want them to be consistent. Soft_limit is serving
>> different purpose and the one of the main purpose is to preserve the
>> working set of the cgroup.
>
> I'd argue, given the history of cgroups, one of the main purposes is
> having a machine of containers where you overcommit their hard limit
> and set the soft limit accordingly to provide fairness.
>
> Yes, we don't want to reclaim hierarchies that are below their soft
> limit as long as there are some in excess, of course. This is a flaw
> and needs fixing. But it's something completely different than
> changing how the soft limit is defined and suddenly allow child
> groups, which you may not trust, to override rules defined by parental
> groups.
>
> It bothers me that we should add something that will almost certainly
> bite us in the future while we are discussing on the cgroups list what
> would stand in the way of getting sane hierarchy semantics across
> controllers to provide consistency, nesting, etc.
I understand the concern here, and I don't want soft_limit reclaim
to drift far away from the rest of the cgroup design down the
road. On the other hand, I don't think the current implementation
goes against the hierarchy semantics entirely. See the comment below :)
>
> To support a single use case, which I feel we still have not discussed
> nearly enough to justify this change.
>
> For example, I get that you want 'meta-groups' that group together
> subgroups for common accounting and hard limiting. But I don't see
> why such meta-groups have their own processes. Conceptually, I mean,
> how does a process fit into A? Is it superior to the tasks in A1 and
> A2? Why can't it live in A3?
For user processes, I can see that it is totally feasible for them to
live in A3. The case I was thinking of is kernel threads, which 1) we
don't want to limit in their memory usage and 2) serve the whole group
rather than individual jobs. Of course, we could put those kernel
threads in A3 and leave that cgroup unlimited, but I'm not sure we
should constrain ourselves to not having any processes running under A.
>
> So here is a proposal:
>
> Would it make sense to try to keep those meta groups always free of
> their own memory so that they don't /need/ soft limits with weird
> semantics? E.g. immediately free the unused memory on rmdir, OR add
> mechanisms to migrate the memory to a dedicated group:
>
> A
> A1 (soft-limited)
> A2 (soft-limited)
> B
> unused (soft-limited)
>
> Move all leftover memory from finished jobs to this 'unused' group.
> You could set its soft limit to 0 so that it sticks around only until
> you actually need the memory for something else.
>
> Then you would get the benefits of accounting and limiting A1 and A2
> under a single umbrella without the need for a soft limit in A. We
> could keep the consistent semantics for soft limits, because you would
> only have to set it on leaf nodes.
>
> Wouldn't this work for you?
To be frank, this sounds like a lot of extra work for the admin to
manage the system, and we still cannot totally prevent pages from
landing on A.
Back to the current proposal, there are two concerns that I can tell so far:
1. skipping an untrusted cgroup in case it sets its soft_limit very high:
Here, we don't always skip the untrusted cgroup. We do reclaim from
it if not enough progress is made from the other cgroups above their
softlimit. So, I don't see a problem here.
2. not reclaiming based on hierarchy:
Here I am not checking the ancestor's soft_limit in
should_reclaim_mem_cgroup(). And it will only make a difference if A is
under its soft_limit and A1 is above its soft_limit. Now you do agree
that we shouldn't reclaim from groups under their softlimit if there
are cgroups exceeding their softlimit. That leads me to think of
something like the following:
1. for priority > DEF_PRIORITY - 3, only reclaim memcgs above their softlimit
2. for priority <= DEF_PRIORITY - 3, besides 1), also look at the memcg's
ancestors: reclaim memcgs whose ancestor is above its soft_limit
3. for priority == 0, reclaim everything.
Then it keeps the softlimit guarantee at a certain level while also
considering hierarchical reclaim if the first few rounds don't
fulfill the request.
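A minimal sketch of the three priority rules above, written as
standalone userspace C rather than kernel code. struct memcg_sketch and
should_reclaim_sketch() are hypothetical stand-ins for struct mem_cgroup
and should_reclaim_mem_cgroup(); only DEF_PRIORITY (12) matches the
kernel headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12 /* same value as in the kernel headers */

/* Hypothetical stand-in for struct mem_cgroup; sizes in bytes. */
struct memcg_sketch {
    const struct memcg_sketch *parent; /* NULL for the root */
    unsigned long long usage;
    unsigned long long soft_limit;
};

static bool over_soft_limit(const struct memcg_sketch *cg)
{
    return cg->usage > cg->soft_limit;
}

/* True if any ancestor (not cg itself) exceeds its soft limit. */
static bool ancestor_over_soft_limit(const struct memcg_sketch *cg)
{
    for (cg = cg->parent; cg; cg = cg->parent)
        if (over_soft_limit(cg))
            return true;
    return false;
}

/*
 * Rule 3: at priority 0 every group is eligible.
 * Rule 1: a group above its own soft limit is always eligible.
 * Rule 2: from DEF_PRIORITY - 3 downwards, a group is also eligible
 *         when one of its ancestors is above its soft limit.
 */
static bool should_reclaim_sketch(const struct memcg_sketch *cg, int priority)
{
    assert(cg != NULL);
    if (priority == 0)
        return true;
    if (over_soft_limit(cg))
        return true;
    if (priority <= DEF_PRIORITY - 3)
        return ancestor_over_soft_limit(cg);
    return false;
}
```

With the example hierarchy from the cover letter, A1 (usage 4G, soft
limit 5G) is skipped at priorities 12 through 10, becomes eligible at
priority 9 because A is over its soft limit, and is always eligible at
priority 0.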
--Ying
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 7:37 ` Ying Han
@ 2012-04-20 8:21 ` KAMEZAWA Hiroyuki
2012-04-20 14:17 ` Rik van Riel
2012-04-20 13:17 ` Johannes Weiner
1 sibling, 1 reply; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-20 8:21 UTC (permalink / raw)
To: Ying Han
Cc: Johannes Weiner, Michal Hocko, Mel Gorman, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
(2012/04/20 16:37), Ying Han wrote:
> On Thu, Apr 19, 2012 at 3:33 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
>>> On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
>>>> On Wed 18-04-12 11:00:40, Ying Han wrote:
>>>>> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>>>> On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
>>>>>>> The "soft_limit" was introduced in memcg to support over-committing the
>>>>>>> memory resource on the host. Each cgroup configures its "hard_limit" where
>>>>>>> it will be throttled or OOM killed by going over the limit. However, the
>>>>>>> cgroup can go above the "soft_limit" as long as there is no system-wide
>>>>>>> memory contention. So, the "soft_limit" is the kernel mechanism for
>>>>>>> re-distributing system spare memory among cgroups.
>>>>>>>
>>>>>>> This patch reworks the softlimit reclaim by hooking it into the new global
>>>>>>> reclaim scheme. So the global reclaim path including direct reclaim and
>>>>>>> background reclaim will respect the memcg softlimit.
>>>>>>>
>>>>>>> v3..v2:
>>>>>>> 1. rebase the patch on 3.4-rc3
>>>>>>> 2. squash the commits of replacing the old implementation with new
>>>>>>> implementation into one commit. This is to make sure to leave the tree
>>>>>>> in stable state between each commit.
>>>>>>> 3. removed the commit which changes the nr_to_reclaim for global reclaim
>>>>>>> case. The need of that patch is not obvious now.
>>>>>>>
>>>>>>> Note:
>>>>>>> 1. the new implementation of softlimit reclaim is rather simple and first
>>>>>>> step for further optimizations. there is no memory pressure balancing between
>>>>>>> memcgs for each zone, and that is something we would like to add as follow-ups.
>>>>>>>
>>>>>>> 2. this patch is slightly different from the last one posted from Johannes
>>>>>>> http://comments.gmane.org/gmane.linux.kernel.mm/72382
>>>>>>> where his patch is closer to the reverted implementation by doing hierarchical
>>>>>>> reclaim for each selected memcg. However, that is not expected behavior from
>>>>>>> user perspective. Considering the following example:
>>>>>>>
>>>>>>> root (32G capacity)
>>>>>>> --> A (hard limit 20G, soft limit 15G, usage 16G)
>>>>>>> --> A1 (soft limit 5G, usage 4G)
>>>>>>> --> A2 (soft limit 10G, usage 12G)
>>>>>>> --> B (hard limit 20G, soft limit 10G, usage 16G)
>>>>>>>
>>>>>>> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
>>>>>>> exceeds softlimit. This is what admin expects by setting softlimit to the
>>>>>>> actual working set size and only reclaim pages under softlimit if system has
>>>>>>> trouble to reclaim.
>>>>>>
>>>>>> Actually, this is exactly what the admin expects when creating a
>>>>>> hierarchy, because she defines that A1 is a child of A and is
>>>>>> responsible for the memory situation in its parent.
>>>>
>>>> Hmm, I guess that both approaches have cons and pros.
>>>> * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
>>>> soft limit memcg
>>>> + it is consistent with the hard limit reclaim
>>> Not sure why we want them to be consistent. Soft_limit is serving
>>> different purpose and the one of the main purpose is to preserve the
>>> working set of the cgroup.
>>
>> I'd argue, given the history of cgroups, one of the main purposes is
>> having a machine of containers where you overcommit their hard limit
>> and set the soft limit accordingly to provide fairness.
>>
>> Yes, we don't want to reclaim hierarchies that are below their soft
>> limit as long as there are some in excess, of course. This is a flaw
>> and needs fixing. But it's something completely different than
>> changing how the soft limit is defined and suddenly allow child
>> groups, which you may not trust, to override rules defined by parental
>> groups.
>>
>> It bothers me that we should add something that will almost certainly
>> bite us in the future while we are discussing on the cgroups list what
>> would stand in the way of getting sane hierarchy semantics across
>> controllers to provide consistency, nesting, etc.
>
> I understand the concern here and I don't want the soft_limit reclaim
> to be far away from the other part of the cgroup design down to the
> road. On the other hand, I don't think the current implementation is
> against the hierarchy semantics totally. See the comment below :)
>
>>
>> To support a single use case, which I feel we still have not discussed
>> nearly enough to justify this change.
>>
>> For example, I get that you want 'meta-groups' that group together
>> subgroups for common accounting and hard limiting. But I don't see
>> why such meta-groups have their own processes. Conceptually, I mean,
>> how does a process fit into A? Is it superior to the tasks in A1 and
>> A2? Why can't it live in A3?
>
> For user processes, I can see that is totally feasible to live in A3.
> The case I was thinking is kernel threads, which 1) we don't want to
> limit their memory usage 2) they serve for the whole group unlike
> individual jobs. Of course, we could say that putting those kernel
> thread in A3 and leave the cgroup to unlimited, but not sure if we
> should constrain ourselves not having any processes running under A.
>
>>
>> So here is a proposal:
>>
>> Would it make sense to try to keep those meta groups always free of
>> their own memory so that they don't /need/ soft limits with weird
>> semantics? E.g. immediately free the unused memory on rmdir, OR add
>> mechanisms to migrate the memory to a dedicated group:
>>
>> A
>> A1 (soft-limited)
>> A2 (soft-limited)
>> B
>> unused (soft-limited)
>>
>> Move all leftover memory from finished jobs to this 'unused' group.
>> You could set its soft limit to 0 so that it sticks around only until
>> you actually need the memory for something else.
>>
>> Then you would get the benefits of accounting and limiting A1 and A2
>> under a single umbrella without the need for a soft limit in A. We
>> could keep the consistent semantics for soft limits, because you would
>> only have to set it on leaf nodes.
>>
>> Wouldn't this work for you?
>
> To be frankly, this sounds a lot of extra work for admin to manage the
> system and we still can not prevent page being landed on A totally.
>
> Back to the current proposal, there are two concerns that I can tell by far:
>
> 1. skipping "not trust" cgroup in case it sets its soft_limit very high:
> Here, we don't skip the "not trust" cgroup always. We do reclaim from
> them if not enough progress made from other cgroups above the
> softlimit. So, I don't see a problem here.
>
> 2. not reclaiming based on hierarchy:
> Here I am not checking the ancestor's soft_limit in
> should_reclaim_mem_cgroup(). And it will only make difference if A is
> under soft_limit and A1 is above soft_limit. Now you do agree that we
> shouldn't reclaim from those under softlimit groups if there are
> cgroup exeed their softlimit. Then it leads me to think something like
> the following:
>
> 1. for priority > DEF_PRIORITY - 3, only reclaim memcg above their softlimit
> 2. for priority <= DEF_PRIORITY - 3, besides 1), also look at memcg's
> ancestor. reclaim memcgs whose ancestor above soft_limit
> 3. for priority == 0, reclaim everything.
>
> Then it has the guarantee of the softlimit at certain level while also
> considers the hierarchy reclaim if the first few rounds doesn't
> fulfill the request.
>
This seems complicated. I vote for "Hierarchical soft limit reclaim".
If you need smart victim selection under a hierarchy, please implement
a victim scheduler which chooses A2 rather than A and A1. I think you
can do it.
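One possible reading of such a victim scheduler, sketched in standalone
C: among the groups of a hierarchy, pick the one that exceeds its own
soft limit by the most. struct memcg_usage, soft_limit_excess() and
pick_victim() are hypothetical names, and "largest excess wins" is an
assumption of this sketch, not a design settled in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group snapshot; sizes in bytes. */
struct memcg_usage {
    const char *name;
    unsigned long long usage;
    unsigned long long soft_limit;
};

/* Bytes above the soft limit, or 0 when the group is within it. */
static unsigned long long soft_limit_excess(const struct memcg_usage *cg)
{
    return cg->usage > cg->soft_limit ? cg->usage - cg->soft_limit : 0;
}

/*
 * Return the group with the largest soft limit excess, or NULL when no
 * group is over its soft limit. Ties go to the earliest entry.
 */
static const struct memcg_usage *pick_victim(const struct memcg_usage *groups,
                                             size_t n)
{
    const struct memcg_usage *victim = NULL;
    unsigned long long worst = 0;
    size_t i;

    assert(groups != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long long excess = soft_limit_excess(&groups[i]);

        if (excess > worst) {
            worst = excess;
            victim = &groups[i];
        }
    }
    return victim;
}
```

For A (16G used, 15G soft limit), A1 (4G, 5G) and A2 (12G, 10G), the
excesses are 1G, 0 and 2G, so A2 is picked.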
Thanks,
-Kame
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 8:21 ` KAMEZAWA Hiroyuki
@ 2012-04-20 14:17 ` Rik van Riel
2012-04-20 16:56 ` Ying Han
0 siblings, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2012-04-20 14:17 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Ying Han, Johannes Weiner, Michal Hocko, Mel Gorman,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On 04/20/2012 04:21 AM, KAMEZAWA Hiroyuki wrote:
> If you need smart victim selection under hierarchy, please implement
> victim scheduler which choose A2 rather than A and A1. I think you
> can do it.
Ying and I spent a few hours working out exactly how to do
this, a few weeks ago in San Francisco.
She might still have the pictures of all the stuff we drew
on the whiteboard.
--
All rights reversed
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 14:17 ` Rik van Riel
@ 2012-04-20 16:56 ` Ying Han
0 siblings, 0 replies; 25+ messages in thread
From: Ying Han @ 2012-04-20 16:56 UTC (permalink / raw)
To: Rik van Riel
Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Michal Hocko, Mel Gorman,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
[-- Attachment #1: Type: text/plain, Size: 937 bytes --]
On Fri, Apr 20, 2012 at 7:17 AM, Rik van Riel <riel@redhat.com> wrote:
> On 04/20/2012 04:21 AM, KAMEZAWA Hiroyuki wrote:
>
>> If you need smart victim selection under hierarchy, please implement
>> victim scheduler which choose A2 rather than A and A1. I think you
>> can do it.
>
>
> Ying and I spent a few hours working out exactly how to do
> this, a few weeks ago in San Francisco.
>
> She might still have the pictures of all the stuff we drew
> on the whiteboard.
Unfortunately, I do have that on my phone. See the attachment, for
those who might be interested.
Rik, Johannes and I were discussing how to make soft_limit reclaim
smart about picking a memcg, using the same kind of logic we currently
use in get_scan_count() to balance between the file/anon LRUs.
After the groundwork in this patch is done, I do plan to make that
happen. But for now, I'd like to focus on the groundwork as a starting
point.
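The whiteboard design is not spelled out in this mail, so the following
is only an illustration of what a proportional split could look like:
scan pressure is divided among groups in proportion to how far each one
is above its soft limit. split_scan_by_excess() and its parameters are
hypothetical, and this is not how get_scan_count() itself is written:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Divide nr_to_scan among n groups in proportion to excess[i], the
 * amount by which group i is above its soft limit (any consistent
 * unit). Results are written to scan[i]. Returns the total handed out,
 * which can be below nr_to_scan because of integer rounding.
 */
static unsigned long split_scan_by_excess(const unsigned long *excess,
                                          unsigned long *scan, size_t n,
                                          unsigned long nr_to_scan)
{
    unsigned long long total = 0;
    unsigned long assigned = 0;
    size_t i;

    assert(excess != NULL && scan != NULL);
    for (i = 0; i < n; i++)
        total += excess[i];
    for (i = 0; i < n; i++) {
        /* Groups within their soft limit (excess 0) get no pressure. */
        scan[i] = total
            ? (unsigned long)((unsigned long long)nr_to_scan * excess[i] / total)
            : 0;
        assigned += scan[i];
    }
    return assigned;
}
```

Taking the leaf groups of the cover-letter example in units of 1G, the
excesses are A1 = 0, A2 = 2 and B = 6, so 32 pages of scan pressure
would be split as 0, 8 and 24.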
--Ying
>
> --
> All rights reversed
[-- Attachment #2: soft_limit.JPG --]
[-- Type: image/jpeg, Size: 104741 bytes --]
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 7:37 ` Ying Han
2012-04-20 8:21 ` KAMEZAWA Hiroyuki
@ 2012-04-20 13:17 ` Johannes Weiner
2012-04-20 17:44 ` Ying Han
1 sibling, 1 reply; 25+ messages in thread
From: Johannes Weiner @ 2012-04-20 13:17 UTC (permalink / raw)
To: Ying Han
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 12:37:41AM -0700, Ying Han wrote:
> On Thu, Apr 19, 2012 at 3:33 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
> >> On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >> > On Wed 18-04-12 11:00:40, Ying Han wrote:
> >> >> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >> >> > On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
> >> >> >> 2. this patch is slightly different from the last one posted from Johannes
> >> >> >> http://comments.gmane.org/gmane.linux.kernel.mm/72382
> >> >> >> where his patch is closer to the reverted implementation by doing hierarchical
> >> >> >> reclaim for each selected memcg. However, that is not expected behavior from
> >> >> >> user perspective. Considering the following example:
> >> >> >>
> >> >> >> root (32G capacity)
> >> >> >> --> A (hard limit 20G, soft limit 15G, usage 16G)
> >> >> >> --> A1 (soft limit 5G, usage 4G)
> >> >> >> --> A2 (soft limit 10G, usage 12G)
> >> >> >> --> B (hard limit 20G, soft limit 10G, usage 16G)
> >> >> >>
> >> >> >> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
> >> >> >> exceeds softlimit. This is what admin expects by setting softlimit to the
> >> >> >> actual working set size and only reclaim pages under softlimit if system has
> >> >> >> trouble to reclaim.
> >> >> >
> >> >> > Actually, this is exactly what the admin expects when creating a
> >> >> > hierarchy, because she defines that A1 is a child of A and is
> >> >> > responsible for the memory situation in its parent.
> >> >
> >> > Hmm, I guess that both approaches have cons and pros.
> >> > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
> >> > soft limit memcg
> >> > + it is consistent with the hard limit reclaim
> >> Not sure why we want them to be consistent. Soft_limit is serving
> >> different purpose and the one of the main purpose is to preserve the
> >> working set of the cgroup.
> >
> > I'd argue, given the history of cgroups, one of the main purposes is
> > having a machine of containers where you overcommit their hard limit
> > and set the soft limit accordingly to provide fairness.
> >
> > Yes, we don't want to reclaim hierarchies that are below their soft
> > limit as long as there are some in excess, of course. This is a flaw
> > and needs fixing. But it's something completely different than
> > changing how the soft limit is defined and suddenly allow child
> > groups, which you may not trust, to override rules defined by parental
> > groups.
> >
> > It bothers me that we should add something that will almost certainly
> > bite us in the future while we are discussing on the cgroups list what
> > would stand in the way of getting sane hierarchy semantics across
> > controllers to provide consistency, nesting, etc.
>
> I understand the concern here and I don't want the soft_limit reclaim
> to be far away from the other part of the cgroup design down to the
> road. On the other hand, I don't think the current implementation is
> against the hierarchy semantics totally. See the comment below :)
>
> > To support a single use case, which I feel we still have not discussed
> > nearly enough to justify this change.
> >
> > For example, I get that you want 'meta-groups' that group together
> > subgroups for common accounting and hard limiting. But I don't see
> > why such meta-groups have their own processes. Conceptually, I mean,
> > how does a process fit into A? Is it superior to the tasks in A1 and
> > A2? Why can't it live in A3?
>
> For user processes, I can see that is totally feasible to live in A3.
> The case I was thinking is kernel threads, which 1) we don't want to
> limit their memory usage 2) they serve for the whole group unlike
> individual jobs. Of course, we could say that putting those kernel
> thread in A3 and leave the cgroup to unlimited, but not sure if we
> should constrain ourselves not having any processes running under A.
That's just handwaving.
> > So here is a proposal:
> >
> > Would it make sense to try to keep those meta groups always free of
> > their own memory so that they don't /need/ soft limits with weird
> > semantics? E.g. immediately free the unused memory on rmdir, OR add
> > mechanisms to migrate the memory to a dedicated group:
> >
> > A
> > A1 (soft-limited)
> > A2 (soft-limited)
> > B
> > unused (soft-limited)
> >
> > Move all leftover memory from finished jobs to this 'unused' group.
> > You could set its soft limit to 0 so that it sticks around only until
> > you actually need the memory for something else.
> >
> > Then you would get the benefits of accounting and limiting A1 and A2
> > under a single umbrella without the need for a soft limit in A. We
> > could keep the consistent semantics for soft limits, because you would
> > only have to set it on leaf nodes.
> >
> > Wouldn't this work for you?
>
> To be frankly, this sounds a lot of extra work for admin to manage the
> system and we still can not prevent page being landed on A totally.
Why not?
And what extra work are we talking here? As I wrote in the followup
mail: just keep the finished job groups around, set their soft limit
to 0. Surely you have a userspace job scheduler that sets up these
groups in the first place and could be trivially extended to set soft
limits and watch for notifications.
Let me repeat the pros here: no breaking of existing semantics. No
introduction of unprecedented semantics into the cgroup mess. No
changing of kernel code necessary (except what we want to tune
anyway). No computational overhead for you or anyone else.
If your only counter argument to this is that you can't be bothered to
slightly adjust your setup, I'm no longer interested in this
discussion.
> Back to the current proposal, there are two concerns that I can tell by far:
>
> 1. skipping "not trust" cgroup in case it sets its soft_limit very high:
> Here, we don't skip the "not trust" cgroup always. We do reclaim from
> them if not enough progress made from other cgroups above the
> softlimit. So, I don't see a problem here.
When you decide to reclaim from groups below their soft limit.
Which means that an untrusted group can force global reclaim to go for
the workingset in other groups.
> 2. not reclaiming based on hierarchy:
> Here I am not checking the ancestor's soft_limit in
> should_reclaim_mem_cgroup(). And it will only make difference if A is
> under soft_limit and A1 is above soft_limit. Now you do agree that we
> shouldn't reclaim from those under softlimit groups if there are
> cgroup exeed their softlimit. Then it leads me to think something like
> the following:
>
> 1. for priority > DEF_PRIORITY - 3, only reclaim memcg above their softlimit
> 2. for priority <= DEF_PRIORITY - 3, besides 1), also look at memcg's
> ancestor. reclaim memcgs whose ancestor above soft_limit
> 3. for priority == 0, reclaim everything.
>
> Then it has the guarantee of the softlimit at certain level while also
> considers the hierarchy reclaim if the first few rounds doesn't
> fulfill the request.
You expect sane setups to pay the cost of uselessly consulting the res
counters of every existing memcg, twice, on every single reclaim cycle.
Everyone has their agenda and their primary usecase, but this takes
the cake.
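To make the three-step priority proposal quoted above concrete, here is a minimal userspace sketch in Python, not kernel code. The `Memcg` class and the `should_reclaim` function are hypothetical stand-ins (the thread only names `should_reclaim_mem_cgroup()`), and `DEF_PRIORITY = 12` is assumed to match the kernel's definition. The numbers reuse the A/A1/A2/B example from the cover letter of this series.

```python
from dataclasses import dataclass
from typing import Optional

DEF_PRIORITY = 12  # assumed to match the kernel; reclaim priority counts down to 0


@dataclass
class Memcg:
    name: str
    soft_limit: int                     # in GB, for readability
    usage: int                          # in GB
    parent: Optional["Memcg"] = None

    def over_soft_limit(self) -> bool:
        return self.usage > self.soft_limit


def should_reclaim(memcg: Memcg, priority: int) -> bool:
    """Hypothetical model of the proposed filter: may this memcg be scanned?"""
    if priority == 0:
        return True                     # step 3: reclaim everything
    if memcg.over_soft_limit():
        return True                     # step 1: group exceeds its own soft limit
    if priority <= DEF_PRIORITY - 3:
        ancestor = memcg.parent         # step 2: walk up the hierarchy
        while ancestor is not None:
            if ancestor.over_soft_limit():
                return True
            ancestor = ancestor.parent
    return False


# Example hierarchy from the cover letter (root omitted).
A = Memcg("A", soft_limit=15, usage=16)
A1 = Memcg("A1", soft_limit=5, usage=4, parent=A)
A2 = Memcg("A2", soft_limit=10, usage=12, parent=A)
B = Memcg("B", soft_limit=10, usage=16)

for prio in (DEF_PRIORITY, DEF_PRIORITY - 3, 0):
    picked = [g.name for g in (A, A1, A2, B) if should_reclaim(g, prio)]
    print(prio, picked)
```

In this model, step 2 is where the extra cost objected to above comes from: for every memcg that is under its own soft limit, the ancestor walk reads one more counter per level of the hierarchy.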
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 13:17 ` Johannes Weiner
@ 2012-04-20 17:44 ` Ying Han
2012-04-20 18:58 ` Michal Hocko
0 siblings, 1 reply; 25+ messages in thread
From: Ying Han @ 2012-04-20 17:44 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Apr 20, 2012 at 12:37:41AM -0700, Ying Han wrote:
>> On Thu, Apr 19, 2012 at 3:33 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
>> >> On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
>> >> > On Wed 18-04-12 11:00:40, Ying Han wrote:
>> >> >> On Wed, Apr 18, 2012 at 5:24 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> >> >> > On Tue, Apr 17, 2012 at 09:37:46AM -0700, Ying Han wrote:
>> >> >> >> 2. this patch is slightly different from the last one posted from Johannes
>> >> >> >> http://comments.gmane.org/gmane.linux.kernel.mm/72382
>> >> >> >> where his patch is closer to the reverted implementation by doing hierarchical
>> >> >> >> reclaim for each selected memcg. However, that is not expected behavior from
>> >> >> >> user perspective. Considering the following example:
>> >> >> >>
>> >> >> >> root (32G capacity)
>> >> >> >> --> A (hard limit 20G, soft limit 15G, usage 16G)
>> >> >> >> --> A1 (soft limit 5G, usage 4G)
>> >> >> >> --> A2 (soft limit 10G, usage 12G)
>> >> >> >> --> B (hard limit 20G, soft limit 10G, usage 16G)
>> >> >> >>
>> >> >> >> Under global reclaim, we shouldn't add pressure on A1 although its parent(A)
>> >> >> >> exceeds softlimit. This is what admin expects by setting softlimit to the
>> >> >> >> actual working set size and only reclaim pages under softlimit if system has
>> >> >> >> trouble to reclaim.
>> >> >> >
>> >> >> > Actually, this is exactly what the admin expects when creating a
>> >> >> > hierarchy, because she defines that A1 is a child of A and is
>> >> >> > responsible for the memory situation in its parent.
>> >> >
>> >> > Hmm, I guess that both approaches have cons and pros.
>> >> > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
>> >> > soft limit memcg
>> >> > + it is consistent with the hard limit reclaim
>> >> Not sure why we want them to be consistent. Soft_limit is serving
>> >> different purpose and the one of the main purpose is to preserve the
>> >> working set of the cgroup.
>> >
>> > I'd argue, given the history of cgroups, one of the main purposes is
>> > having a machine of containers where you overcommit their hard limit
>> > and set the soft limit accordingly to provide fairness.
>> >
>> > Yes, we don't want to reclaim hierarchies that are below their soft
>> > limit as long as there are some in excess, of course. This is a flaw
>> > and needs fixing. But it's something completely different than
>> > changing how the soft limit is defined and suddenly allow child
>> > groups, which you may not trust, to override rules defined by parental
>> > groups.
>> >
>> > It bothers me that we should add something that will almost certainly
>> > bite us in the future while we are discussing on the cgroups list what
>> > would stand in the way of getting sane hierarchy semantics across
>> > controllers to provide consistency, nesting, etc.
>>
>> I understand the concern here and I don't want the soft_limit reclaim
>> to be far away from the other part of the cgroup design down to the
>> road. On the other hand, I don't think the current implementation is
>> against the hierarchy semantics totally. See the comment below :)
>>
>> > To support a single use case, which I feel we still have not discussed
>> > nearly enough to justify this change.
>> >
>> > For example, I get that you want 'meta-groups' that group together
>> > subgroups for common accounting and hard limiting. But I don't see
>> > why such meta-groups have their own processes. Conceptually, I mean,
>> > how does a process fit into A? Is it superior to the tasks in A1 and
>> > A2? Why can't it live in A3?
>>
>> For user processes, I can see that is totally feasible to live in A3.
>> The case I was thinking is kernel threads, which 1) we don't want to
>> limit their memory usage 2) they serve for the whole group unlike
>> individual jobs. Of course, we could say that putting those kernel
>> thread in A3 and leave the cgroup to unlimited, but not sure if we
>> should constrain ourselves not having any processes running under A.
>
> That's just handwaving.
>
>> > So here is a proposal:
>> >
>> > Would it make sense to try to keep those meta groups always free of
>> > their own memory so that they don't /need/ soft limits with weird
>> > semantics? E.g. immediately free the unused memory on rmdir, OR add
>> > mechanisms to migrate the memory to a dedicated group:
>> >
>> > A
>> > A1 (soft-limited)
>> > A2 (soft-limited)
>> > B
>> > unused (soft-limited)
>> >
>> > Move all leftover memory from finished jobs to this 'unused' group.
>> > You could set its soft limit to 0 so that it sticks around only until
>> > you actually need the memory for something else.
>> >
>> > Then you would get the benefits of accounting and limiting A1 and A2
>> > under a single umbrella without the need for a soft limit in A. We
>> > could keep the consistent semantics for soft limits, because you would
>> > only have to set it on leaf nodes.
>> >
>> > Wouldn't this work for you?
>>
>> To be frankly, this sounds a lot of extra work for admin to manage the
>> system and we still can not prevent page being landed on A totally.
>
> Why not?
>
> And what extra work are we talking here? As I wrote in the followup
> mail: just keep the finished job groups around, set their soft limit
> to 0. Surely you have a userspace job scheduler that sets up these
> groups in the first place and could be trivially extended to set soft
> limits and watch for notifications.
>
> Let me repeat the pros here: no breaking of existing semantics. No
> introduction of unprecedented semantics into the cgroup mess. No
> changing of kernel code necessary (except what we want to tune
> anyway). No computational overhead for you or anyone else.
>
> If your only counter argument to this is that you can't be bothered to
> slightly adjust your setup, I'm no longer interested in this
> discussion.
Before going further, I want to make sure there is no miscommunication
here. As I replied to Michal, I feel that we are mixing up global
reclaim and target reclaim policy here.
The way global reclaim works today is to scan all the mem cgroups to
fulfill the overall scan target per zone, and there is no bottom-up
lookup. My patch currently adds the softlimit reclaim under global
reclaim, and the difference is the filtering.
Is the hierarchical soft_limit reclaim we are discussing here meant for
target reclaim?
--Ying
>
>> Back to the current proposal, there are two concerns that I can tell by far:
>>
>> 1. skipping "not trust" cgroup in case it sets its soft_limit very high:
>> Here, we don't skip the "not trust" cgroup always. We do reclaim from
>> them if not enough progress made from other cgroups above the
>> softlimit. So, I don't see a problem here.
>
> When you decide to reclaim from groups below their soft limit.
>
> Which means that an untrusted group can force global reclaim to go for
> the workingset in other groups.
>
>> 2. not reclaiming based on hierarchy:
>> Here I am not checking the ancestor's soft_limit in
>> should_reclaim_mem_cgroup(). And it will only make difference if A is
>> under soft_limit and A1 is above soft_limit. Now you do agree that we
>> shouldn't reclaim from those under softlimit groups if there are
>> cgroup exeed their softlimit. Then it leads me to think something like
>> the following:
>>
>> 1. for priority > DEF_PRIORITY - 3, only reclaim memcg above their softlimit
>> 2. for priority <= DEF_PRIORITY - 3, besides 1), also look at memcg's
>> ancestor. reclaim memcgs whose ancestor above soft_limit
>> 3. for priority == 0, reclaim everything.
>>
>> Then it has the guarantee of the softlimit at certain level while also
>> considers the hierarchy reclaim if the first few rounds doesn't
>> fulfill the request.
>
> You expect sane setups to pay the cost of uselessly consulting the res
> counters of every existing memcg, twice, on every single reclaim cycle.
>
> Everyone has their agenda and their primary usecase, but this takes
> the cake.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 17:44 ` Ying Han
@ 2012-04-20 18:58 ` Michal Hocko
2012-04-20 22:50 ` Ying Han
2012-04-20 23:29 ` Johannes Weiner
0 siblings, 2 replies; 25+ messages in thread
From: Michal Hocko @ 2012-04-20 18:58 UTC (permalink / raw)
To: Ying Han
Cc: Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri 20-04-12 10:44:14, Ying Han wrote:
> On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > Let me repeat the pros here: no breaking of existing semantics. No
> > introduction of unprecedented semantics into the cgroup mess. No
> > changing of kernel code necessary (except what we want to tune
> > anyway). No computational overhead for you or anyone else.
>
> >
> > If your only counter argument to this is that you can't be bothered to
> > slightly adjust your setup, I'm no longer interested in this
> > discussion.
>
> Before going further, I wanna make sure there is no mis-communication
> here. As I replied to Michal, I feel that we are mixing up global
> reclaim and target reclaim policy here.
I was referring to the global reclaim and my understanding is that
Johannes did the same when talking about soft reclaim (even though it
makes some sense to apply the same rules to the hard limit reclaim as
well - but later to that one...)
The primary question is whether soft reclaim should be hierarchical or
not. That is what I've tried to express in other email earlier in this
thread where I've tried (very briefly) to compare those approaches.
It currently _is_ hierarchical and your patch changes that, so we have to
be sure that this change in semantics is reasonable. The only workload
that you seem to consider is one where you have full control over the
machine, while Johannes is concerned about containers which might misuse
your approach to push out the working sets of their competitors...
My concern with the hierarchical approach is that it doesn't play well with
a default of 0 (which is needed if we want to make the soft limit a
guarantee, right?). I do agree with Johannes about the potential misuse
though. So it seems that both approaches have serious issues with
configurability.
Does this summary clarify the issue a bit? Or am I confused as well ;)
I am more inclined towards selective soft reclaim, making configuration
the admin's responsibility (if you want some guarantee, the admin has to
approve that and set it for you). This, however, doesn't enable the
self-ballooning use case, but I am not entirely sure that would work
without global (admin) cooperation.
> The way global reclaim works today is to scan all the mem cgroups to
> fulfill the overall scan target per zone, and there is no bottom up
> look up.
Bottom-up was just an idea with nothing concrete behind it, so let's put it
aside for now.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 18:58 ` Michal Hocko
@ 2012-04-20 22:50 ` Ying Han
2012-04-20 22:56 ` Rik van Riel
2012-04-21 0:19 ` Johannes Weiner
2012-04-20 23:29 ` Johannes Weiner
1 sibling, 2 replies; 25+ messages in thread
From: Ying Han @ 2012-04-20 22:50 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 11:58 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 20-04-12 10:44:14, Ying Han wrote:
>> On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > Let me repeat the pros here: no breaking of existing semantics. No
>> > introduction of unprecedented semantics into the cgroup mess. No
>> > changing of kernel code necessary (except what we want to tune
>> > anyway). No computational overhead for you or anyone else.
>>
>> >
>> > If your only counter argument to this is that you can't be bothered to
>> > slightly adjust your setup, I'm no longer interested in this
>> > discussion.
>>
>> Before going further, I wanna make sure there is no mis-communication
>> here. As I replied to Michal, I feel that we are mixing up global
>> reclaim and target reclaim policy here.
>
> I was referring to the global reclaim and my understanding is that
> Johannes did the same when talking about soft reclaim (even though it
> makes some sense to apply the same rules to the hard limit reclaim as
> well - but later to that one...)
>
> The primary question is whether soft reclaim should be hierarchical or
> not. That is what I've tried to express in other email earlier in this
> thread where I've tried (very briefly) to compare those approaches.
> It currently _is_ hierarchical and your patch changes that so we have to
> be sure that this change in semantic is reasonable.
Yes, after reading the other thread I suddenly realized what you
guys are talking about.
> The only workload
> that you seem to consider is when you have a full control over the
> machine while Johannes is considered about containers which might misuse
> your approach to push out working sets of concurrency...
> My concern with hierarchical approach is that it doesn't play well with
> 0 default (which is needed if we want to make soft limit a guarantee,
> right?). I do agree with Johannes about the potential misuse though. So
> it seems that both approaches have serious issues with configurability.
> Does this summary clarify the issue a bit? Or I am confused as well ;)
Thank you for the good summary and now we are on the same page :)
Regarding the misuse case, here I am going to lay out the ground rule for
setting up soft_limit:
"
Never over-commit the system by softlimit.
"
Consider the following:
root (32G, use_hierarchy = 1)
-- A (soft: 16G, usage 22G)
   -- A1 (soft: 10G, usage 17G)
   -- A2 (soft: 6G, usage 5G)
-- B (soft: 16G, usage 10G)
1) sum_of_softlimit(A + B) <= machine capacity
2) sum_of_softlimit(A1 + A2) <= softlimit(A)
So we have both A and A1 above their softlimit. If we follow the ground rule
when setting up the softlimit, we can be confident in saying that "If A is
above its softlimit, there must be cgroups under A which are also above
their softlimit". We can still leave the priority check there in case all
the pages from A1 are hard to reclaim, and only then will we look into
A2.
I think it is reasonable to lay this out up front, otherwise we cannot
get all the misuse cases right. And if we follow that route, lots of
things will become clear.
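As a rough illustration of the ground rule, the following Python sketch checks the two inequalities for every level of a tree. The `Group` class and the `softlimit_violations` helper are hypothetical names introduced here, not part of any kernel or cgroup interface; sizes are in GB and taken from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    name: str
    soft_limit_gb: int
    children: List["Group"] = field(default_factory=list)


def softlimit_violations(capacity_gb: int, top_level: List[Group]) -> List[str]:
    """Return the scopes whose children's soft limits add up to more than
    the scope itself allows ("root" stands for the machine capacity)."""
    violations = []
    # Rule 1: the top-level soft limits must fit into the machine.
    if sum(g.soft_limit_gb for g in top_level) > capacity_gb:
        violations.append("root")
    # Rule 2: the children's soft limits must fit into the parent's soft limit.
    pending = list(top_level)
    while pending:
        group = pending.pop()
        if group.children:
            if sum(c.soft_limit_gb for c in group.children) > group.soft_limit_gb:
                violations.append(group.name)
            pending.extend(group.children)
    return violations


tree = [
    Group("A", 16, [Group("A1", 10), Group("A2", 6)]),
    Group("B", 16),
]
print(softlimit_violations(32, tree))
```

With 32G of capacity the example tree satisfies both rules. The same tree would violate rule 1 if the capacity passed in were smaller than 32G, which is why the rule only holds as long as the soft limits are kept in sync with the actual machine size.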
--Ying
>
> I am more inclined towards selective soft reclaim and make configuration
> admin's responsibility (if you want some guarantee, admin has to approve
> that and set it for you).
> This, however, doesn't enable self-ballooning
> use case but I am not entirely sure this would work without a global
> (admin) cooperation.
>
>> The way global reclaim works today is to scan all the mem cgroups to
>> fulfill the overall scan target per zone, and there is no bottom up
>> look up.
>
> bottom up was just an idea without anything in hands so let's put it
> aside for now.
>
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 22:50 ` Ying Han
@ 2012-04-20 22:56 ` Rik van Riel
2012-04-20 23:14 ` Ying Han
2012-04-21 0:19 ` Johannes Weiner
1 sibling, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2012-04-20 22:56 UTC (permalink / raw)
To: Ying Han
Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On 04/20/2012 06:50 PM, Ying Han wrote:
> Regarding the misuse case, here I am gonna layout the ground rule for
> setting up soft_limit:
>
> "
> Never over-commit the system by softlimit.
> "
> I think it is reasonable to layout this upfront, otherwise we can not
> make all the misuse cases right. And if we follow that route, lots of
> things will become clear.
While that rule looks reasonable at first glance, I do not
believe it is possible to follow it in practice.
One reason is memory resizing through ballooning in virtual
machines. It is possible for the "physical" memory size to
shrink below the sum of the softlimits.
Another reason is memory zones and NUMA. It is possible for
one memory zone (or NUMA node) to only have cgroups that
are under their soft limit.
If this happens to be the one memory zone we can allocate
network buffers from, we could deadlock the system if we
refused to reclaim pages from a cgroup under its limit.
--
All rights reversed
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 22:56 ` Rik van Riel
@ 2012-04-20 23:14 ` Ying Han
0 siblings, 0 replies; 25+ messages in thread
From: Ying Han @ 2012-04-20 23:14 UTC (permalink / raw)
To: Rik van Riel
Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 3:56 PM, Rik van Riel <riel@redhat.com> wrote:
> On 04/20/2012 06:50 PM, Ying Han wrote:
>
>> Regarding the misuse case, here I am gonna layout the ground rule for
>> setting up soft_limit:
>>
>> "
>> Never over-commit the system by softlimit.
>> "
>
>
>> I think it is reasonable to layout this upfront, otherwise we can not
>> make all the misuse cases right. And if we follow that route, lots of
>> things will become clear.
>
>
> While that rule looks reasonable at first glance, I do not
> believe it is possible to follow it in practice.
>
> One reason is memory resizing through ballooning in virtual
> machines. It is possible for the "physical" memory size to
> shrink below the sum of the softlimits.
Hmm, can you give more details on that? I assume the soft_limit should
be adjusted at run-time based on the memory usage and, in your case,
the "physical" memory size.
This is different from hard_limit, which we can over-commit by setting it
once and living with it.
>
> Another reason is memory zones and NUMA. It is possible for
> one memory zone (or NUMA node) to only have cgroups that
> are under their soft limit.
>
> If this happens to be the one memory zone we can allocate
> network buffers from, we could deadlock the system if we
> refused to reclaim pages from a cgroup under its limit.
Yes, that is the problem we talked about during LSF. Having a
"per-memcg-per-zone softlimit" sounds too complicated and not
practical at all. To deal with that, my current patch identifies
the situation during the first round of scanning, and then ignores the
soft_limit if that is the case.
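The fallback described here can be pictured with a small Python model. This is only a sketch under assumptions: `ZoneGroup` and `reclaim_candidates` are hypothetical names, the soft limit is compared against the memcg's total usage (it is not tracked per zone), and the "first round" is reduced to a single filtering pass.

```python
from typing import List, NamedTuple


class ZoneGroup(NamedTuple):
    name: str
    usage_gb: int        # total usage of the memcg
    soft_limit_gb: int   # soft limit of the memcg (not per zone)
    zone_pages: int      # pages this memcg holds in the zone being reclaimed


def reclaim_candidates(groups: List[ZoneGroup]) -> List[str]:
    """First pass: memcgs over their soft limit that have pages in this zone.
    Fallback: if there are none, ignore the soft limit for this zone."""
    present = [g for g in groups if g.zone_pages > 0]
    over = [g.name for g in present if g.usage_gb > g.soft_limit_gb]
    if over:
        return over
    return [g.name for g in present]


# A zone that only holds pages of memcgs below their soft limit.
starved_zone = [
    ZoneGroup("A1", usage_gb=4, soft_limit_gb=5, zone_pages=1000),
    ZoneGroup("B", usage_gb=16, soft_limit_gb=10, zone_pages=0),
]
print(reclaim_candidates(starved_zone))
```

In the starved zone, B is over its soft limit but has no pages there, so the model falls back to A1 rather than refusing to reclaim at all.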
--Ying
>
> --
> All rights reversed
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 22:50 ` Ying Han
2012-04-20 22:56 ` Rik van Riel
@ 2012-04-21 0:19 ` Johannes Weiner
2012-04-21 0:48 ` Johannes Weiner
1 sibling, 1 reply; 25+ messages in thread
From: Johannes Weiner @ 2012-04-21 0:19 UTC (permalink / raw)
To: Ying Han
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 03:50:28PM -0700, Ying Han wrote:
> On Fri, Apr 20, 2012 at 11:58 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > On Fri 20-04-12 10:44:14, Ying Han wrote:
> >> On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >> > Let me repeat the pros here: no breaking of existing semantics. No
> >> > introduction of unprecedented semantics into the cgroup mess. No
> >> > changing of kernel code necessary (except what we want to tune
> >> > anyway). No computational overhead for you or anyone else.
> >>
> >> >
> >> > If your only counter argument to this is that you can't be bothered to
> >> > slightly adjust your setup, I'm no longer interested in this
> >> > discussion.
> >>
> >> Before going further, I wanna make sure there is no mis-communication
> >> here. As I replied to Michal, I feel that we are mixing up global
> >> reclaim and target reclaim policy here.
> >
> > I was referring to the global reclaim and my understanding is that
> > Johannes did the same when talking about soft reclaim (even though it
> > makes some sense to apply the same rules to the hard limit reclaim as
> > well - but later to that one...)
> >
> > The primary question is whether soft reclaim should be hierarchical or
> > not. That is what I've tried to express in other email earlier in this
> > thread where I've tried (very briefly) to compare those approaches.
> > It currently _is_ hierarchical and your patch changes that so we have to
> > be sure that this change in semantic is reasonable.
>
> Yes, after reading the other thread and I suddenly realized what you
> guys are talking about.
>
> The only workload
> > that you seem to consider is when you have a full control over the
> > machine while Johannes is considered about containers which might misuse
> > your approach to push out working sets of concurrency...
> > My concern with hierarchical approach is that it doesn't play well with
> > 0 default (which is needed if we want to make soft limit a guarantee,
> > right?). I do agree with Johannes about the potential misuse though. So
> > it seems that both approaches have serious issues with configurability.
> > Does this summary clarify the issue a bit? Or I am confused as well ;)
>
> Thank you for the good summary and now we are on the same page :)
>
> Regarding the misuse case, here I am gonna layout the ground rule for
> setting up soft_limit:
>
> "
> Never over-commit the system by softlimit.
> "
Which proves that we are not on the same page at all :-(
It's not about dealing with rare, non-sensical setups, it's about
suddenly trusting children to do the right thing.
And it's about suddenly REQUIRING all children to cooperate even for
the reasonable configuration case, instead of just having soft limits
apply hierarchically.
Meanwhile, you STILL haven't provided an argument why you couldn't
just fix your cgroup tree organization to make sense for the semantics
you require instead of pushing for such a bogus change.
It's like you're trying to redefine multiplication because you
accidentally used * instead of + in your equation.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-21 0:19 ` Johannes Weiner
@ 2012-04-21 0:48 ` Johannes Weiner
2012-04-23 22:19 ` Ying Han
0 siblings, 1 reply; 25+ messages in thread
From: Johannes Weiner @ 2012-04-21 0:48 UTC (permalink / raw)
To: Ying Han
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Sat, Apr 21, 2012 at 02:19:14AM +0200, Johannes Weiner wrote:
> It's like you're trying to redefine multiplication because you
> accidentally used * instead of + in your equation.
You could for example do this:
-> A (hard limit = 16G)
-> A1 (hard limit = 10G)
-> A2 (hard limit = 6G)
and say the same: you want to account A, A1, and A2 under the same
umbrella, so you want the same hierarchy. And you want to limit the
memory in A (from finished jobs and tasks running directly in A), but
this limit should NOT apply to A1 and A2 when they have not reached
THEIR respective limits.
You can apply all your current arguments to this same case. And yet,
you say hierarchical hard limits make sense while hierarchical soft
limits don't. I hope this example makes it clear why this is not true
at all.
We have cases where we want the hierarchical limits. Both hard limits
and soft limits. You can easily fix your setup without taking away
this power from everyone else or introducing inconsistency. Your
whole problem stems from a simple misconfiguration.
The solution to both cases is this: don't stick memory in these meta
groups and complain that their hierarchical limits apply to their
children.
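The hard limit analogy can be sketched as a toy model of hierarchical charging in Python. The `Group` class and `try_charge` method are hypothetical, sizes are in GB, and the model simply refuses a charge at the point where the kernel would instead start reclaiming from the subtree of the group whose limit was hit.

```python
from typing import Optional


class Group:
    def __init__(self, name: str, hard_limit: int, parent: Optional["Group"] = None):
        self.name = name
        self.hard_limit = hard_limit
        self.parent = parent
        self.usage = 0

    def try_charge(self, amount: int) -> Optional[str]:
        """Charge this group and all its ancestors. Returns None on success,
        or the name of the group whose hard limit blocked the charge."""
        chain = []
        group = self
        while group is not None:
            if group.usage + amount > group.hard_limit:
                return group.name       # nothing has been charged yet
            chain.append(group)
            group = group.parent
        for group in chain:
            group.usage += amount
        return None


A = Group("A", hard_limit=16)
A1 = Group("A1", hard_limit=10, parent=A)
A2 = Group("A2", hard_limit=6, parent=A)

A.try_charge(4)                # memory sitting directly in A
A1.try_charge(8)               # A1 stays below its own 10G limit
blocked_by = A2.try_charge(5)  # A2 would stay below 6G, but A would exceed 16G
print(blocked_by)
```

The last charge is stopped by A's limit even though A2 is below its own, which is the hierarchical behavior the example above asks the reader to compare with soft limits.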
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-21 0:48 ` Johannes Weiner
@ 2012-04-23 22:19 ` Ying Han
0 siblings, 0 replies; 25+ messages in thread
From: Ying Han @ 2012-04-23 22:19 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 5:48 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Sat, Apr 21, 2012 at 02:19:14AM +0200, Johannes Weiner wrote:
>> It's like you're trying to redefine multiplication because you
>> accidentally used * instead of + in your equation.
>
> You could for example do this:
>
> -> A (hard limit = 16G)
> -> A1 (hard limit = 10G)
> -> A2 (hard limit = 6G)
>
> and say the same: you want to account A, A1, and A2 under the same
> umbrella, so you want the same hierarchy. And you want to limit the
> memory in A (from finished jobs and tasks running directly in A), but
> this limit should NOT apply to A1 and A2 when they have not reached
> THEIR respective limits.
>
> You can apply all your current arguments to this same case. And yet,
> you say hierarchical hard limits make sense while hierarchical soft
> limits don't. I hope this example makes it clear why this is not true
> at all.
I understand the example above, in which the pressure from A goes down to
A1 and A2 although neither of them has reached its hard_limit.
I am not against doing similar hierarchical reclaim on soft_limit, as
long as it solves the problem which the soft_limit is targeted
at. The admin sets up the soft_limit to preserve the working set of
each cgroup, which means that reclaiming below the soft_limit could hurt
the application's performance. I assume that expectation is slightly
different from hard_limit, and that's why we have two APIs instead of
one.
>
> We have cases where we want the hierarchical limits. Both hard limits
> and soft limits. You can easily fix your setup without taking away
> this power from everyone else or introducing inconsistency. Your
> whole problem stems from a simple misconfiguration.
Consider the following example:
A
-- A1
-- A2
There are three possibilities for how the soft_limit can be set.
Here I use X to represent the pages that are on A's own LRU only
(re-parented, or from processes running directly under A) and that the
admin wants to preserve.
1. soft_limit(A) == soft_limit(A1) + soft_limit(A2) + X
// only reclaiming from A2 will bring the usage_in_bytes of A under
its soft_limit.
A (soft_limit == 31G, X=1G, usage_in_bytes = 35G)
-- A1 (soft_limit == 15G, usage_in_bytes = 14G)
-- A2 (soft_limit == 15G, usage_in_bytes = 20G)
2. soft_limit(A) > soft_limit(A1) + soft_limit(A2) + X
//only reclaiming from A2 and it is ok.
A (soft_limit == 40G, X=1G, usage_in_bytes = 35G)
-- A1 (soft_limit == 15G, usage_in_bytes = 14G)
-- A2 (soft_limit == 15G, usage_in_bytes = 20G)
3. soft_limit (A) < soft_limit(A1) + soft_limit(A2) + X
//only reclaiming from A2 doesn't help and we have to reclaim both A1 and A2.
A (soft_limit == 31G, X=1G, usage_in_bytes = 35G)
-- A1 (soft_limit == 100G, usage_in_bytes = 14G)
-- A2 (soft_limit == 15G, usage_in_bytes = 20G)
If I understand correctly, case 3 is where my patch behaves
differently from yours: my patch won't reclaim from A1, while yours
will.
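To make the difference in case 3 concrete, here is a toy model in
Python. It is only a sketch, not kernel code: the Group class and the
two eligibility functions are hypothetical names, usage is taken to be
the hierarchical usage_in_bytes, and "hierarchical" is assumed to mean
that a group becomes a reclaim candidate when it or any of its
ancestors is above its soft_limit.

```python
from dataclasses import dataclass
from typing import Optional

G = 1 << 30  # one GiB in bytes


@dataclass
class Group:
    name: str
    soft_limit: int                   # bytes
    usage: int                        # hierarchical usage in bytes
    parent: Optional["Group"] = None

    def over_soft_limit(self) -> bool:
        return self.usage > self.soft_limit


def eligible_local(group: Group) -> bool:
    """Non-hierarchical model: only the group's own soft limit counts."""
    return group.over_soft_limit()


def eligible_hierarchical(group: Group) -> bool:
    """Hierarchical model: the group or any ancestor is over its soft limit."""
    node = group
    while node is not None:
        if node.over_soft_limit():
            return True
        node = node.parent
    return False


# Case 3 from the example above.
a = Group("A", 31 * G, 35 * G)
a1 = Group("A1", 100 * G, 14 * G, parent=a)
a2 = Group("A2", 15 * G, 20 * G, parent=a)
groups = [a, a1, a2]

local = [g.name for g in groups if eligible_local(g)]
hier = [g.name for g in groups if eligible_hierarchical(g)]
print(local)  # ['A', 'A2']
print(hier)   # ['A', 'A1', 'A2']
```

Under these assumptions the only difference between the two models in
case 3 is A1, which stays below its own soft_limit.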
AFAIK, in most cases (if not all), case 1 is what an admin would
adopt, and that is what I've been trying to make work. On the other
hand, I agree with you that we shouldn't constrain ourselves to
supporting only one configuration. But here are my questions:
1. Do you agree that case 1 is the configuration that makes the most
sense for an admin?
2. If the answer to 1) is yes, do you agree that your proposal doesn't
work well with the admin's expectation?
Meanwhile, I haven't figured out whether case 3 would be a widely
adopted configuration. But let me guess why it would be configured
like this:
a) The admin wants to guarantee no reclaim of pages in A1?
   If so, my patch works as expected.
b) Misconfiguration?
   If so, my patch doesn't work as expected. But since it is a
   misconfiguration, there is really no expectation; what we need
   instead is to not break the system.
Overall, I would like to make sure the most popular use case works,
while at the same time not breaking the system when it is
misconfigured. Hopefully this makes sense to you :)
--Ying
>
> The solution to both cases is this: don't stick memory in these meta
> groups and complain that their hierarchical limits apply to their
> children.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 18:58 ` Michal Hocko
2012-04-20 22:50 ` Ying Han
@ 2012-04-20 23:29 ` Johannes Weiner
2012-04-23 13:59 ` Michal Hocko
1 sibling, 1 reply; 25+ messages in thread
From: Johannes Weiner @ 2012-04-20 23:29 UTC (permalink / raw)
To: Michal Hocko
Cc: Ying Han, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri, Apr 20, 2012 at 08:58:47PM +0200, Michal Hocko wrote:
> On Fri 20-04-12 10:44:14, Ying Han wrote:
> > On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > Let me repeat the pros here: no breaking of existing semantics. No
> > > introduction of unprecedented semantics into the cgroup mess. No
> > > changing of kernel code necessary (except what we want to tune
> > > anyway). No computational overhead for you or anyone else.
> >
> > >
> > > If your only counter argument to this is that you can't be bothered to
> > > slightly adjust your setup, I'm no longer interested in this
> > > discussion.
> >
> > Before going further, I wanna make sure there is no mis-communication
> > here. As I replied to Michal, I feel that we are mixing up global
> > reclaim and target reclaim policy here.
>
> I was referring to the global reclaim and my understanding is that
> Johannes did the same when talking about soft reclaim (even though it
> makes some sense to apply the same rules to the hard limit reclaim as
> well - but later to that one...)
>
> The primary question is whether soft reclaim should be hierarchical or
> not. That is what I've tried to express in other email earlier in this
> thread where I've tried (very briefly) to compare those approaches.
> It currently _is_ hierarchical and your patch changes that so we have to
> be sure that this change in semantic is reasonable. The only workload
> that you seem to consider is when you have a full control over the
> machine while Johannes is concerned about containers which might misuse
> your approach to push out working sets of concurrency...
> My concern with hierarchical approach is that it doesn't play well with
> 0 default (which is needed if we want to make soft limit a guarantee,
> right?). I do agree with Johannes about the potential misuse though. So
> it seems that both approaches have serious issues with configurability.
> Does this summary clarify the issue a bit? Or I am confused as well ;)
Thanks for the nice summary!
A note on the default hierarchical soft limit:
Consider making the default not 0, but a special value. We want
it to mean 'no guarantee' and 'every byte is in excess of the soft
limit', to keep the existing behaviour. But at the same time, we
wouldn't have to make it inheritable:
A (soft = default)
  A1 (soft = 10G)
  A2 (soft = 12G)
So in case of global reclaim, A itself would be eligible, but the
default would not apply hierarchically to A1 and A2. They would still
only get reclaimed if their usage were above their respective soft
limits.
Only if you set A's soft limit to 0 or higher will it apply
hierarchically, so that if a parent declares 'no guarantee', no child
is able to override it.
Maybe we can keep -1/~0UL and just treat it a bit differently.
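A minimal sketch of how such a special default could behave, written
as a toy Python model rather than kernel code. SOFT_DEFAULT, Group and
eligible() are hypothetical names, the usage numbers are made up for
illustration, and the sentinel is assumed to stand in for whatever
value (-1/~0UL or something else) ends up marking 'not configured'.

```python
from dataclasses import dataclass
from typing import Optional

G = 1 << 30          # one GiB in bytes
SOFT_DEFAULT = None  # hypothetical sentinel for "soft limit not configured"


@dataclass
class Group:
    name: str
    soft_limit: Optional[int]         # SOFT_DEFAULT or an explicit byte count
    usage: int                        # hierarchical usage in bytes
    parent: Optional["Group"] = None


def eligible(group: Group) -> bool:
    """Return True if global reclaim may take pages from this group."""
    # Explicitly configured ancestors apply hierarchically; default ones do not.
    node = group.parent
    while node is not None:
        if node.soft_limit is not SOFT_DEFAULT and node.usage > node.soft_limit:
            return True
        node = node.parent
    # For the group itself, the default means every byte is in excess.
    own_limit = 0 if group.soft_limit is SOFT_DEFAULT else group.soft_limit
    return group.usage > own_limit


a = Group("A", SOFT_DEFAULT, 25 * G)
a1 = Group("A1", 10 * G, 8 * G, parent=a)
a2 = Group("A2", 12 * G, 15 * G, parent=a)

with_default = {g.name: eligible(g) for g in (a, a1, a2)}
print(with_default)  # {'A': True, 'A1': False, 'A2': True}

a.soft_limit = 0     # the parent explicitly declares 'no guarantee'
with_zero = {g.name: eligible(g) for g in (a, a1, a2)}
print(with_zero)     # {'A': True, 'A1': True, 'A2': True}
```

In this model the default keeps A itself reclaimable without dragging
A1 below its 10G soft limit, while an explicit 0 on A overrides both
children.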
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-20 23:29 ` Johannes Weiner
@ 2012-04-23 13:59 ` Michal Hocko
0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2012-04-23 13:59 UTC (permalink / raw)
To: Johannes Weiner
Cc: Ying Han, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Sat 21-04-12 01:29:09, Johannes Weiner wrote:
> On Fri, Apr 20, 2012 at 08:58:47PM +0200, Michal Hocko wrote:
> > On Fri 20-04-12 10:44:14, Ying Han wrote:
> > > On Fri, Apr 20, 2012 at 6:17 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > Let me repeat the pros here: no breaking of existing semantics. No
> > > > introduction of unprecedented semantics into the cgroup mess. No
> > > > changing of kernel code necessary (except what we want to tune
> > > > anyway). No computational overhead for you or anyone else.
> > >
> > > >
> > > > If your only counter argument to this is that you can't be bothered to
> > > > slightly adjust your setup, I'm no longer interested in this
> > > > discussion.
> > >
> > > Before going further, I wanna make sure there is no mis-communication
> > > here. As I replied to Michal, I feel that we are mixing up global
> > > reclaim and target reclaim policy here.
> >
> > I was referring to the global reclaim and my understanding is that
> > Johannes did the same when talking about soft reclaim (even though it
> > makes some sense to apply the same rules to the hard limit reclaim as
> > well - but later to that one...)
> >
> > The primary question is whether soft reclaim should be hierarchical or
> > not. That is what I've tried to express in other email earlier in this
> > thread where I've tried (very briefly) to compare those approaches.
> > It currently _is_ hierarchical and your patch changes that so we have to
> > be sure that this change in semantic is reasonable. The only workload
> > that you seem to consider is when you have a full control over the
> > machine while Johannes is concerned about containers which might misuse
> > your approach to push out working sets of concurrency...
> > My concern with hierarchical approach is that it doesn't play well with
> > 0 default (which is needed if we want to make soft limit a guarantee,
> > right?). I do agree with Johannes about the potential misuse though. So
> > it seems that both approaches have serious issues with configurability.
> > Does this summary clarify the issue a bit? Or I am confused as well ;)
>
> Thanks for the nice summary!
>
> A note on the default hierarchical soft limit:
>
> Consider not making the default to be 0, but a special value. We want
> it to mean 'no guarantee' and 'every byte is in excess of the soft
> limit', to keep the existing behaviour. But at the same time, we
> wouldn't have to make it inheritable:
>
> A (soft = default)
> A1 (soft = 10G)
> A2 (soft = 12G)
>
> so in case of global reclaim, A itself would be eligible, but it would
> not apply hierarchically to A1 and A2. They would still only get
> reclaimed if their usage would be above their respective soft limits.
> Only if you set A's soft limit to 0 or higher it will apply
> hierarchically, so that if a parent declares 'no guarantee', no child
> is able to override it.
I was thinking about a special value for the local reclaim as well, but
I didn't like it much: it would then no longer be just a limit value
but also a switch between hierarchical and non-hierarchical reclaim,
so an API of some sort. So I am really not sure about it and would
rather go a different way, if there is one...
> Maybe we can keep -1/~0UL and just treat it a bit differently.
If this is the only way to go, I would rather see 0 as the special
value; it would make life easier and it also makes more sense to me.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH V3 0/2] memcg softlimit reclaim rework
2012-04-19 22:33 ` Johannes Weiner
2012-04-19 22:51 ` Johannes Weiner
2012-04-20 7:37 ` Ying Han
@ 2012-04-20 8:28 ` Michal Hocko
2 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2012-04-20 8:28 UTC (permalink / raw)
To: Johannes Weiner
Cc: Ying Han, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
linux-mm
On Fri 20-04-12 00:33:18, Johannes Weiner wrote:
> On Thu, Apr 19, 2012 at 10:47:27AM -0700, Ying Han wrote:
> > On Thu, Apr 19, 2012 at 10:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
[...]
> > > Hmm, I guess that both approaches have cons and pros.
> > > * Hierarchical soft limit reclaim - reclaim the whole subtree of the over
> > > soft limit memcg
> > > + it is consistent with the hard limit reclaim
> > Not sure why we want them to be consistent. Soft_limit is serving
> > different purpose and the one of the main purpose is to preserve the
> > working set of the cgroup.
>
> I'd argue, given the history of cgroups, one of the main purposes is
> having a machine of containers where you overcommit their hard limit
> and set the soft limit accordingly to provide fairness.
>
> Yes, we don't want to reclaim hierarchies that are below their soft
> limit as long as there are some in excess, of course. This is a flaw
> and needs fixing. But it's something completely different than
> changing how the soft limit is defined and suddenly allow child
> groups, which you may not trust, to override rules defined by parental
> groups.
As I wrote in the other email: who is allowed to set the limit? The
owner of the container? If yes, then how is the admin supposed to set
the top limit for the container? The default (0) will not work, right?
>
> It bothers me that we should add something that will almost certainly
> bite us in the future while we are discussing on the cgroups list what
> would stand in the way of getting sane hierarchy semantics across
> controllers to provide consistency, nesting, etc.
>
> To support a single use case, which I feel we still have not discussed
> nearly enough to justify this change.
>
> For example, I get that you want 'meta-groups' that group together
> subgroups for common accounting and hard limiting. But I don't see
> why such meta-groups have their own processes. Conceptually, I mean,
> how does a process fit into A? Is it superior to the tasks in A1 and
> A2? Why can't it live in A3?
That was my thinking as well, but it will get harder if we really want
to have a unified hierarchy for all controllers.
Consider a school lab with a per-user group which basically limits cpu
bandwidth and the maximum amount of memory by a hard limit (soft limit
0). If a user would like to run a workload which would benefit from
resident memory, he could create a subgroup and set a soft limit on it.
All other tasks would be executed in his native group by default,
because we probably do not want him to think about cgroups for all
tasks.
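As a rough illustration of the memory side of that lab setup, the
Python sketch below only plans the control-file writes and touches
nothing on disk. memory.limit_in_bytes and memory.soft_limit_in_bytes
are the cgroup v1 memory controller files; the mount point, the
"lab/alice" and "resident" group names and the sizes are assumptions
made up for the example, and the group directories would have to be
created with mkdir first.

```python
from pathlib import PurePosixPath


def lab_user_writes(root, user, hard_limit, subgroup, soft_limit):
    """Plan (file, value) writes for one lab user; nothing is written here."""
    user_dir = PurePosixPath(root) / user
    sub_dir = user_dir / subgroup
    return [
        # Per-user group: hard limit only, soft limit 0 (no guarantee).
        (str(user_dir / "memory.limit_in_bytes"), str(hard_limit)),
        (str(user_dir / "memory.soft_limit_in_bytes"), "0"),
        # Subgroup the user creates for the workload that wants resident memory.
        (str(sub_dir / "memory.soft_limit_in_bytes"), str(soft_limit)),
    ]


# Hypothetical mount point, user and sizes (4G hard limit, 1G soft limit).
writes = lab_user_writes("/sys/fs/cgroup/memory/lab", "alice",
                         4 << 30, "resident", 1 << 30)
for path, value in writes:
    print(f"echo {value} > {path}")
```

Tasks that the user does not move into the subgroup stay in the
per-user group, which is exactly the "processes in a meta-group"
situation discussed above.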
[...]
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic