* [PATCH V2 0/5] memcg softlimit reclaim rework
@ 2012-04-11 21:56 Ying Han
  2012-04-14 12:19 ` Hillf Danton
  0 siblings, 1 reply; 3+ messages in thread
From: Ying Han @ 2012-04-11 21:56 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer
  Cc: linux-mm

The "soft_limit" was introduced in memcg to support over-committing the
memory resource on the host. Each cgroup configures its "hard_limit" where
it will be throttled or OOM killed by going over the limit. However, the
cgroup can go above the "soft_limit" as long as there is no system-wide
memory contention. So, the "soft_limit" is the kernel mechanism for
re-distributng system spare memory among cgroups.
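
For reference, a minimal sketch of how the two limits are set through the
memcg v1 interface, assuming the controller is mounted at /dev/cgroup/memory
as in the tests below; the group name is made up:

    # create a group with a 1G hard limit and a 512M soft limit
    # (mount point and group name are illustrative)
    mkdir /dev/cgroup/memory/example
    echo 1G   > /dev/cgroup/memory/example/memory.limit_in_bytes
    echo 512M > /dev/cgroup/memory/example/memory.soft_limit_in_bytes
    # move the current shell into the group
    echo $$ > /dev/cgroup/memory/example/tasks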

This patchset reworks softlimit reclaim by hooking it into the new global
reclaim scheme, so the global reclaim path, including direct reclaim and
background reclaim, will respect the memcg softlimit.

Note:
1. The new implementation of softlimit reclaim is rather simple and a first
step toward further optimizations. There is no memory pressure balancing
between memcgs for each zone yet; that is something we would like to add as
a follow-up.

2. This patchset is slightly different from the last one posted by Johannes,
whose patch is closer to the reverted implementation in that it does
hierarchical reclaim for each selected memcg. However, that is not the
expected behavior from the user's perspective. Consider the following example:

root (32G capacity)
--> A (hard limit 20G, soft limit 15G, usage 16G)
   --> A1 (soft limit 5G, usage 4G)
   --> A2 (soft limit 10G, usage 12G)
--> B (hard limit 20G, soft limit 10G, usage 16G)

Under global reclaim, we shouldn't add pressure on A1 even though its
parent (A) exceeds its softlimit. This is what the admin expects when setting
the softlimit to the actual working set size: pages under the softlimit are
only reclaimed if the system has trouble reclaiming elsewhere.
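
For illustration, a shell sketch of that layout via the v1 interface,
assuming the same /dev/cgroup/memory mount point and hierarchical accounting
enabled on A; group names follow the diagram:

    cd /dev/cgroup/memory
    mkdir A
    echo 1 > A/memory.use_hierarchy      # account A1/A2 usage into A
    echo 20G > A/memory.limit_in_bytes
    echo 15G > A/memory.soft_limit_in_bytes
    mkdir A/A1 A/A2
    echo 5G  > A/A1/memory.soft_limit_in_bytes
    echo 10G > A/A2/memory.soft_limit_in_bytes
    mkdir B
    echo 20G > B/memory.limit_in_bytes
    echo 10G > B/memory.soft_limit_in_bytes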

Test on 32G host:
The stats below are from memory.vmscan_stat, which I didn't include in this
patchset; it exports per-memcg vmscan statistics. For each memcg, the numbers
show how many pages were reclaimed under global pressure. As expected, no
pages are reclaimed from memcgs under their softlimit until case 3. In that
case, there are many reclaimers (20 containers plus kswapd) and fewer
reclaimable memcgs (those above their softlimit), so the reclaim priority
escalates. That is why memcgs under their softlimit get reclaimed as well.
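
For context, a sketch of how the cases below were driven. The limits and the
stat file match the cases; the ramdisk path, file names and the memory hog
are assumptions:

    # N = 0, 5 or 10 for cases 1, 2 and 3 respectively
    for ((i=0; i<20; i++)); do
        mkdir -p /dev/cgroup/memory/$i
        echo 512M > /dev/cgroup/memory/$i/memory.limit_in_bytes
        [ $i -lt $N ] && \
            echo 512M > /dev/cgroup/memory/$i/memory.soft_limit_in_bytes
        # stream a 1G file from the ramdisk inside the container
        ( echo $BASHPID > /dev/cgroup/memory/$i/tasks
          cat /mnt/ramdisk/file-$i > /dev/null ) &
    done
    # run a separate memory hog to create global pressure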

1. 20 * cat 1G ramdisk containers (hardlimit = 512M, softlimit = 0 by default) + memory hog (for global pressure)
    $ for ((i=0; i<20; i++)); do cat /dev/cgroup/memory/$i/memory.vmscan_stat | grep total_freed_file_pages_by_system_under_hierarchy; done
    total_freed_file_pages_by_system_under_hierarchy 4431458
    total_freed_file_pages_by_system_under_hierarchy 4572150
    total_freed_file_pages_by_system_under_hierarchy 4260969
    total_freed_file_pages_by_system_under_hierarchy 4522491
    total_freed_file_pages_by_system_under_hierarchy 4467898
    total_freed_file_pages_by_system_under_hierarchy 4231144
    total_freed_file_pages_by_system_under_hierarchy 4467987
    total_freed_file_pages_by_system_under_hierarchy 4415137
    total_freed_file_pages_by_system_under_hierarchy 4537076
    total_freed_file_pages_by_system_under_hierarchy 4374586
    total_freed_file_pages_by_system_under_hierarchy 4238208
    total_freed_file_pages_by_system_under_hierarchy 4497263
    total_freed_file_pages_by_system_under_hierarchy 4401839
    total_freed_file_pages_by_system_under_hierarchy 4407700
    total_freed_file_pages_by_system_under_hierarchy 4291009
    total_freed_file_pages_by_system_under_hierarchy 4228416
    total_freed_file_pages_by_system_under_hierarchy 4126986
    total_freed_file_pages_by_system_under_hierarchy 4730479
    total_freed_file_pages_by_system_under_hierarchy 4316904
    total_freed_file_pages_by_system_under_hierarchy 4304469

2. 20 * cat 1G ramdisk containers (hardlimit = 512M, 1-5 container softlimit = 512M) + memory hog (for global pressure)
    total_freed_file_pages_by_system_under_hierarchy 0
    total_freed_file_pages_by_system_under_hierarchy 0
    total_freed_file_pages_by_system_under_hierarchy 0
    total_freed_file_pages_by_system_under_hierarchy 0
    total_freed_file_pages_by_system_under_hierarchy 0
    total_freed_file_pages_by_system_under_hierarchy 4562418
    total_freed_file_pages_by_system_under_hierarchy 4630498
    total_freed_file_pages_by_system_under_hierarchy 4809946
    total_freed_file_pages_by_system_under_hierarchy 4767868
    total_freed_file_pages_by_system_under_hierarchy 4716920
    total_freed_file_pages_by_system_under_hierarchy 4828952
    total_freed_file_pages_by_system_under_hierarchy 4672482
    total_freed_file_pages_by_system_under_hierarchy 4593165
    total_freed_file_pages_by_system_under_hierarchy 4862157
    total_freed_file_pages_by_system_under_hierarchy 4639331
    total_freed_file_pages_by_system_under_hierarchy 4620658
    total_freed_file_pages_by_system_under_hierarchy 4880210
    total_freed_file_pages_by_system_under_hierarchy 4652485
    total_freed_file_pages_by_system_under_hierarchy 4633724
    total_freed_file_pages_by_system_under_hierarchy 4673583

3. 20 * cat 1G ramdisk containers (hardlimit = 512M, 1-10 container softlimit = 512M) + memory hog (for global pressure)
    total_freed_file_pages_by_system_under_hierarchy 7318
    total_freed_file_pages_by_system_under_hierarchy 6612
    total_freed_file_pages_by_system_under_hierarchy 2900
    total_freed_file_pages_by_system_under_hierarchy 5740
    total_freed_file_pages_by_system_under_hierarchy 5353
    total_freed_file_pages_by_system_under_hierarchy 4707
    total_freed_file_pages_by_system_under_hierarchy 4252
    total_freed_file_pages_by_system_under_hierarchy 5518
    total_freed_file_pages_by_system_under_hierarchy 1431
    total_freed_file_pages_by_system_under_hierarchy 5722
    total_freed_file_pages_by_system_under_hierarchy 9538489
    total_freed_file_pages_by_system_under_hierarchy 9334518
    total_freed_file_pages_by_system_under_hierarchy 9727377
    total_freed_file_pages_by_system_under_hierarchy 9602573
    total_freed_file_pages_by_system_under_hierarchy 9771141
    total_freed_file_pages_by_system_under_hierarchy 9769589
    total_freed_file_pages_by_system_under_hierarchy 9610550
    total_freed_file_pages_by_system_under_hierarchy 9535241
    total_freed_file_pages_by_system_under_hierarchy 9912726
    total_freed_file_pages_by_system_under_hierarchy 9502706

Ying Han (5):
  memcg: revert current soft limit reclaim implementation
  memcg: rework softlimit reclaim
  memcg: set soft_limit_in_bytes to 0 by default
  memcg: detect no memcgs above softlimit under zone reclaim.
  memcg: change the target nr_to_reclaim for each memcg under kswapd

 include/linux/memcontrol.h |   18 +--
 include/linux/swap.h       |    4 -
 kernel/res_counter.c       |    1 -
 mm/memcontrol.c            |  397 +-------------------------------------------
 mm/vmscan.c                |  125 ++++++---------
 5 files changed, 65 insertions(+), 480 deletions(-)

-- 
1.7.7.3


* Re: [PATCH V2 0/5] memcg softlimit reclaim rework
  2012-04-11 21:56 [PATCH V2 0/5] memcg softlimit reclaim rework Ying Han
@ 2012-04-14 12:19 ` Hillf Danton
  2012-04-16 16:32   ` Ying Han
  0 siblings, 1 reply; 3+ messages in thread
From: Hillf Danton @ 2012-04-14 12:19 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hugh Dickins, Dan Magenheimer, linux-mm

On Thu, Apr 12, 2012 at 5:56 AM, Ying Han <yinghan@google.com> wrote:
> The "soft_limit" was introduced in memcg to support over-committing the
> memory resource on the host. Each cgroup configures its "hard_limit" where
> it will be throttled or OOM killed by going over the limit. However, the
> cgroup can go above the "soft_limit" as long as there is no system-wide
> memory contention. So, the "soft_limit" is the kernel mechanism for
> re-distributng system spare memory among cgroups.
>
s/re-distributng/re-distributing/

> This patchset reworks softlimit reclaim by hooking it into the new global
> reclaim scheme, so the global reclaim path, including direct reclaim and
> background reclaim, will respect the memcg softlimit.
>
> Note:
> 1. The new implementation of softlimit reclaim is rather simple and a first
> step toward further optimizations. There is no memory pressure balancing
> between memcgs for each zone yet; that is something we would like to add as
> a follow-up.
>
> 2. This patchset is slightly different from the last one posted by Johannes,
>
For those who want to see posts by Johannes, add links please.

> whose patch is closer to the reverted implementation in that it does
> hierarchical reclaim for each selected memcg. However, that is not the
> expected behavior from the user's perspective. Consider the following example:
>


* Re: [PATCH V2 0/5] memcg softlimit reclaim rework
  2012-04-14 12:19 ` Hillf Danton
@ 2012-04-16 16:32   ` Ying Han
  0 siblings, 0 replies; 3+ messages in thread
From: Ying Han @ 2012-04-16 16:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hugh Dickins, Dan Magenheimer, linux-mm

On Sat, Apr 14, 2012 at 5:19 AM, Hillf Danton <dhillf@gmail.com> wrote:
> On Thu, Apr 12, 2012 at 5:56 AM, Ying Han <yinghan@google.com> wrote:
>> The "soft_limit" was introduced in memcg to support over-committing the
>> memory resource on the host. Each cgroup configures its "hard_limit" where
>> it will be throttled or OOM killed by going over the limit. However, the
>> cgroup can go above the "soft_limit" as long as there is no system-wide
>> memory contention. So, the "soft_limit" is the kernel mechanism for
>> re-distributng system spare memory among cgroups.
>>
> s/re-distributng/re-distributing/
>
>> This patchset reworks softlimit reclaim by hooking it into the new global
>> reclaim scheme, so the global reclaim path, including direct reclaim and
>> background reclaim, will respect the memcg softlimit.
>>
>> Note:
>> 1. The new implementation of softlimit reclaim is rather simple and a first
>> step toward further optimizations. There is no memory pressure balancing
>> between memcgs for each zone yet; that is something we would like to add as
>> a follow-up.
>>
>> 2. This patchset is slightly different from the last one posted by Johannes,
>>
> For those who want to see posts by Johannes, add links please.

http://comments.gmane.org/gmane.linux.kernel.mm/72382

If that is helpful, I will include it in the next post.

--Ying

>
>> whose patch is closer to the reverted implementation in that it does
>> hierarchical reclaim for each selected memcg. However, that is not the
>> expected behavior from the user's perspective. Consider the following example:
>>

