* [RFD] Isolated memory cgroups again
@ 2011-10-20  1:33 ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-20  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: LKML, Johannes Weiner, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Hugh Dickins, Ying Han, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, Greg Thelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

Hi all,
this is a request for discussion (I hope we can touch on this at the
memcg meeting during the upcoming KS). I brought this up earlier this
year, before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
The patch has become much smaller since then thanks to Johannes'
excellent memcg naturalization work
(http://thread.gmane.org/gmane.linux.kernel.mm/68724), which this is
based on.
I realize that this will be controversial but I would like to hear
whether this is strictly no-go or whether we can go that direction (the
implementation might differ of course).

The patch is still half baked but I guess it should be sufficient to
show what I am trying to achieve.
The basic idea is that memcgs would get a new attribute (isolated) which
would control whether that group should be considered during global
reclaim.
This means that we could achieve a certain memory isolation for
processes in the group from the rest of the system activity, which has
traditionally been done by mlocking the important parts of memory.
The mlock approach, however, has some disadvantages. First of all, it is
an all-or-nothing approach: either the memory is important and mlocked,
or you have no guarantee that it stays resident.
Secondly, it is much more prone to OOM situations.
Let's consider a case where memory is evictable in theory but you would
pay quite a lot to get it resident again (precalculated data from a
database, e.g. reports). The memory wouldn't be used very often, so it
would be a number one candidate for eviction after some time.
We would want something like a clever mlock in such a case, which would
evict that memory only if the cgroup itself comes under memory pressure
(e.g. peak workload). This is not hard to do if we are not
overcommitting memory, but things get tricky otherwise.
With isolated memcgs we get exactly such a guarantee, because we would
reclaim such memory only from the hard limit reclaim paths or from soft
limit reclaim, if a soft limit is set up.
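
To make the intended semantics concrete, here is a minimal userspace
model of the decision. It is only a sketch, not kernel code: the enum
and the function name are made up for illustration, and the only thing
taken from the proposal is that global reclaim is the one path that
honours the isolated flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: which path triggered the reclaim. */
enum reclaim_source {
	RECLAIM_GLOBAL,		/* system-wide memory pressure */
	RECLAIM_SOFT_LIMIT,	/* soft limit reclaim */
	RECLAIM_HARD_LIMIT,	/* the group hit its own hard limit */
};

/*
 * Returns true if pages of a group may be reclaimed from the given path.
 * Only global reclaim looks at the isolated flag; soft limit and hard
 * limit reclaim behave as they do for any other group.
 */
static bool group_reclaimable(bool isolated, enum reclaim_source src)
{
	assert(src == RECLAIM_GLOBAL || src == RECLAIM_SOFT_LIMIT ||
	       src == RECLAIM_HARD_LIMIT);

	if (src == RECLAIM_GLOBAL)
		return !isolated;
	return true;
}
```

So an isolated group answers "no" only to RECLAIM_GLOBAL, which is the
"clever mlock" behaviour described above.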

Any thoughts or comments?

---
From: Michal Hocko <mhocko@suse.cz>
Subject: Implement isolated cgroups

This patch adds a new per-cgroup knob (isolated) which controls whether
pages charged to the group are considered by global reclaim, or whether
they are reclaimed only during soft limit reclaim and under per-cgroup
memory pressure.

The value can be modified via the GROUP/memory.isolated knob.
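
A minimal userspace sketch of toggling the knob, assuming this patch is
applied. The helper names set_isolated() and get_isolated() are made up
here, and the caller passes the full path to memory.isolated because it
depends on where the memory controller is mounted. The strings "true"
and "false" are the ones the handlers in this patch accept and print.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write "true" or "false" to the knob. Returns 0 on success, -1 on error. */
static int set_isolated(const char *knob_path, bool isolated)
{
	FILE *f;
	int ret = 0;

	assert(knob_path != NULL);
	f = fopen(knob_path, "w");
	if (!f)
		return -1;
	if (fputs(isolated ? "true" : "false", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/* Read the knob back into *isolated. Returns 0 on success, -1 on error. */
static int get_isolated(const char *knob_path, bool *isolated)
{
	char buf[8] = "";
	FILE *f;

	assert(knob_path != NULL && isolated != NULL);
	f = fopen(knob_path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* tolerate a trailing newline */
	if (strcmp(buf, "true") == 0)
		*isolated = true;
	else if (strcmp(buf, "false") == 0)
		*isolated = false;
	else
		return -1;
	return 0;
}
```

With the patch as posted, a write to the root group's file would fail
with EINVAL, so set_isolated() would return -1 there.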

The primary idea behind isolated cgroups is better isolation of a group
from global system activity. At the moment, memory cgroups are mainly
used to throttle processes in a group by placing a cap on their memory
usage. However, memory cgroups don't protect their (charged) memory from
being evicted by global reclaim, because all groups are considered during
global reclaim.

The feature provides an easy way to set up a mission critical workload in
a memory isolated environment without the need for mlock. Thanks to
per-cgroup reclaim we can even handle memory usage spikes much more
gracefully, because a part of the working set can get reclaimed (rather
than a task being OOM killed, as would happen if mlock had been used). So
we can look at the feature as an intelligent mlock (protect from external
memory pressure and reclaim on internal pressure).

The implementation ignores the isolated status of a group during soft
limit reclaim, which means that every isolated group can configure how
much memory it can sacrifice under global memory pressure. Groups with an
unlimited soft limit are isolated from global memory pressure completely.
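
As a rough illustration of the soft limit interplay, the following
userspace sketch computes an upper bound on what global pressure could
take from a group. It is a model under stated assumptions, not kernel
code: the function name and SOFT_UNLIMITED are made up, and it assumes
soft limit reclaim only ever takes the part of the usage that exceeds
the soft limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SOFT_UNLIMITED UINT64_MAX	/* stand-in for "no soft limit set" */

/* Upper bound, in bytes, on what global pressure could take from a group. */
static uint64_t global_pressure_exposure(bool isolated, uint64_t usage,
					 uint64_t soft_limit)
{
	uint64_t exposed;

	if (!isolated)
		exposed = usage;		/* fully visible to global reclaim */
	else if (soft_limit == SOFT_UNLIMITED || usage <= soft_limit)
		exposed = 0;			/* nothing above the soft limit */
	else
		exposed = usage - soft_limit;	/* only the excess is exposed */

	assert(exposed <= usage);
	return exposed;
}
```

An isolated group using 100 units with a soft limit of 40 would thus
offer at most 60 units, and nothing at all with an unlimited soft limit.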

Please note that the feature has to be used with caution, because isolated
groups put more reclaim pressure on non-isolated cgroups.

The implementation is really simple because we just have to hook into
shrink_zone and exclude isolated groups when we are doing global reclaim.
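
The effect of that hook, including the extra pressure on the remaining
groups, can be sketched with a toy userspace model. This is not the
vmscan code: struct group and global_reclaim() are made up for
illustration, each group is scanned for pages >> priority per pass, and
every scanned page is assumed to be reclaimable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12	/* starting scan priority, as in mm/vmscan.c */

struct group {
	unsigned long pages;		/* pages currently charged */
	unsigned long reclaimed;	/* pages taken by this model */
	bool isolated;
};

/*
 * Toy global reclaim walk: visit every group at decreasing priority
 * (i.e. increasing pressure) until the target is met, skipping isolated
 * groups. Returns the number of pages reclaimed.
 */
static unsigned long global_reclaim(struct group *g, size_t n,
				    unsigned long target)
{
	unsigned long total = 0;
	int priority;
	size_t i;

	assert(g != NULL || n == 0);
	for (priority = DEF_PRIORITY; priority >= 0 && total < target;
	     priority--) {
		for (i = 0; i < n && total < target; i++) {
			unsigned long scan;

			if (g[i].isolated)
				continue;	/* the check this patch adds */
			scan = g[i].pages >> priority;
			if (scan > target - total)
				scan = target - total;
			g[i].pages -= scan;
			g[i].reclaimed += scan;
			total += scan;
		}
	}
	return total;
}
```

With two groups of 4096 pages and a target of 4 pages, each group loses
2 pages. If one of them is isolated, the other loses all 4 and is
scanned at a lower priority to get there. If every group is isolated,
the walk reclaims nothing.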

Signed-off-by: Michal Hocko <mhocko@suse.cz>

TODO
- consider hierarchies - I am not sure whether we want to allow
  inconsistent isolated status within a hierarchy - probably not
- handle root cgroup
- Do we want some checks whether the current setting is safe?
- Is a bool sufficient? Don't we rather want something like a priority
  instead?
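
For the hierarchy item, one possible way to keep the status consistent
is to treat a group as isolated whenever it or any of its ancestors has
the flag set. This is only a sketch of that option, not something the
patch implements; struct group and effectively_isolated() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct group {
	const struct group *parent;	/* NULL for the root */
	bool isolated;			/* flag set on this group itself */
};

/* A group counts as isolated if it or any ancestor has the flag set. */
static bool effectively_isolated(const struct group *g)
{
	assert(g != NULL);
	for (; g; g = g->parent)
		if (g->isolated)
			return true;
	return false;
}
```

The walk costs one step per level of the hierarchy; caching the result
at write time would be the obvious alternative.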


 include/linux/memcontrol.h |    7 +++++++
 mm/memcontrol.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                |    8 +++++++-
 3 files changed, 58 insertions(+), 1 deletion(-)

Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
===================================================================
--- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/memcontrol.c
+++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
@@ -258,6 +258,9 @@ struct mem_cgroup {
 	/* set when res.limit == memsw.limit */
 	bool		memsw_is_minimum;
 
+	/* is the group isolated from the global memory pressure? */
+	bool		isolated;
+
 	/* protect arrays of thresholds */
 	struct mutex thresholds_lock;
 
@@ -287,6 +290,11 @@ struct mem_cgroup {
 	spinlock_t pcp_counter_lock;
 };
 
+bool mem_cgroup_isolated(struct mem_cgroup *mem)
+{
+	return mem->isolated;
+}
+
 /* Stuffs for move charges at task migration. */
 /*
  * Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
@@ -4561,6 +4569,37 @@ static int mem_control_numa_stat_open(st
 }
 #endif /* CONFIG_NUMA */
 
+static int mem_cgroup_isolated_write(struct cgroup *cgrp, struct cftype *cft,
+		const char *buffer)
+{
+	int ret = -EINVAL;
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+	if (mem_cgroup_is_root(mem))
+		goto out;
+
+	if (!strcasecmp(buffer, "true"))
+		mem->isolated = true;
+	else if (!strcasecmp(buffer, "false"))
+		mem->isolated = false;
+	else
+		goto out;
+
+	ret = 0;
+out:
+	return ret;
+}
+
+static int mem_cgroup_isolated_read(struct cgroup *cgrp, struct cftype *cft,
+		struct seq_file *seq)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+
+	seq_puts(seq, mem->isolated ? "true" : "false");
+
+	return 0;
+}
+
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
@@ -4624,6 +4663,11 @@ static struct cftype mem_cgroup_files[]
 		.unregister_event = mem_cgroup_oom_unregister_event,
 		.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
 	},
+	{
+		.name = "isolated",
+		.write_string = mem_cgroup_isolated_write,
+		.read_seq_string = mem_cgroup_isolated_read,
+	},
 #ifdef CONFIG_NUMA
 	{
 		.name = "numa_stat",
Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
===================================================================
--- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/include/linux/memcontrol.h
+++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
@@ -165,6 +165,9 @@ void mem_cgroup_split_huge_fixup(struct
 bool mem_cgroup_bad_page_check(struct page *page);
 void mem_cgroup_print_bad_page(struct page *page);
 #endif
+
+bool mem_cgroup_isolated(struct mem_cgroup *mem);
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
@@ -382,6 +385,10 @@ static inline
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+static inline bool mem_cgroup_isolated(struct mem_cgroup *mem)
+{
+	return false;
+}
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
===================================================================
--- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/vmscan.c
+++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
@@ -2109,7 +2109,13 @@ static void shrink_zone(int priority, st
 			.zone = zone,
 		};
 
-		shrink_mem_cgroup_zone(priority, &mz, sc);
+		/*
+		 * Do not reclaim from an isolated group if we are in
+		 * the global reclaim.
+		 */
+		if (!(mem_cgroup_isolated(mem) && global_reclaim(sc)))
+			shrink_mem_cgroup_zone(priority, &mz, sc);
+
 		/*
 		 * Limit reclaim has historically picked one memcg and
 		 * scanned it with decreasing priority levels until
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:33 ` Michal Hocko
@ 2011-10-20  1:59   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-10-20  1:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Johannes Weiner, Daisuke Nishimura, Hugh Dickins,
	Ying Han, Andrew Morton, Glauber Costa, Kir Kolyshkin,
	Pavel Emelianov, Greg Thelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Wed, 19 Oct 2011 18:33:09 -0700
Michal Hocko <mhocko@suse.cz> wrote:

> Hi all,
> this is a request for discussion (I hope we can touch on this at the
> memcg meeting during the upcoming KS). I brought this up earlier this
> year, before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> The patch has become much smaller since then thanks to Johannes'
> excellent memcg naturalization work
> (http://thread.gmane.org/gmane.linux.kernel.mm/68724), which this is
> based on.

Yes, Johannes' work will make isolation smarter.


> I realize that this will be controversial but I would like to hear
> whether this is strictly no-go or whether we can go that direction (the
> implementation might differ of course).
> 
> The patch is still half baked but I guess it should be sufficient to
> show what I am trying to achieve.
> The basic idea is that memcgs would get a new attribute (isolated) which
> would control whether that group should be considered during global
> reclaim.
> This means that we could achieve a certain memory isolation for
> processes in the group from the rest of the system activity, which has
> traditionally been done by mlocking the important parts of memory.
> The mlock approach, however, has some disadvantages. First of all, it is
> an all-or-nothing approach: either the memory is important and mlocked,
> or you have no guarantee that it stays resident.
> Secondly, it is much more prone to OOM situations.
> Let's consider a case where memory is evictable in theory but you would
> pay quite a lot to get it resident again (precalculated data from a
> database, e.g. reports). The memory wouldn't be used very often, so it
> would be a number one candidate for eviction after some time.
> We would want something like a clever mlock in such a case, which would
> evict that memory only if the cgroup itself comes under memory pressure
> (e.g. peak workload). This is not hard to do if we are not
> overcommitting memory, but things get tricky otherwise.
> With isolated memcgs we get exactly such a guarantee, because we would
> reclaim such memory only from the hard limit reclaim paths or from soft
> limit reclaim, if a soft limit is set up.
> 
> Any thoughts or comments?
> 

I can only say
 - it can be implemented in a clean way.
 - maybe customers want it.
 - This kind of "mlock" can be harmful and make system administration
   difficult.
 - I'm not sure whether this opens the door to security issues such as
   DoS attacks.

Hmm...if the number of isolated pages can be shown in /proc/meminfo,
I'll not have a strong NACK.

But I personally think we should make softlimit better rather than
adding a new interface. If this feature can be achieved by setting
softlimit=UNLIMITED, it's simple. And Johannes' work will make this
easy to implement.
(A total rewrite of softlimit would be required...I think.)

Thanks,
-Kame


* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:33 ` Michal Hocko
@ 2011-10-20  8:55   ` Glauber Costa
  -1 siblings, 0 replies; 31+ messages in thread
From: Glauber Costa @ 2011-10-20  8:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Ying Han, Andrew Morton,
	Kir Kolyshkin, Pavel Emelianov, Greg Thelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On 10/20/2011 05:33 AM, Michal Hocko wrote:
> Hi all,
> this is a request for discussion (I hope we can touch on this at the
> memcg meeting during the upcoming KS). I brought this up earlier this
> year, before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> The patch has become much smaller since then thanks to Johannes'
> excellent memcg naturalization work
> (http://thread.gmane.org/gmane.linux.kernel.mm/68724), which this is
> based on.
> I realize that this will be controversial but I would like to hear
> whether this is strictly no-go or whether we can go that direction (the
> implementation might differ of course).
>
> The patch is still half baked but I guess it should be sufficient to
> show what I am trying to achieve.
> The basic idea is that memcgs would get a new attribute (isolated) which
> would control whether that group should be considered during global
> reclaim.

I'd like to hear a bit more about your use cases, but at first glance, I
don't like it. I think we should always, regardless of any knobs or
definitions, be able to globally select a task or set of tasks, and kill
them.

We have a slightly similar need here (we'd have to find out how
similar...). We're working on it as well, but have no patches yet (it is
very basic). Let me describe it so we can see if it fits.

The main concern is with the OOM behaviour of tasks within a cgroup. We'd
like to be able to, on a per-cgroup basis:

* select how "important" a group is. OOM should try to kill less
important memory hogs first (but note: it's less important *memory
hogs*, not ordinary processes, and all of them are actually considered)
* select if a fat task within a group should be OOMed, or if the whole
group should go.
* assuming a hierarchical grouping, select if we should kill children
first
* assuming a hierarchical grouping, select if we should kill children
at all.

This is a broader piece of work, but I am under the impression that you
should also be able to cover your needs (at least the OOM part) with such
a mechanism, by setting arbitrarily high limits on certain cgroups.

Of course it might be the case that I do not yet fully understand
your scenario. In that case, I'm all ears!
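
To give an idea of the first item only, here is a hypothetical userspace
sketch of a victim selection that weighs memory usage against
importance. Nothing here exists as a patch: the importance field, its
scale and pick_oom_victim() are assumptions made purely for
illustration, and the other items in the list are not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct group {
	unsigned long usage;		/* pages charged to the group */
	unsigned int importance;	/* hypothetical knob, >= 1, higher = keep longer */
};

/*
 * Pick the group to OOM first: the one with the highest usage relative
 * to its importance. Every group is considered; importance only scales
 * the score and never exempts a group. Returns an index into g, or -1
 * if there are no groups.
 */
static long pick_oom_victim(const struct group *g, size_t n)
{
	long victim = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(g[i].importance >= 1);
		/* usage_i / imp_i > usage_v / imp_v, without dividing */
		if (victim < 0 ||
		    (unsigned long long)g[i].usage * g[victim].importance >
		    (unsigned long long)g[victim].usage * g[i].importance)
			victim = (long)i;
	}
	return victim;
}
```

A small unimportant group is therefore not picked ahead of a much larger
important one, which matches the "memory hogs, not ordinary processes"
remark above.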

Thank you.

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:59   ` KAMEZAWA Hiroyuki
@ 2011-10-20 16:30     ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-20 16:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, Johannes Weiner, Daisuke Nishimura, Hugh Dickins,
	Ying Han, Andrew Morton, Glauber Costa, Kir Kolyshkin,
	Pavel Emelianov, Greg Thelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Thu 20-10-11 10:59:50, KAMEZAWA Hiroyuki wrote:
> On Wed, 19 Oct 2011 18:33:09 -0700
> Michal Hocko <mhocko@suse.cz> wrote:
[...]
> > I realize that this will be controversial but I would like to hear
> > whether this is strictly no-go or whether we can go that direction (the
> > implementation might differ of course).
> > 
> > The patch is still half baked but I guess it should be sufficient to
> > show what I am trying to achieve.
> > The basic idea is that memcgs would get a new attribute (isolated) which
> > would control whether that group should be considered during global
> > reclaim.
> > This means that we could achieve a certain memory isolation for
> > processes in the group from the rest of the system activity, which has
> > traditionally been done by mlocking the important parts of memory.
> > The mlock approach, however, has some disadvantages. First of all, it is
> > an all-or-nothing approach: either the memory is important and mlocked,
> > or you have no guarantee that it stays resident.
> > Secondly, it is much more prone to OOM situations.
> > Let's consider a case where memory is evictable in theory but you would
> > pay quite a lot to get it resident again (precalculated data from a
> > database, e.g. reports). The memory wouldn't be used very often, so it
> > would be a number one candidate for eviction after some time.
> > We would want something like a clever mlock in such a case, which would
> > evict that memory only if the cgroup itself comes under memory pressure
> > (e.g. peak workload). This is not hard to do if we are not
> > overcommitting memory, but things get tricky otherwise.
> > With isolated memcgs we get exactly such a guarantee, because we would
> > reclaim such memory only from the hard limit reclaim paths or from soft
> > limit reclaim, if a soft limit is set up.
> > 
> > Any thoughts comments?
> > 
> 
> I can only say
>  - it can be implemented in a clean way.
>  - maybe customers want it.
>  - This kind of "mlock" can be harmful and make system administration
>    difficult.

It is usually the admin who sets up control groups and their attributes.

>  - I'm not sure whether there is a chance of a security issue (DoS attack).

It depends on what you consider a DoS attack. In the scenarios I have in
mind it is usually the important workload that is isolated, which means
that the feature helps prevent a DoS attack on it.
If you are thinking more about the rest (the non-isolated groups) then
yes, there will be more pressure on them. This is something that has to
be considered when the system is set up.

> 
> Hmm...if the number of isolated pages can be shown in /proc/meminfo,
> I won't have a strong NACK.

This will be trivial to implement.

> 
> But I personally think we should make softlimit better rather than
> adding a new interface. If this feature can be achieved by setting
> softlimit=UNLIMITED, it's simple. And Johannes' work will make this
> easy to implement.

As I already said, I am not insisting on the implementation. I just
consider isolation important and we have several customers who need
it. If this can be done by soft limit reclaim alone, I will certainly
not object. The configuration would need to be done carefully in both
cases anyway.
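
To make the soft limit alternative concrete, here is a minimal userspace
sketch of one possible reading of it: under global reclaim a group is a
candidate only while its usage exceeds its soft limit, so a group with an
unlimited soft limit is never picked. The struct, the constant and the
function names are hypothetical and are not taken from the kernel or from
any patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no soft limit configured". */
#define SOFT_LIMIT_UNLIMITED UINT64_MAX

/* Hypothetical model of the two counters the decision needs. */
struct memcg_usage {
	uint64_t usage_bytes;
	uint64_t soft_limit_bytes;
};

/* Bytes charged above the soft limit; 0 when at or below it. */
static uint64_t soft_limit_excess(const struct memcg_usage *group)
{
	assert(group != NULL);
	if (group->usage_bytes <= group->soft_limit_bytes)
		return 0;
	return group->usage_bytes - group->soft_limit_bytes;
}

/*
 * Assumed policy: global reclaim only touches groups that are over
 * their soft limit. With SOFT_LIMIT_UNLIMITED the excess is always 0,
 * so such a group behaves like an "isolated" one.
 */
static bool eligible_for_global_reclaim(const struct memcg_usage *group)
{
	return soft_limit_excess(group) > 0;
}
```

A real implementation would also need a fallback for the case where no
group is over its soft limit; the sketch leaves that out.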

> (total rewrite of softlimit should be required...I think.)
> 
> Thanks,
> -Kame

Thanks

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  8:55   ` Glauber Costa
@ 2011-10-20 16:42     ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-20 16:42 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Ying Han, Andrew Morton,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On Thu 20-10-11 12:55:24, Glauber Costa wrote:
> On 10/20/2011 05:33 AM, Michal Hocko wrote:
> >Hi all,
> >this is a request for discussion (I hope we can touch this during memcg
> >meeting during the upcoming KS). I have brought this up earlier this
> >year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> >The patch got much smaller since then due to Johannes' excellent memcg
> >naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
> >which this is based on.
> >I realize that this will be controversial but I would like to hear
> >whether this is strictly no-go or whether we can go that direction (the
> >implementation might differ of course).
> >
> >The patch is still half baked but I guess it should be sufficient to
> >show what I am trying to achieve.
> >The basic idea is that memcgs would get a new attribute (isolated) which
> >would control whether that group should be considered during global
> >reclaim.
> 
> I'd like to hear a bit more of your use cases,

The primary goal is to isolate the primary workload (e.g. a database)
from the rest of the system, which provides support for the primary
workload (backups, administration tools, etc.). While we can do that
even now just by wrapping everything into different groups and setting
up proper limits, it gets really tricky if you want to overcommit the
box, because then global reclaim is inevitable and we will start
reclaiming from all groups.

> but at first, I don't like it. I think we should always, regardless of
> any knobs or definitions, be able to globally select a task or set of
> tasks, and kill them.

The patchset is not about OOM but rather about reclaim. If there is a
global OOM situation, isolated memcgs get no special treatment.
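
As a rough illustration of the reclaim-side rule, the following userspace
sketch models the check the patch adds to shrink_zone: a group is skipped
only when it is isolated and the reclaim pass is global. The struct, the
enum and the helper names are hypothetical stand-ins rather than kernel
types, and soft limit reclaim is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only the new flag. */
struct memcg_model {
	const char *name;
	bool isolated;
};

/* Who triggered the reclaim pass: the whole system or one group's limit. */
enum reclaim_kind { RECLAIM_GLOBAL, RECLAIM_LIMIT };

/* Same condition as in the patch: skip only if isolated AND global. */
static bool should_scan(const struct memcg_model *memcg, enum reclaim_kind kind)
{
	return !(memcg->isolated && kind == RECLAIM_GLOBAL);
}

/* Count how many of the given groups one reclaim pass would scan. */
static size_t count_scanned(const struct memcg_model *groups, size_t n,
			    enum reclaim_kind kind)
{
	size_t scanned = 0;

	assert(groups != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (should_scan(&groups[i], kind))
			scanned++;
	return scanned;
}
```

With one isolated "database" group and two ordinary groups, a global pass
scans only the two ordinary groups, while a limit-triggered pass still
scans all three.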

[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:33 ` Michal Hocko
@ 2011-10-20 23:41   ` Ying Han
  -1 siblings, 0 replies; 31+ messages in thread
From: Ying Han @ 2011-10-20 23:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On Wed, Oct 19, 2011 at 6:33 PM, Michal Hocko <mhocko@suse.cz> wrote:
> Hi all,
> this is a request for discussion (I hope we can touch this during memcg
> meeting during the upcoming KS). I have brought this up earlier this
> year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> The patch got much smaller since then due to Johannes' excellent memcg
> naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
> which this is based on.
> I realize that this will be controversial but I would like to hear
> whether this is strictly no-go or whether we can go that direction (the
> implementation might differ of course).
>
> The patch is still half baked but I guess it should be sufficient to
> show what I am trying to achieve.
> The basic idea is that memcgs would get a new attribute (isolated) which
> would control whether that group should be considered during global
> reclaim.
> This means that we could achieve a certain memory isolation for
> processes in the group from the rest of the system activity which has
> been traditionally done by mlocking the important parts of memory.
> This approach, however, has some disadvantages. First of all, it is an
> all-or-nothing approach. Either the memory is important and mlocked,
> or you have no guarantee that it stays resident.
> Secondly, it is much more prone to OOM situations.
> Let's consider a case where memory is evictable in theory but would be
> expensive to bring back once evicted (precalculated data from a
> database, e.g. reports). The memory wouldn't be used very often, so it
> would be a number one candidate for eviction after some time.
> We would want something like a clever mlock in such a case, which
> would evict that memory only if the cgroup itself gets under memory
> pressure (e.g. peak workload). This is not hard to do if we are not
> overcommitting memory, but things get tricky otherwise.
> With isolated memcgs we get exactly such a guarantee, because we would
> reclaim such memory only from the hard limit reclaim paths or from
> soft limit reclaim if it is set up.
>
> Any thoughts or comments?
>
> ---
> From: Michal Hocko <mhocko@suse.cz>
> Subject: Implement isolated cgroups
>
> This patch adds a new per-cgroup knob (isolated) which controls whether
> pages charged to the group should be considered for global reclaim,
> or whether they are reclaimed only during soft reclaim and under
> per-cgroup memory pressure.
>
> The value can be modified via the GROUP/memory.isolated knob.
>
> The primary idea behind isolated cgroups is better isolation of a group
> from global system activity. At the moment, memory cgroups are mainly
> used to throttle processes in a group by placing a cap on their memory
> usage. However, memory cgroups don't protect their (charged) memory from
> being evicted by global reclaim, as all groups are considered during
> global reclaim.
>
> The feature will provide an easy way to set up a mission critical workload
> in a memory isolated environment without the necessity of mlock. Due to
> per-cgroup reclaim we can even handle memory usage spikes much more
> gracefully, because a part of the working set can get reclaimed (rather
> than the workload being OOM killed, as could happen if mlock had been
> used). So we can look at the feature as an intelligent mlock (protect
> from external memory pressure and reclaim on internal pressure).
>
> The implementation ignores the isolated status of a group for soft reclaim,
> which means that every isolated group can configure how much memory it can
> sacrifice under global memory pressure. Groups with an unlimited soft limit
> are isolated from global memory pressure completely.
>
> Please note that the feature has to be used with caution because isolated
> groups will put more reclaim pressure on non-isolated cgroups.
>
> The implementation is really simple because we just have to hook into
> shrink_zone and exclude isolated groups if we are doing global reclaim.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
>
> TODO
> - consider hierarchies - I am not sure whether we want to allow an
>  inconsistent isolated status within a hierarchy - probably not
> - handle root cgroup
> - Do we want some checks whether the current setting is safe?
> - Is bool sufficient? Wouldn't we rather want something like a priority
>  instead?
>
>
>  include/linux/memcontrol.h |    7 +++++++
>  mm/memcontrol.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                |    8 +++++++-
>  3 files changed, 58 insertions(+), 1 deletion(-)
>
> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
> ===================================================================
> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/memcontrol.c
> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
> @@ -258,6 +258,9 @@ struct mem_cgroup {
>        /* set when res.limit == memsw.limit */
>        bool            memsw_is_minimum;
>
> +       /* is the group isolated from the global memory pressure? */
> +       bool            isolated;
> +
>        /* protect arrays of thresholds */
>        struct mutex thresholds_lock;
>
> @@ -287,6 +290,11 @@ struct mem_cgroup {
>        spinlock_t pcp_counter_lock;
>  };
>
> +bool mem_cgroup_isolated(struct mem_cgroup *mem)
> +{
> +       return mem->isolated;
> +}
> +
>  /* Stuffs for move charges at task migration. */
>  /*
>  * Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
> @@ -4561,6 +4569,37 @@ static int mem_control_numa_stat_open(st
>  }
>  #endif /* CONFIG_NUMA */
>
> +static int mem_cgroup_isolated_write(struct cgroup *cgrp, struct cftype *cft,
> +               const char *buffer)
> +{
> +       int ret = -EINVAL;
> +       struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
> +
> +       if (mem_cgroup_is_root(mem))
> +               goto out;
> +
> +       if (!strcasecmp(buffer, "true"))
> +               mem->isolated = true;
> +       else if (!strcasecmp(buffer, "false"))
> +               mem->isolated = false;
> +       else
> +               goto out;
> +
> +       ret = 0;
> +out:
> +       return ret;
> +}
> +
> +static int mem_cgroup_isolated_read(struct cgroup *cgrp, struct cftype *cft,
> +               struct seq_file *seq)
> +{
> +       struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
> +
> +       seq_puts(seq, mem->isolated ? "true" : "false");
> +
> +       return 0;
> +}
> +
>  static struct cftype mem_cgroup_files[] = {
>        {
>                .name = "usage_in_bytes",
> @@ -4624,6 +4663,11 @@ static struct cftype mem_cgroup_files[]
>                .unregister_event = mem_cgroup_oom_unregister_event,
>                .private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
>        },
> +       {
> +               .name = "isolated",
> +               .write_string = mem_cgroup_isolated_write,
> +               .read_seq_string = mem_cgroup_isolated_read,
> +       },
>  #ifdef CONFIG_NUMA
>        {
>                .name = "numa_stat",
> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
> ===================================================================
> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/include/linux/memcontrol.h
> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
> @@ -165,6 +165,9 @@ void mem_cgroup_split_huge_fixup(struct
>  bool mem_cgroup_bad_page_check(struct page *page);
>  void mem_cgroup_print_bad_page(struct page *page);
>  #endif
> +
> +bool mem_cgroup_isolated(struct mem_cgroup *mem);
> +
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>  struct mem_cgroup;
>
> @@ -382,6 +385,10 @@ static inline
>  void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
>  {
>  }
> +static inline bool mem_cgroup_isolated(struct mem_cgroup *mem)
> +{
> +       return false;
> +}
>  #endif /* CONFIG_CGROUP_MEM_CONT */
>
>  #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
> ===================================================================
> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/vmscan.c
> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
> @@ -2109,7 +2109,13 @@ static void shrink_zone(int priority, st
>                        .zone = zone,
>                };
>
> -               shrink_mem_cgroup_zone(priority, &mz, sc);
> +               /*
> +                * Do not reclaim from an isolated group if we are in
> +                * the global reclaim.
> +                */
> +               if (!(mem_cgroup_isolated(mem) && global_reclaim(sc)))
> +                       shrink_mem_cgroup_zone(priority, &mz, sc);
> +
>                /*
>                 * Limit reclaim has historically picked one memcg and
>                 * scanned it with decreasing priority levels until
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic
>

Hi Michal:

I didn't read through the patch itself but only the description. If we
want to protect a memcg from being reclaimed under global memory
pressure, I think we can approach it by changing soft_limit reclaim.

I have a soft_limit change built on top of Johannes's patchset, which
basically does soft_limit aware reclaim under global memory pressure.
The implementation is simple, and I am looking forward to discussing it
further with you at the conference.

--Ying
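
For reference, the GROUP/memory.isolated knob from the quoted patch
accepts the strings "true" and "false", matched case-insensitively. A
minimal userspace sketch of flipping it might look as follows. The knob
exists only in this RFD patch, and the helper names and the path passed
in are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/*
 * Userspace mirror of the parsing in the patch's write handler:
 * accept "true" or "false" in any case. Returns 0 on success and
 * -1 for any other input.
 */
static int parse_isolated(const char *text, bool *isolated)
{
	if (!strcasecmp(text, "true"))
		*isolated = true;
	else if (!strcasecmp(text, "false"))
		*isolated = false;
	else
		return -1;
	return 0;
}

/*
 * Write the value to the knob file. knob_path is supplied by the
 * caller (hypothetically something like GROUP/memory.isolated).
 * Returns 0 on success and -1 on any I/O error.
 */
static int set_memcg_isolated(const char *knob_path, bool isolated)
{
	FILE *knob;
	int failed;

	assert(knob_path != NULL);
	knob = fopen(knob_path, "w");
	if (!knob)
		return -1;
	failed = fputs(isolated ? "true" : "false", knob) == EOF;
	if (fclose(knob) == EOF)
		failed = 1;
	return failed ? -1 : 0;
}
```

Note that the patch as posted returns -EINVAL for the root cgroup, so a
write to the root group's knob would be expected to fail.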

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20 23:41   ` Ying Han
@ 2011-10-21  2:45     ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-21  2:45 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On Thu 20-10-11 16:41:27, Ying Han wrote:
[...]
> Hi Michal:

Hi,

> 
> I didn't read through the patch itself but only the description. If we
> wanna protect a memcg being reclaimed from under global memory
> pressure, I think we can approach it by making change on soft_limit
> reclaim.
> 
> I have a soft_limit change built on top of Johannes's patchset, which
> does basically soft_limit aware reclaim under global memory pressure.

Is there any link to the patch(es)? I would be interested to look at
it before we discuss it.

[...]

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21  2:45     ` Michal Hocko
@ 2011-10-21  3:17       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 31+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-10-21  3:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ying Han, linux-mm, LKML, Johannes Weiner, Daisuke Nishimura,
	Hugh Dickins, Andrew Morton, Glauber Costa, Kir Kolyshkin,
	Pavel Emelianov, GregThelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Thu, 20 Oct 2011 19:45:55 -0700
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 20-10-11 16:41:27, Ying Han wrote:
> [...]
> > Hi Michal:
> 
> Hi,
> 
> > 
> > I didn't read through the patch itself but only the description. If we
> > wanna protect a memcg being reclaimed from under global memory
> > pressure, I think we can approach it by making change on soft_limit
> > reclaim.
> > 
> > I have a soft_limit change built on top of Johannes's patchset, which
> > does basically soft_limit aware reclaim under global memory pressure.
> 
> Is there any link to the patch(es)? I would be interested to look at
> it before we discuss it.
> 

I'd like to see it, too.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20 23:41   ` Ying Han
@ 2011-10-21  8:39     ` Glauber Costa
  -1 siblings, 0 replies; 31+ messages in thread
From: Glauber Costa @ 2011-10-21  8:39 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Kir Kolyshkin,
	Pavel Emelianov, GregThelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On 10/21/2011 03:41 AM, Ying Han wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Michal Hocko<mhocko@suse.cz>  wrote:
>> Hi all,
>> this is a request for discussion (I hope we can touch this during memcg
>> meeting during the upcoming KS). I have brought this up earlier this
>> year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
>> The patch got much smaller since then due to excellent Johannes' memcg
>> naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
>> which this is based on.
>> I realize that this will be controversial but I would like to hear
>> whether this is strictly no-go or whether we can go that direction (the
>> implementation might differ of course).
>>
>> The patch is still half baked but I guess it should be sufficient to
>> show what I am trying to achieve.
>> The basic idea is that memcgs would get a new attribute (isolated) which
>> would control whether that group should be considered during global
>> reclaim.
>> This means that we could achieve a certain memory isolation for
>> processes in the group from the rest of the system activity which has
>> been traditionally done by mlocking the important parts of memory.
>> This approach, however, has some advantages. First of all, it is a kind
>> of all or nothing type of approach. Either the memory is important and
>> mlocked or you have no guarantee that it keeps resident.
>> Secondly it is much more prone to OOM situation.
>> Let's consider a case where a memory is evictable in theory but you
>> would pay quite much if you have to get it back resident (pre calculated
>> data from database - e.g. reports). The memory wouldn't be used very
>> often so it would be a number one candidate to evict after some time.
>> We would want to have something like a clever mlock in such a case which
>> would evict that memory only if the cgroup itself gets under memory
>> pressure (e.g. peak workload). This is not hard to do if we are not
>> over committing the memory but things get tricky otherwise.
>> With the isolated memcgs we get exactly such a guarantee because we would
>> reclaim such a memory only from the hard limit reclaim paths or if the
>> soft limit reclaim if it is set up.
>>
>> Any thoughts comments?
>>
>> ---
>> From: Michal Hocko<mhocko@suse.cz>
>> Subject: Implement isolated cgroups
>>
>> This patch adds a new per-cgroup knob (isolated) which controls whether
>> pages charged for the group should be considered for the global reclaim
>> or they are reclaimed only during soft reclaim and under per-cgroup
>> memory pressure.
>>
>> The value can be modified by GROUP/memory.isolated knob.
>>
>> The primary idea behind isolated cgroups is in a better isolation of a group
>> from the global system activity. At the moment, memory cgroups are mainly
>> used to throttle processes in a group by placing a cap on their memory
>> usage. However, mem. cgroups don't protect their (charged) memory from being
>> evicted by the global reclaim as groups are considered during global
>> reclaim.
>>
>> The feature will provide an easy way to setup a mission critical workload in
>> the memory isolated environment without necessity of mlock. Due to
>> per-cgroup reclaim we can even handle memory usage spikes much more
>> gracefully because a part of the working set can get reclaimed (unlike OOM
>> killed as if mlock has been used). So we can look at the feature as an
>> intelligent mlock (protect from external memory pressure and reclaim on
>> internal pressure).
>>
>> The implementation ignores isolated group status for the soft reclaim which
>> means that every isolated group can configure how much memory it can
>> sacrifice under global memory pressure. Soft unlimited groups are isolated
>> from the global memory pressure completely.
>>
>> Please note that the feature has to be used with caution because isolated
>> groups will make a bigger reclaim pressure to non-isolated cgroups.
>>
>> Implementation is really simple because we just have to hook into shrink_zone
>> and exclude isolated groups if we are doing the global reclaiming.
>>
>> Signed-off-by: Michal Hocko<mhocko@suse.cz>
>>
>> TODO
>> - consider hierarchies - I am not sure whether we want to have
>>   non-consistent isolated status in the hierarchy - probably not
>> - handle root cgroup
>> - Do we want some checks whether the current setting is safe?
>> - is bool sufficient. Don't we rather want something like priority
>>   instead?
>>
>>
>>   include/linux/memcontrol.h |    7 +++++++
>>   mm/memcontrol.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
>>   mm/vmscan.c                |    8 +++++++-
>>   3 files changed, 58 insertions(+), 1 deletion(-)
>>
>> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
>> ===================================================================
>> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/memcontrol.c
>> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/memcontrol.c
>> @@ -258,6 +258,9 @@ struct mem_cgroup {
>>         /* set when res.limit == memsw.limit */
>>         bool            memsw_is_minimum;
>>
>> +       /* is the group isolated from the global memory pressure? */
>> +       bool            isolated;
>> +
>>         /* protect arrays of thresholds */
>>         struct mutex thresholds_lock;
>>
>> @@ -287,6 +290,11 @@ struct mem_cgroup {
>>         spinlock_t pcp_counter_lock;
>>   };
>>
>> +bool mem_cgroup_isolated(struct mem_cgroup *mem)
>> +{
>> +       return mem->isolated;
>> +}
>> +
>>   /* Stuffs for move charges at task migration. */
>>   /*
>>   * Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
>> @@ -4561,6 +4569,37 @@ static int mem_control_numa_stat_open(st
>>   }
>>   #endif /* CONFIG_NUMA */
>>
>> +static int mem_cgroup_isolated_write(struct cgroup *cgrp, struct cftype *cft,
>> +               const char *buffer)
>> +{
>> +       int ret = -EINVAL;
>> +       struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
>> +
>> +       if (mem_cgroup_is_root(mem))
>> +               goto out;
>> +
>> +       if (!strcasecmp(buffer, "true"))
>> +               mem->isolated = true;
>> +       else if (!strcasecmp(buffer, "false"))
>> +               mem->isolated = false;
>> +       else
>> +               goto out;
>> +
>> +       ret = 0;
>> +out:
>> +       return ret;
>> +}
>> +
>> +static int mem_cgroup_isolated_read(struct cgroup *cgrp, struct cftype *cft,
>> +               struct seq_file *seq)
>> +{
>> +       struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
>> +
>> +       seq_puts(seq, (mem->isolated)?"true":"false");
>> +
>> +       return 0;
>> +}
>> +
>>   static struct cftype mem_cgroup_files[] = {
>>         {
>>                 .name = "usage_in_bytes",
>> @@ -4624,6 +4663,11 @@ static struct cftype mem_cgroup_files[]
>>                 .unregister_event = mem_cgroup_oom_unregister_event,
>>                 .private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
>>         },
>> +       {
>> +               .name = "isolated",
>> +               .write_string = mem_cgroup_isolated_write,
>> +               .read_seq_string = mem_cgroup_isolated_read,
>> +       },
>>   #ifdef CONFIG_NUMA
>>         {
>>                 .name = "numa_stat",
>> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
>> ===================================================================
>> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/include/linux/memcontrol.h
>> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/include/linux/memcontrol.h
>> @@ -165,6 +165,9 @@ void mem_cgroup_split_huge_fixup(struct
>>   bool mem_cgroup_bad_page_check(struct page *page);
>>   void mem_cgroup_print_bad_page(struct page *page);
>>   #endif
>> +
>> +bool mem_cgroup_isolated(struct mem_cgroup *mem);
>> +
>>   #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>>   struct mem_cgroup;
>>
>> @@ -382,6 +385,10 @@ static inline
>>   void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
>>   {
>>   }
>> +bool mem_cgroup_isolated(struct mem_cgroup *mem)
>> +{
>> +       return false;
>> +}
>>   #endif /* CONFIG_CGROUP_MEM_CONT */
>>
>>   #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
>> Index: linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
>> ===================================================================
>> --- linux-3.1-rc4-next-20110831-mmotm-isolated-memcg.orig/mm/vmscan.c
>> +++ linux-3.1-rc4-next-20110831-mmotm-isolated-memcg/mm/vmscan.c
>> @@ -2109,7 +2109,13 @@ static void shrink_zone(int priority, st
>>                         .zone = zone,
>>                 };
>>
>> -               shrink_mem_cgroup_zone(priority,&mz, sc);
>> +               /*
>> +                * Do not reclaim from an isolated group if we are in
>> +                * the global reclaim.
>> +                */
>> +               if (!(mem_cgroup_isolated(mem)&&  global_reclaim(sc)))
>> +                       shrink_mem_cgroup_zone(priority,&mz, sc);
>> +
>>                 /*
>>                  * Limit reclaim has historically picked one memcg and
>>                  * scanned it with decreasing priority levels until
>> --
>> Michal Hocko
>> SUSE Labs
>> SUSE LINUX s.r.o.
>> Lihovarska 1060/12
>> 190 00 Praha 9
>> Czech Republic
>>
>
> Hi Michal:
>
> I didn't read through the patch itself but only the description. If we
> wanna protect a memcg being reclaimed from under global memory
> pressure, I think we can approach it by making change on soft_limit
> reclaim.
>
> I have a soft_limit change built on top of Johannes's patchset, which
> does basically soft_limit aware reclaim under global memory pressure.
> The implementation is simple, and I am looking forward to discuss more
> with you guys in the conference.
>
> --Ying

I don't think soft limits will help his case, if I understand it
correctly. Global reclaim can be triggered regardless of any soft limits
we may set.

Now, there are two things I still don't like about it:
* The definition of a "main workload", "main cgroup", or anything like
that. I'd prefer to rank them according to some parameter, something
akin to swappiness. This would allow other people to use it in a
different way, while still making you capable of reaching your goals
through parameter settings (i.e. one cgroup has a high value of reclaim,
all others a much lower one).

* The fact that you seem to want to *skip* reclaim altogether for a
cgroup. That's a dangerous condition, IMHO. What I think we should try
to achieve is "skip it for practical purposes on sane workloads".
Again, a parameter that, when set to a very high mark, has the effect of
disallowing reclaim for a cgroup under most sane circumstances.

What do you think of the above, Michal?
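
The ranking idea could be sketched roughly as follows. This is a
userspace illustration only, not kernel code: struct group_sketch,
reclaim_weight and scan_share() are made-up names, and the 0..100 range
simply borrows the swappiness convention. Each group receives a slice of
the global scan target proportional to its size times its weight, so a
weight of 0 makes a group nearly, but never entirely, exempt.

```c
#include <stddef.h>

/* Hypothetical per-group state; none of these names exist in the kernel. */
struct group_sketch {
	unsigned long lru_pages;      /* reclaimable pages charged to the group */
	unsigned int reclaim_weight;  /* assumed knob: 0 (protect) .. 100 (reclaim first) */
};

/*
 * Portion of a global scan target that falls on groups[idx].
 * The "+ 1" keeps a weight of 0 from turning into a hard skip.
 */
unsigned long scan_share(const struct group_sketch *groups, size_t n,
			 size_t idx, unsigned long nr_to_scan)
{
	unsigned long long total = 0, mine, share;
	size_t i;

	if (idx >= n)
		return 0;

	for (i = 0; i < n; i++)
		total += (unsigned long long)groups[i].lru_pages *
			 (groups[i].reclaim_weight + 1);
	if (total == 0)
		return 0;

	mine = (unsigned long long)groups[idx].lru_pages *
	       (groups[idx].reclaim_weight + 1);
	share = nr_to_scan * mine / total;

	/* Never ask for more pages than the group actually has. */
	if (share > groups[idx].lru_pages)
		share = groups[idx].lru_pages;
	return (unsigned long)share;
}
```

With two groups of 1000 pages each, weights 0 and 99, a scan target of
1000 pages puts 9 pages on the protected group and 990 on the other.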

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21  8:39     ` Glauber Costa
@ 2011-10-21 12:16       ` Johannes Weiner
  -1 siblings, 0 replies; 31+ messages in thread
From: Johannes Weiner @ 2011-10-21 12:16 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Ying Han, Michal Hocko, linux-mm, LKML, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Kir Kolyshkin,
	Pavel Emelianov, GregThelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Fri, Oct 21, 2011 at 12:39:22PM +0400, Glauber Costa wrote:
> On 10/21/2011 03:41 AM, Ying Han wrote:
> >On Wed, Oct 19, 2011 at 6:33 PM, Michal Hocko<mhocko@suse.cz>  wrote:
> >>Hi all,
> >>this is a request for discussion (I hope we can touch this during memcg
> >>meeting during the upcoming KS). I have brought this up earlier this
> >>year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> >>The patch got much smaller since then due to excellent Johannes' memcg
> >>naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
> >>which this is based on.
> >>I realize that this will be controversial but I would like to hear
> >>whether this is strictly no-go or whether we can go that direction (the
> >>implementation might differ of course).
> >>
> >>The patch is still half baked but I guess it should be sufficient to
> >>show what I am trying to achieve.
> >>The basic idea is that memcgs would get a new attribute (isolated) which
> >>would control whether that group should be considered during global
> >>reclaim.
> >>This means that we could achieve a certain memory isolation for
> >>processes in the group from the rest of the system activity which has
> >>been traditionally done by mlocking the important parts of memory.
> >>This approach, however, has some advantages. First of all, it is a kind
> >>of all or nothing type of approach. Either the memory is important and
> >>mlocked or you have no guarantee that it keeps resident.
> >>Secondly it is much more prone to OOM situation.
> >>Let's consider a case where a memory is evictable in theory but you
> >>would pay quite much if you have to get it back resident (pre calculated
> >>data from database - e.g. reports). The memory wouldn't be used very
> >>often so it would be a number one candidate to evict after some time.
> >>We would want to have something like a clever mlock in such a case which
> >>would evict that memory only if the cgroup itself gets under memory
> >>pressure (e.g. peak workload). This is not hard to do if we are not
> >>over committing the memory but things get tricky otherwise.
> >>With the isolated memcgs we get exactly such a guarantee because we would
> >>reclaim such a memory only from the hard limit reclaim paths or if the
> >>soft limit reclaim if it is set up.
> >>
> >>Any thoughts comments?
> >
> >I didn't read through the patch itself but only the description. If we
> >wanna protect a memcg being reclaimed from under global memory
> >pressure, I think we can approach it by making change on soft_limit
> >reclaim.
> >
> >I have a soft_limit change built on top of Johannes's patchset, which
> >does basically soft_limit aware reclaim under global memory pressure.
> >The implementation is simple, and I am looking forward to discuss more
> >with you guys in the conference.
> 
> I don't think soft limits will help his case, if I know understand
> it correctly. Global reclaim can be triggered regardless of any soft
> limits we may set.
> 
> Now, there are two things I still don't like about it:
> * The definition of a "main workload", "main cgroup", or anything
> like that. I'd prefer to rank them according to some parameter,
> something akin to swapiness. This would allow for other people to
> use it in a different way, while still making you capable of
> reaching your goals through parameter settings (i.e. one cgroup has
> a high value of reclaim, all others, a much lower one)

This is essentially what I wanted to convert soft limit reclaim to: if
a cgroup is considered for reclaim and it is exceeding its soft limit,
the amount of scanning force applied to it is doubled compared to its
buddies that are scanned in the same cycle.
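
The doubling described above can be sketched in a few lines of user-space C.
This is only an illustration under stated assumptions: `struct soft_cg`,
`over_soft_limit()` and `scan_target()` are hypothetical names, not kernel
interfaces, and "scanning force" is modelled as a plain page count.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-cgroup state (not struct mem_cgroup). */
struct soft_cg {
	unsigned long usage;      /* pages currently charged to the group */
	unsigned long soft_limit; /* soft limit, in pages */
};

/* True when the group uses more than its soft limit allows. */
bool over_soft_limit(const struct soft_cg *cg)
{
	return cg->usage > cg->soft_limit;
}

/*
 * Pages to scan in this group during one reclaim cycle.
 * base_scan is what every group in the cycle would get; a group
 * above its soft limit gets twice that (assumes no overflow).
 */
unsigned long scan_target(const struct soft_cg *cg, unsigned long base_scan)
{
	return over_soft_limit(cg) ? base_scan * 2 : base_scan;
}
```

With a base of 32 pages, a group above its soft limit would be scanned for 64
pages in that cycle while its siblings stay at 32.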

> * The fact that you seem to want to *skip* reclaim altogether for a
> cgroup. That's a dangerous condition, IMHO. What I think we should
> try to achieve, is "skip it for practical purposes on sane
> workloads". Again, a parameter that when set to a very high mark,
> has the effect of disallowing reclaim for a cgroup under most sane
> circumstances.

Yes.  I think it would be better to have a minimum guarantee setting
rather than a wholesale cgroup isolation.  If the cgroup's memory
usage is below that guarantee, reclaim skips it.  If you insist, you
can still set this to ULONG_MAX.
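
As a rough sketch of the minimum guarantee idea (user-space C, with
hypothetical names; `min_guarantee` and `reclaim_skips()` do not exist in the
kernel under these names), the check reduces to a single comparison:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified per-cgroup state (not struct mem_cgroup). */
struct guar_cg {
	unsigned long usage;         /* pages currently charged to the group */
	unsigned long min_guarantee; /* pages protected from reclaim */
};

/* Reclaim passes over a group whose usage is below its guarantee. */
bool reclaim_skips(const struct guar_cg *cg)
{
	return cg->usage < cg->min_guarantee;
}
```

A guarantee of 0 protects nothing, and a guarantee of ULONG_MAX behaves like
the proposed isolation flag for any realistic usage value.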

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:59   ` KAMEZAWA Hiroyuki
@ 2011-10-21 16:04     ` Balbir Singh
  -1 siblings, 0 replies; 31+ messages in thread
From: Balbir Singh @ 2011-10-21 16:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michal Hocko, linux-mm, LKML, Johannes Weiner, Daisuke Nishimura,
	Hugh Dickins, Ying Han, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On Thu, Oct 20, 2011 at 7:29 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 19 Oct 2011 18:33:09 -0700
> Michal Hocko <mhocko@suse.cz> wrote:
>
>> Hi all,
>> this is a request for discussion (I hope we can touch this during memcg
>> meeting during the upcoming KS). I have brought this up earlier this
>> year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
>> The patch got much smaller since then due to excellent Johannes' memcg
>> naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
>> which this is based on.
>

Hi, Michal

I'd like to understand what the isolation is for.

1. Is it an alternative to memory guarantees?
2. How is this different from doing cpusets (fake NUMA) and isolating them?

Just trying to catch up,
Balbir

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-20  1:59   ` KAMEZAWA Hiroyuki
@ 2011-10-21 16:11     ` Balbir Singh
  -1 siblings, 0 replies; 31+ messages in thread
From: Balbir Singh @ 2011-10-21 16:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michal Hocko, linux-mm, LKML, Johannes Weiner, Daisuke Nishimura,
	Hugh Dickins, Ying Han, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

> But I personally think we should make softlimit better rather than
> adding a new interface. If this feature can be achieved by setting
> softlimit=UNLIMITED, it's simple. And Johannes' work will make this
> easy to implement.
> (A total rewrite of softlimit would be required... I think.)
>

Yeah.. I'd be open to a rewrite if we get the specification/design
right. I did soft limits in a few months and tested it on workloads
till I was satisfied it worked.

Balbir Singh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21  3:17       ` KAMEZAWA Hiroyuki
  (?)
@ 2011-10-21 20:00       ` Ying Han
  2011-10-22  9:31           ` Michal Hocko
  -1 siblings, 1 reply; 31+ messages in thread
From: Ying Han @ 2011-10-21 20:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Michal Hocko, linux-mm, LKML, Johannes Weiner, Daisuke Nishimura,
	Hugh Dickins, Andrew Morton, Glauber Costa, Kir Kolyshkin,
	Pavel Emelianov, GregThelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Thursday, October 20, 2011, KAMEZAWA Hiroyuki <
kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 20 Oct 2011 19:45:55 -0700
> Michal Hocko <mhocko@suse.cz> wrote:
>
>> On Thu 20-10-11 16:41:27, Ying Han wrote:
>> [...]
>> > Hi Michal:
>>
>> Hi,
>>
>> >
>> > I didn't read through the patch itself but only the description. If we
>> > want to protect a memcg from being reclaimed under global memory
>> > pressure, I think we can approach it by making a change to soft_limit
>> > reclaim.
>> >
>> > I have a soft_limit change built on top of Johannes's patchset, which
>> > does basically soft_limit aware reclaim under global memory pressure.
>>
>> Is there any link to the patch(es)? I would be interested to look at
>> it before we discuss it.
>>
>
> I'd like to see it, too.
>
> Thanks,
> -Kame
>
I am at the airport heading to Prague now; I will try to post one before the
meeting if possible. The current patch is simple: most of the work is
reverting the existing soft limit implementation, and the new logic is then
built on the memcg-aware global reclaim.

The logic is based on reclaim priority: we skip reclaim from memcgs that are
under their soft limit until the priority gets down to DEF_PRIORITY - 3. This
is simple enough to let us start collecting some data, and I am looking
forward to discussing more in the meeting.
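
The patch itself is not part of this thread, so the following is only a
user-space sketch of the rule as described, not the actual change. It assumes
that reclaim priority starts at DEF_PRIORITY (12 in the kernel) and counts
down as pressure rises, and that the skip ends once the priority reaches
DEF_PRIORITY - 3; `struct prio_cg`, `SKIP_UNTIL_PRIORITY` and `skip_memcg()`
are hypothetical names.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12                          /* starting reclaim priority */
#define SKIP_UNTIL_PRIORITY (DEF_PRIORITY - 3)   /* hypothetical macro name */

/* Hypothetical, simplified per-cgroup state (not struct mem_cgroup). */
struct prio_cg {
	unsigned long usage;      /* pages currently charged to the group */
	unsigned long soft_limit; /* soft limit, in pages */
};

/*
 * Decide whether global reclaim passes over this group at the given
 * priority. Lower priority values mean higher memory pressure.
 */
bool skip_memcg(const struct prio_cg *cg, int priority)
{
	bool under_soft_limit = cg->usage <= cg->soft_limit;

	/* Protection only holds while pressure is still low. */
	return under_soft_limit && priority > SKIP_UNTIL_PRIORITY;
}
```

Under these assumptions a group below its soft limit is left alone for the
first three priority levels (12, 11, 10) and becomes a reclaim candidate from
priority 9 downwards.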


--ying

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21 16:04     ` Balbir Singh
@ 2011-10-22  9:26       ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-22  9:26 UTC (permalink / raw)
  To: Balbir Singh
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, Johannes Weiner,
	Daisuke Nishimura, Hugh Dickins, Ying Han, Andrew Morton,
	Glauber Costa, Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt,
	Tim Hockin, Dave Hansen, Paul Menage, James Bottomley

On Fri 21-10-11 21:34:06, Balbir Singh wrote:
> On Thu, Oct 20, 2011 at 7:29 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Wed, 19 Oct 2011 18:33:09 -0700
> > Michal Hocko <mhocko@suse.cz> wrote:
> >
> >> Hi all,
> >> this is a request for discussion (I hope we can touch this during memcg
> >> meeting during the upcoming KS). I have brought this up earlier this
> >> year before LSF (http://thread.gmane.org/gmane.linux.kernel.mm/60464).
> >> The patch got much smaller since then due to excellent Johannes' memcg
> >> naturalization work (http://thread.gmane.org/gmane.linux.kernel.mm/68724)
> >> which this is based on.
> >
> 
> Hi, Michal

Hi Balbir,

> 
> I'd like to understand, what the isolation is for?
> 
> 1. Is it an alternative to memory guarantees?

Not really, it is more about a resident working set guarantee and workload
isolation with respect to memory.

> 2. How is this different from doing cpusets (fake NUMA) and isolating them?

Yes, this would work. I do not have much experience in this area, but I
guess the main obstacles for fake NUMA are that it is x86_64 only, its
configuration is static, and it is a little bit awkward to use (e.g. nodes
of the same size).
I understood that Google is moving away from fake NUMA towards memcg for
those reasons.

> 
> Just trying to catch up,
> Balbir

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21 20:00       ` Ying Han
@ 2011-10-22  9:31           ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-22  9:31 UTC (permalink / raw)
  To: Ying Han
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, Johannes Weiner,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Glauber Costa,
	Kir Kolyshkin, Pavel Emelianov, GregThelen, pjt, Tim Hockin,
	Dave Hansen, Paul Menage, James Bottomley

On Fri 21-10-11 13:00:18, Ying Han wrote:
[...]
> The logic is based on reclaim priority, and we skip reclaim from certain
> memcg(under soft limit) before getting down to DEF_PRIORITY - 3.

OK, I guess I remember something from the earlier memcg naturalization
patch set discussions. This will still not help much in my case, as
bigger memory pressure would also cause reclaim from the soft unlimited
group, which I would like to prevent.
The other thing about soft-limit-only reclaim from the global reclaim is
that currently all newly created memcgs are soft unlimited by default, which
might lead to unexpected results. Can we come up with a reasonable soft
limit default?
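
One way to read both concerns is sketched below in user-space C. It reuses
the rule quoted above (groups under their soft limit are skipped until the
priority reaches DEF_PRIORITY - 3) and models "soft unlimited" as ULONG_MAX;
all names here are hypothetical, and the exact threshold handling is an
assumption.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12           /* starting reclaim priority */
#define SOFT_UNLIMITED ULONG_MAX  /* stand-in for "no soft limit set" */

/* Hypothetical, simplified per-cgroup state (not struct mem_cgroup). */
struct dflt_cg {
	unsigned long usage;      /* pages currently charged to the group */
	unsigned long soft_limit; /* soft limit, in pages */
};

/* A group is a reclaim candidate if it is over its soft limit,
 * or once pressure has pushed the priority down far enough. */
bool eligible(const struct dflt_cg *cg, int priority)
{
	if (cg->usage > cg->soft_limit)
		return true;
	return priority <= DEF_PRIORITY - 3;
}

/* Count the reclaim candidates among n groups at a given priority. */
size_t count_eligible(const struct dflt_cg *cgs, size_t n, int priority)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (eligible(&cgs[i], priority))
			count++;
	return count;
}
```

With every group left at SOFT_UNLIMITED, no group is a candidate at
DEF_PRIORITY, and every group (including the one meant to be protected)
becomes a candidate once the priority drops to DEF_PRIORITY - 3.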

[...]
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFD] Isolated memory cgroups again
  2011-10-21  8:39     ` Glauber Costa
@ 2011-10-22  9:47       ` Michal Hocko
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2011-10-22  9:47 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Ying Han, linux-mm, LKML, Johannes Weiner, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Hugh Dickins, Andrew Morton, Kir Kolyshkin,
	Pavel Emelianov, GregThelen, pjt, Tim Hockin, Dave Hansen,
	Paul Menage, James Bottomley

On Fri 21-10-11 12:39:22, Glauber Costa wrote:
> On 10/21/2011 03:41 AM, Ying Han wrote:
> >On Wed, Oct 19, 2011 at 6:33 PM, Michal Hocko<mhocko@suse.cz>  wrote:
[...]
> >>TODO
[...]
> >>- is bool sufficient. Don't we rather want something like priority
> >>  instead?
[...]
> >Hi Michal:
> >
> >I didn't read through the patch itself but only the description. If we
> >want to protect a memcg from being reclaimed under global memory
> >pressure, I think we can approach it by making a change to soft_limit
> >reclaim.
> >
> >I have a soft_limit change built on top of Johannes's patchset, which
> >basically does soft_limit aware reclaim under global memory pressure.
> >The implementation is simple, and I am looking forward to discussing more
> >with you guys at the conference.
> >
> >--Ying
> I don't think soft limits will help his case, if I understand
> it correctly. Global reclaim can be triggered regardless of any soft
> limits we may set.
> 
> Now, there are two things I still don't like about it:
> * The definition of a "main workload", "main cgroup", or anything
> like that.

This was just because I wanted to point out the particular case that I
am interested in. You can of course set up more cgroups to be isolated
and balance them by the soft limit.

> I'd prefer to rank them according to some parameter,
> something akin to swappiness. This would allow other people to
> use it in a different way, while still making you capable of
> reaching your goals through parameter settings (i.e. one cgroup has
> a high value of reclaim, all others a much lower one)

Yes, this has been mentioned in the patch TODO section (above). I wanted
the first post to be as simple as possible as a discussion starter. I
guess that we do in fact need something like a priority.

> 
> * The fact that you seem to want to *skip* reclaim altogether for a
> cgroup. That's a dangerous condition, IMHO. What I think we should
> try to achieve, is "skip it for practical purposes on sane
> workloads". 

Yes, the feature might be dangerous (we already provide many ways to shoot
yourself in the foot ;)) but that is what you get if you want to guarantee
something.
But I agree, I guess we can be more clever, and if it is priority based
we can map isolation priorities to the reclaim priorities somehow.

> Again, a parameter that when set to a very high mark, has the effect
> of disallowing reclaim for a cgroup under most sane circumstances.
> 
> What do you think of the above, Michal ?

Yes, I guess that priority-based isolation is the way to go. We should,
however, start with a consensus in this regard (should we do something
like that at all?).

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-10-22  9:47 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-20  1:33 [RFD] Isolated memory cgroups again Michal Hocko
2011-10-20  1:33 ` Michal Hocko
2011-10-20  1:59 ` KAMEZAWA Hiroyuki
2011-10-20  1:59   ` KAMEZAWA Hiroyuki
2011-10-20 16:30   ` Michal Hocko
2011-10-20 16:30     ` Michal Hocko
2011-10-21 16:04   ` Balbir Singh
2011-10-21 16:04     ` Balbir Singh
2011-10-22  9:26     ` Michal Hocko
2011-10-22  9:26       ` Michal Hocko
2011-10-21 16:11   ` Balbir Singh
2011-10-21 16:11     ` Balbir Singh
2011-10-20  8:55 ` Glauber Costa
2011-10-20  8:55   ` Glauber Costa
2011-10-20 16:42   ` Michal Hocko
2011-10-20 16:42     ` Michal Hocko
2011-10-20 23:41 ` Ying Han
2011-10-20 23:41   ` Ying Han
2011-10-21  2:45   ` Michal Hocko
2011-10-21  2:45     ` Michal Hocko
2011-10-21  3:17     ` KAMEZAWA Hiroyuki
2011-10-21  3:17       ` KAMEZAWA Hiroyuki
2011-10-21 20:00       ` Ying Han
2011-10-22  9:31         ` Michal Hocko
2011-10-22  9:31           ` Michal Hocko
2011-10-21  8:39   ` Glauber Costa
2011-10-21  8:39     ` Glauber Costa
2011-10-21 12:16     ` Johannes Weiner
2011-10-21 12:16       ` Johannes Weiner
2011-10-22  9:47     ` Michal Hocko
2011-10-22  9:47       ` Michal Hocko
