* [PATCH 0/5] mm: workingset: make shadow node shrinker memcg aware
@ 2016-02-07 17:27 ` Vladimir Davydov
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

Hi,

The workingset code was recently made memcg aware, but the shadow node
shrinker is still global. As a result, one small cgroup can consume all
the memory available for shadow nodes, possibly hurting other cgroups
by forcing reclaim of their shadow nodes, even though the reclaim
distances stored in its own shadow nodes have no effect. To avoid this,
we need to make the shadow node shrinker memcg aware.

The actual work is done in patch 5 of the series. Patches 1 and 2
prepare the memcg/shrinker infrastructure for the change. Patch 3 is a
collateral cleanup. Patch 4 makes radix_tree_node allocations accounted
to the memory cgroup, which is necessary for making the shadow node
shrinker memcg aware.

Thanks,

Vladimir Davydov (5):
  mm: memcontrol: enable kmem accounting for all cgroups in the legacy
    hierarchy
  mm: vmscan: pass root_mem_cgroup instead of NULL to memcg aware
    shrinker
  mm: memcontrol: zap memcg_kmem_online helper
  radix-tree: account radix_tree_node to memory cgroup
  mm: workingset: make shadow node shrinker memcg aware

 include/linux/memcontrol.h | 20 +++++++++----------
 lib/radix-tree.c           | 16 +++++++++++++---
 mm/memcontrol.c            | 48 ++++++++--------------------------------------
 mm/slab_common.c           |  2 +-
 mm/vmscan.c                | 15 ++++++++++-----
 mm/workingset.c            | 11 ++++++++---
 6 files changed, 50 insertions(+), 62 deletions(-)

-- 
2.1.4

* [PATCH 1/5] mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy
  2016-02-07 17:27 ` Vladimir Davydov
@ 2016-02-07 17:27   ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

Currently, in the legacy hierarchy kmem accounting is off for all
cgroups by default and must be enabled explicitly by writing something
to memory.kmem.limit_in_bytes. Since we neither support reclaim on
hitting the kmem limit nor have any plans to implement it, the value
written is likely to be -1, just to enable kmem accounting and let
memory.limit_in_bytes limit kernel memory consumption along with user
memory.

This user API was introduced when the implementation of kmem accounting
lacked slab shrinker support and hence was useless in practice. Things
have changed since then: slab shrinkers were made memcg aware, the
accounting overhead seems to be negligible, and a failure to charge a
kmem allocation should not have critical consequences, because we only
account those kernel objects whose allocation should be safe to fail.
That is why kmem accounting is enabled by default for all cgroups in
the default hierarchy, which will eventually replace the legacy one.

The ability to enable kmem accounting for some cgroups while keeping it
disabled for others is getting difficult to maintain. E.g. to make the
shadow node shrinker memcg aware (see mm/workingset.c), we need to know
the relationship between the number of shadow nodes allocated for a
cgroup and the size of its lru list. If kmem accounting is enabled for
all cgroups there is no problem, but what should we do if kmem
accounting is enabled for only half of the cgroups? We would have no
choice but to use global lru stats while scanning the root cgroup's
shadow nodes, yet that would be wrong if kmem accounting were enabled
for all cgroups (which is the case when the unified hierarchy is used),
in which case we should use the lru stats of the root cgroup's lruvec.

So let's enable kmem accounting for all memory cgroups by default. If
it turns out to be unstable or too costly, it can always be disabled
system-wide by passing cgroup.memory=nokmem to the kernel at boot time.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 mm/memcontrol.c | 41 +++++------------------------------------
 1 file changed, 5 insertions(+), 36 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b7dda7c2e74..28d1b1e9d4fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2824,6 +2824,9 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
 	int memcg_id;
 
+	if (cgroup_memory_nokmem)
+		return 0;
+
 	BUG_ON(memcg->kmemcg_id >= 0);
 	BUG_ON(memcg->kmem_state);
 
@@ -2844,24 +2847,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 	return 0;
 }
 
-static int memcg_propagate_kmem(struct mem_cgroup *parent,
-				struct mem_cgroup *memcg)
-{
-	int ret = 0;
-
-	mutex_lock(&memcg_limit_mutex);
-	/*
-	 * If the parent cgroup is not kmem-online now, it cannot be
-	 * onlined after this point, because it has at least one child
-	 * already.
-	 */
-	if (memcg_kmem_online(parent) ||
-	    (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nokmem))
-		ret = memcg_online_kmem(memcg);
-	mutex_unlock(&memcg_limit_mutex);
-	return ret;
-}
-
 static void memcg_offline_kmem(struct mem_cgroup *memcg)
 {
 	struct cgroup_subsys_state *css;
@@ -2920,10 +2905,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
 	}
 }
 #else
-static int memcg_propagate_kmem(struct mem_cgroup *parent, struct mem_cgroup *memcg)
-{
-	return 0;
-}
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
 	return 0;
@@ -2939,22 +2920,10 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
 static int memcg_update_kmem_limit(struct mem_cgroup *memcg,
 				   unsigned long limit)
 {
-	int ret = 0;
+	int ret;
 
 	mutex_lock(&memcg_limit_mutex);
-	/* Top-level cgroup doesn't propagate from root */
-	if (!memcg_kmem_online(memcg)) {
-		if (cgroup_is_populated(memcg->css.cgroup) ||
-		    (memcg->use_hierarchy && memcg_has_children(memcg)))
-			ret = -EBUSY;
-		if (ret)
-			goto out;
-		ret = memcg_online_kmem(memcg);
-		if (ret)
-			goto out;
-	}
 	ret = page_counter_limit(&memcg->kmem, limit);
-out:
 	mutex_unlock(&memcg_limit_mutex);
 	return ret;
 }
@@ -4205,7 +4174,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return &memcg->css;
 	}
 
-	error = memcg_propagate_kmem(parent, memcg);
+	error = memcg_online_kmem(memcg);
 	if (error)
 		goto fail;
 
-- 
2.1.4

* [PATCH 2/5] mm: vmscan: pass root_mem_cgroup instead of NULL to memcg aware shrinker
  2016-02-07 17:27 ` Vladimir Davydov
@ 2016-02-07 17:27   ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

It is simply more convenient to implement a memcg aware shrinker when
you know that shrink_control->memcg != NULL unless memcg_kmem_enabled()
returns false.
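
For illustration only (not part of this patch), a memcg aware shrinker's
->count_objects() callback can then take roughly the following shape;
count_memcg_objects() and count_global_objects() are hypothetical
helpers standing in for the shrinker's per-memcg and global counting:

/*
 * Hypothetical sketch: once root_mem_cgroup is passed instead of NULL,
 * sc->memcg can be dereferenced unconditionally whenever kmem
 * accounting is enabled.
 */
static unsigned long example_count_objects(struct shrinker *shrinker,
					   struct shrink_control *sc)
{
	if (memcg_kmem_enabled())
		/* sc->memcg is at least root_mem_cgroup, never NULL */
		return count_memcg_objects(sc->memcg, sc->nid);

	/* kmem accounting disabled: fall back to a global count */
	return count_global_objects(sc->nid);
}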

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 mm/vmscan.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 18b3767136f4..bae8f32ad9cb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -382,9 +382,8 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
  *
  * @memcg specifies the memory cgroup to target. If it is not NULL,
  * only shrinkers with SHRINKER_MEMCG_AWARE set will be called to scan
- * objects from the memory cgroup specified. Otherwise all shrinkers
- * are called, and memcg aware shrinkers are supposed to scan the
- * global list then.
+ * objects from the memory cgroup specified. Otherwise, only unaware
+ * shrinkers are called.
  *
  * @nr_scanned and @nr_eligible form a ratio that indicate how much of
  * the available objects should be scanned.  Page reclaim for example
@@ -404,7 +403,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 	struct shrinker *shrinker;
 	unsigned long freed = 0;
 
-	if (memcg && !memcg_kmem_online(memcg))
+	if (memcg && (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)))
 		return 0;
 
 	if (nr_scanned == 0)
@@ -428,7 +427,13 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 			.memcg = memcg,
 		};
 
-		if (memcg && !(shrinker->flags & SHRINKER_MEMCG_AWARE))
+		/*
+		 * If kernel memory accounting is disabled, we ignore
+		 * SHRINKER_MEMCG_AWARE flag and call all shrinkers
+		 * passing NULL for memcg.
+		 */
+		if (memcg_kmem_enabled() &&
+		    !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
 			continue;
 
 		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
-- 
2.1.4

* [PATCH 3/5] mm: memcontrol: zap memcg_kmem_online helper
  2016-02-07 17:27 ` Vladimir Davydov
@ 2016-02-07 17:27   ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

As kmem accounting is now either enabled for all cgroups or disabled
system-wide, there is no point in keeping the memcg_kmem_online()
helper: one can use memcg_kmem_enabled() and mem_cgroup_online()
instead, as shrink_slab() now does.

There are only two places left where this helper is used,
__memcg_kmem_charge() and memcg_create_kmem_cache(). The former can only
be called if memcg_kmem_enabled() returned true. Since the cgroup it
operates on is online, a mem_cgroup_is_root() check will be enough.

memcg_create_kmem_cache() can't use the mem_cgroup_online() helper
instead of memcg_kmem_online(), because it relies on the fact that in
memcg_offline_kmem() memcg->kmem_state is changed before
memcg_deactivate_kmem_caches() is called, but there we can just
open-code the check.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 10 ----------
 mm/memcontrol.c            |  2 +-
 mm/slab_common.c           |  2 +-
 3 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d6300313b298..bc8e4e22f58f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -795,11 +795,6 @@ static inline bool memcg_kmem_enabled(void)
 	return static_branch_unlikely(&memcg_kmem_enabled_key);
 }
 
-static inline bool memcg_kmem_online(struct mem_cgroup *memcg)
-{
-	return memcg->kmem_state == KMEM_ONLINE;
-}
-
 /*
  * In general, we'll do everything in our power to not incur in any overhead
  * for non-memcg users for the kmem functions. Not even a function call, if we
@@ -909,11 +904,6 @@ static inline bool memcg_kmem_enabled(void)
 	return false;
 }
 
-static inline bool memcg_kmem_online(struct mem_cgroup *memcg)
-{
-	return false;
-}
-
 static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
 {
 	return 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 28d1b1e9d4fb..341bf86d26c2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2346,7 +2346,7 @@ int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
 	int ret = 0;
 
 	memcg = get_mem_cgroup_from_mm(current->mm);
-	if (memcg_kmem_online(memcg))
+	if (!mem_cgroup_is_root(memcg))
 		ret = __memcg_kmem_charge_memcg(page, gfp, order, memcg);
 	css_put(&memcg->css);
 	return ret;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6afb2263a5c5..8addc3c4df37 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -510,7 +510,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	 * The memory cgroup could have been offlined while the cache
 	 * creation work was pending.
 	 */
-	if (!memcg_kmem_online(memcg))
+	if (memcg->kmem_state != KMEM_ONLINE)
 		goto out_unlock;
 
 	idx = memcg_cache_id(memcg);
-- 
2.1.4

* [PATCH 4/5] radix-tree: account radix_tree_node to memory cgroup
  2016-02-07 17:27 ` Vladimir Davydov
@ 2016-02-07 17:27   ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

Allocation of radix_tree_node objects can easily be triggered from
userspace, so we should account them to the memory cgroup. Besides, we
need them accounted in order to make the shadow node shrinker per memcg
(see mm/workingset.c).

A tricky thing about accounting radix_tree_node objects is that they are
mostly allocated through radix_tree_preload(), so we can't just set
SLAB_ACCOUNT for radix_tree_node_cachep: that would likely result in a
lot of unrelated cgroups using objects from each other's caches.

One way to overcome this would be to make radix tree preloads per memcg,
but that would probably look cumbersome and overcomplicated.

Instead, we make radix_tree_node_alloc() first try to allocate from the
cache with __GFP_ACCOUNT, regardless of whether the caller has
preloaded, and fall back on the per-cpu preloads only if that fails.
This should make most allocations accounted.
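
For illustration (a hypothetical caller, with example_lock standing in
for whatever lock protects the tree), the usual insertion pattern looks
like this; the node consumed by radix_tree_insert() normally comes from
the per-cpu pool filled by radix_tree_preload(), not from an allocation
made under the lock, which is why a plain SLAB_ACCOUNT would charge
nodes to whichever cgroup happened to fill the pool:

static DEFINE_SPINLOCK(example_lock);

static int example_insert(struct radix_tree_root *root,
			  unsigned long index, void *item)
{
	int err;

	/* May sleep; on success the pool is filled and preemption is disabled */
	err = radix_tree_preload(GFP_KERNEL);
	if (err)
		return err;

	spin_lock(&example_lock);
	err = radix_tree_insert(root, index, item);
	spin_unlock(&example_lock);

	radix_tree_preload_end();	/* re-enables preemption */
	return err;
}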

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 lib/radix-tree.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e2511b8e2300..1624c4117961 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -227,6 +227,15 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 		struct radix_tree_preload *rtp;
 
 		/*
+		 * Even if the caller has preloaded, try to allocate from the
+		 * cache first for the new node to get accounted.
+		 */
+		ret = kmem_cache_alloc(radix_tree_node_cachep,
+				       gfp_mask | __GFP_ACCOUNT | __GFP_NOWARN);
+		if (ret)
+			goto out;
+
+		/*
 		 * Provided the caller has preloaded here, we will always
 		 * succeed in getting a node here (and never reach
 		 * kmem_cache_alloc)
@@ -243,10 +252,11 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 		 * for debugging.
 		 */
 		kmemleak_update_trace(ret);
+		goto out;
 	}
-	if (ret == NULL)
-		ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
-
+	ret = kmem_cache_alloc(radix_tree_node_cachep,
+			       gfp_mask | __GFP_ACCOUNT);
+out:
 	BUG_ON(radix_tree_is_indirect_ptr(ret));
 	return ret;
 }
-- 
2.1.4

* [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware
  2016-02-07 17:27 ` Vladimir Davydov
@ 2016-02-07 17:27   ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-07 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel

The workingset code was recently made memcg aware, but the shadow node
shrinker is still global. As a result, one small cgroup can consume all
the memory available for shadow nodes, possibly hurting other cgroups
by forcing reclaim of their shadow nodes, even though the reclaim
distances stored in its own shadow nodes have no effect. To avoid this,
we need to make the shadow node shrinker memcg aware.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 10 ++++++++++
 mm/memcontrol.c            |  5 ++---
 mm/workingset.c            | 11 ++++++++---
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bc8e4e22f58f..1191d79aa495 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -403,6 +403,9 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 		int nr_pages);
 
+unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
+					   int nid, unsigned int lru_mask);
+
 static inline
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
@@ -661,6 +664,13 @@ mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 {
 }
 
+static inline unsigned long
+mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
+			     int nid, unsigned int lru_mask)
+{
+	return 0;
+}
+
 static inline void
 mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 341bf86d26c2..ae8b81c55685 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -638,9 +638,8 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
 }
 
-static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
-						  int nid,
-						  unsigned int lru_mask)
+unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
+					   int nid, unsigned int lru_mask)
 {
 	unsigned long nr = 0;
 	int zid;
diff --git a/mm/workingset.c b/mm/workingset.c
index 6130ba0b2641..8c07cd8af15e 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -349,7 +349,12 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	shadow_nodes = list_lru_shrink_count(&workingset_shadow_nodes, sc);
 	local_irq_enable();
 
-	pages = node_present_pages(sc->nid);
+	if (memcg_kmem_enabled())
+		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
+						     BIT(LRU_ACTIVE_FILE));
+	else
+		pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+
 	/*
 	 * Active cache pages are limited to 50% of memory, and shadow
 	 * entries that represent a refault distance bigger than that
@@ -364,7 +369,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	 *
 	 * PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
 	 */
-	max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
+	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
 
 	if (shadow_nodes <= max_nodes)
 		return 0;
@@ -458,7 +463,7 @@ static struct shrinker workingset_shadow_shrinker = {
 	.count_objects = count_shadow_nodes,
 	.scan_objects = scan_shadow_nodes,
 	.seeks = DEFAULT_SEEKS,
-	.flags = SHRINKER_NUMA_AWARE,
+	.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
 };
 
 /*
-- 
2.1.4

* Re: [PATCH 1/5] mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy
  2016-02-07 17:27   ` Vladimir Davydov
@ 2016-02-08  5:46     ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08  5:46 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Sun, Feb 07, 2016 at 08:27:31PM +0300, Vladimir Davydov wrote:
> Currently, in the legacy hierarchy kmem accounting is off for all
> cgroups by default and must be enabled explicitly by writing something
> to memory.kmem.limit_in_bytes. Since we don't support reclaim on hitting
> kmem limit, nor do we have any plans to implement it, this is likely to
> be -1, just to enable kmem accounting and limit kernel memory
> consumption by the memory.limit_in_bytes along with user memory.
> 
> This user API was introduced when the implementation of kmem accounting
> lacked slab shrinker support and hence was useless in practice. Things
> have changed since then - slab shrinkers were made memcg aware, the
> accounting overhead seems to be negligible, and a failure to charge a
> kmem allocation should not have critical consequences, because we only
> account those kernel objects that should be safe to fail. That's why
> kmem accounting is enabled by default for all cgroups in the default
> hierarchy, which will eventually replace the legacy one.
> 
> The ability to enable kmem accounting for some cgroups while keeping it
> disabled for others is getting difficult to maintain. E.g. to make
> shadow node shrinker memcg aware (see mm/workingset.c), we need to know
> the relationship between the number of shadow nodes allocated for a
> cgroup and the size of its lru list. If kmem accounting is enabled for
> all cgroups there is no problem, but what should we do if kmem
> accounting is enabled only for half of cgroups? We've no other choice
> but use global lru stats while scanning root cgroup's shadow nodes, but
> that would be wrong if kmem accounting was enabled for all cgroups
> (which is the case if the unified hierarchy is used), in which case we
> should use lru stats of the root cgroup's lruvec.
> 
> That being said, let's enable kmem accounting for all memory cgroups by
> default. If one finds it unstable or too costly, it can always be
> disabled system-wide by passing cgroup.memory=nokmem to the kernel at
> boot time.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

A little bolder than I would have preferred for legacy memcg, but I
don't think we have another choice here. And you're right, accounting
costs are a far cry from what they once were. So I'm okay with this.

* Re: [PATCH 2/5] mm: vmscan: pass root_mem_cgroup instead of NULL to memcg aware shrinker
  2016-02-07 17:27   ` Vladimir Davydov
@ 2016-02-08  5:47     ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08  5:47 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Sun, Feb 07, 2016 at 08:27:32PM +0300, Vladimir Davydov wrote:
> It's just convenient to implement a memcg aware shrinker when you know
> that shrink_control->memcg != NULL unless memcg_kmem_enabled() returns
> false.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH 3/5] mm: memcontrol: zap memcg_kmem_online helper
  2016-02-07 17:27   ` Vladimir Davydov
@ 2016-02-08  5:48     ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08  5:48 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Sun, Feb 07, 2016 at 08:27:33PM +0300, Vladimir Davydov wrote:
> As kmem accounting is now either enabled for all cgroups or disabled
> system-wide, there's no point in having memcg_kmem_online() helper -
> instead one can use memcg_kmem_enabled() and mem_cgroup_online(), as
> shrink_slab() now does.
> 
> There are only two places left where this helper is used -
> __memcg_kmem_charge() and memcg_create_kmem_cache(). The former can only
> be called if memcg_kmem_enabled() returned true. Since the cgroup it
> operates on is online, mem_cgroup_is_root() check will be enough.
> 
> memcg_create_kmem_cache() can't use mem_cgroup_online() helper instead
> of memcg_kmem_online(), because it relies on the fact that in
> memcg_offline_kmem() memcg->kmem_state is changed before
> memcg_deactivate_kmem_caches() is called, but there we can just
> open-code the check.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Nice. I like the direct check for the root memcg.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

* Re: [PATCH 4/5] radix-tree: account radix_tree_node to memory cgroup
  2016-02-07 17:27   ` Vladimir Davydov
@ 2016-02-08  6:01     ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08  6:01 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Sun, Feb 07, 2016 at 08:27:34PM +0300, Vladimir Davydov wrote:
> Allocation of radix_tree_node objects can be easily triggered from
> userspace, so we should account them to memory cgroup. Besides, we need
> them accounted for making shadow node shrinker per memcg (see
> mm/workingset.c).
> 
> A tricky thing about accounting radix_tree_node objects is that they are
> mostly allocated through radix_tree_preload(), so we can't just set
> SLAB_ACCOUNT for radix_tree_node_cachep - that would likely result in a
> lot of unrelated cgroups using objects from each other's caches.
> 
> One way to overcome this would be making radix tree preloads per memcg,
> but that would probably look cumbersome and overcomplicated.
> 
> Instead, we make radix_tree_node_alloc() first try to allocate from the
> cache with __GFP_ACCOUNT, no matter if the caller has preloaded or not,
> and only if it fails fall back on using per cpu preloads. This should
> make most allocations accounted.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

I'm not too stoked about the extra slab call. But the preload call
allocates nodes for the worst-case insertion, so you are absolutely
right that charging there would not make sense for cgroup ownership.
And I can't think of anything better to do here.

* Re: [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware
  2016-02-07 17:27   ` Vladimir Davydov
@ 2016-02-08  6:23     ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08  6:23 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> Workingset code was recently made memcg aware, but shadow node shrinker
> is still global. As a result, one small cgroup can consume all memory
> available for shadow nodes, possibly hurting other cgroups by reclaiming
> their shadow nodes, even though reclaim distances stored in its shadow
> nodes have no effect. To avoid this, we need to make shadow node
> shrinker memcg aware.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

This patch is straightforward, but there is one tiny thing that bugs
me about it, and that is switching from available memory to the size
of the active list, because the active list can shrink drastically at
runtime.

It's true that both the shrinking of the active list and subsequent
activations to regrow it will reduce the number of actionable
refaults, and so it wouldn't be unreasonable to also shrink shadow
nodes when the active list shrinks.

However, I think these are too many assumptions to encode in the
shrinker, because it is only meant to prevent a worst-case explosion
of radix tree nodes. I'd prefer it to be dumb and conservative.

Could we instead go with the current usage of the memcg? Whether
reclaim happens globally or due to the memory limit, the usage at the
time of reclaim gives a good idea of the memory available to the
group, while making fewer assumptions about the internal composition
of the memcg's memory and the consequences associated with that.

What do you think?

* Re: [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware
  2016-02-08  6:23     ` Johannes Weiner
@ 2016-02-08 14:28       ` Vladimir Davydov
  -1 siblings, 0 replies; 26+ messages in thread
From: Vladimir Davydov @ 2016-02-08 14:28 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> On Sun, Feb 07, 2016 at 08:27:35PM +0300, Vladimir Davydov wrote:
> > Workingset code was recently made memcg aware, but shadow node shrinker
> > is still global. As a result, one small cgroup can consume all memory
> > available for shadow nodes, possibly hurting other cgroups by reclaiming
> > their shadow nodes, even though reclaim distances stored in its shadow
> > nodes have no effect. To avoid this, we need to make shadow node
> > shrinker memcg aware.
> > 
> > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> 
> This patch is straight forward, but there is one tiny thing that bugs
> me about it, and that is switching from available memory to the size
> of the active list. Because the active list can shrink drastically at
> runtime.

Yeah, the active file lru is a volatile thing indeed. Not only can it
shrink rapidly, it can also grow in an instant (e.g. due to
mark_page_accessed), so you're right: sizing the shadow node lru based
solely on the active lru size would be too unpredictable.

> 
> It's true that both the shrinking of the active list and subsequent
> activations to regrow it will reduce the number of actionable
> refaults, and so it wouldn't be unreasonable to also shrink shadow
> nodes when the active list shrinks.
> 
> However, I think these are too many assumptions to encode in the
> shrinker, because it is only meant to prevent a worst-case explosion
> of radix tree nodes. I'd prefer it to be dumb and conservative.
> 
> Could we instead go with the current usage of the memcg? Whether
> reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory available to the
> group. But it's making fewer assumptions about the internal composition
> of the memcg's memory, and about the consequences associated with that.

But that would likely result in wasting a considerable chunk of memory
on stale shadow nodes if file caches constitute only a small part of the
memcg's memory consumption, which isn't good IMHO.

Maybe we'd better use LRU_ALL_FILE / 2 instead?

diff --git a/mm/workingset.c b/mm/workingset.c
index 8c07cd8af15e..8a75f8d2916a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -351,9 +351,10 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
 	if (memcg_kmem_enabled())
 		pages = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
-						     BIT(LRU_ACTIVE_FILE));
+						     LRU_ALL_FILE);
 	else
-		pages = node_page_state(sc->nid, NR_ACTIVE_FILE);
+		pages = node_page_state(sc->nid, NR_ACTIVE_FILE) +
+			node_page_state(sc->nid, NR_INACTIVE_FILE);
 
 	/*
 	 * Active cache pages are limited to 50% of memory, and shadow
@@ -369,7 +370,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	 *
 	 * PAGE_SIZE / radix_tree_nodes / node_entries / PAGE_SIZE
 	 */
-	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
+	max_nodes = pages >> (1 + RADIX_TREE_MAP_SHIFT - 3);
 
 	if (shadow_nodes <= max_nodes)
 		return 0;
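
[Editor's note: a back-of-the-envelope check of the shifted formula,
assuming RADIX_TREE_MAP_SHIFT of 6 (64 slots per node), the 1/8
worst-case node density from the existing comment, and roughly seven
radix_tree_nodes per 4 KiB page; a standalone userspace sketch, not
kernel code.]

/* editorial sketch: illustrative arithmetic only */
#include <stdio.h>

int main(void)
{
	unsigned long file_pages = 1UL << 18;	/* 1 GiB of file cache, 4 KiB pages */

	/* new limit: pages >> (1 + 6 - 3), i.e. one shadow node per 16 file
	 * pages; same as the old >> (6 - 3) applied to an active list that
	 * is exactly half of the file cache: (262144 / 2) / 8 = 16384 */
	unsigned long max_nodes = file_pages >> (1 + 6 - 3);

	printf("max_nodes = %lu\n", max_nodes);		/* 16384 */

	/* at ~7 radix_tree_nodes per page that is ~2340 pages, about 9 MiB,
	 * i.e. under 1% of the 1 GiB cache in the worst case */
	printf("node pages ~= %lu\n", max_nodes / 7);	/* 2340 */

	return 0;
}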

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware
  2016-02-08 14:28       ` Vladimir Davydov
@ 2016-02-08 20:43         ` Johannes Weiner
  -1 siblings, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2016-02-08 20:43 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Mon, Feb 08, 2016 at 05:28:35PM +0300, Vladimir Davydov wrote:
> On Mon, Feb 08, 2016 at 01:23:53AM -0500, Johannes Weiner wrote:
> > It's true that both the shrinking of the active list and subsequent
> > activations to regrow it will reduce the number of actionable
> > refaults, and so it wouldn't be unreasonable to also shrink shadow
> > nodes when the active list shrinks.
> > 
> > However, I think these are too many assumptions to encode in the
> > shrinker, because it is only meant to prevent a worst-case explosion
> > of radix tree nodes. I'd prefer it to be dumb and conservative.
> > 
> > Could we instead go with the current usage of the memcg? Whether
> > reclaim happens globally or due to the memory limit, the usage at the
> time of reclaim gives a good idea of the memory available to the
> group. But it's making fewer assumptions about the internal composition
> of the memcg's memory, and about the consequences associated with that.
> 
> But that would likely result in wasting a considerable chunk of memory
> for stale shadow nodes in case file caches constitute only a small part
> of memcg memory consumption, which isn't good IMHO.

Hm, that's probably true. But I think that's a separate patch at this
point - going from total memory to the cache portion for overhead
reasons - and it shouldn't be conflated with the memcg awareness patch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-02-08 20:44 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-07 17:27 [PATCH 0/5] mm: workingset: make shadow node shrinker memcg aware Vladimir Davydov
2016-02-07 17:27 ` [PATCH 1/5] mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy Vladimir Davydov
2016-02-08  5:46   ` Johannes Weiner
2016-02-07 17:27 ` [PATCH 2/5] mm: vmscan: pass root_mem_cgroup instead of NULL to memcg aware shrinker Vladimir Davydov
2016-02-08  5:47   ` Johannes Weiner
2016-02-07 17:27 ` [PATCH 3/5] mm: memcontrol: zap memcg_kmem_online helper Vladimir Davydov
2016-02-08  5:48   ` Johannes Weiner
2016-02-07 17:27 ` [PATCH 4/5] radix-tree: account radix_tree_node to memory cgroup Vladimir Davydov
2016-02-08  6:01   ` Johannes Weiner
2016-02-07 17:27 ` [PATCH 5/5] mm: workingset: make shadow node shrinker memcg aware Vladimir Davydov
2016-02-08  6:23   ` Johannes Weiner
2016-02-08 14:28     ` Vladimir Davydov
2016-02-08 20:43       ` Johannes Weiner
