linux-kernel.vger.kernel.org archive mirror
* [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining
@ 2019-01-05  0:19 Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 1/5] doc: memcontrol: fix the obsolete content about force empty Yang Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel


We have some use cases that create and remove memcgs very frequently,
where the tasks in the memcg may only access files that are unlikely to
be accessed by anyone else.  So we prefer to force_empty the memcg
before rmdir'ing it, to reclaim the page cache so that it doesn't
accumulate and cause unnecessary memory pressure.  Such memory pressure
may trigger direct reclaim, which harms latency-sensitive applications.

Force empty helps in such use cases; however, it reclaims memory
synchronously when memory.force_empty is written.  It may take some time
to return, and subsequent operations are blocked by it.  Although this
can be done in the background, some use cases may need to create a new
memcg with the same name right after the old one is deleted, so the
creation might get blocked by the preceding reclaim/remove operation.

Delaying memory reclaim to cgroup offline sounds reasonable for such use
cases.  This series introduces a new interface, wipe_on_offline, for
both the default and legacy hierarchies, which does memory reclaim in
the css offline kworker.
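
For illustration, the intended workflow might look like the sketch
below.  It assumes the legacy hierarchy is mounted at
/sys/fs/cgroup/memory (as in the test script of patch 2), that the
control file shows up as memory.wipe_on_offline, and that it accepts 0
and 1 as described in patch 3.  The helper names and the DRY_RUN switch
are hypothetical; they only exist so the steps can be printed without
touching a real cgroup.

```shell
#!/bin/sh
# Sketch only: running it for real needs a kernel with this series applied.
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup/memory}

# Write a value to a control file, or just print the command in dry-run mode.
cg_write() {
	if [ -n "$DRY_RUN" ]; then
		echo "echo $1 > $2"
	else
		echo "$1" > "$2"
	fi
}

# Run a command, or just print it in dry-run mode.
cg_run() {
	if [ -n "$DRY_RUN" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Create a memcg and ask for its memory to be reclaimed at offline time.
memcg_create_wiping() {
	cg_run mkdir "$CG_ROOT/$1" &&
	cg_write 1 "$CG_ROOT/$1/memory.wipe_on_offline"
}

# Remove the memcg; reclaim is left to the css offline kworker.
memcg_remove() {
	cg_run rmdir "$CG_ROOT/$1"
}
```

With DRY_RUN=1 set, calling "memcg_create_wiping test" followed by
"memcg_remove test" only prints the mkdir, echo and rmdir commands.
Unlike writing to memory.force_empty, nothing here waits for reclaim
before the rmdir.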

v1 -> v2:
* Introduced wipe_on_offline interface suggested by Michal
* Bring force_empty into default hierarchy

Patch #1: Fix some obsolete information about force_empty in the document
Patch #2: A minor improvement to skip swap for force_empty
Patch #3: Introduces wipe_on_offline interface
Patch #4: Bring force_empty into default hierarchy
Patch #5: Document update

Yang Shi (5):
      doc: memcontrol: fix the obsolete content about force empty
      mm: memcontrol: do not try to do swap when force empty
      mm: memcontrol: introduce wipe_on_offline interface
      mm: memcontrol: bring force_empty into default hierarchy
      doc: memcontrol: add description for wipe_on_offline

 Documentation/admin-guide/cgroup-v2.rst | 23 +++++++++++++++++++++++
 Documentation/cgroup-v1/memory.txt      | 17 ++++++++++++++---
 include/linux/memcontrol.h              |  3 +++
 mm/memcontrol.c                         | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 94 insertions(+), 4 deletions(-)


* [v2 PATCH 1/5] doc: memcontrol: fix the obsolete content about force empty
  2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
@ 2019-01-05  0:19 ` Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when " Yang Shi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel

We no longer reparent page cache when offlining a memcg, so update the
force_empty related content accordingly.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 Documentation/cgroup-v1/memory.txt | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index 3682e99..8e2cb1d 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -70,7 +70,7 @@ Brief summary of control files.
  memory.soft_limit_in_bytes	 # set/show soft limit of memory usage
  memory.stat			 # show various statistics
  memory.use_hierarchy		 # set/show hierarchical account enabled
- memory.force_empty		 # trigger forced move charge to parent
+ memory.force_empty		 # trigger forced page reclaim
  memory.pressure_level		 # set memory pressure notifications
  memory.swappiness		 # set/show swappiness parameter of vmscan
 				 (See sysctl's vm.swappiness)
@@ -459,8 +459,9 @@ About use_hierarchy, see Section 6.
   the cgroup will be reclaimed and as many pages reclaimed as possible.
 
   The typical use case for this interface is before calling rmdir().
-  Because rmdir() moves all pages to parent, some out-of-use page caches can be
-  moved to the parent. If you want to avoid that, force_empty will be useful.
+  Though rmdir() offlines memcg, but the memcg may still stay there due to
+  charged file caches. Some out-of-use page caches may keep charged until
+  memory pressure happens. If you want to avoid that, force_empty will be useful.
 
   Also, note that when memory.kmem.limit_in_bytes is set the charges due to
   kernel pages will still be seen. This is not considered a failure and the
-- 
1.8.3.1



* [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when force empty
  2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 1/5] doc: memcontrol: fix the obsolete content about force empty Yang Shi
@ 2019-01-05  0:19 ` Yang Shi
  2019-01-05  0:43   ` Shakeel Butt
  2019-01-05  0:19 ` [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface Yang Shi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel

The typical use case of force empty is to reclaim as much memory as
possible before offlining a memcg.  Since there should be no tasks
attached to a memcg being offlined, the tasks' anonymous pages would
already have been freed or uncharged.  Even if anonymous pages get
swapped out, they are still charged to swap space.  So it seems
pointless to swap for force empty.

I tried to dig into the history of this: it was introduced by
commit 8c7c6e34a125 ("memcg: mem+swap controller core"), but there is
no clue as to why it was done this way in the first place.

The simple test script below shows a slight improvement in file cache
reclaim when swap is on.

echo 3 > /proc/sys/vm/drop_caches
mkdir /sys/fs/cgroup/memory/test
echo 30 > /sys/fs/cgroup/memory/test/memory.swappiness
echo $$ >/sys/fs/cgroup/memory/test/cgroup.procs
cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
dd if=/dev/zero of=/mnt/test bs=1M count=1024
ping localhost > /dev/null &
echo 1 > /sys/fs/cgroup/memory/test/memory.force_empty
killall ping
echo $$ >/sys/fs/cgroup/memory/cgroup.procs
cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
rmdir /sys/fs/cgroup/memory/test
cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'

The amount of page cache (Cached in /proc/meminfo, in kB) is:
                      w/o patch      w/ patch
before force empty    1088792        1088784
after force empty     41492          39428
reclaimed             1047300        1049356

Without swapping, force empty reclaims about 2MB more memory out of 1GB
of page cache.
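
For reference, the "reclaimed" row and the roughly 2MB difference can be
re-derived from the table with a few lines of shell.  This is only a
sanity check of the arithmetic; it reads the w/o and w/ columns as
without and with this patch, assumes the values are in kB as reported
by /proc/meminfo, and uses variable names made up for illustration.

```shell
#!/bin/sh
# Cached values (kB) copied from the table above.
before_wo=1088792; after_wo=41492   # read as: without this patch
before_w=1088784;  after_w=39428    # read as: with this patch

reclaimed_wo=$((before_wo - after_wo))
reclaimed_w=$((before_w - after_w))
diff_kb=$((reclaimed_w - reclaimed_wo))

echo "reclaimed w/o patch: ${reclaimed_wo} kB"
echo "reclaimed w/ patch:  ${reclaimed_w} kB"
echo "difference:          ${diff_kb} kB"
```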

Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7f18b..75208a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2895,7 +2895,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		progress = try_to_free_mem_cgroup_pages(memcg, 1,
-							GFP_KERNEL, true);
+							GFP_KERNEL, false);
 		if (!progress) {
 			nr_retries--;
 			/* maybe some writeback is necessary */
-- 
1.8.3.1



* [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface
  2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 1/5] doc: memcontrol: fix the obsolete content about force empty Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when " Yang Shi
@ 2019-01-05  0:19 ` Yang Shi
  2019-01-05  0:47   ` Shakeel Butt
  2019-01-05  0:19 ` [v2 PATCH 4/5] mm: memcontrol: bring force_empty into default hierarchy Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 5/5] doc: memcontrol: add description for wipe_on_offline Yang Shi
  4 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel

We have some use cases that create and remove memcgs very frequently,
where the tasks in the memcg may only access files that are unlikely to
be accessed by anyone else.  So we prefer to force_empty the memcg
before rmdir'ing it, to reclaim the page cache so that it doesn't
accumulate and cause unnecessary memory pressure.  Such memory pressure
may trigger direct reclaim, which harms latency-sensitive applications.

Force empty helps in such use cases; however, it reclaims memory
synchronously when memory.force_empty is written.  It may take some time
to return, and subsequent operations are blocked by it.  Although this
can be done in the background, some use cases may need to create a new
memcg with the same name right after the old one is deleted, so the
creation might get blocked by the preceding reclaim/remove operation.

Delaying memory reclaim to cgroup offline sounds reasonable for such use
cases.  Introduce a new interface, wipe_on_offline, for both the default
and legacy hierarchies, which does memory reclaim in the css offline
kworker.

Writing 1 enables it; writing 0 disables it.

Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/memcontrol.h |  3 +++
 mm/memcontrol.c            | 49 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83ae11c..2f1258a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -311,6 +311,9 @@ struct mem_cgroup {
 	struct list_head event_list;
 	spinlock_t event_list_lock;
 
+	/* Reclaim as much as possible memory in offline kworker */
+	bool wipe_on_offline;
+
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 75208a2..5a13c6b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
 	return mem_cgroup_force_empty(memcg) ?: nbytes;
 }
 
+static int wipe_on_offline_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
+
+	return 0;
+}
+
+static int wipe_on_offline_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	int ret = 0;
+
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	if (mem_cgroup_is_root(memcg))
+		return -EINVAL;
+
+	if (val == 0)
+		memcg->wipe_on_offline = false;
+	else if (val == 1)
+		memcg->wipe_on_offline = true;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
 				     struct cftype *cft)
 {
@@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		.write = mem_cgroup_reset,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+	{
+		.name = "wipe_on_offline",
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ },	/* terminate */
 };
 
@@ -4569,6 +4603,15 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 
+	/*
+	 * Reclaim as much as possible memory when offlining.
+	 *
+	 * Do it after min/low is reset otherwise some memory might
+	 * be protected by min/low.
+	 */
+	if (memcg->wipe_on_offline)
+		mem_cgroup_force_empty(memcg);
+
 	memcg_offline_kmem(memcg);
 	wb_memcg_offline(memcg);
 
@@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 		.seq_show = memory_oom_group_show,
 		.write = memory_oom_group_write,
 	},
+	{
+		.name = "wipe_on_offline",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ }	/* terminate */
 };
 
-- 
1.8.3.1



* [v2 PATCH 4/5] mm: memcontrol: bring force_empty into default hierarchy
  2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
                   ` (2 preceding siblings ...)
  2019-01-05  0:19 ` [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface Yang Shi
@ 2019-01-05  0:19 ` Yang Shi
  2019-01-05  0:19 ` [v2 PATCH 5/5] doc: memcontrol: add description for wipe_on_offline Yang Shi
  4 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel

The default hierarchy doesn't support force_empty, but there are some
use cases that create and remove memcgs very frequently, where the
tasks in the memcg may only access files that are unlikely to be
accessed by anyone else.  So we prefer to force_empty the memcg before
rmdir'ing it, to reclaim the page cache so that it doesn't accumulate
and cause unnecessary memory pressure.  Such memory pressure may trigger
direct reclaim, which harms latency-sensitive applications.

Another patch introduces asynchronous memory reclaim when offlining, but
the behavior of force_empty is still needed by use cases that want the
memory reclaimed immediately.  So bring the force_empty interface into
the default hierarchy too.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 14 ++++++++++++++
 mm/memcontrol.c                         |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7bf3f12..0290c65 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1289,6 +1289,20 @@ PAGE_SIZE multiple when read back.
 	Shows pressure stall information for memory. See
 	Documentation/accounting/psi.txt for details.
 
+  memory.force_empty
+        This interface is provided to make cgroup's memory usage empty.
+        When writing anything to this
+
+        # echo 0 > memory.force_empty
+
+        the cgroup will be reclaimed and as many pages reclaimed as possible.
+
+        The typical use case for this interface is before calling rmdir().
+        Though rmdir() offlines memcg, but the memcg may still stay there due to
+        charged file caches. Some out-of-use page caches may keep charged until
+        memory pressure happens. If you want to avoid that, force_empty will be
+        useful.
+
 
 Usage Guidelines
 ~~~~~~~~~~~~~~~~
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5a13c6b..c4a7dc7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5743,6 +5743,10 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 		.seq_show = wipe_on_offline_show,
 		.write_u64 = wipe_on_offline_write,
 	},
+	{
+		.name = "force_empty",
+		.write = mem_cgroup_force_empty_write,
+	},
 	{ }	/* terminate */
 };
 
-- 
1.8.3.1



* [v2 PATCH 5/5] doc: memcontrol: add description for wipe_on_offline
  2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
                   ` (3 preceding siblings ...)
  2019-01-05  0:19 ` [v2 PATCH 4/5] mm: memcontrol: bring force_empty into default hierarchy Yang Shi
@ 2019-01-05  0:19 ` Yang Shi
  4 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-05  0:19 UTC (permalink / raw)
  To: mhocko, hannes, shakeelb, akpm; +Cc: yang.shi, linux-mm, linux-kernel

Add a description of the wipe_on_offline interface to the cgroup documents.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  9 +++++++++
 Documentation/cgroup-v1/memory.txt      | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 0290c65..e4ef08c 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1303,6 +1303,15 @@ PAGE_SIZE multiple when read back.
         memory pressure happens. If you want to avoid that, force_empty will be
         useful.
 
+  memory.wipe_on_offline
+
+        This is similar to force_empty, but it just does memory reclaim
+        asynchronously in css offline kworker.
+
+        Writing into 1 will enable it, disable it by writing into 0.
+
+        It would reclaim as much as possible memory just as what force_empty does.
+
 
 Usage Guidelines
 ~~~~~~~~~~~~~~~~
diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt
index 8e2cb1d..1c6e1ca 100644
--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -71,6 +71,7 @@ Brief summary of control files.
  memory.stat			 # show various statistics
  memory.use_hierarchy		 # set/show hierarchical account enabled
  memory.force_empty		 # trigger forced page reclaim
+ memory.wipe_on_offline		 # trigger forced page reclaim when offlining
  memory.pressure_level		 # set memory pressure notifications
  memory.swappiness		 # set/show swappiness parameter of vmscan
 				 (See sysctl's vm.swappiness)
@@ -581,6 +582,15 @@ hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ...
 
 The "total" count is sum of file + anon + unevictable.
 
+5.7 wipe_on_offline
+
+This is similar to force_empty, but it just does memory reclaim asynchronously
+in css offline kworker.
+
+Writing into 1 will enable it, disable it by writing into 0.
+
+It would reclaim as much as possible memory just as what force_empty does.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
-- 
1.8.3.1



* Re: [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when force empty
  2019-01-05  0:19 ` [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when " Yang Shi
@ 2019-01-05  0:43   ` Shakeel Butt
  2019-01-09 17:55     ` Yang Shi
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2019-01-05  0:43 UTC (permalink / raw)
  To: Yang Shi; +Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Linux MM, LKML

On Fri, Jan 4, 2019 at 4:21 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> The typical usecase of force empty is to try to reclaim as much as
> possible memory before offlining a memcg.  Since there should be no
> attached tasks to offlining memcg, the tasks anonymous pages would have
> already been freed or uncharged.  Even though anonymous pages get
> swapped out, but they still get charged to swap space.  So, it sounds
> pointless to do swap for force empty.
>
> I tried to dig into the history of this, it was introduced by
> commit 8c7c6e34a125 ("memcg: mem+swap controller core"), but there is
> not any clue about why it was done so at the first place.
>
> The below simple test script shows slight file cache reclaim improvement
> when swap is on.
>
> echo 3 > /proc/sys/vm/drop_caches
> mkdir /sys/fs/cgroup/memory/test
> echo 30 > /sys/fs/cgroup/memory/test/memory.swappiness
> echo $$ >/sys/fs/cgroup/memory/test/cgroup.procs
> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
> dd if=/dev/zero of=/mnt/test bs=1M count=1024
> ping localhost > /dev/null &
> echo 1 > /sys/fs/cgroup/memory/test/memory.force_empty
> killall ping
> echo $$ >/sys/fs/cgroup/memory/cgroup.procs
> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
> rmdir /sys/fs/cgroup/memory/test
> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
>
> The number of page cache is:
>                         w/o             w/
> before force empty    1088792        1088784
> after force empty     41492          39428
> reclaimed             1047300        1049356
>
> Without doing swap, force empty can reclaim 2MB more memory in 1GB page
> cache.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index af7f18b..75208a2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2895,7 +2895,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
>                         return -EINTR;
>
>                 progress = try_to_free_mem_cgroup_pages(memcg, 1,
> -                                                       GFP_KERNEL, true);
> +                                                       GFP_KERNEL, false);

I think we agreed not to change the behavior of force_empty. You can
customize 'force_empty on wipe_on_offline' to not swapout.

>                 if (!progress) {
>                         nr_retries--;
>                         /* maybe some writeback is necessary */
> --
> 1.8.3.1
>


* Re: [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface
  2019-01-05  0:19 ` [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface Yang Shi
@ 2019-01-05  0:47   ` Shakeel Butt
  2019-01-09 17:59     ` Yang Shi
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2019-01-05  0:47 UTC (permalink / raw)
  To: Yang Shi; +Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Linux MM, LKML

On Fri, Jan 4, 2019 at 4:21 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> We have some usecases which create and remove memcgs very frequently,
> and the tasks in the memcg may just access the files which are unlikely
> accessed by anyone else.  So, we prefer force_empty the memcg before
> rmdir'ing it to reclaim the page cache so that they don't get
> accumulated to incur unnecessary memory pressure.  Since the memory
> pressure may incur direct reclaim to harm some latency sensitive
> applications.
>
> Force empty would help out such usecase, however force empty reclaims
> memory synchronously when writing to memory.force_empty.  It may take
> some time to return and the afterwards operations are blocked by it.
> Although this can be done in background, some usecases may need create
> new memcg with the same name right after the old one is deleted.  So,
> the creation might get blocked by the before reclaim/remove operation.
>
> Delaying memory reclaim in cgroup offline for such usecase sounds
> reasonable.  Introduced a new interface, called wipe_on_offline for both
> default and legacy hierarchy, which does memory reclaim in css offline
> kworker.
>
> Writing to 1 would enable it, writing 0 would disable it.
>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  include/linux/memcontrol.h |  3 +++
>  mm/memcontrol.c            | 49 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 83ae11c..2f1258a 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -311,6 +311,9 @@ struct mem_cgroup {
>         struct list_head event_list;
>         spinlock_t event_list_lock;
>
> +       /* Reclaim as much as possible memory in offline kworker */
> +       bool wipe_on_offline;
> +
>         struct mem_cgroup_per_node *nodeinfo[0];
>         /* WARNING: nodeinfo must be the last member here */
>  };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 75208a2..5a13c6b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
>         return mem_cgroup_force_empty(memcg) ?: nbytes;
>  }
>
> +static int wipe_on_offline_show(struct seq_file *m, void *v)
> +{
> +       struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> +
> +       seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
> +
> +       return 0;
> +}
> +
> +static int wipe_on_offline_write(struct cgroup_subsys_state *css,
> +                                struct cftype *cft, u64 val)
> +{
> +       int ret = 0;
> +
> +       struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> +       if (mem_cgroup_is_root(memcg))
> +               return -EINVAL;
> +
> +       if (val == 0)
> +               memcg->wipe_on_offline = false;
> +       else if (val == 1)
> +               memcg->wipe_on_offline = true;
> +       else
> +               ret = -EINVAL;
> +
> +       return ret;
> +}
> +
>  static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
>                                      struct cftype *cft)
>  {
> @@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>                 .write = mem_cgroup_reset,
>                 .read_u64 = mem_cgroup_read_u64,
>         },
> +       {
> +               .name = "wipe_on_offline",

What about "force_empty_on_offline"?

> +               .seq_show = wipe_on_offline_show,
> +               .write_u64 = wipe_on_offline_write,
> +       },
>         { },    /* terminate */
>  };
>
> @@ -4569,6 +4603,15 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>         page_counter_set_min(&memcg->memory, 0);
>         page_counter_set_low(&memcg->memory, 0);
>
> +       /*
> +        * Reclaim as much as possible memory when offlining.
> +        *
> +        * Do it after min/low is reset otherwise some memory might
> +        * be protected by min/low.
> +        */
> +       if (memcg->wipe_on_offline)
> +               mem_cgroup_force_empty(memcg);
> +

mem_cgroup_force_empty() also does drain_all_stock(), so, move
drain_all_stock() in mem_cgroup_css_offline() to the else of 'if
(memcg->wipe_on_offline)'.

>         memcg_offline_kmem(memcg);
>         wb_memcg_offline(memcg);
>
> @@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>                 .seq_show = memory_oom_group_show,
>                 .write = memory_oom_group_write,
>         },
> +       {
> +               .name = "wipe_on_offline",
> +               .flags = CFTYPE_NOT_ON_ROOT,
> +               .seq_show = wipe_on_offline_show,
> +               .write_u64 = wipe_on_offline_write,
> +       },
>         { }     /* terminate */
>  };
>
> --
> 1.8.3.1
>


* Re: [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when force empty
  2019-01-05  0:43   ` Shakeel Butt
@ 2019-01-09 17:55     ` Yang Shi
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-09 17:55 UTC (permalink / raw)
  To: Shakeel Butt; +Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Linux MM, LKML



On 1/4/19 4:43 PM, Shakeel Butt wrote:
> On Fri, Jan 4, 2019 at 4:21 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>> The typical usecase of force empty is to try to reclaim as much as
>> possible memory before offlining a memcg.  Since there should be no
>> attached tasks to offlining memcg, the tasks anonymous pages would have
>> already been freed or uncharged.  Even though anonymous pages get
>> swapped out, but they still get charged to swap space.  So, it sounds
>> pointless to do swap for force empty.
>>
>> I tried to dig into the history of this, it was introduced by
>> commit 8c7c6e34a125 ("memcg: mem+swap controller core"), but there is
>> not any clue about why it was done so at the first place.
>>
>> The below simple test script shows slight file cache reclaim improvement
>> when swap is on.
>>
>> echo 3 > /proc/sys/vm/drop_caches
>> mkdir /sys/fs/cgroup/memory/test
>> echo 30 > /sys/fs/cgroup/memory/test/memory.swappiness
>> echo $$ >/sys/fs/cgroup/memory/test/cgroup.procs
>> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
>> dd if=/dev/zero of=/mnt/test bs=1M count=1024
>> ping localhost > /dev/null &
>> echo 1 > /sys/fs/cgroup/memory/test/memory.force_empty
>> killall ping
>> echo $$ >/sys/fs/cgroup/memory/cgroup.procs
>> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
>> rmdir /sys/fs/cgroup/memory/test
>> cat /proc/meminfo | grep ^Cached|awk -F" " '{print $2}'
>>
>> The number of page cache is:
>>                          w/o             w/
>> before force empty    1088792        1088784
>> after force empty     41492          39428
>> reclaimed             1047300        1049356
>>
>> Without doing swap, force empty can reclaim 2MB more memory in 1GB page
>> cache.
>>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>   mm/memcontrol.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index af7f18b..75208a2 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2895,7 +2895,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
>>                          return -EINTR;
>>
>>                  progress = try_to_free_mem_cgroup_pages(memcg, 1,
>> -                                                       GFP_KERNEL, true);
>> +                                                       GFP_KERNEL, false);
> I think we agreed not to change the behavior of force_empty. You can
> customize 'force_empty on wipe_on_offline' to not swapout.

OK, will keep force_empty intact.

Thanks,
Yang

>
>>                  if (!progress) {
>>                          nr_retries--;
>>                          /* maybe some writeback is necessary */
>> --
>> 1.8.3.1
>>



* Re: [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface
  2019-01-05  0:47   ` Shakeel Butt
@ 2019-01-09 17:59     ` Yang Shi
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-01-09 17:59 UTC (permalink / raw)
  To: Shakeel Butt; +Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Linux MM, LKML



On 1/4/19 4:47 PM, Shakeel Butt wrote:
> On Fri, Jan 4, 2019 at 4:21 PM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>> We have some use cases which create and remove memcgs very frequently,
>> and the tasks in such a memcg may only access files which are unlikely
>> to be accessed by anyone else.  So we prefer to force_empty the memcg
>> before rmdir'ing it, to reclaim the page cache so that it doesn't
>> accumulate and cause unnecessary memory pressure.  That memory pressure
>> may trigger direct reclaim and harm latency-sensitive applications.
>>
>> Force empty helps with such use cases, but it reclaims memory
>> synchronously when memory.force_empty is written.  It may take some
>> time to return, and subsequent operations are blocked by it.  Although
>> this can be done in the background, some use cases need to create a
>> new memcg with the same name right after the old one is deleted, so the
>> creation might get blocked by the preceding reclaim/remove operation.
>>
>> Deferring memory reclaim to cgroup offline sounds reasonable for such
>> use cases.  Introduce a new interface, called wipe_on_offline, for both
>> the default and legacy hierarchies, which does memory reclaim in the
>> css offline kworker.
>>
>> Writing 1 enables it; writing 0 disables it.
>>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>   include/linux/memcontrol.h |  3 +++
>>   mm/memcontrol.c            | 49 ++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 52 insertions(+)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 83ae11c..2f1258a 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -311,6 +311,9 @@ struct mem_cgroup {
>>          struct list_head event_list;
>>          spinlock_t event_list_lock;
>>
>> +       /* Reclaim as much memory as possible in the offline kworker */
>> +       bool wipe_on_offline;
>> +
>>          struct mem_cgroup_per_node *nodeinfo[0];
>>          /* WARNING: nodeinfo must be the last member here */
>>   };
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 75208a2..5a13c6b 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
>>          return mem_cgroup_force_empty(memcg) ?: nbytes;
>>   }
>>
>> +static int wipe_on_offline_show(struct seq_file *m, void *v)
>> +{
>> +       struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
>> +
>> +       seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
>> +
>> +       return 0;
>> +}
>> +
>> +static int wipe_on_offline_write(struct cgroup_subsys_state *css,
>> +                                struct cftype *cft, u64 val)
>> +{
>> +       int ret = 0;
>> +
>> +       struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>> +
>> +       if (mem_cgroup_is_root(memcg))
>> +               return -EINVAL;
>> +
>> +       if (val == 0)
>> +               memcg->wipe_on_offline = false;
>> +       else if (val == 1)
>> +               memcg->wipe_on_offline = true;
>> +       else
>> +               ret = -EINVAL;
>> +
>> +       return ret;
>> +}
>> +
>>   static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
>>                                       struct cftype *cft)
>>   {
>> @@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>>                  .write = mem_cgroup_reset,
>>                  .read_u64 = mem_cgroup_read_u64,
>>          },
>> +       {
>> +               .name = "wipe_on_offline",
> What about "force_empty_on_offline"?

I don't have a strong preference for the name of the knob, but
wipe_on_offline is shorter.

>
>> +               .seq_show = wipe_on_offline_show,
>> +               .write_u64 = wipe_on_offline_write,
>> +       },
>>          { },    /* terminate */
>>   };
>>
>> @@ -4569,6 +4603,15 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>>          page_counter_set_min(&memcg->memory, 0);
>>          page_counter_set_low(&memcg->memory, 0);
>>
>> +       /*
>> +        * Reclaim as much memory as possible when offlining.
>> +        *
>> +        * Do it after min/low are reset; otherwise some memory might
>> +        * still be protected by min/low.
>> +        */
>> +       if (memcg->wipe_on_offline)
>> +               mem_cgroup_force_empty(memcg);
>> +
> mem_cgroup_force_empty() also does drain_all_stock(), so, move
> drain_all_stock() in mem_cgroup_css_offline() to the else of 'if
> (memcg->wipe_on_offline)'.

Sure.
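
A minimal sketch of the control flow suggested above, written as a
userspace model rather than kernel code. struct offline_sketch and the
sketch_*() helpers are hypothetical; they only count calls so the
branch structure is visible. The assumption, taken from the review
comment, is that the force-empty path already drains the stock itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: records which helpers the offline path invoked. */
struct offline_sketch {
	bool wipe_on_offline;
	int drain_calls;
	int force_empty_calls;
};

/* Stand-in for drain_all_stock(). */
static void sketch_drain_all_stock(struct offline_sketch *m)
{
	m->drain_calls++;
}

/* Stand-in for mem_cgroup_force_empty(), which drains internally. */
static void sketch_wipe(struct offline_sketch *m)
{
	sketch_drain_all_stock(m);
	m->force_empty_calls++;
}

/* Offline path: drain explicitly only when no wipe was requested. */
static void sketch_css_offline(struct offline_sketch *m)
{
	assert(m != NULL);
	if (m->wipe_on_offline)
		sketch_wipe(m);
	else
		sketch_drain_all_stock(m);
}
```

Either way the stock is drained exactly once per offline in this model,
which is the point of moving the explicit drain into the else branch.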

Thanks,
Yang

>
>>          memcg_offline_kmem(memcg);
>>          wb_memcg_offline(memcg);
>>
>> @@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>>                  .seq_show = memory_oom_group_show,
>>                  .write = memory_oom_group_write,
>>          },
>> +       {
>> +               .name = "wipe_on_offline",
>> +               .flags = CFTYPE_NOT_ON_ROOT,
>> +               .seq_show = wipe_on_offline_show,
>> +               .write_u64 = wipe_on_offline_write,
>> +       },
>>          { }     /* terminate */
>>   };
>>
>> --
>> 1.8.3.1
>>
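
For reference, a rough sketch of how userspace might toggle the knob
proposed in this patch. set_wipe_on_offline() is a hypothetical helper,
not an existing API, and the path in the usage note is an assumption
about where the file would appear. The helper rejects values other than
0 and 1 up front, mirroring the -EINVAL check in wipe_on_offline_write().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write "1\n" or "0\n" to an existing knob file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_wipe_on_offline(const char *knob_path, int enable)
{
	const char *val;
	ssize_t n;
	int fd;

	assert(knob_path != NULL);

	/* Same accepted values as the proposed kernel handler. */
	if (enable != 0 && enable != 1) {
		errno = EINVAL;
		return -1;
	}
	val = enable ? "1\n" : "0\n";

	/* No O_CREAT: the knob must already exist. */
	fd = open(knob_path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, val, 2);
	if (n != 2) {
		int saved = errno;

		close(fd);
		errno = (n < 0) ? saved : EIO;
		return -1;
	}
	return close(fd);
}
```

A caller would do something like
set_wipe_on_offline("/sys/fs/cgroup/memory/mygroup/memory.wipe_on_offline", 1)
before rmdir'ing the group; "mygroup" and the mount point are
placeholders.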



end of thread, other threads:[~2019-01-09 18:00 UTC | newest]

Thread overview: 10+ messages
2019-01-05  0:19 [RFC v2 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining Yang Shi
2019-01-05  0:19 ` [v2 PATCH 1/5] doc: memcontrol: fix the obsolete content about force empty Yang Shi
2019-01-05  0:19 ` [v2 PATCH 2/5] mm: memcontrol: do not try to do swap when " Yang Shi
2019-01-05  0:43   ` Shakeel Butt
2019-01-09 17:55     ` Yang Shi
2019-01-05  0:19 ` [v2 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface Yang Shi
2019-01-05  0:47   ` Shakeel Butt
2019-01-09 17:59     ` Yang Shi
2019-01-05  0:19 ` [v2 PATCH 4/5] mm: memcontrol: bring force_empty into default hierarchy Yang Shi
2019-01-05  0:19 ` [v2 PATCH 5/5] doc: memcontrol: add description for wipe_on_offline Yang Shi
