* [PATCH 0/2] mm/vmscan: calculate reclaimed slab in all reclaim paths
@ 2019-06-21 10:14 Yafang Shao
  2019-06-21 10:14 ` [PATCH 1/2] mm/vmscan: add a new member reclaim_state in struct shrink_control Yafang Shao
  2019-06-21 10:14 ` [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths Yafang Shao
  0 siblings, 2 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-21 10:14 UTC (permalink / raw)
  To: akpm, ktkhai, mhocko, hannes, vdavydov.dev, mgorman; +Cc: linux-mm, Yafang Shao

This patchset fixes issues in how reclaimed slab caches are calculated
when shrinking slab.

There are currently six different reclaim paths:
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path

However, the reclaimed slab caches are only calculated in the first
three of these paths.
The issues are explained in detail in patch #2.
We should calculate the reclaimed slab caches in every reclaim path.
In order to do that, struct reclaim_state is placed into
struct shrink_control.
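
For background, a minimal sketch of the accounting this relies on
(abridged; the exact code differs between kernel versions): the freed
slab pages only flow back into vmscan through current->reclaim_state,
so a reclaim path that never installs it loses that feedback entirely.

	/* include/linux/swap.h */
	struct reclaim_state {
		unsigned long reclaimed_slab;
	};

	/* mm/vmscan.c, shrink_node(), abridged */
	struct reclaim_state *reclaim_state = current->reclaim_state;

	if (reclaim_state) {
		/* slab pages freed by the shrinkers count as reclaimed */
		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
		reclaim_state->reclaimed_slab = 0;
	}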

In the node reclaim path there is another issue with shrinking slab,
which is addressed in a separate patch [1].


[1] mm/vmscan: shrink slab in node reclaim
https://lore.kernel.org/linux-mm/1559874946-22960-1-git-send-email-laoar.shao@gmail.com/

Yafang Shao (2):
  mm/vmscan: add a new member reclaim_state in struct shrink_control
  mm/vmscan: calculate reclaimed slab caches in all reclaim paths

 mm/vmscan.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
1.8.3.1


* [PATCH 1/2] mm/vmscan: add a new member reclaim_state in struct shrink_control
  2019-06-21 10:14 [PATCH 0/2] mm/vmscan: calculate reclaimed slab in all reclaim paths Yafang Shao
@ 2019-06-21 10:14 ` Yafang Shao
  2019-06-21 10:14 ` [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths Yafang Shao
  1 sibling, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-21 10:14 UTC (permalink / raw)
  To: akpm, ktkhai, mhocko, hannes, vdavydov.dev, mgorman; +Cc: linux-mm, Yafang Shao

struct reclaim_state is used to record how many slab caches are
reclaimed in one reclaim path, and struct shrink_control is used to
control one reclaim path.
So it makes sense to put reclaim_state into shrink_control.
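
For context, reclaimed_slab is bumped by the slab allocators when slab
pages actually go back to the buddy allocator, roughly like this
(abridged from mm/slub.c; mm/slab.c has an equivalent, and details vary
between kernel versions):

	/* mm/slub.c, __free_slab(), abridged */
	int order = compound_order(page);
	int pages = 1 << order;

	if (current->reclaim_state)
		current->reclaim_state->reclaimed_slab += pages;
	__free_pages(page, order);

The reclaimer only sees those freed pages if it has installed
current->reclaim_state before calling into the shrinkers, which is
what the diff below wires up.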

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/vmscan.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b79f584..18a66e5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -131,6 +131,9 @@ struct scan_control {
 		unsigned int file_taken;
 		unsigned int taken;
 	} nr;
+
+	/* for recording the reclaimed slab by now */
+	struct reclaim_state reclaim_state;
 };
 
 #ifdef ARCH_HAS_PREFETCH
@@ -3461,6 +3464,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 		.may_unmap = 1,
 	};
 
+	current->reclaim_state = &sc.reclaim_state;
 	psi_memstall_enter(&pflags);
 	__fs_reclaim_acquire();
 
@@ -3642,6 +3646,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 	snapshot_refaults(NULL, pgdat);
 	__fs_reclaim_release();
 	psi_memstall_leave(&pflags);
+	current->reclaim_state = NULL;
+
 	/*
 	 * Return the order kswapd stopped reclaiming at as
 	 * prepare_kswapd_sleep() takes it into account. If another caller
@@ -3766,15 +3772,10 @@ static int kswapd(void *p)
 	unsigned int classzone_idx = MAX_NR_ZONES - 1;
 	pg_data_t *pgdat = (pg_data_t*)p;
 	struct task_struct *tsk = current;
-
-	struct reclaim_state reclaim_state = {
-		.reclaimed_slab = 0,
-	};
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
-	current->reclaim_state = &reclaim_state;
 
 	/*
 	 * Tell the memory management that we're a "memory allocator",
@@ -3836,7 +3837,6 @@ static int kswapd(void *p)
 	}
 
 	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
-	current->reclaim_state = NULL;
 
 	return 0;
 }
@@ -3897,7 +3897,6 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
  */
 unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 {
-	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
 		.nr_to_reclaim = nr_to_reclaim,
 		.gfp_mask = GFP_HIGHUSER_MOVABLE,
@@ -3915,8 +3914,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 
 	fs_reclaim_acquire(sc.gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
-	reclaim_state.reclaimed_slab = 0;
-	p->reclaim_state = &reclaim_state;
+	p->reclaim_state = &sc.reclaim_state;
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
@@ -4085,7 +4083,6 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	/* Minimum pages needed in order to stay on node */
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
-	struct reclaim_state reclaim_state;
 	unsigned int noreclaim_flag;
 	struct scan_control sc = {
 		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
@@ -4110,8 +4107,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	 */
 	noreclaim_flag = memalloc_noreclaim_save();
 	p->flags |= PF_SWAPWRITE;
-	reclaim_state.reclaimed_slab = 0;
-	p->reclaim_state = &reclaim_state;
+	p->reclaim_state = &sc.reclaim_state;
 
 	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
 		/*
-- 
1.8.3.1


* [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-21 10:14 [PATCH 0/2] mm/vmscan: calculate reclaimed slab in all reclaim paths Yafang Shao
  2019-06-21 10:14 ` [PATCH 1/2] mm/vmscan: add a new member reclaim_state in struct shrink_control Yafang Shao
@ 2019-06-21 10:14 ` Yafang Shao
  2019-06-22  3:30   ` Andrew Morton
  2019-06-24  8:53   ` Kirill Tkhai
  1 sibling, 2 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-21 10:14 UTC (permalink / raw)
  To: akpm, ktkhai, mhocko, hannes, vdavydov.dev, mgorman; +Cc: linux-mm, Yafang Shao

There are currently six different reclaim paths:
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path

However, the reclaimed slab caches are only calculated in the first
three of these paths.

There are some drawbacks if we don't calculate the reclaimed slab caches.
- sc->nr_reclaimed isn't correct if some slab caches are reclaimed in
  these paths.
- The slab caches may be reclaimed too thoroughly if there are lots of
  reclaimable slab caches and few page caches.
  Take a simple example of this case.
  If one memcg is full of slab caches and its limit is 512M, in other
  words there are approximately 512M of slab caches in this memcg, then
  once the limit is reached memcg reclaim begins, and in this memcg
  reclaim path it will continuously reclaim slab caches until
  sc->priority drops to 0.
  After this reclaim stops, only a few slab caches are left, less than
  20M in my test case.
  With this patch applied, more than 300M of slab caches remain and
  sc->priority only drops to 3 (see the sketch below).
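
Below is a minimal sketch of where that difference comes from (abridged
from do_try_to_free_pages(); loop details differ between kernel
versions): once the freed slab pages are reflected in sc->nr_reclaimed,
the priority loop can satisfy nr_to_reclaim early instead of grinding
all the way down to priority 0.

	/* mm/vmscan.c, do_try_to_free_pages(), abridged */
	do {
		sc->nr_scanned = 0;
		shrink_zones(zonelist, sc);

		/*
		 * With reclaimed slab included in sc->nr_reclaimed,
		 * this check can now succeed at a much higher priority.
		 */
		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;
	} while (--sc->priority >= 0);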

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/vmscan.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 18a66e5..d6c3fc8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
 		return 1;
 
+	current->reclaim_state = &sc.reclaim_state;
 	trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
+	current->reclaim_state = NULL;
 
 	return nr_reclaimed;
 }
@@ -3191,6 +3193,7 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
 	};
 	unsigned long lru_pages;
 
+	current->reclaim_state = &sc.reclaim_state;
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
 
@@ -3212,7 +3215,9 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
 					cgroup_ino(memcg->css.cgroup),
 					sc.nr_reclaimed);
 
+	current->reclaim_state = NULL;
 	*nr_scanned = sc.nr_scanned;
+
 	return sc.nr_reclaimed;
 }
 
@@ -3239,6 +3244,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_shrinkslab = 1,
 	};
 
+	current->reclaim_state = &sc.reclaim_state;
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -3263,6 +3269,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 	trace_mm_vmscan_memcg_reclaim_end(
 				cgroup_ino(memcg->css.cgroup),
 				nr_reclaimed);
+	current->reclaim_state = NULL;
 
 	return nr_reclaimed;
 }
-- 
1.8.3.1


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-21 10:14 ` [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths Yafang Shao
@ 2019-06-22  3:30   ` Andrew Morton
  2019-06-22  6:31     ` Yafang Shao
  2019-06-24  8:53   ` Kirill Tkhai
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2019-06-22  3:30 UTC (permalink / raw)
  To: Yafang Shao; +Cc: ktkhai, mhocko, hannes, vdavydov.dev, mgorman, linux-mm

On Fri, 21 Jun 2019 18:14:46 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:

> There're six different reclaim paths by now,
> - kswapd reclaim path
> - node reclaim path
> - hibernate preallocate memory reclaim path
> - direct reclaim path
> - memcg reclaim path
> - memcg softlimit reclaim path
> 
> The slab caches reclaimed in these paths are only calculated in the above
> three paths.
> 
> There're some drawbacks if we don't calculate the reclaimed slab caches.
> - The sc->nr_reclaimed isn't correct if there're some slab caches
>   relcaimed in this path.
> - The slab caches may be reclaimed thoroughly if there're lots of
>   reclaimable slab caches and few page caches.
>   Let's take an easy example for this case.
>   If one memcg is full of slab caches and the limit of it is 512M, in
>   other words there're approximately 512M slab caches in this memcg.
>   Then the limit of the memcg is reached and the memcg reclaim begins,
>   and then in this memcg reclaim path it will continuesly reclaim the
>   slab caches until the sc->priority drops to 0.
>   After this reclaim stops, you will find there're few slab caches left,
>   which is less than 20M in my test case.
>   While after this patch applied the number is greater than 300M and
>   the sc->priority only drops to 3.

I got a bit exhausted checking that none of these six callsites can
scribble on some caller's value of current->reclaim_state.

How about we do it at runtime?

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm/vmscan.c: add checks for incorrect handling of current->reclaim_state

Six sites are presently altering current->reclaim_state.  There is a risk
that one function stomps on a caller's value.  Use a helper function to
catch such errors.
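
As an illustration (hypothetical callers, not code from any real
path), the two warnings catch the two failure modes:

	/* hypothetical outer reclaim path */
	set_task_reclaim_state(current, &outer_sc.reclaim_state);

	/* hypothetical nested path further down the call chain */
	set_task_reclaim_state(current, &inner_sc.reclaim_state);
				/* WARN: overwrites a live pointer */
	set_task_reclaim_state(current, NULL);

	/* the outer path later tears down what is already NULL */
	set_task_reclaim_state(current, NULL);
				/* WARN: nulling an already-nulled member */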

Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

--- a/mm/vmscan.c~mm-vmscanc-add-checks-for-incorrect-handling-of-current-reclaim_state
+++ a/mm/vmscan.c
@@ -177,6 +177,18 @@ unsigned long vm_total_pages;
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+static void set_task_reclaim_state(struct task_struct *task,
+				   struct reclaim_state *rs)
+{
+	/* Check for an overwrite */
+	WARN_ON_ONCE(rs && task->reclaim_state);
+
+	/* Check for the nulling of an already-nulled member */
+	WARN_ON_ONCE(!rs && !task->reclaim_state);
+
+	task->reclaim_state = rs;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 
 /*
@@ -3194,13 +3206,13 @@ unsigned long try_to_free_pages(struct z
 	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
 		return 1;
 
-	current->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(current, &sc.reclaim_state);
 	trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
-	current->reclaim_state = NULL;
+	set_task_reclaim_state(current, NULL);
 
 	return nr_reclaimed;
 }
@@ -3223,7 +3235,7 @@ unsigned long mem_cgroup_shrink_node(str
 	};
 	unsigned long lru_pages;
 
-	current->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(current, &sc.reclaim_state);
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
 
@@ -3245,7 +3257,7 @@ unsigned long mem_cgroup_shrink_node(str
 					cgroup_ino(memcg->css.cgroup),
 					sc.nr_reclaimed);
 
-	current->reclaim_state = NULL;
+	set_task_reclaim_state(current, NULL);
 	*nr_scanned = sc.nr_scanned;
 
 	return sc.nr_reclaimed;
@@ -3274,7 +3286,7 @@ unsigned long try_to_free_mem_cgroup_pag
 		.may_shrinkslab = 1,
 	};
 
-	current->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(current, &sc.reclaim_state);
 	/*
 	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
 	 * take care of from where we get pages. So the node where we start the
@@ -3299,7 +3311,7 @@ unsigned long try_to_free_mem_cgroup_pag
 	trace_mm_vmscan_memcg_reclaim_end(
 				cgroup_ino(memcg->css.cgroup),
 				nr_reclaimed);
-	current->reclaim_state = NULL;
+	set_task_reclaim_state(current, NULL);
 
 	return nr_reclaimed;
 }
@@ -3501,7 +3513,7 @@ static int balance_pgdat(pg_data_t *pgda
 		.may_unmap = 1,
 	};
 
-	current->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(current, &sc.reclaim_state);
 	psi_memstall_enter(&pflags);
 	__fs_reclaim_acquire();
 
@@ -3683,7 +3695,7 @@ out:
 	snapshot_refaults(NULL, pgdat);
 	__fs_reclaim_release();
 	psi_memstall_leave(&pflags);
-	current->reclaim_state = NULL;
+	set_task_reclaim_state(current, NULL);
 
 	/*
 	 * Return the order kswapd stopped reclaiming at as
@@ -3945,17 +3957,16 @@ unsigned long shrink_all_memory(unsigned
 		.hibernation_mode = 1,
 	};
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
-	struct task_struct *p = current;
 	unsigned long nr_reclaimed;
 	unsigned int noreclaim_flag;
 
 	fs_reclaim_acquire(sc.gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
-	p->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(current, &sc.reclaim_state);
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
-	p->reclaim_state = NULL;
+	set_task_reclaim_state(current, NULL);
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(sc.gfp_mask);
 
@@ -4144,7 +4155,7 @@ static int __node_reclaim(struct pglist_
 	 */
 	noreclaim_flag = memalloc_noreclaim_save();
 	p->flags |= PF_SWAPWRITE;
-	p->reclaim_state = &sc.reclaim_state;
+	set_task_reclaim_state(p, &sc.reclaim_state);
 
 	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
 		/*
@@ -4156,7 +4167,7 @@ static int __node_reclaim(struct pglist_
 		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
 	}
 
-	p->reclaim_state = NULL;
+	set_task_reclaim_state(p, NULL);
 	current->flags &= ~PF_SWAPWRITE;
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(sc.gfp_mask);
_


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-22  3:30   ` Andrew Morton
@ 2019-06-22  6:31     ` Yafang Shao
  0 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-22  6:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill Tkhai, Michal Hocko, Johannes Weiner, Vladimir Davydov,
	Mel Gorman, Linux MM

On Sat, Jun 22, 2019 at 11:30 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Fri, 21 Jun 2019 18:14:46 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > There're six different reclaim paths by now,
> > - kswapd reclaim path
> > - node reclaim path
> > - hibernate preallocate memory reclaim path
> > - direct reclaim path
> > - memcg reclaim path
> > - memcg softlimit reclaim path
> >
> > The slab caches reclaimed in these paths are only calculated in the above
> > three paths.
> >
> > There're some drawbacks if we don't calculate the reclaimed slab caches.
> > - The sc->nr_reclaimed isn't correct if there're some slab caches
> >   relcaimed in this path.
> > - The slab caches may be reclaimed thoroughly if there're lots of
> >   reclaimable slab caches and few page caches.
> >   Let's take an easy example for this case.
> >   If one memcg is full of slab caches and the limit of it is 512M, in
> >   other words there're approximately 512M slab caches in this memcg.
> >   Then the limit of the memcg is reached and the memcg reclaim begins,
> >   and then in this memcg reclaim path it will continuesly reclaim the
> >   slab caches until the sc->priority drops to 0.
> >   After this reclaim stops, you will find there're few slab caches left,
> >   which is less than 20M in my test case.
> >   While after this patch applied the number is greater than 300M and
> >   the sc->priority only drops to 3.
>
> I got a bit exhausted checking that none of these six callsites can
> scribble on some caller's value of current->reclaim_state.
>
> How about we do it at runtime?
>

That's good.
Thanks for your improvement.

> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm/vmscan.c: add checks for incorrect handling of current->reclaim_state
>
> Six sites are presently altering current->reclaim_state.  There is a risk
> that one function stomps on a caller's value.  Use a helper function to
> catch such errors.
>
> Cc: Yafang Shao <laoar.shao@gmail.com>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/vmscan.c |   37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
>
> --- a/mm/vmscan.c~mm-vmscanc-add-checks-for-incorrect-handling-of-current-reclaim_state
> +++ a/mm/vmscan.c
> @@ -177,6 +177,18 @@ unsigned long vm_total_pages;
>  static LIST_HEAD(shrinker_list);
>  static DECLARE_RWSEM(shrinker_rwsem);
>
> +static void set_task_reclaim_state(struct task_struct *task,
> +                                  struct reclaim_state *rs)
> +{
> +       /* Check for an overwrite */
> +       WARN_ON_ONCE(rs && task->reclaim_state);
> +
> +       /* Check for the nulling of an already-nulled member */
> +       WARN_ON_ONCE(!rs && !task->reclaim_state);
> +
> +       task->reclaim_state = rs;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>
>  /*
> @@ -3194,13 +3206,13 @@ unsigned long try_to_free_pages(struct z
>         if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
>                 return 1;
>
> -       current->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(current, &sc.reclaim_state);
>         trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
>
>         nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>
>         trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> -       current->reclaim_state = NULL;
> +       set_task_reclaim_state(current, NULL);
>
>         return nr_reclaimed;
>  }
> @@ -3223,7 +3235,7 @@ unsigned long mem_cgroup_shrink_node(str
>         };
>         unsigned long lru_pages;
>
> -       current->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(current, &sc.reclaim_state);
>         sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>                         (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>
> @@ -3245,7 +3257,7 @@ unsigned long mem_cgroup_shrink_node(str
>                                         cgroup_ino(memcg->css.cgroup),
>                                         sc.nr_reclaimed);
>
> -       current->reclaim_state = NULL;
> +       set_task_reclaim_state(current, NULL);
>         *nr_scanned = sc.nr_scanned;
>
>         return sc.nr_reclaimed;
> @@ -3274,7 +3286,7 @@ unsigned long try_to_free_mem_cgroup_pag
>                 .may_shrinkslab = 1,
>         };
>
> -       current->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(current, &sc.reclaim_state);
>         /*
>          * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
>          * take care of from where we get pages. So the node where we start the
> @@ -3299,7 +3311,7 @@ unsigned long try_to_free_mem_cgroup_pag
>         trace_mm_vmscan_memcg_reclaim_end(
>                                 cgroup_ino(memcg->css.cgroup),
>                                 nr_reclaimed);
> -       current->reclaim_state = NULL;
> +       set_task_reclaim_state(current, NULL);
>
>         return nr_reclaimed;
>  }
> @@ -3501,7 +3513,7 @@ static int balance_pgdat(pg_data_t *pgda
>                 .may_unmap = 1,
>         };
>
> -       current->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(current, &sc.reclaim_state);
>         psi_memstall_enter(&pflags);
>         __fs_reclaim_acquire();
>
> @@ -3683,7 +3695,7 @@ out:
>         snapshot_refaults(NULL, pgdat);
>         __fs_reclaim_release();
>         psi_memstall_leave(&pflags);
> -       current->reclaim_state = NULL;
> +       set_task_reclaim_state(current, NULL);
>
>         /*
>          * Return the order kswapd stopped reclaiming at as
> @@ -3945,17 +3957,16 @@ unsigned long shrink_all_memory(unsigned
>                 .hibernation_mode = 1,
>         };
>         struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
> -       struct task_struct *p = current;
>         unsigned long nr_reclaimed;
>         unsigned int noreclaim_flag;
>
>         fs_reclaim_acquire(sc.gfp_mask);
>         noreclaim_flag = memalloc_noreclaim_save();
> -       p->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(current, &sc.reclaim_state);
>
>         nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>
> -       p->reclaim_state = NULL;
> +       set_task_reclaim_state(current, NULL);
>         memalloc_noreclaim_restore(noreclaim_flag);
>         fs_reclaim_release(sc.gfp_mask);
>
> @@ -4144,7 +4155,7 @@ static int __node_reclaim(struct pglist_
>          */
>         noreclaim_flag = memalloc_noreclaim_save();
>         p->flags |= PF_SWAPWRITE;
> -       p->reclaim_state = &sc.reclaim_state;
> +       set_task_reclaim_state(p, &sc.reclaim_state);
>
>         if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
>                 /*
> @@ -4156,7 +4167,7 @@ static int __node_reclaim(struct pglist_
>                 } while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
>         }
>
> -       p->reclaim_state = NULL;
> +       set_task_reclaim_state(p, NULL);
>         current->flags &= ~PF_SWAPWRITE;
>         memalloc_noreclaim_restore(noreclaim_flag);
>         fs_reclaim_release(sc.gfp_mask);
> _
>


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-21 10:14 ` [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths Yafang Shao
  2019-06-22  3:30   ` Andrew Morton
@ 2019-06-24  8:53   ` Kirill Tkhai
  2019-06-24 12:30     ` Yafang Shao
  1 sibling, 1 reply; 9+ messages in thread
From: Kirill Tkhai @ 2019-06-24  8:53 UTC (permalink / raw)
  To: Yafang Shao, akpm, mhocko, hannes, vdavydov.dev, mgorman; +Cc: linux-mm

On 21.06.2019 13:14, Yafang Shao wrote:
> There're six different reclaim paths by now,
> - kswapd reclaim path
> - node reclaim path
> - hibernate preallocate memory reclaim path
> - direct reclaim path
> - memcg reclaim path
> - memcg softlimit reclaim path
> 
> The slab caches reclaimed in these paths are only calculated in the above
> three paths.
> 
> There're some drawbacks if we don't calculate the reclaimed slab caches.
> - The sc->nr_reclaimed isn't correct if there're some slab caches
>   relcaimed in this path.
> - The slab caches may be reclaimed thoroughly if there're lots of
>   reclaimable slab caches and few page caches.
>   Let's take an easy example for this case.
>   If one memcg is full of slab caches and the limit of it is 512M, in
>   other words there're approximately 512M slab caches in this memcg.
>   Then the limit of the memcg is reached and the memcg reclaim begins,
>   and then in this memcg reclaim path it will continuesly reclaim the
>   slab caches until the sc->priority drops to 0.
>   After this reclaim stops, you will find there're few slab caches left,
>   which is less than 20M in my test case.
>   While after this patch applied the number is greater than 300M and
>   the sc->priority only drops to 3.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  mm/vmscan.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 18a66e5..d6c3fc8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
>  		return 1;
>  
> +	current->reclaim_state = &sc.reclaim_state;
>  	trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
>  
>  	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>  
>  	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> +	current->reclaim_state = NULL;

Shouldn't we remove reclaim_state assignment from __perform_reclaim() after this?
  
>  	return nr_reclaimed;
>  }
> @@ -3191,6 +3193,7 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
>  	};
>  	unsigned long lru_pages;
>  
> +	current->reclaim_state = &sc.reclaim_state;
>  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>  
> @@ -3212,7 +3215,9 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
>  					cgroup_ino(memcg->css.cgroup),
>  					sc.nr_reclaimed);
>  
> +	current->reclaim_state = NULL;
>  	*nr_scanned = sc.nr_scanned;
> +
>  	return sc.nr_reclaimed;
>  }
>  
> @@ -3239,6 +3244,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>  		.may_shrinkslab = 1,
>  	};
>  
> +	current->reclaim_state = &sc.reclaim_state;
>  	/*
>  	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
>  	 * take care of from where we get pages. So the node where we start the
> @@ -3263,6 +3269,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>  	trace_mm_vmscan_memcg_reclaim_end(
>  				cgroup_ino(memcg->css.cgroup),
>  				nr_reclaimed);
> +	current->reclaim_state = NULL;
>  
>  	return nr_reclaimed;
>  }
> 


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-24  8:53   ` Kirill Tkhai
@ 2019-06-24 12:30     ` Yafang Shao
  2019-06-24 12:33       ` Kirill Tkhai
  0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2019-06-24 12:30 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Vladimir Davydov,
	Mel Gorman, Linux MM

On Mon, Jun 24, 2019 at 4:53 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 21.06.2019 13:14, Yafang Shao wrote:
> > There're six different reclaim paths by now,
> > - kswapd reclaim path
> > - node reclaim path
> > - hibernate preallocate memory reclaim path
> > - direct reclaim path
> > - memcg reclaim path
> > - memcg softlimit reclaim path
> >
> > The slab caches reclaimed in these paths are only calculated in the above
> > three paths.
> >
> > There're some drawbacks if we don't calculate the reclaimed slab caches.
> > - The sc->nr_reclaimed isn't correct if there're some slab caches
> >   relcaimed in this path.
> > - The slab caches may be reclaimed thoroughly if there're lots of
> >   reclaimable slab caches and few page caches.
> >   Let's take an easy example for this case.
> >   If one memcg is full of slab caches and the limit of it is 512M, in
> >   other words there're approximately 512M slab caches in this memcg.
> >   Then the limit of the memcg is reached and the memcg reclaim begins,
> >   and then in this memcg reclaim path it will continuesly reclaim the
> >   slab caches until the sc->priority drops to 0.
> >   After this reclaim stops, you will find there're few slab caches left,
> >   which is less than 20M in my test case.
> >   While after this patch applied the number is greater than 300M and
> >   the sc->priority only drops to 3.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  mm/vmscan.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 18a66e5..d6c3fc8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >       if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
> >               return 1;
> >
> > +     current->reclaim_state = &sc.reclaim_state;
> >       trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
> >
> >       nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> >
> >       trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> > +     current->reclaim_state = NULL;
>
> Shouldn't we remove reclaim_state assignment from __perform_reclaim() after this?
>

Oh yes, we should remove it. Thanks for pointing that out.
I will post a fix soon.
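
Roughly something like this, I think (just a sketch, assuming
__perform_reclaim() in mm/page_alloc.c still sets up its own on-stack
reclaim_state around try_to_free_pages(); the actual patch may differ):

	(sketch only, against mm/page_alloc.c, __perform_reclaim())
	-	struct reclaim_state reclaim_state;
		...
	-	reclaim_state.reclaimed_slab = 0;
	-	current->reclaim_state = &reclaim_state;
		progress = try_to_free_pages(ac->zonelist, order, gfp_mask,
					     ac->nodemask);
	-	current->reclaim_state = NULL;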

Thanks
Yafang

> >       return nr_reclaimed;
> >  }
> > @@ -3191,6 +3193,7 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
> >       };
> >       unsigned long lru_pages;
> >
> > +     current->reclaim_state = &sc.reclaim_state;
> >       sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> >                       (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> >
> > @@ -3212,7 +3215,9 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
> >                                       cgroup_ino(memcg->css.cgroup),
> >                                       sc.nr_reclaimed);
> >
> > +     current->reclaim_state = NULL;
> >       *nr_scanned = sc.nr_scanned;
> > +
> >       return sc.nr_reclaimed;
> >  }
> >
> > @@ -3239,6 +3244,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> >               .may_shrinkslab = 1,
> >       };
> >
> > +     current->reclaim_state = &sc.reclaim_state;
> >       /*
> >        * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
> >        * take care of from where we get pages. So the node where we start the
> > @@ -3263,6 +3269,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> >       trace_mm_vmscan_memcg_reclaim_end(
> >                               cgroup_ino(memcg->css.cgroup),
> >                               nr_reclaimed);
> > +     current->reclaim_state = NULL;
> >
> >       return nr_reclaimed;
> >  }
> >
>


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-24 12:30     ` Yafang Shao
@ 2019-06-24 12:33       ` Kirill Tkhai
  2019-06-24 12:40         ` Yafang Shao
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill Tkhai @ 2019-06-24 12:33 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Vladimir Davydov,
	Mel Gorman, Linux MM

On 24.06.2019 15:30, Yafang Shao wrote:
> On Mon, Jun 24, 2019 at 4:53 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>
>> On 21.06.2019 13:14, Yafang Shao wrote:
>>> There're six different reclaim paths by now,
>>> - kswapd reclaim path
>>> - node reclaim path
>>> - hibernate preallocate memory reclaim path
>>> - direct reclaim path
>>> - memcg reclaim path
>>> - memcg softlimit reclaim path
>>>
>>> The slab caches reclaimed in these paths are only calculated in the above
>>> three paths.
>>>
>>> There're some drawbacks if we don't calculate the reclaimed slab caches.
>>> - The sc->nr_reclaimed isn't correct if there're some slab caches
>>>   relcaimed in this path.
>>> - The slab caches may be reclaimed thoroughly if there're lots of
>>>   reclaimable slab caches and few page caches.
>>>   Let's take an easy example for this case.
>>>   If one memcg is full of slab caches and the limit of it is 512M, in
>>>   other words there're approximately 512M slab caches in this memcg.
>>>   Then the limit of the memcg is reached and the memcg reclaim begins,
>>>   and then in this memcg reclaim path it will continuesly reclaim the
>>>   slab caches until the sc->priority drops to 0.
>>>   After this reclaim stops, you will find there're few slab caches left,
>>>   which is less than 20M in my test case.
>>>   While after this patch applied the number is greater than 300M and
>>>   the sc->priority only drops to 3.
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> ---
>>>  mm/vmscan.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 18a66e5..d6c3fc8 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>>       if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
>>>               return 1;
>>>
>>> +     current->reclaim_state = &sc.reclaim_state;
>>>       trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
>>>
>>>       nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>>>
>>>       trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
>>> +     current->reclaim_state = NULL;
>>
>> Shouldn't we remove reclaim_state assignment from __perform_reclaim() after this?
>>
> 
> Oh yes. We should remove it. Thanks for pointing out.
> I will post a fix soon.

With the change above, feel free to add my Reviewed-by: to all of the series.

Kirill


* Re: [PATCH 2/2] mm/vmscan: calculate reclaimed slab caches in all reclaim paths
  2019-06-24 12:33       ` Kirill Tkhai
@ 2019-06-24 12:40         ` Yafang Shao
  0 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-24 12:40 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Vladimir Davydov,
	Mel Gorman, Linux MM

On Mon, Jun 24, 2019 at 8:33 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> On 24.06.2019 15:30, Yafang Shao wrote:
> > On Mon, Jun 24, 2019 at 4:53 PM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >>
> >> On 21.06.2019 13:14, Yafang Shao wrote:
> >>> There're six different reclaim paths by now,
> >>> - kswapd reclaim path
> >>> - node reclaim path
> >>> - hibernate preallocate memory reclaim path
> >>> - direct reclaim path
> >>> - memcg reclaim path
> >>> - memcg softlimit reclaim path
> >>>
> >>> The slab caches reclaimed in these paths are only calculated in the above
> >>> three paths.
> >>>
> >>> There're some drawbacks if we don't calculate the reclaimed slab caches.
> >>> - The sc->nr_reclaimed isn't correct if there're some slab caches
> >>>   relcaimed in this path.
> >>> - The slab caches may be reclaimed thoroughly if there're lots of
> >>>   reclaimable slab caches and few page caches.
> >>>   Let's take an easy example for this case.
> >>>   If one memcg is full of slab caches and the limit of it is 512M, in
> >>>   other words there're approximately 512M slab caches in this memcg.
> >>>   Then the limit of the memcg is reached and the memcg reclaim begins,
> >>>   and then in this memcg reclaim path it will continuesly reclaim the
> >>>   slab caches until the sc->priority drops to 0.
> >>>   After this reclaim stops, you will find there're few slab caches left,
> >>>   which is less than 20M in my test case.
> >>>   While after this patch applied the number is greater than 300M and
> >>>   the sc->priority only drops to 3.
> >>>
> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>> ---
> >>>  mm/vmscan.c | 7 +++++++
> >>>  1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index 18a66e5..d6c3fc8 100644
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >>>       if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
> >>>               return 1;
> >>>
> >>> +     current->reclaim_state = &sc.reclaim_state;
> >>>       trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
> >>>
> >>>       nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> >>>
> >>>       trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> >>> +     current->reclaim_state = NULL;
> >>
> >> Shouldn't we remove reclaim_state assignment from __perform_reclaim() after this?
> >>
> >
> > Oh yes. We should remove it. Thanks for pointing out.
> > I will post a fix soon.
>
> With the change above, feel free to add my Reviewed-by: to all of the series.
>

Sure, thanks for your review.

Thanks
Yafang

