All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] mm: improvements in shrink slab
@ 2019-06-06 10:14 Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 1/3] mm/vmstat: expose min_slab_pages in /proc/zoneinfo Yafang Shao
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 10:14 UTC (permalink / raw)
  To: akpm, mhocko, linux.bhar; +Cc: linux-mm, shaoyafang, Yafang Shao

In the past few days, I found an issue in shrink slab.
We I was trying to fix it, I find there are something in shrink slab need
to be improved.

- #1 is to expose the min_slab_pages to help us analyze shrink slab.

- #2 is an code improvement.

- #3 is a fix to a issue. This issue is very easy to produce.
In the zone reclaim mode.
First you continuously cat a random non-exist file to produce
more and more dentry, then you read big file to produce page cache.
Finally you will find that the denty will never be shrunk.
In order to fix this issue, a new bitmask no_pagecache is introduce,
which is 0 by defalt.

Changes since v3:
A new bitmask no_pagecache is introduced in struct scan_control.

Yafang Shao (3):
  mm/vmstat: expose min_slab_pages in /proc/zoneinfo
  mm/vmscan: change return type of shrink_node() to void
  mm/vmscan: shrink slab in node reclaim

 mm/vmscan.c | 33 +++++++++++++++++++++++++++++----
 mm/vmstat.c |  8 ++++++++
 2 files changed, 37 insertions(+), 4 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/3] mm/vmstat: expose min_slab_pages in /proc/zoneinfo
  2019-06-06 10:14 [PATCH v4 0/3] mm: improvements in shrink slab Yafang Shao
@ 2019-06-06 10:14 ` Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 2/3] mm/vmscan: change return type of shrink_node() to void Yafang Shao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 10:14 UTC (permalink / raw)
  To: akpm, mhocko, linux.bhar; +Cc: linux-mm, shaoyafang, Yafang Shao

On one of our servers, we find the dentry is continuously growing
without shrinking. We're not sure whether that is because reclaimable
slab is still less than min_slab_pages.
So if we expose min_slab_pages, it would be easy to compare.

As we can set min_slab_ratio with sysctl, we should expose the effective
in_slab_pages to user as well.

That is same with min_unmapped_pages.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/vmstat.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index fd7e16c..2e1c608 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1550,7 +1550,15 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 				NR_VM_NUMA_STAT_ITEMS],
 				node_page_state(pgdat, i));
 		}
+
+#ifdef CONFIG_NUMA
+		seq_printf(m, "\n      %-12s %lu", "min_slab",
+			   pgdat->min_slab_pages);
+		seq_printf(m, "\n      %-12s %lu", "min_unmapped",
+			   pgdat->min_unmapped_pages);
+#endif
 	}
+
 	seq_printf(m,
 		   "\n  pages free     %lu"
 		   "\n        min      %lu"
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/3] mm/vmscan: change return type of shrink_node() to void
  2019-06-06 10:14 [PATCH v4 0/3] mm: improvements in shrink slab Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 1/3] mm/vmstat: expose min_slab_pages in /proc/zoneinfo Yafang Shao
@ 2019-06-06 10:14 ` Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 3/3] mm/vmscan: shrink slab in node reclaim Yafang Shao
  2019-06-06 11:17 ` [PATCH v4 0/3] mm: improvements in shrink slab Michal Hocko
  3 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 10:14 UTC (permalink / raw)
  To: akpm, mhocko, linux.bhar; +Cc: linux-mm, shaoyafang, Yafang Shao

As the return value of shrink_node() isn't used by any callsites,
we'd better change the return type of shrink_node() from static inline
to void.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/vmscan.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 84dcb65..281bfa9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2687,7 +2687,7 @@ static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
 		(memcg && memcg_congested(pgdat, memcg));
 }
 
-static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
+static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_reclaimed, nr_scanned;
@@ -2857,8 +2857,6 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	 */
 	if (reclaimable)
 		pgdat->kswapd_failures = 0;
-
-	return reclaimable;
 }
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 3/3] mm/vmscan: shrink slab in node reclaim
  2019-06-06 10:14 [PATCH v4 0/3] mm: improvements in shrink slab Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 1/3] mm/vmstat: expose min_slab_pages in /proc/zoneinfo Yafang Shao
  2019-06-06 10:14 ` [PATCH v4 2/3] mm/vmscan: change return type of shrink_node() to void Yafang Shao
@ 2019-06-06 10:14 ` Yafang Shao
  2019-06-06 11:17 ` [PATCH v4 0/3] mm: improvements in shrink slab Michal Hocko
  3 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 10:14 UTC (permalink / raw)
  To: akpm, mhocko, linux.bhar; +Cc: linux-mm, shaoyafang, Yafang Shao

In the node reclaim, may_shrinkslab is 0 by default,
hence shrink_slab will never be performed in it.
While shrik_slab should be performed if the relcaimable slab is over
min slab limit.

If reclaimable pagecache is less than min_unmapped_pages while
reclaimable slab is greater than min_slab_pages, we only shrink slab.
Otherwise the min_unmapped_pages will be useless under this condition.
A new bitmask no_pagecache is introduced in scan_control for this
purpose, which is 0 by default.

reclaim_state.reclaimed_slab will tell us how many pages are
reclaimed in shrink slab.

This issue is very easy to produce, first you continuously cat a random
non-exist file to produce more and more dentry, then you read big file
to produce page cache. And finally you will find that the denty will
never be shrunk in node reclaim.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/vmscan.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 281bfa9..da53ed6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -91,6 +91,9 @@ struct scan_control {
 	/* e.g. boosted watermark reclaim leaves slabs alone */
 	unsigned int may_shrinkslab:1;
 
+	/* in node relcaim mode, we may shrink slab only */
+	unsigned int no_pagecache:1;
+
 	/*
 	 * Cgroups are not reclaimed below their configured memory.low,
 	 * unless we threaten to OOM. If any cgroups are skipped due to
@@ -2744,7 +2747,9 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 			reclaimed = sc->nr_reclaimed;
 			scanned = sc->nr_scanned;
-			shrink_node_memcg(pgdat, memcg, sc);
+
+			if (!sc->no_pagecache)
+				shrink_node_memcg(pgdat, memcg, sc);
 
 			if (sc->may_shrinkslab) {
 				shrink_slab(sc->gfp_mask, pgdat->node_id,
@@ -4176,6 +4181,10 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
 		.may_swap = 1,
+		.may_shrinkslab = (node_page_state(pgdat, NR_SLAB_RECLAIMABLE) >
+				  pgdat->min_slab_pages),
+		.no_pagecache = !(node_pagecache_reclaimable(pgdat) >
+				pgdat->min_unmapped_pages),
 		.reclaim_idx = gfp_zone(gfp_mask),
 	};
 
@@ -4194,15 +4203,13 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink node with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		do {
-			shrink_node(pgdat, &sc);
-		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
-	}
+	/*
+	 * Free memory by calling shrink node with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	do {
+		shrink_node(pgdat, &sc);
+	} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
 
 	p->reclaim_state = NULL;
 	current->flags &= ~PF_SWAPWRITE;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] mm: improvements in shrink slab
  2019-06-06 10:14 [PATCH v4 0/3] mm: improvements in shrink slab Yafang Shao
                   ` (2 preceding siblings ...)
  2019-06-06 10:14 ` [PATCH v4 3/3] mm/vmscan: shrink slab in node reclaim Yafang Shao
@ 2019-06-06 11:17 ` Michal Hocko
  2019-06-06 14:18   ` Yafang Shao
  3 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-06-06 11:17 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, linux.bhar, linux-mm, shaoyafang

On Thu 06-06-19 18:14:37, Yafang Shao wrote:
> In the past few days, I found an issue in shrink slab.
> We I was trying to fix it, I find there are something in shrink slab need
> to be improved.
> 
> - #1 is to expose the min_slab_pages to help us analyze shrink slab.
> 
> - #2 is an code improvement.
> 
> - #3 is a fix to a issue. This issue is very easy to produce.
> In the zone reclaim mode.
> First you continuously cat a random non-exist file to produce
> more and more dentry, then you read big file to produce page cache.
> Finally you will find that the denty will never be shrunk.
> In order to fix this issue, a new bitmask no_pagecache is introduce,
> which is 0 by defalt.

Node reclaim mode is quite special and rarely used these days. Could you
be more specific on how did you get to see the above problems? Do you
really need node reclaim in your usecases or is this more about a
testing and seeing what happens. Not that I am against these changes but
I would like to understand the motivation. Especially because you are
exposing some internal implementation details of the node reclaim to the
userspace.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] mm: improvements in shrink slab
  2019-06-06 11:17 ` [PATCH v4 0/3] mm: improvements in shrink slab Michal Hocko
@ 2019-06-06 14:18   ` Yafang Shao
  2019-06-06 14:44     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 14:18 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Bharath Vedartham, Linux MM, shaoyafang

[-- Attachment #1: Type: text/plain, Size: 2415 bytes --]

On Thu, Jun 6, 2019 at 7:17 PM Michal Hocko <mhocko@suse.com> wrote:

> On Thu 06-06-19 18:14:37, Yafang Shao wrote:
> > In the past few days, I found an issue in shrink slab.
> > We I was trying to fix it, I find there are something in shrink slab need
> > to be improved.
> >
> > - #1 is to expose the min_slab_pages to help us analyze shrink slab.
> >
> > - #2 is an code improvement.
> >
> > - #3 is a fix to a issue. This issue is very easy to produce.
> > In the zone reclaim mode.
> > First you continuously cat a random non-exist file to produce
> > more and more dentry, then you read big file to produce page cache.
> > Finally you will find that the denty will never be shrunk.
> > In order to fix this issue, a new bitmask no_pagecache is introduce,
> > which is 0 by defalt.
>
> Node reclaim mode is quite special and rarely used these days. Could you
> be more specific on how did you get to see the above problems? Do you
> really need node reclaim in your usecases or is this more about a
> testing and seeing what happens. Not that I am against these changes but
> I would like to understand the motivation. Especially because you are
> exposing some internal implementation details of the node reclaim to the
> userspace.
>
>
The slab issue we found on our server is on old kernel (kernel-3.10).
We found that the dentry was continuesly growing without shrinking in one
container on a server,
so I read slab code and found that memcg relcaim can't shrink slab in this
old kenrel,
but this issue was aready fixed in upstream.

When I was reading the shrink slab code in the upstream kernel,
I found the slab can't be shrinked in node reclaim.
So I did some test to produce this issue and post this patchset to fix it.
With my patch, the issue produced by me disapears.

But this is only a beginning in the node reclaim path...
Then I found another issue when I implemented a memory pressure monitor for
out containers,
which is vmpressure_prio() is missed in the node reclaim path.

Well, seems when we introduce new feature for page relciam, we always
ignore the node reclaim path.

Regarding node reclaim path, we always turn it off on our servers,
because we really found some latency spike caused by node reclaim
(the reason why node reclaim is turned on is not clear).

The reason I expose node reclaim details to userspace is because the user
can set node reclaim details now.

Thanks
Yafang

[-- Attachment #2: Type: text/html, Size: 3106 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] mm: improvements in shrink slab
  2019-06-06 14:18   ` Yafang Shao
@ 2019-06-06 14:44     ` Michal Hocko
  2019-06-06 15:03       ` Yafang Shao
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-06-06 14:44 UTC (permalink / raw)
  To: Yafang Shao; +Cc: Andrew Morton, Bharath Vedartham, Linux MM, shaoyafang

On Thu 06-06-19 22:18:41, Yafang Shao wrote:
[...]
> Well, seems when we introduce new feature for page relciam, we always
> ignore the node reclaim path.

Yes, node reclaim is quite weird and I am not really sure whether we
still have many users these days. It used to be mostly driven by
artificial benchmarks which highly benefit from the local node access.
We have turned off its automatic enabling when there are nodes with
higher access latency quite some time ago without anybody noticing
actually.

> Regarding node reclaim path, we always turn it off on our servers,
> because we really found some latency spike caused by node reclaim
> (the reason why node reclaim is turned on is not clear).

Yes, that was the case and the reason it is not enabled by default.

> The reason I expose node reclaim details to userspace is because the user
> can set node reclaim details now.

Well, just because somebody _can_ enable it doesn't sound like a
sufficient justification to expose even more implementation details of
this feature. I am not really sure there is a strong reason to touch the
code without a real usecase behind.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] mm: improvements in shrink slab
  2019-06-06 14:44     ` Michal Hocko
@ 2019-06-06 15:03       ` Yafang Shao
  2019-06-06 15:33         ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2019-06-06 15:03 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Bharath Vedartham, Linux MM, shaoyafang

[-- Attachment #1: Type: text/plain, Size: 1328 bytes --]

On Thu, Jun 6, 2019 at 10:44 PM Michal Hocko <mhocko@suse.com> wrote:

> On Thu 06-06-19 22:18:41, Yafang Shao wrote:
> [...]
> > Well, seems when we introduce new feature for page relciam, we always
> > ignore the node reclaim path.
>
> Yes, node reclaim is quite weird and I am not really sure whether we
> still have many users these days. It used to be mostly driven by
> artificial benchmarks which highly benefit from the local node access.
> We have turned off its automatic enabling when there are nodes with
> higher access latency quite some time ago without anybody noticing
> actually.
>
> > Regarding node reclaim path, we always turn it off on our servers,
> > because we really found some latency spike caused by node reclaim
> > (the reason why node reclaim is turned on is not clear).
>
> Yes, that was the case and the reason it is not enabled by default.
>
> > The reason I expose node reclaim details to userspace is because the user
> > can set node reclaim details now.
>
> Well, just because somebody _can_ enable it doesn't sound like a
> sufficient justification to expose even more implementation details of
> this feature. I am not really sure there is a strong reason to touch the
> code without a real usecase behind.
>
>
Got it.

So should we fix the bugs in node reclaim path then?

Thanks
Yafang

[-- Attachment #2: Type: text/html, Size: 1820 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] mm: improvements in shrink slab
  2019-06-06 15:03       ` Yafang Shao
@ 2019-06-06 15:33         ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2019-06-06 15:33 UTC (permalink / raw)
  To: Yafang Shao; +Cc: Andrew Morton, Bharath Vedartham, Linux MM, shaoyafang

On Thu 06-06-19 23:03:12, Yafang Shao wrote:
> On Thu, Jun 6, 2019 at 10:44 PM Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Thu 06-06-19 22:18:41, Yafang Shao wrote:
[...]
> > > The reason I expose node reclaim details to userspace is because the user
> > > can set node reclaim details now.
> >
> > Well, just because somebody _can_ enable it doesn't sound like a
> > sufficient justification to expose even more implementation details of
> > this feature. I am not really sure there is a strong reason to touch the
> > code without a real usecase behind.
> >
> >
> Got it.
> 
> So should we fix the bugs in node reclaim path then?

I am all for fixing bugs, I am just nervous to expose even more internal
implementation details to the userspace if there is no _real_ usecase
requiring it.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-06 15:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06 10:14 [PATCH v4 0/3] mm: improvements in shrink slab Yafang Shao
2019-06-06 10:14 ` [PATCH v4 1/3] mm/vmstat: expose min_slab_pages in /proc/zoneinfo Yafang Shao
2019-06-06 10:14 ` [PATCH v4 2/3] mm/vmscan: change return type of shrink_node() to void Yafang Shao
2019-06-06 10:14 ` [PATCH v4 3/3] mm/vmscan: shrink slab in node reclaim Yafang Shao
2019-06-06 11:17 ` [PATCH v4 0/3] mm: improvements in shrink slab Michal Hocko
2019-06-06 14:18   ` Yafang Shao
2019-06-06 14:44     ` Michal Hocko
2019-06-06 15:03       ` Yafang Shao
2019-06-06 15:33         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.