* [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
@ 2020-09-08 14:24 zangchunxin
  2020-09-08 15:09 ` Chris Down
  0 siblings, 1 reply; 7+ messages in thread
From: zangchunxin @ 2020-09-08 14:24 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Chunxin Zang, Muchun Song

From: Chunxin Zang <zangchunxin@bytedance.com>

On our servers there are about 10k memcgs on a single machine, and they
use memory very frequently. When I trigger drop_caches, the process loops
forever in drop_slab_node.
There are two reasons (see the current loop quoted below):
1. We have so many memcgs that even if each of them frees only one object,
   the total freed count is still bigger than 10.
2. A single pass over all memcgs takes a long time, so by the time the
   next pass starts, the memcgs traversed first have accumulated many
   freeable objects again and the freed count exceeds 10 once more.
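
For reference, the current loop (the code removed by the diff below) is:

  void drop_slab_node(int nid)
  {
  	unsigned long freed;

  	do {
  		struct mem_cgroup *memcg = NULL;

  		freed = 0;
  		memcg = mem_cgroup_iter(NULL, NULL, NULL);
  		do {
  			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
  		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
  	} while (freed > 10);	/* with ~10k busy memcgs, freed rarely drops to <= 10 */
  }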

We can get the following info through 'ps':

  root:~# ps -aux | grep drop
  root  357956 ... R    Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
  root 1771385 ... R    Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
  root 1986319 ... R    18:56 117:27 echo 3 > /proc/sys/vm/drop_caches
  root 2002148 ... R    Aug24 5720:39 echo 3 > /proc/sys/vm/drop_caches
  root 2564666 ... R    18:59 113:58 echo 3 > /proc/sys/vm/drop_caches
  root 2639347 ... R    Sep03 2383:39 echo 3 > /proc/sys/vm/drop_caches
  root 3904747 ... R    03:35 993:31 echo 3 > /proc/sys/vm/drop_caches
  root 4016780 ... R    Aug21 7882:18 echo 3 > /proc/sys/vm/drop_caches

Use bpftrace to follow the 'freed' value in drop_slab_node:

  root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
  Attaching 1 probe...
  ^B^C

  @ret:
  [64, 128)        1 |                                                    |
  [128, 256)      28 |                                                    |
  [256, 512)     107 |@                                                   |
  [512, 1K)      298 |@@@                                                 |
  [1K, 2K)       613 |@@@@@@@                                             |
  [2K, 4K)      4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [4K, 8K)       442 |@@@@@                                               |
  [8K, 16K)      299 |@@@                                                 |
  [16K, 32K)     100 |@                                                   |
  [32K, 64K)     139 |@                                                   |
  [64K, 128K)     56 |                                                    |
  [128K, 256K)    26 |                                                    |
  [256K, 512K)     2 |                                                    |

In one drop_caches action, traversing each memcg only once is probably
better. If the user needs more memory, they can drop caches again.

Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/vmscan.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b6d84326bdf2..9d8ee2ae5824 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -699,17 +699,12 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 
 void drop_slab_node(int nid)
 {
-	unsigned long freed;
+	struct mem_cgroup *memcg = NULL;
 
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
-		struct mem_cgroup *memcg = NULL;
-
-		freed = 0;
-		memcg = mem_cgroup_iter(NULL, NULL, NULL);
-		do {
-			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
-		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
-	} while (freed > 10);
+		shrink_slab(GFP_KERNEL, nid, memcg, 0);
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 }
 
 void drop_slab(void)
-- 
2.11.0



* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-08 14:24 [PATCH] mm/vmscan: fix infinite loop in drop_slab_node zangchunxin
@ 2020-09-08 15:09 ` Chris Down
  2020-09-08 15:42   ` Vlastimil Babka
  2020-09-09  4:19   ` [External] " Muchun Song
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Down @ 2020-09-08 15:09 UTC (permalink / raw)
  To: zangchunxin; +Cc: akpm, linux-mm, linux-kernel, Muchun Song

drop_caches by its very nature can be extremely performance intensive -- if 
someone wants to abort after trying too long, they can just send a 
TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't 
reliably work when doing that, then _that's_ something to improve, but this 
looks premature to me until that's demonstrated not to work.

zangchunxin@bytedance.com writes:
>In one drop_caches action, traversing each memcg only once is probably
>better. If the user needs more memory, they can drop caches again.

Can you please provide some measurements of the difference in reclamation in 
practice?


* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-08 15:09 ` Chris Down
@ 2020-09-08 15:42   ` Vlastimil Babka
  2020-09-08 17:55     ` Chris Down
  2020-09-09  4:19   ` [External] " Muchun Song
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2020-09-08 15:42 UTC (permalink / raw)
  To: Chris Down, zangchunxin; +Cc: akpm, linux-mm, linux-kernel, Muchun Song

On 9/8/20 5:09 PM, Chris Down wrote:
> drop_caches by its very nature can be extremely performance intensive -- if 
> someone wants to abort after trying too long, they can just send a 
> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't 
> reliably work when doing that, then _that's_ something to improve, but this 
> looks premature to me until that's demonstrated not to work.

Hm, there might be existing scripts (even though I dislike those) running
drop_caches periodically, and they are currently not set up to be killed, so one
day it might surprise someone. Dropping should be a one-time event, not a
continual reclaim.

Maybe we could be a bit smarter and e.g. double the threshold currently
hardcoded as "10" with each iteration?
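
A rough, untested sketch of that idea, reusing the names from the existing
drop_slab_node() code:

void drop_slab_node(int nid)
{
	unsigned long freed;
	unsigned long threshold = 10;

	do {
		struct mem_cgroup *memcg = NULL;

		freed = 0;
		memcg = mem_cgroup_iter(NULL, NULL, NULL);
		do {
			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

		/* demand twice as much progress on each successive pass */
		threshold *= 2;
	} while (freed > threshold);
}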

> zangchunxin@bytedance.com writes:
>>In one drop_caches action, traversing each memcg only once is probably
>>better. If the user needs more memory, they can drop caches again.
> 
> Can you please provide some measurements of the difference in reclamation in 
> practice?
> 



* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-08 15:42   ` Vlastimil Babka
@ 2020-09-08 17:55     ` Chris Down
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Down @ 2020-09-08 17:55 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: zangchunxin, akpm, linux-mm, linux-kernel, Muchun Song

Vlastimil Babka writes:
>On 9/8/20 5:09 PM, Chris Down wrote:
>> drop_caches by its very nature can be extremely performance intensive -- if
>> someone wants to abort after trying too long, they can just send a
>> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
>> reliably work when doing that, then _that's_ something to improve, but this
>> looks premature to me until that's demonstrated not to work.
>
>Hm, there might be existing scripts (even though I dislike those) running
>drop_caches periodically, and they are currently not set up to be killed, so one
>day it might surprise someone. Dropping should be a one-time event, not a
>continual reclaim.

Sure, but these scripts can send it to their child themselves instead of using 
WCE, no? As far as I know, that's the already supported way to abort 
drop_caches-induced reclaim.
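
A minimal, hypothetical sketch of that parent/child pattern (it only helps
once the in-kernel loop actually reacts to a fatal signal, which is what
the rest of this thread is about):

#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		/* child: do the drop_caches write (needs root); may run for a long time */
		int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

		if (fd < 0)
			return 1;
		write(fd, "3", 1);
		close(fd);
		return 0;
	}

	/* parent: allow the child an arbitrary 60 seconds, then SIGKILL it */
	for (int i = 0; i < 60; i++) {
		if (waitpid(pid, NULL, WNOHANG) == pid)
			return 0;	/* finished in time */
		sleep(1);
	}
	kill(pid, SIGKILL);		/* abort the in-kernel reclaim loop */
	waitpid(pid, NULL, 0);
	return 0;
}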


* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-08 15:09 ` Chris Down
  2020-09-08 15:42   ` Vlastimil Babka
@ 2020-09-09  4:19   ` Muchun Song
  2020-09-09  9:59     ` Chris Down
  1 sibling, 1 reply; 7+ messages in thread
From: Muchun Song @ 2020-09-09  4:19 UTC (permalink / raw)
  To: Chris Down, zangchunxin; +Cc: Andrew Morton, Linux Memory Management List, LKML

Hi Chris,

On Tue, Sep 8, 2020 at 11:09 PM Chris Down <chris@chrisdown.name> wrote:
>
> drop_caches by its very nature can be extremely performance intensive -- if
> someone wants to abort after trying too long, they can just send a
> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
> reliably work when doing that, then _that's_ something to improve, but this
> looks premature to me until that's demonstrated not to work.

Sending a TASK_KILLABLE signal? That does not work right now, because the
current task never gets a chance to act on the signal. So I think we may
need to do one of the following things to avoid this case:

1. Double the threshold currently hard coded as "10" with each iteration,
    as suggested by Vlastimil. That is also a good idea.

2. In the while loop, check whether a fatal signal is pending and, if so,
    break out of the loop, as in the following code snippet. Thanks.

@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
 	do {
 		struct mem_cgroup *memcg = NULL;
 
+		if (fatal_signal_pending(current))
+			return;
+
 		freed = 0;
 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
 		do {

>
> zangchunxin@bytedance.com writes:
> >In one drop_caches action, traversing each memcg only once is probably
> >better. If the user needs more memory, they can drop caches again.
>
> Can you please provide some measurements of the difference in reclamation in
> practice?



--
Yours,
Muchun


* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-09  4:19   ` [External] " Muchun Song
@ 2020-09-09  9:59     ` Chris Down
  2020-09-09 12:10       ` Muchun Song
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Down @ 2020-09-09  9:59 UTC (permalink / raw)
  To: Muchun Song
  Cc: zangchunxin, Andrew Morton, Linux Memory Management List, LKML

Muchun Song writes:
>1. Double the threshold currently hard coded as "10" with each iteration,
>    as suggested by Vlastimil. That is also a good idea.

I think this sounds reasonable, although I'd like to see what the difference in 
reclaim looks like in practice.

>2. In the while loop, check whether a fatal signal is pending and, if so,
>    break out of the loop, as in the following code snippet. Thanks.
>
>@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
> 	do {
> 		struct mem_cgroup *memcg = NULL;
>
>+		if (fatal_signal_pending(current))
>+			return;
>+
> 		freed = 0;
> 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> 		do {

Regardless of anything, I think this is probably a good idea. Could you send it 
as a patch? :-)

Thanks,

Chris


* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
  2020-09-09  9:59     ` Chris Down
@ 2020-09-09 12:10       ` Muchun Song
  0 siblings, 0 replies; 7+ messages in thread
From: Muchun Song @ 2020-09-09 12:10 UTC (permalink / raw)
  To: Chris Down; +Cc: zangchunxin, Andrew Morton, Linux Memory Management List, LKML

On Wed, Sep 9, 2020 at 5:59 PM Chris Down <chris@chrisdown.name> wrote:
>
> Muchun Song writes:
> >1. Double the threshold currently hard coded as "10" with each iteration,
> >    as suggested by Vlastimil. That is also a good idea.
>
> I think this sounds reasonable, although I'd like to see what the difference in
> reclaim looks like in practice.
>
> >2. In the while loop, check whether a fatal signal is pending and, if so,
> >    break out of the loop, as in the following code snippet. Thanks.
> >
> >@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
> > 	do {
> > 		struct mem_cgroup *memcg = NULL;
> >
> >+		if (fatal_signal_pending(current))
> >+			return;
> >+
> > 		freed = 0;
> > 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> > 		do {
>
> Regardless of anything, I think this is probably a good idea. Could you send it
> as a patch? :-)

OK, will do that, thanks.

>
> Thanks,
>
> Chris



-- 
Yours,
Muchun

