* [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
@ 2020-09-08 14:24 zangchunxin
2020-09-08 15:09 ` Chris Down
0 siblings, 1 reply; 7+ messages in thread
From: zangchunxin @ 2020-09-08 14:24 UTC (permalink / raw)
To: akpm; +Cc: linux-mm, linux-kernel, Chunxin Zang, Muchun Song
From: Chunxin Zang <zangchunxin@bytedance.com>
On our servers there are about 10k memcgs on a single machine, and they use
memory very frequently. When I trigger drop_caches, the process loops forever
in drop_slab_node.
There are two reasons:
1. We have too many memcgs: even if each memcg frees only a single object,
the total freed count is still bigger than 10.
2. A single pass over all memcgs takes a long time, so by the time the pass
finishes, the memcgs visited first have accumulated freeable objects again.
On the next pass the freed count exceeds 10 once more.
We can get the following info through 'ps':
root:~# ps -aux | grep drop
root 357956 ... R Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
root 1771385 ... R Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
root 1986319 ... R 18:56 117:27 echo 3 > /proc/sys/vm/drop_caches
root 2002148 ... R Aug24 5720:39 echo 3 > /proc/sys/vm/drop_caches
root 2564666 ... R 18:59 113:58 echo 3 > /proc/sys/vm/drop_caches
root 2639347 ... R Sep03 2383:39 echo 3 > /proc/sys/vm/drop_caches
root 3904747 ... R 03:35 993:31 echo 3 > /proc/sys/vm/drop_caches
root 4016780 ... R Aug21 7882:18 echo 3 > /proc/sys/vm/drop_caches
Using bpftrace to track the 'freed' value in drop_slab_node:
root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
Attaching 1 probe...
^B^C
@ret:
[64, 128) 1 | |
[128, 256) 28 | |
[256, 512) 107 |@ |
[512, 1K) 298 |@@@ |
[1K, 2K) 613 |@@@@@@@ |
[2K, 4K) 4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K) 442 |@@@@@ |
[8K, 16K) 299 |@@@ |
[16K, 32K) 100 |@ |
[32K, 64K) 139 |@ |
[64K, 128K) 56 | |
[128K, 256K) 26 | |
[256K, 512K) 2 | |
It may be better to traverse every memcg only once per drop_caches action.
If the user needs more memory, they can simply drop caches again.
Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
mm/vmscan.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b6d84326bdf2..9d8ee2ae5824 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -699,17 +699,12 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 
 void drop_slab_node(int nid)
 {
-	unsigned long freed;
+	struct mem_cgroup *memcg = NULL;
 
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
-		struct mem_cgroup *memcg = NULL;
-
-		freed = 0;
-		memcg = mem_cgroup_iter(NULL, NULL, NULL);
-		do {
-			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
-		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
-	} while (freed > 10);
+		shrink_slab(GFP_KERNEL, nid, memcg, 0);
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 }
 
 void drop_slab(void)
--
2.11.0
* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-08 14:24 [PATCH] mm/vmscan: fix infinite loop in drop_slab_node zangchunxin
@ 2020-09-08 15:09 ` Chris Down
2020-09-08 15:42 ` Vlastimil Babka
2020-09-09 4:19 ` [External] " Muchun Song
0 siblings, 2 replies; 7+ messages in thread
From: Chris Down @ 2020-09-08 15:09 UTC (permalink / raw)
To: zangchunxin; +Cc: akpm, linux-mm, linux-kernel, Muchun Song
drop_caches by its very nature can be extremely performance intensive -- if
someone wants to abort after trying too long, they can just send a
TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
reliably work when doing that, then _that's_ something to improve, but this
looks premature to me until that's demonstrated not to work.
zangchunxin@bytedance.com writes:
>It may be better to traverse every memcg only once per drop_caches action.
>If the user needs more memory, they can simply drop caches again.
Can you please provide some measurements of the difference in reclamation in
practice?
* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-08 15:09 ` Chris Down
@ 2020-09-08 15:42 ` Vlastimil Babka
2020-09-08 17:55 ` Chris Down
2020-09-09 4:19 ` [External] " Muchun Song
1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2020-09-08 15:42 UTC (permalink / raw)
To: Chris Down, zangchunxin; +Cc: akpm, linux-mm, linux-kernel, Muchun Song
On 9/8/20 5:09 PM, Chris Down wrote:
> drop_caches by its very nature can be extremely performance intensive -- if
> someone wants to abort after trying too long, they can just send a
> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
> reliably work when doing that, then _that's_ something to improve, but this
> looks premature to me until that's demonstrated not to work.
Hm, there might be existing scripts (even though I dislike those) running
drop_caches periodically, and they are currently not set up to be killed, so one
day it might surprise someone. Dropping should be a one-time event, not a
continual reclaim.
Maybe we could be a bit smarter and e.g. double the threshold currently
hardcoded as "10" with each iteration?
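Something like the following rough, untested sketch; the outer do-while is
written as for (;;) only to make the doubling step explicit, and the initial
threshold of 10 matches the current hardcoded value:

void drop_slab_node(int nid)
{
	unsigned long freed;
	unsigned long threshold = 10;

	for (;;) {
		struct mem_cgroup *memcg = NULL;

		freed = 0;
		memcg = mem_cgroup_iter(NULL, NULL, NULL);
		do {
			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

		if (freed <= threshold)
			break;

		/*
		 * Raise the bar for the next pass so the loop terminates
		 * even when other memcgs keep producing freeable objects.
		 */
		threshold *= 2;
	}
}

The first pass behaves exactly as today, but every extra pass has to free
twice as much to justify another one, so the loop winds down after a few
passes even on a busy machine.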
> zangchunxin@bytedance.com writes:
>>It may be better to traverse every memcg only once per drop_caches action.
>>If the user needs more memory, they can simply drop caches again.
>
> Can you please provide some measurements of the difference in reclamation in
> practice?
>
* Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-08 15:42 ` Vlastimil Babka
@ 2020-09-08 17:55 ` Chris Down
0 siblings, 0 replies; 7+ messages in thread
From: Chris Down @ 2020-09-08 17:55 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: zangchunxin, akpm, linux-mm, linux-kernel, Muchun Song
Vlastimil Babka writes:
>On 9/8/20 5:09 PM, Chris Down wrote:
>> drop_caches by its very nature can be extremely performance intensive -- if
>> someone wants to abort after trying too long, they can just send a
>> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
>> reliably work when doing that, then _that's_ something to improve, but this
>> looks premature to me until that's demonstrated not to work.
>
>Hm, there might be existing scripts (even though I dislike those) running
>drop_caches periodically, and they are currently not set up to be killed, so one
>day it might surprise someone. Dropping should be a one-time event, not a
>continual reclaim.
Sure, but these scripts can send it to their child themselves instead of using
WCE, no? As far as I know, that's the already supported way to abort
drop_caches-induced reclaim.
* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-08 15:09 ` Chris Down
2020-09-08 15:42 ` Vlastimil Babka
@ 2020-09-09 4:19 ` Muchun Song
2020-09-09 9:59 ` Chris Down
1 sibling, 1 reply; 7+ messages in thread
From: Muchun Song @ 2020-09-09 4:19 UTC (permalink / raw)
To: Chris Down, zangchunxin; +Cc: Andrew Morton, Linux Memory Management List, LKML
Hi Chris,
On Tue, Sep 8, 2020 at 11:09 PM Chris Down <chris@chrisdown.name> wrote:
>
> drop_caches by its very nature can be extremely performance intensive -- if
> someone wants to abort after trying too long, they can just send a
> TASK_KILLABLE signal, no? If exiting the loop and returning to usermode doesn't
> reliably work when doing that, then _that's_ something to improve, but this
> looks premature to me until that's demonstrated not to work.
Sending a TASK_KILLABLE signal does not work today, because the current
task never gets a chance to handle the signal while it is stuck in this
loop. So I think we need to do one of the following to avoid this case:
1. Double the threshold currently hardcoded as "10" with each iteration, as
suggested by Vlastimil. That is also a good idea.
2. In the while loop, check whether a fatal signal is pending and, if so,
break out of the loop, as in the following code snippet. Thanks.
@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
 	do {
 		struct mem_cgroup *memcg = NULL;
 
+		if (fatal_signal_pending(current))
+			return;
+
 		freed = 0;
 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
 		do {
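For clarity, the whole function with that check folded in would look roughly
like this (untested sketch, only to show where the check sits):

void drop_slab_node(int nid)
{
	unsigned long freed;

	do {
		struct mem_cgroup *memcg = NULL;

		/* Stop retrying once userspace has sent a fatal signal. */
		if (fatal_signal_pending(current))
			return;

		freed = 0;
		memcg = mem_cgroup_iter(NULL, NULL, NULL);
		do {
			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
	} while (freed > 10);
}

That keeps the current retry behaviour but lets the writer of drop_caches
abort it with SIGKILL.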
>
> zangchunxin@bytedance.com writes:
> >It may be better to traverse every memcg only once per drop_caches action.
> >If the user needs more memory, they can simply drop caches again.
>
> Can you please provide some measurements of the difference in reclamation in
> practice?
--
Yours,
Muchun
* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-09 4:19 ` [External] " Muchun Song
@ 2020-09-09 9:59 ` Chris Down
2020-09-09 12:10 ` Muchun Song
0 siblings, 1 reply; 7+ messages in thread
From: Chris Down @ 2020-09-09 9:59 UTC (permalink / raw)
To: Muchun Song
Cc: zangchunxin, Andrew Morton, Linux Memory Management List, LKML
Muchun Song writes:
>1. Double the threshold currently hardcoded as "10" with each iteration, as
> suggested by Vlastimil. That is also a good idea.
I think this sounds reasonable, although I'd like to see what the difference in
reclaim looks like in practice.
>2. In the while loop, check whether a fatal signal is pending and, if so,
> break out of the loop, as in the following code snippet. Thanks.
>
>@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
> 	do {
> 		struct mem_cgroup *memcg = NULL;
> 
>+		if (fatal_signal_pending(current))
>+			return;
>+
> 		freed = 0;
> 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> 		do {
Regardless of anything, I think this is probably a good idea. Could you send it
as a patch? :-)
Thanks,
Chris
* Re: [External] Re: [PATCH] mm/vmscan: fix infinite loop in drop_slab_node
2020-09-09 9:59 ` Chris Down
@ 2020-09-09 12:10 ` Muchun Song
0 siblings, 0 replies; 7+ messages in thread
From: Muchun Song @ 2020-09-09 12:10 UTC (permalink / raw)
To: Chris Down; +Cc: zangchunxin, Andrew Morton, Linux Memory Management List, LKML
On Wed, Sep 9, 2020 at 5:59 PM Chris Down <chris@chrisdown.name> wrote:
>
> Muchun Song writes:
> >1. Double the threshold currently hardcoded as "10" with each iteration, as
> > suggested by Vlastimil. That is also a good idea.
>
> I think this sounds reasonable, although I'd like to see what the difference in
> reclaim looks like in practice.
>
> >2. In the while loop, check whether a fatal signal is pending and, if so,
> > break out of the loop, as in the following code snippet. Thanks.
> >
> >@@ -704,6 +704,9 @@ void drop_slab_node(int nid)
> > 	do {
> > 		struct mem_cgroup *memcg = NULL;
> > 
> >+		if (fatal_signal_pending(current))
> >+			return;
> >+
> > 		freed = 0;
> > 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> > 		do {
>
> Regardless of anything, I think this is probably a good idea. Could you send it
> as a patch? :-)
OK, will do that. Thanks.
>
> Thanks,
>
> Chris
--
Yours,
Muchun