linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
@ 2017-07-26 13:07 Wenwei Tao
  2017-07-26 13:44 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Wenwei Tao @ 2017-07-26 13:07 UTC (permalink / raw)
  To: hannes, mhocko, bsingharora, kamezawa.hiroyu
  Cc: yuwang.yuwang, cgroups, linux-mm, linux-kernel, Wenwei Tao

From: Wenwei Tao <wenwei.tww@alibaba-inc.com>

By removing the child cgroup while the parent cgroup is
under reclaim, we could trigger the following kernel panic
on kernel 3.10:
------------------------------------------------------------------------
kernel BUG at kernel/cgroup.c:893!
 invalid opcode: 0000 [#1] SMP
 CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
 Workqueue: cgroup_destroy css_dput_fn
 task: ffff8817959a5780 ti: ffff8817e8886000 task.ti: ffff8817e8886000
 RIP: 0010:[<ffffffff810cd6e0>]  [<ffffffff810cd6e0>] cgroup_diput+0xc0/0xf0
 RSP: 0000:ffff8817e8887da0  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff8817a5dd5d40 RCX: dead000000000200
 RDX: 0000000000000000 RSI: ffff8817973a6910 RDI: ffff8817f54c2a00
 RBP: ffff8817e8887dc8 R08: ffff8817a5dd5dd0 R09: df9fb35794b01820
 R10: df9fb35794b01820 R11: 00007fa95b1efcda R12: ffff8817a5dd5d9c
 R13: ffff8817f38b3a40 R14: ffff8817973a6910 R15: ffff8817973a6910
 FS:  0000000000000000(0000) GS:ffff88181f220000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa6e6234000 CR3: 000000179f19d000 CR4: 00000000000407e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff8817a5dd5d40 ffff8817a5dd5d9c ffff8817f38b3a40 ffff8817973a6910
  0000000000000040 ffff8817e8887df8 ffffffff811b37c2 ffff8817fa23c000
  ffff8817f57dbb80 ffff88181f232ac0 ffff88181f237500 ffff8817e8887e10
 Call Trace:
  [<ffffffff811b37c2>] dput+0x1a2/0x2f0
  [<ffffffff810cbacc>] cgroup_dput.isra.21+0x1c/0x30
  [<ffffffff810cbafd>] css_dput_fn+0x1d/0x20
  [<ffffffff81078ebc>] process_one_work+0x17c/0x460
  [<ffffffff81079b66>] worker_thread+0x116/0x3b0
  [<ffffffff81079a50>] ? manage_workers.isra.25+0x290/0x290
  [<ffffffff81080330>] kthread+0xc0/0xd0
  [<ffffffff81080270>] ? insert_kthread_work+0x40/0x40
  [<ffffffff815b1e08>] ret_from_fork+0x58/0x90
  [<ffffffff81080270>] ? insert_kthread_work+0x40/0x40
 Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
 RIP  [<ffffffff810cd6e0>] cgroup_diput+0xc0/0xf0
 RSP <ffff8817e8887da0>
 ---[ end trace 85eeea5212c44f51 ]---
------------------------------------------------------------------------

I think there is a css double put in mem_cgroup_iter. Under reclaim,
mem_cgroup_iter is first called with prev == NULL. It loads the
last_visited memcg from the per-zone reclaim_iter, takes a reference
with css_tryget(&last_visited->css), and calls __mem_cgroup_iter_next
to find the next live memcg. __mem_cgroup_iter_next can return NULL if
last_visited is already the last one, so we put last_visited's css and
go around the while loop again. On this pass we may skip
css_tryget(&last_visited->css) because dead_count has changed, yet
last_visited still holds the old pointer and we call
css_put(&last_visited->css) anyway. The css is put twice, which can
trigger the BUG_ON at kernel/cgroup.c:893.

Reported-by: Wang Yu <yuwang.yuwang@alibaba-inc.com>
Tested-by: Wang Yu <yuwang.yuwang@alibaba-inc.com>
Signed-off-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
---
 mm/memcontrol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2c..3d7a046 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1230,8 +1230,10 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		memcg = __mem_cgroup_iter_next(root, last_visited);
 
 		if (reclaim) {
-			if (last_visited && last_visited != root)
+			if (last_visited && last_visited != root) {
 				css_put(&last_visited->css);
+				last_visited = NULL;
+			}
 
 			iter->last_visited = memcg;
 			smp_wmb();
-- 
1.8.3.1


* Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
  2017-07-26 13:07 [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter Wenwei Tao
@ 2017-07-26 13:44 ` Michal Hocko
  2017-07-27  3:30   ` Wenwei Tao
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2017-07-26 13:44 UTC (permalink / raw)
  To: Wenwei Tao
  Cc: hannes, bsingharora, kamezawa.hiroyu, yuwang.yuwang, cgroups,
	linux-mm, linux-kernel, Wenwei Tao

On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
> From: Wenwei Tao <wenwei.tww@alibaba-inc.com>
> 
> By removing the child cgroup while the parent cgroup is
> under reclaim, we could trigger the following kernel panic
> on kernel 3.10:
> [...]
> 
> I think there is a css double put in mem_cgroup_iter. Under reclaim,
> mem_cgroup_iter is first called with prev == NULL. It loads the
> last_visited memcg from the per-zone reclaim_iter, takes a reference
> with css_tryget(&last_visited->css), and calls __mem_cgroup_iter_next
> to find the next live memcg. __mem_cgroup_iter_next can return NULL if
> last_visited is already the last one, so we put last_visited's css and
> go around the while loop again. On this pass we may skip
> css_tryget(&last_visited->css) because dead_count has changed, yet
> last_visited still holds the old pointer and we call
> css_put(&last_visited->css) anyway. The css is put twice, which can
> trigger the BUG_ON at kernel/cgroup.c:893.

Yes, I guess you are right and I suspect that this has been silently
fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
loading and updating"). I think a more appropriate fix would be the
following. Are you able to reproduce and re-test it?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2cbe102..0848ec05c12a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 				if (last_visited && last_visited != root &&
 				    !css_tryget(&last_visited->css))
 					last_visited = NULL;
+			} else {
+				last_visited = NULL;
 			}
 		}
 
-- 
Michal Hocko
SUSE Labs
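
The sequence discussed in this thread can be sketched as a small
userspace model. This is not kernel code: struct css, struct iter,
enum variant and child_refs_after_two_passes() are hypothetical names
made up for illustration, and css_tryget()/css_put() here are toy
stand-ins that only count references. The model assumes the child is
the last memcg in the walk and that dead_count changes between the two
passes of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a css: nothing but a reference count. */
struct css { int refcnt; };

/* Toy stand-in for the per-zone reclaim iterator state. */
struct iter { struct css *last_visited; int last_dead_count; };

enum variant {
	BUGGY,             /* stale last_visited survives into the next pass */
	CLEAR_AFTER_PUT,   /* clear last_visited right after css_put() */
	RESET_ON_MISMATCH  /* reset last_visited when dead_count changed */
};

static bool css_tryget(struct css *c)
{
	if (c->refcnt <= 0)
		return false;
	c->refcnt++;
	return true;
}

static void css_put(struct css *c)
{
	c->refcnt--;
}

/* Runs two passes of the modelled loop, returns the child's refcount. */
int child_refs_after_two_passes(enum variant v)
{
	struct css child = { .refcnt = 1 };	/* one reference held elsewhere */
	struct iter it = { .last_visited = &child, .last_dead_count = 0 };
	struct css *last_visited = NULL;
	int root_dead_count = 0;

	for (int pass = 0; pass < 2; pass++) {
		int dead_count = root_dead_count;

		if (dead_count == it.last_dead_count) {
			last_visited = it.last_visited;
			if (last_visited && !css_tryget(last_visited))
				last_visited = NULL;
		} else if (v == RESET_ON_MISMATCH) {
			last_visited = NULL;
		}

		/* Model: the iterator finds no next memcg on either pass. */
		struct css *next = NULL;

		if (last_visited) {
			css_put(last_visited);
			if (v == CLEAR_AFTER_PUT)
				last_visited = NULL;
		}
		it.last_visited = next;
		it.last_dead_count = dead_count;

		/* Model: a concurrent removal bumps dead_count between passes. */
		root_dead_count++;
	}

	assert(child.refcnt >= 0);
	return child.refcnt;
}
```

In this model child_refs_after_two_passes(BUGGY) returns 0: the second
pass drops a reference it never took, so the reference held elsewhere
is gone. Both CLEAR_AFTER_PUT and RESET_ON_MISMATCH return 1, which
matches the idea behind the two diffs above.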


* Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
  2017-07-26 13:44 ` Michal Hocko
@ 2017-07-27  3:30   ` Wenwei Tao
  2017-07-27  6:53     ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Wenwei Tao @ 2017-07-27  3:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Balbir Singh, kamezawa.hiroyu, yuwang.yuwang,
	cgroups, linux-mm, linux-kernel, Wenwei Tao

2017-07-26 21:44 GMT+08:00 Michal Hocko <mhocko@kernel.org>:
> On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
>> From: Wenwei Tao <wenwei.tww@alibaba-inc.com>
>>
>> By removing the child cgroup while the parent cgroup is
>> under reclaim, we could trigger the following kernel panic
>> on kernel 3.10:
>> [...]
>>
>> I think there is a css double put in mem_cgroup_iter. Under reclaim,
>> mem_cgroup_iter is first called with prev == NULL. It loads the
>> last_visited memcg from the per-zone reclaim_iter, takes a reference
>> with css_tryget(&last_visited->css), and calls __mem_cgroup_iter_next
>> to find the next live memcg. __mem_cgroup_iter_next can return NULL if
>> last_visited is already the last one, so we put last_visited's css and
>> go around the while loop again. On this pass we may skip
>> css_tryget(&last_visited->css) because dead_count has changed, yet
>> last_visited still holds the old pointer and we call
>> css_put(&last_visited->css) anyway. The css is put twice, which can
>> trigger the BUG_ON at kernel/cgroup.c:893.
>
> Yes, I guess you are right and I suspect that this has been silently
> fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
> loading and updating"). I think a more appropriate fix would be the
> following. Are you able to reproduce and re-test it?
> ---

Yes, I think this commit fixes the issue. I backported it to the
3.10.107 kernel and can no longer reproduce the problem. I guess this
commit might need to be backported to the 3.10.y stable kernel.

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 437ae2cbe102..0848ec05c12a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>                                 if (last_visited && last_visited != root &&
>                                     !css_tryget(&last_visited->css))
>                                         last_visited = NULL;
> +                       } else {
> +                               last_visited = NULL;
>                         }
>                 }
>
> --
> Michal Hocko
> SUSE Labs


* Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter
  2017-07-27  3:30   ` Wenwei Tao
@ 2017-07-27  6:53     ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2017-07-27  6:53 UTC (permalink / raw)
  To: Wenwei Tao
  Cc: Johannes Weiner, Balbir Singh, kamezawa.hiroyu, yuwang.yuwang,
	cgroups, linux-mm, linux-kernel, Wenwei Tao

On Thu 27-07-17 11:30:50, Wenwei Tao wrote:
> 2017-07-26 21:44 GMT+08:00 Michal Hocko <mhocko@kernel.org>:
> > On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
[...]
> >> I think there is a css double put in mem_cgroup_iter. Under reclaim,
> >> mem_cgroup_iter is first called with prev == NULL. It loads the
> >> last_visited memcg from the per-zone reclaim_iter, takes a reference
> >> with css_tryget(&last_visited->css), and calls __mem_cgroup_iter_next
> >> to find the next live memcg. __mem_cgroup_iter_next can return NULL if
> >> last_visited is already the last one, so we put last_visited's css and
> >> go around the while loop again. On this pass we may skip
> >> css_tryget(&last_visited->css) because dead_count has changed, yet
> >> last_visited still holds the old pointer and we call
> >> css_put(&last_visited->css) anyway. The css is put twice, which can
> >> trigger the BUG_ON at kernel/cgroup.c:893.
> >
> > Yes, I guess you are right and I suspect that this has been silently
> > fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
> > loading and updating"). I think a more appropriate fix would be the
> > following. Are you able to reproduce and re-test it?
> > ---
> 
> Yes, I think this commit fixes the issue. I backported it to the
> 3.10.107 kernel and can no longer reproduce the problem. I guess this
> commit might need to be backported to the 3.10.y stable kernel.

Please send it to the kernel-stable mailing list. 3.10 seems to be still
maintained.

-- 
Michal Hocko
SUSE Labs
