linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [Bug] race condition at rebind_subsystems()
@ 2022-06-30 10:52 Jing-Ting Wu
  2022-07-15 11:59 ` Michal Koutný
  0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-06-30 10:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Tejun Heo, Zefan Li, Matthias Brugger, cgroups, linux-kernel,
	linux-arm-kernel, linux-mediatek, Shakeel Butt, wsd_upstream,
	lixiong.liu, wenju.xu, jonathan.jmchen

Hi Johannes



We find the KE(kernel exception) happened when test the reboot test
case in T SW version with kernel-5.15.
The issue is unable to handle kernel paging request at virtual address.

Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().
Use the wrong css to get css->ss->css_rstat_flush should get a wrong
address.

The call stack as following:
kthread
-> worker_thread
-> process_one_work
-> flush_memcg_stats_dwork
-> __mem_cgroup_flush_stats
-> cgroup_rstat_flush_irqsafe
-> cgroup_rstat_flush_locked

Detail:
static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool
may_sleep)
{
...
   rcu_read_lock();
   list_for_each_entry_rcu(css, &pos->rstat_css_list,
                             rstat_css_node)
        css->ss->css_rstat_flush(css, cpu);
   rcu_read_unlock();
...
}

Because the content of css->ss is not a function address,
once the css_rstat_flush is called, kernel exception will happen.
When the issue happened, the list of pos->rstat_css_list at group A is
empty.
There must be racing conditions.

From reviewing code, we find rebind_subsystems() is no lock held when
move css object to other list.

int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
{
...
   if (ss->css_rstat_flush) {
         list_del_rcu(&css->rstat_css_node);
         list_add_rcu(&css->rstat_css_node,
                      &dcgrp->rstat_css_list);
   }
...
}

If we held a css object from A group list, at same time this css object
was moved to B group list.
Because current pos is in B’s list, link list was link the pos->next to
B’s head, so the pos->member will never equal A’s head, then the B’s
head(cgroup_root->cgroup->rstat_css_list) will be treated as css
node(css->rstat_css_node).
list_for_each_entry_rcu() use the container_of() to get css address,
and it treated the address of [cgroup_root->cgroup->rstat_css_list -
rstat_css_node] to be a css address.
cgroup_rstat_flush_locked() use the wrong css address to do css->ss-
>css_rstat_flush, then the wrong function address will be jump.

#define list_for_each_entry_rcu(pos, head, member,
cond...)             \
   for (__list_check_rcu(dummy, ## cond, 0), \
     pos = list_entry_rcu((head)->next, typeof(*pos), member); \
        &pos->member != (head);                                \
        pos = list_entry_rcu(pos->member.next, typeof(*pos), member))


We look the patch of move css object from A list to B list is merged by
following link:

https://android.googlesource.com/kernel/common/+/a7df69b81aac5bdeb5c5aef9addd680ce22feebf%5E%21/#F0


Do you have any suggestion for this issue?
Thank you.



Best Regards,
Jing-Ting Wu



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu
@ 2022-07-15 11:59 ` Michal Koutný
  2022-07-15 16:47   ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-07-15 11:59 UTC (permalink / raw)
  To: Jing-Ting Wu
  Cc: Johannes Weiner, Tejun Heo, Zefan Li, Matthias Brugger, cgroups,
	linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
	wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen

Hello Jing-Ting.

On Thu, Jun 30, 2022 at 06:52:10PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote:
> Root cause:
> The rebind_subsystems() is no lock held when move css object from A
> list to B list,then let B's head be treated as css node at
> list_for_each_entry_rcu().

Nice outcome of the analysis.

> The call stack as following:
> kthread
> -> worker_thread
> -> process_one_work
> -> flush_memcg_stats_dwork
> -> __mem_cgroup_flush_stats
> -> cgroup_rstat_flush_irqsafe
> -> cgroup_rstat_flush_locked
> 
> Detail:
> static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool
> may_sleep)
> {
> ...
>    rcu_read_lock();
>    list_for_each_entry_rcu(css, &pos->rstat_css_list,
>                              rstat_css_node)
>         css->ss->css_rstat_flush(css, cpu);
>    rcu_read_unlock();
> ...
> }
> 
> Because the content of css->ss is not a function address,
> once the css_rstat_flush is called, kernel exception will happen.

I agree with your analysis. The list_for_each_entry_rcu() is not
accounting for the move from A to B and the head's rstat_css_list is
invalid in the cgroup_rstat_flush() context.

> When the issue happened, the list of pos->rstat_css_list at group A is
> empty.
> There must be racing conditions.
> 
> From reviewing code, we find rebind_subsystems() is no lock held when
> move css object to other list.
> 
> int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
> {
> ...
>    if (ss->css_rstat_flush) {
>          list_del_rcu(&css->rstat_css_node);
>          list_add_rcu(&css->rstat_css_node,
>                       &dcgrp->rstat_css_list);
>    }
> ...
> }
> 
> If we held a css object from A group list, at same time this css object
> was moved to B group list.
> [...]
> Do you have any suggestion for this issue?

The css->rstat_css_node should not be modified if there are possible RCU
readers elsewhere.
One way to fix this would be to insert synchronize_rcu() after
list_del_rcu() and before list_add_rcu().
(A further alternative (I've heard about) would be to utilize 'nulls'
RCU lists [1] to make the move between lists detectable.)

But as I'm looking at it from distance, it may be simpler and sufficient
to just take cgroup_rstat_lock around the list migration (the nesting
under cgroup_mutex that's held with rebind_subsystems() is fine).

HTH,
Michal

[1] https://www.kernel.org/doc/html/latest/RCU/rculist_nulls.html


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-07-15 11:59 ` Michal Koutný
@ 2022-07-15 16:47   ` Tejun Heo
  2022-07-18  7:44     ` Jing-Ting Wu
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2022-07-15 16:47 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Jing-Ting Wu, Johannes Weiner, Zefan Li, Matthias Brugger,
	cgroups, linux-kernel, linux-arm-kernel, linux-mediatek,
	Shakeel Butt, wsd_upstream, lixiong.liu, wenju.xu,
	jonathan.jmchen

(resending, I messed up the message header, sorry)

Hello,

On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote:
> The css->rstat_css_node should not be modified if there are possible RCU
> readers elsewhere.
> One way to fix this would be to insert synchronize_rcu() after
> list_del_rcu() and before list_add_rcu().
> (A further alternative (I've heard about) would be to utilize 'nulls'
> RCU lists [1] to make the move between lists detectable.)
> 
> But as I'm looking at it from distance, it may be simpler and sufficient
> to just take cgroup_rstat_lock around the list migration (the nesting
> under cgroup_mutex that's held with rebind_subsystems() is fine).

synchronize_rcu() prolly is the better fit here given how that list_node's
usage, but yeah, great find.

Thanks.

-- 
tejun

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-07-15 16:47   ` Tejun Heo
@ 2022-07-18  7:44     ` Jing-Ting Wu
  2022-07-19 13:35       ` Michal Koutný
  0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-07-18  7:44 UTC (permalink / raw)
  To: Tejun Heo, Michal Koutný
  Cc: Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
	linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
	wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen

On Fri, 2022-07-15 at 06:47 -1000, Tejun Heo wrote:
> (resending, I messed up the message header, sorry)
> 
> Hello,
> 
> On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote:
> > The css->rstat_css_node should not be modified if there are
> > possible RCU
> > readers elsewhere.
> > One way to fix this would be to insert synchronize_rcu() after
> > list_del_rcu() and before list_add_rcu().
> > (A further alternative (I've heard about) would be to utilize
> > 'nulls'
> > RCU lists [1] to make the move between lists detectable.)
> > 
> > But as I'm looking at it from distance, it may be simpler and
> > sufficient
> > to just take cgroup_rstat_lock around the list migration (the
> > nesting
> > under cgroup_mutex that's held with rebind_subsystems() is fine).
> 
> synchronize_rcu() prolly is the better fit here given how that
> list_node's
> usage, but yeah, great find.
> 
> Thanks.
> 

Hi Michal and Tejun,


Thanks for your suggestion.
Accroding your description, is the following patch corrent?

--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1813,6 +1813,7 @@
 
 		if (ss->css_rstat_flush) {
 			list_del_rcu(&css->rstat_css_node);
+			synchronize_rcu();
 			list_add_rcu(&css->rstat_css_node,
 				     &dcgrp->rstat_css_list);
 		}


If the patch is correct, we will add this patch to our stability test.
And we will continue to observe whether the problem is solved.
Thank you.




Best regards,
Jing-Ting Wu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-07-18  7:44     ` Jing-Ting Wu
@ 2022-07-19 13:35       ` Michal Koutný
  2022-07-21  6:46         ` Jing-Ting Wu
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-07-19 13:35 UTC (permalink / raw)
  To: Jing-Ting Wu
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
	linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
	wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen

On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote:
> Accroding your description, is the following patch corrent?

Yes, this is what I meant (i.e. grace period before invalidating the
removed rstat_css_node).

> If the patch is correct, we will add this patch to our stability test.
> And we will continue to observe whether the problem is solved.

It'd be great to have such confirmation.

Thanks,
Michal

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-07-19 13:35       ` Michal Koutný
@ 2022-07-21  6:46         ` Jing-Ting Wu
  2022-08-07  6:58           ` Jing-Ting Wu
  0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-07-21  6:46 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
	linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
	wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen

On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote:
> On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <
> jing-ting.wu@mediatek.com> wrote:
> > Accroding your description, is the following patch corrent?
> 
> Yes, this is what I meant (i.e. grace period before invalidating the
> removed rstat_css_node).
> 
> > If the patch is correct, we will add this patch to our stability
> > test.
> > And we will continue to observe whether the problem is solved.
> 
> It'd be great to have such confirmation.

Thanks for confirm the patch.
We already add patch to our stability test.
We will reply mail if we have any update.

> 
> Thanks,
> Michal



Best regards,
Jing-Ting Wu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug] race condition at rebind_subsystems()
  2022-07-21  6:46         ` Jing-Ting Wu
@ 2022-08-07  6:58           ` Jing-Ting Wu
  0 siblings, 0 replies; 7+ messages in thread
From: Jing-Ting Wu @ 2022-08-07  6:58 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
	linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
	wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen

On Thu, 2022-07-21 at 14:46 +0800, Jing-Ting Wu wrote:
> On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote:
> > On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <
> > jing-ting.wu@mediatek.com> wrote:
> > > Accroding your description, is the following patch corrent?
> > 
> > Yes, this is what I meant (i.e. grace period before invalidating
> > the
> > removed rstat_css_node).
> > 
> > > If the patch is correct, we will add this patch to our stability
> > > test.
> > > And we will continue to observe whether the problem is solved.
> > 
> > It'd be great to have such confirmation.
> 
> Thanks for confirm the patch.
> We already add patch to our stability test.
> We will reply mail if we have any update.
> 
> > 
> > Thanks,
> > Michal
> 

Hi, Michal

Use the patch of previous mail, we pass the stability test at previous
two week and still testing.

I think the patch is helpful to the syndrome.
It passed stability test over two week so far. (as-is: < one week
failed)
Do you help to upstream this patch to mainline?

Thank you.


Best regards,
Jing-Ting Wu





> 
> Best regards,
> Jing-Ting Wu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-07  7:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu
2022-07-15 11:59 ` Michal Koutný
2022-07-15 16:47   ` Tejun Heo
2022-07-18  7:44     ` Jing-Ting Wu
2022-07-19 13:35       ` Michal Koutný
2022-07-21  6:46         ` Jing-Ting Wu
2022-08-07  6:58           ` Jing-Ting Wu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).