* [Bug] race condition at rebind_subsystems()
@ 2022-06-30 10:52 Jing-Ting Wu
2022-07-15 11:59 ` Michal Koutný
0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-06-30 10:52 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Matthias Brugger, cgroups, linux-kernel,
linux-arm-kernel, linux-mediatek, Shakeel Butt, wsd_upstream,
lixiong.liu, wenju.xu, jonathan.jmchen
Hi Johannes
We find the KE(kernel exception) happened when test the reboot test
case in T SW version with kernel-5.15.
The issue is unable to handle kernel paging request at virtual address.
Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().
Use the wrong css to get css->ss->css_rstat_flush should get a wrong
address.
The call stack as following:
kthread
-> worker_thread
-> process_one_work
-> flush_memcg_stats_dwork
-> __mem_cgroup_flush_stats
-> cgroup_rstat_flush_irqsafe
-> cgroup_rstat_flush_locked
Detail:
static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool
may_sleep)
{
...
rcu_read_lock();
list_for_each_entry_rcu(css, &pos->rstat_css_list,
rstat_css_node)
css->ss->css_rstat_flush(css, cpu);
rcu_read_unlock();
...
}
Because the content of css->ss is not a function address,
once the css_rstat_flush is called, kernel exception will happen.
When the issue happened, the list of pos->rstat_css_list at group A is
empty.
There must be racing conditions.
From reviewing code, we find rebind_subsystems() is no lock held when
move css object to other list.
int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
{
...
if (ss->css_rstat_flush) {
list_del_rcu(&css->rstat_css_node);
list_add_rcu(&css->rstat_css_node,
&dcgrp->rstat_css_list);
}
...
}
If we held a css object from A group list, at same time this css object
was moved to B group list.
Because current pos is in B’s list, link list was link the pos->next to
B’s head, so the pos->member will never equal A’s head, then the B’s
head(cgroup_root->cgroup->rstat_css_list) will be treated as css
node(css->rstat_css_node).
list_for_each_entry_rcu() use the container_of() to get css address,
and it treated the address of [cgroup_root->cgroup->rstat_css_list -
rstat_css_node] to be a css address.
cgroup_rstat_flush_locked() use the wrong css address to do css->ss-
>css_rstat_flush, then the wrong function address will be jump.
#define list_for_each_entry_rcu(pos, head, member,
cond...) \
for (__list_check_rcu(dummy, ## cond, 0), \
pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
We look the patch of move css object from A list to B list is merged by
following link:
https://android.googlesource.com/kernel/common/+/a7df69b81aac5bdeb5c5aef9addd680ce22feebf%5E%21/#F0
Do you have any suggestion for this issue?
Thank you.
Best Regards,
Jing-Ting Wu
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu
@ 2022-07-15 11:59 ` Michal Koutný
2022-07-15 16:47 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-07-15 11:59 UTC (permalink / raw)
To: Jing-Ting Wu
Cc: Johannes Weiner, Tejun Heo, Zefan Li, Matthias Brugger, cgroups,
linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen
Hello Jing-Ting.
On Thu, Jun 30, 2022 at 06:52:10PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote:
> Root cause:
> The rebind_subsystems() is no lock held when move css object from A
> list to B list,then let B's head be treated as css node at
> list_for_each_entry_rcu().
Nice outcome of the analysis.
> The call stack as following:
> kthread
> -> worker_thread
> -> process_one_work
> -> flush_memcg_stats_dwork
> -> __mem_cgroup_flush_stats
> -> cgroup_rstat_flush_irqsafe
> -> cgroup_rstat_flush_locked
>
> Detail:
> static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool
> may_sleep)
> {
> ...
> rcu_read_lock();
> list_for_each_entry_rcu(css, &pos->rstat_css_list,
> rstat_css_node)
> css->ss->css_rstat_flush(css, cpu);
> rcu_read_unlock();
> ...
> }
>
> Because the content of css->ss is not a function address,
> once the css_rstat_flush is called, kernel exception will happen.
I agree with your analysis. The list_for_each_entry_rcu() is not
accounting for the move from A to B and the head's rstat_css_list is
invalid in the cgroup_rstat_flush() context.
> When the issue happened, the list of pos->rstat_css_list at group A is
> empty.
> There must be racing conditions.
>
> From reviewing code, we find rebind_subsystems() is no lock held when
> move css object to other list.
>
> int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
> {
> ...
> if (ss->css_rstat_flush) {
> list_del_rcu(&css->rstat_css_node);
> list_add_rcu(&css->rstat_css_node,
> &dcgrp->rstat_css_list);
> }
> ...
> }
>
> If we held a css object from A group list, at same time this css object
> was moved to B group list.
> [...]
> Do you have any suggestion for this issue?
The css->rstat_css_node should not be modified if there are possible RCU
readers elsewhere.
One way to fix this would be to insert synchronize_rcu() after
list_del_rcu() and before list_add_rcu().
(A further alternative (I've heard about) would be to utilize 'nulls'
RCU lists [1] to make the move between lists detectable.)
But as I'm looking at it from distance, it may be simpler and sufficient
to just take cgroup_rstat_lock around the list migration (the nesting
under cgroup_mutex that's held with rebind_subsystems() is fine).
HTH,
Michal
[1] https://www.kernel.org/doc/html/latest/RCU/rculist_nulls.html
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-07-15 11:59 ` Michal Koutný
@ 2022-07-15 16:47 ` Tejun Heo
2022-07-18 7:44 ` Jing-Ting Wu
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2022-07-15 16:47 UTC (permalink / raw)
To: Michal Koutný
Cc: Jing-Ting Wu, Johannes Weiner, Zefan Li, Matthias Brugger,
cgroups, linux-kernel, linux-arm-kernel, linux-mediatek,
Shakeel Butt, wsd_upstream, lixiong.liu, wenju.xu,
jonathan.jmchen
(resending, I messed up the message header, sorry)
Hello,
On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote:
> The css->rstat_css_node should not be modified if there are possible RCU
> readers elsewhere.
> One way to fix this would be to insert synchronize_rcu() after
> list_del_rcu() and before list_add_rcu().
> (A further alternative (I've heard about) would be to utilize 'nulls'
> RCU lists [1] to make the move between lists detectable.)
>
> But as I'm looking at it from distance, it may be simpler and sufficient
> to just take cgroup_rstat_lock around the list migration (the nesting
> under cgroup_mutex that's held with rebind_subsystems() is fine).
synchronize_rcu() prolly is the better fit here given how that list_node's
usage, but yeah, great find.
Thanks.
--
tejun
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-07-15 16:47 ` Tejun Heo
@ 2022-07-18 7:44 ` Jing-Ting Wu
2022-07-19 13:35 ` Michal Koutný
0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-07-18 7:44 UTC (permalink / raw)
To: Tejun Heo, Michal Koutný
Cc: Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen
On Fri, 2022-07-15 at 06:47 -1000, Tejun Heo wrote:
> (resending, I messed up the message header, sorry)
>
> Hello,
>
> On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote:
> > The css->rstat_css_node should not be modified if there are
> > possible RCU
> > readers elsewhere.
> > One way to fix this would be to insert synchronize_rcu() after
> > list_del_rcu() and before list_add_rcu().
> > (A further alternative (I've heard about) would be to utilize
> > 'nulls'
> > RCU lists [1] to make the move between lists detectable.)
> >
> > But as I'm looking at it from distance, it may be simpler and
> > sufficient
> > to just take cgroup_rstat_lock around the list migration (the
> > nesting
> > under cgroup_mutex that's held with rebind_subsystems() is fine).
>
> synchronize_rcu() prolly is the better fit here given how that
> list_node's
> usage, but yeah, great find.
>
> Thanks.
>
Hi Michal and Tejun,
Thanks for your suggestion.
Accroding your description, is the following patch corrent?
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1813,6 +1813,7 @@
if (ss->css_rstat_flush) {
list_del_rcu(&css->rstat_css_node);
+ synchronize_rcu();
list_add_rcu(&css->rstat_css_node,
&dcgrp->rstat_css_list);
}
If the patch is correct, we will add this patch to our stability test.
And we will continue to observe whether the problem is solved.
Thank you.
Best regards,
Jing-Ting Wu
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-07-18 7:44 ` Jing-Ting Wu
@ 2022-07-19 13:35 ` Michal Koutný
2022-07-21 6:46 ` Jing-Ting Wu
0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-07-19 13:35 UTC (permalink / raw)
To: Jing-Ting Wu
Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen
On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote:
> Accroding your description, is the following patch corrent?
Yes, this is what I meant (i.e. grace period before invalidating the
removed rstat_css_node).
> If the patch is correct, we will add this patch to our stability test.
> And we will continue to observe whether the problem is solved.
It'd be great to have such confirmation.
Thanks,
Michal
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-07-19 13:35 ` Michal Koutný
@ 2022-07-21 6:46 ` Jing-Ting Wu
2022-08-07 6:58 ` Jing-Ting Wu
0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-07-21 6:46 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen
On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote:
> On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <
> jing-ting.wu@mediatek.com> wrote:
> > Accroding your description, is the following patch corrent?
>
> Yes, this is what I meant (i.e. grace period before invalidating the
> removed rstat_css_node).
>
> > If the patch is correct, we will add this patch to our stability
> > test.
> > And we will continue to observe whether the problem is solved.
>
> It'd be great to have such confirmation.
Thanks for confirm the patch.
We already add patch to our stability test.
We will reply mail if we have any update.
>
> Thanks,
> Michal
Best regards,
Jing-Ting Wu
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems()
2022-07-21 6:46 ` Jing-Ting Wu
@ 2022-08-07 6:58 ` Jing-Ting Wu
0 siblings, 0 replies; 7+ messages in thread
From: Jing-Ting Wu @ 2022-08-07 6:58 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups,
linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt,
wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen
On Thu, 2022-07-21 at 14:46 +0800, Jing-Ting Wu wrote:
> On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote:
> > On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <
> > jing-ting.wu@mediatek.com> wrote:
> > > Accroding your description, is the following patch corrent?
> >
> > Yes, this is what I meant (i.e. grace period before invalidating
> > the
> > removed rstat_css_node).
> >
> > > If the patch is correct, we will add this patch to our stability
> > > test.
> > > And we will continue to observe whether the problem is solved.
> >
> > It'd be great to have such confirmation.
>
> Thanks for confirm the patch.
> We already add patch to our stability test.
> We will reply mail if we have any update.
>
> >
> > Thanks,
> > Michal
>
Hi, Michal
Use the patch of previous mail, we pass the stability test at previous
two week and still testing.
I think the patch is helpful to the syndrome.
It passed stability test over two week so far. (as-is: < one week
failed)
Do you help to upstream this patch to mainline?
Thank you.
Best regards,
Jing-Ting Wu
>
> Best regards,
> Jing-Ting Wu
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-08-07 7:00 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu
2022-07-15 11:59 ` Michal Koutný
2022-07-15 16:47 ` Tejun Heo
2022-07-18 7:44 ` Jing-Ting Wu
2022-07-19 13:35 ` Michal Koutný
2022-07-21 6:46 ` Jing-Ting Wu
2022-08-07 6:58 ` Jing-Ting Wu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).