* Possible race in dev_coredumpm()-del_timer() path
@ 2022-04-13 5:29 Mukesh Ojha
2022-04-13 5:34 ` Greg KH
0 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 5:29 UTC (permalink / raw)
To: linux-kernel; +Cc: tglx, sboyd, gregkh, johannes, Rafael J. Wysocki
Hi All,
We are hitting one race due to which try_to_grab_pending() is stuck .
In following scenario, while running (p1)dev_coredumpm() devcd device is
added to
the framework and uevent notification sent to userspace that result in
the call to (p2) devcd_data_write()
which eventually try to delete the queued timer which in the racy
scenario timer is not queued yet.
So, debug object report some warning and in the meantime timer is
initialized and queued from p1 path.
and from p2 path it gets overriden again timer->entry.pprev=NULL and
try_to_grab_pending() stuck
as del_timer() always return 0 as timer_pending() return false.
P1 P2(X)
dev_coredumpm()
Uevent notification sent to
userspace
for device addition
device_add() ========================>
Process X reads this uevents
notification and do write call
that results in call to
devcd_data_write()
mod_delayed_work()
try_to_grab_pending()
del_timer()
debug_assert_init()
INIT_DELAYED_WORK
(&devcd->del_wk, devcd_del);
schedule_delayed_work(&devcd->del_wk,
DEVCD_TIMEOUT);
debug_object_fixup()
timer_fixup_assert_init()
timer_setup()
do_init_timer() ==> reinitialized the timer to timer->entry.pprev=NULL
timer_pending()
!hlist_unhashed_lockless(&timer->entry)
!h->pprev
Thanks.
Mukesh
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 5:29 Possible race in dev_coredumpm()-del_timer() path Mukesh Ojha
@ 2022-04-13 5:34 ` Greg KH
2022-04-13 6:41 ` Johannes Berg
2022-04-13 10:16 ` Mukesh Ojha
0 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13 5:34 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, Rafael J. Wysocki
On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> Hi All,
>
> We are hitting one race due to which try_to_grab_pending() is stuck .
What kernel version are you using?
> In following scenario, while running (p1)dev_coredumpm() devcd device is
> added to
> the framework and uevent notification sent to userspace that result in the
> call to (p2) devcd_data_write()
> which eventually try to delete the queued timer which in the racy scenario
> timer is not queued yet.
> So, debug object report some warning and in the meantime timer is
> initialized and queued from p1 path.
> and from p2 path it gets overriden again timer->entry.pprev=NULL and
> try_to_grab_pending() stuck
> as del_timer() always return 0 as timer_pending() return false.
>
> P1 P2(X)
>
>
> dev_coredumpm()
>
> Uevent notification sent to
> userspace
> for device addition
>
> device_add() ========================> Process X
> reads this uevents
> notification and do write call
> that results in call to
>
> devcd_data_write()
> mod_delayed_work()
> try_to_grab_pending()
> del_timer()
> debug_assert_init()
>
> INIT_DELAYED_WORK
> (&devcd->del_wk, devcd_del);
> schedule_delayed_work(&devcd->del_wk,
> DEVCD_TIMEOUT);
>
> debug_object_fixup()
> timer_fixup_assert_init()
> timer_setup()
> do_init_timer() ==> reinitialized the timer to timer->entry.pprev=NULL
>
> timer_pending()
> !hlist_unhashed_lockless(&timer->entry)
> !h->pprev
The above is confusing and not able to be understood due to the
formatting mess. Care to fix this up and resend?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 5:34 ` Greg KH
@ 2022-04-13 6:41 ` Johannes Berg
2022-04-13 10:16 ` Mukesh Ojha
1 sibling, 0 replies; 12+ messages in thread
From: Johannes Berg @ 2022-04-13 6:41 UTC (permalink / raw)
To: Greg KH, Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, Rafael J. Wysocki
On Wed, 2022-04-13 at 07:34 +0200, Greg KH wrote:
> >
> > debug_object_fixup()
> >
Also, isn't that a debug path with a lot of prints or something?
johannes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 5:34 ` Greg KH
2022-04-13 6:41 ` Johannes Berg
@ 2022-04-13 10:16 ` Mukesh Ojha
2022-04-13 10:58 ` Greg KH
1 sibling, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 10:16 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > Hi All,
> >
> > We are hitting one race due to which try_to_grab_pending() is stuck .
>
> What kernel version are you using?
5.10
Sorry, for the formatting mess.
> > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > added to
> > the framework and uevent notification sent to userspace that result in the
> > call to (p2) devcd_data_write()
> > which eventually try to delete the queued timer which in the racy scenario
> > timer is not queued yet.
> > So, debug object report some warning and in the meantime timer is
> > initialized and queued from p1 path.
> > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > try_to_grab_pending() stuck
p1 p2(X)
dev_coredump() uevent sent to userspace
device_add() =========================> userspace process X reads the uevents
writes to devcd fd which
results into writes to
devcd_data_write()
mod_delayed_work()
try_to_grab_pending()
del_timer()
debug_assert_init()
INIT_DELAYED_WORK
schedule_delayed_work
debug_object_fixup()
timer_fixup_assert_init()
timer_setup()
do_init_timer() ==> reinitialized the timer to timer->entry.pprev=NULL
timer_pending()
!hlist_unhashed_lockless(&timer->entry)
!h->pprev ==> del_timer checks and finds to be NULL
stuck in try_to_grab_pending
Thanks,Mukesh
>
>
> > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > added to
> > the framework and uevent notification sent to userspace that result in the
> > call to (p2) devcd_data_write()
> > which eventually try to delete the queued timer which in the racy scenario
> > timer is not queued yet.
> > So, debug object report some warning and in the meantime timer is
> > initialized and queued from p1 path.
> > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > try_to_grab_pending() stuck
> > as del_timer() always return 0 as timer_pending() return false.
> >
> > P1 P2(X)
> >
> >
> > dev_coredumpm()
> >
> > Uevent notification sent to
> > userspace
> > for device addition
> >
> > device_add() ========================> Process X
> > reads this uevents
> > notification and do write call
> > that results in call to
> >
> > devcd_data_write()
> > mod_delayed_work()
> > try_to_grab_pending()
> > del_timer()
> > debug_assert_init()
> >
> > INIT_DELAYED_WORK
> > (&devcd->del_wk, devcd_del);
> > schedule_delayed_work(&devcd->del_wk,
> > DEVCD_TIMEOUT);
> >
> > debug_object_fixup()
> > timer_fixup_assert_init()
> > timer_setup()
> > do_init_timer() ==> reinitialized the timer to timer->entry.pprev=NULL
> >
> > timer_pending()
> > !hlist_unhashed_lockless(&timer->entry)
> > !h->pprev
>
> The above is confusing and not able to be understood due to the
> formatting mess. Care to fix this up and resend?
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 10:16 ` Mukesh Ojha
@ 2022-04-13 10:58 ` Greg KH
2022-04-13 11:21 ` Mukesh Ojha
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13 10:58 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > > Hi All,
> > >
> > > We are hitting one race due to which try_to_grab_pending() is stuck .
> >
> > What kernel version are you using?
>
> 5.10
5.10.0 was released a very long time ago. Please use a more modern
kernel release :)
> Sorry, for the formatting mess.
>
> > > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > > added to
> > > the framework and uevent notification sent to userspace that result in the
> > > call to (p2) devcd_data_write()
> > > which eventually try to delete the queued timer which in the racy scenario
> > > timer is not queued yet.
> > > So, debug object report some warning and in the meantime timer is
> > > initialized and queued from p1 path.
> > > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > > try_to_grab_pending() stuck
> p1 p2(X)
>
> dev_coredump() uevent sent to userspace
> device_add() =========================> userspace process X reads the uevents
> writes to devcd fd which
> results into writes to
>
> devcd_data_write()
> mod_delayed_work()
> try_to_grab_pending()
> del_timer()
> debug_assert_init()
> INIT_DELAYED_WORK
> schedule_delayed_work
> debug_object_fixup()
Why do you have object debugging enabled? That's going to take a LONG
time, and will find bugs in your code. Perhaps like this one?
What type of device is this? What bus? What driver?
And if you turn object debugging off, what happens?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 10:58 ` Greg KH
@ 2022-04-13 11:21 ` Mukesh Ojha
2022-04-13 13:01 ` Greg KH
2022-04-13 14:17 ` Mukesh Ojha
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 11:21 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
>
> 5.10.0 was released a very long time ago. Please use a more modern
> kernel release :)
>
>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> p1 p2(X)
>>
>> dev_coredump() uevent sent to userspace
>> device_add() =========================> userspace process X reads the uevents
>> writes to devcd fd which
>> results into writes to
>>
>> devcd_data_write()
>> mod_delayed_work()
>> try_to_grab_pending()
>> del_timer()
>> debug_assert_init()
>> INIT_DELAYED_WORK
>> schedule_delayed_work
>> debug_object_fixup()
>
> Why do you have object debugging enabled? That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?
> There is no issue if we disable debug object.
Here, some client module try to collect dump
via dev_coredumpm() which creates devcdX device and
expects userspace to read this data. Here, it might be
exposing a synchronization issue between dev_coredumpm()
and devcd_data_write() perhaps, a mutex ??
================o<===============================
11
12 diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
13 index 9243468..a620dcb 100644
14 --- a/drivers/base/devcoredump.c
15 +++ b/drivers/base/devcoredump.c
16 @@ -29,6 +29,7 @@ struct devcd_entry {
17 struct device devcd_dev;
18 void *data;
19 size_t datalen;
20 + struct mutex mutex;
21 struct module *owner;
22 ssize_t (*read)(char *buffer, loff_t offset, size_t count,
23 void *data, size_t datalen);
24 @@ -88,7 +89,9 @@ static ssize_t devcd_data_write(struct file
*filp, struct kobject *kobj,
25 struct device *dev = kobj_to_dev(kobj);
26 struct devcd_entry *devcd = dev_to_devcd(dev);
27
28 + mutex_lock(&devcd->mutex);
29 mod_delayed_work(system_wq, &devcd->del_wk, 0);
30 + mutex_unlock(&devcd->mutex);
31
32 return count;
33 }
34 @@ -282,13 +285,14 @@ void dev_coredumpm(struct device *dev, struct
module *owner,
35 devcd->read = read;
36 devcd->free = free;
37 devcd->failing_dev = get_device(dev);
38 -
39 + mutex_init(&devcd->mutex);
40 device_initialize(&devcd->devcd_dev);
41
42 dev_set_name(&devcd->devcd_dev, "devcd%d",
43 atomic_inc_return(&devcd_count));
44 devcd->devcd_dev.class = &devcd_class;
45
46 + mutex_lock(&devcd->mutex);
47 if (device_add(&devcd->devcd_dev))
48 goto put_device;
49
50 @@ -302,10 +306,11 @@ void dev_coredumpm(struct device *dev, struct
module *owner,
51
52 INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
53 schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
54 -
55 + mutex_unlock(&devcd->mutex);
Thanks,
-Mukesh
> What type of device is this? What bus? What driver?
>
> And if you turn object debugging off, what happens?
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 11:21 ` Mukesh Ojha
@ 2022-04-13 13:01 ` Greg KH
0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13 13:01 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On Wed, Apr 13, 2022 at 04:51:18PM +0530, Mukesh Ojha wrote:
>
>
> On 4/13/2022 4:28 PM, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> > > On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> > > > On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > > > > Hi All,
> > > > >
> > > > > We are hitting one race due to which try_to_grab_pending() is stuck .
> > > >
> > > > What kernel version are you using?
> > >
> > > 5.10
> >
> > 5.10.0 was released a very long time ago. Please use a more modern
> > kernel release :)
> >
> > > Sorry, for the formatting mess.
> > >
> > > > > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > > > > added to
> > > > > the framework and uevent notification sent to userspace that result in the
> > > > > call to (p2) devcd_data_write()
> > > > > which eventually try to delete the queued timer which in the racy scenario
> > > > > timer is not queued yet.
> > > > > So, debug object report some warning and in the meantime timer is
> > > > > initialized and queued from p1 path.
> > > > > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > > > > try_to_grab_pending() stuck
> > > p1 p2(X)
> > >
> > > dev_coredump() uevent sent to userspace
> > > device_add() =========================> userspace process X reads the uevents
> > > writes to devcd fd which
> > > results into writes to
> > >
> > > devcd_data_write()
> > > mod_delayed_work()
> > > try_to_grab_pending()
> > > del_timer()
> > > debug_assert_init()
> > > INIT_DELAYED_WORK
> > > schedule_delayed_work
> > > debug_object_fixup()
> >
> > Why do you have object debugging enabled? That's going to take a LONG
> > time, and will find bugs in your code. Perhaps like this one?
> > There is no issue if we disable debug object.
> Here, some client module try to collect dump
> via dev_coredumpm() which creates devcdX device and
> expects userspace to read this data. Here, it might be
> exposing a synchronization issue between dev_coredumpm()
> and devcd_data_write() perhaps, a mutex ??
Any reason you did not answer any of the questions I asked?
{sigh}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 10:58 ` Greg KH
2022-04-13 11:21 ` Mukesh Ojha
@ 2022-04-13 14:17 ` Mukesh Ojha
2022-04-13 14:18 ` Mukesh Ojha
2022-04-14 10:38 ` Thomas Gleixner
3 siblings, 0 replies; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 14:17 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
>
> 5.10.0 was released a very long time ago. Please use a more modern
> kernel release :)
>
It would not be feasible for us to switch to latest kernel and I think,
this issue could be there in recent kernel as well.
>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> p1 p2(X)
>>
>> dev_coredump() uevent sent to userspace
>> device_add() =========================> userspace process X reads the uevents
>> writes to devcd fd which
>> results into writes to
>>
>> devcd_data_write()
>> mod_delayed_work()
>> try_to_grab_pending()
>> del_timer()
>> debug_assert_init()
>> INIT_DELAYED_WORK
>> schedule_delayed_work
>> debug_object_fixup()
>
> Why do you have object debugging enabled?
We have enabled object debugging to catch more issues around kernel.
> That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?
>
> What type of device is this? What bus? What driver?
remoteproc client device driver would call dev_coredumpm() and devcd
device gets added as part of the call.
>
> And if you turn object debugging off, what happens?
We have not observed issue after disabling object debugging off.
Regards,
Mukesh
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 10:58 ` Greg KH
2022-04-13 11:21 ` Mukesh Ojha
2022-04-13 14:17 ` Mukesh Ojha
@ 2022-04-13 14:18 ` Mukesh Ojha
2022-04-14 10:38 ` Thomas Gleixner
3 siblings, 0 replies; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 14:18 UTC (permalink / raw)
To: Greg KH, linux-remoteproc; +Cc: linux-kernel, tglx, sboyd, johannes, rafael
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
>
> 5.10.0 was released a very long time ago. Please use a more modern
> kernel release :)
>
It would not be feasible for us to switch to latest kernel and I think,
this issue could be there in recent kernel as well.
>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> p1 p2(X)
>>
>> dev_coredump() uevent sent to userspace
>> device_add() =========================> userspace process X reads the uevents
>> writes to devcd fd which
>> results into writes to
>>
>> devcd_data_write()
>> mod_delayed_work()
>> try_to_grab_pending()
>> del_timer()
>> debug_assert_init()
>> INIT_DELAYED_WORK
>> schedule_delayed_work
>> debug_object_fixup()
>
> Why do you have object debugging enabled?
We have enabled object debugging to catch more issues around kernel.
> That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?
>
> What type of device is this? What bus? What driver?
remoteproc client device driver would call dev_coredumpm() and devcd
device gets added as part of the call.
>
> And if you turn object debugging off, what happens?
We have not observed issue after disabling object debugging off.
Regards,
Mukesh
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-13 10:58 ` Greg KH
` (2 preceding siblings ...)
2022-04-13 14:18 ` Mukesh Ojha
@ 2022-04-14 10:38 ` Thomas Gleixner
2022-04-14 11:20 ` Mukesh Ojha
3 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-14 10:38 UTC (permalink / raw)
To: Greg KH, Mukesh Ojha; +Cc: linux-kernel, sboyd, johannes, rafael
On Wed, Apr 13 2022 at 12:58, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> p1 p2(X)
>>
>> dev_coredump() uevent sent to userspace
>> device_add() =========================> userspace process X reads the uevents
>> writes to devcd fd which
>> results into writes to
>>
>> devcd_data_write()
>> mod_delayed_work()
>> try_to_grab_pending()
>> del_timer()
>> debug_assert_init()
>> INIT_DELAYED_WORK
>> schedule_delayed_work
>> debug_object_fixup()
>
> Why do you have object debugging enabled? That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?
It's not finding bugs in his code. It finds bug in the upstream
dev_coredump code.
> And if you turn object debugging off, what happens?
The debugobject splat goes away, but the problem persists.
device_add() -> uevent
Preemption or concurrency:
devcd_data_write()
mod_delayed_work(..., w, 0); <- Uninitialized.
The dev_coredump code exposes the device before it is fully initialized
and a write ending up in devcd_data_write() touches uninitialized work.
It does not help to move the initialization before device_add() as that
creates another problem:
INIT_DELAYED_WORK(w)
...
device_add() -> uevent
Preemption or concurrency:
devcd_data_write()
mod_delayed_work(..., w, 0); <- Schedules work immediately
work_queue_runs()
devcd_del(w)
device_del()
put_device() <- Drops the last reference
initialization continues...
So, yes this needs serialization of some sort.
Same problem vs. disabled_store().
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-14 10:38 ` Thomas Gleixner
@ 2022-04-14 11:20 ` Mukesh Ojha
2022-04-14 11:31 ` Thomas Gleixner
0 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-14 11:20 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: gregkh, linux-kernel, sboyd, johannes, rafael
On Thu, Apr 14, 2022 at 12:38:13PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 13 2022 at 12:58, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> >> p1 p2(X)
> >>
> >> dev_coredump() uevent sent to userspace
> >> device_add() =========================> userspace process X reads the uevents
> >> writes to devcd fd which
> >> results into writes to
> >>
> >> devcd_data_write()
> >> mod_delayed_work()
> >> try_to_grab_pending()
> >> del_timer()
> >> debug_assert_init()
> >> INIT_DELAYED_WORK
> >> schedule_delayed_work
> >> debug_object_fixup()
> >
> > Why do you have object debugging enabled? That's going to take a LONG
> > time, and will find bugs in your code. Perhaps like this one?
>
> It's not finding bugs in his code. It finds bug in the upstream
> dev_coredump code.
>
> > And if you turn object debugging off, what happens?
>
> The debugobject splat goes away, but the problem persists.
>
> device_add() -> uevent
>
> Preemption or concurrency:
>
> devcd_data_write()
> mod_delayed_work(..., w, 0); <- Uninitialized.
>
> The dev_coredump code exposes the device before it is fully initialized
> and a write ending up in devcd_data_write() touches uninitialized work.
>
> It does not help to move the initialization before device_add() as that
> creates another problem:
>
> INIT_DELAYED_WORK(w)
> ...
> device_add() -> uevent
>
> Preemption or concurrency:
>
> devcd_data_write()
> mod_delayed_work(..., w, 0); <- Schedules work immediately
>
> work_queue_runs()
> devcd_del(w)
> device_del()
> put_device() <- Drops the last reference
>
> initialization continues...
>
> So, yes this needs serialization of some sort.
Hi Thomas,
Thanks for understanding the problem.
Can the patch mentioned at below link helps with the first problem ?
https://lore.kernel.org/lkml/57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com/
>
> Same problem vs. disabled_store().
you mean, while userspace is reading the data and suddenly disable_store() done from
sysfs.
-Mukesh
>
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Possible race in dev_coredumpm()-del_timer() path
2022-04-14 11:20 ` Mukesh Ojha
@ 2022-04-14 11:31 ` Thomas Gleixner
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-14 11:31 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: gregkh, linux-kernel, sboyd, johannes, rafael
On Thu, Apr 14 2022 at 16:50, Mukesh Ojha wrote:
> On Thu, Apr 14, 2022 at 12:38:13PM +0200, Thomas Gleixner wrote:
>> So, yes this needs serialization of some sort.
>
> Thanks for understanding the problem.
> Can the patch mentioned at below link helps with the first problem ?
>
> https://lore.kernel.org/lkml/57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com/
Something like that.
>> Same problem vs. disabled_store().
>
> you mean, while userspace is reading the data and suddenly disable_store() done from
> sysfs.
No, that's not a problem because the reader holds a reference, but it's
the same problem vs. initialization:
device_add()
disable_store()
devcd_free()
mod_delayed_work()
....
Thanks,
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-04-14 11:32 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13 5:29 Possible race in dev_coredumpm()-del_timer() path Mukesh Ojha
2022-04-13 5:34 ` Greg KH
2022-04-13 6:41 ` Johannes Berg
2022-04-13 10:16 ` Mukesh Ojha
2022-04-13 10:58 ` Greg KH
2022-04-13 11:21 ` Mukesh Ojha
2022-04-13 13:01 ` Greg KH
2022-04-13 14:17 ` Mukesh Ojha
2022-04-13 14:18 ` Mukesh Ojha
2022-04-14 10:38 ` Thomas Gleixner
2022-04-14 11:20 ` Mukesh Ojha
2022-04-14 11:31 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).