linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible race in dev_coredumpm()-del_timer() path
@ 2022-04-13  5:29 Mukesh Ojha
  2022-04-13  5:34 ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13  5:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, sboyd, gregkh, johannes, Rafael J. Wysocki

Hi All,

We are hitting one race due to which try_to_grab_pending() is stuck .
In following scenario, while running (p1)dev_coredumpm() devcd device is 
added to
the framework and uevent notification sent to userspace that result in 
the call to (p2) devcd_data_write()
which eventually try to delete the queued timer which in the racy 
scenario timer is not queued yet.
So, debug object report some warning and in the meantime timer is 
initialized and queued from p1 path.
and from p2 path it gets overriden again timer->entry.pprev=NULL and 
try_to_grab_pending() stuck
as del_timer() always return 0 as timer_pending() return false.

P1 P2(X)


      dev_coredumpm()

                                           Uevent notification sent to 
userspace
                                           for device addition

             device_add() ========================>                 
Process X reads this uevents
notification and do write call
that results in call to

devcd_data_write()
mod_delayed_work()
try_to_grab_pending()
del_timer()
debug_assert_init()

             INIT_DELAYED_WORK
                    (&devcd->del_wk, devcd_del);
             schedule_delayed_work(&devcd->del_wk,
                    DEVCD_TIMEOUT);

debug_object_fixup()
timer_fixup_assert_init()
timer_setup()
do_init_timer()   ==> reinitialized the timer to timer->entry.pprev=NULL

timer_pending()
!hlist_unhashed_lockless(&timer->entry)
!h->pprev



Thanks.
Mukesh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13  5:29 Possible race in dev_coredumpm()-del_timer() path Mukesh Ojha
@ 2022-04-13  5:34 ` Greg KH
  2022-04-13  6:41   ` Johannes Berg
  2022-04-13 10:16   ` Mukesh Ojha
  0 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13  5:34 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, Rafael J. Wysocki

On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> Hi All,
> 
> We are hitting one race due to which try_to_grab_pending() is stuck .

What kernel version are you using?

> In following scenario, while running (p1)dev_coredumpm() devcd device is
> added to
> the framework and uevent notification sent to userspace that result in the
> call to (p2) devcd_data_write()
> which eventually try to delete the queued timer which in the racy scenario
> timer is not queued yet.
> So, debug object report some warning and in the meantime timer is
> initialized and queued from p1 path.
> and from p2 path it gets overriden again timer->entry.pprev=NULL and
> try_to_grab_pending() stuck
> as del_timer() always return 0 as timer_pending() return false.
> 
> P1 P2(X)
> 
> 
>      dev_coredumpm()
> 
>                                           Uevent notification sent to
> userspace
>                                           for device addition
> 
>             device_add() ========================>                 Process X
> reads this uevents
> notification and do write call
> that results in call to
> 
> devcd_data_write()
> mod_delayed_work()
> try_to_grab_pending()
> del_timer()
> debug_assert_init()
> 
>             INIT_DELAYED_WORK
>                    (&devcd->del_wk, devcd_del);
>             schedule_delayed_work(&devcd->del_wk,
>                    DEVCD_TIMEOUT);
> 
> debug_object_fixup()
> timer_fixup_assert_init()
> timer_setup()
> do_init_timer()   ==> reinitialized the timer to timer->entry.pprev=NULL
> 
> timer_pending()
> !hlist_unhashed_lockless(&timer->entry)
> !h->pprev

The above is confusing and not able to be understood due to the
formatting mess.  Care to fix this up and resend?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13  5:34 ` Greg KH
@ 2022-04-13  6:41   ` Johannes Berg
  2022-04-13 10:16   ` Mukesh Ojha
  1 sibling, 0 replies; 12+ messages in thread
From: Johannes Berg @ 2022-04-13  6:41 UTC (permalink / raw)
  To: Greg KH, Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, Rafael J. Wysocki

On Wed, 2022-04-13 at 07:34 +0200, Greg KH wrote:
> > 
> > debug_object_fixup()
> > 
Also, isn't that a debug path with a lot of prints or something?

johannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13  5:34 ` Greg KH
  2022-04-13  6:41   ` Johannes Berg
@ 2022-04-13 10:16   ` Mukesh Ojha
  2022-04-13 10:58     ` Greg KH
  1 sibling, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 10:16 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael

On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > Hi All,
> > 
> > We are hitting one race due to which try_to_grab_pending() is stuck .
> 
> What kernel version are you using?

5.10

Sorry, for the formatting mess.

> > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > added to
> > the framework and uevent notification sent to userspace that result in the
> > call to (p2) devcd_data_write()
> > which eventually try to delete the queued timer which in the racy scenario
> > timer is not queued yet.
> > So, debug object report some warning and in the meantime timer is
> > initialized and queued from p1 path.
> > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > try_to_grab_pending() stuck
	p1 					p2(X)

   dev_coredump() uevent sent to userspace
   device_add()  =========================> userspace process X reads the uevents
                                            writes to devcd fd which
                                            results into writes to

                                            devcd_data_write()
					      mod_delayed_work()
                                                try_to_grab_pending()
						  del_timer()
						   debug_assert_init()
  INIT_DELAYED_WORK 
  schedule_delayed_work
						    debug_object_fixup()
						      timer_fixup_assert_init()
							timer_setup()
							  do_init_timer()   ==> reinitialized the timer to timer->entry.pprev=NULL
							
					          timer_pending()
						  !hlist_unhashed_lockless(&timer->entry)
						  !h->pprev  ==> del_timer checks and finds to be NULL
								 stuck in try_to_grab_pending

Thanks,Mukesh
> 
> 
> > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > added to
> > the framework and uevent notification sent to userspace that result in the
> > call to (p2) devcd_data_write()
> > which eventually try to delete the queued timer which in the racy scenario
> > timer is not queued yet.
> > So, debug object report some warning and in the meantime timer is
> > initialized and queued from p1 path.
> > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > try_to_grab_pending() stuck
> > as del_timer() always return 0 as timer_pending() return false.
> > 
> > P1 P2(X)
> > 
> > 
> >      dev_coredumpm()
> > 
> >                                           Uevent notification sent to
> > userspace
> >                                           for device addition
> > 
> >             device_add() ========================>                 Process X
> > reads this uevents
> > notification and do write call
> > that results in call to
> > 
> > devcd_data_write()
> > mod_delayed_work()
> > try_to_grab_pending()
> > del_timer()
> > debug_assert_init()
> > 
> >             INIT_DELAYED_WORK
> >                    (&devcd->del_wk, devcd_del);
> >             schedule_delayed_work(&devcd->del_wk,
> >                    DEVCD_TIMEOUT);
> > 
> > debug_object_fixup()
> > timer_fixup_assert_init()
> > timer_setup()
> > do_init_timer()   ==> reinitialized the timer to timer->entry.pprev=NULL
> > 
> > timer_pending()
> > !hlist_unhashed_lockless(&timer->entry)
> > !h->pprev
> 
> The above is confusing and not able to be understood due to the
> formatting mess.  Care to fix this up and resend?
> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 10:16   ` Mukesh Ojha
@ 2022-04-13 10:58     ` Greg KH
  2022-04-13 11:21       ` Mukesh Ojha
                         ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13 10:58 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, rafael

On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > > Hi All,
> > > 
> > > We are hitting one race due to which try_to_grab_pending() is stuck .
> > 
> > What kernel version are you using?
> 
> 5.10

5.10.0 was released a very long time ago.  Please use a more modern
kernel release :)

> Sorry, for the formatting mess.
> 
> > > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > > added to
> > > the framework and uevent notification sent to userspace that result in the
> > > call to (p2) devcd_data_write()
> > > which eventually try to delete the queued timer which in the racy scenario
> > > timer is not queued yet.
> > > So, debug object report some warning and in the meantime timer is
> > > initialized and queued from p1 path.
> > > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > > try_to_grab_pending() stuck
> 	p1 					p2(X)
> 
>    dev_coredump() uevent sent to userspace
>    device_add()  =========================> userspace process X reads the uevents
>                                             writes to devcd fd which
>                                             results into writes to
> 
>                                             devcd_data_write()
> 					      mod_delayed_work()
>                                                 try_to_grab_pending()
> 						  del_timer()
> 						   debug_assert_init()
>   INIT_DELAYED_WORK 
>   schedule_delayed_work
> 						    debug_object_fixup()

Why do you have object debugging enabled?  That's going to take a LONG
time, and will find bugs in your code.  Perhaps like this one?

What type of device is this?  What bus?  What driver?

And if you turn object debugging off, what happens?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 10:58     ` Greg KH
@ 2022-04-13 11:21       ` Mukesh Ojha
  2022-04-13 13:01         ` Greg KH
  2022-04-13 14:17       ` Mukesh Ojha
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 11:21 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael



On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
> 
> 5.10.0 was released a very long time ago.  Please use a more modern
> kernel release :)
> 
>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> 	p1 					p2(X)
>>
>>     dev_coredump() uevent sent to userspace
>>     device_add()  =========================> userspace process X reads the uevents
>>                                              writes to devcd fd which
>>                                              results into writes to
>>
>>                                              devcd_data_write()
>> 					      mod_delayed_work()
>>                                                  try_to_grab_pending()
>> 						  del_timer()
>> 						   debug_assert_init()
>>    INIT_DELAYED_WORK
>>    schedule_delayed_work
>> 						    debug_object_fixup()
> 
> Why do you have object debugging enabled?  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one?
> There is no issue if we disable debug object.
Here, some client module try to collect dump
via dev_coredumpm() which creates devcdX device and
expects userspace to read this data. Here, it might be
exposing a synchronization issue between dev_coredumpm()
and  devcd_data_write() perhaps, a mutex ??

================o<===============================

  11
  12 diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
  13 index 9243468..a620dcb 100644
  14 --- a/drivers/base/devcoredump.c
  15 +++ b/drivers/base/devcoredump.c
  16 @@ -29,6 +29,7 @@ struct devcd_entry {
  17         struct device devcd_dev;
  18         void *data;
  19         size_t datalen;
  20 +       struct mutex mutex;
  21         struct module *owner;
  22         ssize_t (*read)(char *buffer, loff_t offset, size_t count,
  23                         void *data, size_t datalen);
  24 @@ -88,7 +89,9 @@ static ssize_t devcd_data_write(struct file 
*filp, struct kobject *kobj,
  25         struct device *dev = kobj_to_dev(kobj);
  26         struct devcd_entry *devcd = dev_to_devcd(dev);
  27
  28 +       mutex_lock(&devcd->mutex);
  29         mod_delayed_work(system_wq, &devcd->del_wk, 0);
  30 +       mutex_unlock(&devcd->mutex);
  31
  32         return count;
  33  }
  34 @@ -282,13 +285,14 @@ void dev_coredumpm(struct device *dev, struct 
module *owner,
  35         devcd->read = read;
  36         devcd->free = free;
  37         devcd->failing_dev = get_device(dev);
  38 -
  39 +       mutex_init(&devcd->mutex);
  40         device_initialize(&devcd->devcd_dev);
  41
  42         dev_set_name(&devcd->devcd_dev, "devcd%d",
  43                      atomic_inc_return(&devcd_count));
  44         devcd->devcd_dev.class = &devcd_class;
  45
  46 +       mutex_lock(&devcd->mutex);
  47         if (device_add(&devcd->devcd_dev))
  48                 goto put_device;
  49
  50 @@ -302,10 +306,11 @@ void dev_coredumpm(struct device *dev, struct 
module *owner,
  51
  52         INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
  53         schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
  54 -
  55 +       mutex_unlock(&devcd->mutex);


Thanks,
-Mukesh

> What type of device is this?  What bus?  What driver?
> 
> And if you turn object debugging off, what happens?
> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 11:21       ` Mukesh Ojha
@ 2022-04-13 13:01         ` Greg KH
  0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2022-04-13 13:01 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: linux-kernel, tglx, sboyd, johannes, rafael

On Wed, Apr 13, 2022 at 04:51:18PM +0530, Mukesh Ojha wrote:
> 
> 
> On 4/13/2022 4:28 PM, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> > > On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
> > > > On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
> > > > > Hi All,
> > > > > 
> > > > > We are hitting one race due to which try_to_grab_pending() is stuck .
> > > > 
> > > > What kernel version are you using?
> > > 
> > > 5.10
> > 
> > 5.10.0 was released a very long time ago.  Please use a more modern
> > kernel release :)
> > 
> > > Sorry, for the formatting mess.
> > > 
> > > > > In following scenario, while running (p1)dev_coredumpm() devcd device is
> > > > > added to
> > > > > the framework and uevent notification sent to userspace that result in the
> > > > > call to (p2) devcd_data_write()
> > > > > which eventually try to delete the queued timer which in the racy scenario
> > > > > timer is not queued yet.
> > > > > So, debug object report some warning and in the meantime timer is
> > > > > initialized and queued from p1 path.
> > > > > and from p2 path it gets overriden again timer->entry.pprev=NULL and
> > > > > try_to_grab_pending() stuck
> > > 	p1 					p2(X)
> > > 
> > >     dev_coredump() uevent sent to userspace
> > >     device_add()  =========================> userspace process X reads the uevents
> > >                                              writes to devcd fd which
> > >                                              results into writes to
> > > 
> > >                                              devcd_data_write()
> > > 					      mod_delayed_work()
> > >                                                  try_to_grab_pending()
> > > 						  del_timer()
> > > 						   debug_assert_init()
> > >    INIT_DELAYED_WORK
> > >    schedule_delayed_work
> > > 						    debug_object_fixup()
> > 
> > Why do you have object debugging enabled?  That's going to take a LONG
> > time, and will find bugs in your code.  Perhaps like this one?
> > There is no issue if we disable debug object.
> Here, some client module try to collect dump
> via dev_coredumpm() which creates devcdX device and
> expects userspace to read this data. Here, it might be
> exposing a synchronization issue between dev_coredumpm()
> and  devcd_data_write() perhaps, a mutex ??

Any reason you did not answer any of the questions I asked?

{sigh}


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 10:58     ` Greg KH
  2022-04-13 11:21       ` Mukesh Ojha
@ 2022-04-13 14:17       ` Mukesh Ojha
  2022-04-13 14:18       ` Mukesh Ojha
  2022-04-14 10:38       ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 14:17 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, tglx, sboyd, johannes, rafael



On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
> 
> 5.10.0 was released a very long time ago.  Please use a more modern
> kernel release :)
> 

It would not be feasible for us to switch to latest kernel and I think, 
this issue could be there in recent kernel as well.

>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> 	p1 					p2(X)
>>
>>     dev_coredump() uevent sent to userspace
>>     device_add()  =========================> userspace process X reads the uevents
>>                                              writes to devcd fd which
>>                                              results into writes to
>>
>>                                              devcd_data_write()
>> 					      mod_delayed_work()
>>                                                  try_to_grab_pending()
>> 						  del_timer()
>> 						   debug_assert_init()
>>    INIT_DELAYED_WORK
>>    schedule_delayed_work
>> 						    debug_object_fixup()
> 
> Why do you have object debugging enabled?

We have enabled object debugging to catch more issues around kernel.

>  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one? 
> 
> What type of device is this?  What bus?  What driver?

remoteproc client device driver would call dev_coredumpm() and devcd 
device gets added as part of the call.

> 
> And if you turn object debugging off, what happens?

We have not observed issue after disabling object debugging off.

Regards,
Mukesh

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 10:58     ` Greg KH
  2022-04-13 11:21       ` Mukesh Ojha
  2022-04-13 14:17       ` Mukesh Ojha
@ 2022-04-13 14:18       ` Mukesh Ojha
  2022-04-14 10:38       ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-13 14:18 UTC (permalink / raw)
  To: Greg KH, linux-remoteproc; +Cc: linux-kernel, tglx, sboyd, johannes, rafael



On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting one race due to which try_to_grab_pending() is stuck .
>>>
>>> What kernel version are you using?
>>
>> 5.10
> 
> 5.10.0 was released a very long time ago.  Please use a more modern
> kernel release :)
> 

It would not be feasible for us to switch to latest kernel and I think, 
this issue could be there in recent kernel as well.

>> Sorry, for the formatting mess.
>>
>>>> In following scenario, while running (p1)dev_coredumpm() devcd device is
>>>> added to
>>>> the framework and uevent notification sent to userspace that result in the
>>>> call to (p2) devcd_data_write()
>>>> which eventually try to delete the queued timer which in the racy scenario
>>>> timer is not queued yet.
>>>> So, debug object report some warning and in the meantime timer is
>>>> initialized and queued from p1 path.
>>>> and from p2 path it gets overriden again timer->entry.pprev=NULL and
>>>> try_to_grab_pending() stuck
>> 	p1 					p2(X)
>>
>>     dev_coredump() uevent sent to userspace
>>     device_add()  =========================> userspace process X reads the uevents
>>                                              writes to devcd fd which
>>                                              results into writes to
>>
>>                                              devcd_data_write()
>> 					      mod_delayed_work()
>>                                                  try_to_grab_pending()
>> 						  del_timer()
>> 						   debug_assert_init()
>>    INIT_DELAYED_WORK
>>    schedule_delayed_work
>> 						    debug_object_fixup()
> 
> Why do you have object debugging enabled?

We have enabled object debugging to catch more issues around kernel.

>  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one? 
> 
> What type of device is this?  What bus?  What driver?

remoteproc client device driver would call dev_coredumpm() and devcd 
device gets added as part of the call.

> 
> And if you turn object debugging off, what happens?

We have not observed issue after disabling object debugging off.

Regards,
Mukesh

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-13 10:58     ` Greg KH
                         ` (2 preceding siblings ...)
  2022-04-13 14:18       ` Mukesh Ojha
@ 2022-04-14 10:38       ` Thomas Gleixner
  2022-04-14 11:20         ` Mukesh Ojha
  3 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-14 10:38 UTC (permalink / raw)
  To: Greg KH, Mukesh Ojha; +Cc: linux-kernel, sboyd, johannes, rafael

On Wed, Apr 13 2022 at 12:58, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> 	p1 					p2(X)
>> 
>>    dev_coredump() uevent sent to userspace
>>    device_add()  =========================> userspace process X reads the uevents
>>                                             writes to devcd fd which
>>                                             results into writes to
>> 
>>                                             devcd_data_write()
>> 					      mod_delayed_work()
>>                                                 try_to_grab_pending()
>> 						  del_timer()
>> 						   debug_assert_init()
>>   INIT_DELAYED_WORK 
>>   schedule_delayed_work
>> 						    debug_object_fixup()
>
> Why do you have object debugging enabled?  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one?

It's not finding bugs in his code. It finds bug in the upstream
dev_coredump code.

> And if you turn object debugging off, what happens?

The debugobject splat goes away, but the problem persists.

    device_add()        -> uevent

Preemption or concurrency:

    devcd_data_write()
       mod_delayed_work(..., w, 0);      <- Uninitialized.

The dev_coredump code exposes the device before it is fully initialized
and a write ending up in devcd_data_write() touches uninitialized work.

It does not help to move the initialization before device_add() as that
creates another problem:

    INIT_DELAYED_WORK(w)
    ...
    device_add()        -> uevent

Preemption or concurrency:

    devcd_data_write()
      mod_delayed_work(..., w, 0);      <- Schedules work immediately

    work_queue_runs()
      devcd_del(w)
        device_del()
        put_device()                    <- Drops the last reference

    initialization continues...
    
So, yes this needs serialization of some sort.

Same problem vs. disabled_store().

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-14 10:38       ` Thomas Gleixner
@ 2022-04-14 11:20         ` Mukesh Ojha
  2022-04-14 11:31           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Mukesh Ojha @ 2022-04-14 11:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: gregkh, linux-kernel, sboyd, johannes, rafael

On Thu, Apr 14, 2022 at 12:38:13PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 13 2022 at 12:58, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
> >> 	p1 					p2(X)
> >> 
> >>    dev_coredump() uevent sent to userspace
> >>    device_add()  =========================> userspace process X reads the uevents
> >>                                             writes to devcd fd which
> >>                                             results into writes to
> >> 
> >>                                             devcd_data_write()
> >> 					      mod_delayed_work()
> >>                                                 try_to_grab_pending()
> >> 						  del_timer()
> >> 						   debug_assert_init()
> >>   INIT_DELAYED_WORK 
> >>   schedule_delayed_work
> >> 						    debug_object_fixup()
> >
> > Why do you have object debugging enabled?  That's going to take a LONG
> > time, and will find bugs in your code.  Perhaps like this one?
> 
> It's not finding bugs in his code. It finds bug in the upstream
> dev_coredump code.
> 
> > And if you turn object debugging off, what happens?
> 
> The debugobject splat goes away, but the problem persists.
> 
>     device_add()        -> uevent
> 
> Preemption or concurrency:
> 
>     devcd_data_write()
>        mod_delayed_work(..., w, 0);      <- Uninitialized.
> 
> The dev_coredump code exposes the device before it is fully initialized
> and a write ending up in devcd_data_write() touches uninitialized work.
> 
> It does not help to move the initialization before device_add() as that
> creates another problem:
> 
>     INIT_DELAYED_WORK(w)
>     ...
>     device_add()        -> uevent
> 
> Preemption or concurrency:
> 
>     devcd_data_write()
>       mod_delayed_work(..., w, 0);      <- Schedules work immediately
> 
>     work_queue_runs()
>       devcd_del(w)
>         device_del()
>         put_device()                    <- Drops the last reference
> 
>     initialization continues...
>     
> So, yes this needs serialization of some sort.

Hi Thomas,

Thanks for understanding the problem.
Can the patch mentioned at below link helps with the first problem ?


https://lore.kernel.org/lkml/57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com/

> 
> Same problem vs. disabled_store().

you mean, while userspace is reading the data and suddenly disable_store() done from
sysfs.

-Mukesh
> 
> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Possible race in dev_coredumpm()-del_timer() path
  2022-04-14 11:20         ` Mukesh Ojha
@ 2022-04-14 11:31           ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-14 11:31 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: gregkh, linux-kernel, sboyd, johannes, rafael

On Thu, Apr 14 2022 at 16:50, Mukesh Ojha wrote:
> On Thu, Apr 14, 2022 at 12:38:13PM +0200, Thomas Gleixner wrote:
>> So, yes this needs serialization of some sort.
>
> Thanks for understanding the problem.
> Can the patch mentioned at below link helps with the first problem ?
>
> https://lore.kernel.org/lkml/57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com/

Something like that.

>> Same problem vs. disabled_store().
>
> you mean, while userspace is reading the data and suddenly disable_store() done from
> sysfs.

No, that's not a problem because the reader holds a reference, but it's
the same problem vs. initialization:

  device_add()
                        disable_store()
                          devcd_free()
                           mod_delayed_work()
                           ....
Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-14 11:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13  5:29 Possible race in dev_coredumpm()-del_timer() path Mukesh Ojha
2022-04-13  5:34 ` Greg KH
2022-04-13  6:41   ` Johannes Berg
2022-04-13 10:16   ` Mukesh Ojha
2022-04-13 10:58     ` Greg KH
2022-04-13 11:21       ` Mukesh Ojha
2022-04-13 13:01         ` Greg KH
2022-04-13 14:17       ` Mukesh Ojha
2022-04-13 14:18       ` Mukesh Ojha
2022-04-14 10:38       ` Thomas Gleixner
2022-04-14 11:20         ` Mukesh Ojha
2022-04-14 11:31           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).