netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
@ 2022-05-19 13:53 Duoming Zhou
  2022-05-19 14:58 ` Kalle Valo
  0 siblings, 1 reply; 9+ messages in thread
From: Duoming Zhou @ 2022-05-19 13:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: amitkarwar, ganapathi017, sharvari.harisangam, huxinming820,
	kvalo, davem, edumazet, kuba, pabeni, linux-wireless, netdev,
	Duoming Zhou

There are sleep in atomic context bugs when uploading device dump
data on usb interface. The root cause is that the operations that
may sleep are called in fw_dump_timer_fn which is a timer handler.
The call tree shows the execution paths that could lead to bugs:

   (Interrupt context)
fw_dump_timer_fn
  mwifiex_upload_device_dump
    dev_coredumpv(..., GFP_KERNEL)
      dev_coredumpm()
        kzalloc(sizeof(*devcd), gfp); //may sleep
        dev_set_name
          kobject_set_name_vargs
            kvasprintf_const(GFP_KERNEL, ...); //may sleep
            kstrdup(s, GFP_KERNEL); //may sleep

This patch moves the operations that may sleep into a work item.
The work item will run in another kernel thread which is in
process context to execute the bottom half of the interrupt.
So it could prevent atomic context from sleeping.

Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v2:
  - Fix compile problem.

 drivers/net/wireless/marvell/mwifiex/init.c      | 12 +++++++++++-
 drivers/net/wireless/marvell/mwifiex/main.h      |  1 +
 drivers/net/wireless/marvell/mwifiex/sta_event.c |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index 88c72d1827a..cc3f1121eb9 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
 		adapter->if_ops.card_reset(adapter);
 }
 
+static void fw_dump_work(struct work_struct *work)
+{
+	struct mwifiex_adapter *adapter =
+		container_of(work, struct mwifiex_adapter, devdump_work);
+
+	mwifiex_upload_device_dump(adapter);
+}
+
 static void fw_dump_timer_fn(struct timer_list *t)
 {
 	struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);
 
-	mwifiex_upload_device_dump(adapter);
+	schedule_work(&adapter->devdump_work);
 }
 
 /*
@@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
 	adapter->active_scan_triggered = false;
 	timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
 	adapter->devdump_len = 0;
+	INIT_WORK(&adapter->devdump_work, fw_dump_work);
 	timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
 }
 
@@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
 {
 	del_timer(&adapter->wakeup_timer);
 	del_timer_sync(&adapter->devdump_timer);
+	cancel_work_sync(&adapter->devdump_work);
 	mwifiex_cancel_all_pending_cmd(adapter);
 	wake_up_interruptible(&adapter->cmd_wait_q.wait);
 	wake_up_interruptible(&adapter->hs_activate_wait_q);
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 332dd1c8db3..c8ac2f57f18 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -900,6 +900,7 @@ struct mwifiex_adapter {
 	struct work_struct rx_work;
 	struct workqueue_struct *dfs_workqueue;
 	struct work_struct dfs_work;
+	struct work_struct devdump_work;
 	bool rx_work_enabled;
 	bool rx_processing;
 	bool delay_main_work;
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 7d42c5d2dbf..8e28d0107d7 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv,
 
 upload_dump:
 	del_timer_sync(&adapter->devdump_timer);
+	cancel_work_sync(&adapter->devdump_work);
 	mwifiex_upload_device_dump(adapter);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-19 13:53 [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs Duoming Zhou
@ 2022-05-19 14:58 ` Kalle Valo
  2022-05-19 15:14   ` duoming
  0 siblings, 1 reply; 9+ messages in thread
From: Kalle Valo @ 2022-05-19 14:58 UTC (permalink / raw)
  To: Duoming Zhou
  Cc: linux-kernel, amitkarwar, ganapathi017, sharvari.harisangam,
	huxinming820, davem, edumazet, kuba, pabeni, linux-wireless,
	netdev

Duoming Zhou <duoming@zju.edu.cn> writes:

> There are sleep in atomic context bugs when uploading device dump
> data on usb interface. The root cause is that the operations that
> may sleep are called in fw_dump_timer_fn which is a timer handler.
> The call tree shows the execution paths that could lead to bugs:
>
>    (Interrupt context)
> fw_dump_timer_fn
>   mwifiex_upload_device_dump
>     dev_coredumpv(..., GFP_KERNEL)
>       dev_coredumpm()
>         kzalloc(sizeof(*devcd), gfp); //may sleep
>         dev_set_name
>           kobject_set_name_vargs
>             kvasprintf_const(GFP_KERNEL, ...); //may sleep
>             kstrdup(s, GFP_KERNEL); //may sleep
>
> This patch moves the operations that may sleep into a work item.
> The work item will run in another kernel thread which is in
> process context to execute the bottom half of the interrupt.
> So it could prevent atomic context from sleeping.
>
> Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>

mwifiex patches go to wireless-next, not net tree.

> ---
> Changes in v2:
>   - Fix compile problem.

So you don't even compile test your patches? That's bad and in that case
I'll just directly drop this. We expect that the patches are properly
tested.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-19 14:58 ` Kalle Valo
@ 2022-05-19 15:14   ` duoming
  2022-05-19 15:48     ` Jeff Johnson
  0 siblings, 1 reply; 9+ messages in thread
From: duoming @ 2022-05-19 15:14 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-kernel, amitkarwar, ganapathi017, sharvari.harisangam,
	huxinming820, davem, edumazet, kuba, pabeni, linux-wireless,
	netdev

Hello,

On Thu, 19 May 2022 17:58:47 +0300 Kalle Valo wrote:

> > There are sleep in atomic context bugs when uploading device dump
> > data on usb interface. The root cause is that the operations that
> > may sleep are called in fw_dump_timer_fn which is a timer handler.
> > The call tree shows the execution paths that could lead to bugs:
> >
> >    (Interrupt context)
> > fw_dump_timer_fn
> >   mwifiex_upload_device_dump
> >     dev_coredumpv(..., GFP_KERNEL)
> >       dev_coredumpm()
> >         kzalloc(sizeof(*devcd), gfp); //may sleep
> >         dev_set_name
> >           kobject_set_name_vargs
> >             kvasprintf_const(GFP_KERNEL, ...); //may sleep
> >             kstrdup(s, GFP_KERNEL); //may sleep
> >
> > This patch moves the operations that may sleep into a work item.
> > The work item will run in another kernel thread which is in
> > process context to execute the bottom half of the interrupt.
> > So it could prevent atomic context from sleeping.
> >
> > Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> 
> mwifiex patches go to wireless-next, not net tree.
> 
> > ---
> > Changes in v2:
> >   - Fix compile problem.
> 
> So you don't even compile test your patches? That's bad and in that case
> I'll just directly drop this. We expect that the patches are properly
> tested.

Ok, I will properly test this patch.

Best regards,
Duoming Zhou 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-19 15:14   ` duoming
@ 2022-05-19 15:48     ` Jeff Johnson
  2022-05-20  0:08       ` duoming
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Johnson @ 2022-05-19 15:48 UTC (permalink / raw)
  To: duoming, Kalle Valo
  Cc: linux-kernel, amitkarwar, ganapathi017, sharvari.harisangam,
	huxinming820, davem, edumazet, kuba, pabeni, linux-wireless,
	netdev

On 5/19/2022 8:14 AM, duoming@zju.edu.cn wrote:
> Hello,
> 
> On Thu, 19 May 2022 17:58:47 +0300 Kalle Valo wrote:
> 
>>> There are sleep in atomic context bugs when uploading device dump
>>> data on usb interface. The root cause is that the operations that
>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
>>> The call tree shows the execution paths that could lead to bugs:
>>>
>>>     (Interrupt context)
>>> fw_dump_timer_fn
>>>    mwifiex_upload_device_dump
>>>      dev_coredumpv(..., GFP_KERNEL)

just looking at this description, why isn't the simple fix just to 
change this call to use GFP_ATOMIC?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-19 15:48     ` Jeff Johnson
@ 2022-05-20  0:08       ` duoming
  2022-05-20 16:08         ` Jeff Johnson
  0 siblings, 1 reply; 9+ messages in thread
From: duoming @ 2022-05-20  0:08 UTC (permalink / raw)
  To: Jeff Johnson
  Cc: Kalle Valo, linux-kernel, amitkarwar, ganapathi017,
	sharvari.harisangam, huxinming820, davem, edumazet, kuba, pabeni,
	linux-wireless, netdev

Hello,

On Thu, 19 May 2022 08:48:44 -0700 Jeff Johnson wrote:

> >>> There are sleep in atomic context bugs when uploading device dump
> >>> data on usb interface. The root cause is that the operations that
> >>> may sleep are called in fw_dump_timer_fn which is a timer handler.
> >>> The call tree shows the execution paths that could lead to bugs:
> >>>
> >>>     (Interrupt context)
> >>> fw_dump_timer_fn
> >>>    mwifiex_upload_device_dump
> >>>      dev_coredumpv(..., GFP_KERNEL)
> 
> just looking at this description, why isn't the simple fix just to 
> change this call to use GFP_ATOMIC?

Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c 
which is not influenced by dev_coredumpv().

 kobject_set_name_vargs
   kvasprintf_const(GFP_KERNEL, ...); //may sleep
   kstrdup(s, GFP_KERNEL); //may sleep

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-20  0:08       ` duoming
@ 2022-05-20 16:08         ` Jeff Johnson
  2022-05-21  3:21           ` duoming
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Johnson @ 2022-05-20 16:08 UTC (permalink / raw)
  To: duoming
  Cc: Kalle Valo, linux-kernel, amitkarwar, ganapathi017,
	sharvari.harisangam, huxinming820, davem, edumazet, kuba, pabeni,
	linux-wireless, netdev

On 5/19/2022 5:08 PM, duoming@zju.edu.cn wrote:
> Hello,
> 
> On Thu, 19 May 2022 08:48:44 -0700 Jeff Johnson wrote:
> 
>>>>> There are sleep in atomic context bugs when uploading device dump
>>>>> data on usb interface. The root cause is that the operations that
>>>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
>>>>> The call tree shows the execution paths that could lead to bugs:
>>>>>
>>>>>      (Interrupt context)
>>>>> fw_dump_timer_fn
>>>>>     mwifiex_upload_device_dump
>>>>>       dev_coredumpv(..., GFP_KERNEL)
>>
>> just looking at this description, why isn't the simple fix just to
>> change this call to use GFP_ATOMIC?
> 
> Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
> partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c
> which is not influenced by dev_coredumpv().
> 
>   kobject_set_name_vargs
>     kvasprintf_const(GFP_KERNEL, ...); //may sleep
>     kstrdup(s, GFP_KERNEL); //may sleep

Then it seems there is a problem with dev_coredumpm().

dev_coredumpm() takes a gfp param which means it expects to be called in 
any context, but it then calls dev_set_name() which, as you point out, 
cannot be called from an atomic context.

So if we cannot change the fact that dev_set_name() cannot be called 
from an atomic context, then it would seem to follow that 
dev_coredumpv()/dev_coredumpm() also cannot be called from an atomic 
context and hence their gfp param is pointless and should presumably be 
removed.

/jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-20 16:08         ` Jeff Johnson
@ 2022-05-21  3:21           ` duoming
  2022-05-21  6:32             ` Kalle Valo
  0 siblings, 1 reply; 9+ messages in thread
From: duoming @ 2022-05-21  3:21 UTC (permalink / raw)
  To: Jeff Johnson
  Cc: Kalle Valo, linux-kernel, amitkarwar, ganapathi017,
	sharvari.harisangam, huxinming820, davem, edumazet, kuba, pabeni,
	linux-wireless, netdev

Hello,

On Fri, 20 May 2022 09:08:52 -0700 Jeff Johnson wrote:

> >>>>> There are sleep in atomic context bugs when uploading device dump
> >>>>> data on usb interface. The root cause is that the operations that
> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
> >>>>> The call tree shows the execution paths that could lead to bugs:
> >>>>>
> >>>>>      (Interrupt context)
> >>>>> fw_dump_timer_fn
> >>>>>     mwifiex_upload_device_dump
> >>>>>       dev_coredumpv(..., GFP_KERNEL)
> >>
> >> just looking at this description, why isn't the simple fix just to
> >> change this call to use GFP_ATOMIC?
> > 
> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c
> > which is not influenced by dev_coredumpv().
> > 
> >   kobject_set_name_vargs
> >     kvasprintf_const(GFP_KERNEL, ...); //may sleep
> >     kstrdup(s, GFP_KERNEL); //may sleep
> 
> Then it seems there is a problem with dev_coredumpm().
> 
> dev_coredumpm() takes a gfp param which means it expects to be called in 
> any context, but it then calls dev_set_name() which, as you point out, 
> cannot be called from an atomic context.
> 
> So if we cannot change the fact that dev_set_name() cannot be called 
> from an atomic context, then it would seem to follow that 
> dev_coredumpv also cannot be called from an atomic 
> context and hence their gfp param is pointless and should presumably be 
> removed.

Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and
dev_coredumpm may not be removed, because it could be used to pass value to gfp_t
parameter of kzalloc in dev_coredumpm. What's more, there are also many other places
use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many
places that need to modify and these places are not in interrupt context. 

There are two solutions now: one is to moves the operations that may sleep into a work item.
Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and
change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to 
"in_interrupt() ? GFP_ATOMIC : GFP_KERNEL".

The detail of the first solution is shown below:

diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index 88c72d1827a..cc3f1121eb9 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
                adapter->if_ops.card_reset(adapter);
 }

+static void fw_dump_work(struct work_struct *work)
+{
+       struct mwifiex_adapter *adapter =
+               container_of(work, struct mwifiex_adapter, devdump_work);
+
+       mwifiex_upload_device_dump(adapter);
+}
+
 static void fw_dump_timer_fn(struct timer_list *t)
 {
        struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);

-       mwifiex_upload_device_dump(adapter);
+       schedule_work(&adapter->devdump_work);
 }

 /*
@@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
        adapter->active_scan_triggered = false;
        timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
        adapter->devdump_len = 0;
+       INIT_WORK(&adapter->devdump_work, fw_dump_work);
        timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
 }

@@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
 {
        del_timer(&adapter->wakeup_timer);
        del_timer_sync(&adapter->devdump_timer);
+       cancel_work_sync(&adapter->devdump_work);
        mwifiex_cancel_all_pending_cmd(adapter);
        wake_up_interruptible(&adapter->cmd_wait_q.wait);
        wake_up_interruptible(&adapter->hs_activate_wait_q);
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 332dd1c8db3..c8ac2f57f18 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -900,6 +900,7 @@ struct mwifiex_adapter {
        struct work_struct rx_work;
        struct workqueue_struct *dfs_workqueue;
        struct work_struct dfs_work;
+       struct work_struct devdump_work;
        bool rx_work_enabled;
        bool rx_processing;
        bool delay_main_work;
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 7d42c5d2dbf..8e28d0107d7 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv,

 upload_dump:
        del_timer_sync(&adapter->devdump_timer);
+       cancel_work_sync(&adapter->devdump_work);
        mwifiex_upload_device_dump(adapter);
 }

The detail of the second solution is shown below:

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ace7371c477..258906920a2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
        mwifiex_dbg(adapter, MSG,
                    "== mwifiex dump information to /sys/class/devcoredump start\n");
        dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
-                     GFP_KERNEL);
+                     GFP_ATOMIC);
        mwifiex_dbg(adapter, MSG,
                    "== mwifiex dump information to /sys/class/devcoredump end\n");

diff --git a/lib/kobject.c b/lib/kobject.c
index 5f0e71ab292..dbb2e57ef78 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
        if (kobj->name && !fmt)
                return 0;

-       s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
+       s = kvasprintf_const(in_interrupt() ? GFP_ATOMIC : GFP_KERNEL, fmt, vargs);
        if (!s)
                return -ENOMEM;

@@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
        if (strchr(s, '/')) {
                char *t;

-               t = kstrdup(s, GFP_KERNEL);
+               t = kstrdup(s, in_interrupt() ? GFP_ATOMIC : GFP_KERNEL);
                kfree_const(s);
                if (!t)
                        return -ENOMEM;

Which one do you think is better? I will choose the better one to test.

Best regards,
Duoming Zhou

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-21  3:21           ` duoming
@ 2022-05-21  6:32             ` Kalle Valo
  2022-05-21 13:59               ` duoming
  0 siblings, 1 reply; 9+ messages in thread
From: Kalle Valo @ 2022-05-21  6:32 UTC (permalink / raw)
  To: duoming
  Cc: Jeff Johnson, linux-kernel, amitkarwar, ganapathi017,
	sharvari.harisangam, huxinming820, davem, edumazet, kuba, pabeni,
	linux-wireless, netdev

duoming@zju.edu.cn writes:

> Hello,
>
> On Fri, 20 May 2022 09:08:52 -0700 Jeff Johnson wrote:
>
>> >>>>> There are sleep in atomic context bugs when uploading device dump
>> >>>>> data on usb interface. The root cause is that the operations that
>> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
>> >>>>> The call tree shows the execution paths that could lead to bugs:
>> >>>>>
>> >>>>>      (Interrupt context)
>> >>>>> fw_dump_timer_fn
>> >>>>>     mwifiex_upload_device_dump
>> >>>>>       dev_coredumpv(..., GFP_KERNEL)
>> >>
>> >> just looking at this description, why isn't the simple fix just to
>> >> change this call to use GFP_ATOMIC?
>> > 
>> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
>> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c
>> > which is not influenced by dev_coredumpv().
>> > 
>> >   kobject_set_name_vargs
>> >     kvasprintf_const(GFP_KERNEL, ...); //may sleep
>> >     kstrdup(s, GFP_KERNEL); //may sleep
>> 
>> Then it seems there is a problem with dev_coredumpm().
>> 
>> dev_coredumpm() takes a gfp param which means it expects to be called in 
>> any context, but it then calls dev_set_name() which, as you point out, 
>> cannot be called from an atomic context.
>> 
>> So if we cannot change the fact that dev_set_name() cannot be called 
>> from an atomic context, then it would seem to follow that 
>> dev_coredumpv also cannot be called from an atomic 
>> context and hence their gfp param is pointless and should presumably be 
>> removed.
>
> Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and
> dev_coredumpm may not be removed, because it could be used to pass value to gfp_t
> parameter of kzalloc in dev_coredumpm. What's more, there are also many other places
> use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many
> places that need to modify and these places are not in interrupt
> context.

"Too many users" is not a valid reason to leave a bug in place, either
dev_coredumpv() should support GFP_ATOMIC or the gfp_t parameter should
be removed.

> There are two solutions now: one is to moves the operations that may
> sleep into a work item.

That does not fix the root cause that dev_coredumpv() claims it can be
called in atomic contexts.

> Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and
> change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to 
> "in_interrupt() ? GFP_ATOMIC : GFP_KERNEL".

in_interrupt() is deprecated and should not be used. And I don't think
it detects all atomic contexts like spinlocks.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs
  2022-05-21  6:32             ` Kalle Valo
@ 2022-05-21 13:59               ` duoming
  0 siblings, 0 replies; 9+ messages in thread
From: duoming @ 2022-05-21 13:59 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Jeff Johnson, linux-kernel, amitkarwar, ganapathi017,
	sharvari.harisangam, huxinming820, davem, edumazet, kuba, pabeni,
	linux-wireless, netdev

Hello,

On Sat, 21 May 2022 09:32:37 +0300 Kalle Valo wrote:

> >> >>>>> There are sleep in atomic context bugs when uploading device dump
> >> >>>>> data on usb interface. The root cause is that the operations that
> >> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
> >> >>>>> The call tree shows the execution paths that could lead to bugs:
> >> >>>>>
> >> >>>>>      (Interrupt context)
> >> >>>>> fw_dump_timer_fn
> >> >>>>>     mwifiex_upload_device_dump
> >> >>>>>       dev_coredumpv(..., GFP_KERNEL)
> >> >>
> >> >> just looking at this description, why isn't the simple fix just to
> >> >> change this call to use GFP_ATOMIC?
> >> > 
> >> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
> >> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c
> >> > which is not influenced by dev_coredumpv().
> >> > 
> >> >   kobject_set_name_vargs
> >> >     kvasprintf_const(GFP_KERNEL, ...); //may sleep
> >> >     kstrdup(s, GFP_KERNEL); //may sleep
> >> 
> >> Then it seems there is a problem with dev_coredumpm().
> >> 
> >> dev_coredumpm() takes a gfp param which means it expects to be called in 
> >> any context, but it then calls dev_set_name() which, as you point out, 
> >> cannot be called from an atomic context.
> >> 
> >> So if we cannot change the fact that dev_set_name() cannot be called 
> >> from an atomic context, then it would seem to follow that 
> >> dev_coredumpv also cannot be called from an atomic 
> >> context and hence their gfp param is pointless and should presumably be 
> >> removed.
> >
> > Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and
> > dev_coredumpm may not be removed, because it could be used to pass value to gfp_t
> > parameter of kzalloc in dev_coredumpm. What's more, there are also many other places
> > use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many
> > places that need to modify and these places are not in interrupt
> > context.
> 
> "Too many users" is not a valid reason to leave a bug in place, either
> dev_coredumpv() should support GFP_ATOMIC or the gfp_t parameter should
> be removed.

The following is one method that letting dev_coredump() support GFP_ATOMIC:

I think dev_set_name() is used to allocate memory to set a device name, 
which need only several bytes and there is little chance to sleep in the
real world. However the dev_coredumpv() is used to allocate memory to store
device coredump, which need lots of memory space and have more chances
to sleep. So I think only change the gfp_t parameter of dev_coredumpv()
from GFP_KERNEL to GFP_ATOMIC is ok.

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ace7371c477..258906920a2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
        mwifiex_dbg(adapter, MSG,
                    "== mwifiex dump information to /sys/class/devcoredump start\n");
        dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
-                     GFP_KERNEL);
+                     GFP_ATOMIC);
        mwifiex_dbg(adapter, MSG,
                    "== mwifiex dump information to /sys/class/devcoredump end\n");

> > There are two solutions now: one is to moves the operations that may
> > sleep into a work item.
> 
> That does not fix the root cause that dev_coredumpv() claims it can be
> called in atomic contexts.

I agree with you. There is not GFP_ATOMIC in lib/kobject.c. Should we modify
the gfp_t parameter in kobject.c in order to support atomic contexts? Do you have
any other good methods?

> > Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and
> > change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to 
> > "in_interrupt() ? GFP_ATOMIC : GFP_KERNEL".
> 
> in_interrupt() is deprecated and should not be used. And I don't think
> it detects all atomic contexts like spinlocks.

I agree with you, the in_interrupt() is not proper.

Thanks for your time and suggestions!

Best regards,
Duoming Zhou

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-05-21 14:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 13:53 [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs Duoming Zhou
2022-05-19 14:58 ` Kalle Valo
2022-05-19 15:14   ` duoming
2022-05-19 15:48     ` Jeff Johnson
2022-05-20  0:08       ` duoming
2022-05-20 16:08         ` Jeff Johnson
2022-05-21  3:21           ` duoming
2022-05-21  6:32             ` Kalle Valo
2022-05-21 13:59               ` duoming

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).