* [RFC PATCH net 0/2] amd-xgbe: Sleeping while atomic during hibernate
@ 2019-10-15 13:49 James Morse
2019-10-15 13:49 ` [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock James Morse
2019-10-15 13:49 ` [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() " James Morse
0 siblings, 2 replies; 5+ messages in thread
From: James Morse @ 2019-10-15 13:49 UTC (permalink / raw)
To: netdev; +Cc: Tom Lendacky, Dave S . Miller
Hi guys,
While testing hibernate on Seattle, I ran over these two.
I'm not sure the second one is correct, so I'm hoping someone else will
know how to fix it!
Thanks,
James Morse (2):
amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock
amd-xgbe: Avoid sleeping in napi_disable() while holding a spinlock
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 26 ++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock
2019-10-15 13:49 [RFC PATCH net 0/2] amd-xgbe: Sleeping while atomic during hibernate James Morse
@ 2019-10-15 13:49 ` James Morse
2019-10-15 15:26 ` Lendacky, Thomas
2019-10-15 13:49 ` [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() " James Morse
1 sibling, 1 reply; 5+ messages in thread
From: James Morse @ 2019-10-15 13:49 UTC (permalink / raw)
To: netdev; +Cc: Tom Lendacky, Dave S . Miller
xgbe_powerdown() takes an irqsave spinlock, then calls flush_workqueue()
which takes a mutex. DEBUG_ATOMIC_SLEEP isn't happy about this:
| BUG: sleeping function called from invalid context at [...] mutex.c:281
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 2733, name: bash
| CPU: 4 PID: 2733 Comm: bash Tainted: G W 5.4.0-rc3 #113
| Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
| Call trace:
| show_stack+0x24/0x30
| dump_stack+0xb0/0xf8
| ___might_sleep+0x124/0x148
| __might_sleep+0x54/0x90
| mutex_lock+0x2c/0x80
| flush_workqueue+0x84/0x420
| xgbe_powerdown+0x6c/0x108
| xgbe_platform_suspend+0x34/0x80
| pm_generic_freeze+0x3c/0x58
| acpi_subsys_freeze+0x2c/0x38
| dpm_run_callback+0x3c/0x1e8
| __device_suspend+0x130/0x468
| dpm_suspend+0x114/0x388
| hibernation_snapshot+0xe8/0x378
| hibernate+0x18c/0x2f8
Drop the lock for the duration of xgbe_powerdown(). We have already
stopeed the timers, so service_work can't be re-queued. Move the
pdata->power_down flag earlier so that it can be used by the interrupt
handler to know not to re-queue the tx_tstamp_work.
Signed-off-by: James Morse <james.morse@arm.com>
---
RFC as I'm not familiar with this driver. I'm happy to test a better fix!
---
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 98f8f2033154..bfba7effcf9f 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -480,6 +480,8 @@ static void xgbe_isr_task(unsigned long data)
struct xgbe_channel *channel;
unsigned int dma_isr, dma_ch_isr;
unsigned int mac_isr, mac_tssr, mac_mdioisr;
+ unsigned long flags;
+ bool power_down;
unsigned int i;
/* The DMA interrupt status register also reports MAC and MTL
@@ -558,8 +560,14 @@ static void xgbe_isr_task(unsigned long data)
/* Read Tx Timestamp to clear interrupt */
pdata->tx_tstamp =
hw_if->get_tx_tstamp(pdata);
- queue_work(pdata->dev_workqueue,
- &pdata->tx_tstamp_work);
+
+ spin_lock_irqsave(&pdata->lock, flags);
+ power_down = pdata->power_down;
+ spin_unlock_irqrestore(&pdata->lock, flags);
+
+ if (!power_down)
+ queue_work(pdata->dev_workqueue,
+ &pdata->tx_tstamp_work);
}
}
@@ -1256,16 +1264,22 @@ int xgbe_powerdown(struct net_device *netdev, unsigned int caller)
netif_tx_stop_all_queues(netdev);
+ /* Stop service_work being re-queued by the service_timer */
xgbe_stop_timers(pdata);
+
+ /* Stop tx_tstamp_work being re-queued after flush_workqueue() */
+ pdata->power_down = 1;
+
+ /* drop the lock to allow flush_workqueue() to sleep. */
+ spin_unlock_irqrestore(&pdata->lock, flags);
flush_workqueue(pdata->dev_workqueue);
+ spin_lock_irqsave(&pdata->lock, flags);
hw_if->powerdown_tx(pdata);
hw_if->powerdown_rx(pdata);
xgbe_napi_disable(pdata, 0);
- pdata->power_down = 1;
-
spin_unlock_irqrestore(&pdata->lock, flags);
DBGPR("<--xgbe_powerdown\n");
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() while holding a spinlock
2019-10-15 13:49 [RFC PATCH net 0/2] amd-xgbe: Sleeping while atomic during hibernate James Morse
2019-10-15 13:49 ` [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock James Morse
@ 2019-10-15 13:49 ` James Morse
2019-10-15 15:41 ` Lendacky, Thomas
1 sibling, 1 reply; 5+ messages in thread
From: James Morse @ 2019-10-15 13:49 UTC (permalink / raw)
To: netdev; +Cc: Tom Lendacky, Dave S . Miller
xgbe_powerdown() takes an irqsave spinlock, then calls napi_disable()
via xgbe_napi_disable(). napi_disable() might call msleep().
DEBUG_ATOMIC_SLEEP isn't happy about this:
| BUG: sleeping function called from invalid context at ../net/core/dev.c:6332
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 2831, name: bash
| CPU: 3 PID: 2831 Comm: bash Tainted: G W 5.4.0-rc3-00001-g9dbe793f263b #114
| Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
| Call trace:
| dump_backtrace+0x0/0x160
| show_stack+0x24/0x30
| dump_stack+0xb0/0xf8
| ___might_sleep+0x124/0x148
| __might_sleep+0x54/0x90
| napi_disable+0x48/0x140
| xgbe_napi_disable+0x64/0xc0
| xgbe_powerdown+0xb0/0x120
| xgbe_platform_suspend+0x34/0x80
| pm_generic_freeze+0x3c/0x58
| acpi_subsys_freeze+0x2c/0x38
| dpm_run_callback+0x3c/0x1e8
| __device_suspend+0x130/0x468
| dpm_suspend+0x114/0x388
| hibernation_snapshot+0xe8/0x378
| hibernate+0x18c/0x2f8
Move xgbe_napi_disable() outside the spin_lock()d region of
xgbe_powerdown(). This matches its use in xgbe_stop() ... but this
might only be safe because of the earlier call to xgbe_free_irqs().
Signed-off-by: James Morse <james.morse@arm.com>
---
RFC as I'm not familiar with this driver. I'm happy to test a better fix!
---
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index bfba7effcf9f..a6e6c21e921f 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1278,10 +1278,10 @@ int xgbe_powerdown(struct net_device *netdev, unsigned int caller)
hw_if->powerdown_tx(pdata);
hw_if->powerdown_rx(pdata);
- xgbe_napi_disable(pdata, 0);
-
spin_unlock_irqrestore(&pdata->lock, flags);
+ xgbe_napi_disable(pdata, 0);
+
DBGPR("<--xgbe_powerdown\n");
return 0;
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock
2019-10-15 13:49 ` [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock James Morse
@ 2019-10-15 15:26 ` Lendacky, Thomas
0 siblings, 0 replies; 5+ messages in thread
From: Lendacky, Thomas @ 2019-10-15 15:26 UTC (permalink / raw)
To: James Morse, netdev; +Cc: Dave S . Miller
On 10/15/19 8:49 AM, James Morse wrote:
> xgbe_powerdown() takes an irqsave spinlock, then calls flush_workqueue()
> which takes a mutex. DEBUG_ATOMIC_SLEEP isn't happy about this:
> | BUG: sleeping function called from invalid context at [...] mutex.c:281
> | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 2733, name: bash
> | CPU: 4 PID: 2733 Comm: bash Tainted: G W 5.4.0-rc3 #113
> | Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
> | Call trace:
> | show_stack+0x24/0x30
> | dump_stack+0xb0/0xf8
> | ___might_sleep+0x124/0x148
> | __might_sleep+0x54/0x90
> | mutex_lock+0x2c/0x80
> | flush_workqueue+0x84/0x420
> | xgbe_powerdown+0x6c/0x108
> | xgbe_platform_suspend+0x34/0x80
> | pm_generic_freeze+0x3c/0x58
> | acpi_subsys_freeze+0x2c/0x38
> | dpm_run_callback+0x3c/0x1e8
> | __device_suspend+0x130/0x468
> | dpm_suspend+0x114/0x388
> | hibernation_snapshot+0xe8/0x378
> | hibernate+0x18c/0x2f8
>
> Drop the lock for the duration of xgbe_powerdown(). We have already
> stopeed the timers, so service_work can't be re-queued. Move the
> pdata->power_down flag earlier so that it can be used by the interrupt
> handler to know not to re-queue the tx_tstamp_work.
>
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> RFC as I'm not familiar with this driver. I'm happy to test a better fix!
> ---
> drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 98f8f2033154..bfba7effcf9f 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -480,6 +480,8 @@ static void xgbe_isr_task(unsigned long data)
> struct xgbe_channel *channel;
> unsigned int dma_isr, dma_ch_isr;
> unsigned int mac_isr, mac_tssr, mac_mdioisr;
> + unsigned long flags;
> + bool power_down;
> unsigned int i;
>
> /* The DMA interrupt status register also reports MAC and MTL
> @@ -558,8 +560,14 @@ static void xgbe_isr_task(unsigned long data)
> /* Read Tx Timestamp to clear interrupt */
> pdata->tx_tstamp =
> hw_if->get_tx_tstamp(pdata);
> - queue_work(pdata->dev_workqueue,
> - &pdata->tx_tstamp_work);
> +
> + spin_lock_irqsave(&pdata->lock, flags);
> + power_down = pdata->power_down;
> + spin_unlock_irqrestore(&pdata->lock, flags);
> +
> + if (!power_down)
> + queue_work(pdata->dev_workqueue,
> + &pdata->tx_tstamp_work);
Hmm, I think this could race. You probably want to protect queue_work()
with the spin lock, too, in which case you won't need the power_down var.
> }
> }
>
> @@ -1256,16 +1264,22 @@ int xgbe_powerdown(struct net_device *netdev, unsigned int caller)
>
> netif_tx_stop_all_queues(netdev);
>
> + /* Stop service_work being re-queued by the service_timer */
> xgbe_stop_timers(pdata);
> +
> + /* Stop tx_tstamp_work being re-queued after flush_workqueue() */
> + pdata->power_down = 1;
You can probably move this up even further, just after grabbing the lock.
> +
> + /* drop the lock to allow flush_workqueue() to sleep. */
> + spin_unlock_irqrestore(&pdata->lock, flags);
> flush_workqueue(pdata->dev_workqueue);
Rather than dropping and reacquiring the lock, I think you can get away
with moving this to after dropping the lock below.
Thanks,
Tom
> + spin_lock_irqsave(&pdata->lock, flags);
>
> hw_if->powerdown_tx(pdata);
> hw_if->powerdown_rx(pdata);
>
> xgbe_napi_disable(pdata, 0);
>
> - pdata->power_down = 1;
> -
> spin_unlock_irqrestore(&pdata->lock, flags);
>
> DBGPR("<--xgbe_powerdown\n");
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() while holding a spinlock
2019-10-15 13:49 ` [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() " James Morse
@ 2019-10-15 15:41 ` Lendacky, Thomas
0 siblings, 0 replies; 5+ messages in thread
From: Lendacky, Thomas @ 2019-10-15 15:41 UTC (permalink / raw)
To: James Morse, netdev; +Cc: Dave S . Miller
On 10/15/19 8:49 AM, James Morse wrote:
> xgbe_powerdown() takes an irqsave spinlock, then calls napi_disable()
> via xgbe_napi_disable(). napi_disable() might call msleep().
> DEBUG_ATOMIC_SLEEP isn't happy about this:
> | BUG: sleeping function called from invalid context at ../net/core/dev.c:6332
> | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 2831, name: bash
> | CPU: 3 PID: 2831 Comm: bash Tainted: G W 5.4.0-rc3-00001-g9dbe793f263b #114
> | Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT)
> | Call trace:
> | dump_backtrace+0x0/0x160
> | show_stack+0x24/0x30
> | dump_stack+0xb0/0xf8
> | ___might_sleep+0x124/0x148
> | __might_sleep+0x54/0x90
> | napi_disable+0x48/0x140
> | xgbe_napi_disable+0x64/0xc0
> | xgbe_powerdown+0xb0/0x120
> | xgbe_platform_suspend+0x34/0x80
> | pm_generic_freeze+0x3c/0x58
> | acpi_subsys_freeze+0x2c/0x38
> | dpm_run_callback+0x3c/0x1e8
> | __device_suspend+0x130/0x468
> | dpm_suspend+0x114/0x388
> | hibernation_snapshot+0xe8/0x378
> | hibernate+0x18c/0x2f8
>
> Move xgbe_napi_disable() outside the spin_lock()d region of
> xgbe_powerdown(). This matches its use in xgbe_stop() ... but this
> might only be safe because of the earlier call to xgbe_free_irqs().
>
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> RFC as I'm not familiar with this driver. I'm happy to test a better fix!
> ---
> drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index bfba7effcf9f..a6e6c21e921f 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -1278,10 +1278,10 @@ int xgbe_powerdown(struct net_device *netdev, unsigned int caller)
> hw_if->powerdown_tx(pdata);
> hw_if->powerdown_rx(pdata);
>
> - xgbe_napi_disable(pdata, 0);
> -
> spin_unlock_irqrestore(&pdata->lock, flags);
>
> + xgbe_napi_disable(pdata, 0);
> +
As far as I can tell, I think this is safe. Whether inside or outside the
spinlock doesn't make a difference to the interrupt routine since it
doesn't acquire this lock. And the suspend/resume functions can't be
called at the same time.
Thanks,
Tom
> DBGPR("<--xgbe_powerdown\n");
>
> return 0;
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-10-15 15:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15 13:49 [RFC PATCH net 0/2] amd-xgbe: Sleeping while atomic during hibernate James Morse
2019-10-15 13:49 ` [RFC PATCH net 1/2] amd-xgbe: Avoid sleeping in flush_workqueue() while holding a spinlock James Morse
2019-10-15 15:26 ` Lendacky, Thomas
2019-10-15 13:49 ` [RFC PATCH net 2/2] amd-xgbe: Avoid sleeping in napi_disable() " James Morse
2019-10-15 15:41 ` Lendacky, Thomas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).