* [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
@ 2022-02-16 17:50 ` Sebastian Andrzej Siewior
2022-02-17 6:35 ` Marek Szyprowski
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-16 17:50 UTC (permalink / raw)
To: Marek Szyprowski
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
I missed the obvious case where netif_ix() is invoked from hard-IRQ
context.
Disabling bottom halves is only needed in process context. This ensures
that the code remains on the current CPU and that the soft-interrupts
are processed at local_bh_enable() time.
In hard- and soft-interrupt context this is already the case and the
soft-interrupts will be processed once the context is left (at irq-exit
time).
Disable bottom halves if neither hard-interrupts nor soft-interrupts are
disabled. Update the kernel-doc, mention that interrupts must be enabled
if invoked from process context.
Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Marek, does this work for you?
net/core/dev.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 909fb38159108..87729491460fc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4860,7 +4860,9 @@ EXPORT_SYMBOL(__netif_rx);
* congestion control or by the protocol layers.
* The network buffer is passed via the backlog NAPI device. Modern NIC
* driver should use NAPI and GRO.
- * This function can used from any context.
+ * This function can used from interrupt and from process context. The
+ * caller from process context must not disable interrupts before invoking
+ * this function.
*
* return values:
* NET_RX_SUCCESS (no congestion)
@@ -4870,12 +4872,15 @@ EXPORT_SYMBOL(__netif_rx);
int netif_rx(struct sk_buff *skb)
{
int ret;
+ bool need_bh_off = !(hardirq_count() | softirq_count());
- local_bh_disable();
+ if (need_bh_off)
+ local_bh_disable();
trace_netif_rx_entry(skb);
ret = netif_rx_internal(skb);
trace_netif_rx_exit(ret);
- local_bh_enable();
+ if (need_bh_off)
+ local_bh_enable();
return ret;
}
EXPORT_SYMBOL(netif_rx);
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-16 17:50 ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
@ 2022-02-17 6:35 ` Marek Szyprowski
2022-02-17 14:08 ` Marek Szyprowski
2022-02-23 7:56 ` Geert Uytterhoeven
2022-02-23 8:05 ` Sebastian Andrzej Siewior
2 siblings, 1 reply; 11+ messages in thread
From: Marek Szyprowski @ 2022-02-17 6:35 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
Hi Andrzej,
On 16.02.2022 18:50, Sebastian Andrzej Siewior wrote:
> I missed the obvious case where netif_ix() is invoked from hard-IRQ
> context.
>
> Disabling bottom halves is only needed in process context. This ensures
> that the code remains on the current CPU and that the soft-interrupts
> are processed at local_bh_enable() time.
> In hard- and soft-interrupt context this is already the case and the
> soft-interrupts will be processed once the context is left (at irq-exit
> time).
>
> Disable bottom halves if neither hard-interrupts nor soft-interrupts are
> disabled. Update the kernel-doc, mention that interrupts must be enabled
> if invoked from process context.
>
> Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Marek, does this work for you?
Yes, this fixed the issue. Thanks!
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> net/core/dev.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 909fb38159108..87729491460fc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4860,7 +4860,9 @@ EXPORT_SYMBOL(__netif_rx);
> * congestion control or by the protocol layers.
> * The network buffer is passed via the backlog NAPI device. Modern NIC
> * driver should use NAPI and GRO.
> - * This function can used from any context.
> + * This function can used from interrupt and from process context. The
> + * caller from process context must not disable interrupts before invoking
> + * this function.
> *
> * return values:
> * NET_RX_SUCCESS (no congestion)
> @@ -4870,12 +4872,15 @@ EXPORT_SYMBOL(__netif_rx);
> int netif_rx(struct sk_buff *skb)
> {
> int ret;
> + bool need_bh_off = !(hardirq_count() | softirq_count());
>
> - local_bh_disable();
> + if (need_bh_off)
> + local_bh_disable();
> trace_netif_rx_entry(skb);
> ret = netif_rx_internal(skb);
> trace_netif_rx_exit(ret);
> - local_bh_enable();
> + if (need_bh_off)
> + local_bh_enable();
> return ret;
> }
> EXPORT_SYMBOL(netif_rx);
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-17 6:35 ` Marek Szyprowski
@ 2022-02-17 14:08 ` Marek Szyprowski
2022-02-17 14:21 ` Sebastian Andrzej Siewior
2022-02-18 9:34 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 11+ messages in thread
From: Marek Szyprowski @ 2022-02-17 14:08 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
Hi All,
On 17.02.2022 07:35, Marek Szyprowski wrote:
> Hi Andrzej,
>
> On 16.02.2022 18:50, Sebastian Andrzej Siewior wrote:
>> I missed the obvious case where netif_ix() is invoked from hard-IRQ
>> context.
>>
>> Disabling bottom halves is only needed in process context. This ensures
>> that the code remains on the current CPU and that the soft-interrupts
>> are processed at local_bh_enable() time.
>> In hard- and soft-interrupt context this is already the case and the
>> soft-interrupts will be processed once the context is left (at irq-exit
>> time).
>>
>> Disable bottom halves if neither hard-interrupts nor soft-interrupts are
>> disabled. Update the kernel-doc, mention that interrupts must be enabled
>> if invoked from process context.
>>
>> Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked
>> in any context.")
>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>> Marek, does this work for you?
>
> Yes, this fixed the issue. Thanks!
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
I've just noticed that there is one more issue left to fix (the $subject
patch is already applied) - this one comes from threaded irq (if I got
the stack trace right):
------------[ cut here ]------------
WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363
__local_bh_enable_ip+0xa8/0x1c0
Modules linked in: cpufreq_powersave cpufreq_conservative brcmfmac
brcmutil cfg80211 s3fwrn5_i2c s3fwrn5 nci crct10dif_ce exynos_gsc nfc
s5p_jpeg hci_uart btqca s
5p_mfc v4l2_mem2mem btbcm bluetooth videobuf2_dma_contig
videobuf2_memops ecdh_generic videobuf2_v4l2 panfrost drm_shmem_helper
ecc videobuf2_common gpu_sched rfkill videodev mc ip_tables x_tab
les ipv6
CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+
#4557
Hardware name: Samsung TM2E board (DT)
pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __local_bh_enable_ip+0xa8/0x1c0
lr : netif_rx+0xa4/0x2c0
...
Call trace:
__local_bh_enable_ip+0xa8/0x1c0
netif_rx+0xa4/0x2c0
rx_complete+0x214/0x250
usb_gadget_giveback_request+0x58/0x170
dwc3_gadget_giveback+0xe4/0x200
dwc3_gadget_endpoint_trbs_complete+0x100/0x388
dwc3_thread_interrupt+0x46c/0xe20
irq_thread_fn+0x28/0x98
irq_thread+0x184/0x238
kthread+0x100/0x120
ret_from_fork+0x10/0x20
irq event stamp: 645
hardirqs last enabled at (643): [<ffff8000080c93b8>]
finish_task_switch+0x98/0x288
hardirqs last disabled at (644): [<ffff8000090a6e34>]
_raw_spin_lock_irqsave+0xb4/0x148
softirqs last enabled at (252): [<ffff800008010488>] _stext+0x488/0x5cc
softirqs last disabled at (645): [<ffff800008ed71b0>] netif_rx+0x188/0x2c0
---[ end trace 0000000000000000 ]---
>
>> net/core/dev.c | 11 ++++++++---
>> 1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 909fb38159108..87729491460fc 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4860,7 +4860,9 @@ EXPORT_SYMBOL(__netif_rx);
>> * congestion control or by the protocol layers.
>> * The network buffer is passed via the backlog NAPI device.
>> Modern NIC
>> * driver should use NAPI and GRO.
>> - * This function can used from any context.
>> + * This function can used from interrupt and from process
>> context. The
>> + * caller from process context must not disable interrupts before
>> invoking
>> + * this function.
>> *
>> * return values:
>> * NET_RX_SUCCESS (no congestion)
>> @@ -4870,12 +4872,15 @@ EXPORT_SYMBOL(__netif_rx);
>> int netif_rx(struct sk_buff *skb)
>> {
>> int ret;
>> + bool need_bh_off = !(hardirq_count() | softirq_count());
>> - local_bh_disable();
>> + if (need_bh_off)
>> + local_bh_disable();
>> trace_netif_rx_entry(skb);
>> ret = netif_rx_internal(skb);
>> trace_netif_rx_exit(ret);
>> - local_bh_enable();
>> + if (need_bh_off)
>> + local_bh_enable();
>> return ret;
>> }
>> EXPORT_SYMBOL(netif_rx);
>
> Best regards
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-17 14:08 ` Marek Szyprowski
@ 2022-02-17 14:21 ` Sebastian Andrzej Siewior
2022-02-17 14:44 ` Marek Szyprowski
2022-02-18 9:34 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-17 14:21 UTC (permalink / raw)
To: Marek Szyprowski
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
> Hi All,
Hi,
> I've just noticed that there is one more issue left to fix (the $subject
> patch is already applied) - this one comes from threaded irq (if I got
> the stack trace right):
This is not `threadirqs' on the command line, right?
Sebastian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-17 14:21 ` Sebastian Andrzej Siewior
@ 2022-02-17 14:44 ` Marek Szyprowski
0 siblings, 0 replies; 11+ messages in thread
From: Marek Szyprowski @ 2022-02-17 14:44 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
On 17.02.2022 15:21, Sebastian Andrzej Siewior wrote:
> On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
>> I've just noticed that there is one more issue left to fix (the $subject
>> patch is already applied) - this one comes from threaded irq (if I got
>> the stack trace right):
> This is not `threadirqs' on the command line, right?
I don't have 'threadirqs' in the kernel cmdline for that board.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-17 14:08 ` Marek Szyprowski
2022-02-17 14:21 ` Sebastian Andrzej Siewior
@ 2022-02-18 9:34 ` Sebastian Andrzej Siewior
2022-02-18 10:02 ` Marek Szyprowski
1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-18 9:34 UTC (permalink / raw)
To: Marek Szyprowski
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen, Felipe Balbi, linux-usb
On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
> Hi All,
Hi,
> >> Marek, does this work for you?
> >
> > Yes, this fixed the issue. Thanks!
> >
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> I've just noticed that there is one more issue left to fix (the $subject
> patch is already applied) - this one comes from threaded irq (if I got
> the stack trace right):
netif_rx() did only set the matching softirq bit and not more. Based on
that I don't see why NOHZ shouldn't complain about a pending softirq
once the CPU goes idle. Therefore I think the change I made is good
since it uncovered that.
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363
> __local_bh_enable_ip+0xa8/0x1c0
…
> CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+
> #4557
> Hardware name: Samsung TM2E board (DT)
> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __local_bh_enable_ip+0xa8/0x1c0
> lr : netif_rx+0xa4/0x2c0
> ...
>
> Call trace:
> __local_bh_enable_ip+0xa8/0x1c0
> netif_rx+0xa4/0x2c0
> rx_complete+0x214/0x250
> usb_gadget_giveback_request+0x58/0x170
> dwc3_gadget_giveback+0xe4/0x200
> dwc3_gadget_endpoint_trbs_complete+0x100/0x388
> dwc3_thread_interrupt+0x46c/0xe20
So dwc3_thread_interrupt() disables interrupts here. Felipe dropped it
and then added it back in
e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")
I would suggest to revert it (the above commit) and fixing the lockdep
splat in the gadget driver and other. I don't see the g_ether warning
Felipe mentioned. It might come from f_ncm (since it uses a timer) or
something else in the network stack (that uses a timeout timer).
But not now.
As much as I hate it, I suggest:
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 183b90923f51b..a0c883f19a417 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4160,9 +4160,11 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt)
unsigned long flags;
irqreturn_t ret = IRQ_NONE;
+ local_bh_disable();
spin_lock_irqsave(&dwc->lock, flags);
ret = dwc3_process_event_buf(evt);
spin_unlock_irqrestore(&dwc->lock, flags);
+ local_bh_enable();
return ret;
}
In the long run I would drop that irqsave (along with bh_disable() since
netif_rx() covers that) and make sure the there is no lockdep warning
popping up.
Marek, could you please give it a try?
Sebastian
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-18 9:34 ` Sebastian Andrzej Siewior
@ 2022-02-18 10:02 ` Marek Szyprowski
2022-02-18 10:13 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 11+ messages in thread
From: Marek Szyprowski @ 2022-02-18 10:02 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen, Felipe Balbi, linux-usb
Hi,
On 18.02.2022 10:34, Sebastian Andrzej Siewior wrote:
> On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
>>>> Marek, does this work for you?
>>> Yes, this fixed the issue. Thanks!
>>>
>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> I've just noticed that there is one more issue left to fix (the $subject
>> patch is already applied) - this one comes from threaded irq (if I got
>> the stack trace right):
> netif_rx() did only set the matching softirq bit and not more. Based on
> that I don't see why NOHZ shouldn't complain about a pending softirq
> once the CPU goes idle. Therefore I think the change I made is good
> since it uncovered that.
>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363
>> __local_bh_enable_ip+0xa8/0x1c0
> …
>> CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+
>> #4557
>> Hardware name: Samsung TM2E board (DT)
>> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : __local_bh_enable_ip+0xa8/0x1c0
>> lr : netif_rx+0xa4/0x2c0
>> ...
>>
>> Call trace:
>> __local_bh_enable_ip+0xa8/0x1c0
>> netif_rx+0xa4/0x2c0
>> rx_complete+0x214/0x250
>> usb_gadget_giveback_request+0x58/0x170
>> dwc3_gadget_giveback+0xe4/0x200
>> dwc3_gadget_endpoint_trbs_complete+0x100/0x388
>> dwc3_thread_interrupt+0x46c/0xe20
> So dwc3_thread_interrupt() disables interrupts here. Felipe dropped it
> and then added it back in
> e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")
>
> I would suggest to revert it (the above commit) and fixing the lockdep
> splat in the gadget driver and other. I don't see the g_ether warning
> Felipe mentioned. It might come from f_ncm (since it uses a timer) or
> something else in the network stack (that uses a timeout timer).
> But not now.
> As much as I hate it, I suggest:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 183b90923f51b..a0c883f19a417 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4160,9 +4160,11 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt)
> unsigned long flags;
> irqreturn_t ret = IRQ_NONE;
>
> + local_bh_disable();
> spin_lock_irqsave(&dwc->lock, flags);
> ret = dwc3_process_event_buf(evt);
> spin_unlock_irqrestore(&dwc->lock, flags);
> + local_bh_enable();
>
> return ret;
> }
>
>
> In the long run I would drop that irqsave (along with bh_disable() since
> netif_rx() covers that) and make sure the there is no lockdep warning
> popping up.
>
> Marek, could you please give it a try?
Yes. The above change fixes the issue.
I've also tried to revert the mentioned commit e5f68b4a3e7b, but this
gives me the following lockdep warning:
IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
========================================================
WARNING: possible irq lock inversion dependency detected
5.17.0-rc4-next-20220217+ #4563 Not tainted
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
ffff000025548098 (_xmit_ETHER#2){+.-.}-{2:2}, at:
sch_direct_xmit+0x2f0/0x360
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&dev->lock#2){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->lock#2);
local_irq_disable();
lock(_xmit_ETHER#2);
lock(&dev->lock#2);
<Interrupt>
lock(_xmit_ETHER#2);
*** DEADLOCK ***
5 locks held by swapper/0/0:
#0: ffff80000a5f39e0 ((&ndev->rs_timer)){+.-.}-{0:0}, at:
call_timer_fn+0x0/0x3b0
#1: ffff80000a695e00 (rcu_read_lock){....}-{1:2}, at:
ndisc_send_skb+0x138/0x730 [ipv6]
#2: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at:
ip6_finish_output2+0x94/0xbe8 [ipv6]
#3: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at:
__dev_queue_xmit+0x64/0xec8
#4: ffff0000246cd258 (dev->qdisc_tx_busylock ?:
&qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x9a4/0xec8
the shortest dependencies between 2nd lock and 1st lock:
-> (&dev->lock#2){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
gether_connect+0x118/0x1c0
ecm_set_alt+0xf4/0x170
composite_setup+0x8d0/0x1820
dwc3_ep0_delegate_req+0x40/0x68
dwc3_ep0_interrupt+0x5a4/0xf88
dwc3_thread_interrupt+0xb7c/0xe18
irq_thread_fn+0x28/0x98
irq_thread+0x184/0x238
kthread+0x100/0x120
ret_from_fork+0x10/0x20
SOFTIRQ-ON-W at:
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
gether_connect+0x118/0x1c0
ecm_set_alt+0xf4/0x170
composite_setup+0x8d0/0x1820
dwc3_ep0_delegate_req+0x40/0x68
dwc3_ep0_interrupt+0x5a4/0xf88
dwc3_thread_interrupt+0xb7c/0xe18
irq_thread_fn+0x28/0x98
irq_thread+0x184/0x238
kthread+0x100/0x120
ret_from_fork+0x10/0x20
INITIAL USE at:
lock_acquire+0x120/0x3c8
_raw_spin_lock_irq+0x74/0xa0
eth_open+0x28/0x78
__dev_open+0x100/0x190
__dev_change_flags+0x16c/0x1b8
dev_change_flags+0x20/0x60
do_setlink+0x314/0xd28
__rtnl_newlink+0x418/0x798
rtnl_newlink+0x4c/0x78
rtnetlink_rcv_msg+0x2bc/0x4e0
netlink_rcv_skb+0xe8/0x130
rtnetlink_rcv+0x14/0x20
netlink_unicast+0x1d4/0x280
netlink_sendmsg+0x2cc/0x408
____sys_sendmsg+0x258/0x290
___sys_sendmsg+0x80/0xc0
__sys_sendmsg+0x60/0xb8
__arm64_sys_sendmsg+0x1c/0x28
invoke_syscall+0x40/0xf8
el0_svc_common.constprop.3+0x8c/0x120
do_el0_svc+0x20/0x98
el0_svc+0x48/0x100
el0t_64_sync_handler+0x8c/0xb0
el0t_64_sync+0x15c/0x160
}
... key at: [<ffff80000b5004b8>] __key.60862+0x0/0x10
... acquired at:
_raw_spin_lock_irqsave+0x78/0x148
eth_start_xmit+0x30/0x3a0
dev_hard_start_xmit+0x108/0x3d8
sch_direct_xmit+0xf4/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
neigh_resolve_output+0x18c/0x290
ip6_finish_output2+0x20c/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
mld_sendpack+0x570/0x5a0 [ipv6]
mld_ifc_work+0x2a4/0x4a0 [ipv6]
process_one_work+0x29c/0x6b0
worker_thread+0x48/0x420
kthread+0x100/0x120
ret_from_fork+0x10/0x20
-> (_xmit_ETHER#2){+.-.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
sch_direct_xmit+0x2f0/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
neigh_resolve_output+0x18c/0x290
ip6_finish_output2+0x20c/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
mld_sendpack+0x570/0x5a0 [ipv6]
mld_ifc_work+0x2a4/0x4a0 [ipv6]
process_one_work+0x29c/0x6b0
worker_thread+0x48/0x420
kthread+0x100/0x120
ret_from_fork+0x10/0x20
IN-SOFTIRQ-W at:
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
sch_direct_xmit+0x2f0/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
ip6_finish_output2+0x8bc/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
ndisc_send_skb+0x464/0x730 [ipv6]
ndisc_send_rs+0x58/0x138 [ipv6]
addrconf_rs_timer+0x140/0x218 [ipv6]
call_timer_fn+0xb4/0x3b0
run_timer_softirq+0x290/0x678
_stext+0x11c/0x5cc
irq_exit_rcu+0x168/0x1a8
el1_interrupt+0x40/0x128
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x64/0x68
arch_cpu_idle+0x14/0x20
default_idle_call+0x78/0x350
do_idle+0x1f0/0x280
cpu_startup_entry+0x24/0x80
rest_init+0x180/0x290
arch_call_rest_init+0xc/0x14
start_kernel+0x694/0x6cc
__primary_switched+0xc0/0xc8
INITIAL USE at:
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
sch_direct_xmit+0x2f0/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
neigh_resolve_output+0x18c/0x290
ip6_finish_output2+0x20c/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
mld_sendpack+0x570/0x5a0 [ipv6]
mld_ifc_work+0x2a4/0x4a0 [ipv6]
process_one_work+0x29c/0x6b0
worker_thread+0x48/0x420
kthread+0x100/0x120
ret_from_fork+0x10/0x20
}
... key at: [<ffff80000b507ca8>] netdev_xmit_lock_key+0x10/0x390
... acquired at:
__lock_acquire+0x564/0x16f8
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
sch_direct_xmit+0x2f0/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
ip6_finish_output2+0x8bc/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
ndisc_send_skb+0x464/0x730 [ipv6]
ndisc_send_rs+0x58/0x138 [ipv6]
addrconf_rs_timer+0x140/0x218 [ipv6]
call_timer_fn+0xb4/0x3b0
run_timer_softirq+0x290/0x678
_stext+0x11c/0x5cc
irq_exit_rcu+0x168/0x1a8
el1_interrupt+0x40/0x128
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x64/0x68
arch_cpu_idle+0x14/0x20
default_idle_call+0x78/0x350
do_idle+0x1f0/0x280
cpu_startup_entry+0x24/0x80
rest_init+0x180/0x290
arch_call_rest_init+0xc/0x14
start_kernel+0x694/0x6cc
__primary_switched+0xc0/0xc8
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc4-next-20220217+ #4563
Hardware name: Samsung TM2E board (DT)
Call trace:
dump_backtrace.part.6+0xd8/0xe8
show_stack+0x14/0x60
dump_stack_lvl+0x88/0xb0
dump_stack+0x14/0x2c
print_irq_inversion_bug+0x194/0x1f8
mark_lock.part.52+0x29c/0x440
__lock_acquire+0x564/0x16f8
lock_acquire+0x120/0x3c8
_raw_spin_lock+0x58/0x78
sch_direct_xmit+0x2f0/0x360
__dev_queue_xmit+0xa10/0xec8
dev_queue_xmit+0x10/0x18
ip6_finish_output2+0x8bc/0xbe8 [ipv6]
__ip6_finish_output+0x104/0x2b8 [ipv6]
ip6_finish_output+0x30/0x108 [ipv6]
ip6_output+0x80/0x360 [ipv6]
ndisc_send_skb+0x464/0x730 [ipv6]
ndisc_send_rs+0x58/0x138 [ipv6]
addrconf_rs_timer+0x140/0x218 [ipv6]
call_timer_fn+0xb4/0x3b0
run_timer_softirq+0x290/0x678
_stext+0x11c/0x5cc
irq_exit_rcu+0x168/0x1a8
el1_interrupt+0x40/0x128
el1h_64_irq_handler+0x14/0x20
el1h_64_irq+0x64/0x68
arch_cpu_idle+0x14/0x20
default_idle_call+0x78/0x350
do_idle+0x1f0/0x280
cpu_startup_entry+0x24/0x80
rest_init+0x180/0x290
arch_call_rest_init+0xc/0x14
start_kernel+0x694/0x6cc
__primary_switched+0xc0/0xc8
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-18 10:02 ` Marek Szyprowski
@ 2022-02-18 10:13 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-18 10:13 UTC (permalink / raw)
To: Marek Szyprowski
Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen, Felipe Balbi, linux-usb
On 2022-02-18 11:02:35 [+0100], Marek Szyprowski wrote:
> Hi,
Hi,
> > Marek, could you please give it a try?
>
> Yes. The above change fixes the issue.
Awesome. I will make a patch and send today.
> I've also tried to revert the mentioned commit e5f68b4a3e7b, but this
> gives me the following lockdep warning:
Oh oh. Thank you for that splat.
Sebastian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-16 17:50 ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
2022-02-17 6:35 ` Marek Szyprowski
@ 2022-02-23 7:56 ` Geert Uytterhoeven
2022-02-23 8:05 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 11+ messages in thread
From: Geert Uytterhoeven @ 2022-02-23 7:56 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Marek Szyprowski, bpf, netdev, David S. Miller,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Thomas Gleixner, Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
Hi Sebastian,
On Wed, 16 Feb 2022, Sebastian Andrzej Siewior wrote:
> I missed the obvious case where netif_ix() is invoked from hard-IRQ
> context.
>
> Disabling bottom halves is only needed in process context. This ensures
> that the code remains on the current CPU and that the soft-interrupts
> are processed at local_bh_enable() time.
> In hard- and soft-interrupt context this is already the case and the
> soft-interrupts will be processed once the context is left (at irq-exit
> time).
>
> Disable bottom halves if neither hard-interrupts nor soft-interrupts are
> disabled. Update the kernel-doc, mention that interrupts must be enabled
> if invoked from process context.
>
> Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Thanks, this fixes the issue on rbtx4927[1].
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
[1] https://lore.kernel.org/all/alpine.DEB.2.22.394.2202221622570.372449@ramsan.of.borg/
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-16 17:50 ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
2022-02-17 6:35 ` Marek Szyprowski
2022-02-23 7:56 ` Geert Uytterhoeven
@ 2022-02-23 8:05 ` Sebastian Andrzej Siewior
2022-02-23 15:55 ` Jakub Kicinski
2 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-23 8:05 UTC (permalink / raw)
To: Jakub Kicinski, David S. Miller
Cc: Marek Szyprowski, bpf, netdev, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Jesper Dangaard Brouer,
John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
On 2022-02-16 18:50:46 [+0100], Sebastian Andrzej Siewior wrote:
> I missed the obvious case where netif_ix() is invoked from hard-IRQ
> context.
>
> Disabling bottom halves is only needed in process context. This ensures
> that the code remains on the current CPU and that the soft-interrupts
> are processed at local_bh_enable() time.
> In hard- and soft-interrupt context this is already the case and the
> soft-interrupts will be processed once the context is left (at irq-exit
> time).
>
> Disable bottom halves if neither hard-interrupts nor soft-interrupts are
> disabled. Update the kernel-doc, mention that interrupts must be enabled
> if invoked from process context.
>
> Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
This patch is marked as "Changes Requested" in patchwork. Could someone
please explain?
The USB/dwc3 fallout reported by Marek was addressed in
usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
https://lore.kernel.org/r/Yg/YPejVQH3KkRVd@linutronix.de
and is not a shortcoming in this patch but a problem in dwc3 that was
just noticed.
Sebastian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
2022-02-23 8:05 ` Sebastian Andrzej Siewior
@ 2022-02-23 15:55 ` Jakub Kicinski
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-02-23 15:55 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: David S. Miller, Marek Szyprowski, bpf, netdev,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
Toke Høiland-Jørgensen,
Toke Høiland-Jørgensen
On Wed, 23 Feb 2022 09:05:43 +0100 Sebastian Andrzej Siewior wrote:
> On 2022-02-16 18:50:46 [+0100], Sebastian Andrzej Siewior wrote:
> > I missed the obvious case where netif_ix() is invoked from hard-IRQ
> > context.
> >
> > Disabling bottom halves is only needed in process context. This ensures
> > that the code remains on the current CPU and that the soft-interrupts
> > are processed at local_bh_enable() time.
> > In hard- and soft-interrupt context this is already the case and the
> > soft-interrupts will be processed once the context is left (at irq-exit
> > time).
> >
> > Disable bottom halves if neither hard-interrupts nor soft-interrupts are
> > disabled. Update the kernel-doc, mention that interrupts must be enabled
> > if invoked from process context.
> >
> > Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> This patch is marked as "Changes Requested" in patchwork. Could someone
> please explain?
> The USB/dwc3 fallout reported by Marek was addressed in
> usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
> https://lore.kernel.org/r/Yg/YPejVQH3KkRVd@linutronix.de
>
> and is not a shortcoming in this patch but a problem in dwc3 that was
> just noticed.
Unclear, let me apply it now.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-02-23 15:55 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20220216175054eucas1p2d8aef6c75806dcdab18b37a4e317dd08@eucas1p2.samsung.com>
2022-02-16 17:50 ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
2022-02-17 6:35 ` Marek Szyprowski
2022-02-17 14:08 ` Marek Szyprowski
2022-02-17 14:21 ` Sebastian Andrzej Siewior
2022-02-17 14:44 ` Marek Szyprowski
2022-02-18 9:34 ` Sebastian Andrzej Siewior
2022-02-18 10:02 ` Marek Szyprowski
2022-02-18 10:13 ` Sebastian Andrzej Siewior
2022-02-23 7:56 ` Geert Uytterhoeven
2022-02-23 8:05 ` Sebastian Andrzej Siewior
2022-02-23 15:55 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).