linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
       [not found]     ` <c2a64979-73d1-2c22-e048-c275c9f81558@samsung.com>
@ 2022-02-18  9:34       ` Sebastian Andrzej Siewior
  2022-02-18 10:02         ` Marek Szyprowski
  0 siblings, 1 reply; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-18  9:34 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
	Toke Høiland-Jørgensen,
	Toke Høiland-Jørgensen, Felipe Balbi, linux-usb

On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
> Hi All,
Hi,

> >> Marek, does this work for you?
> >
> > Yes, this fixed the issue. Thanks!
> >
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> I've just noticed that there is one more issue left to fix (the $subject 
> patch is already applied) - this one comes from threaded irq (if I got 
> the stack trace right):

netif_rx() did only set the matching softirq bit and not more. Based on
that I don't see why NOHZ shouldn't complain about a pending softirq
once the CPU goes idle. Therefore I think the change I made is good
since it uncovered that.

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363 
> __local_bh_enable_ip+0xa8/0x1c0
> CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+ 
> #4557
> Hardware name: Samsung TM2E board (DT)
> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __local_bh_enable_ip+0xa8/0x1c0
> lr : netif_rx+0xa4/0x2c0
> ...
> 
> Call trace:
>   __local_bh_enable_ip+0xa8/0x1c0
>   netif_rx+0xa4/0x2c0
>   rx_complete+0x214/0x250
>   usb_gadget_giveback_request+0x58/0x170
>   dwc3_gadget_giveback+0xe4/0x200
>   dwc3_gadget_endpoint_trbs_complete+0x100/0x388
>   dwc3_thread_interrupt+0x46c/0xe20

So dwc3_thread_interrupt() disables interrupts here. Felipe dropped it
and then added it back in
    e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")

I would suggest to revert it (the above commit) and fixing the lockdep
splat in the gadget driver and other. I don't see the g_ether warning
Felipe mentioned. It might come from f_ncm (since it uses a timer) or
something else in the network stack (that uses a timeout timer).
But not now.
As much as I hate it, I suggest:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 183b90923f51b..a0c883f19a417 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4160,9 +4160,11 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt)
 	unsigned long flags;
 	irqreturn_t ret = IRQ_NONE;
 
+	local_bh_disable();
 	spin_lock_irqsave(&dwc->lock, flags);
 	ret = dwc3_process_event_buf(evt);
 	spin_unlock_irqrestore(&dwc->lock, flags);
+	local_bh_enable();
 
 	return ret;
 }


In the long run I would drop that irqsave (along with bh_disable() since
netif_rx() covers that) and make sure the there is no lockdep warning
popping up.

Marek, could you please give it a try?

Sebastian

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
  2022-02-18  9:34       ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
@ 2022-02-18 10:02         ` Marek Szyprowski
  2022-02-18 10:13           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 3+ messages in thread
From: Marek Szyprowski @ 2022-02-18 10:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
	Toke Høiland-Jørgensen,
	Toke Høiland-Jørgensen, Felipe Balbi, linux-usb

Hi,

On 18.02.2022 10:34, Sebastian Andrzej Siewior wrote:
> On 2022-02-17 15:08:55 [+0100], Marek Szyprowski wrote:
>>>> Marek, does this work for you?
>>> Yes, this fixed the issue. Thanks!
>>>
>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> I've just noticed that there is one more issue left to fix (the $subject
>> patch is already applied) - this one comes from threaded irq (if I got
>> the stack trace right):
> netif_rx() did only set the matching softirq bit and not more. Based on
> that I don't see why NOHZ shouldn't complain about a pending softirq
> once the CPU goes idle. Therefore I think the change I made is good
> since it uncovered that.
>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 147 at kernel/softirq.c:363
>> __local_bh_enable_ip+0xa8/0x1c0
> …
>> CPU: 0 PID: 147 Comm: irq/150-dwc3 Not tainted 5.17.0-rc4-next-20220217+
>> #4557
>> Hardware name: Samsung TM2E board (DT)
>> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : __local_bh_enable_ip+0xa8/0x1c0
>> lr : netif_rx+0xa4/0x2c0
>> ...
>>
>> Call trace:
>>    __local_bh_enable_ip+0xa8/0x1c0
>>    netif_rx+0xa4/0x2c0
>>    rx_complete+0x214/0x250
>>    usb_gadget_giveback_request+0x58/0x170
>>    dwc3_gadget_giveback+0xe4/0x200
>>    dwc3_gadget_endpoint_trbs_complete+0x100/0x388
>>    dwc3_thread_interrupt+0x46c/0xe20
> So dwc3_thread_interrupt() disables interrupts here. Felipe dropped it
> and then added it back in
>      e5f68b4a3e7b0 ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")
>
> I would suggest to revert it (the above commit) and fixing the lockdep
> splat in the gadget driver and other. I don't see the g_ether warning
> Felipe mentioned. It might come from f_ncm (since it uses a timer) or
> something else in the network stack (that uses a timeout timer).
> But not now.
> As much as I hate it, I suggest:
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 183b90923f51b..a0c883f19a417 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4160,9 +4160,11 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt)
>   	unsigned long flags;
>   	irqreturn_t ret = IRQ_NONE;
>   
> +	local_bh_disable();
>   	spin_lock_irqsave(&dwc->lock, flags);
>   	ret = dwc3_process_event_buf(evt);
>   	spin_unlock_irqrestore(&dwc->lock, flags);
> +	local_bh_enable();
>   
>   	return ret;
>   }
>
>
> In the long run I would drop that irqsave (along with bh_disable() since
> netif_rx() covers that) and make sure the there is no lockdep warning
> popping up.
>
> Marek, could you please give it a try?

Yes. The above change fixes the issue.


I've also tried to revert the mentioned commit e5f68b4a3e7b, but this 
gives me the following lockdep warning:

IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready

========================================================
WARNING: possible irq lock inversion dependency detected
5.17.0-rc4-next-20220217+ #4563 Not tainted
--------------------------------------------------------
swapper/0/0 just changed the state of lock:
ffff000025548098 (_xmit_ETHER#2){+.-.}-{2:2}, at: 
sch_direct_xmit+0x2f0/0x360
but this lock took another, SOFTIRQ-unsafe lock in the past:
  (&dev->lock#2){+.+.}-{2:2}


and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&dev->lock#2);
                                local_irq_disable();
                                lock(_xmit_ETHER#2);
                                lock(&dev->lock#2);
   <Interrupt>
     lock(_xmit_ETHER#2);

  *** DEADLOCK ***

5 locks held by swapper/0/0:
  #0: ffff80000a5f39e0 ((&ndev->rs_timer)){+.-.}-{0:0}, at: 
call_timer_fn+0x0/0x3b0
  #1: ffff80000a695e00 (rcu_read_lock){....}-{1:2}, at: 
ndisc_send_skb+0x138/0x730 [ipv6]
  #2: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at: 
ip6_finish_output2+0x94/0xbe8 [ipv6]
  #3: ffff80000a696290 (rcu_read_lock_bh){....}-{1:2}, at: 
__dev_queue_xmit+0x64/0xec8
  #4: ffff0000246cd258 (dev->qdisc_tx_busylock ?: 
&qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x9a4/0xec8

the shortest dependencies between 2nd lock and 1st lock:
  -> (&dev->lock#2){+.+.}-{2:2} {
     HARDIRQ-ON-W at:
                       lock_acquire+0x120/0x3c8
                       _raw_spin_lock+0x58/0x78
                       gether_connect+0x118/0x1c0
                       ecm_set_alt+0xf4/0x170
                       composite_setup+0x8d0/0x1820
                       dwc3_ep0_delegate_req+0x40/0x68
                       dwc3_ep0_interrupt+0x5a4/0xf88
                       dwc3_thread_interrupt+0xb7c/0xe18
                       irq_thread_fn+0x28/0x98
                       irq_thread+0x184/0x238
                       kthread+0x100/0x120
                       ret_from_fork+0x10/0x20
     SOFTIRQ-ON-W at:
                       lock_acquire+0x120/0x3c8
                       _raw_spin_lock+0x58/0x78
                       gether_connect+0x118/0x1c0
                       ecm_set_alt+0xf4/0x170
                       composite_setup+0x8d0/0x1820
                       dwc3_ep0_delegate_req+0x40/0x68
                       dwc3_ep0_interrupt+0x5a4/0xf88
                       dwc3_thread_interrupt+0xb7c/0xe18
                       irq_thread_fn+0x28/0x98
                       irq_thread+0x184/0x238
                       kthread+0x100/0x120
                       ret_from_fork+0x10/0x20
     INITIAL USE at:
                      lock_acquire+0x120/0x3c8
                      _raw_spin_lock_irq+0x74/0xa0
                      eth_open+0x28/0x78
                      __dev_open+0x100/0x190
                      __dev_change_flags+0x16c/0x1b8
                      dev_change_flags+0x20/0x60
                      do_setlink+0x314/0xd28
                      __rtnl_newlink+0x418/0x798
                      rtnl_newlink+0x4c/0x78
                      rtnetlink_rcv_msg+0x2bc/0x4e0
                      netlink_rcv_skb+0xe8/0x130
                      rtnetlink_rcv+0x14/0x20
                      netlink_unicast+0x1d4/0x280
                      netlink_sendmsg+0x2cc/0x408
                      ____sys_sendmsg+0x258/0x290
                      ___sys_sendmsg+0x80/0xc0
                      __sys_sendmsg+0x60/0xb8
                      __arm64_sys_sendmsg+0x1c/0x28
                      invoke_syscall+0x40/0xf8
                      el0_svc_common.constprop.3+0x8c/0x120
                      do_el0_svc+0x20/0x98
                      el0_svc+0x48/0x100
                      el0t_64_sync_handler+0x8c/0xb0
                      el0t_64_sync+0x15c/0x160
   }
   ... key      at: [<ffff80000b5004b8>] __key.60862+0x0/0x10
   ... acquired at:
    _raw_spin_lock_irqsave+0x78/0x148
    eth_start_xmit+0x30/0x3a0
    dev_hard_start_xmit+0x108/0x3d8
    sch_direct_xmit+0xf4/0x360
    __dev_queue_xmit+0xa10/0xec8
    dev_queue_xmit+0x10/0x18
    neigh_resolve_output+0x18c/0x290
    ip6_finish_output2+0x20c/0xbe8 [ipv6]
    __ip6_finish_output+0x104/0x2b8 [ipv6]
    ip6_finish_output+0x30/0x108 [ipv6]
    ip6_output+0x80/0x360 [ipv6]
    mld_sendpack+0x570/0x5a0 [ipv6]
    mld_ifc_work+0x2a4/0x4a0 [ipv6]
    process_one_work+0x29c/0x6b0
    worker_thread+0x48/0x420
    kthread+0x100/0x120
    ret_from_fork+0x10/0x20

-> (_xmit_ETHER#2){+.-.}-{2:2} {
    HARDIRQ-ON-W at:
                     lock_acquire+0x120/0x3c8
                     _raw_spin_lock+0x58/0x78
                     sch_direct_xmit+0x2f0/0x360
                     __dev_queue_xmit+0xa10/0xec8
                     dev_queue_xmit+0x10/0x18
                     neigh_resolve_output+0x18c/0x290
                     ip6_finish_output2+0x20c/0xbe8 [ipv6]
                     __ip6_finish_output+0x104/0x2b8 [ipv6]
                     ip6_finish_output+0x30/0x108 [ipv6]
                     ip6_output+0x80/0x360 [ipv6]
                     mld_sendpack+0x570/0x5a0 [ipv6]
                     mld_ifc_work+0x2a4/0x4a0 [ipv6]
                     process_one_work+0x29c/0x6b0
                     worker_thread+0x48/0x420
                     kthread+0x100/0x120
                     ret_from_fork+0x10/0x20
    IN-SOFTIRQ-W at:
                     lock_acquire+0x120/0x3c8
                     _raw_spin_lock+0x58/0x78
                     sch_direct_xmit+0x2f0/0x360
                     __dev_queue_xmit+0xa10/0xec8
                     dev_queue_xmit+0x10/0x18
                     ip6_finish_output2+0x8bc/0xbe8 [ipv6]
                     __ip6_finish_output+0x104/0x2b8 [ipv6]
                     ip6_finish_output+0x30/0x108 [ipv6]
                     ip6_output+0x80/0x360 [ipv6]
                     ndisc_send_skb+0x464/0x730 [ipv6]
                     ndisc_send_rs+0x58/0x138 [ipv6]
                     addrconf_rs_timer+0x140/0x218 [ipv6]
                     call_timer_fn+0xb4/0x3b0
                     run_timer_softirq+0x290/0x678
                     _stext+0x11c/0x5cc
                     irq_exit_rcu+0x168/0x1a8
                     el1_interrupt+0x40/0x128
                     el1h_64_irq_handler+0x14/0x20
                     el1h_64_irq+0x64/0x68
                     arch_cpu_idle+0x14/0x20
                     default_idle_call+0x78/0x350
                     do_idle+0x1f0/0x280
                     cpu_startup_entry+0x24/0x80
                     rest_init+0x180/0x290
                     arch_call_rest_init+0xc/0x14
                     start_kernel+0x694/0x6cc
                     __primary_switched+0xc0/0xc8
    INITIAL USE at:
                    lock_acquire+0x120/0x3c8
                    _raw_spin_lock+0x58/0x78
                    sch_direct_xmit+0x2f0/0x360
                    __dev_queue_xmit+0xa10/0xec8
                    dev_queue_xmit+0x10/0x18
                    neigh_resolve_output+0x18c/0x290
                    ip6_finish_output2+0x20c/0xbe8 [ipv6]
                    __ip6_finish_output+0x104/0x2b8 [ipv6]
                    ip6_finish_output+0x30/0x108 [ipv6]
                    ip6_output+0x80/0x360 [ipv6]
                    mld_sendpack+0x570/0x5a0 [ipv6]
                    mld_ifc_work+0x2a4/0x4a0 [ipv6]
                    process_one_work+0x29c/0x6b0
                    worker_thread+0x48/0x420
                    kthread+0x100/0x120
                    ret_from_fork+0x10/0x20
  }
  ... key      at: [<ffff80000b507ca8>] netdev_xmit_lock_key+0x10/0x390
  ... acquired at:
    __lock_acquire+0x564/0x16f8
    lock_acquire+0x120/0x3c8
    _raw_spin_lock+0x58/0x78
    sch_direct_xmit+0x2f0/0x360
    __dev_queue_xmit+0xa10/0xec8
    dev_queue_xmit+0x10/0x18
    ip6_finish_output2+0x8bc/0xbe8 [ipv6]
    __ip6_finish_output+0x104/0x2b8 [ipv6]
    ip6_finish_output+0x30/0x108 [ipv6]
    ip6_output+0x80/0x360 [ipv6]
    ndisc_send_skb+0x464/0x730 [ipv6]
    ndisc_send_rs+0x58/0x138 [ipv6]
    addrconf_rs_timer+0x140/0x218 [ipv6]
    call_timer_fn+0xb4/0x3b0
    run_timer_softirq+0x290/0x678
    _stext+0x11c/0x5cc
    irq_exit_rcu+0x168/0x1a8
    el1_interrupt+0x40/0x128
    el1h_64_irq_handler+0x14/0x20
    el1h_64_irq+0x64/0x68
    arch_cpu_idle+0x14/0x20
    default_idle_call+0x78/0x350
    do_idle+0x1f0/0x280
    cpu_startup_entry+0x24/0x80
    rest_init+0x180/0x290
    arch_call_rest_init+0xc/0x14
    start_kernel+0x694/0x6cc
    __primary_switched+0xc0/0xc8


stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc4-next-20220217+ #4563
Hardware name: Samsung TM2E board (DT)
Call trace:
  dump_backtrace.part.6+0xd8/0xe8
  show_stack+0x14/0x60
  dump_stack_lvl+0x88/0xb0
  dump_stack+0x14/0x2c
  print_irq_inversion_bug+0x194/0x1f8
  mark_lock.part.52+0x29c/0x440
  __lock_acquire+0x564/0x16f8
  lock_acquire+0x120/0x3c8
  _raw_spin_lock+0x58/0x78
  sch_direct_xmit+0x2f0/0x360
  __dev_queue_xmit+0xa10/0xec8
  dev_queue_xmit+0x10/0x18
  ip6_finish_output2+0x8bc/0xbe8 [ipv6]
  __ip6_finish_output+0x104/0x2b8 [ipv6]
  ip6_finish_output+0x30/0x108 [ipv6]
  ip6_output+0x80/0x360 [ipv6]
  ndisc_send_skb+0x464/0x730 [ipv6]
  ndisc_send_rs+0x58/0x138 [ipv6]
  addrconf_rs_timer+0x140/0x218 [ipv6]
  call_timer_fn+0xb4/0x3b0
  run_timer_softirq+0x290/0x678
  _stext+0x11c/0x5cc
  irq_exit_rcu+0x168/0x1a8
  el1_interrupt+0x40/0x128
  el1h_64_irq_handler+0x14/0x20
  el1h_64_irq+0x64/0x68
  arch_cpu_idle+0x14/0x20
  default_idle_call+0x78/0x350
  do_idle+0x1f0/0x280
  cpu_startup_entry+0x24/0x80
  rest_init+0x180/0x290
  arch_call_rest_init+0xc/0x14
  start_kernel+0x694/0x6cc
  __primary_switched+0xc0/0xc8


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next] net: Correct wrong BH disable in hard-interrupt.
  2022-02-18 10:02         ` Marek Szyprowski
@ 2022-02-18 10:13           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-18 10:13 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: bpf, netdev, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Thomas Gleixner,
	Toke Høiland-Jørgensen,
	Toke Høiland-Jørgensen, Felipe Balbi, linux-usb

On 2022-02-18 11:02:35 [+0100], Marek Szyprowski wrote:
> Hi,
Hi,

> > Marek, could you please give it a try?
> 
> Yes. The above change fixes the issue.

Awesome. I will make a patch and send today.

> I've also tried to revert the mentioned commit e5f68b4a3e7b, but this 
> gives me the following lockdep warning:

Oh oh. Thank you for that splat.

Sebastian

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-02-18 10:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20220216175054eucas1p2d8aef6c75806dcdab18b37a4e317dd08@eucas1p2.samsung.com>
     [not found] ` <Yg05duINKBqvnxUc@linutronix.de>
     [not found]   ` <ce67e9c9-966e-3a08-e571-b7f1dacb3814@samsung.com>
     [not found]     ` <c2a64979-73d1-2c22-e048-c275c9f81558@samsung.com>
2022-02-18  9:34       ` [PATCH net-next] net: Correct wrong BH disable in hard-interrupt Sebastian Andrzej Siewior
2022-02-18 10:02         ` Marek Szyprowski
2022-02-18 10:13           ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).