All of lore.kernel.org
 help / color / mirror / Atom feed
* usbnet: driver_info->stop required to stop USB interrupts?
@ 2014-03-10 18:33 Grant Grundler
  2014-03-10 22:58 ` Grant Grundler
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2014-03-10 18:33 UTC (permalink / raw)
  To: netdev; +Cc: Freddy Xin, linux-usb, Allan Chou

[-- Attachment #1: Type: text/plain, Size: 11493 bytes --]

I've trying to unravel a page fault panic I've run into a few times
while testing load/unload of asix driver with ChromeOS 3.8.11 based
kernel.  I'm running into this crash on both ARM and X86. Panic output
below is from Exynos 5422 system. Test script attached.

My _guess_ is usbnet_stop() is racing with a USB interrupt from the
device and loses. First glance at the stack trace implies the
interrupt handler is trying to access something that has previously
been released.

usbnet_stop() calls driver_info->stop() if provided by the driver.  If
my guess above is correct, does that mean "stop()" call is expected
(required?) to stop interrupts coming from that USB device?
Or is something else supposed to stop RX (or other USB) traffic?

ax88179_178a.c appears to be the only usbnet driver that provides a
.stop call and was able to complete 10K iterations. asix driver
completes 200-5000 iterations before failing for different causes.

thanks,
grant

----invoke the reload_asix script and monitor test ---
scp reload_asix $T:/tmp
for i in `seq 10000`; do echo -n "RELOAD $i  " ; ssh $T ".
/tmp/reload_asix eth0 100_full" ; J=$? ; if [ $J -eq 255 ] ; then echo
" SSH timeout" ; break ; fi ; ssh $T "cat /var/log/reload-asix.out" ;
if [ $J -ne 0 ] ; then echo "  ERROR $J" ; fi ; sleep 3 ; done | tee
~/reload-AX88772-$IP-04.out

---- tombstone from Exynos 5422 on asix driver unload ----
...
[28488.367522] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[28488.380574] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[28493.308354] usbcore: deregistering interface driver asix
[28493.310775] asix 1-1:1.0 eth0: unregister 'asix'
usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet
[28494.369787] usbcore: registered new interface driver asix
[28494.725186] asix 1-1:1.0 eth0: register 'asix' at
usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet,
c8:d7:19:d8:0b:d3
[28494.725262] usb 1-1: authorized to connect
[28495.545485] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[28497.455518] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[28497.466586] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[28502.302851] usbcore: deregistering interface driver asix
[28502.308652] asix 1-1:1.0 eth0: unregister 'asix'
usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet
[28502.308717] Unable to handle kernel paging request at virtual
address e24cb004
[28502.308739] pgd = ea514000
[28502.308753] [e24cb004] *pgd=4241141e(bad)
[28502.308782] Internal error: Oops: 8000000d [#1] SMP ARM
[28502.308795] Modules linked in: asix(-) uvcvideo videobuf2_vmalloc
i2c_dev uinput exynos_gsc v4l2_mem2mem btmrvl_sdio sbs_9018(C)
mwifiex_sdio mwifiex btmrvl s5p_mfc videobuf2_core zram(C) bluetooth
videobuf2_dma_contig videobuf2_memops rtc_s3c zuse cfg80211
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables usbnet
joydev [last unloaded: asix]
[28502.308998] CPU: 0    Tainted: G         C    (3.8.11 #6)
[28502.309016] PC is at 0xe24cb004
[28502.309039] LR is at __wake_up_common+0x5c/0x88
[28502.309058] pc : [<e24cb004>]    lr : [<c014f848>]    psr: 80000093
[28502.309058] sp : ef10be10  ip : e24cb004  fp : ef10be3c
[28502.309076] r10: e1a0c00d  r9 : 00000000  r8 : 00000003
[28502.309091] r7 : 00000000  r6 : 00000001  r5 : e92d3ff4  r4 : ea409d14
[28502.309106] r3 : 00000000  r2 : 00000000  r1 : 00000003  r0 : c060ced4
[28502.309122] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
Segment kernel
[28502.309138] Control: 10c5387d  Table: 4a51406a  DAC: 00000015
[28502.309153] Process ksoftirqd/0 (pid: 3, stack limit = 0xef10a240)
[28502.309168] Stack: (0xef10be10 to 0xef10c000)
[28502.309186] be00:                                     00000000
ea409d04 40000013 00000001
[28502.309209] be20: 00000003 00000000 00000100 3f6fdf7c ef10be6c
ef10be40 c0151c08 c014f7f8
[28502.309231] be40: 00000000 ef10be50 c0529a44 ea5ac540 00000000
ea5ac64c 00000000 00000000
[28502.309254] be60: ef10be8c ef10be70 bf00a0e4 c0151bcc bf009fa4
ea5ac6bc ea5ac6c0 c084c790
[28502.309277] be80: ef10beb4 ef10be90 c012bcb4 bf009fb0 c012bc1c
ef10a038 00000001 c090209c
[28502.309300] bea0: 00000006 c09795c0 ef10bf04 ef10beb8 c012b348
c012bc28 c0934314 ef10a000
[28502.309322] bec0: 00000001 ef10a020 00000000 00000000 04208040
0000000a ef10bf04 00000000
[28502.309345] bee0: c0934314 ef10a000 00000001 ef10a020 00000000
00000000 ef10bf1c ef10bf08
[28502.309368] bf00: c012b48c c012b234 c012b44c ef056d00 ef10bf44
ef10bf20 c014f204 c012b458
[28502.309391] bf20: ef101e48 00000000 ef056d00 c014f098 00000000
00000000 ef10bfac ef10bf48
[28502.309413] bf40: c01455b4 c014f0a4 00000001 00000000 ef056d00
00000000 00030003 dead4ead
[28502.309436] bf60: ffffffff ffffffff ef10bf68 ef10bf68 00000000
00000000 dead4ead ffffffff
[28502.309459] bf80: ffffffff ef10bf84 ef10bf84 271ae517 ef101e48
c01454ec 00000000 00000000
[28502.309480] bfa0: 00000000 ef10bfb0 c0106118 c01454f8 00000000
00000000 00000000 00000000
[28502.309500] bfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[28502.309520] bfe0: 00000000 00000000 00000000 00000000 00000013
00000000 00000000 00000000
[28502.309532] Backtrace:
[28502.309565] [<c014f848>] (__wake_up_common+0x5c/0x88) from
[<c0151c08>] (__wake_up+0x48/0x5c)
[28502.309597] [<c0151c08>] (__wake_up+0x48/0x5c) from [<bf00a0e4>]
(usbnet_bh+0x140/0x210 [usbnet])
[28502.309631] [<bf00a0e4>] (usbnet_bh+0x140/0x210 [usbnet]) from
[<c012bcb4>] (tasklet_action+0x98/0xf4)
[28502.309663] [<c012bcb4>] (tasklet_action+0x98/0xf4) from
[<c012b348>] (__do_softirq+0x120/0x224)
[28502.309692] [<c012b348>] (__do_softirq+0x120/0x224) from
[<c012b48c>] (run_ksoftirqd+0x40/0x60)
[28502.309719] [<c012b48c>] (run_ksoftirqd+0x40/0x60) from
[<c014f204>] (smpboot_thread_fn+0x16c/0x184)
[28502.309746] [<c014f204>] (smpboot_thread_fn+0x16c/0x184) from
[<c01455b4>] (kthread+0xc8/0xd8)
[28502.309775] [<c01455b4>] (kthread+0xc8/0xd8) from [<c0106118>]
(ret_from_fork+0x14/0x20)
[28502.309795] Code: 00000000 00000000 00000000 00000000 (00000000)
[28502.309815] ---[ end trace 980060b6dbaf7494 ]---
[28502.324123] Kernel panic - not syncing: Fatal exception in interrupt
[28502.324160] CPU1: stopping
[28502.324170] Backtrace:
[28502.324193] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
[<c060914c>] (dump_stack+0x28/0x30)
[28502.324208] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
(handle_IPI+0xf0/0x170)
[28502.324221] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
(gic_handle_irq+0x68/0x70)
[28502.324235] [<c0100430>] (gic_handle_irq+0x68/0x70) from
[<c0105c80>] (__irq_svc+0x40/0x50)
[28502.324244] Exception stack(0xea409cf0 to 0xea409d38)
[28502.324253] 9ce0:                                     00000002
ea5ac6c0 00000003 00000001
[28502.324264] 9d00: ea5ac6bc ea5ac6c0 bf31d788 ea5ac6e0 00200200
00000000 00000000 ea409d4c
[28502.324273] 9d20: 00000000 ea409d38 c012af58 c012af80 20000013 ffffffff
[28502.324288] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c012af80>]
(tasklet_kill+0x78/0x8c)
[28502.324307] [<c012af80>] (tasklet_kill+0x78/0x8c) from [<bf00a950>]
(usbnet_stop+0x110/0x178 [usbnet])
[28502.324325] [<bf00a950>] (usbnet_stop+0x110/0x178 [usbnet]) from
[<c053368c>] (__dev_close_many+0xa8/0xcc)
[28502.324339] [<c053368c>] (__dev_close_many+0xa8/0xcc) from
[<c05337bc>] (dev_close_many+0x98/0x118)
[28502.324353] [<c05337bc>] (dev_close_many+0x98/0x118) from
[<c0535348>] (rollback_registered_many+0xd4/0x204)
[28502.324367] [<c0535348>] (rollback_registered_many+0xd4/0x204) from
[<c0537c6c>] (unregister_netdevice_queue+0x98/0xf4)
[28502.324381] [<c0537c6c>] (unregister_netdevice_queue+0x98/0xf4)
from [<c0537cf0>] (unregister_netdev+0x28/0x30)
[28502.324395] [<c0537cf0>] (unregister_netdev+0x28/0x30) from
[<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
[28502.324412] [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
from [<c04266f4>] (usb_unbind_interface+0x70/0x170)
[28502.324429] [<c04266f4>] (usb_unbind_interface+0x70/0x170) from
[<c03c8648>] (__device_release_driver+0xac/0xf8)
[28502.324443] [<c03c8648>] (__device_release_driver+0xac/0xf8) from
[<c03c8c70>] (driver_detach+0x94/0xbc)
[28502.324455] [<c03c8c70>] (driver_detach+0x94/0xbc) from
[<c03c81b0>] (bus_remove_driver+0x78/0xc4)
[28502.324467] [<c03c81b0>] (bus_remove_driver+0x78/0xc4) from
[<c03c92c8>] (driver_unregister+0x54/0x78)
[28502.324480] [<c03c92c8>] (driver_unregister+0x54/0x78) from
[<c0425b4c>] (usb_deregister+0x6c/0xd4)
[28502.324495] [<c0425b4c>] (usb_deregister+0x6c/0xd4) from
[<bf31c82c>] (cleanup_module+0x14/0x7e8 [asix])
[28502.324518] [<bf31c82c>] (cleanup_module+0x14/0x7e8 [asix]) from
[<c0177c88>] (sys_delete_module+0x1c4/0x254)
[28502.324532] [<c0177c88>] (sys_delete_module+0x1c4/0x254) from
[<c0106080>] (ret_fast_syscall+0x0/0x30)
[28502.324547] CPU3: stopping
[28502.324565] Backtrace:
[28502.324610] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
[<c060914c>] (dump_stack+0x28/0x30)
[28502.324637] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
(handle_IPI+0xf0/0x170)
[28502.324664] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
(gic_handle_irq+0x68/0x70)
[28502.324692] [<c0100430>] (gic_handle_irq+0x68/0x70) from
[<c0105e00>] (__irq_usr+0x40/0x60)
[28502.324708] Exception stack(0xed205fb0 to 0xed205ff8)
[28502.324726] 5fa0:                                     00000000
00000100 00000099 ffffff67
[28502.324747] 5fc0: b859b140 b84dc8c0 00000100 00000000 00000000
00000000 00000000 00000001
[28502.324767] 5fe0: b292a5a1 abbbdf08 b5fbbded b292a5a0 80000030 ffffffff
[28502.324781] CPU2: stopping
[28502.324794] Backtrace:
[28502.324822] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
[<c060914c>] (dump_stack+0x28/0x30)
[28502.324848] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
(handle_IPI+0xf0/0x170)
[28502.324873] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
(gic_handle_irq+0x68/0x70)
[28502.324897] [<c0100430>] (gic_handle_irq+0x68/0x70) from
[<c0105c80>] (__irq_svc+0x40/0x50)
[28502.324912] Exception stack(0xed357e38 to 0xed357e80)
[28502.324928] 7e20:
    c097c000 00000000
[28502.324951] 7e40: 00000000 c195c195 c0a0df48 c097f820 00000c01
000003fe b6e01d95 ea587800
[28502.324974] 7e60: 00000064 ed357e8c ed357e80 ed357e80 c060db6c
c060db70 60000013 ffffffff
[28502.324999] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c060db70>]
(_raw_spin_unlock_irq+0x1c/0x20)
[28502.325027] [<c060db70>] (_raw_spin_unlock_irq+0x1c/0x20) from
[<c0125484>] (do_syslog+0x36c/0x5f0)
[28502.325058] [<c0125484>] (do_syslog+0x36c/0x5f0) from [<c02546fc>]
(kmsg_read+0x3c/0x64)
[28502.325089] [<c02546fc>] (kmsg_read+0x3c/0x64) from [<c02484f0>]
(proc_reg_read+0x90/0xa4)
[28502.325117] [<c02484f0>] (proc_reg_read+0x90/0xa4) from
[<c01f88a8>] (vfs_read+0xb8/0x148)
[28502.325143] [<c01f88a8>] (vfs_read+0xb8/0x148) from [<c01f8ae0>]
(sys_read+0x5c/0xa4)
[28502.325168] [<c01f8ae0>] (sys_read+0x5c/0xa4) from [<c0106080>]
(ret_fast_syscall+0x0/0x30)
[28502.325184] task_migration_notifier = c0936778
[28502.325207] page containing tmn: c0936758: 00000001 00000000
dead4ead ffffffff
[28502.325228] page containing tmn: c0936768: ffffffff c093676c
c093676c 00000000
[28502.325248] page containing tmn: c0936778: 00000000 dead4ead
ffffffff ffffffff
[28502.325267] page containing tmn: c0936788: 00000000 c014f914
c014f8f0 00000000
[28502.325286] page containing tmn: c0936798: 00000000 00000000
00000000 00000000
[28502.325301] page containing tmn: c09367a8: 00000000
[28502.325329] CPU0 PC: <c011c828> exynos5_panic_notify+0x54/0xb0

[-- Attachment #2: reload_asix --]
[-- Type: application/octet-stream, Size: 2222 bytes --]

#!/bin/bash -x
#
# 1) edit ETH variable to match target machine
# 2) scp reload_asix root@$IP:/tmp
# 3) set T="root@$IP"
#
# then invoke on a "host" machine (in chroot) with:
# for i in `seq 10000`; do
#	echo -n "RELOAD $i  "
#	ssh $T '. /tmp/reload_asix eth0 100_full'
#	J=$? ; if [ $J -eq 255 ] ; then echo "  SSH timeout" ; break ; fi
#	ssh $T "cat /tmp/reload-asix.out"
#	if [ $J -ne 0 ] ; then echo "  ERROR $J" ; fi
#	sleep 3
# done | tee reload_asix-loop.out

ETH="$1"

DRIVER=$(ethtool -i $ETH | awk '/driver:/ { print $2 }')

# e.g. LINKSP="1000_full" or "100_full"
# "$(cat /sys/class/net/$ETH/speed)_$(cat /sys/class/net/$ETH/duplex)"
# We can't read it here since current state might be wedged from
# previous iteration. Toggling authorized field might fix it.
LINKSP="$2"

# assuming only one ASIX device is plugged in...really want to key off $ETH
AUTH=$(find /sys/devices -name manufacturer | xargs fgrep -l ASIX | sed 's/manufacturer/authorized/')

# redirect all output to a file. SSH might drop.
# exec > /tmp/`date  --rfc-3339=date`-reload-$$.out 2>&1
exec > /var/log/reload-$DRIVER.out 2>&1

echo -n "$(date +"%Y%m%d%H%M%S")"

mac="$(cat /sys/class/net/$ETH/address)"
ip="$(ifconfig $ETH | awk '/inet / { print $2}')"
rmmod $DRIVER

# side effect of auth/deauth is a USB reset on reconnect. :)
echo 0 > $AUTH
sleep 1

modprobe $DRIVER

echo 1 > $AUTH
sleep 1

mac2="$(cat /sys/class/net/$ETH/address)"
if [ "$mac" != "$mac2" ] ; then
	echo "  MAC mismatch before link: $mac != $mac2  reread: $(cat /sys/class/net/$ETH/address)"
	exit 254
fi

let i=0
l="NULL"
while [ "$l" != "$LINKSP" -a  $i -lt 60 ] ; do
	sleep 1
	let i+=1
	l="$(cat /sys/class/net/$ETH/speed)_$(cat /sys/class/net/$ETH/duplex)"
done

printf "  LINK %s (%d sec)" "$l" $i

mac2="$(cat /sys/class/net/$ETH/address)"
if [ "$mac" != "$mac2" ] ; then
	echo
	echo "  MAC mismatch after link: $mac != $mac2  reread: $(cat /sys/class/net/$ETH/address)"
	exit 253
fi

if [ "$l" != "$LINKSP" ] ; then
	exit 252
fi

let i=0
ip2=" "
while [ $i -lt 60 ] ; do
   sleep 1
   let i+=1
   
   ip2="$(ifconfig $ETH | awk '/inet / { print $2}')"
   if [ "$ip" = "$ip2" ] ; then
      printf "  IP %s (%d Sec)\n" $ip $i
      exit 0
   fi
done

exit 251

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-10 18:33 usbnet: driver_info->stop required to stop USB interrupts? Grant Grundler
@ 2014-03-10 22:58 ` Grant Grundler
  2014-03-11  2:53   ` Julius Werner
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2014-03-10 22:58 UTC (permalink / raw)
  To: netdev; +Cc: Freddy Xin, linux-usb, Allan Chou

On Mon, Mar 10, 2014 at 11:33 AM, Grant Grundler <grundler@google.com> wrote:
> I've trying to unravel a page fault panic I've run into a few times
> while testing load/unload of asix driver with ChromeOS 3.8.11 based
> kernel.  I'm running into this crash on both ARM and X86.

Correction - I can only confirm I've seen this on ARM.

sorry,
grant

> Panic output below is from Exynos 5422 system. Test script attached.
>
> My _guess_ is usbnet_stop() is racing with a USB interrupt from the
> device and loses. First glance at the stack trace implies the
> interrupt handler is trying to access something that has previously
> been released.
>
> usbnet_stop() calls driver_info->stop() if provided by the driver.  If
> my guess above is correct, does that mean "stop()" call is expected
> (required?) to stop interrupts coming from that USB device?
> Or is something else supposed to stop RX (or other USB) traffic?
>
> ax88179_178a.c appears to be the only usbnet driver that provides a
> .stop call and was able to complete 10K iterations. asix driver
> completes 200-5000 iterations before failing for different causes.
>
> thanks,
> grant
>
> ----invoke the reload_asix script and monitor test ---
> scp reload_asix $T:/tmp
> for i in `seq 10000`; do echo -n "RELOAD $i  " ; ssh $T ".
> /tmp/reload_asix eth0 100_full" ; J=$? ; if [ $J -eq 255 ] ; then echo
> " SSH timeout" ; break ; fi ; ssh $T "cat /var/log/reload-asix.out" ;
> if [ $J -ne 0 ] ; then echo "  ERROR $J" ; fi ; sleep 3 ; done | tee
> ~/reload-AX88772-$IP-04.out
>
> ---- tombstone from Exynos 5422 on asix driver unload ----
> ...
> [28488.367522] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [28488.380574] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
> [28493.308354] usbcore: deregistering interface driver asix
> [28493.310775] asix 1-1:1.0 eth0: unregister 'asix'
> usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet
> [28494.369787] usbcore: registered new interface driver asix
> [28494.725186] asix 1-1:1.0 eth0: register 'asix' at
> usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet,
> c8:d7:19:d8:0b:d3
> [28494.725262] usb 1-1: authorized to connect
> [28495.545485] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [28497.455518] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [28497.466586] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
> [28502.302851] usbcore: deregistering interface driver asix
> [28502.308652] asix 1-1:1.0 eth0: unregister 'asix'
> usb-xhci-hcd.4.auto-1, ASIX AX88772 USB 2.0 Ethernet
> [28502.308717] Unable to handle kernel paging request at virtual
> address e24cb004
> [28502.308739] pgd = ea514000
> [28502.308753] [e24cb004] *pgd=4241141e(bad)
> [28502.308782] Internal error: Oops: 8000000d [#1] SMP ARM
> [28502.308795] Modules linked in: asix(-) uvcvideo videobuf2_vmalloc
> i2c_dev uinput exynos_gsc v4l2_mem2mem btmrvl_sdio sbs_9018(C)
> mwifiex_sdio mwifiex btmrvl s5p_mfc videobuf2_core zram(C) bluetooth
> videobuf2_dma_contig videobuf2_memops rtc_s3c zuse cfg80211
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables usbnet
> joydev [last unloaded: asix]
> [28502.308998] CPU: 0    Tainted: G         C    (3.8.11 #6)
> [28502.309016] PC is at 0xe24cb004
> [28502.309039] LR is at __wake_up_common+0x5c/0x88
> [28502.309058] pc : [<e24cb004>]    lr : [<c014f848>]    psr: 80000093
> [28502.309058] sp : ef10be10  ip : e24cb004  fp : ef10be3c
> [28502.309076] r10: e1a0c00d  r9 : 00000000  r8 : 00000003
> [28502.309091] r7 : 00000000  r6 : 00000001  r5 : e92d3ff4  r4 : ea409d14
> [28502.309106] r3 : 00000000  r2 : 00000000  r1 : 00000003  r0 : c060ced4
> [28502.309122] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
> Segment kernel
> [28502.309138] Control: 10c5387d  Table: 4a51406a  DAC: 00000015
> [28502.309153] Process ksoftirqd/0 (pid: 3, stack limit = 0xef10a240)
> [28502.309168] Stack: (0xef10be10 to 0xef10c000)
> [28502.309186] be00:                                     00000000
> ea409d04 40000013 00000001
> [28502.309209] be20: 00000003 00000000 00000100 3f6fdf7c ef10be6c
> ef10be40 c0151c08 c014f7f8
> [28502.309231] be40: 00000000 ef10be50 c0529a44 ea5ac540 00000000
> ea5ac64c 00000000 00000000
> [28502.309254] be60: ef10be8c ef10be70 bf00a0e4 c0151bcc bf009fa4
> ea5ac6bc ea5ac6c0 c084c790
> [28502.309277] be80: ef10beb4 ef10be90 c012bcb4 bf009fb0 c012bc1c
> ef10a038 00000001 c090209c
> [28502.309300] bea0: 00000006 c09795c0 ef10bf04 ef10beb8 c012b348
> c012bc28 c0934314 ef10a000
> [28502.309322] bec0: 00000001 ef10a020 00000000 00000000 04208040
> 0000000a ef10bf04 00000000
> [28502.309345] bee0: c0934314 ef10a000 00000001 ef10a020 00000000
> 00000000 ef10bf1c ef10bf08
> [28502.309368] bf00: c012b48c c012b234 c012b44c ef056d00 ef10bf44
> ef10bf20 c014f204 c012b458
> [28502.309391] bf20: ef101e48 00000000 ef056d00 c014f098 00000000
> 00000000 ef10bfac ef10bf48
> [28502.309413] bf40: c01455b4 c014f0a4 00000001 00000000 ef056d00
> 00000000 00030003 dead4ead
> [28502.309436] bf60: ffffffff ffffffff ef10bf68 ef10bf68 00000000
> 00000000 dead4ead ffffffff
> [28502.309459] bf80: ffffffff ef10bf84 ef10bf84 271ae517 ef101e48
> c01454ec 00000000 00000000
> [28502.309480] bfa0: 00000000 ef10bfb0 c0106118 c01454f8 00000000
> 00000000 00000000 00000000
> [28502.309500] bfc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [28502.309520] bfe0: 00000000 00000000 00000000 00000000 00000013
> 00000000 00000000 00000000
> [28502.309532] Backtrace:
> [28502.309565] [<c014f848>] (__wake_up_common+0x5c/0x88) from
> [<c0151c08>] (__wake_up+0x48/0x5c)
> [28502.309597] [<c0151c08>] (__wake_up+0x48/0x5c) from [<bf00a0e4>]
> (usbnet_bh+0x140/0x210 [usbnet])
> [28502.309631] [<bf00a0e4>] (usbnet_bh+0x140/0x210 [usbnet]) from
> [<c012bcb4>] (tasklet_action+0x98/0xf4)
> [28502.309663] [<c012bcb4>] (tasklet_action+0x98/0xf4) from
> [<c012b348>] (__do_softirq+0x120/0x224)
> [28502.309692] [<c012b348>] (__do_softirq+0x120/0x224) from
> [<c012b48c>] (run_ksoftirqd+0x40/0x60)
> [28502.309719] [<c012b48c>] (run_ksoftirqd+0x40/0x60) from
> [<c014f204>] (smpboot_thread_fn+0x16c/0x184)
> [28502.309746] [<c014f204>] (smpboot_thread_fn+0x16c/0x184) from
> [<c01455b4>] (kthread+0xc8/0xd8)
> [28502.309775] [<c01455b4>] (kthread+0xc8/0xd8) from [<c0106118>]
> (ret_from_fork+0x14/0x20)
> [28502.309795] Code: 00000000 00000000 00000000 00000000 (00000000)
> [28502.309815] ---[ end trace 980060b6dbaf7494 ]---
> [28502.324123] Kernel panic - not syncing: Fatal exception in interrupt
> [28502.324160] CPU1: stopping
> [28502.324170] Backtrace:
> [28502.324193] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
> [<c060914c>] (dump_stack+0x28/0x30)
> [28502.324208] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
> (handle_IPI+0xf0/0x170)
> [28502.324221] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
> (gic_handle_irq+0x68/0x70)
> [28502.324235] [<c0100430>] (gic_handle_irq+0x68/0x70) from
> [<c0105c80>] (__irq_svc+0x40/0x50)
> [28502.324244] Exception stack(0xea409cf0 to 0xea409d38)
> [28502.324253] 9ce0:                                     00000002
> ea5ac6c0 00000003 00000001
> [28502.324264] 9d00: ea5ac6bc ea5ac6c0 bf31d788 ea5ac6e0 00200200
> 00000000 00000000 ea409d4c
> [28502.324273] 9d20: 00000000 ea409d38 c012af58 c012af80 20000013 ffffffff
> [28502.324288] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c012af80>]
> (tasklet_kill+0x78/0x8c)
> [28502.324307] [<c012af80>] (tasklet_kill+0x78/0x8c) from [<bf00a950>]
> (usbnet_stop+0x110/0x178 [usbnet])
> [28502.324325] [<bf00a950>] (usbnet_stop+0x110/0x178 [usbnet]) from
> [<c053368c>] (__dev_close_many+0xa8/0xcc)
> [28502.324339] [<c053368c>] (__dev_close_many+0xa8/0xcc) from
> [<c05337bc>] (dev_close_many+0x98/0x118)
> [28502.324353] [<c05337bc>] (dev_close_many+0x98/0x118) from
> [<c0535348>] (rollback_registered_many+0xd4/0x204)
> [28502.324367] [<c0535348>] (rollback_registered_many+0xd4/0x204) from
> [<c0537c6c>] (unregister_netdevice_queue+0x98/0xf4)
> [28502.324381] [<c0537c6c>] (unregister_netdevice_queue+0x98/0xf4)
> from [<c0537cf0>] (unregister_netdev+0x28/0x30)
> [28502.324395] [<c0537cf0>] (unregister_netdev+0x28/0x30) from
> [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
> [28502.324412] [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
> from [<c04266f4>] (usb_unbind_interface+0x70/0x170)
> [28502.324429] [<c04266f4>] (usb_unbind_interface+0x70/0x170) from
> [<c03c8648>] (__device_release_driver+0xac/0xf8)
> [28502.324443] [<c03c8648>] (__device_release_driver+0xac/0xf8) from
> [<c03c8c70>] (driver_detach+0x94/0xbc)
> [28502.324455] [<c03c8c70>] (driver_detach+0x94/0xbc) from
> [<c03c81b0>] (bus_remove_driver+0x78/0xc4)
> [28502.324467] [<c03c81b0>] (bus_remove_driver+0x78/0xc4) from
> [<c03c92c8>] (driver_unregister+0x54/0x78)
> [28502.324480] [<c03c92c8>] (driver_unregister+0x54/0x78) from
> [<c0425b4c>] (usb_deregister+0x6c/0xd4)
> [28502.324495] [<c0425b4c>] (usb_deregister+0x6c/0xd4) from
> [<bf31c82c>] (cleanup_module+0x14/0x7e8 [asix])
> [28502.324518] [<bf31c82c>] (cleanup_module+0x14/0x7e8 [asix]) from
> [<c0177c88>] (sys_delete_module+0x1c4/0x254)
> [28502.324532] [<c0177c88>] (sys_delete_module+0x1c4/0x254) from
> [<c0106080>] (ret_fast_syscall+0x0/0x30)
> [28502.324547] CPU3: stopping
> [28502.324565] Backtrace:
> [28502.324610] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
> [<c060914c>] (dump_stack+0x28/0x30)
> [28502.324637] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
> (handle_IPI+0xf0/0x170)
> [28502.324664] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
> (gic_handle_irq+0x68/0x70)
> [28502.324692] [<c0100430>] (gic_handle_irq+0x68/0x70) from
> [<c0105e00>] (__irq_usr+0x40/0x60)
> [28502.324708] Exception stack(0xed205fb0 to 0xed205ff8)
> [28502.324726] 5fa0:                                     00000000
> 00000100 00000099 ffffff67
> [28502.324747] 5fc0: b859b140 b84dc8c0 00000100 00000000 00000000
> 00000000 00000000 00000001
> [28502.324767] 5fe0: b292a5a1 abbbdf08 b5fbbded b292a5a0 80000030 ffffffff
> [28502.324781] CPU2: stopping
> [28502.324794] Backtrace:
> [28502.324822] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
> [<c060914c>] (dump_stack+0x28/0x30)
> [28502.324848] [<c060914c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
> (handle_IPI+0xf0/0x170)
> [28502.324873] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
> (gic_handle_irq+0x68/0x70)
> [28502.324897] [<c0100430>] (gic_handle_irq+0x68/0x70) from
> [<c0105c80>] (__irq_svc+0x40/0x50)
> [28502.324912] Exception stack(0xed357e38 to 0xed357e80)
> [28502.324928] 7e20:
>     c097c000 00000000
> [28502.324951] 7e40: 00000000 c195c195 c0a0df48 c097f820 00000c01
> 000003fe b6e01d95 ea587800
> [28502.324974] 7e60: 00000064 ed357e8c ed357e80 ed357e80 c060db6c
> c060db70 60000013 ffffffff
> [28502.324999] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c060db70>]
> (_raw_spin_unlock_irq+0x1c/0x20)
> [28502.325027] [<c060db70>] (_raw_spin_unlock_irq+0x1c/0x20) from
> [<c0125484>] (do_syslog+0x36c/0x5f0)
> [28502.325058] [<c0125484>] (do_syslog+0x36c/0x5f0) from [<c02546fc>]
> (kmsg_read+0x3c/0x64)
> [28502.325089] [<c02546fc>] (kmsg_read+0x3c/0x64) from [<c02484f0>]
> (proc_reg_read+0x90/0xa4)
> [28502.325117] [<c02484f0>] (proc_reg_read+0x90/0xa4) from
> [<c01f88a8>] (vfs_read+0xb8/0x148)
> [28502.325143] [<c01f88a8>] (vfs_read+0xb8/0x148) from [<c01f8ae0>]
> (sys_read+0x5c/0xa4)
> [28502.325168] [<c01f8ae0>] (sys_read+0x5c/0xa4) from [<c0106080>]
> (ret_fast_syscall+0x0/0x30)
> [28502.325184] task_migration_notifier = c0936778
> [28502.325207] page containing tmn: c0936758: 00000001 00000000
> dead4ead ffffffff
> [28502.325228] page containing tmn: c0936768: ffffffff c093676c
> c093676c 00000000
> [28502.325248] page containing tmn: c0936778: 00000000 dead4ead
> ffffffff ffffffff
> [28502.325267] page containing tmn: c0936788: 00000000 c014f914
> c014f8f0 00000000
> [28502.325286] page containing tmn: c0936798: 00000000 00000000
> 00000000 00000000
> [28502.325301] page containing tmn: c09367a8: 00000000
> [28502.325329] CPU0 PC: <c011c828> exynos5_panic_notify+0x54/0xb0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-10 22:58 ` Grant Grundler
@ 2014-03-11  2:53   ` Julius Werner
  2014-03-11  9:10     ` Oliver Neukum
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Julius Werner @ 2014-03-11  2:53 UTC (permalink / raw)
  To: Grant Grundler; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou

I think usbnet_stop() raced with the dev->bh tasklet, which by itself
might not be a problem (usbnet_stop() later kills the tasklet itself,
so it should expect that it can be running before that). The issue is
that it calls usbnet_terminate_urbs() before that, which temporarily
installs a waitqueue in dev->wait in order to be able to wait on the
tasklet to run and finish up some queues. The waiting itself looks
okay, but the access to 'dev->wait' is totally unprotected and can
race arbitrarily. I think in this case usbnet_bh() managed to succeed
it's dev->wait check just before usbnet_terminate_urbs() sets it back
to NULL. The latter then finishes and the waitqueue_t structure on its
stack gets overwritten by other functions halfway through the
wake_up() call in usbnet_bh().

I think the best solution would be to just make dev->wait a directly
embedded structure inside struct usbnet instead of a pointer to
something stack-allocated. usbnet_bh() could just call wake_up()
unconditionally (if empty it will be a noop), and then one other check
for !dev->wait could be replaced with a call to waitqueue_active().
Then the waitqueue-internal locks should be enough to protect all
accesses.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-11  2:53   ` Julius Werner
@ 2014-03-11  9:10     ` Oliver Neukum
  2014-03-11  9:14     ` Oliver Neukum
  2014-03-11 17:31     ` Grant Grundler
  2 siblings, 0 replies; 25+ messages in thread
From: Oliver Neukum @ 2014-03-11  9:10 UTC (permalink / raw)
  To: Julius Werner; +Cc: Grant Grundler, netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, 2014-03-10 at 19:53 -0700, Julius Werner wrote:
> I think usbnet_stop() raced with the dev->bh tasklet, which by itself
> might not be a problem (usbnet_stop() later kills the tasklet itself,
> so it should expect that it can be running before that). The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().
> 
> I think the best solution would be to just make dev->wait a directly
> embedded structure inside struct usbnet instead of a pointer to
> something stack-allocated. usbnet_bh() could just call wake_up()
> unconditionally (if empty it will be a noop), and then one other check
> for !dev->wait could be replaced with a call to waitqueue_active().
> Then the waitqueue-internal locks should be enough to protect all
> accesses.

The diagnosis seems spot on. The fix is not quite so simple.
dev->wait is abused as a flag in resume(). The easiest fix is just
to make sure resume() is never called while stopping.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-11  2:53   ` Julius Werner
  2014-03-11  9:10     ` Oliver Neukum
@ 2014-03-11  9:14     ` Oliver Neukum
       [not found]       ` <1394529263.16128.6.camel-B2T3B9s34ElbnMAlSieJcQ@public.gmane.org>
  2014-03-11 17:31     ` Grant Grundler
  2 siblings, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2014-03-11  9:14 UTC (permalink / raw)
  To: Julius Werner; +Cc: Grant Grundler, netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, 2014-03-10 at 19:53 -0700, Julius Werner wrote:

> I think the best solution would be to just make dev->wait a directly
> embedded structure inside struct usbnet instead of a pointer to
> something stack-allocated. usbnet_bh() could just call wake_up()
> unconditionally (if empty it will be a noop), and then one other check
> for !dev->wait could be replaced with a call to waitqueue_active().
> Then the waitqueue-internal locks should be enough to protect all
> accesses.

What do you think?

	Regards
		Oliver

>From 5ea2cb8a2893953fc7067aeae093ab20239d5108 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@suse.de>
Date: Tue, 11 Mar 2014 09:59:48 +0100
Subject: [PATCH] usbnet: include wait queue head in device structure

This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
---
 drivers/net/usb/usbnet.c   | 27 ++++++++++++++-------------
 include/linux/usb/usbnet.h |  2 +-
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index dd10d58..b5bcb48 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -752,14 +752,12 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);
 // precondition: never called in_interrupt
 static void usbnet_terminate_urbs(struct usbnet *dev)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(unlink_wakeup);
 	DECLARE_WAITQUEUE(wait, current);
 	int temp;
 
 	/* ensure there are no more active urbs */
-	add_wait_queue(&unlink_wakeup, &wait);
+	add_wait_queue(&dev->wait, &wait);
 	set_current_state(TASK_UNINTERRUPTIBLE);
-	dev->wait = &unlink_wakeup;
 	temp = unlink_urbs(dev, &dev->txq) +
 		unlink_urbs(dev, &dev->rxq);
 
@@ -773,15 +771,14 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
 				  "waited for %d urb completions\n", temp);
 	}
 	set_current_state(TASK_RUNNING);
-	dev->wait = NULL;
-	remove_wait_queue(&unlink_wakeup, &wait);
+	remove_wait_queue(&dev->wait, &wait);
 }
 
 int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval;
+	int			retval, pm;
 
 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -791,6 +788,8 @@ int usbnet_stop (struct net_device *net)
 		   net->stats.rx_packets, net->stats.tx_packets,
 		   net->stats.rx_errors, net->stats.tx_errors);
 
+	/* to not race resume */
+	pm = usb_autopm_get_interface(dev->intf);
 	/* allow minidriver to stop correctly (wireless devices to turn off
 	 * radio etc) */
 	if (info->stop) {
@@ -817,6 +816,9 @@ int usbnet_stop (struct net_device *net)
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+	if (!pm)
+		usb_autopm_put_interface(dev->intf);
+
 	if (info->manage_power &&
 	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
 		info->manage_power(dev, 0);
@@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
 	clear_bit(EVENT_RX_KILL, &dev->flags);
 
 	// waiting for all pending urbs to complete?
-	if (dev->wait) {
-		if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
-			wake_up (dev->wait);
-		}
+	if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
+		wake_up (&dev->wait);
 
 	// or are we maybe short a few urbs?
 	} else if (netif_running (dev->net) &&
@@ -1791,9 +1791,10 @@ int usbnet_resume (struct usb_interface *intf)
 		spin_unlock_irq(&dev->txq.lock);
 
 		if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
-			/* handle remote wakeup ASAP */
-			if (!dev->wait &&
-				netif_device_present(dev->net) &&
+			/* handle remote wakeup ASAP
+			 * we cannot race against stop
+			 */
+			if (netif_device_present(dev->net) &&
 				!timer_pending(&dev->delay) &&
 				!test_bit(EVENT_RX_HALT, &dev->flags))
 					rx_alloc_submit(dev, GFP_NOIO);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index e303eef..0662e98 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -30,7 +30,7 @@ struct usbnet {
 	struct driver_info	*driver_info;
 	const char		*driver_name;
 	void			*driver_priv;
-	wait_queue_head_t	*wait;
+	wait_queue_head_t	wait;
 	struct mutex		phy_mutex;
 	unsigned char		suspend_count;
 	unsigned char		pkt_cnt, pkt_err;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]       ` <1394529263.16128.6.camel-B2T3B9s34ElbnMAlSieJcQ@public.gmane.org>
@ 2014-03-11 17:27         ` Julius Werner
  2014-03-17 21:53           ` Julius Werner
  0 siblings, 1 reply; 25+ messages in thread
From: Julius Werner @ 2014-03-11 17:27 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Julius Werner, Grant Grundler, netdev, Freddy Xin,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou

I don't know enough about how runtime PM works in this driver to
really review that patch, sorry. (Would this do a complete resume of
the device if we call usbnet_stop() while it was suspended? Is that
what we want?)

I think you could have also preserved the original logic of using
dev->wait as a flag if you had just replaced 'if (!dev->wait &&' with
'if (!waitqueue_active(&dev->wait) &&' to check whether the waitqueue
is empty.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-11  2:53   ` Julius Werner
  2014-03-11  9:10     ` Oliver Neukum
  2014-03-11  9:14     ` Oliver Neukum
@ 2014-03-11 17:31     ` Grant Grundler
  2014-03-11 17:37       ` Grant Grundler
       [not found]       ` <CANEJEGv77PnbLZsyOdTevXjij-_wLSq4uGCnSjnAwN9OK8m3Yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2 siblings, 2 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-11 17:31 UTC (permalink / raw)
  To: Julius Werner; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, Mar 10, 2014 at 7:53 PM, Julius Werner <jwerner@chromium.org> wrote:
> I think usbnet_stop() raced with the dev->bh tasklet, which by itself
> might not be a problem (usbnet_stop() later kills the tasklet itself,
> so it should expect that it can be running before that). The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

Awesome - thanks Julius! :)

FWIW, I've reproduced this on "Samsung Chromebook" (Exynos 5250) with
AX88772 USB dongle using the instructions I posted before (ie bouncing
the USB port with reload_asix script).

cheers,
grant

[23231.533805] asix 3-1:1.0 eth0: link up, 1000Mbps, full-duplex, lpa 0xCDE1
[23235.755652] usbcore: deregistering interface driver asix
[23235.761722] asix 3-1:1.0 eth0: unregister 'asix'
usb-12110000.usb-1, ASIX AX88178 USB 2.0 Ethernet
[23235.761763] Unable to handle kernel paging request at virtual
address e24cb004
[23235.761771] pgd = ebf70000
[23235.761777] [e24cb004] *pgd=6241141e(bad)
[23235.761792] Internal error: Oops: 8000000d [#1] SMP ARM
[23235.761798] Modules linked in: asix(-) exynos_gsc v4l2_mem2mem
isl29018(C) sbs_battery i2c_dev uinput mwifiex_sdio mwifiex
btmrvl_sdio btmrvl s5p_mfc videobuf2_dma_contig rtc_s3c bluetooth
zram(C) zsmalloc(C) fuse cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6
ip6table_filter ip6_tables uvcvideo videobuf2_core videobuf2_vmalloc
videobuf2_memops usbnet joydev [last unloaded: asix]
[23235.761898] CPU: 0    Tainted: G         C    (3.8.11 #25)
[23235.761906] PC is at 0xe24cb004
[23235.761916] LR is at __wake_up_common+0x5c/0x88
[23235.761924] pc : [<e24cb004>]    lr : [<c014f870>]    psr: 80000093
[23235.761924] sp : ef0e3e10  ip : e24cb004  fp : ef0e3e3c
[23235.761931] r10: e1a0c00d  r9 : 00000000  r8 : 00000003
[23235.761938] r7 : 00000000  r6 : 00000001  r5 : e92d3ff4  r4 : eab13d14
[23235.761943] r3 : 00000000  r2 : 00000000  r1 : 00000003  r0 : c060d0f4
[23235.761951] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
Segment kernel
[23235.761957] Control: 10c5387d  Table: 6bf7006a  DAC: 00000015
[23235.761964] Process ksoftirqd/0 (pid: 3, stack limit = 0xef0e2240)
[23235.761970] Stack: (0xef0e3e10 to 0xef0e4000)
[23235.761977] 3e00:                                     00000000
eab13d04 40000013 00000001
[23235.761986] 3e20: 00000003 00000000 00000100 3f6fdf7c ef0e3e6c
ef0e3e40 c0151c30 c014f820
[23235.761994] 3e40: 00000000 ef0e3e50 c052861c edaddd40 00000000
edadde4c 00000000 00000000
[23235.762001] 3e60: ef0e3e8c ef0e3e70 bf00a0e4 c0151bf4 bf009fa4
edaddebc edaddec0 c084c790
[23235.762009] 3e80: ef0e3eb4 ef0e3e90 c012bcb4 bf009fb0 c012bc1c
ef0e2038 00000009 c090209c
[23235.762016] 3ea0: 00000006 c09790c0 ef0e3f04 ef0e3eb8 c012b348
c012bc28 c0934324 ef0e2000
[23235.762024] 3ec0: 00000001 ef0e2020 00000000 00000000 04208040
00000005 c0153f94 00000000
[23235.762032] 3ee0: c0934324 ef0e2000 00000001 ef0e2020 00000000
00000000 ef0e3f1c ef0e3f08
[23235.762039] 3f00: c012b48c c012b234 c012b44c ef0409c0 ef0e3f44
ef0e3f20 c014f22c c012b458
[23235.762046] 3f20: ef0dde48 00000000 ef0409c0 c014f0c0 00000000
00000000 ef0e3fac ef0e3f48
[23235.762054] 3f40: c01455b4 c014f0cc 00000001 00000000 ef0409c0
00000000 00030003 dead4ead
[23235.762061] 3f60: ffffffff ffffffff ef0e3f68 ef0e3f68 00000000
00000000 dead4ead ffffffff
[23235.762068] 3f80: ffffffff ef0e3f84 ef0e3f84 271ae517 ef0dde48
c01454ec 00000000 00000000
[23235.762075] 3fa0: 00000000 ef0e3fb0 c0106118 c01454f8 00000000
00000000 00000000 00000000
[23235.762082] 3fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[23235.762089] 3fe0: 00000000 00000000 00000000 00000000 00000013
00000000 f77e7f69 e1459824
[23235.762094] Backtrace:
[23235.762107] [<c014f870>] (__wake_up_common+0x5c/0x88) from
[<c0151c30>] (__wake_up+0x48/0x5c)
[23235.762121] [<c0151c30>] (__wake_up+0x48/0x5c) from [<bf00a0e4>]
(usbnet_bh+0x140/0x210 [usbnet])
[23235.762135] [<bf00a0e4>] (usbnet_bh+0x140/0x210 [usbnet]) from
[<c012bcb4>] (tasklet_action+0x98/0xf4)
[23235.762148] [<c012bcb4>] (tasklet_action+0x98/0xf4) from
[<c012b348>] (__do_softirq+0x120/0x224)
[23235.762160] [<c012b348>] (__do_softirq+0x120/0x224) from
[<c012b48c>] (run_ksoftirqd+0x40/0x60)
[23235.762170] [<c012b48c>] (run_ksoftirqd+0x40/0x60) from
[<c014f22c>] (smpboot_thread_fn+0x16c/0x184)
[23235.762180] [<c014f22c>] (smpboot_thread_fn+0x16c/0x184) from
[<c01455b4>] (kthread+0xc8/0xd8)
[23235.762191] [<c01455b4>] (kthread+0xc8/0xd8) from [<c0106118>]
(ret_from_fork+0x14/0x20)
[23235.762200] Code: 0000efe8 00003f15 0000eff0 00000000 (0000f004)
[23235.762209] ---[ end trace 3ad68dc3731b37c5 ]---
[23235.766529] Kernel panic - not syncing: Fatal exception in interrupt
[23235.766539] CPU1: stopping
[23235.766546] Backtrace:
[23235.766564] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
[<c060936c>] (dump_stack+0x28/0x30)
[23235.766577] [<c060936c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
(handle_IPI+0xf0/0x170)
[23235.766588] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
(gic_handle_irq+0x68/0x70)
[23235.766598] [<c0100430>] (gic_handle_irq+0x68/0x70) from
[<c0105c80>] (__irq_svc+0x40/0x50)
[23235.766605] Exception stack(0xeab13cf0 to 0xeab13d38)
[23235.766612] 3ce0:                                     00000002
edaddec0 00000003 00000001
[23235.766620] 3d00: edaddebc edaddec0 bfa78744 edaddee0 00200200
00000000 00000000 eab13d4c
[23235.766627] 3d20: 00000000 eab13d38 c012af58 c012af74 20000013 ffffffff
[23235.766639] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c012af74>]
(tasklet_kill+0x6c/0x8c)
[23235.766653] [<c012af74>] (tasklet_kill+0x6c/0x8c) from [<bf00a950>]
(usbnet_stop+0x110/0x178 [usbnet])
[23235.766667] [<bf00a950>] (usbnet_stop+0x110/0x178 [usbnet]) from
[<c0532298>] (__dev_close_many+0xa8/0xcc)
[23235.766677] [<c0532298>] (__dev_close_many+0xa8/0xcc) from
[<c05323c8>] (dev_close_many+0x98/0x118)
[23235.766688] [<c05323c8>] (dev_close_many+0x98/0x118) from
[<c0533fcc>] (rollback_registered_many+0xd4/0x204)
[23235.766700] [<c0533fcc>] (rollback_registered_many+0xd4/0x204) from
[<c05368f0>] (unregister_netdevice_queue+0x98/0xf4)
[23235.766711] [<c05368f0>] (unregister_netdevice_queue+0x98/0xf4)
from [<c0536974>] (unregister_netdev+0x28/0x30)
[23235.766722] [<c0536974>] (unregister_netdev+0x28/0x30) from
[<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
[23235.766739] [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
from [<c0425328>] (usb_unbind_interface+0x70/0x170)
[23235.766753] [<c0425328>] (usb_unbind_interface+0x70/0x170) from
[<c03c727c>] (__device_release_driver+0xac/0xf8)
[23235.766765] [<c03c727c>] (__device_release_driver+0xac/0xf8) from
[<c03c78a4>] (driver_detach+0x94/0xbc)
[23235.766775] [<c03c78a4>] (driver_detach+0x94/0xbc) from
[<c03c6de4>] (bus_remove_driver+0x78/0xc4)
[23235.766785] [<c03c6de4>] (bus_remove_driver+0x78/0xc4) from
[<c03c7efc>] (driver_unregister+0x54/0x78)
[23235.766796] [<c03c7efc>] (driver_unregister+0x54/0x78) from
[<c0424780>] (usb_deregister+0x6c/0xd4)
[23235.766807] [<c0424780>] (usb_deregister+0x6c/0xd4) from
[<bfa7782c>] (cleanup_module+0x14/0x7e8 [asix])
[23235.766827] [<bfa7782c>] (cleanup_module+0x14/0x7e8 [asix]) from
[<c0177cb0>] (sys_delete_module+0x1c4/0x254)
[23235.766838] [<c0177cb0>] (sys_delete_module+0x1c4/0x254) from
[<c0106080>] (ret_fast_syscall+0x0/0x30)
[23235.766846] task_migration_notifier = c0936790
[23235.766855] page containing tmn: c0936770: 00000001 00000000
dead4ead ffffffff
[23235.766863] page containing tmn: c0936780: ffffffff c0936784
c0936784 00000000
[23235.766871] page containing tmn: c0936790: 00000000 dead4ead
ffffffff ffffffff
[23235.766878] page containing tmn: c09367a0: 20202020 00000000
beab7861 c014f93c
[23235.766886] page containing tmn: c09367b0: c014f918 00000000
00000000 00000000
[23235.766892] page containing tmn: c09367c0: 00000000 00000000 00000000
[23235.766907] CPU0 PC: <c011c830> exynos5_panic_notify+0x5c/0xb0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-11 17:31     ` Grant Grundler
@ 2014-03-11 17:37       ` Grant Grundler
       [not found]       ` <CANEJEGv77PnbLZsyOdTevXjij-_wLSq4uGCnSjnAwN9OK8m3Yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-11 17:37 UTC (permalink / raw)
  To: Julius Werner; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou

On Tue, Mar 11, 2014 at 10:31 AM, Grant Grundler <grundler@google.com> wrote:
...
> FWIW, I've reproduced this on "Samsung Chromebook" (Exynos 5250) with
> AX88772 USB dongle using the instructions I posted before (ie bouncing
> the USB port with reload_asix script).

Sorry - with AX88178 dongle - not AX88772
(5 machines on my desk right now).

grant

>
> cheers,
> grant
>
> [23231.533805] asix 3-1:1.0 eth0: link up, 1000Mbps, full-duplex, lpa 0xCDE1
> [23235.755652] usbcore: deregistering interface driver asix
> [23235.761722] asix 3-1:1.0 eth0: unregister 'asix'
> usb-12110000.usb-1, ASIX AX88178 USB 2.0 Ethernet
> [23235.761763] Unable to handle kernel paging request at virtual
> address e24cb004
> [23235.761771] pgd = ebf70000
> [23235.761777] [e24cb004] *pgd=6241141e(bad)
> [23235.761792] Internal error: Oops: 8000000d [#1] SMP ARM
> [23235.761798] Modules linked in: asix(-) exynos_gsc v4l2_mem2mem
> isl29018(C) sbs_battery i2c_dev uinput mwifiex_sdio mwifiex
> btmrvl_sdio btmrvl s5p_mfc videobuf2_dma_contig rtc_s3c bluetooth
> zram(C) zsmalloc(C) fuse cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6
> ip6table_filter ip6_tables uvcvideo videobuf2_core videobuf2_vmalloc
> videobuf2_memops usbnet joydev [last unloaded: asix]
> [23235.761898] CPU: 0    Tainted: G         C    (3.8.11 #25)
> [23235.761906] PC is at 0xe24cb004
> [23235.761916] LR is at __wake_up_common+0x5c/0x88
> [23235.761924] pc : [<e24cb004>]    lr : [<c014f870>]    psr: 80000093
> [23235.761924] sp : ef0e3e10  ip : e24cb004  fp : ef0e3e3c
> [23235.761931] r10: e1a0c00d  r9 : 00000000  r8 : 00000003
> [23235.761938] r7 : 00000000  r6 : 00000001  r5 : e92d3ff4  r4 : eab13d14
> [23235.761943] r3 : 00000000  r2 : 00000000  r1 : 00000003  r0 : c060d0f4
> [23235.761951] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
> Segment kernel
> [23235.761957] Control: 10c5387d  Table: 6bf7006a  DAC: 00000015
> [23235.761964] Process ksoftirqd/0 (pid: 3, stack limit = 0xef0e2240)
> [23235.761970] Stack: (0xef0e3e10 to 0xef0e4000)
> [23235.761977] 3e00:                                     00000000
> eab13d04 40000013 00000001
> [23235.761986] 3e20: 00000003 00000000 00000100 3f6fdf7c ef0e3e6c
> ef0e3e40 c0151c30 c014f820
> [23235.761994] 3e40: 00000000 ef0e3e50 c052861c edaddd40 00000000
> edadde4c 00000000 00000000
> [23235.762001] 3e60: ef0e3e8c ef0e3e70 bf00a0e4 c0151bf4 bf009fa4
> edaddebc edaddec0 c084c790
> [23235.762009] 3e80: ef0e3eb4 ef0e3e90 c012bcb4 bf009fb0 c012bc1c
> ef0e2038 00000009 c090209c
> [23235.762016] 3ea0: 00000006 c09790c0 ef0e3f04 ef0e3eb8 c012b348
> c012bc28 c0934324 ef0e2000
> [23235.762024] 3ec0: 00000001 ef0e2020 00000000 00000000 04208040
> 00000005 c0153f94 00000000
> [23235.762032] 3ee0: c0934324 ef0e2000 00000001 ef0e2020 00000000
> 00000000 ef0e3f1c ef0e3f08
> [23235.762039] 3f00: c012b48c c012b234 c012b44c ef0409c0 ef0e3f44
> ef0e3f20 c014f22c c012b458
> [23235.762046] 3f20: ef0dde48 00000000 ef0409c0 c014f0c0 00000000
> 00000000 ef0e3fac ef0e3f48
> [23235.762054] 3f40: c01455b4 c014f0cc 00000001 00000000 ef0409c0
> 00000000 00030003 dead4ead
> [23235.762061] 3f60: ffffffff ffffffff ef0e3f68 ef0e3f68 00000000
> 00000000 dead4ead ffffffff
> [23235.762068] 3f80: ffffffff ef0e3f84 ef0e3f84 271ae517 ef0dde48
> c01454ec 00000000 00000000
> [23235.762075] 3fa0: 00000000 ef0e3fb0 c0106118 c01454f8 00000000
> 00000000 00000000 00000000
> [23235.762082] 3fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [23235.762089] 3fe0: 00000000 00000000 00000000 00000000 00000013
> 00000000 f77e7f69 e1459824
> [23235.762094] Backtrace:
> [23235.762107] [<c014f870>] (__wake_up_common+0x5c/0x88) from
> [<c0151c30>] (__wake_up+0x48/0x5c)
> [23235.762121] [<c0151c30>] (__wake_up+0x48/0x5c) from [<bf00a0e4>]
> (usbnet_bh+0x140/0x210 [usbnet])
> [23235.762135] [<bf00a0e4>] (usbnet_bh+0x140/0x210 [usbnet]) from
> [<c012bcb4>] (tasklet_action+0x98/0xf4)
> [23235.762148] [<c012bcb4>] (tasklet_action+0x98/0xf4) from
> [<c012b348>] (__do_softirq+0x120/0x224)
> [23235.762160] [<c012b348>] (__do_softirq+0x120/0x224) from
> [<c012b48c>] (run_ksoftirqd+0x40/0x60)
> [23235.762170] [<c012b48c>] (run_ksoftirqd+0x40/0x60) from
> [<c014f22c>] (smpboot_thread_fn+0x16c/0x184)
> [23235.762180] [<c014f22c>] (smpboot_thread_fn+0x16c/0x184) from
> [<c01455b4>] (kthread+0xc8/0xd8)
> [23235.762191] [<c01455b4>] (kthread+0xc8/0xd8) from [<c0106118>]
> (ret_from_fork+0x14/0x20)
> [23235.762200] Code: 0000efe8 00003f15 0000eff0 00000000 (0000f004)
> [23235.762209] ---[ end trace 3ad68dc3731b37c5 ]---
> [23235.766529] Kernel panic - not syncing: Fatal exception in interrupt
> [23235.766539] CPU1: stopping
> [23235.766546] Backtrace:
> [23235.766564] [<c010d3d0>] (unwind_backtrace+0x0/0x118) from
> [<c060936c>] (dump_stack+0x28/0x30)
> [23235.766577] [<c060936c>] (dump_stack+0x28/0x30) from [<c010bcb8>]
> (handle_IPI+0xf0/0x170)
> [23235.766588] [<c010bcb8>] (handle_IPI+0xf0/0x170) from [<c0100430>]
> (gic_handle_irq+0x68/0x70)
> [23235.766598] [<c0100430>] (gic_handle_irq+0x68/0x70) from
> [<c0105c80>] (__irq_svc+0x40/0x50)
> [23235.766605] Exception stack(0xeab13cf0 to 0xeab13d38)
> [23235.766612] 3ce0:                                     00000002
> edaddec0 00000003 00000001
> [23235.766620] 3d00: edaddebc edaddec0 bfa78744 edaddee0 00200200
> 00000000 00000000 eab13d4c
> [23235.766627] 3d20: 00000000 eab13d38 c012af58 c012af74 20000013 ffffffff
> [23235.766639] [<c0105c80>] (__irq_svc+0x40/0x50) from [<c012af74>]
> (tasklet_kill+0x6c/0x8c)
> [23235.766653] [<c012af74>] (tasklet_kill+0x6c/0x8c) from [<bf00a950>]
> (usbnet_stop+0x110/0x178 [usbnet])
> [23235.766667] [<bf00a950>] (usbnet_stop+0x110/0x178 [usbnet]) from
> [<c0532298>] (__dev_close_many+0xa8/0xcc)
> [23235.766677] [<c0532298>] (__dev_close_many+0xa8/0xcc) from
> [<c05323c8>] (dev_close_many+0x98/0x118)
> [23235.766688] [<c05323c8>] (dev_close_many+0x98/0x118) from
> [<c0533fcc>] (rollback_registered_many+0xd4/0x204)
> [23235.766700] [<c0533fcc>] (rollback_registered_many+0xd4/0x204) from
> [<c05368f0>] (unregister_netdevice_queue+0x98/0xf4)
> [23235.766711] [<c05368f0>] (unregister_netdevice_queue+0x98/0xf4)
> from [<c0536974>] (unregister_netdev+0x28/0x30)
> [23235.766722] [<c0536974>] (unregister_netdev+0x28/0x30) from
> [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
> [23235.766739] [<bf009610>] (usbnet_disconnect+0x8c/0xe4 [usbnet])
> from [<c0425328>] (usb_unbind_interface+0x70/0x170)
> [23235.766753] [<c0425328>] (usb_unbind_interface+0x70/0x170) from
> [<c03c727c>] (__device_release_driver+0xac/0xf8)
> [23235.766765] [<c03c727c>] (__device_release_driver+0xac/0xf8) from
> [<c03c78a4>] (driver_detach+0x94/0xbc)
> [23235.766775] [<c03c78a4>] (driver_detach+0x94/0xbc) from
> [<c03c6de4>] (bus_remove_driver+0x78/0xc4)
> [23235.766785] [<c03c6de4>] (bus_remove_driver+0x78/0xc4) from
> [<c03c7efc>] (driver_unregister+0x54/0x78)
> [23235.766796] [<c03c7efc>] (driver_unregister+0x54/0x78) from
> [<c0424780>] (usb_deregister+0x6c/0xd4)
> [23235.766807] [<c0424780>] (usb_deregister+0x6c/0xd4) from
> [<bfa7782c>] (cleanup_module+0x14/0x7e8 [asix])
> [23235.766827] [<bfa7782c>] (cleanup_module+0x14/0x7e8 [asix]) from
> [<c0177cb0>] (sys_delete_module+0x1c4/0x254)
> [23235.766838] [<c0177cb0>] (sys_delete_module+0x1c4/0x254) from
> [<c0106080>] (ret_fast_syscall+0x0/0x30)
> [23235.766846] task_migration_notifier = c0936790
> [23235.766855] page containing tmn: c0936770: 00000001 00000000
> dead4ead ffffffff
> [23235.766863] page containing tmn: c0936780: ffffffff c0936784
> c0936784 00000000
> [23235.766871] page containing tmn: c0936790: 00000000 dead4ead
> ffffffff ffffffff
> [23235.766878] page containing tmn: c09367a0: 20202020 00000000
> beab7861 c014f93c
> [23235.766886] page containing tmn: c09367b0: c014f918 00000000
> 00000000 00000000
> [23235.766892] page containing tmn: c09367c0: 00000000 00000000 00000000
> [23235.766907] CPU0 PC: <c011c830> exynos5_panic_notify+0x5c/0xb0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]       ` <CANEJEGv77PnbLZsyOdTevXjij-_wLSq4uGCnSjnAwN9OK8m3Yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-11 17:52         ` Grant Grundler
  0 siblings, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-11 17:52 UTC (permalink / raw)
  To: Julius Werner
  Cc: netdev, Freddy Xin, linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou

On Tue, Mar 11, 2014 at 10:31 AM, Grant Grundler <grundler-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> FWIW, I've reproduced this on "Samsung Chromebook" (Exynos 5250) with

FYA - I just noticed the system crashed on the 2048th iteration :)

...
RELOAD 2046  20140310202523  LINK 1000_full (3 sec)  IP 192.168.1.116 (1 Sec)
RELOAD 2047  20140310202533  LINK 1000_full (2 sec)  IP 192.168.1.116 (1 Sec)
RELOAD 2048   SSH timeout

thought folks would be amused by the coincidence.

grant
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-11 17:27         ` Julius Werner
@ 2014-03-17 21:53           ` Julius Werner
  2014-03-17 21:55             ` Oliver Neukum
  0 siblings, 1 reply; 25+ messages in thread
From: Julius Werner @ 2014-03-17 21:53 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Grant Grundler, Julius Werner, netdev, Freddy Xin, linux-usb, Allan Chou

Hi Oliver,

Any update on the state of this patch? Did it get picked up for merge
somewhere? Do you agree with my suggestion of sticking closer to the
original logic instead of adding that autopm_get(), or do you maybe
want to add some more reviewers to confirm your code? I don't really
care that much which way we choose in the end, I just want to make
sure this bug gets fixed and we don't forget about it.

Thanks,
Julius

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-17 21:53           ` Julius Werner
@ 2014-03-17 21:55             ` Oliver Neukum
  2014-03-17 22:02               ` Grant Grundler
  2014-03-18 20:51               ` Grant Grundler
  0 siblings, 2 replies; 25+ messages in thread
From: Oliver Neukum @ 2014-03-17 21:55 UTC (permalink / raw)
  To: Julius Werner; +Cc: Grant Grundler, netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote:
> Hi Oliver,
> 
> Any update on the state of this patch? Did it get picked up for merge
> somewhere? Do you agree with my suggestion of sticking closer to the
> original logic instead of adding that autopm_get(), or do you maybe
> want to add some more reviewers to confirm your code? I don't really
> care that much which way we choose in the end, I just want to make
> sure this bug gets fixed and we don't forget about it.

I am hping for the reporter of the original bug to test it.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-17 21:55             ` Oliver Neukum
@ 2014-03-17 22:02               ` Grant Grundler
  2014-03-18 20:51               ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-17 22:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Julius Werner, netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Mon, 2014-03-17 at 14:53 -0700, Julius Werner wrote:
>> Hi Oliver,
>>
>> Any update on the state of this patch? Did it get picked up for merge
>> somewhere? Do you agree with my suggestion of sticking closer to the
>> original logic instead of adding that autopm_get(), or do you maybe
>> want to add some more reviewers to confirm your code? I don't really
>> care that much which way we choose in the end, I just want to make
>> sure this bug gets fixed and we don't forget about it.
>
> I am hping for the reporter of the original bug to test it.

Ok. I didn't realize I was part of the dependency chain here. :)
It seemed "obvious" to me once Julius described the race.

I'll try it on the offending system but fear I will run into some other problem.

cheers,
grant

>
>         Regards
>                 Oliver
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-17 21:55             ` Oliver Neukum
  2014-03-17 22:02               ` Grant Grundler
@ 2014-03-18 20:51               ` Grant Grundler
       [not found]                 ` <CANEJEGuaVPNZKFiXY9CywFuqghgiQqV_sii7TE3STCNvXLQ3pQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-03-19  1:09                 ` Julius Werner
  1 sibling, 2 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-18 20:51 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Julius Werner, netdev, Freddy Xin, linux-usb, Allan Chou

On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum@suse.de> wrote:
> I am hping for the reporter of the original bug to test it.

Oliver,
on a haswell system running ChromeOS-3.8 kernel, this patch as-is
resulted in a "Bad Spinlock Magic" error and subsequent pagefault.
I believe the sequence was:
   usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up
(&dev->wait) -> panic

I tried adding the following change on top of your patch but believe
the plumbing still isn't quite correct since the USB device (eth0) is
reporting a link but no TX or RX of traffic:
 @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
                goto done;
        }

+       /* usbnet_bh() expects the spinlock to be initialized. */
+       init_waitqueue_head(&dev->wait);
+
        /* hard_mtu or rx_urb_size may change in reset() */
        usbnet_update_max_qlen(dev);

I suspect this hunk of your patch is now causing different problems at
init time:
@@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
        clear_bit(EVENT_RX_KILL, &dev->flags);

        // waiting for all pending urbs to complete?
-       if (dev->wait) {
-               if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
-                       wake_up (dev->wait);
-               }
+       if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
+               wake_up (&dev->wait);

        // or are we maybe short a few urbs?
        } else if (netif_running (dev->net) &&

Please advise on what you'd like me to try next.

cheers,
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]                 ` <CANEJEGuaVPNZKFiXY9CywFuqghgiQqV_sii7TE3STCNvXLQ3pQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-18 20:52                   ` Grant Grundler
  0 siblings, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-18 20:52 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Julius Werner, netdev, Freddy Xin,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou

On Tue, Mar 18, 2014 at 1:51 PM, Grant Grundler <grundler-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Mar 17, 2014 at 2:55 PM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
>> I am hping for the reporter of the original bug to test it.
>
> Oliver,
> on a haswell system running ChromeOS-3.8 kernel, this patch as-is
> resulted in a "Bad Spinlock Magic" error and subsequent pagefault.
> I believe the sequence was:
>    usbnet_open -> tasklet_schedule(dev->bh) -> usbnet_bh -> wake_up
> (&dev->wait) -> panic

BTW, full console output is available on this public patch:
    https://code.google.com/p/chromium/issues/detail?id=353648

cheers,
grant

>
> I tried adding the following change on top of your patch but believe
> the plumbing still isn't quite correct since the USB device (eth0) is
> reporting a link but no TX or RX of traffic:
>  @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
>                 goto done;
>         }
>
> +       /* usbnet_bh() expects the spinlock to be initialized. */
> +       init_waitqueue_head(&dev->wait);
> +
>         /* hard_mtu or rx_urb_size may change in reset() */
>         usbnet_update_max_qlen(dev);
>
> I suspect this hunk of your patch is now causing different problems at
> init time:
> @@ -1438,10 +1440,8 @@ static void usbnet_bh (unsigned long param)
>         clear_bit(EVENT_RX_KILL, &dev->flags);
>
>         // waiting for all pending urbs to complete?
> -       if (dev->wait) {
> -               if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
> -                       wake_up (dev->wait);
> -               }
> +       if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
> +               wake_up (&dev->wait);
>
>         // or are we maybe short a few urbs?
>         } else if (netif_running (dev->net) &&
>
> Please advise on what you'd like me to try next.
>
> cheers,
> grant
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-18 20:51               ` Grant Grundler
       [not found]                 ` <CANEJEGuaVPNZKFiXY9CywFuqghgiQqV_sii7TE3STCNvXLQ3pQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-19  1:09                 ` Julius Werner
       [not found]                   ` <CAODwPW-E-3HcXCGEmp9JpW6hU-Xt-FHwBtqeSSTDGaRPqO49Sw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Julius Werner @ 2014-03-19  1:09 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Oliver Neukum, Julius Werner, netdev, Freddy Xin, linux-usb, Allan Chou

> I tried adding the following change on top of your patch but believe
> the plumbing still isn't quite correct since the USB device (eth0) is
> reporting a link but no TX or RX of traffic:
>  @@ -805,6 +807,9 @@ int usbnet_open (struct net_device *net)
>                 goto done;
>         }
>
> +       /* usbnet_bh() expects the spinlock to be initialized. */
> +       init_waitqueue_head(&dev->wait);
> +
>         /* hard_mtu or rx_urb_size may change in reset() */
>         usbnet_update_max_qlen(dev);


I think a better place for this would be in usbnet_probe() (together
with all the other dev->xxx initialization). I'm not quite sure how
that could cause the transfer problems you are seeing, but at least
you will no longer initialize the waitqueue multiple times on multiple
usbnet_open (which is probably a bad idea).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]                   ` <CAODwPW-E-3HcXCGEmp9JpW6hU-Xt-FHwBtqeSSTDGaRPqO49Sw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-19  1:40                     ` Grant Grundler
       [not found]                       ` <CANEJEGscbVF3gRLNtTAcRSpHGVm1iDDKEeBscXcwj2-YdkiDXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2014-03-19  1:40 UTC (permalink / raw)
  To: Julius Werner
  Cc: Oliver Neukum, netdev, Freddy Xin,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou

On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
> I think a better place for this would be in usbnet_probe() (together
> with all the other dev->xxx initialization).

Definitely better.

@@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb
        dev->driver_name = name;
        dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
                                | NETIF_MSG_PROBE | NETIF_MSG_LINK);
+       init_waitqueue_head(&dev->wait);
        skb_queue_head_init (&dev->rxq);
        skb_queue_head_init (&dev->txq);
        skb_queue_head_init (&dev->done);


> I'm not quite sure how
> that could cause the transfer problems you are seeing, but at least
> you will no longer initialize the waitqueue multiple times on multiple
> usbnet_open (which is probably a bad idea).

Yeah, I agree. I don't expect usbnet_open would get called multiple
times but I'll try the above anyway.

thanks,
grant
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]                       ` <CANEJEGscbVF3gRLNtTAcRSpHGVm1iDDKEeBscXcwj2-YdkiDXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-20  0:58                         ` Grant Grundler
  2014-03-20  7:15                           ` Oliver Neukum
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2014-03-20  0:58 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: netdev, Freddy Xin, linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou,
	Julius Werner

On Tue, Mar 18, 2014 at 6:40 PM, Grant Grundler <grundler-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, Mar 18, 2014 at 6:09 PM, Julius Werner <jwerner-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
>> I think a better place for this would be in usbnet_probe() (together
>> with all the other dev->xxx initialization).
>
> Definitely better.
>
> @@ -1536,6 +1536,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb
>         dev->driver_name = name;
>         dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
>                                 | NETIF_MSG_PROBE | NETIF_MSG_LINK);
> +       init_waitqueue_head(&dev->wait);
>         skb_queue_head_init (&dev->rxq);
>         skb_queue_head_init (&dev->txq);
>         skb_queue_head_init (&dev->done);

Oliver,
So even with this additional change to usbnet_probe, the device is
reporting a link but can't transmit packets.
I've tried with three different USB dongles (AX88178, AX88772B, SMSC75xx).

In the last case, I ran "ifconfig eth0 192.168.1.100" and then tried
to ping 192.168.1.1. "ifconfig eth0" then reports "RX packets 0 bytes
0" and "TX packets 0 bytes 10796". The TX packets != bytes seems odd
and suggests (to me) that data getting discarded someplace in the
stack. I confirmed by running tcpdump on the other end of the link
(192.168.1.1 side) and got nothing from 192.168.1.100.

I believe the "10796" TX bytes is due to the connection manager
"trying really hard" to get an IP address (via DHCP).

Your patch looks like it should work. But it's missing something and I
don't have a clue what it is. Ideas?

cheers,
grant
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-20  0:58                         ` Grant Grundler
@ 2014-03-20  7:15                           ` Oliver Neukum
  2014-03-20 16:35                             ` Grant Grundler
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2014-03-20  7:15 UTC (permalink / raw)
  To: Grant Grundler; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Wed, 2014-03-19 at 17:58 -0700, Grant Grundler wrote:

> Oliver,
> So even with this additional change to usbnet_probe, the device is
> reporting a link but can't transmit packets.
> I've tried with three different USB dongles (AX88178, AX88772B, SMSC75xx).
> 
> In the last case, I ran "ifconfig eth0 192.168.1.100" and then tried
> to ping 192.168.1.1. "ifconfig eth0" then reports "RX packets 0 bytes
> 0" and "TX packets 0 bytes 10796". The TX packets != bytes seems odd
> and suggests (to me) that data getting discarded someplace in the
> stack. I confirmed by running tcpdump on the other end of the link
> (192.168.1.1 side) and got nothing from 192.168.1.100.
> 
> I believe the "10796" TX bytes is due to the connection manager
> "trying really hard" to get an IP address (via DHCP).
> 
> Your patch looks like it should work. But it's missing something and I
> don't have a clue what it is. Ideas?

I have an idea. Could you test this patch?

	Regards
		Oliver

>From 9a4c4a62c113c5a1abd8345418ccbf95706efdd2 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@suse.de>
Date: Tue, 11 Mar 2014 09:59:48 +0100
Subject: [PATCH] usbnet: include wait queue head in device structure

This fixes a race which happens by freeing an object on the stack.
Quoting Julius:
> The issue is
> that it calls usbnet_terminate_urbs() before that, which temporarily
> installs a waitqueue in dev->wait in order to be able to wait on the
> tasklet to run and finish up some queues. The waiting itself looks
> okay, but the access to 'dev->wait' is totally unprotected and can
> race arbitrarily. I think in this case usbnet_bh() managed to succeed
> it's dev->wait check just before usbnet_terminate_urbs() sets it back
> to NULL. The latter then finishes and the waitqueue_t structure on its
> stack gets overwritten by other functions halfway through the
> wake_up() call in usbnet_bh().

The fix is to just not allocate the data structure on the stack.
As dev->wait is abused as a flag it also takes a runtime PM change
to fix this bug.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
---
 drivers/net/usb/usbnet.c   | 29 ++++++++++++++++-------------
 include/linux/usb/usbnet.h |  2 +-
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index dd10d58..a8da1af 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -752,14 +752,12 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);
 // precondition: never called in_interrupt
 static void usbnet_terminate_urbs(struct usbnet *dev)
 {
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(unlink_wakeup);
 	DECLARE_WAITQUEUE(wait, current);
 	int temp;
 
 	/* ensure there are no more active urbs */
-	add_wait_queue(&unlink_wakeup, &wait);
+	add_wait_queue(&dev->wait, &wait);
 	set_current_state(TASK_UNINTERRUPTIBLE);
-	dev->wait = &unlink_wakeup;
 	temp = unlink_urbs(dev, &dev->txq) +
 		unlink_urbs(dev, &dev->rxq);
 
@@ -773,15 +771,14 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
 				  "waited for %d urb completions\n", temp);
 	}
 	set_current_state(TASK_RUNNING);
-	dev->wait = NULL;
-	remove_wait_queue(&unlink_wakeup, &wait);
+	remove_wait_queue(&dev->wait, &wait);
 }
 
 int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
 	struct driver_info	*info = dev->driver_info;
-	int			retval;
+	int			retval, pm;
 
 	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
@@ -791,6 +788,8 @@ int usbnet_stop (struct net_device *net)
 		   net->stats.rx_packets, net->stats.tx_packets,
 		   net->stats.rx_errors, net->stats.tx_errors);
 
+	/* to not race resume */
+	pm = usb_autopm_get_interface(dev->intf);
 	/* allow minidriver to stop correctly (wireless devices to turn off
 	 * radio etc) */
 	if (info->stop) {
@@ -817,6 +816,9 @@ int usbnet_stop (struct net_device *net)
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+	if (!pm)
+		usb_autopm_put_interface(dev->intf);
+
 	if (info->manage_power &&
 	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
 		info->manage_power(dev, 0);
@@ -1438,10 +1440,9 @@ static void usbnet_bh (unsigned long param)
 	clear_bit(EVENT_RX_KILL, &dev->flags);
 
 	// waiting for all pending urbs to complete?
-	if (dev->wait) {
-		if ((dev->txq.qlen + dev->rxq.qlen + dev->done.qlen) == 0) {
-			wake_up (dev->wait);
-		}
+	if (waitqueue_active(&dev->wait)) {
+		if (dev->txq.qlen + dev->rxq.qlen + dev->done.qlen == 0)
+			wake_up_all(&dev->wait);
 
 	// or are we maybe short a few urbs?
 	} else if (netif_running (dev->net) &&
@@ -1580,6 +1581,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 	dev->driver_name = name;
 	dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
 				| NETIF_MSG_PROBE | NETIF_MSG_LINK);
+	init_waitqueue_head(&dev->wait);
 	skb_queue_head_init (&dev->rxq);
 	skb_queue_head_init (&dev->txq);
 	skb_queue_head_init (&dev->done);
@@ -1791,9 +1793,10 @@ int usbnet_resume (struct usb_interface *intf)
 		spin_unlock_irq(&dev->txq.lock);
 
 		if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
-			/* handle remote wakeup ASAP */
-			if (!dev->wait &&
-				netif_device_present(dev->net) &&
+			/* handle remote wakeup ASAP
+			 * we cannot race against stop
+			 */
+			if (netif_device_present(dev->net) &&
 				!timer_pending(&dev->delay) &&
 				!test_bit(EVENT_RX_HALT, &dev->flags))
 					rx_alloc_submit(dev, GFP_NOIO);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index e303eef..0662e98 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -30,7 +30,7 @@ struct usbnet {
 	struct driver_info	*driver_info;
 	const char		*driver_name;
 	void			*driver_priv;
-	wait_queue_head_t	*wait;
+	wait_queue_head_t	wait;
 	struct mutex		phy_mutex;
 	unsigned char		suspend_count;
 	unsigned char		pkt_cnt, pkt_err;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-20  7:15                           ` Oliver Neukum
@ 2014-03-20 16:35                             ` Grant Grundler
  2014-03-20 21:19                               ` Grant Grundler
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2014-03-20 16:35 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Thu, Mar 20, 2014 at 12:15 AM, Oliver Neukum <oneukum@suse.de> wrote:
...
> I have an idea. Could you test this patch?
...
> -       if (dev->wait) {
..
> +       if (waitqueue_active(&dev->wait)) {

Yes - building new image now (and transfer to USB and boot from USB).
Should know in an hour or so (doing other things in parallel).

I was sure the problem is in usbnet_bh() since that's the only code
change I'm actually exercising (so far). The way I was reading the
code, we might see extra wake_up calls...but there is clearly more
going on.

Can you please explain why we need to check if the waitqueue is active?

This patch should also add a comment to hint why waitqueue_active()
must be called.
Why? Several experienced people looked at the code and didn't see the
problem including the original author of the patch.

thanks,
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-20 16:35                             ` Grant Grundler
@ 2014-03-20 21:19                               ` Grant Grundler
       [not found]                                 ` <CANEJEGues6qVUEpWR5ggejg7dmS1PqzOHvZeWdxGJSnJh6c8Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-03-21  8:33                                 ` Oliver Neukum
  0 siblings, 2 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-20 21:19 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Thu, Mar 20, 2014 at 9:35 AM, Grant Grundler <grundler@google.com> wrote:
> On Thu, Mar 20, 2014 at 12:15 AM, Oliver Neukum <oneukum@suse.de> wrote:
> ...
>> I have an idea. Could you test this patch?
> ...
>> -       if (dev->wait) {
> ..
>> +       if (waitqueue_active(&dev->wait)) {
>
> Yes - building new image now (and transfer to USB and boot from USB).
> Should know in an hour or so (doing other things in parallel).

Sorry...took a bit longer since my previous test method (bash
/tmp/reload_asix) was abusing a security exploit that is now
fixed...so had to move my script into a RO executable file system:

for i in `seq 10000`; do echo -n "RELOAD $i  " ; ssh $T
"/bin/reload_asix eth0 1000_full" ; J=$? ; if [ $J -eq 255 ] ; then
echo; " SSH timeout" ; break ; fi ; ssh $T "cat
/var/log/reload-asix.out" ; if [ $J -ne 0 ] ; then echo "  ERROR $J" ;
fi ; sleep 3 ; done | tee ~/reload-AX88178-leon-192.168.1.100-06.out

This is running now and things look happy so far. :) This will take
more than 30h to complete.

So please add:
  "Tested-by: Grant Grundler <grundler@chromium.org>"

> Can you please explain why we need to check if the waitqueue is active?

and add a comment that answers the above question.

thanks!
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]                                 ` <CANEJEGues6qVUEpWR5ggejg7dmS1PqzOHvZeWdxGJSnJh6c8Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-20 23:53                                   ` Julius Werner
       [not found]                                     ` <CAODwPW8vJf_AWDiea5jcunMu8HjqsDnRMfy25BkGDPHuVuRRHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Julius Werner @ 2014-03-20 23:53 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Oliver Neukum, netdev, Freddy Xin,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou, Julius Werner

>> Can you please explain why we need to check if the waitqueue is active?
>
> and add a comment that answers the above question.

Ooooohhhhh.... the braces!!! Well, that's just mean...

I expect that we don't really need the waitqueue_active() check there
as long as we fix the patch to make sure the control flow in the rest
of the statements actually stays the same. (That's why I really like
to put comments for an else block next to or under the opening brace,
instead of above with another empty line...)
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
       [not found]                                     ` <CAODwPW8vJf_AWDiea5jcunMu8HjqsDnRMfy25BkGDPHuVuRRHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-21  8:32                                       ` Oliver Neukum
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Neukum @ 2014-03-21  8:32 UTC (permalink / raw)
  To: Julius Werner
  Cc: Grant Grundler, netdev, Freddy Xin,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Allan Chou

On Thu, 2014-03-20 at 16:53 -0700, Julius Werner wrote:
> >> Can you please explain why we need to check if the waitqueue is active?
> >
> > and add a comment that answers the above question.
> 
> Ooooohhhhh.... the braces!!! Well, that's just mean...

Yes, I was unsure about this, but so it is.

> I expect that we don't really need the waitqueue_active() check there
> as long as we fix the patch to make sure the control flow in the rest
> of the statements actually stays the same. (That's why I really like
> to put comments for an else block next to or under the opening brace,
> instead of above with another empty line...)


We cannot. If the driver intends to shut down traffic then resubmission
in the bottom half must not happen. The check for the intention to shut
down is necessary.

	Regards
		Oliver



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-20 21:19                               ` Grant Grundler
       [not found]                                 ` <CANEJEGues6qVUEpWR5ggejg7dmS1PqzOHvZeWdxGJSnJh6c8Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-21  8:33                                 ` Oliver Neukum
  2014-03-21 16:18                                   ` Grant Grundler
  2014-03-24 19:27                                   ` Grant Grundler
  1 sibling, 2 replies; 25+ messages in thread
From: Oliver Neukum @ 2014-03-21  8:33 UTC (permalink / raw)
  To: Grant Grundler; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Thu, 2014-03-20 at 14:19 -0700, Grant Grundler wrote:
> On Thu, Mar 20, 2014 at 9:35 AM, Grant Grundler <grundler@google.com> wrote:
> > On Thu, Mar 20, 2014 at 12:15 AM, Oliver Neukum <oneukum@suse.de> wrote:
> > ...
> >> I have an idea. Could you test this patch?
> > ...
> >> -       if (dev->wait) {
> > ..
> >> +       if (waitqueue_active(&dev->wait)) {
> >
> > Yes - building new image now (and transfer to USB and boot from USB).
> > Should know in an hour or so (doing other things in parallel).
> 
> Sorry...took a bit longer since my previous test method (bash
> /tmp/reload_asix) was abusing a security exploit that is now
> fixed...so had to move my script into a RO executable file system:
> 
> for i in `seq 10000`; do echo -n "RELOAD $i  " ; ssh $T
> "/bin/reload_asix eth0 1000_full" ; J=$? ; if [ $J -eq 255 ] ; then
> echo; " SSH timeout" ; break ; fi ; ssh $T "cat
> /var/log/reload-asix.out" ; if [ $J -ne 0 ] ; then echo "  ERROR $J" ;
> fi ; sleep 3 ; done | tee ~/reload-AX88178-leon-192.168.1.100-06.out
> 
> This is running now and things look happy so far. :) This will take
> more than 30h to complete.

Very well. Thorough testing is good. I'll wait for the result. Could
you notify me of the final outcome?

And, Julius, I owe you an apology. The check is necessary.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-21  8:33                                 ` Oliver Neukum
@ 2014-03-21 16:18                                   ` Grant Grundler
  2014-03-24 19:27                                   ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-21 16:18 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Fri, Mar 21, 2014 at 1:33 AM, Oliver Neukum <oneukum@suse.de> wrote:
...
> Very well. Thorough testing is good. I'll wait for the result. Could
> you notify me of the final outcome?

Ceratinly. Bad news is I had to restart my workstation that was
driving the test due to completely different issue (loopbacks hung -
known bug). reload_asix ran about 2000 iterations with no problems on
AX88178 part.
It's running again now on the same target machine.

> And, Julius, I owe you an apology. The check is necessary.

++oliver

Sorry to harp on this more, but I still think a comment is warranted:
  e.g. "Don't resend packets during shutdown of this USB net device."

And instead of naming the queue "wait", call the field "shutdown queue".

cheers,
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: usbnet: driver_info->stop required to stop USB interrupts?
  2014-03-21  8:33                                 ` Oliver Neukum
  2014-03-21 16:18                                   ` Grant Grundler
@ 2014-03-24 19:27                                   ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2014-03-24 19:27 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, Freddy Xin, linux-usb, Allan Chou, Julius Werner

On Fri, Mar 21, 2014 at 1:33 AM, Oliver Neukum <oneukum@suse.de> wrote:
...
> Very well. Thorough testing is good. I'll wait for the result. Could
> you notify me of the final outcome?

Ship it. :)

The testing so far has completed over 15K iterations of unload/reload
of the asix driver on the original failing system with no failures so
far. Previously I could reproduce the problem in 2K-8K iterations.

The caveat is due to various issues with my test environment, the test
never ran more than 7K iterations continuously. The ~15k is spread out
across four invocations (2K, 3K, 7K, 4K) of the previously shared
"for" loop.

Please add my Reported-by + Tested-by to this patch.

thanks
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-03-24 19:27 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-10 18:33 usbnet: driver_info->stop required to stop USB interrupts? Grant Grundler
2014-03-10 22:58 ` Grant Grundler
2014-03-11  2:53   ` Julius Werner
2014-03-11  9:10     ` Oliver Neukum
2014-03-11  9:14     ` Oliver Neukum
     [not found]       ` <1394529263.16128.6.camel-B2T3B9s34ElbnMAlSieJcQ@public.gmane.org>
2014-03-11 17:27         ` Julius Werner
2014-03-17 21:53           ` Julius Werner
2014-03-17 21:55             ` Oliver Neukum
2014-03-17 22:02               ` Grant Grundler
2014-03-18 20:51               ` Grant Grundler
     [not found]                 ` <CANEJEGuaVPNZKFiXY9CywFuqghgiQqV_sii7TE3STCNvXLQ3pQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-18 20:52                   ` Grant Grundler
2014-03-19  1:09                 ` Julius Werner
     [not found]                   ` <CAODwPW-E-3HcXCGEmp9JpW6hU-Xt-FHwBtqeSSTDGaRPqO49Sw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-19  1:40                     ` Grant Grundler
     [not found]                       ` <CANEJEGscbVF3gRLNtTAcRSpHGVm1iDDKEeBscXcwj2-YdkiDXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-20  0:58                         ` Grant Grundler
2014-03-20  7:15                           ` Oliver Neukum
2014-03-20 16:35                             ` Grant Grundler
2014-03-20 21:19                               ` Grant Grundler
     [not found]                                 ` <CANEJEGues6qVUEpWR5ggejg7dmS1PqzOHvZeWdxGJSnJh6c8Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-20 23:53                                   ` Julius Werner
     [not found]                                     ` <CAODwPW8vJf_AWDiea5jcunMu8HjqsDnRMfy25BkGDPHuVuRRHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-21  8:32                                       ` Oliver Neukum
2014-03-21  8:33                                 ` Oliver Neukum
2014-03-21 16:18                                   ` Grant Grundler
2014-03-24 19:27                                   ` Grant Grundler
2014-03-11 17:31     ` Grant Grundler
2014-03-11 17:37       ` Grant Grundler
     [not found]       ` <CANEJEGv77PnbLZsyOdTevXjij-_wLSq4uGCnSjnAwN9OK8m3Yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-11 17:52         ` Grant Grundler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.