* cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
@ 2021-03-04 22:59 Ramsay, Lincoln
2021-03-09 12:52 ` Oliver Neukum
0 siblings, 1 reply; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-04 22:59 UTC (permalink / raw)
To: oneukum; +Cc: linux-usb
Hi folks,
Opengear makes a device (OM2200) that you're supposed to plug into consoles in order to access them remotely but the Cisco 2960-X is causing us grief. We can trivially break our device in just 3 steps.
1. Connect the Cisco 2960-X console.
2. (Re)boot our device.
3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
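Step 3 can be sketched roughly as follows. This is a minimal illustration, not the code that runs on the OM2200: the helper name `write_to_console` and the `\r\n` payload are assumptions made for the example.

```python
import os


def write_to_console(path: str, data: bytes = b"\r\n") -> int:
    """Open a console device node, write to it, and return the byte count.

    Hypothetical helper for illustration only.
    """
    # O_NOCTTY: do not let the tty become our controlling terminal.
    # O_NONBLOCK: do not block in open() waiting for carrier detect.
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)
```

With the Cisco attached as described, the call would be `write_to_console("/dev/ttyACM0")`.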
When we were using Linux 5.2.32 this wasn't fatal. It was possible to disconnect and reconnect the Cisco and it would work as expected. The same was observed on our older devices that run Linux 3.10 on ARM and on a laptop running macOS 10.13. But we upgraded to Linux 5.4.61 and it got much worse. I did some digging and it seems that the cdc-acm cooldown commit (f4d1cf2ef83caeab212e842fd238cb8353f59fa2) is the cause.
Before I continue, I need to acknowledge that the Cisco 2960-X is really broken. Unlike every other Cisco console I could find to test with, it shows up as USB 2 rather than USB 1, causes warnings to be printed and sends corrupt identity strings.
usb 2-1.1: new high-speed USB device number 6 using ehci-pci
usb 2-1.1: config 1 interface 0 altsetting 0 endpoint 0x82 has an invalid bInterval 255, changing to 11
usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x1 has invalid maxpacket 64
usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
usb 2-1.1: New USB device found, idVendor=05a6, idProduct=0009, bcdDevice= 0.00
usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1.1: Product: C�~B�~@~@ल^D
usb 2-1.1: Manufacturer: C�~B�~@~@ल^D
usb 2-1.1: SerialNumber: C�~B�~@~@ल^D�~@�~B
cdc_acm 2-1.1:1.0: ttyACM0: USB ACM device
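As background on the descriptor warnings above: the USB 2.0 specification only allows a bulk wMaxPacketSize of 512 at high speed (64 is a valid full-speed value), and bInterval on high-speed interrupt and isochronous endpoints must be in the range 1 to 16. A small sketch of those two rules, with table and function names that are purely illustrative:

```python
# Bulk wMaxPacketSize values permitted per bus speed (USB 2.0 spec, 5.8.3).
VALID_BULK_MAXPACKET = {
    "full": {8, 16, 32, 64},
    "high": {512},
}


def bulk_maxpacket_ok(speed: str, maxpacket: int) -> bool:
    """Return True if a bulk endpoint's maxpacket is legal at this speed."""
    return maxpacket in VALID_BULK_MAXPACKET[speed]


def hs_binterval_ok(binterval: int) -> bool:
    """High-speed interrupt/isochronous endpoints encode the period as
    2**(bInterval - 1) microframes, so only 1..16 is valid."""
    return 1 <= binterval <= 16
```

Under these rules the 2960-X's values (maxpacket 64 on a high-speed bulk endpoint, bInterval 255) are both rejected, which matches the warnings in the log.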
Despite this though, it does seem to work, except when it is connected during boot. In that case, we get this kernel warning:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1477 __queue_work+0x25a/0x300
Modules linked in: xt_CT xt_tcpudp nf_nat_tftp nft_objref nf_conntrack_tftp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables_set nft_chain_nat ip6table_nat ip>
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 5.4.61-og #1
Hardware name: Opengear hedgehog/hedgehog, BIOS 698f4312a5-jenkins 08/28/2020
RIP: 0010:__queue_work+0x25a/0x300
Code: 94 b5 73 a9 00 01 1f 00 75 0f 65 48 8b 3c 25 00 5d 01 00 f6 47 24 20 75 24 0f 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 79 fe ff ff 48 8d 53 60 83 c9 02 >
RSP: 0018:ffffb59640114e88 EFLAGS: 00010002
RAX: ffffa049e7203790 RBX: ffffa049eaba2f00 RCX: ffffa049c79f61b8
RDX: ffffa049e7203798 RSI: 000000007fffffff RDI: ffffa049eab9ef80
RBP: ffffa049ea010000 R08: 0000000000000000 R09: ffffb59640114db8
R10: 0000000000000040 R11: 0000000000000000 R12: 0000000000000003
R13: 0000000000000007 R14: 0000000000000004 R15: ffffa049e7203790
FS: 0000000000000000(0000) GS:ffffa049eab80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff773c1a024 CR3: 000000011e548000 CR4: 00000000000406e0
Call Trace:
<IRQ>
queue_work_on+0x17/0x20
__usb_hcd_giveback_urb+0x4e/0xb0
usb_giveback_urb_bh+0x8e/0xe0
tasklet_action_common.isra.0+0x48/0xa0
__do_softirq+0xd1/0x213
irq_exit+0xc8/0xd0
do_IRQ+0x48/0xd0
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:cpuidle_enter_state+0x120/0x2a0
Code: e8 75 8a aa ff 31 ff 49 89 c6 e8 bb 9a aa ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 6a 01 00 00 31 ff e8 54 c3 ae ff fb 45 85 ed <0f> 88 c2 00 00 00 49 63 f5 4c 89 f1 48 8d >
RSP: 0018:ffffb59640087e80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdd
RAX: ffffa049eab9f600 RBX: ffffffff8d650480 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 00000000803d7d59 RDI: 0000000000000000
RBP: 00000210865b9db0 R08: 00000210865f689a R09: 000000007fffffff
R10: ffffa049eab9e700 R11: ffffa049eab9e6e0 R12: ffffa049e78c9000
R13: 0000000000000002 R14: 00000210865f689a R15: 0000000000000000
cpuidle_enter+0x24/0x40
do_idle+0x1bf/0x230
cpu_startup_entry+0x14/0x20
start_secondary+0x14a/0x180
secondary_startup_64+0xa4/0xb0
---[ end trace 12a803438e4082c9 ]---
It comes from __queue_work: WARN_ON(!list_empty(&work->entry))
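To make the invariant behind that check concrete, here is a toy model in Python. It is only a sketch of the rule "a work item whose list entry is still linked must not be queued again"; the class names are invented, and it says nothing about how cdc-acm ends up violating the rule.

```python
import warnings


class Work:
    def __init__(self) -> None:
        # Stands in for !list_empty(&work->entry) in the kernel.
        self.linked = False


class ToyWorkqueue:
    def __init__(self) -> None:
        self.items = []

    def queue_work(self, work: Work) -> bool:
        # Mirrors the check: warn and refuse if the entry is still linked.
        if work.linked:
            warnings.warn("work->entry is not empty", RuntimeWarning)
            return False
        work.linked = True
        self.items.append(work)
        return True

    def run_one(self) -> Work:
        # Unlink the oldest item, as a worker would before running it.
        work = self.items.pop(0)
        work.linked = False
        return work
```

Queueing the same `Work` twice without an intervening `run_one()` triggers the warning in this model.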
Once this happens, we can no longer disconnect and reconnect the Cisco. Only a reboot seems to get things working again. If we disconnect and reconnect the Cisco without writing to it, we avoid the issue.
While reverting the cdc-acm cooldown patch gets us back to the not-great-but-not-fatal behaviour, I don't feel that this is a useful long-term situation. I guess that someone (probably me - I doubt many people have access to one of these things) needs to see if we can make the Cisco 2960-X behave better, maybe by enabling some of the 'quirks' in the cdc-acm driver.
But I also wonder why this cooldown is triggering the error, and if there's maybe something in here that is bad, but only exposed by a broken device like Cisco?
Any guidance would be appreciated.
Lincoln
* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
2021-03-04 22:59 cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB Ramsay, Lincoln
@ 2021-03-09 12:52 ` Oliver Neukum
2021-03-10 1:30 ` Ramsay, Lincoln
0 siblings, 1 reply; 6+ messages in thread
From: Oliver Neukum @ 2021-03-09 12:52 UTC (permalink / raw)
To: Ramsay, Lincoln; +Cc: linux-usb
On Thursday, 04.03.2021 at 22:59 +0000, Ramsay, Lincoln wrote:
> Hi folks,
>
> Opengear makes a device (OM2200) that you're supposed to plug into consoles in order to access them remotely but the Cisco 2960-X is causing us grief. We can trivially break our device in just 3 steps.
>
> 1. Connect the Cisco 2960-X console.
> 2. (Re)boot our device.
> 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
What exactly happens after that?
> When we were using Linux 5.2.32 this wasn't fatal. It was possible to disconnect and reconnect the Cisco and it would work as expected. The same was observed on our older devices that run Linux 3.10 on ARM and on a laptop running macOS 10.13. But we upgraded to Linux 5.4.61 and it got much worse. I did some digging and it seems that the cdc-acm cooldown commit (f4d1cf2ef83caeab212e842fd238cb8353f59fa2) is the cause.
>
> Before I continue, I need to acknowledge that the Cisco 2960-X is really broken. Unlike every other Cisco console I could find to test with, it shows up as USB 2 rather than USB 1, causes warnings to be printed and sends corrupt identity strings.
>
> usb 2-1.1: new high-speed USB device number 6 using ehci-pci
> usb 2-1.1: config 1 interface 0 altsetting 0 endpoint 0x82 has an invalid bInterval 255, changing to 11
> usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x1 has invalid maxpacket 64
> usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
> usb 2-1.1: New USB device found, idVendor=05a6, idProduct=0009, bcdDevice= 0.00
> usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 2-1.1: Product: C�~B�~@~@ल^D
> usb 2-1.1: Manufacturer: C�~B�~@~@ल^D
> usb 2-1.1: SerialNumber: C�~B�~@~@ल^D�~@�~B
> cdc_acm 2-1.1:1.0: ttyACM0: USB ACM device
>
> Despite this though, it does seem to work, except when it is connected during boot. In that case, we get this kernel warning:
Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
Regards
Oliver
* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
2021-03-09 12:52 ` Oliver Neukum
@ 2021-03-10 1:30 ` Ramsay, Lincoln
2021-03-10 8:45 ` Oliver Neukum
2021-03-10 8:56 ` Greg KH
0 siblings, 2 replies; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-10 1:30 UTC (permalink / raw)
To: Oliver Neukum; +Cc: linux-usb
> On Thursday, 04.03.2021 at 22:59 +0000, Ramsay, Lincoln wrote:
> > 1. Connect the Cisco 2960-X console.
> > 2. (Re)boot our device.
> > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
>
> What exactly happens after that?
The kernel warning about the empty work on the queue is printed to the console (and journal) and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (i.e. networking is still going) but USB appears to be dead (though I only tested the same console being connected to different USB ports).
> Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
No... our newest builds use kernel 5.8.18 and that commit seems to be in 5.10. But backporting that to our kernel seems like a much nicer fix than reverting the cooldown patch.
I tried doing that and it is good. It doesn't make the Cisco magically work but there's no kernel warning and USB isn't dead so the console can be disconnected and re-connected and it works again. Nice.
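A rough way to express the "5.8.18 predates 5.10" comparison is sketched below. The helper names are made up for the example, and a version comparison is only a first guess: stable backports can carry a fix in an older series, so checking the tree itself is the authoritative test.

```python
import re


def kernel_version(release: str) -> tuple:
    """Parse a release string such as '5.4.61-og' into (5, 4, 61)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(p) if p is not None else 0 for p in m.groups())


def predates(release: str, first_fixed: tuple = (5, 10, 0)) -> bool:
    """True if the release is older than the first mainline version with the fix."""
    return kernel_version(release) < first_fixed
```

Comparing parsed tuples matters here because a plain string comparison would sort "5.8.18" after "5.10".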
Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.
Thanks,
Lincoln
* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
2021-03-10 1:30 ` Ramsay, Lincoln
@ 2021-03-10 8:45 ` Oliver Neukum
2021-03-10 22:45 ` Ramsay, Lincoln
2021-03-10 8:56 ` Greg KH
1 sibling, 1 reply; 6+ messages in thread
From: Oliver Neukum @ 2021-03-10 8:45 UTC (permalink / raw)
To: Ramsay, Lincoln; +Cc: linux-usb
On Wednesday, 10.03.2021 at 01:30 +0000, Ramsay, Lincoln wrote:
> > On Thursday, 04.03.2021 at 22:59 +0000, Ramsay, Lincoln wrote:
> > > 1. Connect the Cisco 2960-X console.
> > > 2. (Re)boot our device.
> > > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
> >
> > What exactly happens after that?
>
> The kernel warning about the empty work on the queue is printed to the console (and journal) and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (i.e. networking is still going) but USB appears to be dead (though I only tested the same console being connected to different USB ports).
>
> > Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
>
> No... our newest builds use kernel 5.8.18 and that commit seems to be in 5.10. But backporting that to our kernel seems like a much nicer fix than reverting the cooldown patch.
Good. So are the two failure modes identical now?
> I tried doing that and it is good. It doesn't make the Cisco magically work but there's no kernel warning and USB isn't dead so the console can be disconnected and re-connected and it works again. Nice.
>
> Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.
You could try the reset utility from the newest set of usb tools.
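For reference, a device reset of this kind is typically done through the USBDEVFS_RESET ioctl on the device's usbfs node, which is what the usbreset tool in recent usbutils does (that this is the utility meant here is an assumption). A minimal sketch, assuming the common ioctl number encoding used on x86 and ARM; the function names are illustrative:

```python
import fcntl
import os

# USBDEVFS_RESET is _IO('U', 20) in <linux/usbdevice_fs.h>. The value below
# assumes the common encoding where _IO(type, nr) == (type << 8) | nr.
USBDEVFS_RESET = (ord("U") << 8) | 20


def usbfs_path(bus: int, devnum: int) -> str:
    """Build the usbfs node path for a bus number and device number."""
    return f"/dev/bus/usb/{bus:03d}/{devnum:03d}"


def reset_usb_device(bus: int, devnum: int) -> None:
    """Ask the kernel to reset one USB device (needs write access to the node)."""
    fd = os.open(usbfs_path(bus, devnum), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

For the device in the original log (bus 2, device number 6) the node would be `/dev/bus/usb/002/006`, though the device number changes on every reconnect.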
Regards
Oliver
* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
2021-03-10 1:30 ` Ramsay, Lincoln
2021-03-10 8:45 ` Oliver Neukum
@ 2021-03-10 8:56 ` Greg KH
1 sibling, 0 replies; 6+ messages in thread
From: Greg KH @ 2021-03-10 8:56 UTC (permalink / raw)
To: Ramsay, Lincoln; +Cc: Oliver Neukum, linux-usb
On Wed, Mar 10, 2021 at 01:30:21AM +0000, Ramsay, Lincoln wrote:
> > On Thursday, 04.03.2021 at 22:59 +0000, Ramsay, Lincoln wrote:
> > > 1. Connect the Cisco 2960-X console.
> > > 2. (Re)boot our device.
> > > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
> >
> > What exactly happens after that?
>
> The kernel warning about the empty work on the queue is printed to the console (and journal) and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (i.e. networking is still going) but USB appears to be dead (though I only tested the same console being connected to different USB ports).
>
> > Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
>
> No... our newest builds use kernel 5.8.18 and that commit seems to be
> in 5.10. But backporting that to our kernel seems like a much nicer
> fix than reverting the cooldown patch.
Please note that 5.8.18 is very old and obsolete and known insecure.
Please do not use it if at all possible as the community can not support
it at all, and I doubt whomever you got that kernel from can either :(
thanks,
greg k-h
* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
2021-03-10 8:45 ` Oliver Neukum
@ 2021-03-10 22:45 ` Ramsay, Lincoln
0 siblings, 0 replies; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-10 22:45 UTC (permalink / raw)
To: Oliver Neukum; +Cc: linux-usb
> Good. So are the two failure modes identical now?
Yes.
>> Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.
>
> You could try the reset utility from the newest set of usb tools.
I'll take a look.
Thanks!
Lincoln