linux-usb.vger.kernel.org archive mirror
* cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
@ 2021-03-04 22:59 Ramsay, Lincoln
  2021-03-09 12:52 ` Oliver Neukum
  0 siblings, 1 reply; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-04 22:59 UTC
  To: oneukum; +Cc: linux-usb

Hi folks,

Opengear makes a device (OM2200) that you're supposed to plug into consoles in order to access them remotely, but the Cisco 2960-X is causing us grief. We can trivially break our device in just three steps:

1. Connect the Cisco 2960-X console.
2. (Re)boot our device.
3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
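Step 3 needs nothing more than opening the tty and writing a few bytes to it. A minimal C sketch of that step, assuming a POSIX system; the helper name poke_console() is hypothetical (it is not Opengear's tooling), and the CR/LF payload is an arbitrary choice:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a console tty such as "/dev/ttyACM0" and write
 * msg to it, as in step 3 above. O_NOCTTY stops the port from becoming the
 * caller's controlling terminal. Returns the number of bytes written, or -1
 * if open() or write() fails. */
ssize_t poke_console(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}
```

For example, calling poke_console("/dev/ttyACM0", "\r\n") after booting with the Cisco attached.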

When we were using Linux 5.2.32, this wasn't fatal: it was possible to disconnect and reconnect the Cisco and it would work as expected. The same was observed on our older devices that run Linux 3.10 on ARM, and on a laptop running macOS 10.13. After we upgraded to Linux 5.4.61, though, it got much worse. I did some digging, and it seems that the cdc-acm cooldown commit (f4d1cf2ef83caeab212e842fd238cb8353f59fa2) is the cause.

Before I continue, I need to acknowledge that the Cisco 2960-X is really broken. Unlike every other Cisco console I could find to test with, it shows up as USB 2 rather than USB 1, causes warnings to be printed, and sends corrupt identity strings:

    usb 2-1.1: new high-speed USB device number 6 using ehci-pci
    usb 2-1.1: config 1 interface 0 altsetting 0 endpoint 0x82 has an invalid bInterval 255, changing to 11
    usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x1 has invalid maxpacket 64
    usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
    usb 2-1.1: New USB device found, idVendor=05a6, idProduct=0009, bcdDevice= 0.00
    usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
    usb 2-1.1: Product: C�~B�~@~@ल^D
    usb 2-1.1: Manufacturer: C�~B�~@~@ल^D
    usb 2-1.1: SerialNumber: C�~B�~@~@ल^D�~@�~B
    cdc_acm 2-1.1:1.0: ttyACM0: USB ACM device
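Those warnings fit a device with full-speed-style descriptors enumerating at high speed. At full speed an interrupt bInterval of 255 (ms) is legal, but at high speed the field is an exponent (period = 2^(bInterval-1) microframes of 125 µs) limited to 1..16, and high-speed bulk endpoints must use a 512-byte wMaxPacketSize, not 64. The C sketch below models the two checks; it is modelled on, not copied from, usb_parse_endpoint() in drivers/usb/core/config.c, and all model_* names are hypothetical. It reproduces the "changing to 11": 255 ms × 8 = 2040 microframes, whose highest set bit (counting from 1) is bit 11.

```c
#include <assert.h>
#include <stdbool.h>

/* 1-based index of the most significant set bit (0 for x == 0),
 * like the kernel's fls(). */
static int model_fls(unsigned int x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* High-speed interrupt endpoint: bInterval is an exponent, period =
 * 2^(bInterval-1) microframes, legal range 1..16. Out-of-range values are
 * assumed to be full-speed style milliseconds: convert to microframes (x8)
 * and use the exponent of the highest set bit, falling back to 7
 * (2^6 microframes = 8 ms) when that yields 0. */
int model_hs_int_binterval(int binterval)
{
    assert(binterval >= 0 && binterval <= 255);  /* bInterval is a u8 */

    if (binterval >= 1 && binterval <= 16)
        return binterval;

    int n = model_fls((unsigned int)binterval * 8u);
    return n != 0 ? n : 7;
}

/* High-speed bulk endpoints must use wMaxPacketSize == 512. */
bool model_hs_bulk_maxpacket_ok(int maxp)
{
    return maxp == 512;
}
```

The replacement value 11 means 2^10 = 1024 microframes, i.e. 128 ms, so the period is rounded down rather than up.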

Despite this though, it does seem to work, except when it is connected during boot. In that case, we get this kernel warning:

    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1477 __queue_work+0x25a/0x300
    Modules linked in: xt_CT xt_tcpudp nf_nat_tftp nft_objref nf_conntrack_tftp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables_set nft_chain_nat ip6table_nat ip>
    CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           O      5.4.61-og #1
    Hardware name: Opengear hedgehog/hedgehog, BIOS 698f4312a5-jenkins 08/28/2020
    RIP: 0010:__queue_work+0x25a/0x300
    Code: 94 b5 73 a9 00 01 1f 00 75 0f 65 48 8b 3c 25 00 5d 01 00 f6 47 24 20 75 24 0f 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 79 fe ff ff 48 8d 53 60 83 c9 02 >
    RSP: 0018:ffffb59640114e88 EFLAGS: 00010002
    RAX: ffffa049e7203790 RBX: ffffa049eaba2f00 RCX: ffffa049c79f61b8
    RDX: ffffa049e7203798 RSI: 000000007fffffff RDI: ffffa049eab9ef80
    RBP: ffffa049ea010000 R08: 0000000000000000 R09: ffffb59640114db8
    R10: 0000000000000040 R11: 0000000000000000 R12: 0000000000000003
    R13: 0000000000000007 R14: 0000000000000004 R15: ffffa049e7203790
    FS:  0000000000000000(0000) GS:ffffa049eab80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ff773c1a024 CR3: 000000011e548000 CR4: 00000000000406e0
    Call Trace:
     <IRQ>
     queue_work_on+0x17/0x20
     __usb_hcd_giveback_urb+0x4e/0xb0
     usb_giveback_urb_bh+0x8e/0xe0
     tasklet_action_common.isra.0+0x48/0xa0
     __do_softirq+0xd1/0x213
     irq_exit+0xc8/0xd0
     do_IRQ+0x48/0xd0
     common_interrupt+0xf/0xf
     </IRQ>
    RIP: 0010:cpuidle_enter_state+0x120/0x2a0
    Code: e8 75 8a aa ff 31 ff 49 89 c6 e8 bb 9a aa ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 6a 01 00 00 31 ff e8 54 c3 ae ff fb 45 85 ed <0f> 88 c2 00 00 00 49 63 f5 4c 89 f1 48 8d >
    RSP: 0018:ffffb59640087e80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdd
    RAX: ffffa049eab9f600 RBX: ffffffff8d650480 RCX: 000000000000001f
    RDX: 0000000000000000 RSI: 00000000803d7d59 RDI: 0000000000000000
    RBP: 00000210865b9db0 R08: 00000210865f689a R09: 000000007fffffff
    R10: ffffa049eab9e700 R11: ffffa049eab9e6e0 R12: ffffa049e78c9000
    R13: 0000000000000002 R14: 00000210865f689a R15: 0000000000000000
     cpuidle_enter+0x24/0x40
     do_idle+0x1bf/0x230
     cpu_startup_entry+0x14/0x20
     start_secondary+0x14a/0x180
     secondary_startup_64+0xa4/0xb0
    ---[ end trace 12a803438e4082c9 ]--

It comes from this check in __queue_work(): WARN_ON(!list_empty(&work->entry))
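For context: in the 5.4 workqueue code, queue_work_on() atomically sets the work item's PENDING bit and only calls __queue_work() if that bit was previously clear, so an ordinary double queue_work() is a silent no-op. The WARN_ON therefore fires only when the bookkeeping is inconsistent: PENDING says "not queued" while work->entry is still linked into a list. __queue_work() then bails out without queuing the item, and PENDING stays set. The user-space C model below (all model_* names are hypothetical; it ignores locking and per-CPU pools) illustrates that invariant only; it does not show how cdc-acm reached the inconsistent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct model_list { struct model_list *next, *prev; };

void model_list_init(struct model_list *h) { h->next = h->prev = h; }
bool model_list_empty(const struct model_list *h) { return h->next == h; }

static void model_list_add_tail(struct model_list *n, struct model_list *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for struct work_struct. */
struct model_work {
    bool pending;             /* plays the role of WORK_STRUCT_PENDING_BIT */
    struct model_list entry;  /* links the item into a worklist */
};

/* Shape of queue_work_on() -> __queue_work(): true if newly queued. */
bool model_queue_work(struct model_list *worklist, struct model_work *w)
{
    assert(worklist != NULL && w != NULL);

    if (w->pending)
        return false;         /* already queued: normal, silent no-op */
    w->pending = true;

    if (!model_list_empty(&w->entry)) {
        /* WARN_ON(!list_empty(&work->entry)): bail out, PENDING stays set */
        fprintf(stderr, "WARNING: work item already on a list\n");
        return false;
    }
    model_list_add_tail(&w->entry, worklist);
    return true;
}

/* Dequeue the oldest item in the order process_one_work() uses: unlink
 * first, then clear PENDING. Returns NULL if the list is empty. */
struct model_work *model_run_one(struct model_list *worklist)
{
    if (model_list_empty(worklist))
        return NULL;

    struct model_list *e = worklist->next;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    model_list_init(e);       /* like list_del_init() */

    struct model_work *w =
        (struct model_work *)((char *)e - offsetof(struct model_work, entry));
    w->pending = false;
    return w;
}
```

In the model, clearing pending by hand while the item is still linked reproduces the warning, and every later model_queue_work() on that item returns false.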

Once this happens, we can no longer disconnect and reconnect the Cisco. Only a reboot seems to get things working again. If we disconnect and reconnect the Cisco without writing to it, we avoid the issue.


While reverting the cdc-acm cooldown patch gets us back to the not-great-but-not-fatal behaviour, I don't feel that this is a useful long-term solution. I guess that someone (probably me; I doubt many people have access to one of these things) needs to see if we can make the Cisco 2960-X behave better, maybe by enabling some of the quirks in the cdc-acm driver.

But I also wonder why the cooldown is triggering the error, and whether there's something in it that is genuinely wrong but only exposed by a broken device like the Cisco.

Any guidance would be appreciated.

Lincoln


* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
  2021-03-04 22:59 cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB Ramsay, Lincoln
@ 2021-03-09 12:52 ` Oliver Neukum
  2021-03-10  1:30   ` Ramsay, Lincoln
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Neukum @ 2021-03-09 12:52 UTC
  To: Ramsay, Lincoln; +Cc: linux-usb

On Thursday, 04.03.2021, at 22:59 +0000, Ramsay, Lincoln wrote:
> Hi folks,
> 
> Opengear makes a device (OM2200) that you're supposed to plug into consoles in order to access them remotely but the Cisco 2960-X is causing us grief. We can trivially break our device in just 3 steps.
> 
> 1. Connect the Cisco 2960-X console.
> 2. (Re)boot our device.
> 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.

What exactly happens after that?

> When we were using Linux 5.2.32 this wasn't fatal. It was possible to disconnect and reconnect the Cisco and it would work as expected. The same was observed on our older devices that run Linux 3.10 on ARM and on a laptop running macOS 10.13. But we upgraded to Linux 5.4.61 and it got much worse. I did some digging and it seems that the cdc-acm cooldown commit (f4d1cf2ef83caeab212e842fd238cb8353f59fa2) is the cause.
> 
> Before I continue, I need to acknowledge that the Cisco 2960-X is really broken. Unlike every other Cisco console I could find to test with, it shows up as USB 2 rather than USB 1, causes warnings to be printed and sends corrupt identity strings.
> 
>     usb 2-1.1: new high-speed USB device number 6 using ehci-pci
>     usb 2-1.1: config 1 interface 0 altsetting 0 endpoint 0x82 has an invalid bInterval 255, changing to 11
>     usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x1 has invalid maxpacket 64
>     usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
>     usb 2-1.1: New USB device found, idVendor=05a6, idProduct=0009, bcdDevice= 0.00
>     usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
>     usb 2-1.1: Product: C�~B�~@~@ल^D
>     usb 2-1.1: Manufacturer: C�~B�~@~@ल^D
>     usb 2-1.1: SerialNumber: C�~B�~@~@ल^D�~@�~B
>     cdc_acm 2-1.1:1.0: ttyACM0: USB ACM device
> 
> Despite this though, it does seem to work, except when it is connected during boot. In that case, we get this kernel warning:

Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?

	Regards
		Oliver




* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
  2021-03-09 12:52 ` Oliver Neukum
@ 2021-03-10  1:30   ` Ramsay, Lincoln
  2021-03-10  8:45     ` Oliver Neukum
  2021-03-10  8:56     ` Greg KH
  0 siblings, 2 replies; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-10  1:30 UTC
  To: Oliver Neukum; +Cc: linux-usb

> On Thursday, 04.03.2021, at 22:59 +0000, Ramsay, Lincoln wrote:
> > 1. Connect the Cisco 2960-X console.
> > 2. (Re)boot our device.
> > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
> 
> What exactly happens after that?

The kernel warning about the work item already being queued is printed to the console (and journal), and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (i.e. networking still works), but USB appears to be dead (though I only tested the same console being connected to different USB ports).

> Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?

No. Our newest builds use kernel 5.8.18, and that commit seems to have landed in 5.10. But backporting it to our kernel seems like a much nicer fix than reverting the cooldown patch.

I tried that and it works. It doesn't magically make the Cisco work, but there's no kernel warning and USB isn't dead, so the console can be disconnected and reconnected and it works again. Nice.

Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.

Thanks,
Lincoln


* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
  2021-03-10  1:30   ` Ramsay, Lincoln
@ 2021-03-10  8:45     ` Oliver Neukum
  2021-03-10 22:45       ` Ramsay, Lincoln
  2021-03-10  8:56     ` Greg KH
  1 sibling, 1 reply; 6+ messages in thread
From: Oliver Neukum @ 2021-03-10  8:45 UTC
  To: Ramsay, Lincoln; +Cc: linux-usb

On Wednesday, 10.03.2021, at 01:30 +0000, Ramsay, Lincoln wrote:
> > On Thursday, 04.03.2021, at 22:59 +0000, Ramsay, Lincoln wrote:
> > > 1. Connect the Cisco 2960-X console.
> > > 2. (Re)boot our device.
> > > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
> > 
> > What exactly happens after that?
> 
> The kernel warning about the empty work on the queue is printed to the console (and journal) and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (ie. networking is still going) but USB appears to be dead (though I only tested the same console being connected to different USB ports).
> 
> > Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
> 
> No... our newest builds use kernel 5.8.18 and that commit seems to be in 5.10. But backporting that to our kernel seems like a much nicer fix than reverting the cooldown patch.

Good. So are the two failure modes identical now?

> I tried doing that and it is good. It doesn't make the Cisco magically work but there's no kernel warning and USB isn't dead so the console can be disconnected and re-connected and it works again. Nice.
> 
> Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.

You could try the reset utility from the newest set of usb tools.
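The thread does not name the tool; presumably it is usbreset, which recent usbutils releases include. Whatever the front end, a user-space reset of a single device comes down to the usbfs USBDEVFS_RESET ioctl on the device's node under /dev/bus/usb. A minimal C sketch, assuming Linux with kernel headers installed; usb_reset_device_node() is a hypothetical helper, not usbreset's actual code:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/usbdevice_fs.h>

/* Hypothetical helper: reset one USB device via its usbfs node, for example
 * "/dev/bus/usb/002/006" (bus 2, device number 6). Usually requires root.
 * Returns 0 on success, -1 on failure. */
int usb_reset_device_node(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, USBDEVFS_RESET, 0);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

For the Cisco in the log above (bus 2, device number 6) that would be usb_reset_device_node("/dev/bus/usb/002/006"), but the device number changes on every re-enumeration, so look it up with lsusb first. Whether a reset actually revives the 2960-X console is untested here.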

	Regards
		Oliver




* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
  2021-03-10  1:30   ` Ramsay, Lincoln
  2021-03-10  8:45     ` Oliver Neukum
@ 2021-03-10  8:56     ` Greg KH
  1 sibling, 0 replies; 6+ messages in thread
From: Greg KH @ 2021-03-10  8:56 UTC
  To: Ramsay, Lincoln; +Cc: Oliver Neukum, linux-usb

On Wed, Mar 10, 2021 at 01:30:21AM +0000, Ramsay, Lincoln wrote:
> > On Thursday, 04.03.2021, at 22:59 +0000, Ramsay, Lincoln wrote:
> > > 1. Connect the Cisco 2960-X console.
> > > 2. (Re)boot our device.
> > > 3. Open the Cisco's console device (/dev/ttyACM0) and write to it.
> > 
> > What exactly happens after that?
> 
> The kernel warning about the empty work on the queue is printed to the console (and journal) and then nothing. Reading/writing doesn't work (but it didn't work before the cooldown patch either). The system doesn't die (ie. networking is still going) but USB appears to be dead (though I only tested the same console being connected to different USB ports).
> 
> > Did your test kernel contain 38203b8385bf6283537162bde7d499f83096471 ?
> 
> No... our newest builds use kernel 5.8.18 and that commit seems to be
> in 5.10. But backporting that to our kernel seems like a much nicer
> fix than reverting the cooldown patch.

Please note that 5.8.18 is very old, obsolete, and known to be insecure.
Please do not use it if at all possible, as the community cannot support
it at all, and I doubt whoever you got that kernel from can either :(

thanks,

greg k-h


* Re: cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB
  2021-03-10  8:45     ` Oliver Neukum
@ 2021-03-10 22:45       ` Ramsay, Lincoln
  0 siblings, 0 replies; 6+ messages in thread
From: Ramsay, Lincoln @ 2021-03-10 22:45 UTC
  To: Oliver Neukum; +Cc: linux-usb

> Good. So are the two failure modes identical now?

Yes.

>> Unless you've got any tips for dealing with the Cisco's brokenness, I guess we're all good.
>
> You could try the reset utility from the newest set of usb tools.

I'll take a look.

Thanks!
Lincoln


Thread overview: 6+ messages
2021-03-04 22:59 cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB Ramsay, Lincoln
2021-03-09 12:52 ` Oliver Neukum
2021-03-10  1:30   ` Ramsay, Lincoln
2021-03-10  8:45     ` Oliver Neukum
2021-03-10 22:45       ` Ramsay, Lincoln
2021-03-10  8:56     ` Greg KH
