All of lore.kernel.org
 help / color / mirror / Atom feed
* Kernel Oops in cdc_acm
@ 2020-05-25 12:00 Jean Rene Dawin
  2020-05-25 12:28 ` Oliver Neukum
  0 siblings, 1 reply; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-25 12:00 UTC (permalink / raw)
  To: linux-usb; +Cc: Oliver Neukum

[-- Attachment #1: Type: text/plain, Size: 13011 bytes --]

Hi,

I use a Nokia 515 connected via USB as a UMTS modem for my PC:
CPU0: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (family: 0x6, model: 0x17, stepping: 0xa)

Output of lsusb -v -d for the device is attached.

The battery of the phone is old, and sometimes the phone just turns off.
Up intil recently (before 5.6.10) turning on the phone again would restore the connection.

But roughly since commits 0afccd7601514c4b83d8cc58c740089cc447051d and/or  4e7279cd1d19f48f0af2a10ed020febaa9ac092 the kernel gets an Oops and eventually the system hangs.
The behaviour is reproducible by taking out the battery from the phone during a connection.

Tested wtih 5.6.10, 5.6.14 and 5.7-rc6. The following dmesg output is from 5.7-rc6:


# connect USB cable to phone
[   59.829803] usb 1-2: new high-speed USB device number 2 using ehci-pci
[   59.981396] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[   59.981399] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   59.981401] usb 1-2: Product: Nokia 515
[   59.981403] usb 1-2: Manufacturer: Nokia
[   59.981405] usb 1-2: SerialNumber: 357888058069763
[   59.999766] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[   60.168058] NET: Registered protocol family 35
[   60.220317] usbcore: registered new interface driver cdc_phonet
[   60.613363] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   60.627472] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   60.639107] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   60.639109] cfg80211: failed to load regulatory.db
[   60.704801] usbcore: registered new interface driver cdc_ether
[   60.733393] usb 1-2: bad CDC descriptors
[   60.733721] usbcore: registered new interface driver rndis_host
[   60.781979] usb 1-2: bad CDC descriptors
[   60.782581] usbcore: registered new interface driver rndis_wlan

# remove USB cable from phone
[   69.543818] usb 1-2: USB disconnect, device number 2

# connect USB cable to phone
[   75.276486] usb 1-2: new high-speed USB device number 3 using ehci-pci
[   75.427623] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[   75.427626] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   75.427628] usb 1-2: Product: Nokia 515
[   75.427630] usb 1-2: Manufacturer: Nokia
[   75.427632] usb 1-2: SerialNumber: 357888058069763
[   75.446969] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[   75.447734] usb 1-2: bad CDC descriptors
[   75.447819] usb 1-2: bad CDC descriptors

# remove battery from phone
[  328.284229] usb 1-2: USB disconnect, device number 3
[  328.308826] cdc_acm 1-2:1.3: failed to set dtr/rts
[  332.757719] usb 1-2: new high-speed USB device number 4 using ehci-pci
[  332.906985] usb 1-2: New USB device found, idVendor=0421, idProduct=067a, bcdDevice= 0.04
[  332.906988] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[  332.906990] usb 1-2: Product: NOKIA USB ROM
[  332.906992] usb 1-2: Manufacturer: NOKIA
[  334.033533] usb 1-2: USB disconnect, device number 4

# ... similar messages for device number 5 until 9

# insert battery into phone
[  346.147723] usb 1-2: new high-speed USB device number 10 using ehci-pci
[  346.297258] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[  346.297261] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  346.297263] usb 1-2: Product: Nokia 515
[  346.297265] usb 1-2: Manufacturer: Nokia
[  346.297267] usb 1-2: SerialNumber: 357888058069763
[  346.314973] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[  346.315577] usb 1-2: bad CDC descriptors
[  346.316096] usb 1-2: bad CDC descriptors

# switch on phone
[  355.209929] BUG: unable to handle page fault for address: 000000003697c000
[  355.209934] #PF: supervisor read access in kernel mode
[  355.209936] #PF: error_code(0x0000) - not-present page
[  355.209938] PGD 0 P4D 0 
[  355.209942] Oops: 0000 [#1] PREEMPT SMP PTI
[  355.209945] CPU: 0 PID: 226 Comm: kworker/0:2 Not tainted 5.7.0-rc6 #1
[  355.209947] Hardware name: Dell Inc. OptiPlex 755                 /0GM819, BIOS A22 06/11/2012
[  355.209956] Workqueue: events acm_softint [cdc_acm]
[  355.209962] RIP: 0010:dev_vprintk_emit+0x57/0x1f8
[  355.209965] Code: 20 00 00 00 48 89 54 24 08 65 48 8b 04 25 28 00 00 00 48 89 84 24 a0 00 00 00 31 c0 f3 ab 48 8b 86 b8 02 00 00 48 85 c0 74 05 <4c> 8b 28 eb 12 48 8b 56 60 31 c9 48 85 d2 0f 84 4e 01 00 00 4c 8b
[  355.209967] RSP: 0018:ffffb38b40767c80 EFLAGS: 00010206
[  355.209970] RAX: 000000003697c000 RBX: ffff9bbcb8e36020 RCX: 0000000000000000
[  355.209972] RDX: ffffffff8b7afa65 RSI: ffff9bbcb8e36060 RDI: ffffb38b40767d20
[  355.209974] RBP: ffffb38b40767db0 R08: ffff9bbcc475b800 R09: ffffb38b40767dc0
[  355.209976] R10: 0000000000000003 R11: 0000000000000018 R12: ffff9bbcb8e367b0
[  355.209978] R13: ffff9bbcb8e36020 R14: ffffb38b40767ca0 R15: ffff9bbcb8e36060
[  355.209980] FS:  0000000000000000(0000) GS:ffff9bbcefe00000(0000) knlGS:0000000000000000
[  355.209983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  355.209985] CR2: 000000003697c000 CR3: 000000022b002000 CR4: 00000000000406f0
[  355.209987] Call Trace:
[  355.209995]  dev_printk_emit+0x5e/0x75
[  355.210000]  ? __switch_to_asm+0x40/0x70
[  355.210002]  ? __switch_to_asm+0x34/0x70
[  355.210004]  ? __switch_to_asm+0x40/0x70
[  355.210008]  ? tty_port_lower_dtr_rts+0x20/0x20
[  355.210011]  _dev_err+0x7b/0x92
[  355.210017]  acm_submit_read_urb.cold+0x1c/0x25 [cdc_acm]
[  355.210021]  acm_softint+0x65/0xe0 [cdc_acm]
[  355.210026]  process_one_work+0x1da/0x3d0
[  355.210030]  worker_thread+0x4a/0x3d0
[  355.210034]  kthread+0x122/0x160
[  355.210037]  ? process_one_work+0x3d0/0x3d0
[  355.210039]  ? kthread_park+0x90/0x90
[  355.210042]  ret_from_fork+0x35/0x40
[  355.210046] Modules linked in: ppp_deflate bsd_comp ppp_async ppp_generic slhc rndis_wlan rndis_host cdc_ether usbnet cfg80211 cdc_phonet phonet mii rfkill radeon snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio coretemp snd_hda_codec_hdmi kvm_intel snd_hda_intel i2c_algo_bit snd_emu10k1 ttm snd_intel_dspcfg snd_hda_codec drm_kms_helper kvm snd_util_mem snd_ac97_codec snd_hda_core snd_rawmidi snd_seq_device cec ac97_bus snd_hwdep snd_pcm rc_core mousedev input_leds syscopyarea hid_generic ppdev mei_wdt iTCO_wdt sysfillrect iTCO_vendor_support pktcdvd snd_timer dcdbas cdc_acm irqbypass emu10k1_gp gameport e1000e sysimgblt usbhid pcspkr parport_pc evdev snd mei_me mac_hid fb_sys_fops hid soundcore mei parport lpc_ich intel_agp i2c_i801 intel_gtt acpi_cpufreq drm sg agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uhci_hcd ata_generic floppy ehci_pci ehci_hcd sr_mod cdrom pata_acpi
[  355.210097] CR2: 000000003697c000
[  355.210101] ---[ end trace fbed6981d1f79e84 ]---
[  355.210104] RIP: 0010:dev_vprintk_emit+0x57/0x1f8
[  355.210107] Code: 20 00 00 00 48 89 54 24 08 65 48 8b 04 25 28 00 00 00 48 89 84 24 a0 00 00 00 31 c0 f3 ab 48 8b 86 b8 02 00 00 48 85 c0 74 05 <4c> 8b 28 eb 12 48 8b 56 60 31 c9 48 85 d2 0f 84 4e 01 00 00 4c 8b
[  355.210109] RSP: 0018:ffffb38b40767c80 EFLAGS: 00010206
[  355.210112] RAX: 000000003697c000 RBX: ffff9bbcb8e36020 RCX: 0000000000000000
[  355.210114] RDX: ffffffff8b7afa65 RSI: ffff9bbcb8e36060 RDI: ffffb38b40767d20
[  355.210116] RBP: ffffb38b40767db0 R08: ffff9bbcc475b800 R09: ffffb38b40767dc0
[  355.210118] R10: 0000000000000003 R11: 0000000000000018 R12: ffff9bbcb8e367b0
[  355.210120] R13: ffff9bbcb8e36020 R14: ffffb38b40767ca0 R15: ffff9bbcb8e36060
[  355.210122] FS:  0000000000000000(0000) GS:ffff9bbcefe00000(0000) knlGS:0000000000000000
[  355.210125] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  355.210127] CR2: 000000003697c000 CR3: 000000022b002000 CR4: 00000000000406f0
[  355.602759] ------------[ cut here ]------------
[  355.602766] WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x38a/0x430
[  355.602768] Modules linked in: ppp_deflate bsd_comp ppp_async ppp_generic slhc rndis_wlan rndis_host cdc_ether usbnet cfg80211 cdc_phonet phonet mii rfkill radeon snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio coretemp snd_hda_codec_hdmi kvm_intel snd_hda_intel i2c_algo_bit snd_emu10k1 ttm snd_intel_dspcfg snd_hda_codec drm_kms_helper kvm snd_util_mem snd_ac97_codec snd_hda_core snd_rawmidi snd_seq_device cec ac97_bus snd_hwdep snd_pcm rc_core mousedev input_leds syscopyarea hid_generic ppdev mei_wdt iTCO_wdt sysfillrect iTCO_vendor_support pktcdvd snd_timer dcdbas cdc_acm irqbypass emu10k1_gp gameport e1000e sysimgblt usbhid pcspkr parport_pc evdev snd mei_me mac_hid fb_sys_fops hid soundcore mei parport lpc_ich intel_agp i2c_i801 intel_gtt acpi_cpufreq drm sg agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uhci_hcd ata_generic floppy ehci_pci ehci_hcd sr_mod cdrom pata_acpi
[  355.602811] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D           5.7.0-rc6 #1
[  355.602814] Hardware name: Dell Inc. OptiPlex 755                 /0GM819, BIOS A22 06/11/2012
[  355.602817] RIP: 0010:__queue_work+0x38a/0x430
[  355.602820] Code: 72 96 75 a9 00 01 1f 00 75 0f 65 48 8b 3c 25 c0 8b 01 00 f6 47 24 20 75 25 0f 0b 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 bd fd ff ff 41 83 cc 02 48 8d 53 60 e9 a2 fd ff ff e8 fd
[  355.602822] RSP: 0018:ffffb38b40003e70 EFLAGS: 00010002
[  355.602825] RAX: ffff9bbcb8e36790 RBX: ffff9bbcefe32700 RCX: 0000000000000000
[  355.602827] RDX: ffff9bbcb8e36798 RSI: 0000000000000000 RDI: 0000000000000001
[  355.602829] RBP: 0000000000000025 R08: ffff9bbcee802238 R09: ffff9bbcee802260
[  355.602831] R10: 0000000000000000 R11: ffffffff8ba4c6b8 R12: ffff9bbceec18e00
[  355.602833] R13: 0000000000000000 R14: 0000000000000140 R15: ffff9bbcb8e36790
[  355.602836] FS:  0000000000000000(0000) GS:ffff9bbcefe00000(0000) knlGS:0000000000000000
[  355.602838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  355.602840] CR2: 00007fc9ae17f000 CR3: 000000022b800000 CR4: 00000000000406f0
[  355.602842] Call Trace:
[  355.602846]  <IRQ>
[  355.602851]  queue_work_on+0x36/0x40
[  355.602856]  __usb_hcd_giveback_urb+0x6f/0x120
[  355.602859]  usb_giveback_urb_bh+0xa6/0x100
[  355.602864]  tasklet_action_common.isra.0+0x5f/0x130
[  355.602868]  __do_softirq+0x111/0x34d
[  355.602872]  irq_exit+0xac/0xd0
[  355.602875]  do_IRQ+0x89/0x140
[  355.602878]  common_interrupt+0xf/0xf
[  355.602880]  </IRQ>
[  355.602883] RIP: 0010:mwait_idle+0x80/0x200
[  355.602886] Code: 8b 04 25 c0 8b 01 00 0f 01 c8 48 8b 00 a8 08 0f 85 74 01 00 00 e9 07 00 00 00 0f 00 2d 3b e5 43 00 31 c0 48 89 c1 fb 0f 01 c9 <e8> eb 39 ad ff 89 c5 66 66 66 66 90 65 48 8b 04 25 c0 8b 01 00 f0
[  355.602888] RSP: 0018:ffffffff8ba03e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
[  355.602891] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  355.602893] RDX: 0000000000000000 RSI: 7fffffad34e49283 RDI: ffffffff8b815f90
[  355.602895] RBP: 0000000000000000 R08: 00000066a171aa2e R09: 00000052d20f9e9b
[  355.602897] R10: 0000000000000004 R11: 000000000008d19a R12: ffffffff8ba14840
[  355.602899] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8ba14840
[  355.602905]  do_idle+0x204/0x270
[  355.602909]  cpu_startup_entry+0x19/0x20
[  355.602913]  start_kernel+0x85f/0x884
[  355.602918]  secondary_startup_64+0xb6/0xc0
[  355.602923] ---[ end trace fbed6981d1f79e85 ]---
[  362.628141] cdc_acm 1-2:1.3: failed to set dtr/rts
[  372.634805] cdc_acm 1-2:1.4: urb 3 failed submission with -22
[  382.648907] cdc_acm 1-2:1.4: urb 3 failed submission with -22

# ... similar messages every 10 sconds

I tried to use gdb for more info about the messages. But I'm a novice here. If more or different info is needed, I'm happy to provide it.

$ gdb drivers/usb/class/cdc-acm.o 

(gdb) l *acm_softint+0x65
0x2665 is in acm_softint (/tmp/linux-5.7-rc6/drivers/usb/class/cdc-acm.c:587).
582				clear_bit(EVENT_RX_STALL, &acm->flags);
583			}
584		}
585	
586		if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
587			for (i = 0; i < ACM_NR; i++) 
588				if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
589						acm_submit_read_urb(acm, i, GFP_NOIO);
590		}
591	

(gdb) l acm_submit_read_urb.cold
428			return 0;
429	
430		res = usb_submit_urb(acm->read_urbs[index], mem_flags);
431		if (res) {
432			if (res != -EPERM && res != -ENODEV) {
433				dev_err(&acm->data->dev,
434					"urb %d failed submission with %d\n",
435					index, res);
436			} else {
437				dev_vdbg(&acm->data->dev, "intended failure %d\n", res);

(gdb) l *acm_submit_read_urb+0x1c
0x206c is in acm_submit_read_urb (/tmp/linux-5.7-rc6/arch/x86/include/asm/bitops.h:162).
157	}
158	
159	static __always_inline bool
160	arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
161	{
162		return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btr), *addr, c, "Ir", nr);
163	}
164	
165	/*
166	 * Note: the operation is performed atomically with respect to

Regards,
Jean Rene Dawin

[-- Attachment #2: nokia515_lsusb --]
[-- Type: text/plain, Size: 10220 bytes --]


Bus 001 Device 017: ID 0421:06d3 Nokia Mobile Phones Nokia 515
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            2 Communications
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x0421 Nokia Mobile Phones
  idProduct          0x06d3 
  bcdDevice           10.34
  iManufacturer           1 Nokia
  iProduct                2 Nokia 515
  iSerial                 3 357888058069763
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0124
    bNumInterfaces          9
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              500mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         2 Communications
      bInterfaceSubClass      8 Wireless Handset Control
      bInterfaceProtocol      0 
      iInterface              0 
      CDC Header:
        bcdCDC               1.10
      CDC WHCM:
        bcdVersion           1.00
      CDC Union:
        bMasterInterface        0
        bSlaveInterface         1 2 3 4 5 6 7 8 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         2 Communications
      bInterfaceSubClass    254 
      bInterfaceProtocol      0 
      iInterface              0 
      CDC Header:
        bcdCDC               1.10
      UNRECOGNIZED CDC:  05 24 ab 05 71
      CDC Union:
        bMasterInterface        1
        bSlaveInterface         2 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       1
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  04 24 fd 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        3
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         2 Communications
      bInterfaceSubClass      2 Abstract (modem)
      bInterfaceProtocol      1 AT-commands (v.25ter)
      iInterface              0 
      CDC Header:
        bcdCDC               1.10
      CDC ACM:
        bmCapabilities       0x06
          sends break
          line coding and serial state
      CDC Call Management:
        bmCapabilities       0x03
          call management
          use DataInterface
        bDataInterface          4
      CDC Union:
        bMasterInterface        3
        bSlaveInterface         4 
      UNRECOGNIZED CDC:  04 24 fd 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               9
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        4
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  04 24 fd 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        5
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         2 Communications
      bInterfaceSubClass     11 OBEX
      bInterfaceProtocol      0 
      iInterface              0 
      CDC Header:
        bcdCDC               1.10
      CDC OBEX:
        bcdVersion           1.00
      CDC Union:
        bMasterInterface        5
        bSlaveInterface         6 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        6
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        6
      bAlternateSetting       1
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  04 24 fd 01
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x04  EP 4 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        7
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         2 Communications
      bInterfaceSubClass      2 Abstract (modem)
      bInterfaceProtocol    255 Vendor Specific (MSFT RNDIS?)
      iInterface              0 
      CDC Header:
        bcdCDC               1.10
      CDC ACM:
        bmCapabilities       0x06
          sends break
          line coding and serial state
      CDC Call Management:
        bmCapabilities       0x03
          call management
          use DataInterface
        bDataInterface          8
      CDC Union:
        bMasterInterface        7
        bSlaveInterface         8 
      UNRECOGNIZED CDC:  04 24 fd 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval              11
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        8
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  04 24 fd 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x86  EP 6 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x06  EP 6 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            2 Communications
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-25 12:00 Kernel Oops in cdc_acm Jean Rene Dawin
@ 2020-05-25 12:28 ` Oliver Neukum
  2020-05-25 19:16   ` Jean Rene Dawin
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Neukum @ 2020-05-25 12:28 UTC (permalink / raw)
  To: Jean Rene Dawin, linux-usb

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

Am Montag, den 25.05.2020, 14:00 +0200 schrieb Jean Rene Dawin:
> Hi,
> 
> I use a Nokia 515 connected via USB as a UMTS modem for my PC:
> CPU0: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (family: 0x6, model: 0x17, stepping: 0xa)
> 
> Output of lsusb -v -d for the device is attached.
> 
> The battery of the phone is old, and sometimes the phone just turns off.
> Up intil recently (before 5.6.10) turning on the phone again would restore the connection.
> 
> But roughly since commits 0afccd7601514c4b83d8cc58c740089cc447051d and/or  4e7279cd1d19f48f0af2a10ed020febaa9ac092 the kernel gets an Oops and eventually the system hangs.
> The behaviour is reproducible by taking out the battery from the phone during a connection.
> 
> Tested wtih 5.6.10, 5.6.14 and 5.7-rc6. The following dmesg output is from 5.7-rc6:

Hi,

it looks to me like I made a mistake in fixing the error handling for
some devices. Could you test the attached patch?

	Regards
		Oliver

[-- Attachment #2: CDC-ACM-heed-quirk-also-in-error-handling.patch --]
[-- Type: text/x-patch, Size: 954 bytes --]

From 338fe738603d2612a317c9bec98236eb094ae109 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@suse.com>
Date: Mon, 25 May 2020 14:21:44 +0200
Subject: [PATCH] CDC-ACM: heed quirk also in error handling

If buffers are iterated over in the error case, the lower limits
for quirky devices must be heeded.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/usb/class/cdc-acm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 7678ae4afd53..be4543569822 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -585,7 +585,7 @@ static void acm_softint(struct work_struct *work)
 	}
 
 	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
-		for (i = 0; i < ACM_NR; i++) 
+		for (i = 0; i < acm->rx_buflimit; i++) 
 			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
 					acm_submit_read_urb(acm, i, GFP_NOIO);
 	}
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-25 12:28 ` Oliver Neukum
@ 2020-05-25 19:16   ` Jean Rene Dawin
  2020-05-26  8:08     ` Oliver Neukum
  2020-05-26 11:13     ` Oliver Neukum
  0 siblings, 2 replies; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-25 19:16 UTC (permalink / raw)
  To: linux-usb; +Cc: Oliver Neukum

Oliver Neukum wrote on Mon 25/05/20 14:28:
> 
> Hi,
> 
> it looks to me like I made a mistake in fixing the error handling for
> some devices. Could you test the attached patch?
> 
> 	Regards
> 		Oliver

> From 338fe738603d2612a317c9bec98236eb094ae109 Mon Sep 17 00:00:00 2001
> From: Oliver Neukum <oneukum@suse.com>
> Date: Mon, 25 May 2020 14:21:44 +0200
> Subject: [PATCH] CDC-ACM: heed quirk also in error handling
> 
> If buffers are iterated over in the error case, the lower limits
> for quirky devices must be heeded.
> 
> Signed-off-by: Oliver Neukum <oneukum@suse.com>
> ---
>  drivers/usb/class/cdc-acm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
> index 7678ae4afd53..be4543569822 100644
> --- a/drivers/usb/class/cdc-acm.c
> +++ b/drivers/usb/class/cdc-acm.c
> @@ -585,7 +585,7 @@ static void acm_softint(struct work_struct *work)
>  	}
>  
>  	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
> -		for (i = 0; i < ACM_NR; i++) 
> +		for (i = 0; i < acm->rx_buflimit; i++) 
>  			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
>  					acm_submit_read_urb(acm, i, GFP_NOIO);
>  	}
> -- 
> 2.16.4
> 

Hi,

thanks for the quick reply and the patch.

With the patch it is a different behaviour:

The first time the battery is removed and inserted again, everything is
quite fine. Except that the USB cable has to be plugged out and in for the
connection to be re-established. But that was needed in previous
kernels, too.

But if the battery is removed a second time, and inserted again and the
phone is turned on, it results in the traces below:


[  152.220209] usb 1-2: new high-speed USB device number 3 using ehci-pci
[  152.370626] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[  152.370629] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  152.370631] usb 1-2: Product: Nokia 515
[  152.370633] usb 1-2: Manufacturer: Nokia
[  152.370635] usb 1-2: SerialNumber: 357888058069763
[  152.389002] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[  152.389614] usb 1-2: bad CDC descriptors
[  152.389689] usb 1-2: bad CDC descriptors

# remove battery first time
[  197.530805] usb 1-2: USB disconnect, device number 3
[  197.552601] cdc_acm 1-2:1.3: failed to set dtr/rts
[  197.847038] usb 1-2: new high-speed USB device number 4 using ehci-pci
[  197.996385] usb 1-2: New USB device found, idVendor=0421, idProduct=067a, bcdDevice= 0.04
[  197.996388] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[  197.996390] usb 1-2: Product: NOKIA USB ROM
[  197.996392] usb 1-2: Manufacturer: NOKIA
[  199.122934] usb 1-2: USB disconnect, device number 4

# ... similar for device 5 to 14

# insert battery and turn on phone
[  239.883845] usb 1-2: new high-speed USB device number 15 using ehci-pci
[  240.034011] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[  240.034014] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  240.034016] usb 1-2: Product: Nokia 515
[  240.034018] usb 1-2: Manufacturer: Nokia
[  240.034020] usb 1-2: SerialNumber: 357888058069763
[  240.051887] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[  240.052498] usb 1-2: bad CDC descriptors
[  240.052574] usb 1-2: bad CDC descriptors

# remove USB cable because network doesn't come up automatically
[  416.475847] usb 1-2: USB disconnect, device number 15
[  416.509698] cdc_acm 1-2:1.3: failed to set dtr/rts

# insert USB cable
[  420.801127] usb 1-2: new high-speed USB device number 16 using ehci-pci
[  420.952738] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[  420.952741] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  420.952743] usb 1-2: Product: Nokia 515
[  420.952745] usb 1-2: Manufacturer: Nokia
[  420.952747] usb 1-2: SerialNumber: 357888058069763
[  420.971623] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[  420.972474] usb 1-2: bad CDC descriptors
[  420.972551] usb 1-2: bad CDC descriptors
# Connection ok. Transferring data

# remove battery second time
[  525.734297] usb 1-2: USB disconnect, device number 16
[  525.753834] cdc_acm 1-2:1.3: failed to set dtr/rts
[  526.048151] usb 1-2: new high-speed USB device number 17 using ehci-pci
[  526.197512] usb 1-2: New USB device found, idVendor=0421, idProduct=067a, bcdDevice= 0.04
[  526.197516] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[  526.197518] usb 1-2: Product: NOKIA USB ROM
[  526.197520] usb 1-2: Manufacturer: NOKIA
[  527.324329] usb 1-2: USB disconnect, device number 17

# ... similar for device 18 to 36

# insert battery
[  564.928283] usb 1-2: new high-speed USB device number 37 using ehci-pci
[  565.078132] usb 1-2: New USB device found, idVendor=0421, idProduct=06d3, bcdDevice=10.34
[  565.078135] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  565.078137] usb 1-2: Product: Nokia 515
[  565.078139] usb 1-2: Manufacturer: Nokia
[  565.078141] usb 1-2: SerialNumber: 357888058069763
[  565.095618] cdc_acm 1-2:1.3: ttyACM1: USB ACM device
[  565.096238] usb 1-2: bad CDC descriptors
[  565.096321] usb 1-2: bad CDC descriptors

# turn on phone
[  572.928634] ------------[ cut here ]------------
[  572.928643] WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x38a/0x430
[  572.928644] Modules linked in: ppp_deflate bsd_comp ppp_async ppp_generic slhc rndis_wlan rndis_host cdc_ether usbnet cfg80211 cdc_phonet phonet mii rfkill radeon snd_hda_codec_analog snd_hda_codec_generic coretemp i2c_algo_bit kvm_intel ttm ledtrig_audio snd_hda_codec_hdmi kvm snd_emu10k1 drm_kms_helper snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_util_mem snd_ac97_codec snd_hda_core cec mousedev input_leds rc_core snd_rawmidi snd_seq_device hid_generic ac97_bus iTCO_wdt snd_hwdep syscopyarea sysfillrect snd_pcm ppdev dcdbas mei_wdt iTCO_vendor_support irqbypass e1000e sysimgblt pktcdvd fb_sys_fops cdc_acm snd_timer usbhid parport_pc snd mei_me pcspkr parport emu10k1_gp evdev hid gameport mei i2c_i801 mac_hid intel_agp lpc_ich soundcore intel_gtt acpi_cpufreq drm sg agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod uhci_hcd cdrom ehci_pci ehci_hcd ata_generic floppy pata_acpi
[  572.928695] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc6 #1
[  572.928696] Hardware name: Dell Inc. OptiPlex 755                 /0GM819, BIOS A22 06/11/2012
[  572.928699] RIP: 0010:__queue_work+0x38a/0x430
[  572.928701] Code: 72 b6 49 a9 00 01 1f 00 75 0f 65 48 8b 3c 25 c0 8b 01 00 f6 47 24 20 75 25 0f 0b 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 bd fd ff ff 41 83 cc 02 48 8d 53 60 e9 a2 fd ff ff e8 fd
[  572.928703] RSP: 0018:ffffb99dc0003e70 EFLAGS: 00010002
[  572.928705] RAX: ffffa28a6a1cd790 RBX: ffffa28a6fe32700 RCX: 0000000000000000
[  572.928706] RDX: ffffa28a6a1cd798 RSI: 0000000000000000 RDI: 0000000000000001
[  572.928707] RBP: 0000000000000011 R08: ffffa28a6e802238 R09: ffffa28a6e802260
[  572.928708] R10: 0000000000000000 R11: ffffffffb784c6b8 R12: ffffa28a6ec18e00
[  572.928710] R13: 0000000000000000 R14: 0000000000000140 R15: ffffa28a6a1cd790
[  572.928712] FS:  0000000000000000(0000) GS:ffffa28a6fe00000(0000) knlGS:0000000000000000
[  572.928713] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.928715] CR2: 00007f2b4d8a8410 CR3: 0000000227670000 CR4: 00000000000406f0
[  572.928716] Call Trace:
[  572.928719]  <IRQ>
[  572.928725]  queue_work_on+0x36/0x40
[  572.928729]  __usb_hcd_giveback_urb+0x6f/0x120
[  572.928732]  usb_giveback_urb_bh+0xa6/0x100
[  572.928736]  tasklet_action_common.isra.0+0x5f/0x130
[  572.928740]  __do_softirq+0x111/0x34d
[  572.928743]  irq_exit+0xac/0xd0
[  572.928745]  do_IRQ+0x89/0x140
[  572.928748]  common_interrupt+0xf/0xf
[  572.928750]  </IRQ>
[  572.928752] RIP: 0010:mwait_idle+0x80/0x200
[  572.928754] Code: 8b 04 25 c0 8b 01 00 0f 01 c8 48 8b 00 a8 08 0f 85 74 01 00 00 e9 07 00 00 00 0f 00 2d 3b e5 43 00 31 c0 48 89 c1 fb 0f 01 c9 <e8> eb 39 ad ff 89 c5 66 66 66 66 90 65 48 8b 04 25 c0 8b 01 00 f0
[  572.928755] RSP: 0018:ffffffffb7803e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[  572.928757] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  572.928758] RDX: 0000000000000000 RSI: 7fffff7a9f630eae RDI: ffffffffb7615f90
[  572.928759] RBP: 0000000000000000 R08: 00000066a171aa2e R09: 000000856102aafb
[  572.928761] R10: 0000000000000301 R11: 000000000000000c R12: ffffffffb7814840
[  572.928762] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffb7814840
[  572.928768]  do_idle+0x204/0x270
[  572.928772]  cpu_startup_entry+0x19/0x20
[  572.928775]  start_kernel+0x85f/0x884
[  572.928780]  secondary_startup_64+0xb6/0xc0
[  572.928784] ---[ end trace 61951ee4bf36f1be ]---
[  573.888424] usb 1-2: USB disconnect, device number 37

# no more messages

(gdb) l *usb_giveback_urb_bh+0xa6
0xfd6 is in usb_giveback_urb_bh (/tmp/linux-5.7-rc6/drivers/usb/core/hcd.c:1675).
1670	
1671			urb = list_entry(local_list.next, struct urb, urb_list);
1672			list_del_init(&urb->urb_list);
1673			bh->completing_ep = urb->ep;
1674			__usb_hcd_giveback_urb(urb);
1675			bh->completing_ep = NULL;
1676		}
1677	
1678		/* check if there are new URBs to giveback */
1679		spin_lock_irq(&bh->lock);
(gdb) l *__usb_hcd_giveback_urb+0x6f
0xe7f is in __usb_hcd_giveback_urb (/tmp/linux-5.7-rc6/drivers/usb/core/hcd.c:1650).
1645	
1646		/* pass ownership to the completion handler */
1647		urb->status = status;
1648		urb->complete(urb);
1649	
1650		usb_anchor_resume_wakeups(anchor);
1651		atomic_dec(&urb->use_count);
1652		if (unlikely(atomic_read(&urb->reject)))
1653			wake_up(&usb_kill_urb_queue);
1654		usb_put_urb(urb);
(gdb) 

Regards,
Jean Rene

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-25 19:16   ` Jean Rene Dawin
@ 2020-05-26  8:08     ` Oliver Neukum
  2020-05-26 11:16       ` Jean Rene Dawin
  2020-05-26 11:13     ` Oliver Neukum
  1 sibling, 1 reply; 12+ messages in thread
From: Oliver Neukum @ 2020-05-26  8:08 UTC (permalink / raw)
  To: Jean Rene Dawin, linux-usb

Am Montag, den 25.05.2020, 21:16 +0200 schrieb Jean Rene Dawin:


Hi,

good, so the primary bug is fixed. I will send it upstream.
May I include your "Reported-by:" to give you the deserved fame?

> The first time the battery is removed and inserted again, everything is
> quite fine. Except that the USB cable has to be plugged out and in for the
> connection to be re-established. But that was needed in previous
> kernels, too.

Yes, I know. It sucks. But I cannot tell that it is the same device
in the kernel. Nor can I reestablish the connection, as it is done with
AT commands. It also means I cannot support reset_resume(). It sucks.
Any ideas?

I can only encourage you to contact the people developing the tool set
you are using.

> # turn on phone
> [  572.928634] ------------[ cut here ]------------
> [  572.928643] WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x38a/0x430

This is extremely interesting and unexpected. It is in USB core. I am
looking into this.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-25 19:16   ` Jean Rene Dawin
  2020-05-26  8:08     ` Oliver Neukum
@ 2020-05-26 11:13     ` Oliver Neukum
  2020-05-26 19:57       ` Jean Rene Dawin
  2020-05-27  8:28       ` Jean Rene Dawin
  1 sibling, 2 replies; 12+ messages in thread
From: Oliver Neukum @ 2020-05-26 11:13 UTC (permalink / raw)
  To: Jean Rene Dawin, linux-usb

Am Montag, den 25.05.2020, 21:16 +0200 schrieb Jean Rene Dawin:
> Oliver Neukum wrote on Mon 25/05/20 14:28:
> > 
> > Hi,
> > 
> > it looks to me like I made a mistake in fixing the error handling for
> > some devices. Could you test the attached patch?
> > 
> > 	Regards
> > 		Oliver
> > From 338fe738603d2612a317c9bec98236eb094ae109 Mon Sep 17 00:00:00 2001
> > From: Oliver Neukum <oneukum@suse.com>
> > Date: Mon, 25 May 2020 14:21:44 +0200
> > Subject: [PATCH] CDC-ACM: heed quirk also in error handling
> > 
> > If buffers are iterated over in the error case, the lower limits
> > for quirky devices must be heeded.
> > 
> > Signed-off-by: Oliver Neukum <oneukum@suse.com>
> > ---
> >  drivers/usb/class/cdc-acm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
> > index 7678ae4afd53..be4543569822 100644
> > --- a/drivers/usb/class/cdc-acm.c
> > +++ b/drivers/usb/class/cdc-acm.c
> > @@ -585,7 +585,7 @@ static void acm_softint(struct work_struct *work)
> >  	}
> >  
> >  	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
> > -		for (i = 0; i < ACM_NR; i++) 
> > +		for (i = 0; i < acm->rx_buflimit; i++) 
> >  			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
> >  					acm_submit_read_urb(acm, i, GFP_NOIO);
> >  	}
> > -- 
> > 2.16.4
> > 
> 
> Hi,
> 
> thanks for the quick reply and the patch.
> 
> With the patch it is a different behaviour:
> 
> The first time the battery is removed and inserted again, everything is
> quite fine. Except that the USB cable has to be plugged out and in for the
> connection to be re-established. But that was needed in previous
> kernels, too.
> 
> But if the battery is removed a second time, and inserted again and the
> phone is turned on, it results in the traces below:

Hi,

may I ask whether you did the test with removing the battery twice with
an older kernel? Could you please go back to
f6cc6093a729ede1ff5658b493237c42b82ba107
and repeat the test of a second battery removal with that state?
I just cannot find anything pointing to a change that could cause
this issue within that time.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-26  8:08     ` Oliver Neukum
@ 2020-05-26 11:16       ` Jean Rene Dawin
  0 siblings, 0 replies; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-26 11:16 UTC (permalink / raw)
  To: linux-usb; +Cc: Oliver Neukum

Oliver Neukum wrote on Tue 26/05/20 10:08:
> Am Montag, den 25.05.2020, 21:16 +0200 schrieb Jean Rene Dawin:
>
>
> Hi,
>
> good, so the primary bug is fixed. I will send it upstream.
> May I include your "Reported-by:" to give you the deserved fame?
Yes.

> Yes, I know. It sucks. But I cannot tell that it is the same device
> in the kernel. Nor can I reestablish the connection, as it is done with
> AT commands. It also means I cannot support reset_resume(). It sucks.
> Any ideas?
Not really. I will have to check if the ppp daemon can handle that.

Regards,
Jean Rene

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-26 11:13     ` Oliver Neukum
@ 2020-05-26 19:57       ` Jean Rene Dawin
  2020-05-27  8:53         ` Oliver Neukum
  2020-05-27  8:28       ` Jean Rene Dawin
  1 sibling, 1 reply; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-26 19:57 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-usb

Oliver Neukum wrote on Tue 26/05/20 13:13:
> Hi,
> 
> may I ask whether you did the test with removing the battery twice with
> an older kernel? Could you please go back to
> f6cc6093a729ede1ff5658b493237c42b82ba107
> and repeat the test of a second battery removal with that state?
> I just cannot find anything pointing to a change that could cause
> this issue within that time.

Hi,

testing with f6cc6093a729ede1ff5658b493237c42b82ba107 looks like this:

=========== 1 =============
# remove battery first time
[  447.818693] usb 1-2: USB disconnect, device number 2
[  447.856667] cdc_acm 1-2:1.3: failed to set dtr/rts
# loop disconnect/connect

# insert battery first time
[  481.267878] usb 1-2: new high-speed USB device number 20 using ehci-pci

# turn on phone first time
[  488.984876] cdc_acm 1-2:1.3: failed to set dtr/rts
[  490.562661] usb 1-2: USB disconnect, device number 20
[  504.341285] usb 1-2: new high-speed USB device number 21 using ehci-pci
# unplug/plug USB cable - network ok

=========== 2 =============
# remove battery second time
[  576.310900] usb 1-2: USB disconnect, device number 22
[  576.334020] cdc_acm 1-2:1.3: failed to set dtr/rts
# loop disconnect/connect

# insert battery second time
[  606.764940] usb 1-2: new high-speed USB device number 38 using ehci-pci

# turn on phone second time
[  616.398126] usb 1-2: USB disconnect, device number 38
[  616.418763] cdc_acm 1-2:1.3: acm_port_activate - usb_submit_urb(ctrl irq) failed
[  630.208351] usb 1-2: new high-speed USB device number 39 using ehci-pci
# unplug/plug USB cable - network ok

=========== 3 =============
# remove battery third time
[  711.081997] usb 1-2: USB disconnect, device number 40
[  711.099678] cdc_acm 1-2:1.3: failed to set dtr/rts
# loop disconnect/connect

# insert battery third time
[  744.458707] usb 1-2: new high-speed USB device number 58 using ehci-pci

# turn on phone third time
[  751.195202] cdc_acm 1-2:1.3: failed to set dtr/rts
[  752.213695] cdc_acm 1-2:1.3: failed to set dtr/rts
[  752.457590] usb 1-2: USB disconnect, device number 58
# unplug/plug USB cable - network ok

Interesting is, that at the second time, the usb disconnect message
conmes first, and then a message from cdc_acm acm_port_activate.

Regards,
Jean Rene

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-26 11:13     ` Oliver Neukum
  2020-05-26 19:57       ` Jean Rene Dawin
@ 2020-05-27  8:28       ` Jean Rene Dawin
  2020-05-27  9:33         ` Oliver Neukum
  1 sibling, 1 reply; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-27  8:28 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-usb

Oliver Neukum wrote on Tue 26/05/20 13:13:
> 
> Hi,
> 
> may I ask whether you did the test with removing the battery twice with
> an older kernel? Could you please go back to
> f6cc6093a729ede1ff5658b493237c42b82ba107
> and repeat the test of a second battery removal with that state?
> I just cannot find anything pointing to a change that could cause
> this issue within that time.


Hi,

I tested again with 5.7-rc6 and the following applied to drivers/usb/class/cdc-acm.c :

--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -579,8 +579,8 @@ static void acm_softint(struct work_struct *work)
                                usb_kill_urb(acm->read_urbs[i]);
                        usb_clear_halt(acm->dev, acm->in);
                        acm_submit_read_urbs(acm, GFP_KERNEL);
-                       clear_bit(EVENT_RX_STALL, &acm->flags);
                }
+               clear_bit(EVENT_RX_STALL, &acm->flags);
        }

        if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {

This in addtion to your earlier patch :

> > > diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
> > > index 7678ae4afd53..be4543569822 100644
> > > --- a/drivers/usb/class/cdc-acm.c
> > > +++ b/drivers/usb/class/cdc-acm.c
> > > @@ -585,7 +585,7 @@ static void acm_softint(struct work_struct *work)
> > >   }
> > >
> > >   if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
> > > -         for (i = 0; i < ACM_NR; i++)
> > > +         for (i = 0; i < acm->rx_buflimit; i++)
> > >                   if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
> > >                                   acm_submit_read_urb(acm, i, GFP_NOIO);


And with these patches the behaviour seems to me like with
f6cc6093a729ede1ff5658b493237c42b82ba10. I'm not sure if this is correct
but to me it looked odd that before 0afccd7601514c4b83d8cc58c740089cc447051d the function

   clear_bit(EVENT_RX_STALL, &acm->flags);


was always called when the block of

   if (test_bit(EVENT_RX_STALL, &acm->flags)) {

was entered.

But since 0afccd7601514c4b83d8cc58c740089cc447051d
only when condition

   if (!acm->susp_count) {

was also fulfilled:

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 84d6f7df09a4..4ef68e6671aa 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -557,14 +557,14 @@ static void acm_softint(struct work_struct *work)
        struct acm *acm = container_of(work, struct acm, work);

        if (test_bit(EVENT_RX_STALL, &acm->flags)) {
-               if (!(usb_autopm_get_interface(acm->data))) {
+               smp_mb(); /* against acm_suspend() */
+               if (!acm->susp_count) {
                        for (i = 0; i < acm->rx_buflimit; i++)
                                usb_kill_urb(acm->read_urbs[i]);
                        usb_clear_halt(acm->dev, acm->in);
                        acm_submit_read_urbs(acm, GFP_KERNEL);
-                       usb_autopm_put_interface(acm->data);
+                       clear_bit(EVENT_RX_STALL, &acm->flags);
                }
-               clear_bit(EVENT_RX_STALL, &acm->flags);
        }

        if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags))

Regards,
Jean Rene


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-26 19:57       ` Jean Rene Dawin
@ 2020-05-27  8:53         ` Oliver Neukum
  2020-05-28  8:51           ` Jean Rene Dawin
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Neukum @ 2020-05-27  8:53 UTC (permalink / raw)
  To: Jean Rene Dawin; +Cc: linux-usb

Am Dienstag, den 26.05.2020, 21:57 +0200 schrieb Jean Rene Dawin:
> Oliver Neukum wrote on Tue 26/05/20 13:13:
> > Hi,
> > 
> > may I ask whether you did the test with removing the battery twice with
> > an older kernel? Could you please go back to
> > f6cc6093a729ede1ff5658b493237c42b82ba107
> > and repeat the test of a second battery removal with that state?
> > I just cannot find anything pointing to a change that could cause
> > this issue within that time.
> 
> Hi,
> 
> testing with f6cc6093a729ede1ff5658b493237c42b82ba107 looks like this:

OK, we have two possibilities here. Either
a4e7279cd1d19f48f0af2a10ed020febaa9ac092 or
0afccd7601514c4b83d8cc58c740089cc447051d

have had a really wierd effect, or they introduced a bug
that hid a later bug. Can I ask you to run a complicated test
to decide between these possibilities?

Could you test a4e7279cd1d19f48f0af2a10ed020febaa9ac092
together with the patch I sent you applied on top?

> Interesting is, that at the second time, the usb disconnect message
> conmes first, and then a message from cdc_acm acm_port_activate.

Basically you are deliberately creating a race condition between error
handling and disconnect. That is a feature of the HC design.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-27  8:28       ` Jean Rene Dawin
@ 2020-05-27  9:33         ` Oliver Neukum
  0 siblings, 0 replies; 12+ messages in thread
From: Oliver Neukum @ 2020-05-27  9:33 UTC (permalink / raw)
  To: Jean Rene Dawin; +Cc: linux-usb

Am Mittwoch, den 27.05.2020, 10:28 +0200 schrieb Jean Rene Dawin:

Hi,

> But since 0afccd7601514c4b83d8cc58c740089cc447051d
> only when condition
> 
>    if (!acm->susp_count) {
> 
> was also fulfilled:

This is odd, because it is actually a bug to clear a HALT on a
suspended device. Yet you have tested.
Can I ask you to retest (twice removal) with
0afccd7601514c4b83d8cc58c740089cc447051d

so that we can be sure, that it causes this issue?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-27  8:53         ` Oliver Neukum
@ 2020-05-28  8:51           ` Jean Rene Dawin
  2020-06-03  7:26             ` Jean Rene Dawin
  0 siblings, 1 reply; 12+ messages in thread
From: Jean Rene Dawin @ 2020-05-28  8:51 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-usb

Oliver Neukum wrote on Wed 27/05/20 10:53:
> OK, we have two possibilities here. Either
> a4e7279cd1d19f48f0af2a10ed020febaa9ac092 or
> 0afccd7601514c4b83d8cc58c740089cc447051d
> 
> have had a really wierd effect, or they introduced a bug
> that hid a later bug. Can I ask you to run a complicated test
> to decide between these possibilities?
> 
> Could you test a4e7279cd1d19f48f0af2a10ed020febaa9ac092
> together with the patch I sent you applied on top?

Hi,

I tested 0afccd7601514c4b83d8cc58c740089cc447051d and couldn't get it to
show the symptoms.

Then I tested a4e7279cd1d19f48f0af2a10ed020febaa9ac092 with your patch
applied and it still showed the symptom

[  730.590055] Call Trace:
[  730.590058]  <IRQ>
[  730.590062]  queue_work_on+0x36/0x40
[  730.590065]  __usb_hcd_giveback_urb+0x6f/0x120
[  730.590067]  usb_giveback_urb_bh+0xa6/0x100
[  730.590070]  tasklet_action_common.isra.0+0x5f/0x130
[  730.590074]  __do_softirq+0x111/0x34d
[  730.590077]  irq_exit+0xac/0xd0
[  730.590078]  do_IRQ+0x89/0x140
[  730.590080]  common_interrupt+0xf/0xf

but very non-deterministic. The first time the error came up after
the battery was removed the second time. Then after the fourth time. 
And testing again this morning it was after six removals.
I will try to narrow the conditions that tirgger the behaviour.
It seems the timing is important.

Regards,
Jean Rene

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Kernel Oops in cdc_acm
  2020-05-28  8:51           ` Jean Rene Dawin
@ 2020-06-03  7:26             ` Jean Rene Dawin
  0 siblings, 0 replies; 12+ messages in thread
From: Jean Rene Dawin @ 2020-06-03  7:26 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-usb

Jean Rene Dawin wrote on Thu 28/05/20 10:51:
> Oliver Neukum wrote on Wed 27/05/20 10:53:
> > OK, we have two possibilities here. Either
> > a4e7279cd1d19f48f0af2a10ed020febaa9ac092 or
> > 0afccd7601514c4b83d8cc58c740089cc447051d
> 
> Then I tested a4e7279cd1d19f48f0af2a10ed020febaa9ac092 with your patch
> applied and it still showed the symptom

Hi,

more testing shows the crash can be triggered by 

- romving the battery the first time (but only sometimes)
- re-insertiing battery and turning on the phone (after some interval)

The trace when crashing looks like this:

[  122.890637] Call Trace:
[  122.890640]  <IRQ>
[  122.890645]  queue_work_on+0x36/0x40
[  122.890650]  __usb_hcd_giveback_urb+0x6f/0x120
[  122.890653]  usb_giveback_urb_bh+0xa6/0x100
[  122.890657]  tasklet_action_common.isra.0+0x5f/0x130
[  122.890661]  __do_softirq+0x111/0x34d
[  122.890665]  irq_exit+0xac/0xd0
[  122.890667]  do_IRQ+0x89/0x140
[  122.890670]  common_interrupt+0xf/0xf
[  122.890672]  </IRQ>

Doing a function_graph ftrace on usb_giveback_urb_bh shows a difference
between working and crashshing behaviour:

Working:
# remove battery
 2802.875749 |   3)   0.331 us    |      usb_anchor_suspend_wakeups();
 2802.875749 |   3)   0.283 us    |      usb_unanchor_urb();
 2802.875750 |   3)   0.283 us    |      hub_irq();

# insert battery
 2818.992447 |   3)   0.265 us    |      usb_anchor_suspend_wakeups();
 2818.992448 |   3)   0.279 us    |      usb_unanchor_urb();
 2818.992448 |   3)   0.277 us    |      hub_irq();

# turn on phone
 2829.835833 |   3)   0.262 us    |      usb_anchor_suspend_wakeups();
 2829.835834 |   3)   0.273 us    |      usb_unanchor_urb();
 2829.835834 |   3)   0.294 us    |      hub_irq();


Crashing:

# from dmesg
[ 1537.742750] WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x38a/0x430

# remove battery / turn on phone
 1536.448472 |   3)   0.373 us    |      usb_anchor_suspend_wakeups();
 1536.448473 |   3)   0.280 us    |      usb_unanchor_urb();
 1536.448473 |   3)               |      acm_read_bulk_callback [cdc_acm]() {
 1536.448474 |   3)   0.306 us    |        ktime_get_mono_fast_ns();
[...]                                                                

 1536.748347 |   3)   0.279 us    |      usb_anchor_suspend_wakeups();
 1536.748348 |   3)   0.289 us    |      usb_unanchor_urb();          
 1536.748348 |   3)               |      acm_write_bulk [cdc_acm]() { 
 1536.748349 |   3)               |        _raw_spin_lock_irqsave() {
[...]                                                                
                                                                     
 1537.749348 |   3)   0.292 us    |      usb_anchor_suspend_wakeups();
 1537.749348 |   3)   0.298 us    |      usb_unanchor_urb();          
 1537.749349 |   3)               |      acm_write_bulk [cdc_acm]() { 
 1537.749349 |   3)               |        _raw_spin_lock_irqsave() {
[...]                                                                
                                                                     
 1537.749370 |   3)               |        queue_work_on() {
 1537.749370 |   3)               |          __queue_work() {
 1537.749370 |   3)   0.273 us    |            __rcu_read_lock();
 1537.749371 |   3)   0.451 us    |            get_work_pool();  
 1537.749372 |   3)               |            _raw_spin_lock() {
 1537.749372 |   3)   0.270 us    |              preempt_count_add();
 1537.749373 |   3)   0.836 us    |            }                     
 1537.749373 |   3)               |            do_invalid_op() {     
 1537.749374 |   3)   0.364 us    |              uprobe_get_trap_addr();
 1537.749374 |   3)               |              do_error_trap() {      
 1537.749375 |   3)               |                is_valid_bugaddr() { 
 1537.749375 |   3)               |                  __probe_kernel_read() {
 1537.749376 |   3)               |                    __check_object_size() {
 1537.749376 |   3)   0.292 us    |                      check_stack_object();
 1537.749377 |   3)   0.397 us    |                      __virt_addr_valid(); 


To me it looks like the problem arises when urb->complete(urb) is called
in 

static void __usb_hcd_giveback_urb(struct urb *urb)

from drivers/usb/core/hcd.c:

1641    usb_anchor_suspend_wakeups(anchor);
1642    usb_unanchor_urb(urb);
1643    if (likely(status == 0))
1644            usb_led_activity(USB_LED_EVENT_HOST);
1645
1646    /* pass ownership to the completion handler */
1647    urb->status = status;
1648    urb->complete(urb);

If the "wrong" function is set in urb->complete I see the crash.
In the normal case hub_irq() seems to be set. 
In the crashing case something like acm_write_bulk.
May this be the cause?

Regards,
Jean Rene

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-03  7:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 12:00 Kernel Oops in cdc_acm Jean Rene Dawin
2020-05-25 12:28 ` Oliver Neukum
2020-05-25 19:16   ` Jean Rene Dawin
2020-05-26  8:08     ` Oliver Neukum
2020-05-26 11:16       ` Jean Rene Dawin
2020-05-26 11:13     ` Oliver Neukum
2020-05-26 19:57       ` Jean Rene Dawin
2020-05-27  8:53         ` Oliver Neukum
2020-05-28  8:51           ` Jean Rene Dawin
2020-06-03  7:26             ` Jean Rene Dawin
2020-05-27  8:28       ` Jean Rene Dawin
2020-05-27  9:33         ` Oliver Neukum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.