All of lore.kernel.org
 help / color / mirror / Atom feed
* hid-thingm: kernel panic on remove
@ 2014-09-02 17:46 Dylan Alex Simon
  2014-09-02 19:58 ` Benjamin Tissoires
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-02 17:46 UTC (permalink / raw)
  To: linux-input

Whenever either disconnecting the USB device or simply rmmod'ing the module
(even when not in use), I get a kernel panic.  I haven't managed to capture a
backtrace, but at least the first two lines were saved after an rmmod:

18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
<snip, rmmod hid-thingm:>
08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110

Let me know if you'd like me to try to capture more info, but this problem
seems very reproducible (at least with a mk2 device; I never had the problem on
older kernels with a mk1).  I do direct write()s to the hidraw device, but
don't otherwise use the driver while it's loaded.

Also at https://bugzilla.kernel.org/show_bug.cgi?id=83751

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-02 17:46 hid-thingm: kernel panic on remove Dylan Alex Simon
@ 2014-09-02 19:58 ` Benjamin Tissoires
  2014-09-02 20:12   ` Dylan Alex Simon
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Benjamin Tissoires @ 2014-09-02 19:58 UTC (permalink / raw)
  To: Dylan Alex Simon, linux-input, Jiri Kosina

Hi,

On 09/02/2014 01:46 PM, Dylan Alex Simon wrote:
> Whenever either disconnecting the USB device or simply rmmod'ing the module
> (even when not in use), I get a kernel panic.  I haven't managed to capture a
> backtrace, but at least the first two lines were saved after an rmmod:
> 
> 18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> <snip, rmmod hid-thingm:>
> 08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
> 08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110
> 
> Let me know if you'd like me to try to capture more info, but this problem
> seems very reproducible (at least with a mk2 device; I never had the problem on
> older kernels with a mk1).  I do direct write()s to the hidraw device, but
> don't otherwise use the driver while it's loaded.
> 
> Also at https://bugzilla.kernel.org/show_bug.cgi?id=83751

Do you happen to see a "unsupported firmware " error when plugging your device?

If so, then the following patch should help with the panic (but you will an other one to be able to use again your device).

---

>From 791297375227b91990b37f94fc9de93156a3c21a Mon Sep 17 00:00:00 2001
From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Date: Tue, 2 Sep 2014 15:50:43 -0400
Subject: [PATCH] HID: thingm: set the proper error code before leaving

In case of an unsupported firmware, the driver bails out without setting
the LEDs interfaces, but forget to set the proper error code.
err is then still equal to 0 and the hid subsytem consider the device
to be in perfect shape.
When removing it, thingm_remove() tries to unbind the rgb LEDs which
has not been created, leading to a segfault.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
---
 drivers/hid/hid-thingm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..f91f971 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -250,6 +250,7 @@ static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
 
 	if (!tdev->fwinfo) {
 		hid_err(hdev, "unsupported firmware %c\n", tdev->version.major);
+		err = -ENODEV;
 		goto stop;
 	}
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-02 19:58 ` Benjamin Tissoires
@ 2014-09-02 20:12   ` Dylan Alex Simon
  2014-09-02 21:32   ` Jiri Kosina
  2014-09-03 19:48   ` hid-thingm: kernel panic " Benjamin Tissoires
  2 siblings, 0 replies; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-02 20:12 UTC (permalink / raw)
  To: Benjamin Tissoires; +Cc: linux-input, Jiri Kosina

>From Benjamin Tissoires <benjamin.tissoires@gmail.com>, Tue, Sep 02, 2014 at 03:58:08PM -0400:
> Hi,
> 
> On 09/02/2014 01:46 PM, Dylan Alex Simon wrote:
> > Whenever either disconnecting the USB device or simply rmmod'ing the module
> > (even when not in use), I get a kernel panic.  I haven't managed to capture a
> > backtrace, but at least the first two lines were saved after an rmmod.
> > 
> > Let me know if you'd like me to try to capture more info, but this problem
> > seems very reproducible (at least with a mk2 device; I never had the problem on
> > older kernels with a mk1).  I do direct write()s to the hidraw device, but
> > don't otherwise use the driver while it's loaded.
> > 
> > Also at https://bugzilla.kernel.org/show_bug.cgi?id=83751
> 
> Do you happen to see a "unsupported firmware " error when plugging your device?
> 
> If so, then the following patch should help with the panic (but you will an other one to be able to use again your device).

No, no unsupported firmware messages.  The device and driver work fine
otherwise.  It registers (and unregisters) all the devices.  Hotplug events:

18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
18:53:17 hotplug: 694 add module path:/module/hid_thingm
18:53:17 hotplug: 695 add hidraw path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/hidraw/hidraw3
18:53:17 hotplug: 696 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:red:led1
18:53:17 hotplug: 698 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:blue:led1
18:53:17 hotplug: 697 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:green:led1
18:53:17 hotplug: 699 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:red:led2
18:53:17 hotplug: 700 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:green:led2
18:53:17 hotplug: 701 add leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:blue:led2
18:53:17 hotplug: 702 add drivers path:/bus/hid/drivers/thingm
<snip, rmmod:>
08:38:42 hotplug: 710 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:red:led1
08:38:42 hotplug: 712 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:blue:led1
08:38:42 hotplug: 711 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:green:led1
08:38:42 hotplug: 714 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:green:led2
08:38:42 hotplug: 713 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:red:led2
08:38:42 hotplug: 715 remove leds path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/leds/thingm3:blue:led2
08:38:42 hotplug: 716 remove hidraw path:/devices/pci0000:00/0000:00:12.2/usb1/1-3/1-3.1/1-3.1.4/1-3.1.4:1.0/0003:27B8:01ED.0004/hidraw/hidraw3
08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-02 19:58 ` Benjamin Tissoires
  2014-09-02 20:12   ` Dylan Alex Simon
@ 2014-09-02 21:32   ` Jiri Kosina
  2014-09-03  2:35     ` Dylan Alex Simon
  2014-09-03 19:48   ` hid-thingm: kernel panic " Benjamin Tissoires
  2 siblings, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2014-09-02 21:32 UTC (permalink / raw)
  To: Benjamin Tissoires; +Cc: Dylan Alex Simon, linux-input

On Tue, 2 Sep 2014, Benjamin Tissoires wrote:

> > Whenever either disconnecting the USB device or simply rmmod'ing the module
> > (even when not in use), I get a kernel panic.  I haven't managed to capture a
> > backtrace, but at least the first two lines were saved after an rmmod:
> > 
> > 18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> > <snip, rmmod hid-thingm:>
> > 08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
> > 08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110

Hmm, so the only lock that is taken in thingm driver itself is 
rgb->tdev->lock, which is thingm_device->lock, properly initialized in 
_probe().

So my first thought was that the work is not cancelled properly, causing 
use-after-free on the lock with the workqueue firing at the time the 
drvier has cleaned up everything, but that doesn't seem to be the case, as 
thingm_remove() -> thingm_remove_rgb() seems to be doing the right thing.

Dylan, is there any chance for you to capture more complete backtrace from 
the oops? serial console, netconsole, anything?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-02 21:32   ` Jiri Kosina
@ 2014-09-03  2:35     ` Dylan Alex Simon
  2014-09-03  7:29       ` Jiri Kosina
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-03  2:35 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Benjamin Tissoires, linux-input

>From Jiri Kosina <jkosina@suse.cz>, Tue, Sep 02, 2014 at 11:32:30PM +0200:
> On Tue, 2 Sep 2014, Benjamin Tissoires wrote:
> 
> > > Whenever either disconnecting the USB device or simply rmmod'ing the module
> > > (even when not in use), I get a kernel panic.  I haven't managed to capture a
> > > backtrace, but at least the first two lines were saved after an rmmod:
> > > 
> > > 18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> > > <snip, rmmod hid-thingm:>
> > > 08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
> > > 08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110
> 
> Hmm, so the only lock that is taken in thingm driver itself is 
> rgb->tdev->lock, which is thingm_device->lock, properly initialized in 
> _probe().
> 
> So my first thought was that the work is not cancelled properly, causing 
> use-after-free on the lock with the workqueue firing at the time the 
> drvier has cleaned up everything, but that doesn't seem to be the case, as 
> thingm_remove() -> thingm_remove_rgb() seems to be doing the right thing.
> 
> Dylan, is there any chance for you to capture more complete backtrace from 
> the oops? serial console, netconsole, anything?

Some combination of kernel debugging options and killing processes let it
survive long enough to write the backtrace to disk.  A simple modprobe/rmmod
wasn't enough, though, it required a few tries removing the device and then
rmmod (though has definitely happend on just one removal before).  Let me know
if there's anything else I can try.

<insmod>
[   28.855960] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
<rmmod;insmod>
[  147.037008] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
<unplug>
[  218.496688] usb 1-3.1.4: USB disconnect, device number 7
[  218.502278] hid : failed to write color
[  218.506131] hid : failed to write color
<plug>
[  233.557300] usb 1-3.1.4: new full-speed USB device number 8 using ehci-pci
[  233.657195] usb 1-3.1.4: config 1 interface 0 altsetting 0 has 2 endpoint descriptors, different from the interface descriptor's value: 1
[  233.660402] thingm 0003:27B8:01ED.0005: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
<rmmod>
[  253.682724] BUG: unable to handle kernel paging request at ffffffffa00af0cf
[  253.682807] IP:
[  253.682812]  [<ffffffffa00af0cf>] 0xffffffffa00af0cf
[  253.682817] PGD 1814067 PUD 1815063 PMD 42cace067 PTE 0
[  253.682820] Oops: 0010 [#1] SMP 
[  253.682830] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
[  253.682833] CPU: 0 PID: 849 Comm: kworker/0:2 Not tainted 3.16.1-00001-g98fed6d #145
[  253.682835] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
[  253.682838] Workqueue: events 0xffffffffa00af040
[  253.682840] task: ffff88042e330050 ti: ffff880429d8c000 task.ti: ffff880429d8c000
[  253.682844] RIP: 0010:[<ffffffffa00af0cf>]  [<ffffffffa00af0cf>] 0xffffffffa00af0cf
[  253.682846] RSP: 0018:ffff880429d8fdd0  EFLAGS: 00010286
[  253.682847] RAX: 0000000000000009 RBX: ffff88042ca83af0 RCX: 0000000000000302
[  253.682849] RDX: 0000000000000078 RSI: 0000000000000286 RDI: ffff88042caaade0
[  253.682850] RBP: ffff880429d8fdf0 R08: ffff8804acaaade0 R09: 0000000000000282
[  253.682852] R10: ffff88042c93dbc0 R11: 000000000000001f R12: ffff88042c885e80
[  253.682853] R13: 0000000000000000 R14: ffff88043ec14e00 R15: ffff88043ec113c0
[  253.682856] FS:  00007f58ebcec700(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[  253.682857] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  253.682859] CR2: ffffffffa00af0cf CR3: 0000000001813000 CR4: 00000000000407f0
[  253.682859] Stack:
[  253.682863]  010000000000a000 0001000000000063 000000008867f918 ffff88042ca83af0
[  253.682866]  ffff880429d8fe38 ffffffff81052c2f ffff88043ec113c0 000000003ec113c0
[  253.682869]  ffff88043ec113c0 ffff88043ec113e8 ffff88042e330050 ffff88042c885eb0
[  253.682870] Call Trace:
[  253.682878]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
[  253.682882]  [<ffffffff81053423>] worker_thread+0x63/0x540
[  253.682886]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
[  253.682889]  [<ffffffff81059038>] kthread+0xe8/0x100
[  253.682893]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  253.682897]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
[  253.682900]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  253.682906] Code:  Bad RIP value.
[  253.682908] RIP  [<ffffffffa00af0cf>] 0xffffffffa00af0cf
[  253.682909]  RSP <ffff880429d8fdd0>
[  253.682910] CR2: ffffffffa00af0cf
[  253.682913] ---[ end trace 38f1b789201cd967 ]---
[  253.682946] BUG: unable to handle kernel paging request at ffffffffffffffc8
[  253.682950] IP: [<ffffffff810595eb>] kthread_data+0xb/0x20
[  253.682953] PGD 1814067 PUD 1816067 PMD 0 
[  253.682955] Oops: 0000 [#2] SMP 
[  253.682964] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
[  253.682967] CPU: 0 PID: 849 Comm: kworker/0:2 Tainted: G      D       3.16.1-00001-g98fed6d #145
[  253.682969] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
[  253.682987] task: ffff88042e330050 ti: ffff880429d8c000 task.ti: ffff880429d8c000
[  253.682990] RIP: 0010:[<ffffffff810595eb>]  [<ffffffff810595eb>] kthread_data+0xb/0x20
[  253.682992] RSP: 0018:ffff880429d8fa40  EFLAGS: 00010002
[  253.682993] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffff80cd76
[  253.682995] RDX: 0000000050f1eeb6 RSI: 0000000000000000 RDI: ffff88042e330050
[  253.682996] RBP: ffff880429d8fa40 R08: ffff88042c8928a0 R09: 0000000000000001
[  253.682998] R10: 00000000000001ea R11: 0000000000000000 R12: 0000000000000000
[  253.682999] R13: ffff88042e330400 R14: 0000000000000000 R15: ffff88042e330050
[  253.683002] FS:  00007f58ebcec700(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[  253.683003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  253.683005] CR2: 0000000000000028 CR3: 0000000001813000 CR4: 00000000000407f0
[  253.683005] Stack:
[  253.683009]  ffff880429d8fa58 ffffffff8105396c ffff88043ec11c40 ffff880429d8fac0
[  253.683011]  ffffffff8152db81 000000000000a000 ffff88042e330050 ffff880429d8fa88
[  253.683014]  ffff880429d8ffd8 ffff880429d8fad0 ffffffff8103e2a9 ffff88042e330610
[  253.683015] Call Trace:
[  253.683019]  [<ffffffff8105396c>] wq_worker_sleeping+0xc/0x90
[  253.683024]  [<ffffffff8152db81>] __schedule+0x4b1/0x730
[  253.683029]  [<ffffffff8103e2a9>] ? release_task+0x249/0x3d0
[  253.683033]  [<ffffffff8152de24>] schedule+0x24/0x60
[  253.683036]  [<ffffffff8103eb92>] do_exit+0x762/0xa10
[  253.683041]  [<ffffffff81005998>] oops_end+0x68/0x90
[  253.683046]  [<ffffffff8103242c>] no_context+0x12c/0x2f0
[  253.683050]  [<ffffffff81032675>] __bad_area_nosemaphore+0x85/0x1f0
[  253.683054]  [<ffffffff810327ee>] bad_area_nosemaphore+0xe/0x10
[  253.683058]  [<ffffffff81032b16>] __do_page_fault+0xb6/0x4d0
[  253.683062]  [<ffffffff81531e09>] ? _raw_spin_unlock_irq+0x9/0x10
[  253.683066]  [<ffffffff8141a43f>] ? urb_destroy+0x1f/0x30
[  253.683069]  [<ffffffff8141ad09>] ? usb_free_urb+0x19/0x20
[  253.683072]  [<ffffffff8141b342>] ? usb_start_wait_urb+0xa2/0xf0
[  253.683075]  [<ffffffff81032f5c>] do_page_fault+0xc/0x10
[  253.683079]  [<ffffffff81533922>] page_fault+0x22/0x30
[  253.683084]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
[  253.683087]  [<ffffffff81053423>] worker_thread+0x63/0x540
[  253.683091]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
[  253.683093]  [<ffffffff81059038>] kthread+0xe8/0x100
[  253.683097]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  253.683100]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
[  253.683103]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  253.683132] Code: 00 48 89 e5 5d 48 8b 40 b8 48 c1 e8 02 83 e0 01 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 87 58 03 00 00 55 48 89 e5 <48> 8b 40 c8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[  253.683135] RIP  [<ffffffff810595eb>] kthread_data+0xb/0x20
[  253.683136]  RSP <ffff880429d8fa40>
[  253.683137] CR2: ffffffffffffffc8
[  253.683139] ---[ end trace 38f1b789201cd968 ]---
[  253.683140] Fixing recursive fault but reboot is needed!
[  313.753106] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 7, t=18003 jiffies, g=1479, c=1478, q=300)
<more traces follow but probably irrelevant>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03  2:35     ` Dylan Alex Simon
@ 2014-09-03  7:29       ` Jiri Kosina
  2014-09-03  8:17         ` Dylan Alex Simon
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2014-09-03  7:29 UTC (permalink / raw)
  To: Dylan Alex Simon; +Cc: Benjamin Tissoires, linux-input

On Tue, 2 Sep 2014, Dylan Alex Simon wrote:

> Some combination of kernel debugging options and killing processes let it
> survive long enough to write the backtrace to disk.  A simple modprobe/rmmod
> wasn't enough, though, it required a few tries removing the device and then
> rmmod (though has definitely happend on just one removal before).  Let me know
> if there's anything else I can try.
> 
> <insmod>
> [   28.855960] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> <rmmod;insmod>
> [  147.037008] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> <unplug>
> [  218.496688] usb 1-3.1.4: USB disconnect, device number 7
> [  218.502278] hid : failed to write color
> [  218.506131] hid : failed to write color
> <plug>
> [  233.557300] usb 1-3.1.4: new full-speed USB device number 8 using ehci-pci
> [  233.657195] usb 1-3.1.4: config 1 interface 0 altsetting 0 has 2 endpoint descriptors, different from the interface descriptor's value: 1
> [  233.660402] thingm 0003:27B8:01ED.0005: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> <rmmod>
> [  253.682724] BUG: unable to handle kernel paging request at ffffffffa00af0cf
> [  253.682807] IP:
> [  253.682812]  [<ffffffffa00af0cf>] 0xffffffffa00af0cf
> [  253.682817] PGD 1814067 PUD 1815063 PMD 42cace067 PTE 0
> [  253.682820] Oops: 0010 [#1] SMP 
> [  253.682830] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
> [  253.682833] CPU: 0 PID: 849 Comm: kworker/0:2 Not tainted 3.16.1-00001-g98fed6d #145
> [  253.682835] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
> [  253.682838] Workqueue: events 0xffffffffa00af040
> [  253.682840] task: ffff88042e330050 ti: ffff880429d8c000 task.ti: ffff880429d8c000
> [  253.682844] RIP: 0010:[<ffffffffa00af0cf>]  [<ffffffffa00af0cf>] 0xffffffffa00af0cf
> [  253.682846] RSP: 0018:ffff880429d8fdd0  EFLAGS: 00010286
> [  253.682847] RAX: 0000000000000009 RBX: ffff88042ca83af0 RCX: 0000000000000302
> [  253.682849] RDX: 0000000000000078 RSI: 0000000000000286 RDI: ffff88042caaade0
> [  253.682850] RBP: ffff880429d8fdf0 R08: ffff8804acaaade0 R09: 0000000000000282
> [  253.682852] R10: ffff88042c93dbc0 R11: 000000000000001f R12: ffff88042c885e80
> [  253.682853] R13: 0000000000000000 R14: ffff88043ec14e00 R15: ffff88043ec113c0
> [  253.682856] FS:  00007f58ebcec700(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
> [  253.682857] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  253.682859] CR2: ffffffffa00af0cf CR3: 0000000001813000 CR4: 00000000000407f0
> [  253.682859] Stack:
> [  253.682863]  010000000000a000 0001000000000063 000000008867f918 ffff88042ca83af0
> [  253.682866]  ffff880429d8fe38 ffffffff81052c2f ffff88043ec113c0 000000003ec113c0
> [  253.682869]  ffff88043ec113c0 ffff88043ec113e8 ffff88042e330050 ffff88042c885eb0
> [  253.682870] Call Trace:
> [  253.682878]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
> [  253.682882]  [<ffffffff81053423>] worker_thread+0x63/0x540
> [  253.682886]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
> [  253.682889]  [<ffffffff81059038>] kthread+0xe8/0x100
> [  253.682893]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
> [  253.682897]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
> [  253.682900]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0

Alright, this supports my original hunch. I think I see the race. Could 
you please try the patch below? Thanks.




diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..743a517 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -208,7 +208,7 @@ unregister_red:
 
 static void thingm_remove_rgb(struct thingm_rgb *rgb)
 {
-	flush_work(&rgb->work);
+	cancel_work_sync(&rgb->work);
 	led_classdev_unregister(&rgb->red.ldev);
 	led_classdev_unregister(&rgb->green.ldev);
 	led_classdev_unregister(&rgb->blue.ldev);

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03  7:29       ` Jiri Kosina
@ 2014-09-03  8:17         ` Dylan Alex Simon
  2014-09-03  8:21           ` Jiri Kosina
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-03  8:17 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Benjamin Tissoires, linux-input

>From Jiri Kosina <jkosina@suse.cz>, Wed, Sep 03, 2014 at 09:29:07AM +0200:
> On Tue, 2 Sep 2014, Dylan Alex Simon wrote:
> 
> > Some combination of kernel debugging options and killing processes let it
> > survive long enough to write the backtrace to disk.  A simple modprobe/rmmod
> > wasn't enough, though, it required a few tries removing the device and then
> > rmmod (though has definitely happend on just one removal before).  Let me know
> > if there's anything else I can try.
> 
> Alright, this supports my original hunch. I think I see the race. Could 
> you please try the patch below? Thanks.
> 
> 
> 
> 
> diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> index 134be89..743a517 100644
> --- a/drivers/hid/hid-thingm.c
> +++ b/drivers/hid/hid-thingm.c
> @@ -208,7 +208,7 @@ unregister_red:
>  
>  static void thingm_remove_rgb(struct thingm_rgb *rgb)
>  {
> -	flush_work(&rgb->work);
> +	cancel_work_sync(&rgb->work);
>  	led_classdev_unregister(&rgb->red.ldev);
>  	led_classdev_unregister(&rgb->green.ldev);
>  	led_classdev_unregister(&rgb->blue.ldev);

Same problem (only rmmod this time, no write error, but still
doesn't happen every time):

[  213.180726] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
<rmmod>
[  217.399934] BUG: unable to handle kernel paging request at ffffffffa00aa0cf
[  217.400034] IP:
[  217.400038]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  217.400039] PGD 1814067 
[  217.400040] PUD 1815063 
[  217.400044] PMD 42c266067 PTE 0
[  217.400048] Oops: 0010 [#1] SMP 
[  217.400057] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
[  217.400061] CPU: 1 PID: 749 Comm: kworker/1:2 Not tainted 3.16.1-00001-g98fed6d-dirty #146
[  217.400063] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
[  217.400066] Workqueue: events 0xffffffffa00aa040
[  217.400068] task: ffff88042df88790 ti: ffff8800bb9e4000 task.ti: ffff8800bb9e4000
[  217.400072] RIP: 0010:[<ffffffffa00aa0cf>]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  217.400073] RSP: 0018:ffff8800bb9e7dd0  EFLAGS: 00010286
[  217.400075] RAX: 0000000000000009 RBX: ffff88042c285af0 RCX: 0000000000000302
[  217.400077] RDX: 0000000000000078 RSI: 0000000000000286 RDI: ffff88042d0e1a80
[  217.400078] RBP: ffff8800bb9e7df0 R08: ffff8804ad0e1a80 R09: 0000000000000282
[  217.400079] R10: 0000000000000001 R11: 000000002c95c8ba R12: ffff88042c8b0580
[  217.400081] R13: 0000000000000000 R14: ffff88043ec54e00 R15: ffff88043ec513c0
[  217.400087] FS:  00007f02217fc700(0000) GS:ffff88043ec40000(0000) knlGS:0000000000000000
[  217.400089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  217.400091] CR2: ffffffffa00aa0cf CR3: 000000042af90000 CR4: 00000000000407e0
[  217.400091] Stack:
[  217.400096]  010000000000a000 0001000000000063 000000000ba997f6 ffff88042c285af0
[  217.400099]  ffff8800bb9e7e38 ffffffff81052c2f ffff88043ec513c0 000000003ec513c0
[  217.400104]  ffff88043ec513c0 ffff88043ec513e8 ffff88042df88790 ffff88042c8b05b0
[  217.400105] Call Trace:
[  217.400114]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
[  217.400120]  [<ffffffff81053423>] worker_thread+0x63/0x540
[  217.400125]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
[  217.400130]  [<ffffffff81059038>] kthread+0xe8/0x100
[  217.400136]  [<ffffffff8152de24>] ? schedule+0x24/0x60
[  217.400144]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  217.400149]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
[  217.400153]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  217.400159] Code:  Bad RIP value.
[  217.400163] RIP  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  217.400164]  RSP <ffff8800bb9e7dd0>
[  217.400165] CR2: ffffffffa00aa0cf
[  217.400168] ---[ end trace 9bd9c9db3e942a93 ]---

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03  8:17         ` Dylan Alex Simon
@ 2014-09-03  8:21           ` Jiri Kosina
  2014-09-03 13:03             ` Dylan Alex Simon
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2014-09-03  8:21 UTC (permalink / raw)
  To: Dylan Alex Simon; +Cc: Benjamin Tissoires, linux-input

On Wed, 3 Sep 2014, Dylan Alex Simon wrote:

> > Alright, this supports my original hunch. I think I see the race. Could 
> > you please try the patch below? Thanks.
> > 
> > 
> > 
> > 
> > diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> > index 134be89..743a517 100644
> > --- a/drivers/hid/hid-thingm.c
> > +++ b/drivers/hid/hid-thingm.c
> > @@ -208,7 +208,7 @@ unregister_red:
> >  
> >  static void thingm_remove_rgb(struct thingm_rgb *rgb)
> >  {
> > -	flush_work(&rgb->work);
> > +	cancel_work_sync(&rgb->work);
> >  	led_classdev_unregister(&rgb->red.ldev);
> >  	led_classdev_unregister(&rgb->green.ldev);
> >  	led_classdev_unregister(&rgb->blue.ldev);
> 
> Same problem (only rmmod this time, no write error, but still
> doesn't happen every time):

I sent you wrong version of the patch, sorry for that. Could you please 
try the one below instead?


diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..dffc50d 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -208,7 +208,7 @@ unregister_red:
 
 static void thingm_remove_rgb(struct thingm_rgb *rgb)
 {
-	flush_work(&rgb->work);
+	cancel_work_sync(&rgb->work);
 	led_classdev_unregister(&rgb->red.ldev);
 	led_classdev_unregister(&rgb->green.ldev);
 	led_classdev_unregister(&rgb->blue.ldev);
@@ -286,10 +286,10 @@ static void thingm_remove(struct hid_device *hdev)
 	struct thingm_device *tdev = hid_get_drvdata(hdev);
 	int i;
 
+	hid_hw_stop(hdev);
+
 	for (i = 0; i < tdev->fwinfo->numrgb; ++i)
 		thingm_remove_rgb(tdev->rgb + i);
-
-	hid_hw_stop(hdev);
 }
 
 static const struct hid_device_id thingm_table[] = {

> 
-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03  8:21           ` Jiri Kosina
@ 2014-09-03 13:03             ` Dylan Alex Simon
  2014-09-03 13:10               ` Jiri Kosina
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-03 13:03 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Benjamin Tissoires, linux-input

> I sent you wrong version of the patch, sorry for that. Could you please 
> try the one below instead?
> 
> 
> diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> index 134be89..dffc50d 100644
> --- a/drivers/hid/hid-thingm.c
> +++ b/drivers/hid/hid-thingm.c
> @@ -208,7 +208,7 @@ unregister_red:
>  
>  static void thingm_remove_rgb(struct thingm_rgb *rgb)
>  {
> -	flush_work(&rgb->work);
> +	cancel_work_sync(&rgb->work);
>  	led_classdev_unregister(&rgb->red.ldev);
>  	led_classdev_unregister(&rgb->green.ldev);
>  	led_classdev_unregister(&rgb->blue.ldev);
> @@ -286,10 +286,10 @@ static void thingm_remove(struct hid_device *hdev)
>  	struct thingm_device *tdev = hid_get_drvdata(hdev);
>  	int i;
>  
> +	hid_hw_stop(hdev);
> +
>  	for (i = 0; i < tdev->fwinfo->numrgb; ++i)
>  		thingm_remove_rgb(tdev->rgb + i);
> -
> -	hid_hw_stop(hdev);
>  }
>  
>  static const struct hid_device_id thingm_table[] = {
> 

Still no luck (it really is using the new module, even though it didn't
rebuild the kernel).  I should also mention this is gcc 4.9.1.  I can rebuild
with 4.8.3 if there's a reason to suspect that, though I haven't seen any
other issues.

[  145.571979] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
[  149.151153] BUG: unable to handle kernel paging request at ffffffffa00aa0cf
[  149.151246] IP:
[  149.151251]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  149.151257] PGD 1814067 PUD 1815063 PMD 42ba9e067 PTE 0
[  149.151262] Oops: 0010 [#1] SMP 
[  149.151273] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
[  149.151279] CPU: 7 PID: 664 Comm: kworker/7:2 Not tainted 3.16.1-00001-g98fed6d-dirty #146
[  149.151280] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
[  149.151286] Workqueue: events 0xffffffffa00aa040
[  149.151288] task: ffff88042de12010 ti: ffff8800bb96c000 task.ti: ffff8800bb96c000
[  149.151294] RIP: 0010:[<ffffffffa00aa0cf>]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  149.151296] RSP: 0018:ffff8800bb96fdd0  EFLAGS: 00010286
[  149.151298] RAX: 0000000000000009 RBX: ffff88042ba612f0 RCX: 0000000000000302
[  149.151300] RDX: 0000000000000049 RSI: 0000000000000286 RDI: ffff88042aa95d80
[  149.151302] RBP: ffff8800bb96fdf0 R08: ffff8804aaa95d80 R09: 0000000000000282
[  149.151303] R10: 000000716d515080 R11: 0000000000000026 R12: ffff88042d189940
[  149.151305] R13: 0000000000000000 R14: ffff88043edd4e00 R15: ffff88043edd13c0
[  149.151308] FS:  00007f383d7a3700(0000) GS:ffff88043edc0000(0000) knlGS:0000000000000000
[  149.151311] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  149.151315] CR2: ffffffffa00aa0cf CR3: 0000000001813000 CR4: 00000000000407e0
[  149.151316] Stack:
[  149.151321]  010000000000a000 0001000000000063 00000000f7ea6138 ffff88042ba612f0
[  149.151325]  ffff8800bb96fe38 ffffffff81052c2f ffff88043edd13c0 000000003edd13c0
[  149.151329]  ffff88043edd13c0 ffff88043edd13e8 ffff88042de12010 ffff88042d189970
[  149.151330] Call Trace:
[  149.151341]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
[  149.151349]  [<ffffffff81053423>] worker_thread+0x63/0x540
[  149.151354]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
[  149.151359]  [<ffffffff81059038>] kthread+0xe8/0x100
[  149.151364]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  149.151370]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
[  149.151374]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
[  149.151382] Code:  Bad RIP value.
[  149.151386] RIP  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
[  149.151387]  RSP <ffff8800bb96fdd0>
[  149.151389] CR2: ffffffffa00aa0cf
[  149.151392] ---[ end trace 0d9114b619b5b2dc ]---
[  149.151442] BUG: unable to handle kernel paging request at ffffffffffffffc8

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03 13:03             ` Dylan Alex Simon
@ 2014-09-03 13:10               ` Jiri Kosina
  2014-09-03 14:16                 ` Dylan Alex Simon
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2014-09-03 13:10 UTC (permalink / raw)
  To: Dylan Alex Simon; +Cc: Benjamin Tissoires, linux-input

On Wed, 3 Sep 2014, Dylan Alex Simon wrote:

> Still no luck (it really is using the new module, even though it didn't
> rebuild the kernel).  I should also mention this is gcc 4.9.1.  I can rebuild
> with 4.8.3 if there's a reason to suspect that, though I haven't seen any
> other issues.
> 
> [  145.571979] thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
> [  149.151153] BUG: unable to handle kernel paging request at ffffffffa00aa0cf
> [  149.151246] IP:
> [  149.151251]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
> [  149.151257] PGD 1814067 PUD 1815063 PMD 42ba9e067 PTE 0
> [  149.151262] Oops: 0010 [#1] SMP 
> [  149.151273] Modules linked in: led_class cuse fuse snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device snd_pcm snd_timer ipt_ULOG [last unloaded: hid_thingm]
> [  149.151279] CPU: 7 PID: 664 Comm: kworker/7:2 Not tainted 3.16.1-00001-g98fed6d-dirty #146
> [  149.151280] Hardware name: empty empty/S8010-LE, BIOS 'V2.03B   ' 03/15/2012
> [  149.151286] Workqueue: events 0xffffffffa00aa040
> [  149.151288] task: ffff88042de12010 ti: ffff8800bb96c000 task.ti: ffff8800bb96c000
> [  149.151294] RIP: 0010:[<ffffffffa00aa0cf>]  [<ffffffffa00aa0cf>] 0xffffffffa00aa0cf
> [  149.151296] RSP: 0018:ffff8800bb96fdd0  EFLAGS: 00010286
> [  149.151298] RAX: 0000000000000009 RBX: ffff88042ba612f0 RCX: 0000000000000302
> [  149.151300] RDX: 0000000000000049 RSI: 0000000000000286 RDI: ffff88042aa95d80
> [  149.151302] RBP: ffff8800bb96fdf0 R08: ffff8804aaa95d80 R09: 0000000000000282
> [  149.151303] R10: 000000716d515080 R11: 0000000000000026 R12: ffff88042d189940
> [  149.151305] R13: 0000000000000000 R14: ffff88043edd4e00 R15: ffff88043edd13c0
> [  149.151308] FS:  00007f383d7a3700(0000) GS:ffff88043edc0000(0000) knlGS:0000000000000000
> [  149.151311] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  149.151315] CR2: ffffffffa00aa0cf CR3: 0000000001813000 CR4: 00000000000407e0
> [  149.151316] Stack:
> [  149.151321]  010000000000a000 0001000000000063 00000000f7ea6138 ffff88042ba612f0
> [  149.151325]  ffff8800bb96fe38 ffffffff81052c2f ffff88043edd13c0 000000003edd13c0
> [  149.151329]  ffff88043edd13c0 ffff88043edd13e8 ffff88042de12010 ffff88042d189970
> [  149.151330] Call Trace:
> [  149.151341]  [<ffffffff81052c2f>] process_one_work+0x14f/0x400
> [  149.151349]  [<ffffffff81053423>] worker_thread+0x63/0x540
> [  149.151354]  [<ffffffff810533c0>] ? create_and_start_worker+0x60/0x60
> [  149.151359]  [<ffffffff81059038>] kthread+0xe8/0x100
> [  149.151364]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0
> [  149.151370]  [<ffffffff815323ec>] ret_from_fork+0x7c/0xb0
> [  149.151374]  [<ffffffff81058f50>] ? kthread_create_on_node+0x1b0/0x1b0

Hrm, what we really want is to first destroy the LED interface, otherwise 
someone might have re-queued the workqueue through thingm_led_set() after 
it has been flushed / cancelled already.

Could you please give it one more shot with the patch below? Thanks for 
your prompt testing.


diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..3d8cf5e 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -208,10 +208,10 @@ unregister_red:
 
 static void thingm_remove_rgb(struct thingm_rgb *rgb)
 {
-	flush_work(&rgb->work);
 	led_classdev_unregister(&rgb->red.ldev);
 	led_classdev_unregister(&rgb->green.ldev);
 	led_classdev_unregister(&rgb->blue.ldev);
+	cancel_work_sync(&rgb->work);
 }
 
 static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
@@ -286,10 +286,10 @@ static void thingm_remove(struct hid_device *hdev)
 	struct thingm_device *tdev = hid_get_drvdata(hdev);
 	int i;
 
+	hid_hw_stop(hdev);
+
 	for (i = 0; i < tdev->fwinfo->numrgb; ++i)
 		thingm_remove_rgb(tdev->rgb + i);
-
-	hid_hw_stop(hdev);
 }
 
 static const struct hid_device_id thingm_table[] = {

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03 13:10               ` Jiri Kosina
@ 2014-09-03 14:16                 ` Dylan Alex Simon
  2014-09-03 14:37                   ` Jiri Kosina
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-03 14:16 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Benjamin Tissoires, linux-input

> Hrm, what we really want is to first destroy the LED interface, otherwise 
> someone might have re-queued the workqueue through thingm_led_set() after 
> it has been flushed / cancelled already.
> 
> Could you please give it one more shot with the patch below? Thanks for 
> your prompt testing.
> 
> 
> diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> index 134be89..3d8cf5e 100644
> --- a/drivers/hid/hid-thingm.c
> +++ b/drivers/hid/hid-thingm.c
> @@ -208,10 +208,10 @@ unregister_red:
>  
>  static void thingm_remove_rgb(struct thingm_rgb *rgb)
>  {
> -	flush_work(&rgb->work);
>  	led_classdev_unregister(&rgb->red.ldev);
>  	led_classdev_unregister(&rgb->green.ldev);
>  	led_classdev_unregister(&rgb->blue.ldev);
> +	cancel_work_sync(&rgb->work);
>  }
>  
>  static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
> @@ -286,10 +286,10 @@ static void thingm_remove(struct hid_device *hdev)
>  	struct thingm_device *tdev = hid_get_drvdata(hdev);
>  	int i;
>  
> +	hid_hw_stop(hdev);
> +
>  	for (i = 0; i < tdev->fwinfo->numrgb; ++i)
>  		thingm_remove_rgb(tdev->rgb + i);
> -
> -	hid_hw_stop(hdev);
>  }
>  
>  static const struct hid_device_id thingm_table[] = {
> 

I think that did the trick.  I've run through the whole cycle about 20 times
in various conditions, and it seems solid so far.  Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03 14:16                 ` Dylan Alex Simon
@ 2014-09-03 14:37                   ` Jiri Kosina
  2014-09-04  1:05                     ` Dylan Alex Simon
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2014-09-03 14:37 UTC (permalink / raw)
  To: Dylan Alex Simon; +Cc: Benjamin Tissoires, linux-input

On Wed, 3 Sep 2014, Dylan Alex Simon wrote:

> I think that did the trick.  I've run through the whole cycle about 20 times
> in various conditions, and it seems solid so far.  Thanks!

Thanks. I actually think that the minimal necessary fix is below. Could 
you please do a (hopefully last) round of testing with just this patch 
applied?

diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..c1d21fa 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -208,10 +208,10 @@ unregister_red:
 
 static void thingm_remove_rgb(struct thingm_rgb *rgb)
 {
-	flush_work(&rgb->work);
 	led_classdev_unregister(&rgb->red.ldev);
 	led_classdev_unregister(&rgb->green.ldev);
 	led_classdev_unregister(&rgb->blue.ldev);
+	flush_work(&rgb->work);
 }
 
 static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-02 19:58 ` Benjamin Tissoires
  2014-09-02 20:12   ` Dylan Alex Simon
  2014-09-02 21:32   ` Jiri Kosina
@ 2014-09-03 19:48   ` Benjamin Tissoires
  2014-09-03 21:47     ` Jiri Kosina
  2 siblings, 1 reply; 16+ messages in thread
From: Benjamin Tissoires @ 2014-09-03 19:48 UTC (permalink / raw)
  To: Dylan Alex Simon, linux-input, Jiri Kosina

On Tue, Sep 2, 2014 at 3:58 PM, Benjamin Tissoires
<benjamin.tissoires@gmail.com> wrote:
> Hi,
>
> On 09/02/2014 01:46 PM, Dylan Alex Simon wrote:
>> Whenever either disconnecting the USB device or simply rmmod'ing the module
>> (even when not in use), I get a kernel panic.  I haven't managed to capture a
>> backtrace, but at least the first two lines were saved after an rmmod:
>>
>> 18:53:17 kernel: thingm 0003:27B8:01ED.0004: hidraw3: USB HID v1.01 Device [ThingM blink(1) mk2] on usb-0000:00:12.2-3.1.4/input0
>> <snip, rmmod hid-thingm:>
>> 08:38:42 kernel: BUG: unable to handle kernel paging request at fffffffb8a80aaf8
>> 08:38:42 kernel: IP: [<ffffffff8106e30c>] osq_lock+0x3c/0x110
>>
>> Let me know if you'd like me to try to capture more info, but this problem
>> seems very reproducible (at least with a mk2 device; I never had the problem on
>> older kernels with a mk1).  I do direct write()s to the hidraw device, but
>> don't otherwise use the driver while it's loaded.
>>
>> Also at https://bugzilla.kernel.org/show_bug.cgi?id=83751
>
> Do you happen to see a "unsupported firmware " error when plugging your device?
>
> If so, then the following patch should help with the panic (but you will an other one to be able to use again your device).
>
> ---
>
> From 791297375227b91990b37f94fc9de93156a3c21a Mon Sep 17 00:00:00 2001
> From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> Date: Tue, 2 Sep 2014 15:50:43 -0400
> Subject: [PATCH] HID: thingm: set the proper error code before leaving
>
> In case of an unsupported firmware, the driver bails out without setting
> the LEDs interfaces, but forget to set the proper error code.
> err is then still equal to 0 and the hid subsytem consider the device
> to be in perfect shape.
> When removing it, thingm_remove() tries to unbind the rgb LEDs which
> has not been created, leading to a segfault.
>
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> ---
>  drivers/hid/hid-thingm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> index 134be89..f91f971 100644
> --- a/drivers/hid/hid-thingm.c
> +++ b/drivers/hid/hid-thingm.c
> @@ -250,6 +250,7 @@ static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
>
>         if (!tdev->fwinfo) {
>                 hid_err(hdev, "unsupported firmware %c\n", tdev->version.major);
> +               err = -ENODEV;
>                 goto stop;
>         }
>
> --
> 2.1.0
>

Jiri, could you still consider including this one? If Thingm decides
to create a firmware with 3.X, it will fail, so I guess this is still
required, even if it does not fix Dylan's problem.

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03 19:48   ` hid-thingm: kernel panic " Benjamin Tissoires
@ 2014-09-03 21:47     ` Jiri Kosina
  0 siblings, 0 replies; 16+ messages in thread
From: Jiri Kosina @ 2014-09-03 21:47 UTC (permalink / raw)
  To: Benjamin Tissoires; +Cc: Dylan Alex Simon, linux-input

On Wed, 3 Sep 2014, Benjamin Tissoires wrote:

> > From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > Date: Tue, 2 Sep 2014 15:50:43 -0400
> > Subject: [PATCH] HID: thingm: set the proper error code before leaving
> >
> > In case of an unsupported firmware, the driver bails out without setting
> > the LEDs interfaces, but forget to set the proper error code.
> > err is then still equal to 0 and the hid subsytem consider the device
> > to be in perfect shape.
> > When removing it, thingm_remove() tries to unbind the rgb LEDs which
> > has not been created, leading to a segfault.
> >
> > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > ---
> >  drivers/hid/hid-thingm.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> > index 134be89..f91f971 100644
> > --- a/drivers/hid/hid-thingm.c
> > +++ b/drivers/hid/hid-thingm.c
> > @@ -250,6 +250,7 @@ static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
> >
> >         if (!tdev->fwinfo) {
> >                 hid_err(hdev, "unsupported firmware %c\n", tdev->version.major);
> > +               err = -ENODEV;
> >                 goto stop;
> >         }
> 
> Jiri, could you still consider including this one? If Thingm decides
> to create a firmware with 3.X, it will fail, so I guess this is still
> required, even if it does not fix Dylan's problem.

It's now queued in for-3.17/upstream-fixes. I don't think it justifies 
pull request to Linus by itself, but if there is anything else that does, 
it'll piggy-back. Otherwise it'll go into 3.18. Please shout if it's not 
okay.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: hid-thingm: kernel panic on remove
  2014-09-03 14:37                   ` Jiri Kosina
@ 2014-09-04  1:05                     ` Dylan Alex Simon
  2014-09-04  7:02                       ` [PATCH] HID: thingm: fix workqueue race " Jiri Kosina
  0 siblings, 1 reply; 16+ messages in thread
From: Dylan Alex Simon @ 2014-09-04  1:05 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Benjamin Tissoires, linux-input

>From Jiri Kosina <jkosina@suse.cz>, Wed, Sep 03, 2014 at 04:37:05PM +0200:
> On Wed, 3 Sep 2014, Dylan Alex Simon wrote:
> 
> > I think that did the trick.  I've run through the whole cycle about 20 times
> > in various conditions, and it seems solid so far.  Thanks!
> 
> Thanks. I actually think that the minimal necessary fix is below. Could 
> you please do a (hopefully last) round of testing with just this patch 
> applied?
> 
> diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
> index 134be89..c1d21fa 100644
> --- a/drivers/hid/hid-thingm.c
> +++ b/drivers/hid/hid-thingm.c
> @@ -208,10 +208,10 @@ unregister_red:
>  
>  static void thingm_remove_rgb(struct thingm_rgb *rgb)
>  {
> -	flush_work(&rgb->work);
>  	led_classdev_unregister(&rgb->red.ldev);
>  	led_classdev_unregister(&rgb->green.ldev);
>  	led_classdev_unregister(&rgb->blue.ldev);
> +	flush_work(&rgb->work);
>  }
>  
>  static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
> 

(Sorry, didn't get a chance back at the machine until now.)  This seems to be
working fine, as well.  I now do get the "failed to write color" messages
again when unplugging, which didn't show up with the other patch, but no
panics.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] HID: thingm: fix workqueue race on remove
  2014-09-04  1:05                     ` Dylan Alex Simon
@ 2014-09-04  7:02                       ` Jiri Kosina
  0 siblings, 0 replies; 16+ messages in thread
From: Jiri Kosina @ 2014-09-04  7:02 UTC (permalink / raw)
  To: Dylan Alex Simon
  Cc: Benjamin Tissoires, Vivien Didelot, linux-input, linux-kernel

thingm_remove_rgb() needs to flush the workqueue after all the LED classes 
have been unregistered, otherwise the removal might race with another LED 
event coming, causing thingm_led_set() to schedule additional work after 
thingm_remove_rgb() has flushed it. This obviously causes oops later, as 
the scheduled work has been freed in the meantime.

In addition to that, move the hid_hw_stop() to an earlier place, so that 
dmesg is not polluted by failure messages about not being able to write 
the LED while the device is being shut down.

Reported-and-tested-by: Dylan Alex Simon <dylan-kernel@dylex.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

Queued in hid.git#for-3.18/upstream

 drivers/hid/hid-thingm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/hid/hid-thingm.c b/drivers/hid/hid-thingm.c
index 134be89..f206398 100644
--- a/drivers/hid/hid-thingm.c
+++ b/drivers/hid/hid-thingm.c
@@ -208,10 +208,10 @@ unregister_red:
 
 static void thingm_remove_rgb(struct thingm_rgb *rgb)
 {
-	flush_work(&rgb->work);
 	led_classdev_unregister(&rgb->red.ldev);
 	led_classdev_unregister(&rgb->green.ldev);
 	led_classdev_unregister(&rgb->blue.ldev);
+	flush_work(&rgb->work);
 }
 
 static int thingm_probe(struct hid_device *hdev, const struct hid_device_id *id)
@@ -286,10 +286,10 @@ static void thingm_remove(struct hid_device *hdev)
 	struct thingm_device *tdev = hid_get_drvdata(hdev);
 	int i;
 
+	hid_hw_stop(hdev);
+
 	for (i = 0; i < tdev->fwinfo->numrgb; ++i)
 		thingm_remove_rgb(tdev->rgb + i);
-
-	hid_hw_stop(hdev);
 }
 
 static const struct hid_device_id thingm_table[] = {

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-09-04  7:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 17:46 hid-thingm: kernel panic on remove Dylan Alex Simon
2014-09-02 19:58 ` Benjamin Tissoires
2014-09-02 20:12   ` Dylan Alex Simon
2014-09-02 21:32   ` Jiri Kosina
2014-09-03  2:35     ` Dylan Alex Simon
2014-09-03  7:29       ` Jiri Kosina
2014-09-03  8:17         ` Dylan Alex Simon
2014-09-03  8:21           ` Jiri Kosina
2014-09-03 13:03             ` Dylan Alex Simon
2014-09-03 13:10               ` Jiri Kosina
2014-09-03 14:16                 ` Dylan Alex Simon
2014-09-03 14:37                   ` Jiri Kosina
2014-09-04  1:05                     ` Dylan Alex Simon
2014-09-04  7:02                       ` [PATCH] HID: thingm: fix workqueue race " Jiri Kosina
2014-09-03 19:48   ` hid-thingm: kernel panic " Benjamin Tissoires
2014-09-03 21:47     ` Jiri Kosina

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.