All of lore.kernel.org
 help / color / mirror / Atom feed
* hid-related 5.2-rc1 boot hang
@ 2019-05-28 16:43 Dave Hansen
  2019-05-28 17:14 ` Jiri Kosina
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2019-05-28 16:43 UTC (permalink / raw)
  To: Jiri Kosina, Hans de Goede, Benjamin Tissoires,
	open list:HID CORE LAYER, LKML

I have a system that works fine on 5.1.  When updating to 5.2-rc1, it
hangs at boot waiting on an instance of systemd-udevd.  The kernel
backtrace (https://photos.app.goo.gl/EV8rf7FofWouvdeE8) looks like it's
doing an finit_module() that dives into the hid code and is waiting on a
request_module().

This hang only occurs if I have a particular USB device inserted:

> Bus 001 Device 007: ID 046d:c52b Logitech, Inc. Unifying Receiver

Bisecting the issue points at this (unlikely to be the culprit) commit:

> [161f62cd07fde123fd52bf6d5b6fd6513cca968e] HID: macally: Add support for Macally ikey keyboard

This bisect result is probably just a bisect artifact.  The first real,
bad commit is a merge commit: 63b6f0b827d.  This commit merges a bunch
of stuff, but includes changes to the hid request_module() code and to
the logitech-hidpp which is the driver for the above device.

I also have a picture of the hang which includes __request_module()
dumping out the string it is passed:

	https://photos.app.goo.gl/tUETiCBZHJfKqWPu8

This is easy enough to work around, and the system works fine if I just
unplug the Logitech device and plug it in after boot.  But, it would be
nice to figure out what's going wrong.  I guess it could easily be some
interaction between systemd, the driver and the request_module() ordering.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-28 16:43 hid-related 5.2-rc1 boot hang Dave Hansen
@ 2019-05-28 17:14 ` Jiri Kosina
  2019-05-28 17:45   ` Benjamin Tissoires
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Kosina @ 2019-05-28 17:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Hans de Goede, Benjamin Tissoires, open list:HID CORE LAYER, LKML

On Tue, 28 May 2019, Dave Hansen wrote:

> I have a system that works fine on 5.1.  When updating to 5.2-rc1, it
> hangs at boot waiting on an instance of systemd-udevd.  The kernel
> backtrace (https://photos.app.goo.gl/EV8rf7FofWouvdeE8) looks like it's
> doing an finit_module() that dives into the hid code and is waiting on a
> request_module().

Dave,

thanks for the report.

Just to confirm -- I guess reverting 4ceabaf79 and a025a18fe would work 
this around, right?

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-28 17:14 ` Jiri Kosina
@ 2019-05-28 17:45   ` Benjamin Tissoires
  2019-05-28 18:11     ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Tissoires @ 2019-05-28 17:45 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Dave Hansen, Hans de Goede, open list:HID CORE LAYER, LKML

On Tue, May 28, 2019 at 7:15 PM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Tue, 28 May 2019, Dave Hansen wrote:
>
> > I have a system that works fine on 5.1.  When updating to 5.2-rc1, it
> > hangs at boot waiting on an instance of systemd-udevd.  The kernel
> > backtrace (https://photos.app.goo.gl/EV8rf7FofWouvdeE8) looks like it's
> > doing an finit_module() that dives into the hid code and is waiting on a
> > request_module().
>
> Dave,
>
> thanks for the report.
>
> Just to confirm -- I guess reverting 4ceabaf79 and a025a18fe would work
> this around, right?
>

It would be also interesting to know which distribution and which
systemd version you are running (if you are on systemd).

This rings a bell as I recall  playing with
request_module/request_firmware a while ago, but Hans convinced me
that no such delay would be induced.

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-28 17:45   ` Benjamin Tissoires
@ 2019-05-28 18:11     ` Dave Hansen
  2019-05-29  9:17       ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2019-05-28 18:11 UTC (permalink / raw)
  To: Benjamin Tissoires, Jiri Kosina
  Cc: Hans de Goede, open list:HID CORE LAYER, LKML

On 5/28/19 10:45 AM, Benjamin Tissoires wrote:
> On Tue, May 28, 2019 at 7:15 PM Jiri Kosina <jikos@kernel.org> wrote:
>> Just to confirm -- I guess reverting 4ceabaf79 and a025a18fe would work
>> this around, right?

Yes, reverting that pair on top of 5.2-rc1 works around the issue.

> It would be also interesting to know which distribution and which
> systemd version you are running (if you are on systemd).

$ systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS
+KMOD -IDN

on "Ubuntu 16.04.6"


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-28 18:11     ` Dave Hansen
@ 2019-05-29  9:17       ` Hans de Goede
  2019-05-30 16:56         ` Dave Hansen
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-05-29  9:17 UTC (permalink / raw)
  To: Dave Hansen, Benjamin Tissoires, Jiri Kosina
  Cc: open list:HID CORE LAYER, LKML

Hi,

On 5/28/19 8:11 PM, Dave Hansen wrote:
> On 5/28/19 10:45 AM, Benjamin Tissoires wrote:
>> On Tue, May 28, 2019 at 7:15 PM Jiri Kosina <jikos@kernel.org> wrote:
>>> Just to confirm -- I guess reverting 4ceabaf79 and a025a18fe would work
>>> this around, right?
> 
> Yes, reverting that pair on top of 5.2-rc1 works around the issue.

Thank you for catching this and for testing the reverts. We've several bug
reports which I suspect are related to this.

/sbin/modprobe really should not hang when called by the kernel, as the
kernel does this in several other places too.

At the same time this clearly is a regression, so I'm afraid we will need
to revert the 2 commits.

Benjamin, Jiri, I really like the improvements these 2 commits bring
combined with Benjamin's changes removing the need for all the device specific
drivers to have  HID_QUIRK_HAVE_SPECIAL_DRIVER quirk.

Maybe instead of reverting them, we wrap them in a Kconfig option, which
defaults to N, with a warning that this requires an userspace where
/sbin/modprobe does not hang ?  It would be useful for the Kconfig
help text if we knew why it hangs. I guess this may have something to do
with it running from the initrd? Maybe this is not the real modprobe but
busybox's modprobe?

Dave, can you try building your initrd without the hid-logitech-dj module
included in the initrd?


Also can you check if your modprobe is provided by module-init-tools
or by kmod ?

I believe we really need more information before we can properly decide
how to deal with this. Luckily we still have same time.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-29  9:17       ` Hans de Goede
@ 2019-05-30 16:56         ` Dave Hansen
  2019-05-31 22:15           ` Jiri Kosina
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2019-05-30 16:56 UTC (permalink / raw)
  To: Hans de Goede, Benjamin Tissoires, Jiri Kosina
  Cc: open list:HID CORE LAYER, LKML

On 5/29/19 2:17 AM, Hans de Goede wrote:
...
> Dave, can you try building your initrd without the hid-logitech-dj module
> included in the initrd?

I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still
experienced the boot hang while the device was inserted.

> Also can you check if your modprobe is provided by module-init-tools
> or by kmod ?

$ dpkg -S `which modprobe`
kmod: /sbin/modprobe




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-30 16:56         ` Dave Hansen
@ 2019-05-31 22:15           ` Jiri Kosina
  2019-06-03  9:11             ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Kosina @ 2019-05-31 22:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Hans de Goede, Benjamin Tissoires, open list:HID CORE LAYER, LKML

On Thu, 30 May 2019, Dave Hansen wrote:

> On 5/29/19 2:17 AM, Hans de Goede wrote:
> ...
> > Dave, can you try building your initrd without the hid-logitech-dj module
> > included in the initrd?
> 
> I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still
> experienced the boot hang while the device was inserted.
> 
> > Also can you check if your modprobe is provided by module-init-tools
> > or by kmod ?
> 
> $ dpkg -S `which modprobe`
> kmod: /sbin/modprobe

Benjamin, Hans, are you looking into this?

If not, I think we should start reverting (at least the request_module() 
changes, not sure about the rest of logitech issues yet) next week.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-05-31 22:15           ` Jiri Kosina
@ 2019-06-03  9:11             ` Hans de Goede
  2019-06-03  9:32               ` Hans de Goede
  2019-06-03  9:51               ` Hans de Goede
  0 siblings, 2 replies; 22+ messages in thread
From: Hans de Goede @ 2019-06-03  9:11 UTC (permalink / raw)
  To: Jiri Kosina, Dave Hansen
  Cc: Benjamin Tissoires, open list:HID CORE LAYER, LKML

Hi,

On 01-06-19 00:15, Jiri Kosina wrote:
> On Thu, 30 May 2019, Dave Hansen wrote:
> 
>> On 5/29/19 2:17 AM, Hans de Goede wrote:
>> ...
>>> Dave, can you try building your initrd without the hid-logitech-dj module
>>> included in the initrd?
>>
>> I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still
>> experienced the boot hang while the device was inserted.
>>
>>> Also can you check if your modprobe is provided by module-init-tools
>>> or by kmod ?
>>
>> $ dpkg -S `which modprobe`
>> kmod: /sbin/modprobe
> 
> Benjamin, Hans, are you looking into this?

Not really, I cannot reproduce the request_module problem. I was hoping some
of the info from Dave would help to pinpoint it, but it does not :|

> If not, I think we should start reverting (at least the request_module()
> changes

I agree we need to do something about the request_module changes.

I myself was thinking about somehow making them conditional, e.g. we
could add a (temporary) module option defaulting to false for this
while we investigate further.

I'm afraid that if we just revert we will never find the root cause and then
we will be stuck with the suboptimal behavior of first the generic hid driver
binding followed by a unbind + bind of the new driver shortly afterwards,
which also leads to a ton of udev events being fired to userspace (well I
guess this does make for a good stress test of the userspace hotplug code).

> not sure about the rest of logitech issues yet) next week.

The main problem seems to be the request_module patches. Although I also
have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
so we may need to drop that product-id from hid-logitech-dj, I'm working on
that one...

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-03  9:11             ` Hans de Goede
@ 2019-06-03  9:32               ` Hans de Goede
  2019-06-03  9:51               ` Hans de Goede
  1 sibling, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2019-06-03  9:32 UTC (permalink / raw)
  To: Jiri Kosina, Dave Hansen
  Cc: Benjamin Tissoires, open list:HID CORE LAYER, LKML

Hi,

On 03-06-19 11:11, Hans de Goede wrote:
> Hi,
> 
> On 01-06-19 00:15, Jiri Kosina wrote:
>> On Thu, 30 May 2019, Dave Hansen wrote:
>>
>>> On 5/29/19 2:17 AM, Hans de Goede wrote:
>>> ...
>>>> Dave, can you try building your initrd without the hid-logitech-dj module
>>>> included in the initrd?
>>>
>>> I did this on a vanilla 5.2-rc2 kernel (without the reverts) and still
>>> experienced the boot hang while the device was inserted.
>>>
>>>> Also can you check if your modprobe is provided by module-init-tools
>>>> or by kmod ?
>>>
>>> $ dpkg -S `which modprobe`
>>> kmod: /sbin/modprobe
>>
>> Benjamin, Hans, are you looking into this?
> 
> Not really, I cannot reproduce the request_module problem. I was hoping some
> of the info from Dave would help to pinpoint it, but it does not :|
> 
>> If not, I think we should start reverting (at least the request_module()
>> changes
> 
> I agree we need to do something about the request_module changes.
> 
> I myself was thinking about somehow making them conditional, e.g. we
> could add a (temporary) module option defaulting to false for this
> while we investigate further.
> 
> I'm afraid that if we just revert we will never find the root cause and then
> we will be stuck with the suboptimal behavior of first the generic hid driver
> binding followed by a unbind + bind of the new driver shortly afterwards,
> which also leads to a ton of udev events being fired to userspace (well I
> guess this does make for a good stress test of the userspace hotplug code).

Quick update, we have another report of module-loading related problems
which are likely related:

https://bugzilla.kernel.org/show_bug.cgi?id=203741

In this case there is no hang, instead there is a 1 to 3 minute delay.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-03  9:11             ` Hans de Goede
  2019-06-03  9:32               ` Hans de Goede
@ 2019-06-03  9:51               ` Hans de Goede
  2019-06-03 13:55                 ` Benjamin Tissoires
  1 sibling, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-06-03  9:51 UTC (permalink / raw)
  To: Jiri Kosina, Dave Hansen
  Cc: Benjamin Tissoires, open list:HID CORE LAYER, LKML

Hi Again,

On 03-06-19 11:11, Hans de Goede wrote:
<snip>

>> not sure about the rest of logitech issues yet) next week.
> 
> The main problem seems to be the request_module patches. Although I also
> have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
> so we may need to drop that product-id from hid-logitech-dj, I'm working on
> that one...

Besides the modprobe hanging issue, the only other issues all
(2 reporters) seem to be with 0xc52f receivers. We have a bug
open for this:

https://bugzilla.kernel.org/show_bug.cgi?id=203619

And I've asked the reporter of the second bug to add his logs
to that bug.

Regards,

Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-03  9:51               ` Hans de Goede
@ 2019-06-03 13:55                 ` Benjamin Tissoires
  2019-06-03 14:17                   ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-03 13:55 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Again,
>
> On 03-06-19 11:11, Hans de Goede wrote:
> <snip>
>
> >> not sure about the rest of logitech issues yet) next week.
> >
> > The main problem seems to be the request_module patches. Although I also

Can't we use request_module_nowait() instead, and set a reasonable
timeout that we detect only once to check if userspace is compatible:

In pseudo-code:
if (!request_module_checked) {
  request_module_nowait(name);
  use_request_module = wait_event_timeout(wq,
        first_module_loaded, 10 seconds in jiffies);
  request_module_checked = true;
} else if (use_request_module) {
  request_module(name);
}


> > have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
> > so we may need to drop that product-id from hid-logitech-dj, I'm working on
> > that one...
>
> Besides the modprobe hanging issue, the only other issues all
> (2 reporters) seem to be with 0xc52f receivers. We have a bug
> open for this:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=203619
>
> And I've asked the reporter of the second bug to add his logs
> to that bug.

We should likely just remove c52f from the list of supported devices.
C52f receivers seem to have a different firmware as they are meant to
work with different devices than C534. So I guess it is safer to not
handle those right now and get the code in when it is ready.

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-03 13:55                 ` Benjamin Tissoires
@ 2019-06-03 14:17                   ` Hans de Goede
  2019-06-04  7:51                     ` Benjamin Tissoires
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-06-03 14:17 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

Hi,

On 03-06-19 15:55, Benjamin Tissoires wrote:
> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi Again,
>>
>> On 03-06-19 11:11, Hans de Goede wrote:
>> <snip>
>>
>>>> not sure about the rest of logitech issues yet) next week.
>>>
>>> The main problem seems to be the request_module patches. Although I also
> 
> Can't we use request_module_nowait() instead, and set a reasonable
> timeout that we detect only once to check if userspace is compatible:
> 
> In pseudo-code:
> if (!request_module_checked) {
>    request_module_nowait(name);
>    use_request_module = wait_event_timeout(wq,
>          first_module_loaded, 10 seconds in jiffies);
>    request_module_checked = true;
> } else if (use_request_module) {
>    request_module(name);
> }

Well looking at the just attached dmesg , the modprobe
when triggered by udev from userspace succeeds in about
0.5 seconds, so it seems that the modprobe hangs happens
when called from within the kernel rather then from within
userspace.

What I do not know if is the hang is inside userspace, or
maybe it happens when modprobe calls back into the kernel,
if the hang happens when modprobe calls back into the kernel,
then other modprobes (done from udev) likely will hang too
since I think only 1 modprobe can happen at a time.

I really wish we knew what distinguished working systems
from non working systems :|

I cannot find a common denominator; other then the systems
are not running Fedora. So far we've reports from both Ubuntu 16.04
and Tumbleweed, so software version wise these 2 are wide apart.

>>> have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
>>> so we may need to drop that product-id from hid-logitech-dj, I'm working on
>>> that one...
>>
>> Besides the modprobe hanging issue, the only other issues all
>> (2 reporters) seem to be with 0xc52f receivers. We have a bug
>> open for this:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=203619
>>
>> And I've asked the reporter of the second bug to add his logs
>> to that bug.
> 
> We should likely just remove c52f from the list of supported devices.
> C52f receivers seem to have a different firmware as they are meant to
> work with different devices than C534. So I guess it is safer to not
> handle those right now and get the code in when it is ready.

Ack. Can you prepare a patch to drop the c52f id?

Regards,

Hans



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-03 14:17                   ` Hans de Goede
@ 2019-06-04  7:51                     ` Benjamin Tissoires
  2019-06-04  8:05                       ` Hans de Goede
  2019-06-04 10:08                       ` Benjamin Tissoires
  0 siblings, 2 replies; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-04  7:51 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 03-06-19 15:55, Benjamin Tissoires wrote:
> > On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Hi Again,
> >>
> >> On 03-06-19 11:11, Hans de Goede wrote:
> >> <snip>
> >>
> >>>> not sure about the rest of logitech issues yet) next week.
> >>>
> >>> The main problem seems to be the request_module patches. Although I also
> >
> > Can't we use request_module_nowait() instead, and set a reasonable
> > timeout that we detect only once to check if userspace is compatible:
> >
> > In pseudo-code:
> > if (!request_module_checked) {
> >    request_module_nowait(name);
> >    use_request_module = wait_event_timeout(wq,
> >          first_module_loaded, 10 seconds in jiffies);
> >    request_module_checked = true;
> > } else if (use_request_module) {
> >    request_module(name);
> > }
>
> Well looking at the just attached dmesg , the modprobe
> when triggered by udev from userspace succeeds in about
> 0.5 seconds, so it seems that the modprobe hangs happens
> when called from within the kernel rather then from within
> userspace.
>
> What I do not know if is the hang is inside userspace, or
> maybe it happens when modprobe calls back into the kernel,
> if the hang happens when modprobe calls back into the kernel,
> then other modprobes (done from udev) likely will hang too
> since I think only 1 modprobe can happen at a time.
>
> I really wish we knew what distinguished working systems
> from non working systems :|
>
> I cannot find a common denominator; other then the systems
> are not running Fedora. So far we've reports from both Ubuntu 16.04
> and Tumbleweed, so software version wise these 2 are wide apart.

I am trying to reproduce the lock locally, and installed an opensuse
Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
do not see the lock with either my vanilla compiled kernel and the rpm
found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/

Next step is install Tumbleweed on bare metal, but I do not see how
this could introduce a difference (maybe USB2 vs 3).

>
> >>> have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
> >>> so we may need to drop that product-id from hid-logitech-dj, I'm working on
> >>> that one...
> >>
> >> Besides the modprobe hanging issue, the only other issues all
> >> (2 reporters) seem to be with 0xc52f receivers. We have a bug
> >> open for this:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=203619
> >>
> >> And I've asked the reporter of the second bug to add his logs
> >> to that bug.
> >
> > We should likely just remove c52f from the list of supported devices.
> > C52f receivers seem to have a different firmware as they are meant to
> > work with different devices than C534. So I guess it is safer to not
> > handle those right now and get the code in when it is ready.
>
> Ack. Can you prepare a patch to drop the c52f id?

Yes. I have an other revert never submitted that I need to push, so I
guess I can do a revert session today.

I think I'll also buy one device with hopefully the C52F receiver as
the report descriptors attached in
https://bugzilla.kernel.org/show_bug.cgi?id=203619 seems different to
what I would have expected.

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04  7:51                     ` Benjamin Tissoires
@ 2019-06-04  8:05                       ` Hans de Goede
  2019-06-04  8:36                         ` Hans de Goede
  2019-06-04 10:08                       ` Benjamin Tissoires
  1 sibling, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-06-04  8:05 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

Hi,

On 04-06-19 09:51, Benjamin Tissoires wrote:
> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 03-06-19 15:55, Benjamin Tissoires wrote:
>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Hi Again,
>>>>
>>>> On 03-06-19 11:11, Hans de Goede wrote:
>>>> <snip>
>>>>
>>>>>> not sure about the rest of logitech issues yet) next week.
>>>>>
>>>>> The main problem seems to be the request_module patches. Although I also
>>>
>>> Can't we use request_module_nowait() instead, and set a reasonable
>>> timeout that we detect only once to check if userspace is compatible:
>>>
>>> In pseudo-code:
>>> if (!request_module_checked) {
>>>     request_module_nowait(name);
>>>     use_request_module = wait_event_timeout(wq,
>>>           first_module_loaded, 10 seconds in jiffies);
>>>     request_module_checked = true;
>>> } else if (use_request_module) {
>>>     request_module(name);
>>> }
>>
>> Well looking at the just attached dmesg , the modprobe
>> when triggered by udev from userspace succeeds in about
>> 0.5 seconds, so it seems that the modprobe hangs happens
>> when called from within the kernel rather then from within
>> userspace.
>>
>> What I do not know if is the hang is inside userspace, or
>> maybe it happens when modprobe calls back into the kernel,
>> if the hang happens when modprobe calls back into the kernel,
>> then other modprobes (done from udev) likely will hang too
>> since I think only 1 modprobe can happen at a time.
>>
>> I really wish we knew what distinguished working systems
>> from non working systems :|
>>
>> I cannot find a common denominator; other then the systems
>> are not running Fedora. So far we've reports from both Ubuntu 16.04
>> and Tumbleweed, so software version wise these 2 are wide apart.
> 
> I am trying to reproduce the lock locally, and installed an opensuse
> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
> do not see the lock with either my vanilla compiled kernel and the rpm
> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
> 
> Next step is install Tumbleweed on bare metal, but I do not see how
> this could introduce a difference (maybe USB2 vs 3).

Ok, thank you for looking into this.

>>>>> have 2 reports of problems with hid-logitech-dj driving the 0xc52f product-id,
>>>>> so we may need to drop that product-id from hid-logitech-dj, I'm working on
>>>>> that one...
>>>>
>>>> Besides the modprobe hanging issue, the only other issues all
>>>> (2 reporters) seem to be with 0xc52f receivers. We have a bug
>>>> open for this:
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=203619
>>>>
>>>> And I've asked the reporter of the second bug to add his logs
>>>> to that bug.
>>>
>>> We should likely just remove c52f from the list of supported devices.
>>> C52f receivers seem to have a different firmware as they are meant to
>>> work with different devices than C534. So I guess it is safer to not
>>> handle those right now and get the code in when it is ready.
>>
>> Ack. Can you prepare a patch to drop the c52f id?
> 
> Yes. I have an other revert never submitted that I need to push, so I
> guess I can do a revert session today.
> 
> I think I'll also buy one device with hopefully the C52F receiver as
> the report descriptors attached in
> https://bugzilla.kernel.org/show_bug.cgi?id=203619 seems different to
> what I would have expected.

They are actually what I expected :)

The first USB interface is a mouse boot class device, since this is a mouse
only receiver. This means that the mouse report is unnumbered and we need to
extend the unnumbered mouse-report handling to handle this case. Also the
device is using the same highres mouse-reports as the gaming receiver is.

I'm actually preparing a patch right now which should fix this. Still might
be better to do the revert for 5.2 and get proper support for the c52f
receiver into 5.3.

Regards,

Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04  8:05                       ` Hans de Goede
@ 2019-06-04  8:36                         ` Hans de Goede
  2019-06-04  8:53                           ` Benjamin Tissoires
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-06-04  8:36 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

Hi,

On 04-06-19 10:05, Hans de Goede wrote:

<snip>

>>>> We should likely just remove c52f from the list of supported devices.
>>>> C52f receivers seem to have a different firmware as they are meant to
>>>> work with different devices than C534. So I guess it is safer to not
>>>> handle those right now and get the code in when it is ready.
>>>
>>> Ack. Can you prepare a patch to drop the c52f id?
>>
>> Yes. I have an other revert never submitted that I need to push, so I
>> guess I can do a revert session today.
>>
>> I think I'll also buy one device with hopefully the C52F receiver as
>> the report descriptors attached in
>> https://bugzilla.kernel.org/show_bug.cgi?id=203619 seems different to
>> what I would have expected.
> 
> They are actually what I expected :)
> 
> The first USB interface is a mouse boot class device, since this is a mouse
> only receiver. This means that the mouse report is unnumbered and we need to
> extend the unnumbered mouse-report handling to handle this case. Also the
> device is using the same highres mouse-reports as the gaming receiver is.
> 
> I'm actually preparing a patch right now which should fix this. Still might
> be better to do the revert for 5.2 and get proper support for the c52f
> receiver into 5.3.

I've attached a patch to the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=203619

Which should fix this. It is quite simple and safe, so if we get testing
feedback relatively soon, we could go with the fix instead of dropping the
product-id, your call.

Regards,

Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04  8:36                         ` Hans de Goede
@ 2019-06-04  8:53                           ` Benjamin Tissoires
  2019-06-04 21:08                             ` Jiri Kosina
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-04  8:53 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

On Tue, Jun 4, 2019 at 10:36 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 04-06-19 10:05, Hans de Goede wrote:
>
> <snip>
>
> >>>> We should likely just remove c52f from the list of supported devices.
> >>>> C52f receivers seem to have a different firmware as they are meant to
> >>>> work with different devices than C534. So I guess it is safer to not
> >>>> handle those right now and get the code in when it is ready.
> >>>
> >>> Ack. Can you prepare a patch to drop the c52f id?
> >>
> >> Yes. I have an other revert never submitted that I need to push, so I
> >> guess I can do a revert session today.
> >>
> >> I think I'll also buy one device with hopefully the C52F receiver as
> >> the report descriptors attached in
> >> https://bugzilla.kernel.org/show_bug.cgi?id=203619 seems different to
> >> what I would have expected.
> >
> > They are actually what I expected :)
> >
> > The first USB interface is a mouse boot class device, since this is a mouse
> > only receiver. This means that the mouse report is unnumbered and we need to
> > extend the unnumbered mouse-report handling to handle this case. Also the
> > device is using the same highres mouse-reports as the gaming receiver is.
> >
> > I'm actually preparing a patch right now which should fix this. Still might
> > be better to do the revert for 5.2 and get proper support for the c52f
> > receiver into 5.3.
>
> I've attached a patch to the bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=203619

Cool, thanks.

>
> Which should fix this. It is quite simple and safe, so if we get testing
> feedback relatively soon, we could go with the fix instead of dropping the
> product-id, your call.

I should receive the M280 tomorrow, hopefully with the C52F. If the
receiver is correct and the tests are successful, I'd prefer to take
this one over the revert :)

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04  7:51                     ` Benjamin Tissoires
  2019-06-04  8:05                       ` Hans de Goede
@ 2019-06-04 10:08                       ` Benjamin Tissoires
  2019-06-04 10:50                         ` Hans de Goede
  1 sibling, 1 reply; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-04 10:08 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
<benjamin.tissoires@redhat.com> wrote:
>
> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >
> > Hi,
> >
> > On 03-06-19 15:55, Benjamin Tissoires wrote:
> > > On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
> > >>
> > >> Hi Again,
> > >>
> > >> On 03-06-19 11:11, Hans de Goede wrote:
> > >> <snip>
> > >>
> > >>>> not sure about the rest of logitech issues yet) next week.
> > >>>
> > >>> The main problem seems to be the request_module patches. Although I also
> > >
> > > Can't we use request_module_nowait() instead, and set a reasonable
> > > timeout that we detect only once to check if userspace is compatible:
> > >
> > > In pseudo-code:
> > > if (!request_module_checked) {
> > >    request_module_nowait(name);
> > >    use_request_module = wait_event_timeout(wq,
> > >          first_module_loaded, 10 seconds in jiffies);
> > >    request_module_checked = true;
> > > } else if (use_request_module) {
> > >    request_module(name);
> > > }
> >
> > Well looking at the just attached dmesg , the modprobe
> > when triggered by udev from userspace succeeds in about
> > 0.5 seconds, so it seems that the modprobe hangs happens
> > when called from within the kernel rather then from within
> > userspace.
> >
> > What I do not know if is the hang is inside userspace, or
> > maybe it happens when modprobe calls back into the kernel,
> > if the hang happens when modprobe calls back into the kernel,
> > then other modprobes (done from udev) likely will hang too
> > since I think only 1 modprobe can happen at a time.
> >
> > I really wish we knew what distinguished working systems
> > from non working systems :|
> >
> > I cannot find a common denominator; other then the systems
> > are not running Fedora. So far we've reports from both Ubuntu 16.04
> > and Tumbleweed, so software version wise these 2 are wide apart.
>
> I am trying to reproduce the lock locally, and installed an opensuse
> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
> do not see the lock with either my vanilla compiled kernel and the rpm
> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
>
> Next step is install Tumbleweed on bare metal, but I do not see how
> this could introduce a difference (maybe USB2 vs 3).

Making progress here.

The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
as a module while in Fedora usbhid is included in the kernel.

If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
for 3 minutes.
If usbhid is already loaded, inserting a receiver is immediate
regarding the loading of the external modules.

So my assumption is that when the device gets detected at boot, usbhid
gets loaded by the kernel event, which in turns attempts to call
__request_module, but the modprobe can't be fulfilled because it's
already waiting for the initial usbhid modprobe to finish.

Still don't know how to solve that, but I thought I should share.

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04 10:08                       ` Benjamin Tissoires
@ 2019-06-04 10:50                         ` Hans de Goede
  2019-06-04 12:25                           ` Benjamin Tissoires
  0 siblings, 1 reply; 22+ messages in thread
From: Hans de Goede @ 2019-06-04 10:50 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

Hi,

On 04-06-19 12:08, Benjamin Tissoires wrote:
> On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
> <benjamin.tissoires@redhat.com> wrote:
>>
>> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On 03-06-19 15:55, Benjamin Tissoires wrote:
>>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>>
>>>>> Hi Again,
>>>>>
>>>>> On 03-06-19 11:11, Hans de Goede wrote:
>>>>> <snip>
>>>>>
>>>>>>> not sure about the rest of logitech issues yet) next week.
>>>>>>
>>>>>> The main problem seems to be the request_module patches. Although I also
>>>>
>>>> Can't we use request_module_nowait() instead, and set a reasonable
>>>> timeout that we detect only once to check if userspace is compatible:
>>>>
>>>> In pseudo-code:
>>>> if (!request_module_checked) {
>>>>     request_module_nowait(name);
>>>>     use_request_module = wait_event_timeout(wq,
>>>>           first_module_loaded, 10 seconds in jiffies);
>>>>     request_module_checked = true;
>>>> } else if (use_request_module) {
>>>>     request_module(name);
>>>> }
>>>
>>> Well looking at the just attached dmesg , the modprobe
>>> when triggered by udev from userspace succeeds in about
>>> 0.5 seconds, so it seems that the modprobe hangs happens
>>> when called from within the kernel rather then from within
>>> userspace.
>>>
>>> What I do not know if is the hang is inside userspace, or
>>> maybe it happens when modprobe calls back into the kernel,
>>> if the hang happens when modprobe calls back into the kernel,
>>> then other modprobes (done from udev) likely will hang too
>>> since I think only 1 modprobe can happen at a time.
>>>
>>> I really wish we knew what distinguished working systems
>>> from non working systems :|
>>>
>>> I cannot find a common denominator; other then the systems
>>> are not running Fedora. So far we've reports from both Ubuntu 16.04
>>> and Tumbleweed, so software version wise these 2 are wide apart.
>>
>> I am trying to reproduce the lock locally, and installed an opensuse
>> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
>> do not see the lock with either my vanilla compiled kernel and the rpm
>> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
>>
>> Next step is install Tumbleweed on bare metal, but I do not see how
>> this could introduce a difference (maybe USB2 vs 3).
> 
> Making progress here.
> 
> The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
> as a module while in Fedora usbhid is included in the kernel.
> 
> If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
> for 3 minutes.
> If usbhid is already loaded, inserting a receiver is immediate
> regarding the loading of the external modules.
> 
> So my assumption is that when the device gets detected at boot, usbhid
> gets loaded by the kernel event, which in turns attempts to call
> __request_module, but the modprobe can't be fulfilled because it's
> already waiting for the initial usbhid modprobe to finish.
> 
> Still don't know how to solve that, but I thought I should share.

Hmm, we may be hitting the scenario described in the big comment
around line 3500 of kernel/module.c.

But I'm not sure that is what is happening here.

Maybe you can put a WARN_ON(1) in request_module and look at the
backtrace ?  That may help to figure out what is going on; or
alternatively it might help to find some way to detect this and
if it happens skip the request_module...

Regards,

Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04 10:50                         ` Hans de Goede
@ 2019-06-04 12:25                           ` Benjamin Tissoires
  2019-06-04 13:12                             ` Hans de Goede
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-04 12:25 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

[-- Attachment #1: Type: text/plain, Size: 6271 bytes --]

On Tue, Jun 4, 2019 at 12:50 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 04-06-19 12:08, Benjamin Tissoires wrote:
> > On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
> > <benjamin.tissoires@redhat.com> wrote:
> >>
> >> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 03-06-19 15:55, Benjamin Tissoires wrote:
> >>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
> >>>>>
> >>>>> Hi Again,
> >>>>>
> >>>>> On 03-06-19 11:11, Hans de Goede wrote:
> >>>>> <snip>
> >>>>>
> >>>>>>> not sure about the rest of logitech issues yet) next week.
> >>>>>>
> >>>>>> The main problem seems to be the request_module patches. Although I also
> >>>>
> >>>> Can't we use request_module_nowait() instead, and set a reasonable
> >>>> timeout that we detect only once to check if userspace is compatible:
> >>>>
> >>>> In pseudo-code:
> >>>> if (!request_module_checked) {
> >>>>     request_module_nowait(name);
> >>>>     use_request_module = wait_event_timeout(wq,
> >>>>           first_module_loaded, 10 seconds in jiffies);
> >>>>     request_module_checked = true;
> >>>> } else if (use_request_module) {
> >>>>     request_module(name);
> >>>> }
> >>>
> >>> Well looking at the just attached dmesg , the modprobe
> >>> when triggered by udev from userspace succeeds in about
> >>> 0.5 seconds, so it seems that the modprobe hangs happens
> >>> when called from within the kernel rather then from within
> >>> userspace.
> >>>
> >>> What I do not know if is the hang is inside userspace, or
> >>> maybe it happens when modprobe calls back into the kernel,
> >>> if the hang happens when modprobe calls back into the kernel,
> >>> then other modprobes (done from udev) likely will hang too
> >>> since I think only 1 modprobe can happen at a time.
> >>>
> >>> I really wish we knew what distinguished working systems
> >>> from non working systems :|
> >>>
> >>> I cannot find a common denominator; other then the systems
> >>> are not running Fedora. So far we've reports from both Ubuntu 16.04
> >>> and Tumbleweed, so software version wise these 2 are wide apart.
> >>
> >> I am trying to reproduce the lock locally, and installed an opensuse
> >> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
> >> do not see the lock with either my vanilla compiled kernel and the rpm
> >> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
> >>
> >> Next step is install Tumbleweed on bare metal, but I do not see how
> >> this could introduce a difference (maybe USB2 vs 3).
> >
> > Making progress here.
> >
> > The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
> > as a module while in Fedora usbhid is included in the kernel.
> >
> > If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
> > for 3 minutes.
> > If usbhid is already loaded, inserting a receiver is immediate
> > regarding the loading of the external modules.
> >
> > So my assumption is that when the device gets detected at boot, usbhid
> > gets loaded by the kernel event, which in turns attempts to call
> > __request_module, but the modprobe can't be fulfilled because it's
> > already waiting for the initial usbhid modprobe to finish.
> >
> > Still don't know how to solve that, but I thought I should share.
>
> Hmm, we may be hitting the scenario described in the big comment
> around line 3500 of kernel/module.c.

Well, we are not locking during do_init_module(), but in waiting for
the completion of request_module(). So as I read the trace, we wait
for userspace to call/terminate modprobe.

>
> But I'm not sure that is what is happening here.
>
> Maybe you can put a WARN_ON(1) in request_module and look at the
> backtrace ?  That may help to figure out what is going on; or
> alternatively it might help to find some way to detect this and
> if it happens skip the request_module...

Ftrace is much easier to deal with:
---
/sys/kernel/debug/tracing # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 4/4   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
        modprobe-23236 [003] ....  9191.880917: __request_module
<-hid_add_device
        modprobe-23236 [003] ....  9191.880937: <stack trace>
 => 0xffffffffc0851061
 => __request_module
 => hid_add_device
 => usbhid_probe
 => usb_probe_interface
 => really_probe
 => driver_probe_device
 => device_driver_attach
 => __driver_attach
 => bus_for_each_dev
 => bus_add_driver
 => driver_register
 => usb_register_driver
 => hid_init
 => do_one_initcall
 => do_init_module
 => load_module
 => __do_sys_finit_module
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
        modprobe-23236 [003] ....  9191.892765: __request_module
<-hid_add_device
        modprobe-23236 [003] ....  9191.892784: <stack trace>
 => 0xffffffffc0851061
 => __request_module
 => hid_add_device
 => usbhid_probe
 => usb_probe_interface
 => really_probe
 => driver_probe_device
 => device_driver_attach
 => __driver_attach
 => bus_for_each_dev
 => bus_add_driver
 => driver_register
 => usb_register_driver
 => hid_init
 => do_one_initcall
 => do_init_module
 => load_module
 => __do_sys_finit_module
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
---

This is what happen with the logitech receiver plugged in and while
modprobing usbhid. The modprobe command hangs, and I took the trace
here.
If I hit Ctrl-C, the modprobe loading continue properly.

There is nothing special here in the stack trace, except that there
are 2 calls to request_module here (one for the touchscreen I guess
and one for the logitech receiver).

I have attached the call graph of the same sequence:
wait_for_completion_killable() seems to be the culprit, we are waiting
for userspace to notify it has done calling modprobe.

My idea would be to defer any call to hid_add_device() into a
workqueue and see if that unlocks the situation.

Cheers,
Benjamin

[-- Attachment #2: function_graph.txt --]
[-- Type: text/plain, Size: 21638 bytes --]

/sys/kernel/debug/tracing # cat trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  __request_module() {
 0)   0.256 us    |    current_is_async();
 0)               |    security_kernel_module_request() {
 0)   0.307 us    |      integrity_kernel_module_request();
 0)   0.625 us    |    }
 0)               |    kmem_cache_alloc_trace() {
 0)               |      _cond_resched() {
 0)   0.123 us    |        rcu_all_qs();
 0)   0.355 us    |      }
 0)   0.122 us    |      should_failslab();
 0)               |      _cond_resched() {
 0)   0.120 us    |        rcu_all_qs();
 0)   0.355 us    |      }
 0)   0.120 us    |      memcg_kmem_put_cache();
 0)   1.563 us    |    }
 0)               |    kstrdup() {
 0)               |      __kmalloc_track_caller() {
 0)   0.115 us    |        kmalloc_slab();
 0)               |        _cond_resched() {
 0)   0.118 us    |          rcu_all_qs();
 0)   0.348 us    |        }
 0)   0.119 us    |        should_failslab();
 0)               |        _cond_resched() {
 0)   0.115 us    |          rcu_all_qs();
 0)   0.345 us    |        }
 0)   0.119 us    |        memcg_kmem_put_cache();
 0)   1.757 us    |      }
 0)   2.040 us    |    }
 0)               |    call_usermodehelper_setup() {
 0)               |      kmem_cache_alloc_trace() {
 0)               |        _cond_resched() {
 0)   0.116 us    |          rcu_all_qs();
 0)   0.345 us    |        }
 0)   0.117 us    |        should_failslab();
 0)               |        _cond_resched() {
 0)   0.118 us    |          rcu_all_qs();
 0)   0.347 us    |        }
 0)   0.123 us    |        memcg_kmem_put_cache();
 0)   1.792 us    |      }
 0)   2.023 us    |    }
 0)               |    call_usermodehelper_exec() {
 0)               |      queue_work_on() {
 0)               |        __queue_work() {
 0)   0.126 us    |          get_work_pool();
 0)   0.183 us    |          _raw_spin_lock();
 0)               |          insert_work() {
 0)   0.121 us    |            get_pwq.isra.0();
 0)               |            wake_up_process() {
 0)               |              try_to_wake_up() {
 0)   0.227 us    |                _raw_spin_lock_irqsave();
 0)               |                select_task_rq_fair() {
 0)   0.133 us    |                  available_idle_cpu();
 0)   0.166 us    |                  update_cfs_rq_h_load();
 0)               |                  select_idle_sibling() {
 0)   0.137 us    |                    available_idle_cpu();
 0)   0.371 us    |                  }
 0)   1.299 us    |                }
 0)   0.133 us    |                _raw_spin_lock();
 0)   0.148 us    |                update_rq_clock();
 0)               |                ttwu_do_activate() {
 0)               |                  activate_task() {
 0)               |                    enqueue_task_fair() {
 0)               |                      enqueue_entity() {
 0)   0.123 us    |                        update_curr();
 0)               |                        __update_load_avg_se() {
 0)   0.118 us    |                          __accumulate_pelt_segments();
 0)   0.420 us    |                        }
 0)   0.125 us    |                        __update_load_avg_cfs_rq();
 0)   0.122 us    |                        update_cfs_group();
 0)   0.138 us    |                        account_entity_enqueue();
 0)   0.120 us    |                        place_entity();
 0)   0.119 us    |                        __enqueue_entity();
 0)   2.146 us    |                      }
 0)   0.115 us    |                      hrtick_update();
 0)   2.624 us    |                    }
 0)   2.890 us    |                  }
 0)               |                  ttwu_do_wakeup() {
 0)               |                    check_preempt_curr() {
 0)   0.130 us    |                      resched_curr();
 0)   0.417 us    |                    }
 0)   1.034 us    |                  }
 0)   4.269 us    |                }
 0)   0.410 us    |                _raw_spin_unlock_irqrestore();
 0)   7.466 us    |              }
 0)   7.702 us    |            }
 0)   8.164 us    |          }
 0)   9.079 us    |        }
 0)   9.370 us    |      }
 0)               |      wait_for_completion_killable() {
 0)               |        _cond_resched() {
 0)   0.118 us    |          rcu_all_qs();
 0)   0.365 us    |        }
 0)   0.122 us    |        _raw_spin_lock_irq();
 0)               |        schedule_timeout() {
 0)               |          schedule() {
 0)               |            rcu_note_context_switch() {
 0)   0.147 us    |              rcu_qs();
 0)   0.384 us    |            }
 0)   0.124 us    |            _raw_spin_lock();
 0)   0.144 us    |            update_rq_clock();
 0)               |            deactivate_task() {
 0)               |              dequeue_task_fair() {
 0)               |                dequeue_entity() {
 0)               |                  update_curr() {
 0)   0.121 us    |                    update_min_vruntime();
 0)   0.132 us    |                    cpuacct_charge();
 0)               |                    __cgroup_account_cputime() {
 0)   0.124 us    |                      cgroup_rstat_updated();
 0)   0.354 us    |                    }
 0)   1.088 us    |                  }
 0)   0.127 us    |                  __update_load_avg_se();
 0)   0.136 us    |                  __update_load_avg_cfs_rq();
 0)   0.118 us    |                  clear_buddies();
 0)   0.124 us    |                  account_entity_dequeue();
 0)   0.120 us    |                  update_cfs_group();
 0)   0.128 us    |                  update_min_vruntime();
 0)   2.780 us    |                }
 0)   0.114 us    |                hrtick_update();
 0)   3.242 us    |              }
 0)   3.503 us    |            }
 0)               |            pick_next_task_fair() {
 0)   0.121 us    |              __msecs_to_jiffies();
 0)   0.389 us    |            }
 0)               |            pick_next_task_idle() {
 0)               |              put_prev_task_fair() {
 0)               |                put_prev_entity() {
 0)   0.122 us    |                  check_cfs_rq_runtime();
 0)   0.355 us    |                }
 0)   0.581 us    |              }
 0)   0.123 us    |              __update_idle_core();
 0)   1.055 us    |            }
 0)   0.118 us    |            enter_lazy_tlb();
 2)               |            finish_task_switch() {
 2)               |              __mmdrop() {
 2)               |                pgd_free() {
 2)   0.248 us    |                  _raw_spin_lock();
 2)               |                  free_pages() {
 2)               |                    free_pages.part.0() {
 2)               |                      __free_pages() {
 2)               |                        __free_pages_ok() {
 2)               |                          free_one_page() {
 2)   0.213 us    |                            _raw_spin_lock();
 2)   0.219 us    |                            __mod_zone_page_state();
 2)   1.102 us    |                          }
 2)   1.839 us    |                        }
 2)   2.337 us    |                      }
 2)   2.672 us    |                    }
 2)   2.986 us    |                  }
 2)   4.044 us    |                }
 2)   0.218 us    |                destroy_context_ldt();
 2)               |                kmem_cache_free() {
 2)   0.416 us    |                  ___cache_free();
 2)   1.051 us    |                }
 2)   6.234 us    |              }
 2)   8.542 us    |            }
 2) * 13308.49 us |          } /* schedule */
 2) * 13308.85 us |        } /* schedule_timeout */
 2)   0.235 us    |        _raw_spin_lock_irq();
 2) * 13310.47 us |      } /* wait_for_completion_killable */
 2)               |      free_modprobe_argv() {
 2)               |        kfree() {
 2)   0.213 us    |          ___cache_free();
 2)   0.657 us    |        }
 2)               |        kfree() {
 2)   0.153 us    |          ___cache_free();
 2)   0.557 us    |        }
 2)   1.732 us    |      }
 2)               |      kfree() {
 2)   0.207 us    |        ___cache_free();
 2)   0.734 us    |      }
 2)               |      __wake_up() {
 2)               |        __wake_up_common_lock() {
 2)   0.265 us    |          _raw_spin_lock_irqsave();
 2)   0.199 us    |          __wake_up_common();
 2)   0.214 us    |          _raw_spin_unlock_irqrestore();
 2)   1.333 us    |        }
 2)   1.648 us    |      }
 2) * 13325.51 us |    } /* call_usermodehelper_exec */
 2)               |    __wake_up() {
 2)               |      __wake_up_common_lock() {
 2)   0.195 us    |        _raw_spin_lock_irqsave();
 2)   0.200 us    |        __wake_up_common();
 2)   0.189 us    |        _raw_spin_unlock_irqrestore();
 2)   1.274 us    |      }
 2)   1.629 us    |    }
 2) * 13337.37 us |  } /* __request_module */
 2)               |  __request_module() {
 2)   0.279 us    |    current_is_async();
 2)               |    security_kernel_module_request() {
 2)   0.310 us    |      integrity_kernel_module_request();
 2)   0.799 us    |    }
 2)               |    kmem_cache_alloc_trace() {
 2)               |      _cond_resched() {
 2)   0.209 us    |        rcu_all_qs();
 2)   0.598 us    |      }
 2)   0.208 us    |      should_failslab();
 2)               |      _cond_resched() {
 2)   0.202 us    |        rcu_all_qs();
 2)   0.597 us    |      }
 2)   0.206 us    |      memcg_kmem_put_cache();
 2)   2.596 us    |    }
 2)               |    kstrdup() {
 2)               |      __kmalloc_track_caller() {
 2)   0.209 us    |        kmalloc_slab();
 2)               |        _cond_resched() {
 2)   0.200 us    |          rcu_all_qs();
 2)   0.598 us    |        }
 2)   0.399 us    |        should_failslab();
 2)               |        _cond_resched() {
 2)   0.179 us    |          rcu_all_qs();
 2)   0.491 us    |        }
 2)   0.174 us    |        memcg_kmem_put_cache();
 2)   3.004 us    |      }
 2)   3.424 us    |    }
 2)               |    call_usermodehelper_setup() {
 2)               |      kmem_cache_alloc_trace() {
 2)               |        _cond_resched() {
 2)   0.171 us    |          rcu_all_qs();
 2)   0.816 us    |        }
 2)   0.173 us    |        should_failslab();
 2)               |        _cond_resched() {
 2)   0.188 us    |          rcu_all_qs();
 2)   0.544 us    |        }
 2)   0.205 us    |        memcg_kmem_put_cache();
 2)   2.728 us    |      }
 2)   3.100 us    |    }
 2)               |    call_usermodehelper_exec() {
 2)               |      queue_work_on() {
 2)               |        __queue_work() {
 2)   0.213 us    |          get_work_pool();
 2)   0.206 us    |          _raw_spin_lock();
 2)               |          insert_work() {
 2)   0.194 us    |            get_pwq.isra.0();
 2)               |            wake_up_process() {
 2)               |              try_to_wake_up() {
 2)   0.248 us    |                _raw_spin_lock_irqsave();
 2)               |                select_task_rq_fair() {
 2)   0.205 us    |                  available_idle_cpu();
 2)   0.252 us    |                  update_cfs_rq_h_load();
 2)               |                  select_idle_sibling() {
 2)   0.277 us    |                    available_idle_cpu();
 2)   0.666 us    |                  }
 2)   2.074 us    |                }
 2)   0.213 us    |                _raw_spin_lock();
 2)   0.240 us    |                update_rq_clock();
 2)               |                ttwu_do_activate() {
 2)               |                  activate_task() {
 2)               |                    enqueue_task_fair() {
 2)               |                      enqueue_entity() {
 2)   0.209 us    |                        update_curr();
 2)               |                        __update_load_avg_se() {
 2)   0.198 us    |                          __accumulate_pelt_segments();
 2)   0.715 us    |                        }
 2)   0.221 us    |                        __update_load_avg_cfs_rq();
 2)   0.205 us    |                        update_cfs_group();
 2)   0.274 us    |                        account_entity_enqueue();
 2)   0.209 us    |                        place_entity();
 2)   0.211 us    |                        __enqueue_entity();
 2)   3.718 us    |                      }
 2)   0.208 us    |                      hrtick_update();
 2)   4.568 us    |                    }
 2)   5.005 us    |                  }
 2)               |                  ttwu_do_wakeup() {
 2)               |                    check_preempt_curr() {
 2)   0.222 us    |                      resched_curr();
 2)   0.700 us    |                    }
 2)   1.473 us    |                  }
 2)   7.098 us    |                }
 2)   0.218 us    |                _raw_spin_unlock_irqrestore();
 2) + 11.605 us   |              }
 2) + 11.947 us   |            }
 2) + 12.730 us   |          }
 2) + 14.091 us   |        }
 2) + 14.527 us   |      }
 2)               |      wait_for_completion_killable() {
 2)               |        _cond_resched() {
 2)   0.210 us    |          rcu_all_qs();
 2)   0.636 us    |        }
 2)   0.241 us    |        _raw_spin_lock_irq();
 2)               |        schedule_timeout() {
 2)               |          schedule() {
 2)               |            rcu_note_context_switch() {
 2)   0.217 us    |              rcu_qs();
 2)   0.618 us    |            }
 2)   0.200 us    |            _raw_spin_lock();
 2)   0.213 us    |            update_rq_clock();
 2)               |            deactivate_task() {
 2)               |              dequeue_task_fair() {
 2)               |                dequeue_entity() {
 2)               |                  update_curr() {
 2)   0.196 us    |                    update_min_vruntime();
 2)   0.239 us    |                    cpuacct_charge();
 2)               |                    __cgroup_account_cputime() {
 2)   0.202 us    |                      cgroup_rstat_updated();
 2)   0.638 us    |                    }
 2)   1.946 us    |                  }
 2)   0.210 us    |                  __update_load_avg_se();
 2)   0.213 us    |                  __update_load_avg_cfs_rq();
 2)   0.200 us    |                  clear_buddies();
 2)   0.212 us    |                  account_entity_dequeue();
 2)   0.203 us    |                  update_cfs_group();
 2)   0.207 us    |                  update_min_vruntime();
 2)   5.029 us    |                }
 2)   0.194 us    |                hrtick_update();
 2)   5.793 us    |              }
 2)   6.205 us    |            }
 2)               |            pick_next_task_fair() {
 2)               |              update_blocked_averages() {
 2)   0.149 us    |                _raw_spin_lock_irqsave();
 2)   0.205 us    |                update_rq_clock();
 2)   0.167 us    |                __update_load_avg_cfs_rq();
 2)   0.206 us    |                update_rt_rq_load_avg();
 2)   0.184 us    |                update_dl_rq_load_avg();
 2)   0.183 us    |                _raw_spin_unlock_irqrestore();
 2)   2.510 us    |              }
 2)               |              load_balance() {
 2)               |                find_busiest_group() {
 2)   0.172 us    |                  update_nohz_stats();
 2)   0.176 us    |                  idle_cpu();
 2)   0.207 us    |                  update_nohz_stats();
 2)   1.818 us    |                }
 2)   2.339 us    |              }
 2)   0.171 us    |              __msecs_to_jiffies();
 2)               |              load_balance() {
 2)               |                find_busiest_group() {
 2)   0.211 us    |                  update_nohz_stats();
 2)   0.203 us    |                  update_nohz_stats();
 2)   0.190 us    |                  idle_cpu();
 2)   0.216 us    |                  update_nohz_stats();
 2)   0.222 us    |                  update_nohz_stats();
 2)   2.711 us    |                }
 2)   0.228 us    |                _raw_spin_lock_irqsave();
 2)   0.262 us    |                update_rq_clock();
 2)   0.233 us    |                can_migrate_task();
 2)   0.322 us    |                can_migrate_task();
 2)   0.222 us    |                update_cfs_rq_h_load();
 2)               |                deactivate_task() {
 2)               |                  dequeue_task_fair() {
 2)               |                    dequeue_entity() {
 2)               |                      update_curr() {
 2)   0.209 us    |                        update_min_vruntime();
 2)   0.259 us    |                        cpuacct_charge();
 2)               |                        __cgroup_account_cputime() {
 2)   0.193 us    |                          cgroup_rstat_updated();
 2)   0.738 us    |                        }
 2)   2.022 us    |                      }
 2)   0.223 us    |                      __update_load_avg_se();
 2)   0.213 us    |                      __update_load_avg_cfs_rq();
 2)   0.202 us    |                      clear_buddies();
 2)   0.215 us    |                      account_entity_dequeue();
 2)   0.204 us    |                      update_cfs_group();
 2)   0.202 us    |                      update_min_vruntime();
 2)   4.891 us    |                    }
 2)   0.205 us    |                    hrtick_update();
 2)   5.711 us    |                  }
 2)   6.143 us    |                }
 2)               |                set_task_cpu() {
 2)               |                  migrate_task_rq_fair() {
 2)               |                    detach_entity_cfs_rq() {
 2)   0.215 us    |                      __update_load_avg_se();
 2)   0.204 us    |                      __update_load_avg_cfs_rq();
 2)   0.219 us    |                      propagate_entity_cfs_rq.isra.0();
 2)   1.433 us    |                    }
 2)   1.814 us    |                  }
 2)   0.195 us    |                  set_task_rq_fair();
 2)   2.664 us    |                }
 2)   0.202 us    |                _raw_spin_lock();
 2)   0.219 us    |                update_rq_clock();
 2)               |                attach_task() {
 2)               |                  activate_task() {
 2)               |                    enqueue_task_fair() {
 2)               |                      enqueue_entity() {
 2)               |                        update_curr() {
 2)   0.225 us    |                          update_min_vruntime();
 2)   0.202 us    |                          cpuacct_charge();
 2)               |                          __cgroup_account_cputime() {
 2)   0.196 us    |                            cgroup_rstat_updated();
 2)   0.569 us    |                          }
 2)   2.120 us    |                        }
 2)   0.208 us    |                        __update_load_avg_cfs_rq();
 2)   0.219 us    |                        attach_entity_load_avg();
 2)   0.203 us    |                        update_cfs_group();
 2)   0.184 us    |                        account_entity_enqueue();
 2)   0.217 us    |                        __enqueue_entity();
 2)   4.678 us    |                      }
 2)   0.177 us    |                      hrtick_update();
 2)   5.408 us    |                    }
 2)   5.795 us    |                  }
 2)               |                  check_preempt_curr() {
 2)               |                    check_preempt_wakeup() {
 2)   0.205 us    |                      update_curr();
 2)   0.201 us    |                      wakeup_preempt_entity.isra.0();
 2)   0.970 us    |                    }
 2)   1.359 us    |                  }
 2)   7.688 us    |                }
 2) + 23.511 us   |              }
 2)   0.201 us    |              __msecs_to_jiffies();
 2)   0.205 us    |              _raw_spin_lock();
 2)   0.197 us    |              check_cfs_rq_runtime();
 2)               |              pick_next_entity() {
 2)   0.198 us    |                clear_buddies();
 2)   0.600 us    |              }
 2)               |              put_prev_entity() {
 2)   0.188 us    |                check_cfs_rq_runtime();
 2)   0.583 us    |              }
 2)               |              set_next_entity() {
 2)   0.209 us    |                __update_load_avg_se();
 2)   0.215 us    |                __update_load_avg_cfs_rq();
 2)   1.053 us    |              }
 2) + 33.704 us   |            }
 2)               |            switch_mm_irqs_off() {
 2)   0.607 us    |              load_new_mm_cr3();
 2)   2.798 us    |            }
 3)   5.150 us    |            finish_task_switch();

--------------------------------------------------------------------------------
Ctrl-C hit, notice the delay
--------------------------------------------------------------------------------
 
 3) $ 54568382 us |          } /* schedule */
 3) $ 54568383 us |        } /* schedule_timeout */
 3)   1.011 us    |        _raw_spin_lock_irq();
 3) $ 54568388 us |      } /* wait_for_completion_killable */
 3)               |      __wake_up() {
 3)               |        __wake_up_common_lock() {
 3)   0.911 us    |          _raw_spin_lock_irqsave();
 3)   1.015 us    |          __wake_up_common();
 3)   0.958 us    |          _raw_spin_unlock_irqrestore();
 3)   6.186 us    |        }
 3)   7.650 us    |      }
 3) $ 54568414 us |    } /* call_usermodehelper_exec */
 3)               |    __wake_up() {
 3)               |      __wake_up_common_lock() {
 3)   0.720 us    |        _raw_spin_lock_irqsave();
 3)   0.826 us    |        __wake_up_common();
 3)   0.850 us    |        _raw_spin_unlock_irqrestore();
 3)   5.655 us    |      }
 3)   7.309 us    |    }
 3) $ 54568437 us |  } /* __request_module */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04 12:25                           ` Benjamin Tissoires
@ 2019-06-04 13:12                             ` Hans de Goede
  0 siblings, 0 replies; 22+ messages in thread
From: Hans de Goede @ 2019-06-04 13:12 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Jiri Kosina, Dave Hansen, open list:HID CORE LAYER, LKML

Hi,

On 04-06-19 14:25, Benjamin Tissoires wrote:
> On Tue, Jun 4, 2019 at 12:50 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 04-06-19 12:08, Benjamin Tissoires wrote:
>>> On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
>>> <benjamin.tissoires@redhat.com> wrote:
>>>>
>>>> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 03-06-19 15:55, Benjamin Tissoires wrote:
>>>>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>>>>
>>>>>>> Hi Again,
>>>>>>>
>>>>>>> On 03-06-19 11:11, Hans de Goede wrote:
>>>>>>> <snip>
>>>>>>>
>>>>>>>>> not sure about the rest of logitech issues yet) next week.
>>>>>>>>
>>>>>>>> The main problem seems to be the request_module patches. Although I also
>>>>>>
>>>>>> Can't we use request_module_nowait() instead, and set a reasonable
>>>>>> timeout that we detect only once to check if userspace is compatible:
>>>>>>
>>>>>> In pseudo-code:
>>>>>> if (!request_module_checked) {
>>>>>>      request_module_nowait(name);
>>>>>>      use_request_module = wait_event_timeout(wq,
>>>>>>            first_module_loaded, 10 seconds in jiffies);
>>>>>>      request_module_checked = true;
>>>>>> } else if (use_request_module) {
>>>>>>      request_module(name);
>>>>>> }
>>>>>
>>>>> Well looking at the just attached dmesg , the modprobe
>>>>> when triggered by udev from userspace succeeds in about
>>>>> 0.5 seconds, so it seems that the modprobe hangs happens
>>>>> when called from within the kernel rather then from within
>>>>> userspace.
>>>>>
>>>>> What I do not know if is the hang is inside userspace, or
>>>>> maybe it happens when modprobe calls back into the kernel,
>>>>> if the hang happens when modprobe calls back into the kernel,
>>>>> then other modprobes (done from udev) likely will hang too
>>>>> since I think only 1 modprobe can happen at a time.
>>>>>
>>>>> I really wish we knew what distinguished working systems
>>>>> from non working systems :|
>>>>>
>>>>> I cannot find a common denominator; other then the systems
>>>>> are not running Fedora. So far we've reports from both Ubuntu 16.04
>>>>> and Tumbleweed, so software version wise these 2 are wide apart.
>>>>
>>>> I am trying to reproduce the lock locally, and installed an opensuse
>>>> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
>>>> do not see the lock with either my vanilla compiled kernel and the rpm
>>>> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
>>>>
>>>> Next step is install Tumbleweed on bare metal, but I do not see how
>>>> this could introduce a difference (maybe USB2 vs 3).
>>>
>>> Making progress here.
>>>
>>> The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
>>> as a module while in Fedora usbhid is included in the kernel.
>>>
>>> If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
>>> for 3 minutes.
>>> If usbhid is already loaded, inserting a receiver is immediate
>>> regarding the loading of the external modules.
>>>
>>> So my assumption is that when the device gets detected at boot, usbhid
>>> gets loaded by the kernel event, which in turns attempts to call
>>> __request_module, but the modprobe can't be fulfilled because it's
>>> already waiting for the initial usbhid modprobe to finish.
>>>
>>> Still don't know how to solve that, but I thought I should share.
>>
>> Hmm, we may be hitting the scenario described in the big comment
>> around line 3500 of kernel/module.c.
> 
> Well, we are not locking during do_init_module(), but in waiting for
> the completion of request_module(). So as I read the trace, we wait
> for userspace to call/terminate modprobe.
> 
>>
>> But I'm not sure that is what is happening here.
>>
>> Maybe you can put a WARN_ON(1) in request_module and look at the
>> backtrace ?  That may help to figure out what is going on; or
>> alternatively it might help to find some way to detect this and
>> if it happens skip the request_module...
> 
> Ftrace is much easier to deal with:
> ---
> /sys/kernel/debug/tracing # cat trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 4/4   #P:4
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>          modprobe-23236 [003] ....  9191.880917: __request_module
> <-hid_add_device
>          modprobe-23236 [003] ....  9191.880937: <stack trace>
>   => 0xffffffffc0851061
>   => __request_module
>   => hid_add_device
>   => usbhid_probe
>   => usb_probe_interface
>   => really_probe
>   => driver_probe_device
>   => device_driver_attach
>   => __driver_attach
>   => bus_for_each_dev
>   => bus_add_driver
>   => driver_register
>   => usb_register_driver
>   => hid_init
>   => do_one_initcall
>   => do_init_module
>   => load_module
>   => __do_sys_finit_module
>   => do_syscall_64
>   => entry_SYSCALL_64_after_hwframe
>          modprobe-23236 [003] ....  9191.892765: __request_module
> <-hid_add_device
>          modprobe-23236 [003] ....  9191.892784: <stack trace>
>   => 0xffffffffc0851061
>   => __request_module
>   => hid_add_device
>   => usbhid_probe
>   => usb_probe_interface
>   => really_probe
>   => driver_probe_device
>   => device_driver_attach
>   => __driver_attach
>   => bus_for_each_dev
>   => bus_add_driver
>   => driver_register
>   => usb_register_driver
>   => hid_init
>   => do_one_initcall
>   => do_init_module
>   => load_module
>   => __do_sys_finit_module
>   => do_syscall_64
>   => entry_SYSCALL_64_after_hwframe
> ---
> 
> This is what happen with the logitech receiver plugged in and while
> modprobing usbhid. The modprobe command hangs, and I took the trace
> here.
> If I hit Ctrl-C, the modprobe loading continue properly.
> 
> There is nothing special here in the stack trace, except that there
> are 2 calls to request_module here (one for the touchscreen I guess
> and one for the logitech receiver).
> 
> I have attached the call graph of the same sequence:
> wait_for_completion_killable() seems to be the culprit, we are waiting
> for userspace to notify it has done calling modprobe.
> 
> My idea would be to defer any call to hid_add_device() into a
> workqueue and see if that unlocks the situation.

Yes that is probably a good solution. When the first bug reports came
in I was thinking that the nested hid_add_device calls from hid-logitech-dj
calling hid_add_device() were the problem, but those are already deferred
to a workqueue so those are not really nested.

Looking at various callers if hid_add_device, if we always defer, we also
need the caller to give a callback to call on add_device error, which would
then be used to free various resources related to the hid device.

If me make hid_add_device itself always defer, I guess we may want a non
deferred version of hid_add_device for the hid_add_device calls in
hid-logitech-dj.

In essence we are dealing with nested hid_add_device calls here right?

So another solution would be to have an atomic counter and call
atomic_long_inc_return on it before the hid_add_device and then if its
previous value was not 0, skip the request_module ?
This does mean though that we then may get inconsistent behavior if
2 unrelated hid_add_device-s are racing with each-other :|

Regards,

Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04  8:53                           ` Benjamin Tissoires
@ 2019-06-04 21:08                             ` Jiri Kosina
  2019-06-05 12:29                               ` Benjamin Tissoires
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Kosina @ 2019-06-04 21:08 UTC (permalink / raw)
  To: Benjamin Tissoires
  Cc: Hans de Goede, Dave Hansen, open list:HID CORE LAYER, LKML

On Tue, 4 Jun 2019, Benjamin Tissoires wrote:

> > Which should fix this. It is quite simple and safe, so if we get testing
> > feedback relatively soon, we could go with the fix instead of dropping the
> > product-id, your call.
> 
> I should receive the M280 tomorrow, hopefully with the C52F. If the
> receiver is correct and the tests are successful, I'd prefer to take
> this one over the revert :)

Agreed at this release phase. And if it doesn't work, immediate revert for 
now and proper fix for 5.3-rc1.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: hid-related 5.2-rc1 boot hang
  2019-06-04 21:08                             ` Jiri Kosina
@ 2019-06-05 12:29                               ` Benjamin Tissoires
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Tissoires @ 2019-06-05 12:29 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Hans de Goede, Dave Hansen, open list:HID CORE LAYER, LKML

On Tue, Jun 4, 2019 at 11:08 PM Jiri Kosina <jikos@kernel.org> wrote:
>
> On Tue, 4 Jun 2019, Benjamin Tissoires wrote:
>
> > > Which should fix this. It is quite simple and safe, so if we get testing
> > > feedback relatively soon, we could go with the fix instead of dropping the
> > > product-id, your call.
> >
> > I should receive the M280 tomorrow, hopefully with the C52F. If the
> > receiver is correct and the tests are successful, I'd prefer to take
> > this one over the revert :)
>
> Agreed at this release phase. And if it doesn't work, immediate revert for
> now and proper fix for 5.3-rc1.
>

Mouse received, patch tested, patch validated.

Honestly, before fully reviewing the patch, I thought it would be more
5.3 material, but after a thorough review, there is nothing scary in
it and we can safely push it on top of the 5.2 branch.

I'm about to send the other mentioned reverts and push everything in 5.2-fixes

Cheers,
Benjamin

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-06-05 12:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-28 16:43 hid-related 5.2-rc1 boot hang Dave Hansen
2019-05-28 17:14 ` Jiri Kosina
2019-05-28 17:45   ` Benjamin Tissoires
2019-05-28 18:11     ` Dave Hansen
2019-05-29  9:17       ` Hans de Goede
2019-05-30 16:56         ` Dave Hansen
2019-05-31 22:15           ` Jiri Kosina
2019-06-03  9:11             ` Hans de Goede
2019-06-03  9:32               ` Hans de Goede
2019-06-03  9:51               ` Hans de Goede
2019-06-03 13:55                 ` Benjamin Tissoires
2019-06-03 14:17                   ` Hans de Goede
2019-06-04  7:51                     ` Benjamin Tissoires
2019-06-04  8:05                       ` Hans de Goede
2019-06-04  8:36                         ` Hans de Goede
2019-06-04  8:53                           ` Benjamin Tissoires
2019-06-04 21:08                             ` Jiri Kosina
2019-06-05 12:29                               ` Benjamin Tissoires
2019-06-04 10:08                       ` Benjamin Tissoires
2019-06-04 10:50                         ` Hans de Goede
2019-06-04 12:25                           ` Benjamin Tissoires
2019-06-04 13:12                             ` Hans de Goede

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.