linux-kernel.vger.kernel.org archive mirror
* [REGRESSION] USB ports do not work after suspend/resume cycle with v6.6.2
From: Oleksandr Natalenko @ 2023-11-23 18:20 UTC
  To: linux-kernel
  Cc: linux-usb, stable, Greg Kroah-Hartman, Mathias Nyman,
	Philipp Zabel, Basavaraj Natikar, Mario Limonciello, Sasha Levin,
	Linus Torvalds, Thorsten Leemhuis, Petr Tesarik,
	Krzysztof Kozlowski, Javier Martinez Canillas, Vlastimil Babka

Hello.

Since the v6.6.2 kernel release I'm experiencing a regression in USB port behaviour after a suspend/resume cycle.

If a USB port is empty before suspending, after resuming the machine the port doesn't work. After a device insertion there's no reaction in the kernel log whatsoever, although I do see that the device gets powered up physically. If the machine is suspended with a device inserted into the USB port, the port works fine after resume.

This is an AMD-based machine with hci version 0x110 reported. As per the changelog between v6.6.1 and v6.6.2, 603 commits were backported into v6.6.2, and one of the commits was as follows:

$ git log --oneline v6.6.1..v6.6.2 -- drivers/usb/host/xhci-pci.c
14a51fa544225 xhci: Loosen RPM as default policy to cover for AMD xHC 1.1

It seems that this commit explicitly enables runtime PM specifically for my platform. As per dmesg:

v6.6.1: quirks 0x0000000000000410
v6.6.2: quirks 0x0000000200000410

Here, bit 33 gets set, which, as expected, corresponds to:

drivers/usb/host/xhci.h
1895:#define XHCI_DEFAULT_PM_RUNTIME_ALLOW      BIT_ULL(33)
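
As a quick sanity check of the bit arithmetic, here is a minimal Python sketch that XORs the two quirks values from dmesg and lists the bit positions that differ. The helper name changed_bits is made up for illustration and is not taken from the kernel tree.

```python
def changed_bits(old: int, new: int) -> list[int]:
    """Return the positions of all bits that differ between two masks."""
    diff = old ^ new
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


old_quirks = 0x0000000000000410  # reported by v6.6.1
new_quirks = 0x0000000200000410  # reported by v6.6.2

print(changed_bits(old_quirks, new_quirks))  # [33]
```

The only differing bit is 33, which matches the BIT_ULL(33) definition above.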

This commit is backported from the upstream commit 4baf12181509, which is one of 16 commits of the following series named "xhci features":

https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/

It appears that there was another commit in this series, also from Basavaraj (in Cc), a5d6264b638e, which was not picked for v6.6.2, but which stated the following:

	Use the low-power states of the underlying platform to enable runtime PM.
	If the platform doesn't support runtime D3, then enabling default RPM will
	result in the controller malfunctioning, as in the case of hotplug devices
	not being detected because of a failed interrupt generation.

It felt like this was exactly my case. So, I've conducted two tests:

1. Reverted 14a51fa544225 from v6.6.2. With this revert the USB ports started to work fine, just as they did in v6.6.1.
2. Left 14a51fa544225 in place, but also applied upstream a5d6264b638e on top of v6.6.2. With this patch added the USB ports also work after a suspend/resume cycle.
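
One way to see which runtime PM state the xHCI controller is in under each test kernel is to read its power/control attribute in sysfs. The following is only a sketch and was not part of the tests above. It assumes the usual sysfs layout (a "class" file and a "power/control" file per PCI device) and the xHCI PCI class code 0x0c0330. The function name and the sysfs_root parameter are hypothetical and exist only so the code can be pointed at a test directory.

```python
from pathlib import Path

# PCI class code: serial bus controller / USB / xHCI programming interface.
XHCI_PCI_CLASS = "0x0c0330"


def xhci_runtime_pm(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, str]:
    """Map each xHCI controller's PCI address to its power/control value."""
    result = {}
    for dev in sorted(Path(sysfs_root).glob("*")):
        class_file = dev / "class"
        control_file = dev / "power" / "control"
        # Skip entries that do not look like PCI device directories.
        if not class_file.is_file() or not control_file.is_file():
            continue
        if class_file.read_text().strip() == XHCI_PCI_CLASS:
            result[dev.name] = control_file.read_text().strip()
    return result


if __name__ == "__main__":
    for address, control in xhci_runtime_pm().items():
        print(address, control)
```

A value of "auto" means runtime PM is allowed for the device, while "on" means the device is kept active.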

This runtime PM enablement did also impact my AX200 Bluetooth device, resulting in long delays before headphones/speaker can connect, but I've solved this with btusb.enable_autosuspend=N. I think this has nothing to do with the original issue, and I'm OK with this workaround unless someone has got a different idea.

With that, please consider either reverting 14a51fa544225 from the stable kernel, or applying a5d6264b638e in addition to it. Given that the mainline kernel has both of them, I'm in favour of applying the additional commit to the stable kernel.

I'm also Cc'ing all the people from our Mastodon discussion where I initially complained about the issue as well as about stable kernel branch stability:

https://activitypub.natalenko.name/@oleksandr/statuses/01HFRXBYWMXF9G4KYPE3XHH0S8

I'm not going to expand more on that in this email, especially given Greg indicated he read the conversation, but I'm open to continuing this discussion as I still think that the current workflow causes visible issues for ordinary users, and hence some adjustments should be made.

Thank you.

-- 
Oleksandr Natalenko (post-factum)


* Re: [REGRESSION] USB ports do not work after suspend/resume cycle with v6.6.2
From: Greg Kroah-Hartman @ 2023-11-24 11:43 UTC
  To: Oleksandr Natalenko
  Cc: linux-kernel, linux-usb, stable, Mathias Nyman, Philipp Zabel,
	Basavaraj Natikar, Mario Limonciello, Sasha Levin,
	Linus Torvalds, Thorsten Leemhuis, Petr Tesarik,
	Krzysztof Kozlowski, Javier Martinez Canillas, Vlastimil Babka

On Thu, Nov 23, 2023 at 07:20:46PM +0100, Oleksandr Natalenko wrote:
> Hello.
> 
> Since v6.6.2 kernel release I'm experiencing a regression with regard to USB ports behaviour after a suspend/resume cycle.
> 
> If a USB port is empty before suspending, after resuming the machine the port doesn't work. After a device insertion there's no reaction in the kernel log whatsoever, although I do see that the device gets powered up physically. If the machine is suspended with a device inserted into the USB port, the port works fine after resume.
> 
> This is an AMD-based machine with hci version 0x110 reported. As per the changelog between v6.6.1 and v6.6.2, 603 commits were backported into v6.6.2, and one of the commits was as follows:
> 
> $ git log --oneline v6.6.1..v6.6.2 -- drivers/usb/host/xhci-pci.c
> 14a51fa544225 xhci: Loosen RPM as default policy to cover for AMD xHC 1.1
> 
> It seems that this commit explicitly enables runtime PM specifically for my platform. As per dmesg:
> 
> v6.6.1: quirks 0x0000000000000410
> v6.6.2: quirks 0x0000000200000410
> 
> Here, bit 33 gets set, which, as expected, corresponds to:
> 
> drivers/usb/host/xhci.h
> 1895:#define XHCI_DEFAULT_PM_RUNTIME_ALLOW      BIT_ULL(33)
> 
> This commit is backported from the upstream commit 4baf12181509, which is one of 16 commits of the following series named "xhci features":
> 
> https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/
> 
> It appears that there was another commit in this series, also from Basavaraj (in Cc), a5d6264b638e, which was not picked for v6.6.2, but which stated the following:
> 
> 	Use the low-power states of the underlying platform to enable runtime PM.
> 	If the platform doesn't support runtime D3, then enabling default RPM will
> 	result in the controller malfunctioning, as in the case of hotplug devices
> 	not being detected because of a failed interrupt generation.
> 
> It felt like this was exactly my case. So, I've conducted two tests:
> 
> 1. Reverted 14a51fa544225 from v6.6.2. With this revert the USB ports started to work fine, just as they did in v6.6.1.
> 2. Left 14a51fa544225 in place, but also applied upstream a5d6264b638e on top of v6.6.2. With this patch added the USB ports also work after a suspend/resume cycle.
> 
> This runtime PM enablement did also impact my AX200 Bluetooth device, resulting in long delays before headphones/speaker can connect, but I've solved this with btusb.enable_autosuspend=N. I think this has nothing to do with the original issue, and I'm OK with this workaround unless someone has got a different idea.
> 
> With that, please consider either reverting 14a51fa544225 from the stable kernel, or applying a5d6264b638e in addition to it. Given the mainline kernel has got both of them, I'm in favour of applying additional commit to the stable kernel.

I've applied this other commit as well to all of the affected branches,
thanks for letting us know.

> I'm also Cc'ing all the people from our Mastodon discussion where I initially complained about the issue as well as about stable kernel branch stability:
> 
> https://activitypub.natalenko.name/@oleksandr/statuses/01HFRXBYWMXF9G4KYPE3XHH0S8
> 
> I'm not going to expand more on that in this email, especially given Greg indicated he read the conversation, but I'm open to continuing this discussion as I still think that current workflow brings visible issues to ordinary users, and hence some adjustments should be made.

What type of adjustments exactly?  Testing on wide ranges of systems is
pretty hard, and this patch explicitly was set to be backported when it
hit Linus's tree, it just looks like someone forgot to mark the
follow-up patch that you found also to be properly backported.

We will always make mistakes, we are only human.  The best thing to do
is if we get notified quickly of issues, like you did here, and work to
resolve them, as we have done here.  So again, thanks for letting us
know about the problem, and be sure to let us know of any future issues
you might find as well.

Remember, hardware is messy, and the kernel's job is to fix hardware
issues and quirks in it.  Sometimes we get it wrong as we are trying to
fix up inconsistencies and they cause other problems, so in the end, we
can only grumble at the hardware companies for stuff like this.  Be
patient with those of us who have to deal with this mess :)

thanks,

greg k-h


* Re: [REGRESSION] USB ports do not work after suspend/resume cycle with v6.6.2
From: Vlastimil Babka @ 2023-11-24 12:59 UTC
  To: Greg Kroah-Hartman, Oleksandr Natalenko
  Cc: linux-kernel, linux-usb, stable, Mathias Nyman, Philipp Zabel,
	Basavaraj Natikar, Mario Limonciello, Sasha Levin,
	Linus Torvalds, Thorsten Leemhuis, Petr Tesarik,
	Krzysztof Kozlowski, Javier Martinez Canillas, workflows

+Cc workflows

On 11/24/23 12:43, Greg Kroah-Hartman wrote:
> On Thu, Nov 23, 2023 at 07:20:46PM +0100, Oleksandr Natalenko wrote:
>> Hello.
>> 
>> Since v6.6.2 kernel release I'm experiencing a regression with regard
>> to USB ports behaviour after a suspend/resume cycle.
>> 
>> If a USB port is empty before suspending, after resuming the machine
>> the port doesn't work. After a device insertion there's no reaction in
>> the kernel log whatsoever, although I do see that the device gets
>> powered up physically. If the machine is suspended with a device
>> inserted into the USB port, the port works fine after resume.
>> 
>> This is an AMD-based machine with hci version 0x110 reported. As per
>> the changelog between v6.6.1 and v6.6.2, 603 commits were backported
>> into v6.6.2, and one of the commits was as follows:
>> 
>> $ git log --oneline v6.6.1..v6.6.2 -- drivers/usb/host/xhci-pci.c
>> 14a51fa544225 xhci: Loosen RPM as default policy to cover for AMD xHC 1.1
>> 
>> It seems that this commit explicitly enables runtime PM specifically
>> for my platform. As per dmesg:
>> 
>> v6.6.1: quirks 0x0000000000000410
>> v6.6.2: quirks 0x0000000200000410
>> 
>> Here, bit 33 gets set, which, as expected, corresponds to:
>> 
>> drivers/usb/host/xhci.h
>> 1895:#define XHCI_DEFAULT_PM_RUNTIME_ALLOW      BIT_ULL(33)
>> 
>> This commit is backported from the upstream commit 4baf12181509, which
>> is one of 16 commits of the following series named "xhci features":
>> 
>> https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/
>>
>>  It appears that there was another commit in this series, also from
>> Basavaraj (in Cc), a5d6264b638e, which was not picked for v6.6.2, but
>> which stated the following:
>> 
>> Use the low-power states of the underlying platform to enable runtime
>> PM. If the platform doesn't support runtime D3, then enabling default
>> RPM will result in the controller malfunctioning, as in the case of
>> hotplug devices not being detected because of a failed interrupt
>> generation.
>> 
>> It felt like this was exactly my case. So, I've conducted two tests:
>> 
>> 1. Reverted 14a51fa544225 from v6.6.2. With this revert the USB ports
>>    started to work fine, just as they did in v6.6.1.
>> 2. Left 14a51fa544225 in place, but also applied upstream a5d6264b638e
>>    on top of v6.6.2. With this patch added the USB ports also work after
>>    a suspend/resume cycle.
>> 
>> This runtime PM enablement did also impact my AX200 Bluetooth device,
>> resulting in long delays before headphones/speaker can connect, but
>> I've solved this with btusb.enable_autosuspend=N. I think this has
>> nothing to do with the original issue, and I'm OK with this workaround
>> unless someone has got a different idea.
>> 
>> With that, please consider either reverting 14a51fa544225 from the
>> stable kernel, or applying a5d6264b638e in addition to it. Given the
>> mainline kernel has got both of them, I'm in favour of applying
>> additional commit to the stable kernel.
> 
> I've applied this other commit as well to all of the affected branches, 
> thanks for letting us know.
> 
>> I'm also Cc'ing all the people from our Mastodon discussion where I
>> initially complained about the issue as well as about stable kernel
>> branch stability:
>> 
>> https://activitypub.natalenko.name/@oleksandr/statuses/01HFRXBYWMXF9G4KYPE3XHH0S8
>>
>>  I'm not going to expand more on that in this email, especially given
>> Greg indicated he read the conversation, but I'm open to continuing
>> this discussion as I still think that current workflow brings visible
>> issues to ordinary users, and hence some adjustments should be made.
> 
> What type of adjustments exactly?  Testing on wide ranges of systems is
> pretty hard, and this patch explicitly was set to be backported when it
> hit Linus's tree,

Are you sure about that "explicitly was set to be backported" part?
According to Documentation/process/stable-kernel-rules.rst:

> There are three options to submit a change to -stable trees:
> 
>  1. Add a 'stable tag' to the description of a patch you then submit for
>     mainline inclusion.
>  2. Ask the stable team to pick up a patch already mainlined.
>  3. Submit a patch to the stable team that is equivalent to a change already
>     mainlined.

I don't see a stable tag in 4baf12181509 ("xhci: Loosen RPM as default
policy to cover for AMD xHC 1.1"), was it option 2 or 3 then?

Do you mean the Fixes: tag? the docs only say that can replace the "# 3.3.x"
part to determine where backporting should stop, but is not itself an
explicit marking for stable backport?
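
To make the distinction concrete, here is a small Python sketch that tells a stable tag from a Fixes: tag in a commit message. The regular expressions and the function name classify_trailers are illustrative assumptions, and the sample message is made up; it is not the text of 4baf12181509.

```python
import re

# The 'stable tag' described in stable-kernel-rules.rst, with optional
# angle brackets; a trailing "# 3.3.x"-style version note is allowed.
STABLE_TAG = re.compile(r"^Cc:\s*<?stable@vger\.kernel\.org>?", re.I | re.M)
# A Fixes: trailer: the tag followed by an abbreviated or full commit hash.
FIXES_TAG = re.compile(r"^Fixes:\s*[0-9a-f]{8,40}\b", re.I | re.M)


def classify_trailers(message: str) -> dict[str, bool]:
    """Report which of the two tags appear in a commit message."""
    return {
        "stable_tag": STABLE_TAG.search(message) is not None,
        "fixes_tag": FIXES_TAG.search(message) is not None,
    }


# Made-up commit message carrying only a Fixes: trailer.
sample = """example: hypothetical fix

Fixes: 0123456789ab ("example: hypothetical earlier change")
Signed-off-by: A Developer <dev@example.org>
"""
print(classify_trailers(sample))  # {'stable_tag': False, 'fixes_tag': True}
```

A message like the sample would fall outside option 1 of the rules quoted above, since it has no stable tag.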

> it just looks like someone forgot to mark the
> follow-up patch that you found also to be properly backported.
> 
> We will always make mistakes, we are only human.  The best thing to do
> is if we get notified quickly of issues, like you did here, and work to
> resolve them, as we have done here.  So again, thanks for letting us
> know about the problem, and be sure to let us know of any future issues
> you might find as well.
> 
> Remember, hardware is messy, and the kernel's job is to fix hardware
> issues and quirks in it.  Sometimes we get it wrong as we are trying to
> fix up inconsistencies and they cause other problems, so in the end, we
> can only grumble at the hardware companies for stuff like this, be
> patient with those of us who have to deal with this mess :)
> 
> thanks,
> 
> greg k-h



* Re: [REGRESSION] USB ports do not work after suspend/resume cycle with v6.6.2
From: Greg Kroah-Hartman @ 2023-11-24 13:05 UTC
  To: Vlastimil Babka
  Cc: Oleksandr Natalenko, linux-kernel, linux-usb, stable,
	Mathias Nyman, Philipp Zabel, Basavaraj Natikar,
	Mario Limonciello, Sasha Levin, Linus Torvalds,
	Thorsten Leemhuis, Petr Tesarik, Krzysztof Kozlowski,
	Javier Martinez Canillas, workflows

On Fri, Nov 24, 2023 at 01:59:24PM +0100, Vlastimil Babka wrote:
> +Cc workflows
> 
> On 11/24/23 12:43, Greg Kroah-Hartman wrote:
> > On Thu, Nov 23, 2023 at 07:20:46PM +0100, Oleksandr Natalenko wrote:
> >> Hello.
> >> 
> >> Since v6.6.2 kernel release I'm experiencing a regression with regard
> >> to USB ports behaviour after a suspend/resume cycle.
> >> 
> >> If a USB port is empty before suspending, after resuming the machine
> >> the port doesn't work. After a device insertion there's no reaction in
> >> the kernel log whatsoever, although I do see that the device gets
> >> powered up physically. If the machine is suspended with a device
> >> inserted into the USB port, the port works fine after resume.
> >> 
> >> This is an AMD-based machine with hci version 0x110 reported. As per
> >> the changelog between v6.6.1 and v6.6.2, 603 commits were backported
> >> into v6.6.2, and one of the commits was as follows:
> >> 
> >> $ git log --oneline v6.6.1..v6.6.2 -- drivers/usb/host/xhci-pci.c
> >> 14a51fa544225 xhci: Loosen RPM as default policy to cover for AMD xHC 1.1
> >> 
> >> It seems that this commit explicitly enables runtime PM specifically
> >> for my platform. As per dmesg:
> >> 
> >> v6.6.1: quirks 0x0000000000000410
> >> v6.6.2: quirks 0x0000000200000410
> >> 
> >> Here, bit 33 gets set, which, as expected, corresponds to:
> >> 
> >> drivers/usb/host/xhci.h
> >> 1895:#define XHCI_DEFAULT_PM_RUNTIME_ALLOW      BIT_ULL(33)
> >> 
> >> This commit is backported from the upstream commit 4baf12181509, which
> >> is one of 16 commits of the following series named "xhci features":
> >> 
> >> https://lore.kernel.org/all/20231019102924.2797346-1-mathias.nyman@linux.intel.com/
> >>
> >>  It appears that there was another commit in this series, also from
> >> Basavaraj (in Cc), a5d6264b638e, which was not picked for v6.6.2, but
> >> which stated the following:
> >> 
> >> Use the low-power states of the underlying platform to enable runtime
> >> PM. If the platform doesn't support runtime D3, then enabling default
> >> RPM will result in the controller malfunctioning, as in the case of
> >> hotplug devices not being detected because of a failed interrupt
> >> generation.
> >> 
> >> It felt like this was exactly my case. So, I've conducted two tests:
> >> 
> >> 1. Reverted 14a51fa544225 from v6.6.2. With this revert the USB ports
> >>    started to work fine, just as they did in v6.6.1.
> >> 2. Left 14a51fa544225 in place, but also applied upstream a5d6264b638e
> >>    on top of v6.6.2. With this patch added the USB ports also work after
> >>    a suspend/resume cycle.
> >> 
> >> This runtime PM enablement did also impact my AX200 Bluetooth device,
> >> resulting in long delays before headphones/speaker can connect, but
> >> I've solved this with btusb.enable_autosuspend=N. I think this has
> >> nothing to do with the original issue, and I'm OK with this workaround
> >> unless someone has got a different idea.
> >> 
> >> With that, please consider either reverting 14a51fa544225 from the
> >> stable kernel, or applying a5d6264b638e in addition to it. Given the
> >> mainline kernel has got both of them, I'm in favour of applying
> >> additional commit to the stable kernel.
> > 
> > I've applied this other commit as well to all of the affected branches, 
> > thanks for letting us know.
> > 
> >> I'm also Cc'ing all the people from our Mastodon discussion where I
> >> initially complained about the issue as well as about stable kernel
> >> branch stability:
> >> 
> >> https://activitypub.natalenko.name/@oleksandr/statuses/01HFRXBYWMXF9G4KYPE3XHH0S8
> >>
> >>  I'm not going to expand more on that in this email, especially given
> >> Greg indicated he read the conversation, but I'm open to continuing
> >> this discussion as I still think that current workflow brings visible
> >> issues to ordinary users, and hence some adjustments should be made.
> > 
> > What type of adjustments exactly?  Testing on wide ranges of systems is
> > pretty hard, and this patch explicitly was set to be backported when it
> > hit Linus's tree,
> 
> Are you sure about that "explicitly was set to be backported" part?
> According to Documentation/process/stable-kernel-rules.rst:
> 
> > There are three options to submit a change to -stable trees:
> > 
> >  1. Add a 'stable tag' to the description of a patch you then submit for
> >     mainline inclusion.
> >  2. Ask the stable team to pick up a patch already mainlined.
> >  3. Submit a patch to the stable team that is equivalent to a change already
> >     mainlined.
> 
> I don't see a stable tag in 4baf12181509 ("xhci: Loosen RPM as default
> policy to cover for AMD xHC 1.1"), was it option 2 or 3 then?
> 
> Do you mean the Fixes: tag? the docs only say that can replace the "# 3.3.x"
> part to determine where backporting should stop, but is not itself an
> explicit marking for stable backport?

No, I mean the "The subsystem maintainer knew this needed to be added to
the stable trees so they told the stable maintainer to do so."  Now the
fact that I am both people at once, and did so in my own head instead of
writing myself a public email, might not have made this all that obvious :)

thanks,

greg k-h
