Fwd: Waking up from resume locks up on sr device

From: Bagas Sanjaya @ 2023-06-09 11:04 UTC
  To: Rafael J. Wysocki, Len Brown, Pavel Machek, Greg Kroah-Hartman,
	Kees Cook, Tony Luck, Guilherme G. Piccoli, Thorsten Leemhuis,
	James E.J. Bottomley, Martin K. Petersen, Phillip Potter,
	Joe Breuer
  Cc: Linux Power Management, Linux Kernel Mailing List,
	Linux Hardening, Linux Regressions, Linux SCSI

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> I'm running LibreELEC.tv on an x86_64 machine that, following a (kernel) update, now locks up hard while trying to device_resume() => device_lock() on sr 2:0:0:0 (the only sr device in the system).
> 
> Through some digging of my own, I can pretty much isolate the fault in this device_lock() call:
> https://elixir.bootlin.com/linux/v6.3.4/source/drivers/base/power/main.c#L919
> 
> I put an additional debug line exactly before the device_lock(dev) call, like this:
> dev_info(dev, "device_lock() in device_resume()");
> 
> This is the last diagnostic I see; that device_lock() call never returns, i.e. line 920 in main.c is never reached (confirmed via TRACE_RESUME).
> The device, in my case, is printed as sr 2:0:0:0.
> 
> Knowing this, as a workaround, booting with libata.force=3:disable (libata port 3 corresponds to the SATA channel that sr 2:0:0:0 is attached to) allows suspend/resume to work correctly (but the optical drive is not accessible, obviously).
> 
> When resume hangs, the kernel is not _completely_ locked; interestingly, the machine responds to pings and I see the e1000e 'link up' message a couple of seconds after the hanging sr2 device_lock().
> Magic SysRq, however, does NOT work in that state, possibly because not enough of USB has been resumed yet. Resuming devices seems to broadly follow a breadth-first order; I see USB ports getting resumed shortly before the lockup, but no USB (target) devices.
> 
> This is a regression, earlier kernels would work correctly on the exact same hardware. Since it's an 'embedded' type (LibreELEC.tv) install that overwrites its system parts completely on each update, I don't have a clear historical record of kernel versions. From the timeline and my memory, moving from 5.x to 6.x would make sense. Due to the nature of the system, it's somewhat inconvenient for me to try numerous kernel versions blindly for a bisection; I will try to test against some current 5.x soon, however.
> 
> I do hope that this information might already give someone with more background a strong idea about the issue.
> 
> Next, I will try to put debug_show_all_locks() before device_lock(), since I can't use Alt+SysRq+d.

See Bugzilla for the full thread.
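
To make the symptom concrete: in the kernel, device_lock(dev) is simply mutex_lock(&dev->mutex), which has no timeout. If some other context holds the lock of sr 2:0:0:0 and never releases it, device_resume() sleeps forever, which matches the reporter's observation that line 920 is never reached. The sketch below is only a userspace analogy using pthreads, not kernel code, and it does not say who holds the lock in this bug. struct fake_device, probe_device_lock() and the timeout value are hypothetical names chosen for illustration. The idea is to replace the unbounded wait with pthread_mutex_timedlock(), so that a stuck lock gets reported instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for struct device: only the lock is modelled.
 * In the kernel, device_lock(dev) is mutex_lock(&dev->mutex), with no timeout. */
struct fake_device {
	pthread_mutex_t mutex;
};

struct probe {
	struct fake_device *dev;
	long timeout_ms;
	int result;	/* 0 = lock was free, ETIMEDOUT = still held by someone else */
};

/* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME deadline. */
static struct timespec deadline_after_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Bounded version of the "device_lock() in device_resume()" step: give up after
 * timeout_ms instead of blocking forever. */
static void *probe_thread(void *arg)
{
	struct probe *p = arg;
	struct timespec ts = deadline_after_ms(p->timeout_ms);

	p->result = pthread_mutex_timedlock(&p->dev->mutex, &ts);
	if (p->result == 0)
		pthread_mutex_unlock(&p->dev->mutex);	/* only checking, not keeping it */
	return NULL;
}

/* Run the probe on its own thread, so the caller can itself hold the lock and
 * play the role of the unknown owner that never lets go. Returns 0 if the lock
 * could be taken, ETIMEDOUT if it stayed held, -1 if the thread failed to start. */
int probe_device_lock(struct fake_device *dev, long timeout_ms)
{
	struct probe p = { .dev = dev, .timeout_ms = timeout_ms, .result = -1 };
	pthread_t tid;

	if (pthread_create(&tid, NULL, probe_thread, &p) != 0)
		return -1;
	pthread_join(tid, NULL);
	return p.result;
}
```

For example, with the lock free, probe_device_lock(&dev, 50) returns 0; while the calling thread holds dev.mutex, it returns ETIMEDOUT after roughly 50 ms rather than hanging. Inside the kernel, the closest non-blocking check would be mutex_trylock(&dev->mutex), and debug_show_all_locks(), as the reporter plans, is what would actually reveal the holder; this is a debugging idea, not a fix.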

Anyway, I'm adding it to regzbot (with a rough version range, since the reporter
only knows the major kernel version numbers):

#regzbot introduced: v5.0..v6.4-rc5 https://bugzilla.kernel.org/show_bug.cgi?id=217530
#regzbot title: Waking up from resume locks up on SCSI CD/DVD drive

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217530

-- 
An old man doll... just what I always wanted! - Clara
