* 5.11-rc device reordering breaks ThinkPad rmi4 suspend @ 2021-01-11 4:44 Hugh Dickins 2021-01-11 12:57 ` Rafael J. Wysocki 2021-01-11 13:43 ` Thierry Reding 0 siblings, 2 replies; 19+ messages in thread From: Hugh Dickins @ 2021-01-11 4:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Thierry Reding, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, linux-kernel, linux-pm [-- Attachment #1: Type: TEXT/PLAIN, Size: 364 bytes --] Hi Rafael, Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) fails to suspend when running 5.11-rc kernels: bisected to 5b6164d3465f ("driver core: Reorder devices on successful probe"), and reverting that fixes it. dmesg.xz attached, but go ahead and ask me to switch on a debug option to extract further info if that may help. Thanks, Hugh [-- Attachment #2: Type: APPLICATION/x-xz, Size: 14460 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 4:44 5.11-rc device reordering breaks ThinkPad rmi4 suspend Hugh Dickins @ 2021-01-11 12:57 ` Rafael J. Wysocki 2021-01-11 17:20 ` Hugh Dickins 2021-01-11 13:43 ` Thierry Reding 1 sibling, 1 reply; 19+ messages in thread From: Rafael J. Wysocki @ 2021-01-11 12:57 UTC (permalink / raw) To: Hugh Dickins Cc: Rafael J. Wysocki, Thierry Reding, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM Hi Hugh, Thanks for the report! On Mon, Jan 11, 2021 at 5:44 AM Hugh Dickins <hughd@google.com> wrote: > > Hi Rafael, > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > fails to suspend when running 5.11-rc kernels: bisected to > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > me to switch on a debug option to extract further info if that may help. Does the driver abort the suspend transition by returning an error or does something else happen? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 12:57 ` Rafael J. Wysocki @ 2021-01-11 17:20 ` Hugh Dickins 0 siblings, 0 replies; 19+ messages in thread From: Hugh Dickins @ 2021-01-11 17:20 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Hugh Dickins, Thierry Reding, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM On Mon, 11 Jan 2021, Rafael J. Wysocki wrote: > On Mon, Jan 11, 2021 at 5:44 AM Hugh Dickins <hughd@google.com> wrote: > > > > Hi Rafael, > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > fails to suspend when running 5.11-rc kernels: bisected to > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > me to switch on a debug option to extract further info if that may help. > > Does the driver abort the suspend transition by returning an error or > does something else happen? Both. Thierry has pointed to the lines showing failed suspend transition; and I forgot to mention that the touchpad is unresponsive from then on (I might not have noticed the failed suspend without that). But I don't suppose that unresponsiveness is worth worrying about: things went wrong in suspend, so it's not surprising if the driver does not recover well. Thank you both for getting on to this so quickly - but don't worry about getting my touchpad working: I'm glad to see you discussing the wider issues of ordering that this has brought up. Hugh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 4:44 5.11-rc device reordering breaks ThinkPad rmi4 suspend Hugh Dickins 2021-01-11 12:57 ` Rafael J. Wysocki @ 2021-01-11 13:43 ` Thierry Reding 2021-01-11 14:43 ` Rafael J. Wysocki ` (2 more replies) 1 sibling, 3 replies; 19+ messages in thread From: Thierry Reding @ 2021-01-11 13:43 UTC (permalink / raw) To: Hugh Dickins Cc: Rafael J. Wysocki, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, linux-kernel, linux-pm [-- Attachment #1: Type: text/plain, Size: 4328 bytes --] On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > Hi Rafael, > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > fails to suspend when running 5.11-rc kernels: bisected to > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > me to switch on a debug option to extract further info if that may help. Hi Hugh, Quoting what I think are the relevant parts of that log: [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 [ 34.475130] PM: Device 6-002c failed to suspend: error -6 [ 34.475187] PM: Some devices failed to suspend, or early wake event detected [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! [ 34.487935] acpi LNXPOWER:02: Turning OFF [ 34.488707] acpi LNXPOWER:01: Turning OFF [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! [ 34.489669] psmouse: probe of serio2 failed with error -1 [ 34.489882] OOM killer enabled. [ 34.489891] Restarting tasks ... done. [ 34.589183] PM: suspend exit [ 34.589839] PM: suspend entry (s2idle) [ 34.605884] Filesystems sync: 0.017 seconds [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. [ 34.613645] OOM killer disabled. [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 [ 34.653160] PM: Device 6-002c failed to suspend: error -6 [ 34.653174] PM: Some devices failed to suspend, or early wake event detected [ 34.660515] OOM killer enabled. [ 34.660524] Restarting tasks ... [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! [ 34.661591] psmouse: probe of serio2 failed with error -1 [ 34.669469] done. [ 34.748386] PM: suspend exit I think what might be happening here is that the offending patch causes some devices to be reordered in a way different to how they were ordered originally and the rmi4 driver currently depends on that implicit order. Interestingly one of the bugs that the offending patch fixes is similar in the failure mode but for the reverse reason: the implicit order causes suspend/resume to fail. I suspect that the underlying reason here is that rmi4 needs something in order to successfully suspend (i.e. read the IRQ status registers) that has already been suspended where it hadn't prior to the offending patch. It can't be the I2C controller itself that has been suspended, because the parent/child relationship should prevent that from happening. I'm not familiar with how exactly rmi4 works, so I'll have to do some digging to hopefully pinpoint exactly what's going wrong here. In the meantime, it would be useful to know what exactly the I2C hierarchy looks like. For example, what's the I2C controller that the RMI4 device is hooked up to. According to the above, that's I2C bus 6, so you should be able to find out some details about it by inspecting the corresponding sysfs nodes: $ ls -l /sys/class/i2c-adapter/i2c-6/ $ cat /sys/class/i2c-adapter/i2c-6/name $ ls -l /sys/class/i2c-adapter/i2c-6/device/ Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 13:43 ` Thierry Reding @ 2021-01-11 14:43 ` Rafael J. Wysocki 2021-01-11 14:57 ` Rafael J. Wysocki 2021-01-11 17:42 ` Hugh Dickins 2 siblings, 0 replies; 19+ messages in thread From: Rafael J. Wysocki @ 2021-01-11 14:43 UTC (permalink / raw) To: Thierry Reding Cc: Hugh Dickins, Rafael J. Wysocki, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > Hi Rafael, > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > fails to suspend when running 5.11-rc kernels: bisected to > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > me to switch on a debug option to extract further info if that may help. > > Hi Hugh, > > Quoting what I think are the relevant parts of that log: I'm not sure how I overlooked that part of the log. Oh well. > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 This is a transport device read operation failing, but I'm not sure how it is related to suspend. > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. And this is the rmi_write() in rmi_f01_suspend() failing AFAICS. > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 So the call chain is rmi_smb_suspend()->rmi_driver_suspend()->rmi_suspend_functions()->suspend_one_function()->rmi_f01_suspend(). > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > [ 34.487935] acpi LNXPOWER:02: Turning OFF > [ 34.488707] acpi LNXPOWER:01: Turning OFF > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > [ 34.489669] psmouse: probe of serio2 failed with error -1 > [ 34.489882] OOM killer enabled. > [ 34.489891] Restarting tasks ... done. > [ 34.589183] PM: suspend exit > [ 34.589839] PM: suspend entry (s2idle) > [ 34.605884] Filesystems sync: 0.017 seconds > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > [ 34.613645] OOM killer disabled. > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > [ 34.660515] OOM killer enabled. > [ 34.660524] Restarting tasks ... > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > [ 34.661591] psmouse: probe of serio2 failed with error -1 > [ 34.669469] done. > [ 34.748386] PM: suspend exit > > I think what might be happening here is that the offending patch causes > some devices to be reordered in a way different to how they were ordered > originally and the rmi4 driver currently depends on that implicit order. Yes, that's what appears to be happening. > Interestingly one of the bugs that the offending patch fixes is similar > in the failure mode but for the reverse reason: the implicit order > causes suspend/resume to fail. > > I suspect that the underlying reason here is that rmi4 needs something > in order to successfully suspend (i.e. read the IRQ status registers) > that has already been suspended where it hadn't prior to the offending > patch. Definitely, something has been suspended prematurely. > It can't be the I2C controller itself that has been suspended, > because the parent/child relationship should prevent that from > happening. Well, assuming that there is such a parent-child dependency. It looks like there is at least one level of indirection between i2c and the affected device. > I'm not familiar with how exactly rmi4 works, so I'll have to do > some digging to hopefully pinpoint exactly what's going wrong here. > > In the meantime, it would be useful to know what exactly the I2C > hierarchy looks like. For example, what's the I2C controller that the > RMI4 device is hooked up to. According to the above, that's I2C bus 6, > so you should be able to find out some details about it by inspecting > the corresponding sysfs nodes: > > $ ls -l /sys/class/i2c-adapter/i2c-6/ > $ cat /sys/class/i2c-adapter/i2c-6/name > $ ls -l /sys/class/i2c-adapter/i2c-6/device/ > > Thierry ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 13:43 ` Thierry Reding 2021-01-11 14:43 ` Rafael J. Wysocki @ 2021-01-11 14:57 ` Rafael J. Wysocki 2021-01-11 16:12 ` Thierry Reding 2021-01-11 17:42 ` Hugh Dickins 2 siblings, 1 reply; 19+ messages in thread From: Rafael J. Wysocki @ 2021-01-11 14:57 UTC (permalink / raw) To: Thierry Reding Cc: Hugh Dickins, Rafael J. Wysocki, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > Hi Rafael, > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > fails to suspend when running 5.11-rc kernels: bisected to > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > me to switch on a debug option to extract further info if that may help. > > Hi Hugh, > > Quoting what I think are the relevant parts of that log: > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > [ 34.487935] acpi LNXPOWER:02: Turning OFF > [ 34.488707] acpi LNXPOWER:01: Turning OFF > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > [ 34.489669] psmouse: probe of serio2 failed with error -1 > [ 34.489882] OOM killer enabled. > [ 34.489891] Restarting tasks ... done. > [ 34.589183] PM: suspend exit > [ 34.589839] PM: suspend entry (s2idle) > [ 34.605884] Filesystems sync: 0.017 seconds > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > [ 34.613645] OOM killer disabled. > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > [ 34.660515] OOM killer enabled. > [ 34.660524] Restarting tasks ... > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > [ 34.661591] psmouse: probe of serio2 failed with error -1 > [ 34.669469] done. > [ 34.748386] PM: suspend exit > > I think what might be happening here is that the offending patch causes > some devices to be reordered in a way different to how they were ordered > originally and the rmi4 driver currently depends on that implicit order. Actually, the only possible case in which the commit in question can introduce suspend failures like this is when some dependency information is missing and so the reordering causes the ordering to change from the (working) implicit one. > Interestingly one of the bugs that the offending patch fixes is similar > in the failure mode but for the reverse reason: the implicit order > causes suspend/resume to fail. And that happens because some dependency information is missing. So we have failing cases when dependency information is missing, so instead of fixing those we have tried to make the core change the ordering after every successful probe in the hope that this will take care of the problem without introducing new breakage. However, it evidently has introduced new breakage and in order to fix it we need to figure out what dependency information is missing in the failing cases and put that information in, but we may as well do the same for the cases that are failing without the offending change. So why don't we revert the commit in question and do just that? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 14:57 ` Rafael J. Wysocki @ 2021-01-11 16:12 ` Thierry Reding 2021-01-11 16:57 ` Rafael J. Wysocki 0 siblings, 1 reply; 19+ messages in thread From: Thierry Reding @ 2021-01-11 16:12 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Hugh Dickins, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM [-- Attachment #1: Type: text/plain, Size: 7333 bytes --] On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > Hi Rafael, > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > fails to suspend when running 5.11-rc kernels: bisected to > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > me to switch on a debug option to extract further info if that may help. > > > > Hi Hugh, > > > > Quoting what I think are the relevant parts of that log: > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > [ 34.489882] OOM killer enabled. > > [ 34.489891] Restarting tasks ... done. > > [ 34.589183] PM: suspend exit > > [ 34.589839] PM: suspend entry (s2idle) > > [ 34.605884] Filesystems sync: 0.017 seconds > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > [ 34.613645] OOM killer disabled. > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > [ 34.660515] OOM killer enabled. > > [ 34.660524] Restarting tasks ... > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > [ 34.669469] done. > > [ 34.748386] PM: suspend exit > > > > I think what might be happening here is that the offending patch causes > > some devices to be reordered in a way different to how they were ordered > > originally and the rmi4 driver currently depends on that implicit order. > > Actually, the only possible case in which the commit in question can > introduce suspend failures like this is when some dependency > information is missing and so the reordering causes the ordering to > change from the (working) implicit one. > > > Interestingly one of the bugs that the offending patch fixes is similar > > in the failure mode but for the reverse reason: the implicit order > > causes suspend/resume to fail. > > And that happens because some dependency information is missing. > > So we have failing cases when dependency information is missing, so > instead of fixing those we have tried to make the core change the > ordering after every successful probe in the hope that this will take > care of the problem without introducing new breakage. > > However, it evidently has introduced new breakage and in order to fix > it we need to figure out what dependency information is missing in the > failing cases and put that information in, but we may as well do the > same for the cases that are failing without the offending change. > > So why don't we revert the commit in question and do just that? Unfortunately it isn't that easy. In fact, all the dependency information already exists in the case that I cited in 5b6164d3465f ("driver core: Reorder devices on successful probe"), but it's the driver core that suspends/resumes the devices in the wrong order. The reason is because the ACONNECT device depends on the BPMP device (via a power-domains property), but it's also instantiated before the BPMP device (because it is listed earlier in device tree, which is sorted by unit-address first, then alphabetically). BPMP being a CPU non-addressable device it doesn't have a unit-address and hence is listed very late in device tree (by convention). Normally this is would not be a problem because deferred probe would take care of it. But there is one corner-case which happens when the BPMP is built into the kernel (which it usually is, as it provides access to resources necessary for booting, such as clocks and resets) and ACONNECT is built as a loadable module. In that case, BPMP gets probed before ACONNECT and hence when ACONNECT does eventually get loaded, the BPMP is already there, meaning ACONNECT won't defer probe and hence the DPM suspend/resume order is not fixed up by the deferred probe code. And that's precisely what the offending commit addresses. However, the downside is, and we did discuss this during review, that it operates under the (somewhat optimistic) assumption that all the dependency information exists. This is because reordering on successful probe can potentially introduce regressions for dependencies that were previously implicit. So if a system has component B that depends on component A but doesn't model that dependency via some child/parent relationship or an explicit relationship that would be flagged by deferred probe, then this implicit dependency can break by the new reordering on successful probe. I very much suspect that that's exactly what's going on here. This RMI4 device very likely implicitly depends on some other resource getting enabled but doesn't properly model that dependency. If we find out what that dependency is and return -EPROBE_DEFER when that dependency has not probed yet, then deferred probe will automatically take care of ordering everything correctly again (or, in fact, ordering by successful probe will take care of it already because RMI4 would initially fail with -EPROBE_DEFER). Adding Vincent, Jason, Andrew and Lucas (who have recently worked on this driver), perhaps they have some better understanding of what missing dependencies might be causing the above errors. Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 16:12 ` Thierry Reding @ 2021-01-11 16:57 ` Rafael J. Wysocki 2021-01-11 22:44 ` Saravana Kannan 2021-01-12 17:57 ` Thierry Reding 0 siblings, 2 replies; 19+ messages in thread From: Rafael J. Wysocki @ 2021-01-11 16:57 UTC (permalink / raw) To: Thierry Reding Cc: Rafael J. Wysocki, Hugh Dickins, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > Hi Rafael, > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > me to switch on a debug option to extract further info if that may help. > > > > > > Hi Hugh, > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > [ 34.489882] OOM killer enabled. > > > [ 34.489891] Restarting tasks ... done. > > > [ 34.589183] PM: suspend exit > > > [ 34.589839] PM: suspend entry (s2idle) > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > [ 34.613645] OOM killer disabled. > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > [ 34.660515] OOM killer enabled. > > > [ 34.660524] Restarting tasks ... > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > [ 34.669469] done. > > > [ 34.748386] PM: suspend exit > > > > > > I think what might be happening here is that the offending patch causes > > > some devices to be reordered in a way different to how they were ordered > > > originally and the rmi4 driver currently depends on that implicit order. > > > > Actually, the only possible case in which the commit in question can > > introduce suspend failures like this is when some dependency > > information is missing and so the reordering causes the ordering to > > change from the (working) implicit one. > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > in the failure mode but for the reverse reason: the implicit order > > > causes suspend/resume to fail. > > > > And that happens because some dependency information is missing. > > > > So we have failing cases when dependency information is missing, so > > instead of fixing those we have tried to make the core change the > > ordering after every successful probe in the hope that this will take > > care of the problem without introducing new breakage. > > > > However, it evidently has introduced new breakage and in order to fix > > it we need to figure out what dependency information is missing in the > > failing cases and put that information in, but we may as well do the > > same for the cases that are failing without the offending change. > > > > So why don't we revert the commit in question and do just that? > > Unfortunately it isn't that easy. In fact, all the dependency > information already exists in the case that I cited in 5b6164d3465f > ("driver core: Reorder devices on successful probe"), but it's the > driver core that suspends/resumes the devices in the wrong order. > > The reason is because the ACONNECT device depends on the BPMP device > (via a power-domains property), but it's also instantiated before the > BPMP device (because it is listed earlier in device tree, which is > sorted by unit-address first, then alphabetically). BPMP being a CPU > non-addressable device it doesn't have a unit-address and hence is > listed very late in device tree (by convention). Normally this is would > not be a problem because deferred probe would take care of it. But there > is one corner-case which happens when the BPMP is built into the kernel > (which it usually is, as it provides access to resources necessary for > booting, such as clocks and resets) and ACONNECT is built as a loadable > module. In that case, BPMP gets probed before ACONNECT and hence when > ACONNECT does eventually get loaded, the BPMP is already there, meaning > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > fixed up by the deferred probe code. What about using a device link to enforce the right ordering, then? Deferred probing is not a way to ensure the suitable suspend/resume ordering. > And that's precisely what the offending commit addresses. However, the > downside is, and we did discuss this during review, that it operates > under the (somewhat optimistic) assumption that all the dependency > information exists. This is because reordering on successful probe can > potentially introduce regressions for dependencies that were previously > implicit. So if a system has component B that depends on component A but > doesn't model that dependency via some child/parent relationship or an > explicit relationship that would be flagged by deferred probe, Again, deferred probing may not help here. > then this implicit dependency can break by the new reordering on successful probe. > > I very much suspect that that's exactly what's going on here. This RMI4 > device very likely implicitly depends on some other resource getting > enabled but doesn't properly model that dependency. If we find out what > that dependency is and return -EPROBE_DEFER when that dependency has not > probed yet, then deferred probe will automatically take care of ordering > everything correctly again (or, in fact, ordering by successful probe > will take care of it already because RMI4 would initially fail with > -EPROBE_DEFER). > > Adding Vincent, Jason, Andrew and Lucas (who have recently worked on > this driver), perhaps they have some better understanding of what > missing dependencies might be causing the above errors. IMV it is a mistake to believe that deferred probing can get everything right for you in every case, with or without the offending commit. Sometimes you need to tell the core what the right ordering is and that's what device links are for. As it stands today, that commit doesn't improve the situation and it adds overhead and complexity. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 16:57 ` Rafael J. Wysocki @ 2021-01-11 22:44 ` Saravana Kannan 2021-01-11 23:42 ` Hugh Dickins ` (2 more replies) 2021-01-12 17:57 ` Thierry Reding 1 sibling, 3 replies; 19+ messages in thread From: Saravana Kannan @ 2021-01-11 22:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Thierry Reding, Hugh Dickins, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 8:57 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > > Hi Rafael, > > > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > > me to switch on a debug option to extract further info if that may help. > > > > > > > > Hi Hugh, > > > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > > [ 34.489882] OOM killer enabled. > > > > [ 34.489891] Restarting tasks ... done. > > > > [ 34.589183] PM: suspend exit > > > > [ 34.589839] PM: suspend entry (s2idle) > > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > > [ 34.613645] OOM killer disabled. > > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > > [ 34.660515] OOM killer enabled. > > > > [ 34.660524] Restarting tasks ... > > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > > [ 34.669469] done. > > > > [ 34.748386] PM: suspend exit > > > > > > > > I think what might be happening here is that the offending patch causes > > > > some devices to be reordered in a way different to how they were ordered > > > > originally and the rmi4 driver currently depends on that implicit order. > > > > > > Actually, the only possible case in which the commit in question can > > > introduce suspend failures like this is when some dependency > > > information is missing and so the reordering causes the ordering to > > > change from the (working) implicit one. > > > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > > in the failure mode but for the reverse reason: the implicit order > > > > causes suspend/resume to fail. > > > > > > And that happens because some dependency information is missing. > > > > > > So we have failing cases when dependency information is missing, so > > > instead of fixing those we have tried to make the core change the > > > ordering after every successful probe in the hope that this will take > > > care of the problem without introducing new breakage. > > > > > > However, it evidently has introduced new breakage and in order to fix > > > it we need to figure out what dependency information is missing in the > > > failing cases and put that information in, but we may as well do the > > > same for the cases that are failing without the offending change. > > > > > > So why don't we revert the commit in question and do just that? > > > > Unfortunately it isn't that easy. In fact, all the dependency > > information already exists in the case that I cited in 5b6164d3465f > > ("driver core: Reorder devices on successful probe"), but it's the > > driver core that suspends/resumes the devices in the wrong order. > > > > The reason is because the ACONNECT device depends on the BPMP device > > (via a power-domains property), but it's also instantiated before the > > BPMP device (because it is listed earlier in device tree, which is > > sorted by unit-address first, then alphabetically). BPMP being a CPU > > non-addressable device it doesn't have a unit-address and hence is > > listed very late in device tree (by convention). Normally this is would > > not be a problem because deferred probe would take care of it. But there > > is one corner-case which happens when the BPMP is built into the kernel > > (which it usually is, as it provides access to resources necessary for > > booting, such as clocks and resets) and ACONNECT is built as a loadable > > module. In that case, BPMP gets probed before ACONNECT and hence when > > ACONNECT does eventually get loaded, the BPMP is already there, meaning > > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > > fixed up by the deferred probe code. > > What about using a device link to enforce the right ordering, then? > > Deferred probing is not a way to ensure the suitable suspend/resume ordering. Thierry, Can you try booting with fw_devlink=on with this series? It's queued up for 5.12-rc1 https://lore.kernel.org/lkml/20201218031703.3053753-1-saravanak@google.com/ It might solve your issue, but I think your patch still addresses a real issue. > > And that's precisely what the offending commit addresses. However, the > > downside is, and we did discuss this during review, that it operates > > under the (somewhat optimistic) assumption that all the dependency > > information exists. This is because reordering on successful probe can > > potentially introduce regressions for dependencies that were previously > > implicit. So if a system has component B that depends on component A but > > doesn't model that dependency via some child/parent relationship or an > > explicit relationship that would be flagged by deferred probe, > > Again, deferred probing may not help here. > > > then this implicit dependency can break by the new reordering on successful probe. > > > > I very much suspect that that's exactly what's going on here. This RMI4 > > device very likely implicitly depends on some other resource getting > > enabled but doesn't properly model that dependency. If we find out what > > that dependency is and return -EPROBE_DEFER when that dependency has not > > probed yet, then deferred probe will automatically take care of ordering > > everything correctly again (or, in fact, ordering by successful probe > > will take care of it already because RMI4 would initially fail with > > -EPROBE_DEFER). > > > > Adding Vincent, Jason, Andrew and Lucas (who have recently worked on > > this driver), perhaps they have some better understanding of what > > missing dependencies might be causing the above errors. > > IMV it is a mistake to believe that deferred probing can get > everything right for you in every case, with or without the offending > commit. Sometimes you need to tell the core what the right ordering > is and that's what device links are for. IMHO, Thierry's patch is the right way to imply dependencies when device links aren't explicitly calling out dependencies. It's not really depending on deferred probe to imply dependency order. Rather, it's saying that the order in which devices probe is a better way to imply dependency than relying on the order in which devices are added. For Thierry's case, fw_devlink=on might solve his problem, but that's solving the problem by explicitly calling out the dependency (by getting it from DT where the dependency is explicitly called out). For implicit cases, we still need his patch. I wonder how > As it stands today, that commit doesn't improve the situation and it > adds overhead and complexity. I'm okay if we revert it for now, but that doesn't solve the overarching ordering issues though. I happen to have an X1 Carbon (different gen though) lying around and I poked at its /sys folders. None of the devices in the rmi4_smbus are considered the grandchildren of the i2c device. I think the real problem is rmi_register_transport_device() [1] not setting up the parent for any of the new devices it's adding. Hugh, can you try this patch? diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c index 24f31a5c0e04..50a0134b6901 100644 --- a/drivers/input/rmi4/rmi_bus.c +++ b/drivers/input/rmi4/rmi_bus.c @@ -90,6 +90,7 @@ int rmi_register_transport_device(struct rmi_transport_dev *xport) rmi_dev->dev.bus = &rmi_bus_type; rmi_dev->dev.type = &rmi_device_type; + rmi_dev->dev.parent = xport->dev; xport->rmi_dev = rmi_dev; -Saravana [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/input/rmi4/rmi_bus.c#n74 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 22:44 ` Saravana Kannan @ 2021-01-11 23:42 ` Hugh Dickins 2021-01-12 0:14 ` Saravana Kannan 2021-01-12 12:43 ` Rafael J. Wysocki 2021-01-12 17:37 ` Thierry Reding 2 siblings, 1 reply; 19+ messages in thread From: Hugh Dickins @ 2021-01-11 23:42 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Thierry Reding, Hugh Dickins, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, 11 Jan 2021, Saravana Kannan wrote: > > I happen to have an X1 Carbon (different gen though) lying around and > I poked at its /sys folders. None of the devices in the rmi4_smbus are > considered the grandchildren of the i2c device. I think the real > problem is rmi_register_transport_device() [1] not setting up the > parent for any of the new devices it's adding. > > Hugh, can you try this patch? Just tried, but no, this patch does not help; but I bet you're along the right lines, and something as simple will do it. > > diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c > index 24f31a5c0e04..50a0134b6901 100644 > --- a/drivers/input/rmi4/rmi_bus.c > +++ b/drivers/input/rmi4/rmi_bus.c > @@ -90,6 +90,7 @@ int rmi_register_transport_device(struct > rmi_transport_dev *xport) > > rmi_dev->dev.bus = &rmi_bus_type; > rmi_dev->dev.type = &rmi_device_type; > + rmi_dev->dev.parent = xport->dev; > > xport->rmi_dev = rmi_dev; > > -Saravana > > [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/input/rmi4/rmi_bus.c#n74 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 23:42 ` Hugh Dickins @ 2021-01-12 0:14 ` Saravana Kannan 2021-01-12 0:44 ` Hugh Dickins 0 siblings, 1 reply; 19+ messages in thread From: Saravana Kannan @ 2021-01-12 0:14 UTC (permalink / raw) To: Hugh Dickins Cc: Rafael J. Wysocki, Thierry Reding, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 3:42 PM Hugh Dickins <hughd@google.com> wrote: > > On Mon, 11 Jan 2021, Saravana Kannan wrote: > > > > I happen to have an X1 Carbon (different gen though) lying around and > > I poked at its /sys folders. None of the devices in the rmi4_smbus are > > considered the grandchildren of the i2c device. I think the real > > problem is rmi_register_transport_device() [1] not setting up the > > parent for any of the new devices it's adding. > > > > Hugh, can you try this patch? > > Just tried, but no, this patch does not help; but I bet > you're along the right lines, and something as simple will do it. Did you see this patch change the organization of devices under /sys/devices/? The rmi* devices need to be under one of the i2c devices after this patch. Is that not the case? Or is that the case, but you are still seeing suspend/resume issues? -Saravana > > > > > diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c > > index 24f31a5c0e04..50a0134b6901 100644 > > --- a/drivers/input/rmi4/rmi_bus.c > > +++ b/drivers/input/rmi4/rmi_bus.c > > @@ -90,6 +90,7 @@ int rmi_register_transport_device(struct > > rmi_transport_dev *xport) > > > > rmi_dev->dev.bus = &rmi_bus_type; > > rmi_dev->dev.type = &rmi_device_type; > > + rmi_dev->dev.parent = xport->dev; > > > > xport->rmi_dev = rmi_dev; > > > > -Saravana > > > > [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/input/rmi4/rmi_bus.c#n74 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-12 0:14 ` Saravana Kannan @ 2021-01-12 0:44 ` Hugh Dickins 2021-01-12 2:16 ` Saravana Kannan 0 siblings, 1 reply; 19+ messages in thread From: Hugh Dickins @ 2021-01-12 0:44 UTC (permalink / raw) To: Saravana Kannan Cc: Hugh Dickins, Rafael J. Wysocki, Thierry Reding, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, 11 Jan 2021, Saravana Kannan wrote: > On Mon, Jan 11, 2021 at 3:42 PM Hugh Dickins <hughd@google.com> wrote: > > On Mon, 11 Jan 2021, Saravana Kannan wrote: > > > > > > I happen to have an X1 Carbon (different gen though) lying around and > > > I poked at its /sys folders. None of the devices in the rmi4_smbus are > > > considered the grandchildren of the i2c device. I think the real > > > problem is rmi_register_transport_device() [1] not setting up the > > > parent for any of the new devices it's adding. > > > > > > Hugh, can you try this patch? > > > > Just tried, but no, this patch does not help; but I bet > > you're along the right lines, and something as simple will do it. > > Did you see this patch change the organization of devices under /sys/devices/? > The rmi* devices need to be under one of the i2c devices after this > patch. Is that not the case? Or is that the case, but you are still > seeing suspend/resume issues? Now that I look, yes, that patch has moved the directory /sys/devices/rmi4-00 to /sys/devices/pci0000:00/0000:00:1f.4/i2c-6/6-002c/rmi4-00 But I still see the same suspend issues despite that. Hugh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-12 0:44 ` Hugh Dickins @ 2021-01-12 2:16 ` Saravana Kannan 2021-01-12 4:42 ` Hugh Dickins 0 siblings, 1 reply; 19+ messages in thread From: Saravana Kannan @ 2021-01-12 2:16 UTC (permalink / raw) To: Hugh Dickins Cc: Rafael J. Wysocki, Thierry Reding, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 4:44 PM Hugh Dickins <hughd@google.com> wrote: > > On Mon, 11 Jan 2021, Saravana Kannan wrote: > > On Mon, Jan 11, 2021 at 3:42 PM Hugh Dickins <hughd@google.com> wrote: > > > On Mon, 11 Jan 2021, Saravana Kannan wrote: > > > > > > > > I happen to have an X1 Carbon (different gen though) lying around and > > > > I poked at its /sys folders. None of the devices in the rmi4_smbus are > > > > considered the grandchildren of the i2c device. I think the real > > > > problem is rmi_register_transport_device() [1] not setting up the > > > > parent for any of the new devices it's adding. > > > > > > > > Hugh, can you try this patch? > > > > > > Just tried, but no, this patch does not help; but I bet > > > you're along the right lines, and something as simple will do it. > > > > Did you see this patch change the organization of devices under /sys/devices/? > > The rmi* devices need to be under one of the i2c devices after this > > patch. Is that not the case? Or is that the case, but you are still > > seeing suspend/resume issues? > > Now that I look, yes, that patch has moved the directory > /sys/devices/rmi4-00 > to > /sys/devices/pci0000:00/0000:00:1f.4/i2c-6/6-002c/rmi4-00 What about child devices of rmi4-00? Does it still have the rmi4-00.fn* devices as children? I'd think so, but just double checking. > > But I still see the same suspend issues despite that. Can you please get new logs to see if the failure reasons are still the same? I'd think this parent/child relationship would at least avoid the "Failed to read irqs" errors that seem to be due to I2C dependency. -Saravana ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-12 2:16 ` Saravana Kannan @ 2021-01-12 4:42 ` Hugh Dickins 0 siblings, 0 replies; 19+ messages in thread From: Hugh Dickins @ 2021-01-12 4:42 UTC (permalink / raw) To: Saravana Kannan Cc: Hugh Dickins, Rafael J. Wysocki, Thierry Reding, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM [-- Attachment #1: Type: TEXT/PLAIN, Size: 2196 bytes --] On Mon, 11 Jan 2021, Saravana Kannan wrote: > On Mon, Jan 11, 2021 at 4:44 PM Hugh Dickins <hughd@google.com> wrote: > > On Mon, 11 Jan 2021, Saravana Kannan wrote: > > > > > > Did you see this patch change the organization of devices under /sys/devices/? > > > The rmi* devices need to be under one of the i2c devices after this > > > patch. Is that not the case? Or is that the case, but you are still > > > seeing suspend/resume issues? > > > > Now that I look, yes, that patch has moved the directory > > /sys/devices/rmi4-00 > > to > > /sys/devices/pci0000:00/0000:00:1f.4/i2c-6/6-002c/rmi4-00 > > What about child devices of rmi4-00? Does it still have the > rmi4-00.fn* devices as children? I'd think so, but just double > checking. Yes, the patch moved the rmi4-00 directory and its contents. > > > > > But I still see the same suspend issues despite that. > > Can you please get new logs to see if the failure reasons are still > the same? I'd think this parent/child relationship would at least > avoid the "Failed to read irqs" errors that seem to be due to I2C > dependency. No, it did not avoid the "Failed to read irqs" error (though my recollection from earlier failures before I mailed out, is that that particular error is intermittent: sometimes it showed up, other times not; but always the "Failed to write sleep mode"). I configured CONFIG_PM_DEBUG=y and booted with pm_debug_messages this time, dmesgsys.tar attached, contents: dmesg.rc3 # dmesg of boot and attempt to suspend on 5.11-rc3 sys.rc3 # find /sys | sort | grep -v /sys/fs/cgroup afterwards dmesg.saravana # dmesg of boot and attempt to suspend with your patch sys.saravana # find /sys | sort | grep -v /sys/fs/cgroup afterwards dmesg.revert # dmesg of boot+suspend+resume, rc3 without 5b6164d3465f sys.revert # find /sys | sort | grep -v /sys/fs/cgroup afterwards Not as many debug messages as I was expecting: perhaps you can point me to something else to tune to get more info out? And perhaps it was a mistake to snapshot the /sys hierarchy after rather than before: I see now that it does make some difference. I filtered out /sys/fs/cgroup because that enlarged the diffs with no relevance. Hugh [-- Attachment #2: Type: APPLICATION/x-tar, Size: 99912 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 22:44 ` Saravana Kannan 2021-01-11 23:42 ` Hugh Dickins @ 2021-01-12 12:43 ` Rafael J. Wysocki 2021-01-12 17:37 ` Thierry Reding 2 siblings, 0 replies; 19+ messages in thread From: Rafael J. Wysocki @ 2021-01-12 12:43 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Thierry Reding, Hugh Dickins, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM On Mon, Jan 11, 2021 at 11:44 PM Saravana Kannan <saravanak@google.com> wrote: > > On Mon, Jan 11, 2021 at 8:57 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > > > Hi Rafael, > > > > > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > > > me to switch on a debug option to extract further info if that may help. > > > > > > > > > > Hi Hugh, > > > > > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.489882] OOM killer enabled. > > > > > [ 34.489891] Restarting tasks ... done. > > > > > [ 34.589183] PM: suspend exit > > > > > [ 34.589839] PM: suspend entry (s2idle) > > > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > > > [ 34.613645] OOM killer disabled. > > > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.660515] OOM killer enabled. > > > > > [ 34.660524] Restarting tasks ... > > > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.669469] done. > > > > > [ 34.748386] PM: suspend exit > > > > > > > > > > I think what might be happening here is that the offending patch causes > > > > > some devices to be reordered in a way different to how they were ordered > > > > > originally and the rmi4 driver currently depends on that implicit order. > > > > > > > > Actually, the only possible case in which the commit in question can > > > > introduce suspend failures like this is when some dependency > > > > information is missing and so the reordering causes the ordering to > > > > change from the (working) implicit one. > > > > > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > > > in the failure mode but for the reverse reason: the implicit order > > > > > causes suspend/resume to fail. > > > > > > > > And that happens because some dependency information is missing. > > > > > > > > So we have failing cases when dependency information is missing, so > > > > instead of fixing those we have tried to make the core change the > > > > ordering after every successful probe in the hope that this will take > > > > care of the problem without introducing new breakage. > > > > > > > > However, it evidently has introduced new breakage and in order to fix > > > > it we need to figure out what dependency information is missing in the > > > > failing cases and put that information in, but we may as well do the > > > > same for the cases that are failing without the offending change. > > > > > > > > So why don't we revert the commit in question and do just that? > > > > > > Unfortunately it isn't that easy. In fact, all the dependency > > > information already exists in the case that I cited in 5b6164d3465f > > > ("driver core: Reorder devices on successful probe"), but it's the > > > driver core that suspends/resumes the devices in the wrong order. > > > > > > The reason is because the ACONNECT device depends on the BPMP device > > > (via a power-domains property), but it's also instantiated before the > > > BPMP device (because it is listed earlier in device tree, which is > > > sorted by unit-address first, then alphabetically). BPMP being a CPU > > > non-addressable device it doesn't have a unit-address and hence is > > > listed very late in device tree (by convention). Normally this is would > > > not be a problem because deferred probe would take care of it. But there > > > is one corner-case which happens when the BPMP is built into the kernel > > > (which it usually is, as it provides access to resources necessary for > > > booting, such as clocks and resets) and ACONNECT is built as a loadable > > > module. In that case, BPMP gets probed before ACONNECT and hence when > > > ACONNECT does eventually get loaded, the BPMP is already there, meaning > > > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > > > fixed up by the deferred probe code. > > > > What about using a device link to enforce the right ordering, then? > > > > Deferred probing is not a way to ensure the suitable suspend/resume ordering. > > Thierry, > > Can you try booting with fw_devlink=on with this series? It's queued > up for 5.12-rc1 > https://lore.kernel.org/lkml/20201218031703.3053753-1-saravanak@google.com/ > > It might solve your issue, but I think your patch still addresses a real issue. > > > > And that's precisely what the offending commit addresses. However, the > > > downside is, and we did discuss this during review, that it operates > > > under the (somewhat optimistic) assumption that all the dependency > > > information exists. This is because reordering on successful probe can > > > potentially introduce regressions for dependencies that were previously > > > implicit. So if a system has component B that depends on component A but > > > doesn't model that dependency via some child/parent relationship or an > > > explicit relationship that would be flagged by deferred probe, > > > > Again, deferred probing may not help here. > > > > > then this implicit dependency can break by the new reordering on successful probe. > > > > > > I very much suspect that that's exactly what's going on here. This RMI4 > > > device very likely implicitly depends on some other resource getting > > > enabled but doesn't properly model that dependency. If we find out what > > > that dependency is and return -EPROBE_DEFER when that dependency has not > > > probed yet, then deferred probe will automatically take care of ordering > > > everything correctly again (or, in fact, ordering by successful probe > > > will take care of it already because RMI4 would initially fail with > > > -EPROBE_DEFER). > > > > > > Adding Vincent, Jason, Andrew and Lucas (who have recently worked on > > > this driver), perhaps they have some better understanding of what > > > missing dependencies might be causing the above errors. > > > > IMV it is a mistake to believe that deferred probing can get > > everything right for you in every case, with or without the offending > > commit. Sometimes you need to tell the core what the right ordering > > is and that's what device links are for. > > IMHO, Thierry's patch is the right way to imply dependencies when > device links aren't explicitly calling out dependencies. It's not > really depending on deferred probe to imply dependency order. Rather, > it's saying that the order in which devices probe is a better way to > imply dependency than relying on the order in which devices are added. Well, it breaks existing setups. Moreover, I'm not really convinced that it is a better way to "imply dependencies", it is just different. > For Thierry's case, fw_devlink=on might solve his problem, but that's > solving the problem by explicitly calling out the dependency (by > getting it from DT where the dependency is explicitly called out). For > implicit cases, we still need his patch. I wonder how The "implicit" cases are all broken potentially, because the dpm_list ordering only matters for the order in which PM sleep callbacks are invoked, and that doesn't help if they are async and it is meaningless for runtime PM. > > As it stands today, that commit doesn't improve the situation and it > > adds overhead and complexity. > > I'm okay if we revert it for now, but that doesn't solve the > overarching ordering issues though. No, it doesn't and the commit doesn't solve it either which is my point. The situation before and after the commit is generally the same, even though the set of affected systems is different in each case, so because the general strategy of the kernel development is to avoid new breakage, it should be reverted. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 22:44 ` Saravana Kannan 2021-01-11 23:42 ` Hugh Dickins 2021-01-12 12:43 ` Rafael J. Wysocki @ 2021-01-12 17:37 ` Thierry Reding 2 siblings, 0 replies; 19+ messages in thread From: Thierry Reding @ 2021-01-12 17:37 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Hugh Dickins, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM [-- Attachment #1: Type: text/plain, Size: 11688 bytes --] On Mon, Jan 11, 2021 at 02:44:03PM -0800, Saravana Kannan wrote: > On Mon, Jan 11, 2021 at 8:57 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > > > Hi Rafael, > > > > > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > > > me to switch on a debug option to extract further info if that may help. > > > > > > > > > > Hi Hugh, > > > > > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.489882] OOM killer enabled. > > > > > [ 34.489891] Restarting tasks ... done. > > > > > [ 34.589183] PM: suspend exit > > > > > [ 34.589839] PM: suspend entry (s2idle) > > > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > > > [ 34.613645] OOM killer disabled. > > > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.660515] OOM killer enabled. > > > > > [ 34.660524] Restarting tasks ... > > > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.669469] done. > > > > > [ 34.748386] PM: suspend exit > > > > > > > > > > I think what might be happening here is that the offending patch causes > > > > > some devices to be reordered in a way different to how they were ordered > > > > > originally and the rmi4 driver currently depends on that implicit order. > > > > > > > > Actually, the only possible case in which the commit in question can > > > > introduce suspend failures like this is when some dependency > > > > information is missing and so the reordering causes the ordering to > > > > change from the (working) implicit one. > > > > > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > > > in the failure mode but for the reverse reason: the implicit order > > > > > causes suspend/resume to fail. > > > > > > > > And that happens because some dependency information is missing. > > > > > > > > So we have failing cases when dependency information is missing, so > > > > instead of fixing those we have tried to make the core change the > > > > ordering after every successful probe in the hope that this will take > > > > care of the problem without introducing new breakage. > > > > > > > > However, it evidently has introduced new breakage and in order to fix > > > > it we need to figure out what dependency information is missing in the > > > > failing cases and put that information in, but we may as well do the > > > > same for the cases that are failing without the offending change. > > > > > > > > So why don't we revert the commit in question and do just that? > > > > > > Unfortunately it isn't that easy. In fact, all the dependency > > > information already exists in the case that I cited in 5b6164d3465f > > > ("driver core: Reorder devices on successful probe"), but it's the > > > driver core that suspends/resumes the devices in the wrong order. > > > > > > The reason is because the ACONNECT device depends on the BPMP device > > > (via a power-domains property), but it's also instantiated before the > > > BPMP device (because it is listed earlier in device tree, which is > > > sorted by unit-address first, then alphabetically). BPMP being a CPU > > > non-addressable device it doesn't have a unit-address and hence is > > > listed very late in device tree (by convention). Normally this is would > > > not be a problem because deferred probe would take care of it. But there > > > is one corner-case which happens when the BPMP is built into the kernel > > > (which it usually is, as it provides access to resources necessary for > > > booting, such as clocks and resets) and ACONNECT is built as a loadable > > > module. In that case, BPMP gets probed before ACONNECT and hence when > > > ACONNECT does eventually get loaded, the BPMP is already there, meaning > > > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > > > fixed up by the deferred probe code. > > > > What about using a device link to enforce the right ordering, then? > > > > Deferred probing is not a way to ensure the suitable suspend/resume ordering. > > Thierry, > > Can you try booting with fw_devlink=on with this series? It's queued > up for 5.12-rc1 > https://lore.kernel.org/lkml/20201218031703.3053753-1-saravanak@google.com/ > > It might solve your issue, but I think your patch still addresses a real issue. I've tried booting next-20210112 on the Jetson AGX Xavier device where the suspend/resume problem with ACONNECT was happening and even if I revert the reordering patch, ACONNECT resumes correctly. Interestingly, and I don't know if that's expected or not, even setting fw_devlink=off on today's linux-next no longer exhibits the problem. My impression from going through the fw_devlink code was that switching it off would disable any reordering that was going on, so I'm not sure I understand why the initial problem is fixed even in that case. > > > And that's precisely what the offending commit addresses. However, the > > > downside is, and we did discuss this during review, that it operates > > > under the (somewhat optimistic) assumption that all the dependency > > > information exists. This is because reordering on successful probe can > > > potentially introduce regressions for dependencies that were previously > > > implicit. So if a system has component B that depends on component A but > > > doesn't model that dependency via some child/parent relationship or an > > > explicit relationship that would be flagged by deferred probe, > > > > Again, deferred probing may not help here. > > > > > then this implicit dependency can break by the new reordering on successful probe. > > > > > > I very much suspect that that's exactly what's going on here. This RMI4 > > > device very likely implicitly depends on some other resource getting > > > enabled but doesn't properly model that dependency. If we find out what > > > that dependency is and return -EPROBE_DEFER when that dependency has not > > > probed yet, then deferred probe will automatically take care of ordering > > > everything correctly again (or, in fact, ordering by successful probe > > > will take care of it already because RMI4 would initially fail with > > > -EPROBE_DEFER). > > > > > > Adding Vincent, Jason, Andrew and Lucas (who have recently worked on > > > this driver), perhaps they have some better understanding of what > > > missing dependencies might be causing the above errors. > > > > IMV it is a mistake to believe that deferred probing can get > > everything right for you in every case, with or without the offending > > commit. Sometimes you need to tell the core what the right ordering > > is and that's what device links are for. > > IMHO, Thierry's patch is the right way to imply dependencies when > device links aren't explicitly calling out dependencies. It's not > really depending on deferred probe to imply dependency order. Rather, > it's saying that the order in which devices probe is a better way to > imply dependency than relying on the order in which devices are added. > > For Thierry's case, fw_devlink=on might solve his problem, but that's > solving the problem by explicitly calling out the dependency (by > getting it from DT where the dependency is explicitly called out). For > implicit cases, we still need his patch. I wonder how Yes, I think reordering on probe is logically the right thing to do. However, I also understand Rafael's argument that this can now cause failures that were not there before. I do think that any failures exposed by this patch are in fact bugs caused by missing explicit dependencies that are hidden right now by mere chance because devices are instantiated in an order that makes them work. If for any reason the instantiation order changes these can easily break again. > > As it stands today, that commit doesn't improve the situation and it > > adds overhead and complexity. > > I'm okay if we revert it for now, but that doesn't solve the > overarching ordering issues though. Given that the ACONNECT problem seems to have been solved I don't have any objections to reverting this either. On the other hand, perhaps we can stage this in by tying it to a command-line option? I'm not sure how helpful that would be, because if it isn't a default, people are unlikely to go and test with it enabled and report issues. I suppose it's also not a perfect solution because it doesn't weed out all of the missing dependencies. Besides the implied dependencies that can now get jumbled, I'm sure there are lots of implied dependencies that won't get jumbled and therefore will still go unnoticed. And perhaps that's fine. We only need to fix the bugs that are actually causing problems, right? Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 16:57 ` Rafael J. Wysocki 2021-01-11 22:44 ` Saravana Kannan @ 2021-01-12 17:57 ` Thierry Reding 2021-01-12 20:38 ` Saravana Kannan 1 sibling, 1 reply; 19+ messages in thread From: Thierry Reding @ 2021-01-12 17:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Hugh Dickins, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM [-- Attachment #1: Type: text/plain, Size: 8617 bytes --] On Mon, Jan 11, 2021 at 05:57:17PM +0100, Rafael J. Wysocki wrote: > On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > > Hi Rafael, > > > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > > me to switch on a debug option to extract further info if that may help. > > > > > > > > Hi Hugh, > > > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > > [ 34.489882] OOM killer enabled. > > > > [ 34.489891] Restarting tasks ... done. > > > > [ 34.589183] PM: suspend exit > > > > [ 34.589839] PM: suspend entry (s2idle) > > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > > [ 34.613645] OOM killer disabled. > > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > > [ 34.660515] OOM killer enabled. > > > > [ 34.660524] Restarting tasks ... > > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > > [ 34.669469] done. > > > > [ 34.748386] PM: suspend exit > > > > > > > > I think what might be happening here is that the offending patch causes > > > > some devices to be reordered in a way different to how they were ordered > > > > originally and the rmi4 driver currently depends on that implicit order. > > > > > > Actually, the only possible case in which the commit in question can > > > introduce suspend failures like this is when some dependency > > > information is missing and so the reordering causes the ordering to > > > change from the (working) implicit one. > > > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > > in the failure mode but for the reverse reason: the implicit order > > > > causes suspend/resume to fail. > > > > > > And that happens because some dependency information is missing. > > > > > > So we have failing cases when dependency information is missing, so > > > instead of fixing those we have tried to make the core change the > > > ordering after every successful probe in the hope that this will take > > > care of the problem without introducing new breakage. > > > > > > However, it evidently has introduced new breakage and in order to fix > > > it we need to figure out what dependency information is missing in the > > > failing cases and put that information in, but we may as well do the > > > same for the cases that are failing without the offending change. > > > > > > So why don't we revert the commit in question and do just that? > > > > Unfortunately it isn't that easy. In fact, all the dependency > > information already exists in the case that I cited in 5b6164d3465f > > ("driver core: Reorder devices on successful probe"), but it's the > > driver core that suspends/resumes the devices in the wrong order. > > > > The reason is because the ACONNECT device depends on the BPMP device > > (via a power-domains property), but it's also instantiated before the > > BPMP device (because it is listed earlier in device tree, which is > > sorted by unit-address first, then alphabetically). BPMP being a CPU > > non-addressable device it doesn't have a unit-address and hence is > > listed very late in device tree (by convention). Normally this is would > > not be a problem because deferred probe would take care of it. But there > > is one corner-case which happens when the BPMP is built into the kernel > > (which it usually is, as it provides access to resources necessary for > > booting, such as clocks and resets) and ACONNECT is built as a loadable > > module. In that case, BPMP gets probed before ACONNECT and hence when > > ACONNECT does eventually get loaded, the BPMP is already there, meaning > > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > > fixed up by the deferred probe code. > > What about using a device link to enforce the right ordering, then? I was going to implement that, but then I realized that the specific problem I was facing with ACONNECT had been solved differently in the meantime. I wasn't able to exactly pinpoint what fixed it, but I suspect it might have been some of Saravana's fw_devlink code. It's a bit difficult to find out what exactly changed, because it happened after the offending commit was already merged, so I would have to go through all linux-next releases since early December and revert my patch to find out when the change happened and then bisect which change exactly did it. But yes, using a device link would've done the trick as well. However the idea had been to potentially fix many more subtle cases like the one we faced in ACONNECT at the same time. It's unfortunate that it breaks a bunch of other cases that apparently are also missing dependency information and just happen to work fine with the status quo. > Deferred probing is not a way to ensure the suitable suspend/resume ordering. Well, it is in the majority of cases because deferred probe causes the reordering of the suspend/resume queue. And that all makes sense because suppliers should always be suspended after all their consumers. But that's not what the offending patch was doing. The purpose was to ensure that the default suspend/resume ordering matches the probe order, which is both more logical (though I suppose that can be subjective) and ensures that deferred probing can work properly for all cases. Anyway, this is ultimately just switching out one set of broken cases for another, so might as well revert the offending patch and concentrate on fixing the broken cases one by one as we find them. Like I said, I'm slightly concerned about drivers like rmi4 breaking unexpectedly down the road because some other patch caused the ordering to change, so reverting now may just be putting off the inevitable. But you're obviously right that we shouldn't randomly break working setups, so I'm fine with the revert. Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-12 17:57 ` Thierry Reding @ 2021-01-12 20:38 ` Saravana Kannan 0 siblings, 0 replies; 19+ messages in thread From: Saravana Kannan @ 2021-01-12 20:38 UTC (permalink / raw) To: Thierry Reding Cc: Rafael J. Wysocki, Hugh Dickins, Jonathan Hunter, Greg Kroah-Hartman, Vincent Huang, Jason A. Donenfeld, Andrew Duggan, Lucas Stach, Linux Kernel Mailing List, Linux PM I'm just going to combine my response to the 2-3 emails in this one response. On Tue, Jan 12, 2021 at 9:57 AM Thierry Reding <treding@nvidia.com> wrote: > > On Mon, Jan 11, 2021 at 05:57:17PM +0100, Rafael J. Wysocki wrote: > > On Mon, Jan 11, 2021 at 5:12 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > On Mon, Jan 11, 2021 at 03:57:37PM +0100, Rafael J. Wysocki wrote: > > > > On Mon, Jan 11, 2021 at 2:43 PM Thierry Reding <treding@nvidia.com> wrote: > > > > > > > > > > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > > > Hi Rafael, > > > > > > > > > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > > > > > fails to suspend when running 5.11-rc kernels: bisected to > > > > > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > > > > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > > > > > me to switch on a debug option to extract further info if that may help. > > > > > > > > > > Hi Hugh, > > > > > > > > > > Quoting what I think are the relevant parts of that log: > > > > > > > > > > [ 34.373742] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.429015] rmi4_physical rmi4-00: Failed to read irqs, code=-6 > > > > > [ 34.474973] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.474994] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.475001] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.475105] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.475113] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.475130] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.475187] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.480324] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.480748] rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-6). > > > > > [ 34.481558] rmi4_physical rmi4-00: rmi_driver_clear_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.487935] acpi LNXPOWER:02: Turning OFF > > > > > [ 34.488707] acpi LNXPOWER:01: Turning OFF > > > > > [ 34.489554] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.489669] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.489882] OOM killer enabled. > > > > > [ 34.489891] Restarting tasks ... done. > > > > > [ 34.589183] PM: suspend exit > > > > > [ 34.589839] PM: suspend entry (s2idle) > > > > > [ 34.605884] Filesystems sync: 0.017 seconds > > > > > [ 34.607594] Freezing user space processes ... (elapsed 0.006 seconds) done. > > > > > [ 34.613645] OOM killer disabled. > > > > > [ 34.613650] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > > > > [ 34.615482] printk: Suspending console(s) (use no_console_suspend to debug) > > > > > [ 34.653097] rmi4_f01 rmi4-00.fn01: Failed to write sleep mode: -6. > > > > > [ 34.653108] rmi4_f01 rmi4-00.fn01: Suspend failed with code -6. > > > > > [ 34.653115] rmi4_physical rmi4-00: Failed to suspend functions: -6 > > > > > [ 34.653123] rmi4_smbus 6-002c: Failed to suspend device: -6 > > > > > [ 34.653129] PM: dpm_run_callback(): rmi_smb_suspend+0x0/0x3c returns -6 > > > > > [ 34.653160] PM: Device 6-002c failed to suspend: error -6 > > > > > [ 34.653174] PM: Some devices failed to suspend, or early wake event detected > > > > > [ 34.660515] OOM killer enabled. > > > > > [ 34.660524] Restarting tasks ... > > > > > [ 34.661456] rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts! > > > > > [ 34.661591] psmouse: probe of serio2 failed with error -1 > > > > > [ 34.669469] done. > > > > > [ 34.748386] PM: suspend exit > > > > > > > > > > I think what might be happening here is that the offending patch causes > > > > > some devices to be reordered in a way different to how they were ordered > > > > > originally and the rmi4 driver currently depends on that implicit order. > > > > > > > > Actually, the only possible case in which the commit in question can > > > > introduce suspend failures like this is when some dependency > > > > information is missing and so the reordering causes the ordering to > > > > change from the (working) implicit one. > > > > > > > > > Interestingly one of the bugs that the offending patch fixes is similar > > > > > in the failure mode but for the reverse reason: the implicit order > > > > > causes suspend/resume to fail. > > > > > > > > And that happens because some dependency information is missing. > > > > > > > > So we have failing cases when dependency information is missing, so > > > > instead of fixing those we have tried to make the core change the > > > > ordering after every successful probe in the hope that this will take > > > > care of the problem without introducing new breakage. > > > > > > > > However, it evidently has introduced new breakage and in order to fix > > > > it we need to figure out what dependency information is missing in the > > > > failing cases and put that information in, but we may as well do the > > > > same for the cases that are failing without the offending change. > > > > > > > > So why don't we revert the commit in question and do just that? > > > > > > Unfortunately it isn't that easy. In fact, all the dependency > > > information already exists in the case that I cited in 5b6164d3465f > > > ("driver core: Reorder devices on successful probe"), but it's the > > > driver core that suspends/resumes the devices in the wrong order. > > > > > > The reason is because the ACONNECT device depends on the BPMP device > > > (via a power-domains property), but it's also instantiated before the > > > BPMP device (because it is listed earlier in device tree, which is > > > sorted by unit-address first, then alphabetically). BPMP being a CPU > > > non-addressable device it doesn't have a unit-address and hence is > > > listed very late in device tree (by convention). Normally this is would > > > not be a problem because deferred probe would take care of it. But there > > > is one corner-case which happens when the BPMP is built into the kernel > > > (which it usually is, as it provides access to resources necessary for > > > booting, such as clocks and resets) and ACONNECT is built as a loadable > > > module. In that case, BPMP gets probed before ACONNECT and hence when > > > ACONNECT does eventually get loaded, the BPMP is already there, meaning > > > ACONNECT won't defer probe and hence the DPM suspend/resume order is not > > > fixed up by the deferred probe code. > > > > What about using a device link to enforce the right ordering, then? > > I was going to implement that, but then I realized that the specific > problem I was facing with ACONNECT had been solved differently in the > meantime. I wasn't able to exactly pinpoint what fixed it, but I suspect > it might have been some of Saravana's fw_devlink code. I'd be very surprised if fw_devlink code fixed anything for you when you don't see the issue with fw_devlink=off/permissive. In those two modes, fw_devlink is pretty much a NOP when it comes to probe/suspend ordering. > It's a bit > difficult to find out what exactly changed, because it happened after > the offending commit was already merged, so I would have to go through > all linux-next releases since early December and revert my patch to find > out when the change happened and then bisect which change exactly did > it. > > But yes, using a device link would've done the trick as well. However > the idea had been to potentially fix many more subtle cases like the one > we faced in ACONNECT at the same time. It's unfortunate that it breaks a > bunch of other cases that apparently are also missing dependency > information and just happen to work fine with the status quo. > > > Deferred probing is not a way to ensure the suitable suspend/resume ordering. > > Well, it is in the majority of cases because deferred probe causes the > reordering of the suspend/resume queue. And that all makes sense because > suppliers should always be suspended after all their consumers. > > But that's not what the offending patch was doing. The purpose was to > ensure that the default suspend/resume ordering matches the probe order, > which is both more logical (though I suppose that can be subjective) I agree with you too that the suspend order matching the probe order is more logical than device add order. However, I'm now completely changing my position on the patch (the one that reorder on probe). I think we should revert it because reordering on probe has a bunch of problems. Take this example: 1. Device-A probe() starts running. a. Device-A probe adds two devices - Device-B and Device-C. But incorrectly doesn't mark Device-A as their parents. b. Say the driver for B has already been loaded, so then Device-B's probe() runs. c. Device-B is reordered to the end of list dpm_list (and so are it's children). 4. Device-A probe() completes. 5. Device-A is reordered to the end of dpm_list but device-B and device-C are incorrectly NOT reorder to come after device-A. And this gets worse when async probing is enabled or if a driver-B is registered by another thread during Device-A probe. The same problem also exists for supplier-consumer relationships. A consumer's probe can finish before the supplier's probe. This is because the supplier would have registered with the frameworks before the supplier's probe completes. And that's all that's needed for the consumer's probe to finish. There's no way to fix this without the complete dependency info (parent-child _and_ supplier-consumer info) being shared with the driver core. So, even the reordering of dpm_list after a deferred probe is broken/wrong and it's pretty racy. It just happens to work for what's already working with the current boot timing in those devices. If the complete dependency info was provided, we could delete the reorder after the deferred probe. But that's obviously not going to happen for existing devices and we just have to live with the code being there. So for future issues related to ordering, we should just focus on fixing/adding parent-child relationships (Eg: Hugh's rmi4 driver issue) or adding more device links (Eg: Thierry's issue). As for adding device links, hopefully fw_devlink=on as default will happen soon and at least take care of this for DT based systems. Hopefully someone can add fw_devlink support for ACPI (it's closer to the bottom of my long TODO list). > and > ensures that deferred probing can work properly for all cases. > > Anyway, this is ultimately just switching out one set of broken cases > for another, so might as well revert the offending patch and concentrate > on fixing the broken cases one by one as we find them. Can you see if fw_devlink=on boots for you? If so, that's the fix. If it doesn't, let's fix it to make sure it works for you. In the linux-next tree, it should be limited to driver fixes (no changes needed to the fw_devlink code). Can you make it the default for your test platform (add it to your command line) so that it doesn't break in the future if it's already working? > Like I said, I'm slightly concerned about drivers like rmi4 breaking > unexpectedly down the road I think we should fix these too as we find them. At a minimum the rmi4 driver isn't capturing parent/child relationship correctly. -Saravana > because some other patch caused the ordering > to change, so reverting now may just be putting off the inevitable. But > you're obviously right that we shouldn't randomly break working setups, > so I'm fine with the revert. > > Thierry ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 5.11-rc device reordering breaks ThinkPad rmi4 suspend 2021-01-11 13:43 ` Thierry Reding 2021-01-11 14:43 ` Rafael J. Wysocki 2021-01-11 14:57 ` Rafael J. Wysocki @ 2021-01-11 17:42 ` Hugh Dickins 2 siblings, 0 replies; 19+ messages in thread From: Hugh Dickins @ 2021-01-11 17:42 UTC (permalink / raw) To: Thierry Reding Cc: Hugh Dickins, Rafael J. Wysocki, Jonathan Hunter, Saravana Kannan, Greg Kroah-Hartman, linux-kernel, linux-pm [-- Attachment #1: Type: TEXT/PLAIN, Size: 2068 bytes --] On Mon, 11 Jan 2021, Thierry Reding wrote: > On Sun, Jan 10, 2021 at 08:44:13PM -0800, Hugh Dickins wrote: > > > > Synaptics RMI4 SMBus touchpad on ThinkPad X1 Carbon (5th generation) > > fails to suspend when running 5.11-rc kernels: bisected to > > 5b6164d3465f ("driver core: Reorder devices on successful probe"), > > and reverting that fixes it. dmesg.xz attached, but go ahead and ask > > me to switch on a debug option to extract further info if that may help. ... > > I think what might be happening here is that the offending patch causes > some devices to be reordered in a way different to how they were ordered > originally and the rmi4 driver currently depends on that implicit order. Yes, all that you explained makes good sense, thanks. > I'm not familiar with how exactly rmi4 works, so I'll have to do > some digging to hopefully pinpoint exactly what's going wrong here. > > In the meantime, it would be useful to know what exactly the I2C > hierarchy looks like. For example, what's the I2C controller that the > RMI4 device is hooked up to. According to the above, that's I2C bus 6, > so you should be able to find out some details about it by inspecting > the corresponding sysfs nodes: > > $ ls -l /sys/class/i2c-adapter/i2c-6/ > $ cat /sys/class/i2c-adapter/i2c-6/name > $ ls -l /sys/class/i2c-adapter/i2c-6/device/ That's curious: I don't even have a /sys/class/i2c-adapter directory. (And I did wonder if you meant to say "smbus" rather than "i2c", though I don't have any /sys/class/smbus* either: I have no notion of the relationship between i2c and smbus, but I thought the failing write_block calls were the ones in rmi_smbus.c rather than rmi_i2c.c.) I've attached compressed output of "find /sys/bus /sys/class | sort": /sys/bus looked more relevant than /sys/class, maybe it will help point in the right direction? And in case it's relevant, maybe I should mention that this is a non-modular, all-built-in kernel. But as I said to Rafael, my touchpad can wait: the wider ordering discussion is much more important. Hugh [-- Attachment #2: Type: APPLICATION/x-xz, Size: 4240 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2021-01-12 21:53 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-11 4:44 5.11-rc device reordering breaks ThinkPad rmi4 suspend Hugh Dickins 2021-01-11 12:57 ` Rafael J. Wysocki 2021-01-11 17:20 ` Hugh Dickins 2021-01-11 13:43 ` Thierry Reding 2021-01-11 14:43 ` Rafael J. Wysocki 2021-01-11 14:57 ` Rafael J. Wysocki 2021-01-11 16:12 ` Thierry Reding 2021-01-11 16:57 ` Rafael J. Wysocki 2021-01-11 22:44 ` Saravana Kannan 2021-01-11 23:42 ` Hugh Dickins 2021-01-12 0:14 ` Saravana Kannan 2021-01-12 0:44 ` Hugh Dickins 2021-01-12 2:16 ` Saravana Kannan 2021-01-12 4:42 ` Hugh Dickins 2021-01-12 12:43 ` Rafael J. Wysocki 2021-01-12 17:37 ` Thierry Reding 2021-01-12 17:57 ` Thierry Reding 2021-01-12 20:38 ` Saravana Kannan 2021-01-11 17:42 ` Hugh Dickins
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.