* PM regression in next @ 2018-01-12 0:01 Tony Lindgren 2018-01-12 0:18 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 0:01 UTC (permalink / raw) To: Stephen Rothwell, Andrew Morton Cc: linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki Hi all, I'm seeing a considerable idle power consumption regression in Linux next, with power consumption for my idle test system going to 17.5mW compared to the usual 8mW on my test device. Git bisect points to merge commit e130bc1d00a4 ("Merge branch 'akpm-current/current'") being the first bad commit. I have also verified that commit 70286688e5ad ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is good, and commit e2d7fe89e8ae ("Merge remote-tracking branch 'init_task/init_task'") is good. Any ideas? Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 0:01 PM regression in next Tony Lindgren @ 2018-01-12 0:18 ` Andrew Morton 2018-01-12 0:23 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2018-01-12 0:18 UTC (permalink / raw) To: Tony Lindgren Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki On Thu, 11 Jan 2018 16:01:13 -0800 Tony Lindgren <tony@atomide.com> wrote: > Hi all, > > I'm seeing a considerable idle power consumption regression in > Linux next, with power consumption for my idle test system going > to 17.5mW compared to the usual 8mW on my test device. > > Git bisect points to merge commit e130bc1d00a4 ("Merge branch > 'akpm-current/current'") being the first bad commit. > > I have also verified that commit 70286688e5ad ("ipc/mqueue.c: > have RT tasks queue in by priority in wq_add()") is good, and > commit e2d7fe89e8ae ("Merge remote-tracking branch > 'init_task/init_task'") is good. Do you mean that everything up to and including 70286688e5ad ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is good? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 0:18 ` Andrew Morton @ 2018-01-12 0:23 ` Tony Lindgren 2018-01-12 0:45 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 0:23 UTC (permalink / raw) To: Andrew Morton Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki * Andrew Morton <akpm@linux-foundation.org> [180112 00:18]: > On Thu, 11 Jan 2018 16:01:13 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > Hi all, > > > > I'm seeing a considerable idle power consumption regression in > > Linux next, with power consumption for my idle test system going > > to 17.5mW compared to the usual 8mW on my test device. > > > > Git bisect points to merge commit e130bc1d00a4 ("Merge branch > > 'akpm-current/current'") being the first bad commit. > > > > I have also verified that commit 70286688e5ad ("ipc/mqueue.c: > > have RT tasks queue in by priority in wq_add()") is good, and > > commit e2d7fe89e8ae ("Merge remote-tracking branch > > 'init_task/init_task'") is good. > > Do you mean that everything up to and including 70286688e5ad > ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is > good? Yes I'm not seeing the regression in your branch at commit 70286688e5ad. I'm seeing it only with the merge commit e130bc1d00a4. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 0:23 ` Tony Lindgren @ 2018-01-12 0:45 ` Andrew Morton 2018-01-12 1:20 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2018-01-12 0:45 UTC (permalink / raw) To: Tony Lindgren Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki On Thu, 11 Jan 2018 16:23:22 -0800 Tony Lindgren <tony@atomide.com> wrote: > * Andrew Morton <akpm@linux-foundation.org> [180112 00:18]: > > On Thu, 11 Jan 2018 16:01:13 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > > > Hi all, > > > > > > I'm seeing a considerable idle power consumption regression in > > > Linux next, with power consumption for my idle test system going > > > to 17.5mW compared to the usual 8mW on my test device. > > > > > > Git bisect points to merge commit e130bc1d00a4 ("Merge branch > > > 'akpm-current/current'") being the first bad commit. > > > > > > I have also verified that commit 70286688e5ad ("ipc/mqueue.c: > > > have RT tasks queue in by priority in wq_add()") is good, and > > > commit e2d7fe89e8ae ("Merge remote-tracking branch > > > 'init_task/init_task'") is good. > > > > Do you mean that everything up to and including 70286688e5ad > > ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is > > good? > > Yes I'm not seeing the regression in your branch at commit > 70286688e5ad. I'm seeing it only with the merge commit > e130bc1d00a4. > That's weird. All I'm seeing between 70286688e5ad and end-of-mm is: tools-objtool-makefile-dont-assume-sync-checksh-is-executable.patch ipc-mqueue-add-missing-error-code-in-init_mqueue_fs.patch vfs-remove-might_sleep-from-clear_inode.patch mm-remove-duplicate-includes.patch mm-remove-unneeded-kallsyms-include.patch hrtimer-remove-unneeded-kallsyms-include.patch genirq-remove-unneeded-kallsyms-include.patch mm-memblock-memblock_is_map-region_memory-can-be-boolean.patch lib-lockref-__lockref_is_dead-can-be-boolean.patch kernel-cpuset-current_cpuset_is_being_rebound-can-be-boolean.patch kernel-resource-iomem_is_exclusive-can-be-boolean.patch kernel-module-module_is_live-can-be-boolean.patch kernel-mutex-mutex_is_locked-can-be-boolean.patch crash_dump-is_kdump_kernel-can-be-boolean.patch fix-const-confusion-in-certs-blacklist.patch fix-read-buffer-overflow-in-delta-ipc.patch kasan-rework-kconfig-settings.patch sparc64-ng4-memset-32-bits-overflow.patch lib-crc-ccitt-add-ccitt-false-crc16-variant.patch And I don't see how any of those can cause this. Did anything else change, like context switch rates, interrupt rates, etc? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 0:45 ` Andrew Morton @ 2018-01-12 1:20 ` Tony Lindgren 2018-01-12 1:32 ` Tony Lindgren 2018-01-12 19:00 ` Tony Lindgren 0 siblings, 2 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 1:20 UTC (permalink / raw) To: Andrew Morton Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki * Andrew Morton <akpm@linux-foundation.org> [180112 00:45]: > On Thu, 11 Jan 2018 16:23:22 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > * Andrew Morton <akpm@linux-foundation.org> [180112 00:18]: > > > On Thu, 11 Jan 2018 16:01:13 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > > > > > Hi all, > > > > > > > > I'm seeing a considerable idle power consumption regression in > > > > Linux next, with power consumption for my idle test system going > > > > to 17.5mW compared to the usual 8mW on my test device. > > > > > > > > Git bisect points to merge commit e130bc1d00a4 ("Merge branch > > > > 'akpm-current/current'") being the first bad commit. > > > > > > > > I have also verified that commit 70286688e5ad ("ipc/mqueue.c: > > > > have RT tasks queue in by priority in wq_add()") is good, and > > > > commit e2d7fe89e8ae ("Merge remote-tracking branch > > > > 'init_task/init_task'") is good. > > > > > > Do you mean that everything up to and including 70286688e5ad > > > ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is > > > good? > > > > Yes I'm not seeing the regression in your branch at commit > > 70286688e5ad. I'm seeing it only with the merge commit > > e130bc1d00a4. > > > > That's weird. All I'm seeing between 70286688e5ad and end-of-mm is: > > tools-objtool-makefile-dont-assume-sync-checksh-is-executable.patch > ipc-mqueue-add-missing-error-code-in-init_mqueue_fs.patch > > vfs-remove-might_sleep-from-clear_inode.patch > > mm-remove-duplicate-includes.patch > > mm-remove-unneeded-kallsyms-include.patch > hrtimer-remove-unneeded-kallsyms-include.patch > genirq-remove-unneeded-kallsyms-include.patch > > mm-memblock-memblock_is_map-region_memory-can-be-boolean.patch > lib-lockref-__lockref_is_dead-can-be-boolean.patch > kernel-cpuset-current_cpuset_is_being_rebound-can-be-boolean.patch > kernel-resource-iomem_is_exclusive-can-be-boolean.patch > kernel-module-module_is_live-can-be-boolean.patch > kernel-mutex-mutex_is_locked-can-be-boolean.patch > crash_dump-is_kdump_kernel-can-be-boolean.patch > > fix-const-confusion-in-certs-blacklist.patch > fix-read-buffer-overflow-in-delta-ipc.patch > > kasan-rework-kconfig-settings.patch > > sparc64-ng4-memset-32-bits-overflow.patch > > lib-crc-ccitt-add-ccitt-false-crc16-variant.patch Well there are some changes in merge commit e130bc1d00a4.. > And I don't see how any of those can cause this. Did anything else > change, like context switch rates, interrupt rates, etc? Well I tried to measure suspend power consumption and noticed that system suspend fails too hand hangs the network device: # echo mem > /sys/power/state [ 32.577850] PM: suspend entry (deep) [ 32.582031] PM: Syncing filesystems ... done. [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 32.608398] OOM killer disabled. [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 32.622192] Suspending console(s) (use no_console_suspend to debug) [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 [ 32.653289] PM: Some devices failed to suspend, or early wake event detected [ 32.685455] OOM killer enabled. [ 32.688629] Restarting tasks ... done. [ 32.695983] PM: suspend exit ash: write error: Bad address That too works just fine at commit 70286688e5ad. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 1:20 ` Tony Lindgren @ 2018-01-12 1:32 ` Tony Lindgren 2018-01-12 12:23 ` Rafael J. Wysocki 2018-01-12 19:00 ` Tony Lindgren 1 sibling, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 1:32 UTC (permalink / raw) To: Andrew Morton Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki * Tony Lindgren <tony@atomide.com> [180111 17:20]: > Well I tried to measure suspend power consumption and noticed > that system suspend fails too hand hangs the network device: > > # echo mem > /sys/power/state > [ 32.577850] PM: suspend entry (deep) > [ 32.582031] PM: Syncing filesystems ... done. > [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. > [ 32.608398] OOM killer disabled. > [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. > [ 32.622192] Suspending console(s) (use no_console_suspend to debug) > [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 > [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 > [ 32.653289] PM: Some devices failed to suspend, or early wake event detected > [ 32.685455] OOM killer enabled. > [ 32.688629] Restarting tasks ... done. > [ 32.695983] PM: suspend exit > ash: write error: Bad address > > That too works just fine at commit 70286688e5ad. Suspend fails at commit e2d7fe89e8ae though, so looks like we have two separate issues. I'll try to bisect that separately. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 1:32 ` Tony Lindgren @ 2018-01-12 12:23 ` Rafael J. Wysocki 2018-01-12 12:30 ` Rafael J. Wysocki 0 siblings, 1 reply; 33+ messages in thread From: Rafael J. Wysocki @ 2018-01-12 12:23 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Linux PM On Friday, January 12, 2018 2:32:57 AM CET Tony Lindgren wrote: > * Tony Lindgren <tony@atomide.com> [180111 17:20]: > > Well I tried to measure suspend power consumption and noticed > > that system suspend fails too hand hangs the network device: > > > > # echo mem > /sys/power/state > > [ 32.577850] PM: suspend entry (deep) > > [ 32.582031] PM: Syncing filesystems ... done. > > [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. > > [ 32.608398] OOM killer disabled. > > [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. > > [ 32.622192] Suspending console(s) (use no_console_suspend to debug) > > [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 > > [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 This looks totally bogus. First, "error" should be a negative number and we print it as int. Second, error codes are not in this range anyway. > > [ 32.653289] PM: Some devices failed to suspend, or early wake event detected > > [ 32.685455] OOM killer enabled. > > [ 32.688629] Restarting tasks ... done. > > [ 32.695983] PM: suspend exit > > ash: write error: Bad address > > > > That too works just fine at commit 70286688e5ad. > > Suspend fails at commit e2d7fe89e8ae though, so looks like we > have two separate issues. I'll try to bisect that separately. BTW, can you please CC PM bug reports to linux-pm? That may help sometimes. Thanks, Rafael ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 12:23 ` Rafael J. Wysocki @ 2018-01-12 12:30 ` Rafael J. Wysocki 2018-01-12 13:01 ` Lars-Peter Clausen 0 siblings, 1 reply; 33+ messages in thread From: Rafael J. Wysocki @ 2018-01-12 12:30 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Linux PM On Friday, January 12, 2018 1:23:54 PM CET Rafael J. Wysocki wrote: > On Friday, January 12, 2018 2:32:57 AM CET Tony Lindgren wrote: > > * Tony Lindgren <tony@atomide.com> [180111 17:20]: > > > Well I tried to measure suspend power consumption and noticed > > > that system suspend fails too hand hangs the network device: > > > > > > # echo mem > /sys/power/state > > > [ 32.577850] PM: suspend entry (deep) > > > [ 32.582031] PM: Syncing filesystems ... done. > > > [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. > > > [ 32.608398] OOM killer disabled. > > > [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. > > > [ 32.622192] Suspending console(s) (use no_console_suspend to debug) > > > [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 > > > [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 > > This looks totally bogus. > > First, "error" should be a negative number and we print it as int. > > Second, error codes are not in this range anyway. > > > > [ 32.653289] PM: Some devices failed to suspend, or early wake event detected > > > [ 32.685455] OOM killer enabled. > > > [ 32.688629] Restarting tasks ... done. > > > [ 32.695983] PM: suspend exit > > > ash: write error: Bad address > > > > > > That too works just fine at commit 70286688e5ad. > > > > Suspend fails at commit e2d7fe89e8ae though, so looks like we > > have two separate issues. I'll try to bisect that separately. I guess what may happen is that something started to return positive numbers which confuse things all over when passed along by its callers as error codes. Thanks, Rafael ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 12:30 ` Rafael J. Wysocki @ 2018-01-12 13:01 ` Lars-Peter Clausen 2018-01-12 13:16 ` Andrew Lunn 0 siblings, 1 reply; 33+ messages in thread From: Lars-Peter Clausen @ 2018-01-12 13:01 UTC (permalink / raw) To: Rafael J. Wysocki, Tony Lindgren Cc: Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Linux PM, Geert Uytterhoeven On 01/12/2018 01:30 PM, Rafael J. Wysocki wrote: > On Friday, January 12, 2018 1:23:54 PM CET Rafael J. Wysocki wrote: >> On Friday, January 12, 2018 2:32:57 AM CET Tony Lindgren wrote: >>> * Tony Lindgren <tony@atomide.com> [180111 17:20]: >>>> Well I tried to measure suspend power consumption and noticed >>>> that system suspend fails too hand hangs the network device: >>>> >>>> # echo mem > /sys/power/state >>>> [ 32.577850] PM: suspend entry (deep) >>>> [ 32.582031] PM: Syncing filesystems ... done. >>>> [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. >>>> [ 32.608398] OOM killer disabled. >>>> [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. >>>> [ 32.622192] Suspending console(s) (use no_console_suspend to debug) >>>> [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 >>>> [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 >> >> This looks totally bogus. >> >> First, "error" should be a negative number and we print it as int. >> >> Second, error codes are not in this range anyway. >> >>>> [ 32.653289] PM: Some devices failed to suspend, or early wake event detected >>>> [ 32.685455] OOM killer enabled. >>>> [ 32.688629] Restarting tasks ... done. >>>> [ 32.695983] PM: suspend exit >>>> ash: write error: Bad address >>>> >>>> That too works just fine at commit 70286688e5ad. >>> >>> Suspend fails at commit e2d7fe89e8ae though, so looks like we >>> have two separate issues. I'll try to bisect that separately. > > I guess what may happen is that something started to return positive numbers > which confuse things all over when passed along by its callers as error codes. I guess it this: https://patchwork.kernel.org/patch/10151763/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 13:01 ` Lars-Peter Clausen @ 2018-01-12 13:16 ` Andrew Lunn 2018-01-12 13:52 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Andrew Lunn @ 2018-01-12 13:16 UTC (permalink / raw) To: Lars-Peter Clausen Cc: Rafael J. Wysocki, Tony Lindgren, Stephen Rothwell, Linux PM, linux-kernel, Geert Uytterhoeven, Andrew Morton, linux-omap, linux-arm-kernel On Fri, Jan 12, 2018 at 02:01:14PM +0100, Lars-Peter Clausen wrote: > On 01/12/2018 01:30 PM, Rafael J. Wysocki wrote: > > On Friday, January 12, 2018 1:23:54 PM CET Rafael J. Wysocki wrote: > >> On Friday, January 12, 2018 2:32:57 AM CET Tony Lindgren wrote: > >>> * Tony Lindgren <tony@atomide.com> [180111 17:20]: > >>>> Well I tried to measure suspend power consumption and noticed > >>>> that system suspend fails too hand hangs the network device: > >>>> > >>>> # echo mem > /sys/power/state > >>>> [ 32.577850] PM: suspend entry (deep) > >>>> [ 32.582031] PM: Syncing filesystems ... done. > >>>> [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. > >>>> [ 32.608398] OOM killer disabled. > >>>> [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. > >>>> [ 32.622192] Suspending console(s) (use no_console_suspend to debug) > >>>> [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 > >>>> [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 > >> > >> This looks totally bogus. > >> > >> First, "error" should be a negative number and we print it as int. > >> > >> Second, error codes are not in this range anyway. > >> > >>>> [ 32.653289] PM: Some devices failed to suspend, or early wake event detected > >>>> [ 32.685455] OOM killer enabled. > >>>> [ 32.688629] Restarting tasks ... done. > >>>> [ 32.695983] PM: suspend exit > >>>> ash: write error: Bad address > >>>> > >>>> That too works just fine at commit 70286688e5ad. > >>> > >>> Suspend fails at commit e2d7fe89e8ae though, so looks like we > >>> have two separate issues. I'll try to bisect that separately. > > > > I guess what may happen is that something started to return positive numbers > > which confuse things all over when passed along by its callers as error codes. > > > I guess it this: https://patchwork.kernel.org/patch/10151763/ Hi Tony, Please try: https://patchwork.ozlabs.org/patch/859297/ Hopefully we will get this into net-next soon. Thanks Andrew ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 13:16 ` Andrew Lunn @ 2018-01-12 13:52 ` Tony Lindgren 2018-01-12 13:55 ` Andrew Lunn 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 13:52 UTC (permalink / raw) To: Andrew Lunn Cc: Lars-Peter Clausen, Rafael J. Wysocki, Stephen Rothwell, Linux PM, linux-kernel, Geert Uytterhoeven, Andrew Morton, linux-omap, linux-arm-kernel * Andrew Lunn <andrew@lunn.ch> [180112 13:16]: > On Fri, Jan 12, 2018 at 02:01:14PM +0100, Lars-Peter Clausen wrote: > > On 01/12/2018 01:30 PM, Rafael J. Wysocki wrote: > > > On Friday, January 12, 2018 1:23:54 PM CET Rafael J. Wysocki wrote: > > >> On Friday, January 12, 2018 2:32:57 AM CET Tony Lindgren wrote: > > >>> * Tony Lindgren <tony@atomide.com> [180111 17:20]: > > >>>> Well I tried to measure suspend power consumption and noticed > > >>>> that system suspend fails too hand hangs the network device: > > >>>> > > >>>> # echo mem > /sys/power/state > > >>>> [ 32.577850] PM: suspend entry (deep) > > >>>> [ 32.582031] PM: Syncing filesystems ... done. > > >>>> [ 32.598083] Freezing user space processes ... (elapsed 0.002 seconds) done. > > >>>> [ 32.608398] OOM killer disabled. > > >>>> [ 32.611846] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. > > >>>> [ 32.622192] Suspending console(s) (use no_console_suspend to debug) > > >>>> [ 32.651123] dpm_run_callback(): mdio_bus_suspend+0x0/0x24 returns 4352 > > >>>> [ 32.651428] PM: Device 2c000000.ethernet-ffffffff:01 failed to suspend: error 4352 > > >> > > >> This looks totally bogus. > > >> > > >> First, "error" should be a negative number and we print it as int. > > >> > > >> Second, error codes are not in this range anyway. > > >> > > >>>> [ 32.653289] PM: Some devices failed to suspend, or early wake event detected > > >>>> [ 32.685455] OOM killer enabled. > > >>>> [ 32.688629] Restarting tasks ... done. > > >>>> [ 32.695983] PM: suspend exit > > >>>> ash: write error: Bad address > > >>>> > > >>>> That too works just fine at commit 70286688e5ad. > > >>> > > >>> Suspend fails at commit e2d7fe89e8ae though, so looks like we > > >>> have two separate issues. I'll try to bisect that separately. > > > > > > I guess what may happen is that something started to return positive numbers > > > which confuse things all over when passed along by its callers as error codes. > > > > > > I guess it this: https://patchwork.kernel.org/patch/10151763/ > > Hi Tony, > > Please try: > > https://patchwork.ozlabs.org/patch/859297/ > > Hopefully we will get this into net-next soon. Thanks that fixes the suspend error. And I was able to confirm that the suspend power consumption is OK. That still leaves the mystery of the runtime idle power consumption being much higher with commit e130bc1d00a4. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 13:52 ` Tony Lindgren @ 2018-01-12 13:55 ` Andrew Lunn 2018-01-12 14:14 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Andrew Lunn @ 2018-01-12 13:55 UTC (permalink / raw) To: Tony Lindgren Cc: Lars-Peter Clausen, Rafael J. Wysocki, Stephen Rothwell, Linux PM, linux-kernel, Geert Uytterhoeven, Andrew Morton, linux-omap, linux-arm-kernel > Thanks that fixes the suspend error. And I was able to confirm > that the suspend power consumption is OK. > > That still leaves the mystery of the runtime idle power consumption > being much higher with commit e130bc1d00a4. Did you re-measure the runtime power? Do you have an unused PHY? It could be it is not getting shut down. 1G PHYs can be quite power hungry. Andrew ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 13:55 ` Andrew Lunn @ 2018-01-12 14:14 ` Tony Lindgren 0 siblings, 0 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 14:14 UTC (permalink / raw) To: Andrew Lunn Cc: Lars-Peter Clausen, Rafael J. Wysocki, Stephen Rothwell, Linux PM, linux-kernel, Geert Uytterhoeven, Andrew Morton, linux-omap, linux-arm-kernel * Andrew Lunn <andrew@lunn.ch> [180112 13:55]: > > Thanks that fixes the suspend error. And I was able to confirm > > that the suspend power consumption is OK. > > > > That still leaves the mystery of the runtime idle power consumption > > being much higher with commit e130bc1d00a4. > > Did you re-measure the runtime power? Do you have an unused PHY? It > could be it is not getting shut down. 1G PHYs can be quite power > hungry. Yes this is a different issue, the increase in runtime PM consumption measurement I'm measuring is for a SoM board. It contains the SoC + memory + PMIC and few devices. The Ethernet controller is on a separate optional base board and not related to the runtime PM issue. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 1:20 ` Tony Lindgren 2018-01-12 1:32 ` Tony Lindgren @ 2018-01-12 19:00 ` Tony Lindgren 2018-01-12 19:12 ` Mark Brown 1 sibling, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 19:00 UTC (permalink / raw) To: Andrew Morton, Kuninori Morimoto Cc: Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm, Mark Brown * Tony Lindgren <tony@atomide.com> [180112 01:20]: > * Andrew Morton <akpm@linux-foundation.org> [180112 00:45]: > > On Thu, 11 Jan 2018 16:23:22 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > > > * Andrew Morton <akpm@linux-foundation.org> [180112 00:18]: > > > > On Thu, 11 Jan 2018 16:01:13 -0800 Tony Lindgren <tony@atomide.com> wrote: > > > > > > > > > Hi all, > > > > > > > > > > I'm seeing a considerable idle power consumption regression in > > > > > Linux next, with power consumption for my idle test system going > > > > > to 17.5mW compared to the usual 8mW on my test device. > > > > > > > > > > Git bisect points to merge commit e130bc1d00a4 ("Merge branch > > > > > 'akpm-current/current'") being the first bad commit. > > > > > > > > > > I have also verified that commit 70286688e5ad ("ipc/mqueue.c: > > > > > have RT tasks queue in by priority in wq_add()") is good, and > > > > > commit e2d7fe89e8ae ("Merge remote-tracking branch > > > > > 'init_task/init_task'") is good. > > > > > > > > Do you mean that everything up to and including 70286688e5ad > > > > ("ipc/mqueue.c: have RT tasks queue in by priority in wq_add()") is > > > > good? > > > > > > Yes I'm not seeing the regression in your branch at commit > > > 70286688e5ad. I'm seeing it only with the merge commit > > > e130bc1d00a4. > > > > > > > That's weird. All I'm seeing between 70286688e5ad and end-of-mm is: ... > Well there are some changes in merge commit e130bc1d00a4.. So it seems that the Makefile changes in Linux next merge commit e130bc1d00a4 cause changes with CONFIG_CC_STACKPROTECTOR_AUTO=y. With next-20180112 out now, that's now commit 3823b7cc7a5e ("Merge branch 'akpm-current/current'"). Not sure if that's a bug or not.. Anyways, I reran my "bisect boot wait-idle measure" test with CONFIG_CC_STACKPROTECTOR_STRONG=y selected to rule out the AUTO option, and bisect now points to a different commit that makes more sense for my test case. It's commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030"). And that is for the PMIC on my test system, so adding Kuninori and Mark to the thread :) Kuninori, it seems that commit 3bb0f7c31b1a causes higher power consumption on an idle system on omap3 using twl4030. Reverting 3bb0f7c31b1a makes things behave again. My guess is that twl4030_read does not do the same as snd_soc_read in the driver? Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 19:00 ` Tony Lindgren @ 2018-01-12 19:12 ` Mark Brown 2018-01-12 21:07 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-12 19:12 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 899 bytes --] On Fri, Jan 12, 2018 at 11:00:46AM -0800, Tony Lindgren wrote: > It's commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read > on twl4030"). And that is for the PMIC on my test system, so > adding Kuninori and Mark to the thread :) > Kuninori, it seems that commit 3bb0f7c31b1a causes higher > power consumption on an idle system on omap3 using twl4030. > Reverting 3bb0f7c31b1a makes things behave again. My guess > is that twl4030_read does not do the same as snd_soc_read > in the driver? As far as I can tell it should end up boiling down to the same thing but I didn't follow through in detail, they should both bottom out in twl_i2c_read_u8() if they hit hardware - all snd_soc_read() did was call twl4030_read(), the patch just removes the indirection through assigning the pointer. Could you try deleting the attempt to read from the cache in twl4030_read() and always go to hardware? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 19:12 ` Mark Brown @ 2018-01-12 21:07 ` Tony Lindgren 2018-01-12 21:15 ` Mark Brown 2018-01-12 21:38 ` Mark Brown 0 siblings, 2 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 21:07 UTC (permalink / raw) To: Mark Brown Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180112 19:13]: > On Fri, Jan 12, 2018 at 11:00:46AM -0800, Tony Lindgren wrote: > > > It's commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read > > on twl4030"). And that is for the PMIC on my test system, so > > adding Kuninori and Mark to the thread :) > > > Kuninori, it seems that commit 3bb0f7c31b1a causes higher > > power consumption on an idle system on omap3 using twl4030. > > Reverting 3bb0f7c31b1a makes things behave again. My guess > > is that twl4030_read does not do the same as snd_soc_read > > in the driver? > > As far as I can tell it should end up boiling down to the same thing but > I didn't follow through in detail, they should both bottom out in > twl_i2c_read_u8() if they hit hardware - all snd_soc_read() did was call > twl4030_read(), the patch just removes the indirection through assigning > the pointer. > > Could you try deleting the attempt to read from the cache in > twl4030_read() and always go to hardware? Thanks I tried that, but that's not it. Tturns out just adding back .read = twl4030_read fixes it.. I added a dummy function for read and am now seeing a bunch of reads that now don't happen: (twl4030_dummy_read [snd_soc_twl4030]) from [<bf1cc3b4>] (snd_soc_codec_drv_read+0x1c/0x28 [snd_soc_core]) (snd_soc_codec_drv_read [snd_soc_core]) from [<bf1da1cc>] (snd_soc_dapm_new_widgets+0x29c/0x578 [snd_soc_core]) (snd_soc_dapm_new_widgets [snd_soc_core]) from [<bf1d2f30>] (snd_soc_register_card+0xad0/0xe30 [snd_soc_core]) (snd_soc_register_card [snd_soc_core]) from [<bf1e1850>] (devm_snd_soc_register_card+0x30/0x70 [snd_soc_core]) (devm_snd_soc_register_card [snd_soc_core]) from [<bf234364>] (omap_twl4030_probe+0x100/0x1d0 [snd_soc_omap_twl4030]) (omap_twl4030_probe [snd_soc_omap_twl4030]) from [<c0606660>] (platform_drv_probe+0x50/0xb0) So probably there are other asoc drivers broken too with these kind of patches until snd_soc_codec_drv_write() and snd_soc_codec_drv_read() are fixed? Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 21:07 ` Tony Lindgren @ 2018-01-12 21:15 ` Mark Brown 2018-01-12 21:50 ` Tony Lindgren 2018-01-12 21:38 ` Mark Brown 1 sibling, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-12 21:15 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 1223 bytes --] On Fri, Jan 12, 2018 at 01:07:06PM -0800, Tony Lindgren wrote: > (twl4030_dummy_read [snd_soc_twl4030]) from [<bf1cc3b4>] > (snd_soc_codec_drv_read+0x1c/0x28 [snd_soc_core]) > (snd_soc_codec_drv_read [snd_soc_core]) from [<bf1da1cc>] > (snd_soc_dapm_new_widgets+0x29c/0x578 [snd_soc_core]) > (snd_soc_dapm_new_widgets [snd_soc_core]) from [<bf1d2f30>] > (snd_soc_register_card+0xad0/0xe30 [snd_soc_core]) > (snd_soc_register_card [snd_soc_core]) from [<bf1e1850>] > (devm_snd_soc_register_card+0x30/0x70 [snd_soc_core]) > (devm_snd_soc_register_card [snd_soc_core]) from [<bf234364>] > (omap_twl4030_probe+0x100/0x1d0 [snd_soc_omap_twl4030]) > (omap_twl4030_probe [snd_soc_omap_twl4030]) from [<c0606660>] > (platform_drv_probe+0x50/0xb0) So does the read not happen otherwise? Or is it failing somehow and the error getting ignored? > So probably there are other asoc drivers broken too with these > kind of patches until snd_soc_codec_drv_write() and > snd_soc_codec_drv_read() are fixed? I rather suspect someone would've noticed by now TBH - I suspect this is more likely to be an issue with the rather baroque code that the TWL drivers have possibly coupled with the whole multiple I2C devices issue. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 21:15 ` Mark Brown @ 2018-01-12 21:50 ` Tony Lindgren 2018-01-12 22:11 ` Mark Brown 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 21:50 UTC (permalink / raw) To: Mark Brown Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180112 21:15]: > On Fri, Jan 12, 2018 at 01:07:06PM -0800, Tony Lindgren wrote: > > > (twl4030_dummy_read [snd_soc_twl4030]) from [<bf1cc3b4>] > > (snd_soc_codec_drv_read+0x1c/0x28 [snd_soc_core]) > > (snd_soc_codec_drv_read [snd_soc_core]) from [<bf1da1cc>] > > (snd_soc_dapm_new_widgets+0x29c/0x578 [snd_soc_core]) > > (snd_soc_dapm_new_widgets [snd_soc_core]) from [<bf1d2f30>] > > (snd_soc_register_card+0xad0/0xe30 [snd_soc_core]) > > (snd_soc_register_card [snd_soc_core]) from [<bf1e1850>] > > (devm_snd_soc_register_card+0x30/0x70 [snd_soc_core]) > > (devm_snd_soc_register_card [snd_soc_core]) from [<bf234364>] > > (omap_twl4030_probe+0x100/0x1d0 [snd_soc_omap_twl4030]) > > (omap_twl4030_probe [snd_soc_omap_twl4030]) from [<c0606660>] > > (platform_drv_probe+0x50/0xb0) > > So does the read not happen otherwise? Or is it failing somehow and the > error getting ignored? All the snd_soc_dapm_new_widgets() related reads go missing without the .read being there. Probably the same story for writes. > > So probably there are other asoc drivers broken too with these > > kind of patches until snd_soc_codec_drv_write() and > > snd_soc_codec_drv_read() are fixed? > > I rather suspect someone would've noticed by now TBH - I suspect this is > more likely to be an issue with the rather baroque code that the TWL > drivers have possibly coupled with the whole multiple I2C devices issue. Hmm these calls are in sound/soc/soc-core.c though? But yeah, sure it could be some legacy code issue somewhere.. Based on a quick grep these have .read and .write removed in next so they may all be broken now: sound/soc/codecs/cx20442.c sound/soc/codecs/tlv320dac33.c sound/soc/codecs/twl4030.c sound/soc/codecs/twl6040.c sound/soc/codecs/uda1380.c sound/soc/fsl/fsl_ssi.c Can you confirm that some of these are still working? Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 21:50 ` Tony Lindgren @ 2018-01-12 22:11 ` Mark Brown 2018-01-12 22:49 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-12 22:11 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 1313 bytes --] On Fri, Jan 12, 2018 at 01:50:10PM -0800, Tony Lindgren wrote: > > I rather suspect someone would've noticed by now TBH - I suspect this is > > more likely to be an issue with the rather baroque code that the TWL > > drivers have possibly coupled with the whole multiple I2C devices issue. > Hmm these calls are in sound/soc/soc-core.c though? But yeah, sure > it could be some legacy code issue somewhere.. Most devices have one regmap per device which can be retrieved with dev_get_regmap(), it's the attempt to use that which I suspect is broken. Like I said snd_soc_codec_init_regmap() ought to fix things if that's the issue. > sound/soc/codecs/cx20442.c > sound/soc/codecs/tlv320dac33.c > sound/soc/codecs/twl4030.c > sound/soc/codecs/twl6040.c > sound/soc/codecs/uda1380.c > sound/soc/fsl/fsl_ssi.c > Can you confirm that some of these are still working? Most of the non-TWL ones have no active users I'm aware of but fsl_ssi.c is under active development at the minute, one series was getting some successful testing earlier today so if it's broken I'd hope someone would have noticed although since it doesn't use DAPM or anything snd_soc_read()/write() will never get used. cx20442 was only in a single archaic consumer product, uda1380 has been legacy since before I was ever involved in ASoC. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 22:11 ` Mark Brown @ 2018-01-12 22:49 ` Tony Lindgren 2018-01-12 22:59 ` Mark Brown 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-12 22:49 UTC (permalink / raw) To: Mark Brown, Peter Ujfalusi Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180112 22:11]: > On Fri, Jan 12, 2018 at 01:50:10PM -0800, Tony Lindgren wrote: > > > > I rather suspect someone would've noticed by now TBH - I suspect this is > > > more likely to be an issue with the rather baroque code that the TWL > > > drivers have possibly coupled with the whole multiple I2C devices issue. > > > Hmm these calls are in sound/soc/soc-core.c though? But yeah, sure > > it could be some legacy code issue somewhere.. > > Most devices have one regmap per device which can be retrieved with > dev_get_regmap(), it's the attempt to use that which I suspect is > broken. Like I said snd_soc_codec_init_regmap() ought to fix things if > that's the issue. OK. Adding Peter to loop as it's his driver after all. Not sure how well mixing regmap register access to the same module with cached twl4030_read() would work :) Maybe there should also be some big warning happening if snd_soc_codec_init_regmap() is now needed and no regmap is found? > > sound/soc/codecs/cx20442.c > > sound/soc/codecs/tlv320dac33.c > > sound/soc/codecs/twl4030.c > > sound/soc/codecs/twl6040.c > > sound/soc/codecs/uda1380.c > > sound/soc/fsl/fsl_ssi.c > > > Can you confirm that some of these are still working? > > Most of the non-TWL ones have no active users I'm aware of but fsl_ssi.c > is under active development at the minute, one series was getting some > successful testing earlier today so if it's broken I'd hope someone > would have noticed although since it doesn't use DAPM or anything > snd_soc_read()/write() will never get used. cx20442 was only in a > single archaic consumer product, uda1380 has been legacy since before I > was ever involved in ASoC. OK. sounds like we should just revert the read and write removal patches for now as they are now known to be incomplete for non-regmap cases and cause regressions. That way we can deal with them properly after the merge window and test them. I'm not sure if adding back just the .read and .write is a safe revert in all cases but it might be doable. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 22:49 ` Tony Lindgren @ 2018-01-12 22:59 ` Mark Brown 2018-01-15 1:45 ` Kuninori Morimoto 0 siblings, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-12 22:59 UTC (permalink / raw) To: Tony Lindgren Cc: Peter Ujfalusi, Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 992 bytes --] On Fri, Jan 12, 2018 at 02:49:59PM -0800, Tony Lindgren wrote: > * Mark Brown <broonie@kernel.org> [180112 22:11]: > > Most devices have one regmap per device which can be retrieved with > > dev_get_regmap(), it's the attempt to use that which I suspect is > > broken. Like I said snd_soc_codec_init_regmap() ought to fix things if > > that's the issue. > OK. Adding Peter to loop as it's his driver after all. Not sure > how well mixing regmap register access to the same module with > cached twl4030_read() would work :) Yes, that local cache is not a super good idea any more and hopefully redundant. > Maybe there should also be some big warning happening if > snd_soc_codec_init_regmap() is now needed and no regmap is > found? Some devices just plain don't have registers at all (perhaps GPIOs or just stub drivers providing capability information). However we should be screaming loudly about the fact that the I/O we tried to do fails, that clearly shouldn't be being ignored. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 22:59 ` Mark Brown @ 2018-01-15 1:45 ` Kuninori Morimoto 2018-01-15 16:50 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Kuninori Morimoto @ 2018-01-15 1:45 UTC (permalink / raw) To: Mark Brown Cc: Tony Lindgren, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm Hi all > > > Most devices have one regmap per device which can be retrieved with > > > dev_get_regmap(), it's the attempt to use that which I suspect is > > > broken. Like I said snd_soc_codec_init_regmap() ought to fix things if > > > that's the issue. > > > OK. Adding Peter to loop as it's his driver after all. Not sure > > how well mixing regmap register access to the same module with > > cached twl4030_read() would work :) > > Yes, that local cache is not a super good idea any more and hopefully > redundant. > > > Maybe there should also be some big warning happening if > > snd_soc_codec_init_regmap() is now needed and no regmap is > > found? > > Some devices just plain don't have registers at all (perhaps GPIOs or > just stub drivers providing capability information). However we should > be screaming loudly about the fact that the I/O we tried to do fails, > that clearly shouldn't be being ignored. I'm sorry that my patch breaks your drivers. It seems removing .read/.write callback was too much aggressive. I hope your driver will be OK by using regmap. In worst case, we can back .read/.write, but it will be component driver side. Best regards --- Kuninori Morimoto ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 1:45 ` Kuninori Morimoto @ 2018-01-15 16:50 ` Tony Lindgren 2018-01-15 17:19 ` Mark Brown [not found] ` <871siqbte4.wl%kuninori.morimoto.gx@renesas.com> 0 siblings, 2 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-15 16:50 UTC (permalink / raw) To: Kuninori Morimoto Cc: Mark Brown, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> [180115 01:46]: > > Hi all > > > > > Most devices have one regmap per device which can be retrieved with > > > > dev_get_regmap(), it's the attempt to use that which I suspect is > > > > broken. Like I said snd_soc_codec_init_regmap() ought to fix things if > > > > that's the issue. > > > > > OK. Adding Peter to loop as it's his driver after all. Not sure > > > how well mixing regmap register access to the same module with > > > cached twl4030_read() would work :) > > > > Yes, that local cache is not a super good idea any more and hopefully > > redundant. > > > > > Maybe there should also be some big warning happening if > > > snd_soc_codec_init_regmap() is now needed and no regmap is > > > found? > > > > Some devices just plain don't have registers at all (perhaps GPIOs or > > just stub drivers providing capability information). However we should > > be screaming loudly about the fact that the I/O we tried to do fails, > > that clearly shouldn't be being ignored. > > I'm sorry that my patch breaks your drivers. > It seems removing .read/.write callback was too much aggressive. No problem, things happen. The thing I'm worried about here though is that these patches were "completely untested clean-up patches" :) For the next set of changes like this, please make sure you Cc the driver authors so they can test and ack the changes if you can't test them, OK? > I hope your driver will be OK by using regmap. > In worst case, we can back .read/.write, but it will be component driver side. Well we have the merge window about to open in a week, so please just revert the known broken patches. Like I described, it looks all the non-regmap drivers with read/write function pointers removed are broken now. Just adding back the read/write function pointers while keeping your other changes seems also risky to me at this point. It seems it should work for twl4030 and twl6030 though, have not checked the other drivers but they seem to have more changes. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 16:50 ` Tony Lindgren @ 2018-01-15 17:19 ` Mark Brown 2018-01-15 17:52 ` Tony Lindgren [not found] ` <871siqbte4.wl%kuninori.morimoto.gx@renesas.com> 1 sibling, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-15 17:19 UTC (permalink / raw) To: Tony Lindgren Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 434 bytes --] On Mon, Jan 15, 2018 at 08:50:51AM -0800, Tony Lindgren wrote: > * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> [180115 01:46]: > > I hope your driver will be OK by using regmap. > > In worst case, we can back .read/.write, but it will be component driver side. > Well we have the merge window about to open in a week, so > please just revert the known broken patches. Did anyone actually try testing the fix I posted yet? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 17:19 ` Mark Brown @ 2018-01-15 17:52 ` Tony Lindgren 2018-01-15 17:56 ` Mark Brown 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-15 17:52 UTC (permalink / raw) To: Mark Brown Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180115 17:19]: > On Mon, Jan 15, 2018 at 08:50:51AM -0800, Tony Lindgren wrote: > > * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> [180115 01:46]: > > > > I hope your driver will be OK by using regmap. > > > In worst case, we can back .read/.write, but it will be component driver side. > > > Well we have the merge window about to open in a week, so > > please just revert the known broken patches. > > Did anyone actually try testing the fix I posted yet? Hmm I don't seem to have anything in my inbox. What might be the subject for it? Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 17:52 ` Tony Lindgren @ 2018-01-15 17:56 ` Mark Brown 2018-01-15 18:06 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-15 17:56 UTC (permalink / raw) To: Tony Lindgren Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 355 bytes --] On Mon, Jan 15, 2018 at 09:52:20AM -0800, Tony Lindgren wrote: > * Mark Brown <broonie@kernel.org> [180115 17:19]: > > Did anyone actually try testing the fix I posted yet? > Hmm I don't seem to have anything in my inbox. What might > be the subject for it? Sorry, I didn't actually post it - just suggested adding the snd_soc_codec_set_regmap() call. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 17:56 ` Mark Brown @ 2018-01-15 18:06 ` Tony Lindgren 2018-01-15 18:13 ` Mark Brown 0 siblings, 1 reply; 33+ messages in thread From: Tony Lindgren @ 2018-01-15 18:06 UTC (permalink / raw) To: Mark Brown Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180115 17:56]: > On Mon, Jan 15, 2018 at 09:52:20AM -0800, Tony Lindgren wrote: > > * Mark Brown <broonie@kernel.org> [180115 17:19]: > > > > Did anyone actually try testing the fix I posted yet? > > > Hmm I don't seem to have anything in my inbox. What might > > be the subject for it? > > Sorry, I didn't actually post it - just suggested adding the > snd_soc_codec_set_regmap() call. Oh I see. Yeah that seems like a good long term solution after the regressions are fixed. Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 18:06 ` Tony Lindgren @ 2018-01-15 18:13 ` Mark Brown 2018-01-15 18:55 ` Tony Lindgren 0 siblings, 1 reply; 33+ messages in thread From: Mark Brown @ 2018-01-15 18:13 UTC (permalink / raw) To: Tony Lindgren Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 429 bytes --] On Mon, Jan 15, 2018 at 10:06:26AM -0800, Tony Lindgren wrote: > * Mark Brown <broonie@kernel.org> [180115 17:56]: > > Sorry, I didn't actually post it - just suggested adding the > > snd_soc_codec_set_regmap() call. > Oh I see. Yeah that seems like a good long term solution after > the regressions are fixed. I would like to see us do that sooner rather than later assuming it addresses the issue, it's quicker and simpler. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 18:13 ` Mark Brown @ 2018-01-15 18:55 ` Tony Lindgren 2018-01-16 0:38 ` Kuninori Morimoto 2018-01-17 9:47 ` Peter Ujfalusi 0 siblings, 2 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-15 18:55 UTC (permalink / raw) To: Mark Brown Cc: Kuninori Morimoto, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Mark Brown <broonie@kernel.org> [180115 18:14]: > On Mon, Jan 15, 2018 at 10:06:26AM -0800, Tony Lindgren wrote: > > * Mark Brown <broonie@kernel.org> [180115 17:56]: > > > > Sorry, I didn't actually post it - just suggested adding the > > > snd_soc_codec_set_regmap() call. > > > Oh I see. Yeah that seems like a good long term solution after > > the regressions are fixed. > > I would like to see us do that sooner rather than later assuming it > addresses the issue, it's quicker and simpler. Peter, can you take a look at that? Meanwhile, here's a regression fix for Linux next for twl4030 and 6040 drivers. Regards, Tony 8< --------------------- >From tony Mon Sep 17 00:00:00 2001 From: Tony Lindgren <tony@atomide.com> Date: Fri, 12 Jan 2018 10:24:36 -0800 Subject: [PATCH] ASoC: Fix twl4030 and 6040 regression by adding back read and write Commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") caused regressions for both twl4030 and twl6040 as it assumes the ASoC driver is using regmap. As a side effect, this also causes a considerable increase in idle power consumption omap3 boards using twl4030 as the PMIC. This is because the removal of read and write function pointers causes some of the ASoC IO functions to not do anything. For example, snd_soc_register_card() calls snd_soc_dapm_new_widgets() that calls snd_soc_codec_drv_read() that now does nothing. A long term solution suggested by Mark Brown <broonie@kernel.org> is to make the twl drivers use regmap by adding a call to snd_soc_codec_set_regmap(). This however needs more consideration as currently the driver internal reads do caching and we would have both regmap access and internal read/write access accessing the same hardware registers. So to fix the regression, let's just do a partial revert adding back the read and write function pointers. Note that other non-regmap ASoC drivers may need similar patches. Fixes: 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") Fixes: 93a00c467fe9 ("ASoC: don't use snd_soc_write/read on twl6040") Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> --- sound/soc/codecs/twl4030.c | 2 ++ sound/soc/codecs/twl6040.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/sound/soc/codecs/twl4030.c b/sound/soc/codecs/twl4030.c --- a/sound/soc/codecs/twl4030.c +++ b/sound/soc/codecs/twl4030.c @@ -2195,6 +2195,8 @@ static int twl4030_soc_remove(struct snd_soc_codec *codec) static const struct snd_soc_codec_driver soc_codec_dev_twl4030 = { .probe = twl4030_soc_probe, .remove = twl4030_soc_remove, + .read = twl4030_read, + .write = twl4030_write, .set_bias_level = twl4030_set_bias_level, .idle_bias_off = true, diff --git a/sound/soc/codecs/twl6040.c b/sound/soc/codecs/twl6040.c --- a/sound/soc/codecs/twl6040.c +++ b/sound/soc/codecs/twl6040.c @@ -1158,6 +1158,8 @@ static int twl6040_remove(struct snd_soc_codec *codec) static const struct snd_soc_codec_driver soc_codec_dev_twl6040 = { .probe = twl6040_probe, .remove = twl6040_remove, + .read = twl6040_read, + .write = twl6040_write, .set_bias_level = twl6040_set_bias_level, .suspend_bias_off = true, .ignore_pmdown_time = true, -- 2.15.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 18:55 ` Tony Lindgren @ 2018-01-16 0:38 ` Kuninori Morimoto 2018-01-17 9:47 ` Peter Ujfalusi 1 sibling, 0 replies; 33+ messages in thread From: Kuninori Morimoto @ 2018-01-16 0:38 UTC (permalink / raw) To: Tony Lindgren Cc: Mark Brown, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm Hi Mark, Tony > From tony Mon Sep 17 00:00:00 2001 > From: Tony Lindgren <tony@atomide.com> > Date: Fri, 12 Jan 2018 10:24:36 -0800 > Subject: [PATCH] ASoC: Fix twl4030 and 6040 regression by adding back read > and write > > Commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") > caused regressions for both twl4030 and twl6040 as it assumes the > ASoC driver is using regmap. As a side effect, this also causes a > considerable increase in idle power consumption omap3 boards using > twl4030 as the PMIC. > > This is because the removal of read and write function pointers > causes some of the ASoC IO functions to not do anything. For example, > snd_soc_register_card() calls snd_soc_dapm_new_widgets() that calls > snd_soc_codec_drv_read() that now does nothing. > > A long term solution suggested by Mark Brown <broonie@kernel.org> > is to make the twl drivers use regmap by adding a call to > snd_soc_codec_set_regmap(). This however needs more consideration > as currently the driver internal reads do caching and we would have > both regmap access and internal read/write access accessing the same > hardware registers. > > So to fix the regression, let's just do a partial revert adding back > the read and write function pointers. Note that other non-regmap > ASoC drivers may need similar patches. > > Fixes: 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") > Fixes: 93a00c467fe9 ("ASoC: don't use snd_soc_write/read on twl6040") > Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> > Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Mark I think all other driver needs .read/.write back. I will post it. And .read/.write is supported on codec driver side only now, so, we need it to component side too. I will post it too. Best regards --- Kuninori Morimoto ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-15 18:55 ` Tony Lindgren 2018-01-16 0:38 ` Kuninori Morimoto @ 2018-01-17 9:47 ` Peter Ujfalusi 1 sibling, 0 replies; 33+ messages in thread From: Peter Ujfalusi @ 2018-01-17 9:47 UTC (permalink / raw) To: Tony Lindgren, Mark Brown Cc: Kuninori Morimoto, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm Hi, On 2018-01-15 20:55, Tony Lindgren wrote: > * Mark Brown <broonie@kernel.org> [180115 18:14]: >> On Mon, Jan 15, 2018 at 10:06:26AM -0800, Tony Lindgren wrote: >>> * Mark Brown <broonie@kernel.org> [180115 17:56]: >> >>>> Sorry, I didn't actually post it - just suggested adding the >>>> snd_soc_codec_set_regmap() call. >> >>> Oh I see. Yeah that seems like a good long term solution after >>> the regressions are fixed. >> >> I would like to see us do that sooner rather than later assuming it >> addresses the issue, it's quicker and simpler. > > Peter, can you take a look at that? Sorry I have missed the thread... > > Meanwhile, here's a regression fix for Linux next for twl4030 > and 6040 drivers. The .read and the .write in snd_soc_codec_driver is needed. Without it DAPM can not read/write to registers when we have codec driver which does not have regmap. twl4030/6040 is an MFD device and the regmap is owned by the mfd driver, the codec, vibra, gpio, clk, etc sub devices are using callbacks to do IO. I will look at the local caching, but afaik it was needed and can not be done in regmap or mfd level. I will also take a look at the dac33 driver to see if it is broken, but I don't have Nokia n9/n950 where I could test it. Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> > > Regards, > > Tony > > 8< --------------------- > From tony Mon Sep 17 00:00:00 2001 > From: Tony Lindgren <tony@atomide.com> > Date: Fri, 12 Jan 2018 10:24:36 -0800 > Subject: [PATCH] ASoC: Fix twl4030 and 6040 regression by adding back read > and write > > Commit 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") > caused regressions for both twl4030 and twl6040 as it assumes the > ASoC driver is using regmap. As a side effect, this also causes a > considerable increase in idle power consumption omap3 boards using > twl4030 as the PMIC. > > This is because the removal of read and write function pointers > causes some of the ASoC IO functions to not do anything. For example, > snd_soc_register_card() calls snd_soc_dapm_new_widgets() that calls > snd_soc_codec_drv_read() that now does nothing. > > A long term solution suggested by Mark Brown <broonie@kernel.org> > is to make the twl drivers use regmap by adding a call to > snd_soc_codec_set_regmap(). This however needs more consideration > as currently the driver internal reads do caching and we would have > both regmap access and internal read/write access accessing the same > hardware registers. > > So to fix the regression, let's just do a partial revert adding back > the read and write function pointers. Note that other non-regmap > ASoC drivers may need similar patches. > > Fixes: 3bb0f7c31b1a ("ASoC: don't use snd_soc_write/read on twl4030") > Fixes: 93a00c467fe9 ("ASoC: don't use snd_soc_write/read on twl6040") > Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> > Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- > sound/soc/codecs/twl4030.c | 2 ++ > sound/soc/codecs/twl6040.c | 2 ++ > 2 files changed, 4 insertions(+) > > diff --git a/sound/soc/codecs/twl4030.c b/sound/soc/codecs/twl4030.c > --- a/sound/soc/codecs/twl4030.c > +++ b/sound/soc/codecs/twl4030.c > @@ -2195,6 +2195,8 @@ static int twl4030_soc_remove(struct snd_soc_codec *codec) > static const struct snd_soc_codec_driver soc_codec_dev_twl4030 = { > .probe = twl4030_soc_probe, > .remove = twl4030_soc_remove, > + .read = twl4030_read, > + .write = twl4030_write, > .set_bias_level = twl4030_set_bias_level, > .idle_bias_off = true, > > diff --git a/sound/soc/codecs/twl6040.c b/sound/soc/codecs/twl6040.c > --- a/sound/soc/codecs/twl6040.c > +++ b/sound/soc/codecs/twl6040.c > @@ -1158,6 +1158,8 @@ static int twl6040_remove(struct snd_soc_codec *codec) > static const struct snd_soc_codec_driver soc_codec_dev_twl6040 = { > .probe = twl6040_probe, > .remove = twl6040_remove, > + .read = twl6040_read, > + .write = twl6040_write, > .set_bias_level = twl6040_set_bias_level, > .suspend_bias_off = true, > .ignore_pmdown_time = true, > - Péter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <871siqbte4.wl%kuninori.morimoto.gx@renesas.com>]
* Re: PM regression in next [not found] ` <871siqbte4.wl%kuninori.morimoto.gx@renesas.com> @ 2018-01-16 0:36 ` Tony Lindgren 0 siblings, 0 replies; 33+ messages in thread From: Tony Lindgren @ 2018-01-16 0:36 UTC (permalink / raw) To: Kuninori Morimoto Cc: Mark Brown, Peter Ujfalusi, Andrew Morton, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> [180115 23:22]: > > For the next set of changes like this, > > please make sure you Cc the driver authors so they can test > > and ack the changes if you can't test them, OK? > > Actually, I have posted "convert codec/platform to component" > patch-set to ALSA ML, without Cc:ing to you. > > Subject: [PATCH 00/38] ASoC: replace platform to component > Date: Fri, 12 Jan 2018 09:40:29 +0900 > > Subject: [PATCH 000/187] ASoC: replace codec to component > Date: Fri, 12 Jan 2018 10:05:05 +0900 > > I will send test request mail to each driver's authoer Thanks, yeah that's something for Peter to look at :) Regards, Tony ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PM regression in next 2018-01-12 21:07 ` Tony Lindgren 2018-01-12 21:15 ` Mark Brown @ 2018-01-12 21:38 ` Mark Brown 1 sibling, 0 replies; 33+ messages in thread From: Mark Brown @ 2018-01-12 21:38 UTC (permalink / raw) To: Tony Lindgren Cc: Andrew Morton, Kuninori Morimoto, Stephen Rothwell, linux-kernel, linux-arm-kernel, linux-omap, Rafael J. Wysocki, linux-pm [-- Attachment #1: Type: text/plain, Size: 804 bytes --] On Fri, Jan 12, 2018 at 01:07:06PM -0800, Tony Lindgren wrote: > Tturns out just adding back .read = twl4030_read fixes it.. > I added a dummy function for read and am now seeing a bunch > of reads that now don't happen: What's supposed to happen here is that you end up with a component->regmap set and snd_soc_component_read() uses that to do the read. That normally gets retrieved via dev_get_regmap() but it looks like this is broken here by the rather novel chip design confusing the assumptions that code makes. I *think* that setting the regmap to twl_get_regmap(TWL4030_MODULE_AUDIO_VOICE) using snd_soc_codec_init_regmap() in twl4030_soc_probe ought to do the trick (assuming my guesses about what's going on are right) but can't test. I'd check twl6040 too, I bet it's got the same issue. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2018-01-17 9:47 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-12 0:01 PM regression in next Tony Lindgren 2018-01-12 0:18 ` Andrew Morton 2018-01-12 0:23 ` Tony Lindgren 2018-01-12 0:45 ` Andrew Morton 2018-01-12 1:20 ` Tony Lindgren 2018-01-12 1:32 ` Tony Lindgren 2018-01-12 12:23 ` Rafael J. Wysocki 2018-01-12 12:30 ` Rafael J. Wysocki 2018-01-12 13:01 ` Lars-Peter Clausen 2018-01-12 13:16 ` Andrew Lunn 2018-01-12 13:52 ` Tony Lindgren 2018-01-12 13:55 ` Andrew Lunn 2018-01-12 14:14 ` Tony Lindgren 2018-01-12 19:00 ` Tony Lindgren 2018-01-12 19:12 ` Mark Brown 2018-01-12 21:07 ` Tony Lindgren 2018-01-12 21:15 ` Mark Brown 2018-01-12 21:50 ` Tony Lindgren 2018-01-12 22:11 ` Mark Brown 2018-01-12 22:49 ` Tony Lindgren 2018-01-12 22:59 ` Mark Brown 2018-01-15 1:45 ` Kuninori Morimoto 2018-01-15 16:50 ` Tony Lindgren 2018-01-15 17:19 ` Mark Brown 2018-01-15 17:52 ` Tony Lindgren 2018-01-15 17:56 ` Mark Brown 2018-01-15 18:06 ` Tony Lindgren 2018-01-15 18:13 ` Mark Brown 2018-01-15 18:55 ` Tony Lindgren 2018-01-16 0:38 ` Kuninori Morimoto 2018-01-17 9:47 ` Peter Ujfalusi [not found] ` <871siqbte4.wl%kuninori.morimoto.gx@renesas.com> 2018-01-16 0:36 ` Tony Lindgren 2018-01-12 21:38 ` Mark Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).