From: Viresh Kumar <viresh.kumar@linaro.org>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: Rafael Wysocki <rjw@rjwysocki.net>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Jonathan Corbet <corbet@lwn.net>, Len Brown <lenb@kernel.org>, Michael Ellerman <mpe@ellerman.id.au>, Paul Mackerras <paulus@samba.org>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, linux-pm@vger.kernel.org, Vincent Guittot <vincent.guittot@linaro.org>, Ionela Voinescu <ionela.voinescu@arm.com>, Dirk Brandewie <dirk.j.brandewie@intel.com>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 0/5] cpufreq: cppc: Fix suspend/resume specific races with FIE code
Date: Tue, 15 Jun 2021 13:20:56 +0530
Message-ID: <20210615075056.dfkbiftuoihtrfpo@vireshk-i7>
In-Reply-To: <eaaaf171-5937-e0f2-8447-c1b20b474c62@quicinc.com>

Hi Qian,

First of all, thanks for testing this. I need more of your help to test this out :)

FWIW, I tested this on my Hikey board today (with some hacks), doing multiple insmod/rmmod cycles of the driver, and I wasn't able to reproduce the issue you reported. I did have the list debugging config option (CONFIG_DEBUG_LIST) enabled.

On 14-06-21, 09:48, Qian Cai wrote:
> Unfortunately, this series looks like needing more works.
>
> [ 487.773586][ T0] CPU17: Booted secondary processor 0x0000000801 [0x503f0002]
> [ 487.976495][ T670] list_del corruption. next->prev should be ffff009b66e9ec70, but was ffff009b66dfec70
> [ 487.987037][ T670] ------------[ cut here ]------------
> [ 487.992351][ T670] kernel BUG at lib/list_debug.c:54!
> [ 487.997810][ T670] Internal error: Oops - BUG: 0 [#1] SMP
> [ 488.003295][ T670] Modules linked in: cpufreq_userspace xfs loop cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> [ 488.021759][ T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 5.13.0-rc5-next-20210611+ #46
> [ 488.030190][ T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [ 488.038705][ T670] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
> [ 488.045398][ T670] pc : __list_del_entry_valid+0x154/0x158
> [ 488.050969][ T670] lr : __list_del_entry_valid+0x154/0x158
> [ 488.056534][ T670] sp : ffff8000229afd70
> [ 488.060534][ T670] x29: ffff8000229afd70 x28: ffff0008c8f4f340 x27: dfff800000000000
> [ 488.068361][ T670] x26: ffff009b66e9ec70 x25: ffff800011c8b4d0 x24: ffff0008d4bfe488
> [ 488.076188][ T670] x23: ffff0008c8f4f340 x22: ffff0008c8f4f340 x21: ffff009b6789ec70
> [ 488.084015][ T670] x20: ffff0008d4bfe4c8 x19: ffff009b66e9ec70 x18: ffff0008c8f4fd70
> [ 488.091842][ T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 0000000000000028
> [ 488.099669][ T670] x14: 0000000000000000 x13: 0000000000000001 x12: ffff60136cdd3447
> [ 488.107495][ T670] x11: 1fffe0136cdd3446 x10: ffff60136cdd3446 x9 : ffff8000103ee444
> [ 488.115322][ T670] x8 : ffff009b66e9a237 x7 : 0000000000000001 x6 : ffff009b66e9a230
> [ 488.123149][ T670] x5 : 00009fec9322cbba x4 : ffff60136cdd3447 x3 : 1fffe001191e9e69
> [ 488.130975][ T670] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000054
> [ 488.138803][ T670] Call trace:
> [ 488.141935][ T670] __list_del_entry_valid+0x154/0x158
> [ 488.147153][ T670] kthread_worker_fn+0x15c/0xda0

This is a strange place to hit this from, and it is a new issue.

> [ 488.151939][ T670] kthread+0x3ac/0x460
> [ 488.155854][ T670] ret_from_fork+0x10/0x18
> [ 488.160120][ T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
> [ 488.166901][ T670] ---[ end trace e637e2d38b2cc087 ]---
> [ 488.172206][ T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [ 488.179182][ T670] SMP: stopping secondary CPUs
> [ 489.209347][ T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
> [ 489.216128][ T][ T670] Memoryn ]---

Can you give details on what exactly you did to get this? A normal boot, or something more?

I have made some changes to the way the calls happen, which may get this sorted. Can you please try this branch?

https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc

I can see one place where a race can happen, i.e. between topology_clear_scale_freq_source() and topology_scale_freq_tick(). It is possible that sfd->set_freq_scale() gets called for a previously set handler, as there is no protection there. I will see how to fix that, but I am not sure the issue reported above comes from there.

Anyway, please give my branch a try, let's see.

--
viresh
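The race described above (a tick calling sfd->set_freq_scale() for a handler that is being cleared) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the arch_topology code: the struct, the demo_* functions and the reader counter are all hypothetical, and C11 atomics stand in for the kernel's per-CPU data and RCU. It shows the two things such a fix plausibly needs: the tick reads the handler pointer exactly once, and the clearing side waits for ticks already in flight before the old handler can be treated as gone.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's scale_freq_data; only the
 * fields needed for this illustration are modelled. */
struct scale_freq_data {
	int source;                   /* who registered this handler */
	void (*set_freq_scale)(void); /* called from the "tick" */
};

/* One CPU's slot (the kernel keeps one per CPU). */
static _Atomic(struct scale_freq_data *) sfd_slot;

/* Ticks currently between loading the slot and finishing the call.
 * Plays the role of an RCU read-side critical section (assumption). */
static atomic_int readers_in_flight;

/*
 * Racy form, shown as a comment only: the slot is read twice, so it can
 * be cleared or replaced between the NULL check and the call:
 *
 *	if (slot) slot->set_freq_scale();
 */

/* Hypothetical model of the tick path. */
void demo_tick(void)
{
	atomic_fetch_add(&readers_in_flight, 1);
	/* Take a single snapshot and use only that from here on. */
	struct scale_freq_data *sfd = atomic_load(&sfd_slot);
	if (sfd && sfd->set_freq_scale)
		sfd->set_freq_scale();
	atomic_fetch_sub(&readers_in_flight, 1);
}

/* Hypothetical model of registering a source. */
void demo_set_source(struct scale_freq_data *data)
{
	atomic_store(&sfd_slot, data);
}

/* Hypothetical model of clearing a source. */
void demo_clear_source(int source)
{
	struct scale_freq_data *cur = atomic_load(&sfd_slot);

	if (!cur || cur->source != source)
		return; /* someone else owns the slot: leave it alone */

	/* Unpublish the handler; ticks that start after this see NULL. */
	(void)atomic_compare_exchange_strong(&sfd_slot, &cur, NULL);

	/* Wait for ticks that may already hold the old pointer. Only after
	 * this loop ends is it safe to free or reuse the old handler's data.
	 * In the kernel, synchronize_rcu() would be the natural tool here. */
	while (atomic_load(&readers_in_flight) != 0) {
		/* spin */
	}
}
```

Because every operation above is sequentially consistent, a tick either loads NULL or has already been counted by the time the clearing side checks the counter, so the clearing side cannot return while an old handler is still running. The single global counter can make the clearing side wait behind unrelated ticks, which is one reason RCU is the more natural choice inside the kernel.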
Thread overview:
2021-06-10 8:23 [PATCH 0/5] cpufreq: cppc: Fix suspend/resume specific races with FIE code Viresh Kumar
2021-06-10 8:23 ` [PATCH 1/5] cpufreq: cppc: Migrate to ->exit() callback instead of ->stop_cpu() Viresh Kumar
2021-06-10 8:23 ` [PATCH 2/5] cpufreq: intel_pstate: " Viresh Kumar
2021-06-10 8:23 ` [PATCH 3/5] cpufreq: powerenv: " Viresh Kumar
2021-06-10 8:24 ` [PATCH 4/5] cpufreq: Add start_cpu() and stop_cpu() callbacks Viresh Kumar
2021-06-10 8:24 ` [PATCH 5/5] cpufreq: cppc: Fix suspend/resume specific races with the FIE code Viresh Kumar
2021-06-14 13:48 ` [PATCH 0/5] cpufreq: cppc: Fix suspend/resume specific races with " Qian Cai
2021-06-15 7:50 ` Viresh Kumar [this message]
2021-06-15 9:38 ` Viresh Kumar
2021-06-15 12:17 ` Qian Cai
2021-06-16 4:57 ` Viresh Kumar