From: Viresh Kumar <viresh.kumar@linaro.org>
To: Qian Cai <quic_qiancai@quicinc.com>
Cc: Rafael Wysocki <rjw@rjwysocki.net>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Jonathan Corbet <corbet@lwn.net>, Len Brown <lenb@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	linux-pm@vger.kernel.org,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Dirk Brandewie <dirk.j.brandewie@intel.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 0/5] cpufreq: cppc: Fix suspend/resume specific races with FIE code
Date: Tue, 15 Jun 2021 13:20:56 +0530
Message-ID: <20210615075056.dfkbiftuoihtrfpo@vireshk-i7>
In-Reply-To: <eaaaf171-5937-e0f2-8447-c1b20b474c62@quicinc.com>

Hi Qian,

First of all, thanks for testing this. I need more of your help to
test this out :)

FWIW, I did test this on my Hikey board today, with some hacks, and
tried multiple insmod/rmmod operations for the driver, and I wasn't
able to reproduce the issue you reported. I did enable the list-debug
config option.

On 14-06-21, 09:48, Qian Cai wrote:
> Unfortunately, this series looks like needing more works.
> 
> [  487.773586][    T0] CPU17: Booted secondary processor 0x0000000801 [0x503f0002]
> [  487.976495][  T670] list_del corruption. next->prev should be ffff009b66e9ec70, but was ffff009b66dfec70
> [  487.987037][  T670] ------------[ cut here ]------------
> [  487.992351][  T670] kernel BUG at lib/list_debug.c:54!
> [  487.997810][  T670] Internal error: Oops - BUG: 0 [#1] SMP
> [  488.003295][  T670] Modules linked in: cpufreq_userspace xfs loop cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> [  488.021759][  T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 5.13.0-rc5-next-20210611+ #46
> [  488.030190][  T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [  488.038705][  T670] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
> [  488.045398][  T670] pc : __list_del_entry_valid+0x154/0x158
> [  488.050969][  T670] lr : __list_del_entry_valid+0x154/0x158
> [  488.056534][  T670] sp : ffff8000229afd70
> [  488.060534][  T670] x29: ffff8000229afd70 x28: ffff0008c8f4f340 x27: dfff800000000000
> [  488.068361][  T670] x26: ffff009b66e9ec70 x25: ffff800011c8b4d0 x24: ffff0008d4bfe488
> [  488.076188][  T670] x23: ffff0008c8f4f340 x22: ffff0008c8f4f340 x21: ffff009b6789ec70
> [  488.084015][  T670] x20: ffff0008d4bfe4c8 x19: ffff009b66e9ec70 x18: ffff0008c8f4fd70
> [  488.091842][  T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 0000000000000028
> [  488.099669][  T670] x14: 0000000000000000 x13: 0000000000000001 x12: ffff60136cdd3447
> [  488.107495][  T670] x11: 1fffe0136cdd3446 x10: ffff60136cdd3446 x9 : ffff8000103ee444
> [  488.115322][  T670] x8 : ffff009b66e9a237 x7 : 0000000000000001 x6 : ffff009b66e9a230
> [  488.123149][  T670] x5 : 00009fec9322cbba x4 : ffff60136cdd3447 x3 : 1fffe001191e9e69
> [  488.130975][  T670] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000054
> [  488.138803][  T670] Call trace:
> [  488.141935][  T670]  __list_del_entry_valid+0x154/0x158
> [  488.147153][  T670]  kthread_worker_fn+0x15c/0xda0

This is a strange place to get the issue from. And this is a new
issue.

> [  488.151939][  T670]  kthread+0x3ac/0x460
> [  488.155854][  T670]  ret_from_fork+0x10/0x18
> [  488.160120][  T670] Code: 911e8000 aa1303e1 910a0000 941b595b (d4210000)
> [  488.166901][  T670] ---[ end trace e637e2d38b2cc087 ]---
> [  488.172206][  T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [  488.179182][  T670] SMP: stopping secondary CPUs
> [  489.209347][  T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
> [  489.216128][  T][  T670] Memoryn ]---

Can you give details on what exactly you did to get this? Was it a
normal boot or something more?

I have made some changes to the way the calls were happening, which
may get this thing sorted. Can you please try this branch?

https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc

I can see one place where a race can happen, i.e. between
topology_clear_scale_freq_source() and topology_scale_freq_tick(). It
is possible that sfd->set_freq_scale() may get called for a previously
set handler, as there is no protection there.
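
To make the window concrete, here is a minimal user-space sketch of the
pattern, not the kernel code. All names in it (sft_data, set_source,
clear_source, tick, fake_source) are hypothetical stand-ins, and C11
atomics are used only to model a per-CPU pointer that one side clears
while the other side reads it. The tick path loads the pointer once and
checks it before calling the handler. A real fix would additionally
have to wait for in-flight ticks before the handler's memory goes away
(for example with an RCU grace period), which this sketch does not
model.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's scale-freq handler data. */
struct scale_freq_data {
	void (*set_freq_scale)(void);
};

#define NR_CPUS 4

/* One published handler slot per CPU (models a per-CPU pointer). */
static _Atomic(struct scale_freq_data *) sft_data[NR_CPUS];

/* Counts handler invocations so the behaviour can be observed. */
static int calls;

static void fake_set_freq_scale(void)
{
	calls++;
}

static struct scale_freq_data fake_source = {
	.set_freq_scale = fake_set_freq_scale,
};

/* Publish a handler for the given CPU. */
void set_source(int cpu, struct scale_freq_data *sfd)
{
	atomic_store(&sft_data[cpu], sfd);
}

/*
 * Unpublish the handler. Assumption: the caller must still wait for
 * any tick that already loaded the old pointer before freeing it.
 */
void clear_source(int cpu)
{
	atomic_store(&sft_data[cpu], NULL);
}

/*
 * Tick path: load the pointer exactly once, then test and use the
 * local copy, so a concurrent clear cannot turn it into NULL between
 * the check and the call. Returns 1 if a handler ran, 0 otherwise.
 */
int tick(int cpu)
{
	struct scale_freq_data *sfd = atomic_load(&sft_data[cpu]);

	if (!sfd)
		return 0;

	sfd->set_freq_scale();
	return 1;
}
```

Calling set_source(0, &fake_source) and then tick(0) runs the handler;
after clear_source(0), a later tick(0) finds no handler and does
nothing.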

I will see how to fix that. But I am not sure if the issue reported
above comes from there.

Anyway, please give my branch a try, let's see.

-- 
viresh
