All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Francisco Jerez <currojerez@riseup.net>
Cc: kernel test robot <rong.a.chen@intel.com>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Documentation <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Giovanni Gherdovich <ggherdovich@suse.cz>,
	Doug Smythies <dsmythies@telus.net>, 0day robot <lkp@intel.com>,
	lkp@lists.01.org
Subject: Re: [cpufreq] 48fc4efcdd: WARNING:possible_circular_locking_dependency_detected
Date: Mon, 03 Aug 2020 19:19:47 +0200	[thread overview]
Message-ID: <2379312.oHovjqWFI2@kreacher> (raw)
In-Reply-To: <87v9i3clnh.fsf@riseup.net>

On Saturday, August 1, 2020 1:25:22 AM CEST Francisco Jerez wrote:
> kernel test robot <rong.a.chen@intel.com> writes:
> 
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 48fc4efcdd584e8f04da8b195262ec38221bd6db ("[PATCH v2 2/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled")
> > url: https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpufreq-intel_pstate-Implement-passive-mode-with-HWP-enabled/20200717-014718
> > base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
> >
> > in testcase: kernel-selftests
> > with following parameters:
> >
> > 	group: kselftests-cpufreq
> > 	ucode: 0x5002f01
> >
> > test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> > test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >
> >
> > on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> >
> >
> > [   74.745395] WARNING: possible circular locking dependency detected
> > [   74.752071] 5.8.0-rc5-00064-g48fc4efcdd584 #1 Not tainted
> > [   74.757949] ------------------------------------------------------
> > [   74.764604] main.sh/3019 is trying to acquire lock:
> > [   74.769954] ffffffff82ec7f08 (intel_pstate_driver_lock){+.+.}-{3:3}, at: store_energy_performance_preference+0xec/0x250
> > [   74.781228] 
> > [   74.781228] but task is already holding lock:
> > [   74.788017] ffff88b07b5f8380 (&policy->rwsem){++++}-{3:3}, at: store+0x69/0xa0
> > [   74.795711] 
> > [   74.795711] which lock already depends on the new lock.
> > [   74.795711] 
> > [   74.805222] 
> > [   74.805222] the existing dependency chain (in reverse order) is:
> > [   74.813529] 
> > [   74.813529] -> #2 (&policy->rwsem){++++}-{3:3}:
> > [   74.820371]        __lock_acquire+0x538/0xa90
> > [   74.825144]        lock_acquire+0xab/0x390
> > [   74.829660]        down_write+0x38/0x70
> > [   74.833910]        store+0x69/0xa0
> > [   74.837724]        kernfs_fop_write+0xdc/0x1c0
> > [   74.842576]        vfs_write+0xee/0x220
> > [   74.846814]        ksys_write+0x68/0xe0
> > [   74.851039]        do_syscall_64+0x52/0xb0
> > [   74.855512]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   74.861463] 
> > [   74.861463] -> #1 (cpu_hotplug_lock){++++}-{0:0}:
> > [   74.868395]        __lock_acquire+0x538/0xa90
> > [   74.873127]        lock_acquire+0xab/0x390
> > [   74.877597]        cpus_read_lock+0x39/0xd0
> > [   74.882146]        cpufreq_register_driver+0xb6/0x2a0
> > [   74.887556]        intel_pstate_register_driver+0x32/0x70
> > [   74.893312]        intel_pstate_init+0x45d/0x4ca
> > [   74.898281]        do_one_initcall+0x5d/0x330
> > [   74.902984]        kernel_init_freeable+0x248/0x2c9
> > [   74.908185]        kernel_init+0xa/0x112
> > [   74.912430]        ret_from_fork+0x1f/0x30
> > [   74.916850] 
> > [   74.916850] -> #0 (intel_pstate_driver_lock){+.+.}-{3:3}:
> > [   74.924376]        check_prev_add+0x8e/0x9f0
> > [   74.928988]        validate_chain+0x79b/0x1080
> > [   74.933773]        __lock_acquire+0x538/0xa90
> > [   74.938469]        lock_acquire+0xab/0x390
> > [   74.942896]        __mutex_lock+0xac/0x9c0
> > [   74.947318]        store_energy_performance_preference+0xec/0x250
> > [   74.953735]        store+0x7c/0xa0
> > [   74.957458]        kernfs_fop_write+0xdc/0x1c0
> > [   74.962231]        vfs_write+0xee/0x220
> > [   74.966395]        ksys_write+0x68/0xe0
> > [   74.970549]        do_syscall_64+0x52/0xb0
> > [   74.974975]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   74.980859] 
> > [   74.980859] other info that might help us debug this:
> > [   74.980859] 
> > [   74.989744] Chain exists of:
> > [   74.989744]   intel_pstate_driver_lock --> cpu_hotplug_lock --> &policy->rwsem
> > [   74.989744] 
> > [   75.002149]  Possible unsafe locking scenario:
> > [   75.002149] 
> > [   75.008643]        CPU0                    CPU1
> > [   75.013461]        ----                    ----
> > [   75.018275]   lock(&policy->rwsem);
> > [   75.022037]                                lock(cpu_hotplug_lock);
> > [   75.028495]                                lock(&policy->rwsem);
> > [   75.034773]   lock(intel_pstate_driver_lock);
> > [   75.039427] 
> > [   75.039427]  *** DEADLOCK ***
> > [   75.039427] 
> > [   75.046168] 5 locks held by main.sh/3019:
> > [   75.050456]  #0: ffff888b9d973438 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1ba/0x220
> > [   75.058750]  #1: ffff888beaaec488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xa5/0x1c0
> > [   75.067413]  #2: ffff8898284de890 (kn->active#163){.+.+}-{0:0}, at: kernfs_fop_write+0xad/0x1c0
> > [   75.076422]  #3: ffffffff82c73a70 (cpu_hotplug_lock){++++}-{0:0}, at: store+0x26/0xa0
> > [   75.084566]  #4: ffff88b07b5f8380 (&policy->rwsem){++++}-{3:3}, at: store+0x69/0xa0
> > [   75.092528] 
> > [   75.092528] stack backtrace:
> > [   75.097500] CPU: 0 PID: 3019 Comm: main.sh Not tainted 5.8.0-rc5-00064-g48fc4efcdd584 #1
> > [   75.105902] Hardware name: Intel Corporation ............/S9200WKBRD2, BIOS SE5C620.86B.0D.01.0552.060220191912 06/02/2019
> > [   75.117274] Call Trace:
> > [   75.120081]  dump_stack+0x96/0xd0
> > [   75.123754]  check_noncircular+0x162/0x180
> > [   75.128220]  check_prev_add+0x8e/0x9f0
> > [   75.132325]  validate_chain+0x79b/0x1080
> > [   75.136589]  __lock_acquire+0x538/0xa90
> > [   75.140767]  lock_acquire+0xab/0x390
> > [   75.144684]  ? store_energy_performance_preference+0xec/0x250
> > [   75.150775]  __mutex_lock+0xac/0x9c0
> > [   75.154697]  ? store_energy_performance_preference+0xec/0x250
> > [   75.160786]  ? __lock_acquire+0x538/0xa90
> > [   75.165147]  ? store_energy_performance_preference+0xec/0x250
> > [   75.171282]  ? sscanf+0x4e/0x70
> > [   75.174781]  ? store_energy_performance_preference+0xec/0x250
> > [   75.180886]  store_energy_performance_preference+0xec/0x250
> > [   75.186816]  store+0x7c/0xa0
> > [   75.190056]  kernfs_fop_write+0xdc/0x1c0
> > [   75.194338]  vfs_write+0xee/0x220
> > [   75.198011]  ksys_write+0x68/0xe0
> > [   75.201685]  do_syscall_64+0x52/0xb0
> > [   75.205617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   75.211051] RIP: 0033:0x7f2c1dfa9504
> > [   75.214999] Code: Bad RIP value.
> > [   75.218582] RSP: 002b:00007fffe26da068 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >
> 
> Was about to review v4 of this series, but from my reading of it it
> seems like this locking inversion situation could also occur with your
> last revision.  Is it a legitimate concern Rafael?
> 
> >[...]
> 

There a mistake in the patch.

Because energy_performance_preference is a policy attribute,
it is not valid to acquire intel_pstate_driver_lock in its "store"
routine, but that routine can check the driver pointer without additional
locking, because that pointer cannot change while it is running.

I've just sent a v5 to fix this problem.

Thanks!




WARNING: multiple messages have this Message-ID (diff)
From: Rafael J. Wysocki <rjw@rjwysocki.net>
To: lkp@lists.01.org
Subject: Re: [cpufreq] 48fc4efcdd: WARNING:possible_circular_locking_dependency_detected
Date: Mon, 03 Aug 2020 19:19:47 +0200	[thread overview]
Message-ID: <2379312.oHovjqWFI2@kreacher> (raw)
In-Reply-To: <87v9i3clnh.fsf@riseup.net>

[-- Attachment #1: Type: text/plain, Size: 7441 bytes --]

On Saturday, August 1, 2020 1:25:22 AM CEST Francisco Jerez wrote:
> kernel test robot <rong.a.chen@intel.com> writes:
> 
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 48fc4efcdd584e8f04da8b195262ec38221bd6db ("[PATCH v2 2/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled")
> > url: https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpufreq-intel_pstate-Implement-passive-mode-with-HWP-enabled/20200717-014718
> > base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
> >
> > in testcase: kernel-selftests
> > with following parameters:
> >
> > 	group: kselftests-cpufreq
> > 	ucode: 0x5002f01
> >
> > test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> > test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >
> >
> > on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> >
> >
> > [   74.745395] WARNING: possible circular locking dependency detected
> > [   74.752071] 5.8.0-rc5-00064-g48fc4efcdd584 #1 Not tainted
> > [   74.757949] ------------------------------------------------------
> > [   74.764604] main.sh/3019 is trying to acquire lock:
> > [   74.769954] ffffffff82ec7f08 (intel_pstate_driver_lock){+.+.}-{3:3}, at: store_energy_performance_preference+0xec/0x250
> > [   74.781228] 
> > [   74.781228] but task is already holding lock:
> > [   74.788017] ffff88b07b5f8380 (&policy->rwsem){++++}-{3:3}, at: store+0x69/0xa0
> > [   74.795711] 
> > [   74.795711] which lock already depends on the new lock.
> > [   74.795711] 
> > [   74.805222] 
> > [   74.805222] the existing dependency chain (in reverse order) is:
> > [   74.813529] 
> > [   74.813529] -> #2 (&policy->rwsem){++++}-{3:3}:
> > [   74.820371]        __lock_acquire+0x538/0xa90
> > [   74.825144]        lock_acquire+0xab/0x390
> > [   74.829660]        down_write+0x38/0x70
> > [   74.833910]        store+0x69/0xa0
> > [   74.837724]        kernfs_fop_write+0xdc/0x1c0
> > [   74.842576]        vfs_write+0xee/0x220
> > [   74.846814]        ksys_write+0x68/0xe0
> > [   74.851039]        do_syscall_64+0x52/0xb0
> > [   74.855512]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   74.861463] 
> > [   74.861463] -> #1 (cpu_hotplug_lock){++++}-{0:0}:
> > [   74.868395]        __lock_acquire+0x538/0xa90
> > [   74.873127]        lock_acquire+0xab/0x390
> > [   74.877597]        cpus_read_lock+0x39/0xd0
> > [   74.882146]        cpufreq_register_driver+0xb6/0x2a0
> > [   74.887556]        intel_pstate_register_driver+0x32/0x70
> > [   74.893312]        intel_pstate_init+0x45d/0x4ca
> > [   74.898281]        do_one_initcall+0x5d/0x330
> > [   74.902984]        kernel_init_freeable+0x248/0x2c9
> > [   74.908185]        kernel_init+0xa/0x112
> > [   74.912430]        ret_from_fork+0x1f/0x30
> > [   74.916850] 
> > [   74.916850] -> #0 (intel_pstate_driver_lock){+.+.}-{3:3}:
> > [   74.924376]        check_prev_add+0x8e/0x9f0
> > [   74.928988]        validate_chain+0x79b/0x1080
> > [   74.933773]        __lock_acquire+0x538/0xa90
> > [   74.938469]        lock_acquire+0xab/0x390
> > [   74.942896]        __mutex_lock+0xac/0x9c0
> > [   74.947318]        store_energy_performance_preference+0xec/0x250
> > [   74.953735]        store+0x7c/0xa0
> > [   74.957458]        kernfs_fop_write+0xdc/0x1c0
> > [   74.962231]        vfs_write+0xee/0x220
> > [   74.966395]        ksys_write+0x68/0xe0
> > [   74.970549]        do_syscall_64+0x52/0xb0
> > [   74.974975]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   74.980859] 
> > [   74.980859] other info that might help us debug this:
> > [   74.980859] 
> > [   74.989744] Chain exists of:
> > [   74.989744]   intel_pstate_driver_lock --> cpu_hotplug_lock --> &policy->rwsem
> > [   74.989744] 
> > [   75.002149]  Possible unsafe locking scenario:
> > [   75.002149] 
> > [   75.008643]        CPU0                    CPU1
> > [   75.013461]        ----                    ----
> > [   75.018275]   lock(&policy->rwsem);
> > [   75.022037]                                lock(cpu_hotplug_lock);
> > [   75.028495]                                lock(&policy->rwsem);
> > [   75.034773]   lock(intel_pstate_driver_lock);
> > [   75.039427] 
> > [   75.039427]  *** DEADLOCK ***
> > [   75.039427] 
> > [   75.046168] 5 locks held by main.sh/3019:
> > [   75.050456]  #0: ffff888b9d973438 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1ba/0x220
> > [   75.058750]  #1: ffff888beaaec488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xa5/0x1c0
> > [   75.067413]  #2: ffff8898284de890 (kn->active#163){.+.+}-{0:0}, at: kernfs_fop_write+0xad/0x1c0
> > [   75.076422]  #3: ffffffff82c73a70 (cpu_hotplug_lock){++++}-{0:0}, at: store+0x26/0xa0
> > [   75.084566]  #4: ffff88b07b5f8380 (&policy->rwsem){++++}-{3:3}, at: store+0x69/0xa0
> > [   75.092528] 
> > [   75.092528] stack backtrace:
> > [   75.097500] CPU: 0 PID: 3019 Comm: main.sh Not tainted 5.8.0-rc5-00064-g48fc4efcdd584 #1
> > [   75.105902] Hardware name: Intel Corporation ............/S9200WKBRD2, BIOS SE5C620.86B.0D.01.0552.060220191912 06/02/2019
> > [   75.117274] Call Trace:
> > [   75.120081]  dump_stack+0x96/0xd0
> > [   75.123754]  check_noncircular+0x162/0x180
> > [   75.128220]  check_prev_add+0x8e/0x9f0
> > [   75.132325]  validate_chain+0x79b/0x1080
> > [   75.136589]  __lock_acquire+0x538/0xa90
> > [   75.140767]  lock_acquire+0xab/0x390
> > [   75.144684]  ? store_energy_performance_preference+0xec/0x250
> > [   75.150775]  __mutex_lock+0xac/0x9c0
> > [   75.154697]  ? store_energy_performance_preference+0xec/0x250
> > [   75.160786]  ? __lock_acquire+0x538/0xa90
> > [   75.165147]  ? store_energy_performance_preference+0xec/0x250
> > [   75.171282]  ? sscanf+0x4e/0x70
> > [   75.174781]  ? store_energy_performance_preference+0xec/0x250
> > [   75.180886]  store_energy_performance_preference+0xec/0x250
> > [   75.186816]  store+0x7c/0xa0
> > [   75.190056]  kernfs_fop_write+0xdc/0x1c0
> > [   75.194338]  vfs_write+0xee/0x220
> > [   75.198011]  ksys_write+0x68/0xe0
> > [   75.201685]  do_syscall_64+0x52/0xb0
> > [   75.205617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   75.211051] RIP: 0033:0x7f2c1dfa9504
> > [   75.214999] Code: Bad RIP value.
> > [   75.218582] RSP: 002b:00007fffe26da068 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >
> 
> Was about to review v4 of this series, but from my reading of it it
> seems like this locking inversion situation could also occur with your
> last revision.  Is it a legitimate concern Rafael?
> 
> >[...]
> 

There a mistake in the patch.

Because energy_performance_preference is a policy attribute,
it is not valid to acquire intel_pstate_driver_lock in its "store"
routine, but that routine can check the driver pointer without additional
locking, because that pointer cannot change while it is running.

I've just sent a v5 to fix this problem.

Thanks!



  reply	other threads:[~2020-08-03 17:19 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-16 17:37 [PATCH v2 0/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled Rafael J. Wysocki
2020-07-16 17:38 ` [PATCH v2 1/2] cpufreq: intel_pstate: Rearrange the storing of newv EPP values Rafael J. Wysocki
2020-07-16 17:42 ` [PATCH v2 2/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled Rafael J. Wysocki
2020-07-31 11:59   ` [cpufreq] 48fc4efcdd: WARNING:possible_circular_locking_dependency_detected kernel test robot
2020-07-31 23:25     ` Francisco Jerez
2020-08-03 17:19       ` Rafael J. Wysocki [this message]
2020-08-03 17:19         ` Rafael J. Wysocki
2020-07-27 15:13 ` [PATCH v3 0/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled Rafael J. Wysocki
2020-07-27 15:15   ` [PATCH v3 1/2] cpufreq: intel_pstate: Rearrange the storing of new EPP values Rafael J. Wysocki
2020-07-27 15:17   ` [PATCH v3 2/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled Rafael J. Wysocki
2020-07-28 15:09   ` [PATCH v4 0/2] " Rafael J. Wysocki
2020-07-28 15:11     ` [PATCH v4 1/2] cpufreq: intel_pstate: Rearrange the storing of new EPP values Rafael J. Wysocki
2020-07-30  1:31       ` Francisco Jerez
2020-07-28 15:13     ` [PATCH v4 2/2] cpufreq: intel_pstate: Implement passive mode with HWP enabled Rafael J. Wysocki
2020-08-01 23:21       ` Srinivas Pandruvada
2020-08-02 14:14         ` Doug Smythies
2020-08-02 19:20           ` Srinivas Pandruvada
2020-08-03 17:17       ` [PATCH v5] " Rafael J. Wysocki
2020-08-01 16:39     ` [PATCH v4 0/2] " Srinivas Pandruvada
2020-08-02 14:00       ` Doug Smythies
2020-08-02 18:39         ` Srinivas Pandruvada
2020-08-03  0:26           ` Doug Smythies
2020-08-03 17:23       ` Rafael J. Wysocki
2020-08-04 15:10   ` [PATCH v6] " Rafael J. Wysocki
2020-08-04 17:04     ` Doug Smythies
2020-08-05  9:34       ` Rafael J. Wysocki
2020-08-05 15:38         ` Srinivas Pandruvada
2020-08-05 16:28           ` Rafael J. Wysocki
2020-08-05 16:55   ` [PATCH v7] " Rafael J. Wysocki
2020-08-06  5:55     ` Doug Smythies
2020-08-06 11:25       ` Rafael J. Wysocki
2020-08-06 12:03     ` Rafael J. Wysocki
2020-08-10  0:44       ` Srinivas Pandruvada
2020-08-11  0:51       ` Francisco Jerez
2020-08-11 15:33         ` Rafael J. Wysocki
2020-08-17 21:06       ` Doug Smythies
2020-09-07  0:16         ` Doug Smythies

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2379312.oHovjqWFI2@kreacher \
    --to=rjw@rjwysocki.net \
    --cc=currojerez@riseup.net \
    --cc=dsmythies@telus.net \
    --cc=ggherdovich@suse.cz \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=peterz@infradead.org \
    --cc=rong.a.chen@intel.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.