All of lore.kernel.org
 help / color / mirror / Atom feed
* intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
@ 2016-03-17 13:02 Josh Boyer
  2016-03-17 14:07 ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2016-03-17 13:02 UTC (permalink / raw)
  To: Srinivas Pandruvada, Len Brown
  Cc: Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

Hello,

I have an Intel Atom based NUC that is producing the following
backtraces on boot of Linus' tree as of last evening.  This does not
happen with a tree with top level commit 271ecc5253e2, but does happen
when using the tree mentioned in the subject with top level commit
63e30271b04c.

The first backtrace appears to be a warning because the intel_pstate
driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
sure on that one.

The second backtrace is a lockdep report.  Both are from the same boot.

josh

[    2.063588] Intel P-state driver initializing.
[    2.064225] ------------[ cut here ]------------
[    2.064236] WARNING: CPU: 0 PID: 1 at kernel/smp.c:291
smp_call_function_single+0x116/0x130()
[    2.064240] Modules linked in:
[    2.064247] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.6.0-0.rc0.git6.1.fc25.x86_64 #1
[    2.064250] Hardware name:
\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff
\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff/DN2820FYK,
BIOS FYBYT10H.86A.0034.2014.0513.1413 05/13/2014
[    2.064254]  0000000000000086 000000008f3c08c6 ffff880237203bf0
ffffffff8144ed43
[    2.064262]  0000000000000000 ffffffff81c3eba4 ffff880237203c28
ffffffff810b1482
[    2.064269]  0000000000000000 ffffffff814881b0 ffff880232bdfa80
0000000000000093
[    2.064276] Call Trace:
[    2.064279]  <IRQ>  [<ffffffff8144ed43>] dump_stack+0x85/0xc2
[    2.064291]  [<ffffffff810b1482>] warn_slowpath_common+0x82/0xc0
[    2.064297]  [<ffffffff814881b0>] ? __rdmsr_on_cpu+0x70/0x70
[    2.064302]  [<ffffffff810b15ca>] warn_slowpath_null+0x1a/0x20
[    2.064307]  [<ffffffff81155666>] smp_call_function_single+0x116/0x130
[    2.064313]  [<ffffffff81488450>] wrmsrl_on_cpu+0x50/0x70
[    2.064319]  [<ffffffff8170beb4>] atom_set_pstate+0x74/0x90
[    2.064324]  [<ffffffff8170bcec>] intel_pstate_set_pstate+0x13c/0x220
[    2.064330]  [<ffffffff8170c025>] intel_pstate_update_util+0x155/0x340
[    2.064335]  [<ffffffff8110a411>] ? cpuacct_charge+0xd1/0x1a0
[    2.064340]  [<ffffffff8110a345>] ? cpuacct_charge+0x5/0x1a0
[    2.064346]  [<ffffffff810f74d3>] task_tick_fair+0x613/0xa60
[    2.064352]  [<ffffffff810e853b>] scheduler_tick+0x5b/0xe0
[    2.064357]  [<ffffffff8114f340>] ? tick_sched_do_timer+0x60/0x60
[    2.064362]  [<ffffffff8113cb81>] update_process_times+0x51/0x60
[    2.064367]  [<ffffffff8114eaa5>] tick_sched_handle.isra.20+0x25/0x60
[    2.064372]  [<ffffffff8114f37d>] tick_sched_timer+0x3d/0x70
[    2.064377]  [<ffffffff8113d8f0>] __hrtimer_run_queues+0x110/0x540
[    2.064382]  [<ffffffff8113e417>] hrtimer_interrupt+0xb7/0x1d0
[    2.064388]  [<ffffffff8105b258>] local_apic_timer_interrupt+0x38/0x60
[    2.064394]  [<ffffffff818b273d>] smp_apic_timer_interrupt+0x3d/0x50
[    2.064399]  [<ffffffff818b07a6>] apic_timer_interrupt+0x96/0xa0
[    2.064402]  <EOI>  [<ffffffff818af07b>] ?
_raw_spin_unlock_irqrestore+0x3b/0x60
[    2.064411]  [<ffffffff81450bd8>] ida_simple_get+0x98/0x100
[    2.064417]  [<ffffffff813236ef>] __kernfs_new_node+0x5f/0xc0
[    2.064423]  [<ffffffff813249b9>] kernfs_new_node+0x29/0x50
[    2.064427]  [<ffffffff81326502>] __kernfs_create_file+0x32/0xd0
[    2.064432]  [<ffffffff81326e40>] sysfs_add_file_mode_ns+0x90/0x1b0
[    2.064437]  [<ffffffff81326f8a>] sysfs_create_file_ns+0x2a/0x30
[    2.064442]  [<ffffffff8170872b>] cpufreq_online+0x30b/0x860
[    2.064447]  [<ffffffff81708d32>] cpufreq_add_dev+0x62/0x90
[    2.064453]  [<ffffffff815aa69c>] subsys_interface_register+0xbc/0x110
[    2.064459]  [<ffffffff81706aac>] cpufreq_register_driver+0x14c/0x210
[    2.064464]  [<ffffffff821f83e6>] intel_pstate_init+0x3b5/0x504
[    2.064469]  [<ffffffff821f8031>] ? intel_pstate_setup+0x9a/0x9a
[    2.064474]  [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[    2.064479]  [<ffffffff810d86f7>] ? parse_args+0x297/0x4b0
[    2.064486]  [<ffffffff8219b227>] kernel_init_freeable+0x1f3/0x28d
[    2.064492]  [<ffffffff8189f6ee>] kernel_init+0xe/0x100
[    2.064497]  [<ffffffff818afcb2>] ret_from_fork+0x22/0x50
[    2.064501]  [<ffffffff8189f6e0>] ? rest_init+0x140/0x140
[    2.064661] ---[ end trace 83cfc0c0effa9465 ]---

[   21.748603] ======================================================
[   21.748606] [ INFO: possible circular locking dependency detected ]
[   21.748611] 4.6.0-0.rc0.git6.1.fc25.x86_64 #1 Tainted: G        W
[   21.748614] -------------------------------------------------------
[   21.748617] abrt-handle-eve/849 is trying to acquire lock:
[   21.748620]  (&p->pi_lock){-.-.-.}, at: [<ffffffff810e57f1>]
try_to_wake_up+0x31/0x510
[   21.748635]
               but task is already holding lock:
[   21.748638]  (random_write_wait.lock){-.....}, at:
[<ffffffff81102913>] __wake_up+0x23/0x50
[   21.748647]
               which lock already depends on the new lock.

[   21.748651]
               the existing dependency chain (in reverse order) is:
[   21.748655]
               -> #2 (random_write_wait.lock){-.....}:
[   21.748662]        [<ffffffff811128ee>] lock_acquire+0xfe/0x1f0
[   21.748667]        [<ffffffff818af7b6>] _raw_spin_lock_irqsave+0x56/0x90
[   21.748674]        [<ffffffff81102913>] __wake_up+0x23/0x50
[   21.748679]        [<ffffffff8157b442>] account.part.28+0x1d2/0x250
[   21.748685]        [<ffffffff8157cd5d>] extract_entropy+0x9d/0x530
[   21.748689]        [<ffffffff8157de21>] get_random_bytes+0x51/0x150
[   21.748694]        [<ffffffff810b13df>] print_oops_end_marker+0x3f/0x60
[   21.748700]        [<ffffffff810b1487>] warn_slowpath_common+0x87/0xc0
[   21.748705]        [<ffffffff810b15ca>] warn_slowpath_null+0x1a/0x20
[   21.748710]        [<ffffffff81155666>] smp_call_function_single+0x116/0x130
[   21.748716]        [<ffffffff81488450>] wrmsrl_on_cpu+0x50/0x70
[   21.748723]        [<ffffffff8170beb4>] atom_set_pstate+0x74/0x90
[   21.748729]        [<ffffffff8170bcec>] intel_pstate_set_pstate+0x13c/0x220
[   21.748734]        [<ffffffff8170c025>] intel_pstate_update_util+0x155/0x340
[   21.748739]        [<ffffffff810f74d3>] task_tick_fair+0x613/0xa60
[   21.748745]        [<ffffffff810e853b>] scheduler_tick+0x5b/0xe0
[   21.748749]        [<ffffffff8113cb81>] update_process_times+0x51/0x60
[   21.748755]        [<ffffffff8114eaa5>] tick_sched_handle.isra.20+0x25/0x60
[   21.748760]        [<ffffffff8114f37d>] tick_sched_timer+0x3d/0x70
[   21.748765]        [<ffffffff8113d8f0>] __hrtimer_run_queues+0x110/0x540
[   21.748769]        [<ffffffff8113e417>] hrtimer_interrupt+0xb7/0x1d0
[   21.748773]        [<ffffffff8105b258>] local_apic_timer_interrupt+0x38/0x60
[   21.748779]        [<ffffffff818b273d>] smp_apic_timer_interrupt+0x3d/0x50
[   21.748784]        [<ffffffff818b07a6>] apic_timer_interrupt+0x96/0xa0
[   21.748788]        [<ffffffff81450bd8>] ida_simple_get+0x98/0x100
[   21.748794]        [<ffffffff813236ef>] __kernfs_new_node+0x5f/0xc0
[   21.748800]        [<ffffffff813249b9>] kernfs_new_node+0x29/0x50
[   21.748805]        [<ffffffff81326502>] __kernfs_create_file+0x32/0xd0
[   21.748809]        [<ffffffff81326e40>] sysfs_add_file_mode_ns+0x90/0x1b0
[   21.748814]        [<ffffffff81326f8a>] sysfs_create_file_ns+0x2a/0x30
[   21.748819]        [<ffffffff8170872b>] cpufreq_online+0x30b/0x860
[   21.748824]        [<ffffffff81708d32>] cpufreq_add_dev+0x62/0x90
[   21.748828]        [<ffffffff815aa69c>] subsys_interface_register+0xbc/0x110
[   21.748835]        [<ffffffff81706aac>] cpufreq_register_driver+0x14c/0x210
[   21.748840]        [<ffffffff821f83e6>] intel_pstate_init+0x3b5/0x504
[   21.748846]        [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[   21.748851]        [<ffffffff8219b227>] kernel_init_freeable+0x1f3/0x28d
[   21.748858]        [<ffffffff8189f6ee>] kernel_init+0xe/0x100
[   21.748863]        [<ffffffff818afcb2>] ret_from_fork+0x22/0x50
[   21.748868]
               -> #1 (&rq->lock){-.-.-.}:
[   21.748874]        [<ffffffff811128ee>] lock_acquire+0xfe/0x1f0
[   21.748880]        [<ffffffff818aeb9d>] _raw_spin_lock+0x3d/0x80
[   21.748884]        [<ffffffff810ec5ad>] sched_move_task+0x4d/0x280
[   21.748888]        [<ffffffff810ec7ee>] cpu_cgroup_fork+0xe/0x10
[   21.748893]        [<ffffffff8116e883>] cgroup_post_fork+0x43/0x130
[   21.748899]        [<ffffffff810afcdd>] copy_process.part.25+0x15ad/0x1b30
[   21.748904]        [<ffffffff810b043e>] _do_fork+0xee/0x700
[   21.748908]        [<ffffffff810b0a79>] kernel_thread+0x29/0x30
[   21.748913]        [<ffffffff8189f5c2>] rest_init+0x22/0x140
[   21.748917]        [<ffffffff8219b013>] start_kernel+0x49a/0x4bb
[   21.748922]        [<ffffffff8219a32c>] x86_64_start_reservations+0x2a/0x2c
[   21.748927]        [<ffffffff8219a478>] x86_64_start_kernel+0x14a/0x16d
[   21.748931]
               -> #0 (&p->pi_lock){-.-.-.}:
[   21.748937]        [<ffffffff81111cb4>] __lock_acquire+0x1a34/0x1c60
[   21.748942]        [<ffffffff811128ee>] lock_acquire+0xfe/0x1f0
[   21.748946]        [<ffffffff818af7b6>] _raw_spin_lock_irqsave+0x56/0x90
[   21.748951]        [<ffffffff810e57f1>] try_to_wake_up+0x31/0x510
[   21.748955]        [<ffffffff810e5d72>] default_wake_function+0x12/0x20
[   21.748960]        [<ffffffff812a7b13>] pollwake+0x73/0x90
[   21.748966]        [<ffffffff811028b2>] __wake_up_common+0x52/0x90
[   21.748970]        [<ffffffff81102929>] __wake_up+0x39/0x50
[   21.748974]        [<ffffffff8157b442>] account.part.28+0x1d2/0x250
[   21.748978]        [<ffffffff8157cd5d>] extract_entropy+0x9d/0x530
[   21.748983]        [<ffffffff8157de21>] get_random_bytes+0x51/0x150
[   21.748987]        [<ffffffff812fba6a>] load_elf_binary+0xc2a/0x1750
[   21.748992]        [<ffffffff81297c77>] search_binary_handler+0x97/0x1c0
[   21.748997]        [<ffffffff81299882>]
do_execveat_common.isra.36+0x6a2/0x9a0
[   21.749001]        [<ffffffff81299e6a>] SyS_execve+0x3a/0x50
[   21.749005]        [<ffffffff81004027>] do_syscall_64+0x67/0x190
[   21.749010]        [<ffffffff818afb3f>] return_from_SYSCALL_64+0x0/0x7a
[   21.749015]
               other info that might help us debug this:

[   21.749019] Chain exists of:
                 &p->pi_lock --> &rq->lock --> random_write_wait.lock

[   21.749026]  Possible unsafe locking scenario:

[   21.749030]        CPU0                    CPU1
[   21.749032]        ----                    ----
[   21.749034]   lock(random_write_wait.lock);
[   21.749038]                                lock(&rq->lock);
[   21.749042]                                lock(random_write_wait.lock);
[   21.749046]   lock(&p->pi_lock);
[   21.749050]
                *** DEADLOCK ***

[   21.749054] 1 lock held by abrt-handle-eve/849:
[   21.749056]  #0:  (random_write_wait.lock){-.....}, at:
[<ffffffff81102913>] __wake_up+0x23/0x50
[   21.749065]
               stack backtrace:
[   21.749070] CPU: 0 PID: 849 Comm: abrt-handle-eve Tainted: G
W       4.6.0-0.rc0.git6.1.fc25.x86_64 #1
[   21.749073] Hardware name:
\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff
\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff/DN2820FYK,
BIOS FYBYT10H.86A.0034.2014.0513.1413 05/13/2014
[   21.749077]  0000000000000086 0000000077cdecb8 ffff880221b13960
ffffffff8144ed43
[   21.749084]  ffffffff82af6c50 ffffffff82aebfe0 ffff880221b139a0
ffffffff8110e863
[   21.749090]  ffff880221b13a20 0000000000000001 ffff880221b6b0c0
ffff880221b6bd68
[   21.749097] Call Trace:
[   21.749102]  [<ffffffff8144ed43>] dump_stack+0x85/0xc2
[   21.749107]  [<ffffffff8110e863>] print_circular_bug+0x1e3/0x250
[   21.749112]  [<ffffffff81111cb4>] __lock_acquire+0x1a34/0x1c60
[   21.749117]  [<ffffffff8112f54d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   21.749122]  [<ffffffff81110d47>] ? __lock_acquire+0xac7/0x1c60
[   21.749126]  [<ffffffff811107cc>] ? __lock_acquire+0x54c/0x1c60
[   21.749131]  [<ffffffff811128ee>] lock_acquire+0xfe/0x1f0
[   21.749136]  [<ffffffff810e57f1>] ? try_to_wake_up+0x31/0x510
[   21.749140]  [<ffffffff818af7b6>] _raw_spin_lock_irqsave+0x56/0x90
[   21.749145]  [<ffffffff810e57f1>] ? try_to_wake_up+0x31/0x510
[   21.749149]  [<ffffffff810e57f1>] try_to_wake_up+0x31/0x510
[   21.749153]  [<ffffffff810e5d72>] default_wake_function+0x12/0x20
[   21.749158]  [<ffffffff812a7b13>] pollwake+0x73/0x90
[   21.749162]  [<ffffffff810e5d60>] ? wake_up_q+0x70/0x70
[   21.749166]  [<ffffffff811028b2>] __wake_up_common+0x52/0x90
[   21.749170]  [<ffffffff81102929>] __wake_up+0x39/0x50
[   21.749174]  [<ffffffff8157b442>] account.part.28+0x1d2/0x250
[   21.749179]  [<ffffffff8157cd5d>] extract_entropy+0x9d/0x530
[   21.749183]  [<ffffffff812fba6a>] ? load_elf_binary+0xc2a/0x1750
[   21.749187]  [<ffffffff812fba6a>] ? load_elf_binary+0xc2a/0x1750
[   21.749191]  [<ffffffff8157de21>] get_random_bytes+0x51/0x150
[   21.749195]  [<ffffffff812fba6a>] load_elf_binary+0xc2a/0x1750
[   21.749200]  [<ffffffff81297c77>] search_binary_handler+0x97/0x1c0
[   21.749205]  [<ffffffff81299882>] do_execveat_common.isra.36+0x6a2/0x9a0
[   21.749209]  [<ffffffff812997d5>] ? do_execveat_common.isra.36+0x5f5/0x9a0
[   21.749213]  [<ffffffff81299e6a>] SyS_execve+0x3a/0x50
[   21.749217]  [<ffffffff81004027>] do_syscall_64+0x67/0x190
[   21.749222]  [<ffffffff818afb3f>] entry_SYSCALL64_slow_path+0x25/0x25

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-17 13:02 intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c Josh Boyer
@ 2016-03-17 14:07 ` Rafael J. Wysocki
  2016-03-17 14:34   ` Philippe Longepe
  2016-03-17 16:44   ` Josh Boyer
  0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-17 14:07 UTC (permalink / raw)
  To: Josh Boyer, Srinivas Pandruvada, Philippe Longepe
  Cc: Len Brown, Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
> Hello,

Hi,

> I have an Intel Atom based NUC that is producing the following
> backtraces on boot of Linus' tree as of last evening.  This does not
> happen with a tree with top level commit 271ecc5253e2, but does happen
> when using the tree mentioned in the subject with top level commit
> 63e30271b04c.
> 
> The first backtrace appears to be a warning because the intel_pstate
> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
> sure on that one.
> 
> The second backtrace is a lockdep report.  Both are from the same boot.

OK, thanks for the report.

Can you please try the patch below?

I'm actually unsure if we can do that safely in general for Atom because
of the initialization, but that's what Core does anyway.

Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.

---
 drivers/cpufreq/intel_pstate.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
 
 	val |= vid;
 
-	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
+	wrmsrl(MSR_IA32_PERF_CTL, val);
 }
 
 static int silvermont_get_scaling(void)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-17 14:07 ` Rafael J. Wysocki
@ 2016-03-17 14:34   ` Philippe Longepe
  2016-03-17 16:44   ` Josh Boyer
  1 sibling, 0 replies; 19+ messages in thread
From: Philippe Longepe @ 2016-03-17 14:34 UTC (permalink / raw)
  To: Rafael J. Wysocki, Josh Boyer, Srinivas Pandruvada
  Cc: Len Brown, Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org

Hi,

The wmsrl is supposed to write on the MSR corresponding to the cpu that 
is executing it.

However, it seems that the following commit done by Joe Konno already 
fixed this bug on BYT.

0dd23f94251f49da99a6cbfb22418b2d757d77d6

Now, we need to figure out why the wmsrl is not executed on the current cpu!

Regards,

Philippe


On 17/03/2016 15:07, Rafael J. Wysocki wrote:
> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>> Hello,
> Hi,
>
>> I have an Intel Atom based NUC that is producing the following
>> backtraces on boot of Linus' tree as of last evening.  This does not
>> happen with a tree with top level commit 271ecc5253e2, but does happen
>> when using the tree mentioned in the subject with top level commit
>> 63e30271b04c.
>>
>> The first backtrace appears to be a warning because the intel_pstate
>> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>> sure on that one.
>>
>> The second backtrace is a lockdep report.  Both are from the same boot.
> OK, thanks for the report.
>
> Can you please try the patch below?
>
> I'm actually unsure if we can do that safely in general for Atom because
> of the initialization, but that's what Core does anyway.
>
> Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
> atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>
> ---
>   drivers/cpufreq/intel_pstate.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>   
>   	val |= vid;
>   
> -	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> +	wrmsrl(MSR_IA32_PERF_CTL, val);
>   }
>   
>   static int silvermont_get_scaling(void)
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-17 14:07 ` Rafael J. Wysocki
  2016-03-17 14:34   ` Philippe Longepe
@ 2016-03-17 16:44   ` Josh Boyer
  2016-03-18  0:20     ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2016-03-17 16:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>> Hello,
>
> Hi,
>
>> I have an Intel Atom based NUC that is producing the following
>> backtraces on boot of Linus' tree as of last evening.  This does not
>> happen with a tree with top level commit 271ecc5253e2, but does happen
>> when using the tree mentioned in the subject with top level commit
>> 63e30271b04c.
>>
>> The first backtrace appears to be a warning because the intel_pstate
>> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>> sure on that one.
>>
>> The second backtrace is a lockdep report.  Both are from the same boot.
>
> OK, thanks for the report.
>
> Can you please try the patch below?
>
> I'm actually unsure if we can do that safely in general for Atom because
> of the initialization, but that's what Core does anyway.
>
> Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
> atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>
> ---
>  drivers/cpufreq/intel_pstate.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>
>         val |= vid;
>
> -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> +       wrmsrl(MSR_IA32_PERF_CTL, val);
>  }
>
>  static int silvermont_get_scaling(void)
>

I applied this on top of commit 09fd671ccb24 and the backtrace and
lockdep report both go away.  So yes, this seems to clear up the
issue.  I tested it on a variety of different CPU types and didn't
notice anything wrong on them either.

josh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-17 16:44   ` Josh Boyer
@ 2016-03-18  0:20     ` Rafael J. Wysocki
  2016-03-18 12:37       ` Josh Boyer
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-18  0:20 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Srinivas Pandruvada, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
> >> Hello,
> >
> > Hi,
> >
> >> I have an Intel Atom based NUC that is producing the following
> >> backtraces on boot of Linus' tree as of last evening.  This does not
> >> happen with a tree with top level commit 271ecc5253e2, but does happen
> >> when using the tree mentioned in the subject with top level commit
> >> 63e30271b04c.
> >>
> >> The first backtrace appears to be a warning because the intel_pstate
> >> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
> >> sure on that one.
> >>
> >> The second backtrace is a lockdep report.  Both are from the same boot.
> >
> > OK, thanks for the report.
> >
> > Can you please try the patch below?
> >
> > I'm actually unsure if we can do that safely in general for Atom because
> > of the initialization, but that's what Core does anyway.
> >
> > Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
> > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
> >
> > ---
> >  drivers/cpufreq/intel_pstate.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
> >
> >         val |= vid;
> >
> > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> > +       wrmsrl(MSR_IA32_PERF_CTL, val);
> >  }
> >
> >  static int silvermont_get_scaling(void)
> >
> 
> I applied this on top of commit 09fd671ccb24 and the backtrace and
> lockdep report both go away.  So yes, this seems to clear up the
> issue.  I tested it on a variety of different CPU types and didn't
> notice anything wrong on them either.

The problems may show up during initialization and cleanup where one CPU
may be running code trying to configure a different one.  In those cases
wrmsrl_on_cpu() needs to be used.

Let me cut a patch taking that into account.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18  0:20     ` Rafael J. Wysocki
@ 2016-03-18 12:37       ` Josh Boyer
  2016-03-18 14:36         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2016-03-18 12:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>> >> Hello,
>> >
>> > Hi,
>> >
>> >> I have an Intel Atom based NUC that is producing the following
>> >> backtraces on boot of Linus' tree as of last evening.  This does not
>> >> happen with a tree with top level commit 271ecc5253e2, but does happen
>> >> when using the tree mentioned in the subject with top level commit
>> >> 63e30271b04c.
>> >>
>> >> The first backtrace appears to be a warning because the intel_pstate
>> >> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>> >> sure on that one.
>> >>
>> >> The second backtrace is a lockdep report.  Both are from the same boot.
>> >
>> > OK, thanks for the report.
>> >
>> > Can you please try the patch below?
>> >
>> > I'm actually unsure if we can do that safely in general for Atom because
>> > of the initialization, but that's what Core does anyway.
>> >
>> > Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
>> > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>> >
>> > ---
>> >  drivers/cpufreq/intel_pstate.c |    2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
>> > ===================================================================
>> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
>> > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>> >
>> >         val |= vid;
>> >
>> > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>> > +       wrmsrl(MSR_IA32_PERF_CTL, val);
>> >  }
>> >
>> >  static int silvermont_get_scaling(void)
>> >
>>
>> I applied this on top of commit 09fd671ccb24 and the backtrace and
>> lockdep report both go away.  So yes, this seems to clear up the
>> issue.  I tested it on a variety of different CPU types and didn't
>> notice anything wrong on them either.
>
> The problems may show up during initialization and cleanup where one CPU
> may be running code trying to configure a different one.  In those cases
> wrmsrl_on_cpu() needs to be used.
>
> Let me cut a patch taking that into account.

OK.  Happy to test when you have it ready.

josh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 12:37       ` Josh Boyer
@ 2016-03-18 14:36         ` Rafael J. Wysocki
  2016-03-18 16:13           ` Stephane Gasparini
  2016-03-18 17:35           ` Josh Boyer
  0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-18 14:36 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Srinivas Pandruvada, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
> >> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
> >> >> Hello,
> >> >
> >> > Hi,
> >> >
> >> >> I have an Intel Atom based NUC that is producing the following
> >> >> backtraces on boot of Linus' tree as of last evening.  This does not
> >> >> happen with a tree with top level commit 271ecc5253e2, but does happen
> >> >> when using the tree mentioned in the subject with top level commit
> >> >> 63e30271b04c.
> >> >>
> >> >> The first backtrace appears to be a warning because the intel_pstate
> >> >> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
> >> >> sure on that one.
> >> >>
> >> >> The second backtrace is a lockdep report.  Both are from the same boot.
> >> >
> >> > OK, thanks for the report.
> >> >
> >> > Can you please try the patch below?
> >> >
> >> > I'm actually unsure if we can do that safely in general for Atom because
> >> > of the initialization, but that's what Core does anyway.
> >> >
> >> > Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
> >> > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
> >> >
> >> > ---
> >> >  drivers/cpufreq/intel_pstate.c |    2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> >> > ===================================================================
> >> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> >> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> >> > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
> >> >
> >> >         val |= vid;
> >> >
> >> > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> >> > +       wrmsrl(MSR_IA32_PERF_CTL, val);
> >> >  }
> >> >
> >> >  static int silvermont_get_scaling(void)
> >> >
> >>
> >> I applied this on top of commit 09fd671ccb24 and the backtrace and
> >> lockdep report both go away.  So yes, this seems to clear up the
> >> issue.  I tested it on a variety of different CPU types and didn't
> >> notice anything wrong on them either.
> >
> > The problems may show up during initialization and cleanup where one CPU
> > may be running code trying to configure a different one.  In those cases
> > wrmsrl_on_cpu() needs to be used.
> >
> > Let me cut a patch taking that into account.
> 
> OK.  Happy to test when you have it ready.

Thanks!

Please test the patch below.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts

After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
intel_pstate_adjust_busy_pstate() path as that is executed with
disabled interrupts.  However, atom_set_pstate() called from there
via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
smp_call_function_single().

The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
because intel_pstate_set_pstate() calling it is also invoked during
the initialization and cleanup of the driver and in those cases it is
not guaranteed to be run on the CPU that is being updated.  However,
in the case when intel_pstate_set_pstate() is called by
intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
the register safely.  Moreover, intel_pstate_set_pstate() already
contains code that only is executed if the function is called by
intel_pstate_adjust_busy_pstate() and there is a special argument
passed to it because of that.

To fix the problem at hand, rearrange the code taking the above
observations into account.

First, replace the ->set() callback in struct pstate_funcs with a
->get_val() one that will return the value to be written to the
IA32_PERF_CTL MSR without updating the register.

Second, split intel_pstate_set_pstate() into two functions,
intel_pstate_update_pstate() to be called by
intel_pstate_adjust_busy_pstate() that will contain all of the
intel_pstate_set_pstate() code which only needs to be executed in
that case and will use wrmsrl() to update the MSR (after obtaining
the value to write to it from the ->get_val() callback), and
intel_pstate_set_min_pstate() to be invoked during the
initialization and cleanup that will set the P-state to the
minimum one and will update the MSR using wrmsrl_on_cpu().

Finally, move the code shared between intel_pstate_update_pstate()
and intel_pstate_set_min_pstate() to a new static inline function
intel_pstate_record_pstate() and make them both call it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
---
 drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 30 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -134,7 +134,7 @@ struct pstate_funcs {
 	int (*get_min)(void);
 	int (*get_turbo)(void);
 	int (*get_scaling)(void);
-	void (*set)(struct cpudata*, int pstate);
+	u64 (*get_val)(struct cpudata*, int pstate);
 	void (*get_vid)(struct cpudata *);
 	int32_t (*get_target_pstate)(struct cpudata *);
 };
@@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
 	return value & 0x7F;
 }
 
-static void atom_set_pstate(struct cpudata *cpudata, int pstate)
+static u64 atom_get_val(struct cpudata *cpudata, int pstate)
 {
 	u64 val;
 	int32_t vid_fp;
@@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
 	if (pstate > cpudata->pstate.max_pstate)
 		vid = cpudata->vid.turbo;
 
-	val |= vid;
-
-	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
+	return val | vid;
 }
 
 static int silvermont_get_scaling(void)
@@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
 	return 100000;
 }
 
-static void core_set_pstate(struct cpudata *cpudata, int pstate)
+static u64 core_get_val(struct cpudata *cpudata, int pstate)
 {
 	u64 val;
 
@@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
 	if (limits->no_turbo && !limits->turbo_disabled)
 		val |= (u64)1 << 32;
 
-	wrmsrl(MSR_IA32_PERF_CTL, val);
+	return val;
 }
 
 static int knl_get_turbo_pstate(void)
@@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
 		.get_min = core_get_min_pstate,
 		.get_turbo = core_get_turbo_pstate,
 		.get_scaling = core_get_scaling,
-		.set = core_set_pstate,
+		.get_val = core_get_val,
 		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
@@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
 		.get_max_physical = atom_get_max_pstate,
 		.get_min = atom_get_min_pstate,
 		.get_turbo = atom_get_turbo_pstate,
-		.set = atom_set_pstate,
+		.get_val = atom_get_val,
 		.get_scaling = silvermont_get_scaling,
 		.get_vid = atom_get_vid,
 		.get_target_pstate = get_target_pstate_use_cpu_load,
@@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
 		.get_max_physical = atom_get_max_pstate,
 		.get_min = atom_get_min_pstate,
 		.get_turbo = atom_get_turbo_pstate,
-		.set = atom_set_pstate,
+		.get_val = atom_get_val,
 		.get_scaling = airmont_get_scaling,
 		.get_vid = atom_get_vid,
 		.get_target_pstate = get_target_pstate_use_cpu_load,
@@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
 		.get_min = core_get_min_pstate,
 		.get_turbo = knl_get_turbo_pstate,
 		.get_scaling = core_get_scaling,
-		.set = core_set_pstate,
+		.get_val = core_get_val,
 		.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
@@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
 }
 
-static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate, bool force)
+static inline void intel_pstate_record_pstate(struct cpudata *cpu, int pstate)
 {
-	int max_perf, min_perf;
-
-	if (force) {
-		update_turbo_state();
-
-		intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
-
-		pstate = clamp_t(int, pstate, min_perf, max_perf);
-
-		if (pstate == cpu->pstate.current_pstate)
-			return;
-	}
 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
-
 	cpu->pstate.current_pstate = pstate;
+}
 
-	pstate_funcs.set(cpu, pstate);
+static void intel_pstate_set_min_pstate(struct cpudata *cpu)
+{
+	int pstate = cpu->pstate.min_pstate;
+
+	intel_pstate_record_pstate(cpu, pstate);
+	/*
+	 * Generally, there is no guarantee that this code will always run on
+	 * the CPU being updated, so force the register update to run on the
+	 * right CPU.
+	 */
+	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
+		      pstate_funcs.get_val(cpu, pstate));
 }
 
 static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
@@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
 
 	if (pstate_funcs.get_vid)
 		pstate_funcs.get_vid(cpu);
-	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
+
+	intel_pstate_set_min_pstate(cpu);
 }
 
 static inline void intel_pstate_calc_busy(struct cpudata *cpu)
@@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
 }
 
+static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
+{
+	int max_perf, min_perf;
+
+	update_turbo_state();
+
+	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
+	pstate = clamp_t(int, pstate, min_perf, max_perf);
+	if (pstate == cpu->pstate.current_pstate)
+		return;
+
+	intel_pstate_record_pstate(cpu, pstate);
+	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
+}
+
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
 	int from, target_pstate;
@@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
 
 	target_pstate = pstate_funcs.get_target_pstate(cpu);
 
-	intel_pstate_set_pstate(cpu, target_pstate, true);
+	intel_pstate_update_pstate(cpu, target_pstate);
 
 	sample = &cpu->sample;
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
@@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
 	if (hwp_active)
 		return;
 
-	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
+	intel_pstate_set_min_pstate(cpu);
 }
 
 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
@@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
 	pstate_funcs.get_min   = funcs->get_min;
 	pstate_funcs.get_turbo = funcs->get_turbo;
 	pstate_funcs.get_scaling = funcs->get_scaling;
-	pstate_funcs.set       = funcs->set;
+	pstate_funcs.get_val   = funcs->get_val;
 	pstate_funcs.get_vid   = funcs->get_vid;
 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 14:36         ` Rafael J. Wysocki
@ 2016-03-18 16:13           ` Stephane Gasparini
  2016-03-18 17:52             ` Srinivas Pandruvada
  2016-03-18 17:35           ` Josh Boyer
  1 sibling, 1 reply; 19+ messages in thread
From: Stephane Gasparini @ 2016-03-18 16:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Josh Boyer, Srinivas Pandruvada, Philippe Longepe, Len Brown,
	Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org

Rafael,

Something still obscure to me

1)
on commit 303ae7230751

core were using wrmsrl_on_cpu() in core_set_pstate()
atom were using wrmsrl() in byt_set_pstate()

call sequence is 


intel_state_init_cpu
 -> intel_pstate_get_cpu_pstates() 
   -> intel_pstate_set_pstate()
     -> pstate_funcs.set() +- byt_set_pstate()
                           +- core_set_pstate() 

2)
commit 0dd23f94251f aligned both core and atom 
so both used wrmsrl_on_cpu() in core_set_state() and byt_set_pstate()

Which means that during the init we are sure to always use wrmsrl_on_cpu()
on both atom and core.

The comment on this commit is not that precise, but we can guess that this 
issue solve here was happening during suspend_resume path that is off lining
on lining cpus (which happen a lot on Tablets and phone).

3)
at part of commit a4675fbc4a7a
the code of core_set_state  was changed to use wrmsrl instead 
of wrmsrl_on_cpu()
while the code of atom_set_pstate() (formerly byt_set_pstate() ) was kept
unchanged.


Why in step 3) both atom_set_pstate() and atom_set_pstate() were not both
changed to use wrmsrl ?
If it was OK to use wrmsrl on core, why it would not OK on atom ?

I assume that due to 0dd23f94251f commit, it was felt that the initialization 
issue as only been seen on BYT but this is not true as in step 1) core was 
already using wmsrl_on_cpu() vs byt that was using wrmsrl()

So either as part of commit a4675fbc4a7a both, core and atom should have kept 
using wrmsrl_on_cpu() or both should have been changed to use wrmsrl().

I still don’t see why atom would behave differently than core here.

If we go for using wmsrml only then reverting commit 0dd23f94251f is sufficient
compare to the patch you are proposing here.

If we agree that during initialization we need to use wrmsrl_on_cpu(), I would
change only the initialization on both core and atom.

BTW, what is the interest of setting the pstate to LFM during initialization ?
The BIOS is setting the pstate to either LFM, HFM or BFM, and why bothering
changing it.

Not setting the pstate at initialization will simply make the intel_state
starts from BIOS setting and will avoid all this complexity, i.e. calling
wrmsrl() in intel_pstate_set_pstate() and not calling it during init phase that
are prone to exhibit the issue.
 
—
Steph




> On Mar 18, 2016, at 3:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>>>> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>>>>>> Hello,
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>> I have an Intel Atom based NUC that is producing the following
>>>>>> backtraces on boot of Linus' tree as of last evening.  This does not
>>>>>> happen with a tree with top level commit 271ecc5253e2, but does happen
>>>>>> when using the tree mentioned in the subject with top level commit
>>>>>> 63e30271b04c.
>>>>>> 
>>>>>> The first backtrace appears to be a warning because the intel_pstate
>>>>>> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>>>>>> sure on that one.
>>>>>> 
>>>>>> The second backtrace is a lockdep report.  Both are from the same boot.
>>>>> 
>>>>> OK, thanks for the report.
>>>>> 
>>>>> Can you please try the patch below?
>>>>> 
>>>>> I'm actually unsure if we can do that safely in general for Atom because
>>>>> of the initialization, but that's what Core does anyway.
>>>>> 
>>>>> Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
>>>>> atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>>>>> 
>>>>> ---
>>>>> drivers/cpufreq/intel_pstate.c |    2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>>>> ===================================================================
>>>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>>>> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>>>>> 
>>>>>        val |= vid;
>>>>> 
>>>>> -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>>>>> +       wrmsrl(MSR_IA32_PERF_CTL, val);
>>>>> }
>>>>> 
>>>>> static int silvermont_get_scaling(void)
>>>>> 
>>>> 
>>>> I applied this on top of commit 09fd671ccb24 and the backtrace and
>>>> lockdep report both go away.  So yes, this seems to clear up the
>>>> issue.  I tested it on a variety of different CPU types and didn't
>>>> notice anything wrong on them either.
>>> 
>>> The problems may show up during initialization and cleanup where one CPU
>>> may be running code trying to configure a different one.  In those cases
>>> wrmsrl_on_cpu() needs to be used.
>>> 
>>> Let me cut a patch taking that into account.
>> 
>> OK.  Happy to test when you have it ready.
> 
> Thanks!
> 
> Please test the patch below.
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
> 
> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
> utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
> intel_pstate_adjust_busy_pstate() path as that is executed with
> disabled interrupts.  However, atom_set_pstate() called from there
> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
> smp_call_function_single().
> 
> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
> because intel_pstate_set_pstate() calling it is also invoked during
> the initialization and cleanup of the driver and in those cases it is
> not guaranteed to be run on the CPU that is being updated.  However,
> in the case when intel_pstate_set_pstate() is called by
> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
> the register safely.  Moreover, intel_pstate_set_pstate() already
> contains code that only is executed if the function is called by
> intel_pstate_adjust_busy_pstate() and there is a special argument
> passed to it because of that.
> 
> To fix the problem at hand, rearrange the code taking the above
> observations into account.
> 
> First, replace the ->set() callback in struct pstate_funcs with a
> ->get_val() one that will return the value to be written to the
> IA32_PERF_CTL MSR without updating the register.
> 
> Second, split intel_pstate_set_pstate() into two functions,
> intel_pstate_update_pstate() to be called by
> intel_pstate_adjust_busy_pstate() that will contain all of the
> intel_pstate_set_pstate() code which only needs to be executed in
> that case and will use wrmsrl() to update the MSR (after obtaining
> the value to write to it from the ->get_val() callback), and
> intel_pstate_set_min_pstate() to be invoked during the
> initialization and cleanup that will set the P-state to the
> minimum one and will update the MSR using wrmsrl_on_cpu().
> 
> Finally, move the code shared between intel_pstate_update_pstate()
> and intel_pstate_set_min_pstate() to a new static inline function
> intel_pstate_record_pstate() and make them both call it.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
> ---
> drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----------------
> 1 file changed, 43 insertions(+), 30 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -134,7 +134,7 @@ struct pstate_funcs {
> 	int (*get_min)(void);
> 	int (*get_turbo)(void);
> 	int (*get_scaling)(void);
> -	void (*set)(struct cpudata*, int pstate);
> +	u64 (*get_val)(struct cpudata*, int pstate);
> 	void (*get_vid)(struct cpudata *);
> 	int32_t (*get_target_pstate)(struct cpudata *);
> };
> @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
> 	return value & 0x7F;
> }
> 
> -static void atom_set_pstate(struct cpudata *cpudata, int pstate)
> +static u64 atom_get_val(struct cpudata *cpudata, int pstate)
> {
> 	u64 val;
> 	int32_t vid_fp;
> @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
> 	if (pstate > cpudata->pstate.max_pstate)
> 		vid = cpudata->vid.turbo;
> 
> -	val |= vid;
> -
> -	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> +	return val | vid;
> }
> 
> static int silvermont_get_scaling(void)
> @@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
> 	return 100000;
> }
> 
> -static void core_set_pstate(struct cpudata *cpudata, int pstate)
> +static u64 core_get_val(struct cpudata *cpudata, int pstate)
> {
> 	u64 val;
> 
> @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
> 	if (limits->no_turbo && !limits->turbo_disabled)
> 		val |= (u64)1 << 32;
> 
> -	wrmsrl(MSR_IA32_PERF_CTL, val);
> +	return val;
> }
> 
> static int knl_get_turbo_pstate(void)
> @@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
> 		.get_min = core_get_min_pstate,
> 		.get_turbo = core_get_turbo_pstate,
> 		.get_scaling = core_get_scaling,
> -		.set = core_set_pstate,
> +		.get_val = core_get_val,
> 		.get_target_pstate = get_target_pstate_use_performance,
> 	},
> };
> @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
> 		.get_max_physical = atom_get_max_pstate,
> 		.get_min = atom_get_min_pstate,
> 		.get_turbo = atom_get_turbo_pstate,
> -		.set = atom_set_pstate,
> +		.get_val = atom_get_val,
> 		.get_scaling = silvermont_get_scaling,
> 		.get_vid = atom_get_vid,
> 		.get_target_pstate = get_target_pstate_use_cpu_load,
> @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
> 		.get_max_physical = atom_get_max_pstate,
> 		.get_min = atom_get_min_pstate,
> 		.get_turbo = atom_get_turbo_pstate,
> -		.set = atom_set_pstate,
> +		.get_val = atom_get_val,
> 		.get_scaling = airmont_get_scaling,
> 		.get_vid = atom_get_vid,
> 		.get_target_pstate = get_target_pstate_use_cpu_load,
> @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
> 		.get_min = core_get_min_pstate,
> 		.get_turbo = knl_get_turbo_pstate,
> 		.get_scaling = core_get_scaling,
> -		.set = core_set_pstate,
> +		.get_val = core_get_val,
> 		.get_target_pstate = get_target_pstate_use_performance,
> 	},
> };
> @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
> 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
> }
> 
> -static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate, bool force)
> +static inline void intel_pstate_record_pstate(struct cpudata *cpu, int pstate)
> {
> -	int max_perf, min_perf;
> -
> -	if (force) {
> -		update_turbo_state();
> -
> -		intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> -
> -		pstate = clamp_t(int, pstate, min_perf, max_perf);
> -
> -		if (pstate == cpu->pstate.current_pstate)
> -			return;
> -	}
> 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
> -
> 	cpu->pstate.current_pstate = pstate;
> +}
> 
> -	pstate_funcs.set(cpu, pstate);
> +static void intel_pstate_set_min_pstate(struct cpudata *cpu)
> +{
> +	int pstate = cpu->pstate.min_pstate;
> +
> +	intel_pstate_record_pstate(cpu, pstate);
> +	/*
> +	 * Generally, there is no guarantee that this code will always run on
> +	 * the CPU being updated, so force the register update to run on the
> +	 * right CPU.
> +	 */
> +	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
> +		      pstate_funcs.get_val(cpu, pstate));
> }
> 
> static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
> @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
> 
> 	if (pstate_funcs.get_vid)
> 		pstate_funcs.get_vid(cpu);
> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
> +
> +	intel_pstate_set_min_pstate(cpu);
> }
> 
> static inline void intel_pstate_calc_busy(struct cpudata *cpu)
> @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
> 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
> }
> 
> +static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
> +{
> +	int max_perf, min_perf;
> +
> +	update_turbo_state();
> +
> +	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> +	pstate = clamp_t(int, pstate, min_perf, max_perf);
> +	if (pstate == cpu->pstate.current_pstate)
> +		return;
> +
> +	intel_pstate_record_pstate(cpu, pstate);
> +	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> +}
> +
> static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> {
> 	int from, target_pstate;
> @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
> 
> 	target_pstate = pstate_funcs.get_target_pstate(cpu);
> 
> -	intel_pstate_set_pstate(cpu, target_pstate, true);
> +	intel_pstate_update_pstate(cpu, target_pstate);
> 
> 	sample = &cpu->sample;
> 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
> 	if (hwp_active)
> 		return;
> 
> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
> +	intel_pstate_set_min_pstate(cpu);
> }
> 
> static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
> @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
> 	pstate_funcs.get_min   = funcs->get_min;
> 	pstate_funcs.get_turbo = funcs->get_turbo;
> 	pstate_funcs.get_scaling = funcs->get_scaling;
> -	pstate_funcs.set       = funcs->set;
> +	pstate_funcs.get_val   = funcs->get_val;
> 	pstate_funcs.get_vid   = funcs->get_vid;
> 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 14:36         ` Rafael J. Wysocki
  2016-03-18 16:13           ` Stephane Gasparini
@ 2016-03-18 17:35           ` Josh Boyer
  2016-03-18 22:23             ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2016-03-18 17:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Fri, Mar 18, 2016 at 10:36 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>> >> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> >> > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>> >> >> Hello,
>> >> >
>> >> > Hi,
>> >> >
>> >> >> I have an Intel Atom based NUC that is producing the following
>> >> >> backtraces on boot of Linus' tree as of last evening.  This does not
>> >> >> happen with a tree with top level commit 271ecc5253e2, but does happen
>> >> >> when using the tree mentioned in the subject with top level commit
>> >> >> 63e30271b04c.
>> >> >>
>> >> >> The first backtrace appears to be a warning because the intel_pstate
>> >> >> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>> >> >> sure on that one.
>> >> >>
>> >> >> The second backtrace is a lockdep report.  Both are from the same boot.
>> >> >
>> >> > OK, thanks for the report.
>> >> >
>> >> > Can you please try the patch below?
>> >> >
>> >> > I'm actually unsure if we can do that safely in general for Atom because
>> >> > of the initialization, but that's what Core does anyway.
>> >> >
>> >> > Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
>> >> > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>> >> >
>> >> > ---
>> >> >  drivers/cpufreq/intel_pstate.c |    2 +-
>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
>> >> > ===================================================================
>> >> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>> >> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
>> >> > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>> >> >
>> >> >         val |= vid;
>> >> >
>> >> > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>> >> > +       wrmsrl(MSR_IA32_PERF_CTL, val);
>> >> >  }
>> >> >
>> >> >  static int silvermont_get_scaling(void)
>> >> >
>> >>
>> >> I applied this on top of commit 09fd671ccb24 and the backtrace and
>> >> lockdep report both go away.  So yes, this seems to clear up the
>> >> issue.  I tested it on a variety of different CPU types and didn't
>> >> notice anything wrong on them either.
>> >
>> > The problems may show up during initialization and cleanup where one CPU
>> > may be running code trying to configure a different one.  In those cases
>> > wrmsrl_on_cpu() needs to be used.
>> >
>> > Let me cut a patch taking that into account.
>>
>> OK.  Happy to test when you have it ready.
>
> Thanks!
>
> Please test the patch below.
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
>
> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
> utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
> intel_pstate_adjust_busy_pstate() path as that is executed with
> disabled interrupts.  However, atom_set_pstate() called from there
> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
> smp_call_function_single().
>
> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
> because intel_pstate_set_pstate() calling it is also invoked during
> the initialization and cleanup of the driver and in those cases it is
> not guaranteed to be run on the CPU that is being updated.  However,
> in the case when intel_pstate_set_pstate() is called by
> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
> the register safely.  Moreover, intel_pstate_set_pstate() already
> contains code that only is executed if the function is called by
> intel_pstate_adjust_busy_pstate() and there is a special argument
> passed to it because of that.
>
> To fix the problem at hand, rearrange the code taking the above
> observations into account.
>
> First, replace the ->set() callback in struct pstate_funcs with a
> ->get_val() one that will return the value to be written to the
> IA32_PERF_CTL MSR without updating the register.
>
> Second, split intel_pstate_set_pstate() into two functions,
> intel_pstate_update_pstate() to be called by
> intel_pstate_adjust_busy_pstate() that will contain all of the
> intel_pstate_set_pstate() code which only needs to be executed in
> that case and will use wrmsrl() to update the MSR (after obtaining
> the value to write to it from the ->get_val() callback), and
> intel_pstate_set_min_pstate() to be invoked during the
> initialization and cleanup that will set the P-state to the
> minimum one and will update the MSR using wrmsrl_on_cpu().
>
> Finally, move the code shared between intel_pstate_update_pstate()
> and intel_pstate_set_min_pstate() to a new static inline function
> intel_pstate_record_pstate() and make them both call it.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)

Tested-by: Josh Boyer <jwboyer@fedoraproject.org>

This worked fine on the original problem machine, and the other
machines I also tested.  No backtrace or lockdeps reported.

josh

> ---
>  drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----------------
>  1 file changed, 43 insertions(+), 30 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -134,7 +134,7 @@ struct pstate_funcs {
>         int (*get_min)(void);
>         int (*get_turbo)(void);
>         int (*get_scaling)(void);
> -       void (*set)(struct cpudata*, int pstate);
> +       u64 (*get_val)(struct cpudata*, int pstate);
>         void (*get_vid)(struct cpudata *);
>         int32_t (*get_target_pstate)(struct cpudata *);
>  };
> @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
>         return value & 0x7F;
>  }
>
> -static void atom_set_pstate(struct cpudata *cpudata, int pstate)
> +static u64 atom_get_val(struct cpudata *cpudata, int pstate)
>  {
>         u64 val;
>         int32_t vid_fp;
> @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
>         if (pstate > cpudata->pstate.max_pstate)
>                 vid = cpudata->vid.turbo;
>
> -       val |= vid;
> -
> -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> +       return val | vid;
>  }
>
>  static int silvermont_get_scaling(void)
> @@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
>         return 100000;
>  }
>
> -static void core_set_pstate(struct cpudata *cpudata, int pstate)
> +static u64 core_get_val(struct cpudata *cpudata, int pstate)
>  {
>         u64 val;
>
> @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
>         if (limits->no_turbo && !limits->turbo_disabled)
>                 val |= (u64)1 << 32;
>
> -       wrmsrl(MSR_IA32_PERF_CTL, val);
> +       return val;
>  }
>
>  static int knl_get_turbo_pstate(void)
> @@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
>                 .get_min = core_get_min_pstate,
>                 .get_turbo = core_get_turbo_pstate,
>                 .get_scaling = core_get_scaling,
> -               .set = core_set_pstate,
> +               .get_val = core_get_val,
>                 .get_target_pstate = get_target_pstate_use_performance,
>         },
>  };
> @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
>                 .get_max_physical = atom_get_max_pstate,
>                 .get_min = atom_get_min_pstate,
>                 .get_turbo = atom_get_turbo_pstate,
> -               .set = atom_set_pstate,
> +               .get_val = atom_get_val,
>                 .get_scaling = silvermont_get_scaling,
>                 .get_vid = atom_get_vid,
>                 .get_target_pstate = get_target_pstate_use_cpu_load,
> @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
>                 .get_max_physical = atom_get_max_pstate,
>                 .get_min = atom_get_min_pstate,
>                 .get_turbo = atom_get_turbo_pstate,
> -               .set = atom_set_pstate,
> +               .get_val = atom_get_val,
>                 .get_scaling = airmont_get_scaling,
>                 .get_vid = atom_get_vid,
>                 .get_target_pstate = get_target_pstate_use_cpu_load,
> @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
>                 .get_min = core_get_min_pstate,
>                 .get_turbo = knl_get_turbo_pstate,
>                 .get_scaling = core_get_scaling,
> -               .set = core_set_pstate,
> +               .get_val = core_get_val,
>                 .get_target_pstate = get_target_pstate_use_performance,
>         },
>  };
> @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
>         *min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
>  }
>
> -static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate, bool force)
> +static inline void intel_pstate_record_pstate(struct cpudata *cpu, int pstate)
>  {
> -       int max_perf, min_perf;
> -
> -       if (force) {
> -               update_turbo_state();
> -
> -               intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> -
> -               pstate = clamp_t(int, pstate, min_perf, max_perf);
> -
> -               if (pstate == cpu->pstate.current_pstate)
> -                       return;
> -       }
>         trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
> -
>         cpu->pstate.current_pstate = pstate;
> +}
>
> -       pstate_funcs.set(cpu, pstate);
> +static void intel_pstate_set_min_pstate(struct cpudata *cpu)
> +{
> +       int pstate = cpu->pstate.min_pstate;
> +
> +       intel_pstate_record_pstate(cpu, pstate);
> +       /*
> +        * Generally, there is no guarantee that this code will always run on
> +        * the CPU being updated, so force the register update to run on the
> +        * right CPU.
> +        */
> +       wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
> +                     pstate_funcs.get_val(cpu, pstate));
>  }
>
>  static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
> @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
>
>         if (pstate_funcs.get_vid)
>                 pstate_funcs.get_vid(cpu);
> -       intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
> +
> +       intel_pstate_set_min_pstate(cpu);
>  }
>
>  static inline void intel_pstate_calc_busy(struct cpudata *cpu)
> @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
>         return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
>  }
>
> +static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
> +{
> +       int max_perf, min_perf;
> +
> +       update_turbo_state();
> +
> +       intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> +       pstate = clamp_t(int, pstate, min_perf, max_perf);
> +       if (pstate == cpu->pstate.current_pstate)
> +               return;
> +
> +       intel_pstate_record_pstate(cpu, pstate);
> +       wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
> +}
> +
>  static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>  {
>         int from, target_pstate;
> @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
>
>         target_pstate = pstate_funcs.get_target_pstate(cpu);
>
> -       intel_pstate_set_pstate(cpu, target_pstate, true);
> +       intel_pstate_update_pstate(cpu, target_pstate);
>
>         sample = &cpu->sample;
>         trace_pstate_sample(fp_toint(sample->core_pct_busy),
> @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
>         if (hwp_active)
>                 return;
>
> -       intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
> +       intel_pstate_set_min_pstate(cpu);
>  }
>
>  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
> @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
>         pstate_funcs.get_min   = funcs->get_min;
>         pstate_funcs.get_turbo = funcs->get_turbo;
>         pstate_funcs.get_scaling = funcs->get_scaling;
> -       pstate_funcs.set       = funcs->set;
> +       pstate_funcs.get_val   = funcs->get_val;
>         pstate_funcs.get_vid   = funcs->get_vid;
>         pstate_funcs.get_target_pstate = funcs->get_target_pstate;
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 16:13           ` Stephane Gasparini
@ 2016-03-18 17:52             ` Srinivas Pandruvada
  2016-03-18 18:32               ` Stephane Gasparini
  2016-03-21  9:28               ` Stephane Gasparini
  0 siblings, 2 replies; 19+ messages in thread
From: Srinivas Pandruvada @ 2016-03-18 17:52 UTC (permalink / raw)
  To: Stephane Gasparini, Rafael J. Wysocki
  Cc: Josh Boyer, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
> Rafael,
> 
> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
> both
> changed to use wrmsrl ?
Initial Atom support was experimental as there were no users, till
Chrome started using. So it was just a miss.

We should never have to use wrmsrl_on_cpu. But looks like
cpufreq_driver.init() can't guarantee that.

> BTW, what is the interest of setting the pstate to LFM during
> initialization ?
> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
> bothering
> changing it.
This is a different issue. BIOS has different configuration option to
enable fast boot modes which are not necessarily optimized for Linux.
Some aggressive setting will force system to reboot on boot. So I will
leave the way it is.

Thanks,
Srinivas

>  
> —
> Steph
> 
> 
> 
> 
> > 
> > On Mar 18, 2016, at 3:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net>
> > wrote:
> > 
> > On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
> > > 
> > > On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki
> > > .net> wrote:
> > > > 
> > > > On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
> > > > > 
> > > > > On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwy
> > > > > socki.net> wrote:
> > > > > > 
> > > > > > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
> > > > > > > 
> > > > > > > Hello,
> > > > > > Hi,
> > > > > > 
> > > > > > > 
> > > > > > > I have an Intel Atom based NUC that is producing the
> > > > > > > following
> > > > > > > backtraces on boot of Linus' tree as of last
> > > > > > > evening.  This does not
> > > > > > > happen with a tree with top level commit 271ecc5253e2,
> > > > > > > but does happen
> > > > > > > when using the tree mentioned in the subject with top
> > > > > > > level commit
> > > > > > > 63e30271b04c.
> > > > > > > 
> > > > > > > The first backtrace appears to be a warning because the
> > > > > > > intel_pstate
> > > > > > > driver is calling wrmsrl_on_cpu when interrupts are
> > > > > > > disabled?  Not
> > > > > > > sure on that one.
> > > > > > > 
> > > > > > > The second backtrace is a lockdep report.  Both are from
> > > > > > > the same boot.
> > > > > > OK, thanks for the report.
> > > > > > 
> > > > > > Can you please try the patch below?
> > > > > > 
> > > > > > I'm actually unsure if we can do that safely in general for
> > > > > > Atom because
> > > > > > of the initialization, but that's what Core does anyway.
> > > > > > 
> > > > > > Srinivas, Philippe, why exactly do we need the
> > > > > > wrmsrl_on_cpu() in
> > > > > > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and
> > > > > > seems to be doing fine.
> > > > > > 
> > > > > > ---
> > > > > > drivers/cpufreq/intel_pstate.c |    2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > > > > > ===========================================================
> > > > > > ========
> > > > > > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > > > > > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > > > > > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct
> > > > > > cpuda
> > > > > > 
> > > > > >        val |= vid;
> > > > > > 
> > > > > > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL,
> > > > > > val);
> > > > > > +       wrmsrl(MSR_IA32_PERF_CTL, val);
> > > > > > }
> > > > > > 
> > > > > > static int silvermont_get_scaling(void)
> > > > > > 
> > > > > I applied this on top of commit 09fd671ccb24 and the
> > > > > backtrace and
> > > > > lockdep report both go away.  So yes, this seems to clear up
> > > > > the
> > > > > issue.  I tested it on a variety of different CPU types and
> > > > > didn't
> > > > > notice anything wrong on them either.
> > > > The problems may show up during initialization and cleanup
> > > > where one CPU
> > > > may be running code trying to configure a different one.  In
> > > > those cases
> > > > wrmsrl_on_cpu() needs to be used.
> > > > 
> > > > Let me cut a patch taking that into account.
> > > OK.  Happy to test when you have it ready.
> > Thanks!
> > 
> > Please test the patch below.
> > 
> > ---
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with
> > disabled interrupts
> > 
> > After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers
> > with
> > utilization update callbacks) wrmsrl_on_cpu() cannot be called in
> > the
> > intel_pstate_adjust_busy_pstate() path as that is executed with
> > disabled interrupts.  However, atom_set_pstate() called from there
> > via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
> > IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
> > smp_call_function_single().
> > 
> > The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
> > because intel_pstate_set_pstate() calling it is also invoked during
> > the initialization and cleanup of the driver and in those cases it
> > is
> > not guaranteed to be run on the CPU that is being
> > updated.  However,
> > in the case when intel_pstate_set_pstate() is called by
> > intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
> > the register safely.  Moreover, intel_pstate_set_pstate() already
> > contains code that only is executed if the function is called by
> > intel_pstate_adjust_busy_pstate() and there is a special argument
> > passed to it because of that.
> > 
> > To fix the problem at hand, rearrange the code taking the above
> > observations into account.
> > 
> > First, replace the ->set() callback in struct pstate_funcs with a
> > ->get_val() one that will return the value to be written to the
> > IA32_PERF_CTL MSR without updating the register.
> > 
> > Second, split intel_pstate_set_pstate() into two functions,
> > intel_pstate_update_pstate() to be called by
> > intel_pstate_adjust_busy_pstate() that will contain all of the
> > intel_pstate_set_pstate() code which only needs to be executed in
> > that case and will use wrmsrl() to update the MSR (after obtaining
> > the value to write to it from the ->get_val() callback), and
> > intel_pstate_set_min_pstate() to be invoked during the
> > initialization and cleanup that will set the P-state to the
> > minimum one and will update the MSR using wrmsrl_on_cpu().
> > 
> > Finally, move the code shared between intel_pstate_update_pstate()
> > and intel_pstate_set_min_pstate() to a new static inline function
> > intel_pstate_record_pstate() and make them both call it.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
> > utilization update callbacks)
> > ---
> > drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----
> > ------------
> > 1 file changed, 43 insertions(+), 30 deletions(-)
> > 
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -134,7 +134,7 @@ struct pstate_funcs {
> > 	int (*get_min)(void);
> > 	int (*get_turbo)(void);
> > 	int (*get_scaling)(void);
> > -	void (*set)(struct cpudata*, int pstate);
> > +	u64 (*get_val)(struct cpudata*, int pstate);
> > 	void (*get_vid)(struct cpudata *);
> > 	int32_t (*get_target_pstate)(struct cpudata *);
> > };
> > @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
> > 	return value & 0x7F;
> > }
> > 
> > -static void atom_set_pstate(struct cpudata *cpudata, int pstate)
> > +static u64 atom_get_val(struct cpudata *cpudata, int pstate)
> > {
> > 	u64 val;
> > 	int32_t vid_fp;
> > @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
> > 	if (pstate > cpudata->pstate.max_pstate)
> > 		vid = cpudata->vid.turbo;
> > 
> > -	val |= vid;
> > -
> > -	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
> > +	return val | vid;
> > }
> > 
> > static int silvermont_get_scaling(void)
> > @@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
> > 	return 100000;
> > }
> > 
> > -static void core_set_pstate(struct cpudata *cpudata, int pstate)
> > +static u64 core_get_val(struct cpudata *cpudata, int pstate)
> > {
> > 	u64 val;
> > 
> > @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
> > 	if (limits->no_turbo && !limits->turbo_disabled)
> > 		val |= (u64)1 << 32;
> > 
> > -	wrmsrl(MSR_IA32_PERF_CTL, val);
> > +	return val;
> > }
> > 
> > static int knl_get_turbo_pstate(void)
> > @@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
> > 		.get_min = core_get_min_pstate,
> > 		.get_turbo = core_get_turbo_pstate,
> > 		.get_scaling = core_get_scaling,
> > -		.set = core_set_pstate,
> > +		.get_val = core_get_val,
> > 		.get_target_pstate = get_target_pstate_use_performance,
> > 	},
> > };
> > @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
> > 		.get_max_physical = atom_get_max_pstate,
> > 		.get_min = atom_get_min_pstate,
> > 		.get_turbo = atom_get_turbo_pstate,
> > -		.set = atom_set_pstate,
> > +		.get_val = atom_get_val,
> > 		.get_scaling = silvermont_get_scaling,
> > 		.get_vid = atom_get_vid,
> > 		.get_target_pstate = get_target_pstate_use_cpu_load,
> > @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
> > 		.get_max_physical = atom_get_max_pstate,
> > 		.get_min = atom_get_min_pstate,
> > 		.get_turbo = atom_get_turbo_pstate,
> > -		.set = atom_set_pstate,
> > +		.get_val = atom_get_val,
> > 		.get_scaling = airmont_get_scaling,
> > 		.get_vid = atom_get_vid,
> > 		.get_target_pstate = get_target_pstate_use_cpu_load,
> > @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
> > 		.get_min = core_get_min_pstate,
> > 		.get_turbo = knl_get_turbo_pstate,
> > 		.get_scaling = core_get_scaling,
> > -		.set = core_set_pstate,
> > +		.get_val = core_get_val,
> > 		.get_target_pstate = get_target_pstate_use_performance,
> > 	},
> > };
> > @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
> > 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate,
> > max_perf);
> > }
> > 
> > -static void intel_pstate_set_pstate(struct cpudata *cpu, int
> > pstate, bool force)
> > +static inline void intel_pstate_record_pstate(struct cpudata *cpu,
> > int pstate)
> > {
> > -	int max_perf, min_perf;
> > -
> > -	if (force) {
> > -		update_turbo_state();
> > -
> > -		intel_pstate_get_min_max(cpu, &min_perf,
> > &max_perf);
> > -
> > -		pstate = clamp_t(int, pstate, min_perf, max_perf);
> > -
> > -		if (pstate == cpu->pstate.current_pstate)
> > -			return;
> > -	}
> > 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
> > -
> > 	cpu->pstate.current_pstate = pstate;
> > +}
> > 
> > -	pstate_funcs.set(cpu, pstate);
> > +static void intel_pstate_set_min_pstate(struct cpudata *cpu)
> > +{
> > +	int pstate = cpu->pstate.min_pstate;
> > +
> > +	intel_pstate_record_pstate(cpu, pstate);
> > +	/*
> > +	 * Generally, there is no guarantee that this code will
> > always run on
> > +	 * the CPU being updated, so force the register update to
> > run on the
> > +	 * right CPU.
> > +	 */
> > +	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
> > +		      pstate_funcs.get_val(cpu, pstate));
> > }
> > 
> > static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
> > @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
> > 
> > 	if (pstate_funcs.get_vid)
> > 		pstate_funcs.get_vid(cpu);
> > -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
> > false);
> > +
> > +	intel_pstate_set_min_pstate(cpu);
> > }
> > 
> > static inline void intel_pstate_calc_busy(struct cpudata *cpu)
> > @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
> > 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid,
> > core_busy);
> > }
> > 
> > +static inline void intel_pstate_update_pstate(struct cpudata *cpu,
> > int pstate)
> > +{
> > +	int max_perf, min_perf;
> > +
> > +	update_turbo_state();
> > +
> > +	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> > +	pstate = clamp_t(int, pstate, min_perf, max_perf);
> > +	if (pstate == cpu->pstate.current_pstate)
> > +		return;
> > +
> > +	intel_pstate_record_pstate(cpu, pstate);
> > +	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu,
> > pstate));
> > +}
> > +
> > static inline void intel_pstate_adjust_busy_pstate(struct cpudata
> > *cpu)
> > {
> > 	int from, target_pstate;
> > @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
> > 
> > 	target_pstate = pstate_funcs.get_target_pstate(cpu);
> > 
> > -	intel_pstate_set_pstate(cpu, target_pstate, true);
> > +	intel_pstate_update_pstate(cpu, target_pstate);
> > 
> > 	sample = &cpu->sample;
> > 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> > @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
> > 	if (hwp_active)
> > 		return;
> > 
> > -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
> > false);
> > +	intel_pstate_set_min_pstate(cpu);
> > }
> > 
> > static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
> > @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
> > 	pstate_funcs.get_min   = funcs->get_min;
> > 	pstate_funcs.get_turbo = funcs->get_turbo;
> > 	pstate_funcs.get_scaling = funcs->get_scaling;
> > -	pstate_funcs.set       = funcs->set;
> > +	pstate_funcs.get_val   = funcs->get_val;
> > 	pstate_funcs.get_vid   = funcs->get_vid;
> > 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pm" 
> > in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 17:52             ` Srinivas Pandruvada
@ 2016-03-18 18:32               ` Stephane Gasparini
  2016-03-18 21:44                 ` Rafael J. Wysocki
  2016-03-21  9:28               ` Stephane Gasparini
  1 sibling, 1 reply; 19+ messages in thread
From: Stephane Gasparini @ 2016-03-18 18:32 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Josh Boyer, Philippe Longepe, Len Brown,
	Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org


—
Steph




> On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> 
> On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
>> Rafael,
>> 
>> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
>> both
>> changed to use wrmsrl ?
> Initial Atom support was experimental as there were no users, till
> Chrome started using. So it was just a miss.
> 
> We should never have to use wrmsrl_on_cpu. But looks like
> cpufreq_driver.init() can't guarantee that.
> 
>> BTW, what is the interest of setting the pstate to LFM during
>> initialization ?
>> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
>> bothering
>> changing it.
> This is a different issue. BIOS has different configuration option to
> enable fast boot modes which are not necessarily optimized for Linux.
> Some aggressive setting will force system to reboot on boot. So I will
> leave the way it is.
> 
> Thanks,
> Srinivas
> 

Still it does not answer my question, why when implementing a4675fbc4a7a
we did changed core for wrmsrl and not atom ?

My point is that the issue was more due to a miss in the patch a4675fbc4a7a
rather than a difference of behavior between atom and core.

The commit message is a bit misleading around this.
The wrmrl_on_cpu() is needed on both core and atom during init.


>>  
>> —
>> Steph
>> 
>> 
>> 
>> 
>>> 
>>> On Mar 18, 2016, at 3:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net>
>>> wrote:
>>> 
>>> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
>>>> 
>>>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki
>>>> .net> wrote:
>>>>> 
>>>>> On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>>>>>> 
>>>>>> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwy
>>>>>> socki.net> wrote:
>>>>>>> 
>>>>>>> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>> Hi,
>>>>>>> 
>>>>>>>> 
>>>>>>>> I have an Intel Atom based NUC that is producing the
>>>>>>>> following
>>>>>>>> backtraces on boot of Linus' tree as of last
>>>>>>>> evening.  This does not
>>>>>>>> happen with a tree with top level commit 271ecc5253e2,
>>>>>>>> but does happen
>>>>>>>> when using the tree mentioned in the subject with top
>>>>>>>> level commit
>>>>>>>> 63e30271b04c.
>>>>>>>> 
>>>>>>>> The first backtrace appears to be a warning because the
>>>>>>>> intel_pstate
>>>>>>>> driver is calling wrmsrl_on_cpu when interrupts are
>>>>>>>> disabled?  Not
>>>>>>>> sure on that one.
>>>>>>>> 
>>>>>>>> The second backtrace is a lockdep report.  Both are from
>>>>>>>> the same boot.
>>>>>>> OK, thanks for the report.
>>>>>>> 
>>>>>>> Can you please try the patch below?
>>>>>>> 
>>>>>>> I'm actually unsure if we can do that safely in general for
>>>>>>> Atom because
>>>>>>> of the initialization, but that's what Core does anyway.
>>>>>>> 
>>>>>>> Srinivas, Philippe, why exactly do we need the
>>>>>>> wrmsrl_on_cpu() in
>>>>>>> atom_set_pstate()?  core_set_pstate() uses wrmsrl() and
>>>>>>> seems to be doing fine.
>>>>>>> 
>>>>>>> ---
>>>>>>> drivers/cpufreq/intel_pstate.c |    2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>> 
>>>>>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>>>>>> ===========================================================
>>>>>>> ========
>>>>>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>>>>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>>>>>> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct
>>>>>>> cpuda
>>>>>>> 
>>>>>>>        val |= vid;
>>>>>>> 
>>>>>>> -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL,
>>>>>>> val);
>>>>>>> +       wrmsrl(MSR_IA32_PERF_CTL, val);
>>>>>>> }
>>>>>>> 
>>>>>>> static int silvermont_get_scaling(void)
>>>>>>> 
>>>>>> I applied this on top of commit 09fd671ccb24 and the
>>>>>> backtrace and
>>>>>> lockdep report both go away.  So yes, this seems to clear up
>>>>>> the
>>>>>> issue.  I tested it on a variety of different CPU types and
>>>>>> didn't
>>>>>> notice anything wrong on them either.
>>>>> The problems may show up during initialization and cleanup
>>>>> where one CPU
>>>>> may be running code trying to configure a different one.  In
>>>>> those cases
>>>>> wrmsrl_on_cpu() needs to be used.
>>>>> 
>>>>> Let me cut a patch taking that into account.
>>>> OK.  Happy to test when you have it ready.
>>> Thanks!
>>> 
>>> Please test the patch below.
>>> 
>>> ---
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with
>>> disabled interrupts
>>> 
>>> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers
>>> with
>>> utilization update callbacks) wrmsrl_on_cpu() cannot be called in
>>> the
>>> intel_pstate_adjust_busy_pstate() path as that is executed with
>>> disabled interrupts.  However, atom_set_pstate() called from there
>>> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
>>> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
>>> smp_call_function_single().
>>> 
>>> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
>>> because intel_pstate_set_pstate() calling it is also invoked during
>>> the initialization and cleanup of the driver and in those cases it
>>> is
>>> not guaranteed to be run on the CPU that is being
>>> updated.  However,
>>> in the case when intel_pstate_set_pstate() is called by
>>> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
>>> the register safely.  Moreover, intel_pstate_set_pstate() already
>>> contains code that only is executed if the function is called by
>>> intel_pstate_adjust_busy_pstate() and there is a special argument
>>> passed to it because of that.
>>> 
>>> To fix the problem at hand, rearrange the code taking the above
>>> observations into account.
>>> 
>>> First, replace the ->set() callback in struct pstate_funcs with a
>>> ->get_val() one that will return the value to be written to the
>>> IA32_PERF_CTL MSR without updating the register.
>>> 
>>> Second, split intel_pstate_set_pstate() into two functions,
>>> intel_pstate_update_pstate() to be called by
>>> intel_pstate_adjust_busy_pstate() that will contain all of the
>>> intel_pstate_set_pstate() code which only needs to be executed in
>>> that case and will use wrmsrl() to update the MSR (after obtaining
>>> the value to write to it from the ->get_val() callback), and
>>> intel_pstate_set_min_pstate() to be invoked during the
>>> initialization and cleanup that will set the P-state to the
>>> minimum one and will update the MSR using wrmsrl_on_cpu().
>>> 
>>> Finally, move the code shared between intel_pstate_update_pstate()
>>> and intel_pstate_set_min_pstate() to a new static inline function
>>> intel_pstate_record_pstate() and make them both call it.
>>> 
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
>>> utilization update callbacks)
>>> ---
>>> drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----
>>> ------------
>>> 1 file changed, 43 insertions(+), 30 deletions(-)
>>> 
>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>> @@ -134,7 +134,7 @@ struct pstate_funcs {
>>> 	int (*get_min)(void);
>>> 	int (*get_turbo)(void);
>>> 	int (*get_scaling)(void);
>>> -	void (*set)(struct cpudata*, int pstate);
>>> +	u64 (*get_val)(struct cpudata*, int pstate);
>>> 	void (*get_vid)(struct cpudata *);
>>> 	int32_t (*get_target_pstate)(struct cpudata *);
>>> };
>>> @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
>>> 	return value & 0x7F;
>>> }
>>> 
>>> -static void atom_set_pstate(struct cpudata *cpudata, int pstate)
>>> +static u64 atom_get_val(struct cpudata *cpudata, int pstate)
>>> {
>>> 	u64 val;
>>> 	int32_t vid_fp;
>>> @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
>>> 	if (pstate > cpudata->pstate.max_pstate)
>>> 		vid = cpudata->vid.turbo;
>>> 
>>> -	val |= vid;
>>> -
>>> -	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>>> +	return val | vid;
>>> }
>>> 
>>> static int silvermont_get_scaling(void)
>>> @@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
>>> 	return 100000;
>>> }
>>> 
>>> -static void core_set_pstate(struct cpudata *cpudata, int pstate)
>>> +static u64 core_get_val(struct cpudata *cpudata, int pstate)
>>> {
>>> 	u64 val;
>>> 
>>> @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
>>> 	if (limits->no_turbo && !limits->turbo_disabled)
>>> 		val |= (u64)1 << 32;
>>> 
>>> -	wrmsrl(MSR_IA32_PERF_CTL, val);
>>> +	return val;
>>> }
>>> 
>>> static int knl_get_turbo_pstate(void)
>>> @@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
>>> 		.get_min = core_get_min_pstate,
>>> 		.get_turbo = core_get_turbo_pstate,
>>> 		.get_scaling = core_get_scaling,
>>> -		.set = core_set_pstate,
>>> +		.get_val = core_get_val,
>>> 		.get_target_pstate = get_target_pstate_use_performance,
>>> 	},
>>> };
>>> @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
>>> 		.get_max_physical = atom_get_max_pstate,
>>> 		.get_min = atom_get_min_pstate,
>>> 		.get_turbo = atom_get_turbo_pstate,
>>> -		.set = atom_set_pstate,
>>> +		.get_val = atom_get_val,
>>> 		.get_scaling = silvermont_get_scaling,
>>> 		.get_vid = atom_get_vid,
>>> 		.get_target_pstate = get_target_pstate_use_cpu_load,
>>> @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
>>> 		.get_max_physical = atom_get_max_pstate,
>>> 		.get_min = atom_get_min_pstate,
>>> 		.get_turbo = atom_get_turbo_pstate,
>>> -		.set = atom_set_pstate,
>>> +		.get_val = atom_get_val,
>>> 		.get_scaling = airmont_get_scaling,
>>> 		.get_vid = atom_get_vid,
>>> 		.get_target_pstate = get_target_pstate_use_cpu_load,
>>> @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
>>> 		.get_min = core_get_min_pstate,
>>> 		.get_turbo = knl_get_turbo_pstate,
>>> 		.get_scaling = core_get_scaling,
>>> -		.set = core_set_pstate,
>>> +		.get_val = core_get_val,
>>> 		.get_target_pstate = get_target_pstate_use_performance,
>>> 	},
>>> };
>>> @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
>>> 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate,
>>> max_perf);
>>> }
>>> 
>>> -static void intel_pstate_set_pstate(struct cpudata *cpu, int
>>> pstate, bool force)
>>> +static inline void intel_pstate_record_pstate(struct cpudata *cpu,
>>> int pstate)
>>> {
>>> -	int max_perf, min_perf;
>>> -
>>> -	if (force) {
>>> -		update_turbo_state();
>>> -
>>> -		intel_pstate_get_min_max(cpu, &min_perf,
>>> &max_perf);
>>> -
>>> -		pstate = clamp_t(int, pstate, min_perf, max_perf);
>>> -
>>> -		if (pstate == cpu->pstate.current_pstate)
>>> -			return;
>>> -	}
>>> 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
>>> -
>>> 	cpu->pstate.current_pstate = pstate;
>>> +}
>>> 
>>> -	pstate_funcs.set(cpu, pstate);
>>> +static void intel_pstate_set_min_pstate(struct cpudata *cpu)
>>> +{
>>> +	int pstate = cpu->pstate.min_pstate;
>>> +
>>> +	intel_pstate_record_pstate(cpu, pstate);
>>> +	/*
>>> +	 * Generally, there is no guarantee that this code will
>>> always run on
>>> +	 * the CPU being updated, so force the register update to
>>> run on the
>>> +	 * right CPU.
>>> +	 */
>>> +	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
>>> +		      pstate_funcs.get_val(cpu, pstate));
>>> }
>>> 
>>> static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
>>> @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
>>> 
>>> 	if (pstate_funcs.get_vid)
>>> 		pstate_funcs.get_vid(cpu);
>>> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
>>> false);
>>> +
>>> +	intel_pstate_set_min_pstate(cpu);
>>> }
>>> 
>>> static inline void intel_pstate_calc_busy(struct cpudata *cpu)
>>> @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
>>> 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid,
>>> core_busy);
>>> }
>>> 
>>> +static inline void intel_pstate_update_pstate(struct cpudata *cpu,
>>> int pstate)
>>> +{
>>> +	int max_perf, min_perf;
>>> +
>>> +	update_turbo_state();
>>> +
>>> +	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
>>> +	pstate = clamp_t(int, pstate, min_perf, max_perf);
>>> +	if (pstate == cpu->pstate.current_pstate)
>>> +		return;
>>> +
>>> +	intel_pstate_record_pstate(cpu, pstate);
>>> +	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu,
>>> pstate));
>>> +}
>>> +
>>> static inline void intel_pstate_adjust_busy_pstate(struct cpudata
>>> *cpu)
>>> {
>>> 	int from, target_pstate;
>>> @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
>>> 
>>> 	target_pstate = pstate_funcs.get_target_pstate(cpu);
>>> 
>>> -	intel_pstate_set_pstate(cpu, target_pstate, true);
>>> +	intel_pstate_update_pstate(cpu, target_pstate);
>>> 
>>> 	sample = &cpu->sample;
>>> 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
>>> @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
>>> 	if (hwp_active)
>>> 		return;
>>> 
>>> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
>>> false);
>>> +	intel_pstate_set_min_pstate(cpu);
>>> }
>>> 
>>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>> @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
>>> 	pstate_funcs.get_min   = funcs->get_min;
>>> 	pstate_funcs.get_turbo = funcs->get_turbo;
>>> 	pstate_funcs.get_scaling = funcs->get_scaling;
>>> -	pstate_funcs.set       = funcs->set;
>>> +	pstate_funcs.get_val   = funcs->get_val;
>>> 	pstate_funcs.get_vid   = funcs->get_vid;
>>> 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
>>> 
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pm" 
>>> in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 18:32               ` Stephane Gasparini
@ 2016-03-18 21:44                 ` Rafael J. Wysocki
  2016-03-21  9:31                   ` Stephane Gasparini
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-18 21:44 UTC (permalink / raw)
  To: Stephane Gasparini
  Cc: Srinivas Pandruvada, Rafael J. Wysocki, Josh Boyer,
	Philippe Longepe, Len Brown, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

On Fri, Mar 18, 2016 at 7:32 PM, Stephane Gasparini
<stephane.gasparini@linux.intel.com> wrote:
>
> —
> Steph
>
>
>
>
>> On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
>>
>> On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
>>> Rafael,
>>>
>>> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
>>> both
>>> changed to use wrmsrl ?
>> Initial Atom support was experimental as there were no users, till
>> Chrome started using. So it was just a miss.
>>
>> We should never have to use wrmsrl_on_cpu. But looks like
>> cpufreq_driver.init() can't guarantee that.
>>
>>> BTW, what is the interest of setting the pstate to LFM during
>>> initialization ?
>>> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
>>> bothering
>>> changing it.
>> This is a different issue. BIOS has different configuration option to
>> enable fast boot modes which are not necessarily optimized for Linux.
>> Some aggressive setting will force system to reboot on boot. So I will
>> leave the way it is.
>>
>> Thanks,
>> Srinivas
>>
>
> Still it does not answer my question, why when implementing a4675fbc4a7a
> we did changed core for wrmsrl and not atom ?

By mistake?

> My point is that the issue was more due to a miss in the patch a4675fbc4a7a
> rather than a difference of behavior between atom and core.

The issue is due to the fact that wrmsrl_on_cpu() is used in atom_set_pstate().

Moreover, core_set_pstate() doesn't use wrmsrl_on_cpu(), so in fact it
is different from atom_set_pstate() in that respect.

Now, why and when that difference was introduced doesn't really
matter.  What matters is whether or not it makes sense and what to do
about it.

To me, it doesn't make sense.  wrmsrl() should be used on both Core
and Atom to update the MSR in intel_pstate_adjust_busy_pstate().  In
turn, wrmsrl_on_cpu() should be used by both of them on init/exit.
That's exactly what happens with my patch applied, with a twist that
on init/exit the P-state is always set to the minimum, so it is not
even necessary to pass the pstate argument between functions in that
case.

> The commit message is a bit misleading around this.
> The wrmrl_on_cpu() is needed on both core and atom during init.

Yes, it is, but how is that related to the changelog of this patch?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 17:35           ` Josh Boyer
@ 2016-03-18 22:23             ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-18 22:23 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Philippe Longepe,
	Len Brown, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

On Fri, Mar 18, 2016 at 6:35 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Fri, Mar 18, 2016 at 10:36 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
>>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> > On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>>> >> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> >> > On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>>> >> >> Hello,
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> >> I have an Intel Atom based NUC that is producing the following
>>> >> >> backtraces on boot of Linus' tree as of last evening.  This does not
>>> >> >> happen with a tree with top level commit 271ecc5253e2, but does happen
>>> >> >> when using the tree mentioned in the subject with top level commit
>>> >> >> 63e30271b04c.
>>> >> >>
>>> >> >> The first backtrace appears to be a warning because the intel_pstate
>>> >> >> driver is calling wrmsrl_on_cpu when interrupts are disabled?  Not
>>> >> >> sure on that one.
>>> >> >>
>>> >> >> The second backtrace is a lockdep report.  Both are from the same boot.
>>> >> >
>>> >> > OK, thanks for the report.
>>> >> >
>>> >> > Can you please try the patch below?
>>> >> >
>>> >> > I'm actually unsure if we can do that safely in general for Atom because
>>> >> > of the initialization, but that's what Core does anyway.
>>> >> >
>>> >> > Srinivas, Philippe, why exactly do we need the wrmsrl_on_cpu() in
>>> >> > atom_set_pstate()?  core_set_pstate() uses wrmsrl() and seems to be doing fine.
>>> >> >
>>> >> > ---
>>> >> >  drivers/cpufreq/intel_pstate.c |    2 +-
>>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >> >
>>> >> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>> >> > ===================================================================
>>> >> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>> >> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>> >> > @@ -587,7 +587,7 @@ static void atom_set_pstate(struct cpuda
>>> >> >
>>> >> >         val |= vid;
>>> >> >
>>> >> > -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>>> >> > +       wrmsrl(MSR_IA32_PERF_CTL, val);
>>> >> >  }
>>> >> >
>>> >> >  static int silvermont_get_scaling(void)
>>> >> >
>>> >>
>>> >> I applied this on top of commit 09fd671ccb24 and the backtrace and
>>> >> lockdep report both go away.  So yes, this seems to clear up the
>>> >> issue.  I tested it on a variety of different CPU types and didn't
>>> >> notice anything wrong on them either.
>>> >
>>> > The problems may show up during initialization and cleanup where one CPU
>>> > may be running code trying to configure a different one.  In those cases
>>> > wrmsrl_on_cpu() needs to be used.
>>> >
>>> > Let me cut a patch taking that into account.
>>>
>>> OK.  Happy to test when you have it ready.
>>
>> Thanks!
>>
>> Please test the patch below.
>>
>> ---
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
>>
>> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
>> utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
>> intel_pstate_adjust_busy_pstate() path as that is executed with
>> disabled interrupts.  However, atom_set_pstate() called from there
>> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
>> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
>> smp_call_function_single().
>>
>> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
>> because intel_pstate_set_pstate() calling it is also invoked during
>> the initialization and cleanup of the driver and in those cases it is
>> not guaranteed to be run on the CPU that is being updated.  However,
>> in the case when intel_pstate_set_pstate() is called by
>> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
>> the register safely.  Moreover, intel_pstate_set_pstate() already
>> contains code that only is executed if the function is called by
>> intel_pstate_adjust_busy_pstate() and there is a special argument
>> passed to it because of that.
>>
>> To fix the problem at hand, rearrange the code taking the above
>> observations into account.
>>
>> First, replace the ->set() callback in struct pstate_funcs with a
>> ->get_val() one that will return the value to be written to the
>> IA32_PERF_CTL MSR without updating the register.
>>
>> Second, split intel_pstate_set_pstate() into two functions,
>> intel_pstate_update_pstate() to be called by
>> intel_pstate_adjust_busy_pstate() that will contain all of the
>> intel_pstate_set_pstate() code which only needs to be executed in
>> that case and will use wrmsrl() to update the MSR (after obtaining
>> the value to write to it from the ->get_val() callback), and
>> intel_pstate_set_min_pstate() to be invoked during the
>> initialization and cleanup that will set the P-state to the
>> minimum one and will update the MSR using wrmsrl_on_cpu().
>>
>> Finally, move the code shared between intel_pstate_update_pstate()
>> and intel_pstate_set_min_pstate() to a new static inline function
>> intel_pstate_record_pstate() and make them both call it.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
>
> Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
>
> This worked fine on the original problem machine, and the other
> machines I also tested.  No backtrace or lockdeps reported.

Thanks!

Patch applied.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 17:52             ` Srinivas Pandruvada
  2016-03-18 18:32               ` Stephane Gasparini
@ 2016-03-21  9:28               ` Stephane Gasparini
  2016-03-21 14:11                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Stephane Gasparini @ 2016-03-21  9:28 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Josh Boyer, Philippe Longepe, Len Brown,
	Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org


—
Steph




> On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> 
> On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
>> Rafael,
>> 
>> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
>> both
>> changed to use wrmsrl ?
> Initial Atom support was experimental as there were no users, till
> Chrome started using. So it was just a miss.
> 
> We should never have to use wrmsrl_on_cpu. But looks like
> cpufreq_driver.init() can't guarantee that.
> 
>> BTW, what is the interest of setting the pstate to LFM during
>> initialization ?
>> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
>> bothering
>> changing it.
> This is a different issue. BIOS has different configuration option to
> enable fast boot modes which are not necessarily optimized for Linux.
> Some aggressive setting will force system to reboot on boot. So I will
> leave the way it is.

Here is my understanding.

1) until the driver starts, the CPUS will anyway starts at the P-State set by the BIOS.
2) even if you force it to Lowest P-State in init Intel P-State init, if there is load associated
   to the execution, 10ms after (or may be quicker with the new scheduler based option) the
   P-State will again set to P0.

so because 1) and 2) you’ll have basically the following behavior assuming we have high load
during boot, as this what can cause a reboot due to high frequencies I assume

a) BIOS set LFM
10 ms after init of intel P-State, CPUs will go to Turbo according to load.

b) BIOS set HFM
CPU will boot to HFM until we reach intel_state init.
During 10ms, CPU will be at LFM.
Due to load they will go up to BFM.

c) BIOS set to BFM.
CPU will boot to BFM until we reach intel_state init.
During 10ms, CPU will be at LFM.
Due to load they will go up to BFM.

So I may have miss something but I do not see what is the real benefit of doing this init to LFM 
that will last for 10ms.

I still think this initialization is useless and complexity the code.

Can you point me to case where having this initialization did solve an issue so that I understand 
the interest of doing this initialization ?

thanks

> 
> Thanks,
> Srinivas
> 
>>  
>> —
>> Steph
>> 
>> 
>> 
>> 
>>> 
>>> On Mar 18, 2016, at 3:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net>
>>> wrote:
>>> 
>>> On Friday, March 18, 2016 08:37:15 AM Josh Boyer wrote:
>>>> 
>>>> On Thu, Mar 17, 2016 at 8:20 PM, Rafael J. Wysocki <rjw@rjwysocki
>>>> .net> wrote:
>>>>> 
>>>>> On Thursday, March 17, 2016 12:44:54 PM Josh Boyer wrote:
>>>>>> 
>>>>>> On Thu, Mar 17, 2016 at 10:07 AM, Rafael J. Wysocki <rjw@rjwy
>>>>>> socki.net> wrote:
>>>>>>> 
>>>>>>> On Thursday, March 17, 2016 09:02:29 AM Josh Boyer wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>> Hi,
>>>>>>> 
>>>>>>>> 
>>>>>>>> I have an Intel Atom based NUC that is producing the
>>>>>>>> following
>>>>>>>> backtraces on boot of Linus' tree as of last
>>>>>>>> evening.  This does not
>>>>>>>> happen with a tree with top level commit 271ecc5253e2,
>>>>>>>> but does happen
>>>>>>>> when using the tree mentioned in the subject with top
>>>>>>>> level commit
>>>>>>>> 63e30271b04c.
>>>>>>>> 
>>>>>>>> The first backtrace appears to be a warning because the
>>>>>>>> intel_pstate
>>>>>>>> driver is calling wrmsrl_on_cpu when interrupts are
>>>>>>>> disabled?  Not
>>>>>>>> sure on that one.
>>>>>>>> 
>>>>>>>> The second backtrace is a lockdep report.  Both are from
>>>>>>>> the same boot.
>>>>>>> OK, thanks for the report.
>>>>>>> 
>>>>>>> Can you please try the patch below?
>>>>>>> 
>>>>>>> I'm actually unsure if we can do that safely in general for
>>>>>>> Atom because
>>>>>>> of the initialization, but that's what Core does anyway.
>>>>>>> 
>>>>>>> Srinivas, Philippe, why exactly do we need the
>>>>>>> wrmsrl_on_cpu() in
>>>>>>> atom_set_pstate()?  core_set_pstate() uses wrmsrl() and
>>>>>>> seems to be doing fine.
>>>>>>> 
>>>>>>> ---
>>>>>>> drivers/cpufreq/intel_pstate.c |    2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>> 
>>>>>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>>>>>> ===========================================================
>>>>>>> ========
>>>>>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>>>>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>>>>>> @@ -587,7 +587,7 @@ static void atom_set_pstate(struct
>>>>>>> cpuda
>>>>>>> 
>>>>>>>        val |= vid;
>>>>>>> 
>>>>>>> -       wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL,
>>>>>>> val);
>>>>>>> +       wrmsrl(MSR_IA32_PERF_CTL, val);
>>>>>>> }
>>>>>>> 
>>>>>>> static int silvermont_get_scaling(void)
>>>>>>> 
>>>>>> I applied this on top of commit 09fd671ccb24 and the
>>>>>> backtrace and
>>>>>> lockdep report both go away.  So yes, this seems to clear up
>>>>>> the
>>>>>> issue.  I tested it on a variety of different CPU types and
>>>>>> didn't
>>>>>> notice anything wrong on them either.
>>>>> The problems may show up during initialization and cleanup
>>>>> where one CPU
>>>>> may be running code trying to configure a different one.  In
>>>>> those cases
>>>>> wrmsrl_on_cpu() needs to be used.
>>>>> 
>>>>> Let me cut a patch taking that into account.
>>>> OK.  Happy to test when you have it ready.
>>> Thanks!
>>> 
>>> Please test the patch below.
>>> 
>>> ---
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Subject: [PATCH] intel_pstate: Do not call wrmsrl_on_cpu() with
>>> disabled interrupts
>>> 
>>> After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers
>>> with
>>> utilization update callbacks) wrmsrl_on_cpu() cannot be called in
>>> the
>>> intel_pstate_adjust_busy_pstate() path as that is executed with
>>> disabled interrupts.  However, atom_set_pstate() called from there
>>> via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
>>> IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
>>> smp_call_function_single().
>>> 
>>> The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
>>> because intel_pstate_set_pstate() calling it is also invoked during
>>> the initialization and cleanup of the driver and in those cases it
>>> is
>>> not guaranteed to be run on the CPU that is being
>>> updated.  However,
>>> in the case when intel_pstate_set_pstate() is called by
>>> intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
>>> the register safely.  Moreover, intel_pstate_set_pstate() already
>>> contains code that only is executed if the function is called by
>>> intel_pstate_adjust_busy_pstate() and there is a special argument
>>> passed to it because of that.
>>> 
>>> To fix the problem at hand, rearrange the code taking the above
>>> observations into account.
>>> 
>>> First, replace the ->set() callback in struct pstate_funcs with a
>>> ->get_val() one that will return the value to be written to the
>>> IA32_PERF_CTL MSR without updating the register.
>>> 
>>> Second, split intel_pstate_set_pstate() into two functions,
>>> intel_pstate_update_pstate() to be called by
>>> intel_pstate_adjust_busy_pstate() that will contain all of the
>>> intel_pstate_set_pstate() code which only needs to be executed in
>>> that case and will use wrmsrl() to update the MSR (after obtaining
>>> the value to write to it from the ->get_val() callback), and
>>> intel_pstate_set_min_pstate() to be invoked during the
>>> initialization and cleanup that will set the P-state to the
>>> minimum one and will update the MSR using wrmsrl_on_cpu().
>>> 
>>> Finally, move the code shared between intel_pstate_update_pstate()
>>> and intel_pstate_set_min_pstate() to a new static inline function
>>> intel_pstate_record_pstate() and make them both call it.
>>> 
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with
>>> utilization update callbacks)
>>> ---
>>> drivers/cpufreq/intel_pstate.c |   73 ++++++++++++++++++++++++-----
>>> ------------
>>> 1 file changed, 43 insertions(+), 30 deletions(-)
>>> 
>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>> @@ -134,7 +134,7 @@ struct pstate_funcs {
>>> 	int (*get_min)(void);
>>> 	int (*get_turbo)(void);
>>> 	int (*get_scaling)(void);
>>> -	void (*set)(struct cpudata*, int pstate);
>>> +	u64 (*get_val)(struct cpudata*, int pstate);
>>> 	void (*get_vid)(struct cpudata *);
>>> 	int32_t (*get_target_pstate)(struct cpudata *);
>>> };
>>> @@ -565,7 +565,7 @@ static int atom_get_turbo_pstate(void)
>>> 	return value & 0x7F;
>>> }
>>> 
>>> -static void atom_set_pstate(struct cpudata *cpudata, int pstate)
>>> +static u64 atom_get_val(struct cpudata *cpudata, int pstate)
>>> {
>>> 	u64 val;
>>> 	int32_t vid_fp;
>>> @@ -585,9 +585,7 @@ static void atom_set_pstate(struct cpuda
>>> 	if (pstate > cpudata->pstate.max_pstate)
>>> 		vid = cpudata->vid.turbo;
>>> 
>>> -	val |= vid;
>>> -
>>> -	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
>>> +	return val | vid;
>>> }
>>> 
>>> static int silvermont_get_scaling(void)
>>> @@ -711,7 +709,7 @@ static inline int core_get_scaling(void)
>>> 	return 100000;
>>> }
>>> 
>>> -static void core_set_pstate(struct cpudata *cpudata, int pstate)
>>> +static u64 core_get_val(struct cpudata *cpudata, int pstate)
>>> {
>>> 	u64 val;
>>> 
>>> @@ -719,7 +717,7 @@ static void core_set_pstate(struct cpuda
>>> 	if (limits->no_turbo && !limits->turbo_disabled)
>>> 		val |= (u64)1 << 32;
>>> 
>>> -	wrmsrl(MSR_IA32_PERF_CTL, val);
>>> +	return val;
>>> }
>>> 
>>> static int knl_get_turbo_pstate(void)
>>> @@ -750,7 +748,7 @@ static struct cpu_defaults core_params =
>>> 		.get_min = core_get_min_pstate,
>>> 		.get_turbo = core_get_turbo_pstate,
>>> 		.get_scaling = core_get_scaling,
>>> -		.set = core_set_pstate,
>>> +		.get_val = core_get_val,
>>> 		.get_target_pstate = get_target_pstate_use_performance,
>>> 	},
>>> };
>>> @@ -769,7 +767,7 @@ static struct cpu_defaults silvermont_pa
>>> 		.get_max_physical = atom_get_max_pstate,
>>> 		.get_min = atom_get_min_pstate,
>>> 		.get_turbo = atom_get_turbo_pstate,
>>> -		.set = atom_set_pstate,
>>> +		.get_val = atom_get_val,
>>> 		.get_scaling = silvermont_get_scaling,
>>> 		.get_vid = atom_get_vid,
>>> 		.get_target_pstate = get_target_pstate_use_cpu_load,
>>> @@ -790,7 +788,7 @@ static struct cpu_defaults airmont_param
>>> 		.get_max_physical = atom_get_max_pstate,
>>> 		.get_min = atom_get_min_pstate,
>>> 		.get_turbo = atom_get_turbo_pstate,
>>> -		.set = atom_set_pstate,
>>> +		.get_val = atom_get_val,
>>> 		.get_scaling = airmont_get_scaling,
>>> 		.get_vid = atom_get_vid,
>>> 		.get_target_pstate = get_target_pstate_use_cpu_load,
>>> @@ -812,7 +810,7 @@ static struct cpu_defaults knl_params =
>>> 		.get_min = core_get_min_pstate,
>>> 		.get_turbo = knl_get_turbo_pstate,
>>> 		.get_scaling = core_get_scaling,
>>> -		.set = core_set_pstate,
>>> +		.get_val = core_get_val,
>>> 		.get_target_pstate = get_target_pstate_use_performance,
>>> 	},
>>> };
>>> @@ -839,25 +837,24 @@ static void intel_pstate_get_min_max(str
>>> 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate,
>>> max_perf);
>>> }
>>> 
>>> -static void intel_pstate_set_pstate(struct cpudata *cpu, int
>>> pstate, bool force)
>>> +static inline void intel_pstate_record_pstate(struct cpudata *cpu,
>>> int pstate)
>>> {
>>> -	int max_perf, min_perf;
>>> -
>>> -	if (force) {
>>> -		update_turbo_state();
>>> -
>>> -		intel_pstate_get_min_max(cpu, &min_perf,
>>> &max_perf);
>>> -
>>> -		pstate = clamp_t(int, pstate, min_perf, max_perf);
>>> -
>>> -		if (pstate == cpu->pstate.current_pstate)
>>> -			return;
>>> -	}
>>> 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
>>> -
>>> 	cpu->pstate.current_pstate = pstate;
>>> +}
>>> 
>>> -	pstate_funcs.set(cpu, pstate);
>>> +static void intel_pstate_set_min_pstate(struct cpudata *cpu)
>>> +{
>>> +	int pstate = cpu->pstate.min_pstate;
>>> +
>>> +	intel_pstate_record_pstate(cpu, pstate);
>>> +	/*
>>> +	 * Generally, there is no guarantee that this code will
>>> always run on
>>> +	 * the CPU being updated, so force the register update to
>>> run on the
>>> +	 * right CPU.
>>> +	 */
>>> +	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
>>> +		      pstate_funcs.get_val(cpu, pstate));
>>> }
>>> 
>>> static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
>>> @@ -870,7 +867,8 @@ static void intel_pstate_get_cpu_pstates
>>> 
>>> 	if (pstate_funcs.get_vid)
>>> 		pstate_funcs.get_vid(cpu);
>>> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
>>> false);
>>> +
>>> +	intel_pstate_set_min_pstate(cpu);
>>> }
>>> 
>>> static inline void intel_pstate_calc_busy(struct cpudata *cpu)
>>> @@ -997,6 +995,21 @@ static inline int32_t get_target_pstate_
>>> 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid,
>>> core_busy);
>>> }
>>> 
>>> +static inline void intel_pstate_update_pstate(struct cpudata *cpu,
>>> int pstate)
>>> +{
>>> +	int max_perf, min_perf;
>>> +
>>> +	update_turbo_state();
>>> +
>>> +	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
>>> +	pstate = clamp_t(int, pstate, min_perf, max_perf);
>>> +	if (pstate == cpu->pstate.current_pstate)
>>> +		return;
>>> +
>>> +	intel_pstate_record_pstate(cpu, pstate);
>>> +	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu,
>>> pstate));
>>> +}
>>> +
>>> static inline void intel_pstate_adjust_busy_pstate(struct cpudata
>>> *cpu)
>>> {
>>> 	int from, target_pstate;
>>> @@ -1006,7 +1019,7 @@ static inline void intel_pstate_adjust_b
>>> 
>>> 	target_pstate = pstate_funcs.get_target_pstate(cpu);
>>> 
>>> -	intel_pstate_set_pstate(cpu, target_pstate, true);
>>> +	intel_pstate_update_pstate(cpu, target_pstate);
>>> 
>>> 	sample = &cpu->sample;
>>> 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
>>> @@ -1180,7 +1193,7 @@ static void intel_pstate_stop_cpu(struct
>>> 	if (hwp_active)
>>> 		return;
>>> 
>>> -	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate,
>>> false);
>>> +	intel_pstate_set_min_pstate(cpu);
>>> }
>>> 
>>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
>>> @@ -1255,7 +1268,7 @@ static void copy_cpu_funcs(struct pstate
>>> 	pstate_funcs.get_min   = funcs->get_min;
>>> 	pstate_funcs.get_turbo = funcs->get_turbo;
>>> 	pstate_funcs.get_scaling = funcs->get_scaling;
>>> -	pstate_funcs.set       = funcs->set;
>>> +	pstate_funcs.get_val   = funcs->get_val;
>>> 	pstate_funcs.get_vid   = funcs->get_vid;
>>> 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
>>> 
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pm" 
>>> in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-18 21:44                 ` Rafael J. Wysocki
@ 2016-03-21  9:31                   ` Stephane Gasparini
  2016-03-21 14:09                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Stephane Gasparini @ 2016-03-21  9:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Rafael J. Wysocki, Josh Boyer,
	Philippe Longepe, Len Brown, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org


—
Steph




> On Mar 18, 2016, at 10:44 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> 
> On Fri, Mar 18, 2016 at 7:32 PM, Stephane Gasparini
> <stephane.gasparini@linux.intel.com> wrote:
>> 
>> —
>> Steph
>> 
>> 
>> 
>> 
>>> On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
>>> 
>>> On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
>>>> Rafael,
>>>> 
>>>> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
>>>> both
>>>> changed to use wrmsrl ?
>>> Initial Atom support was experimental as there were no users, till
>>> Chrome started using. So it was just a miss.
>>> 
>>> We should never have to use wrmsrl_on_cpu. But looks like
>>> cpufreq_driver.init() can't guarantee that.
>>> 
>>>> BTW, what is the interest of setting the pstate to LFM during
>>>> initialization ?
>>>> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
>>>> bothering
>>>> changing it.
>>> This is a different issue. BIOS has different configuration option to
>>> enable fast boot modes which are not necessarily optimized for Linux.
>>> Some aggressive setting will force system to reboot on boot. So I will
>>> leave the way it is.
>>> 
>>> Thanks,
>>> Srinivas
>>> 
>> 
>> Still it does not answer my question, why when implementing a4675fbc4a7a
>> we did changed core for wrmsrl and not atom ?
> 
> By mistake?
> 
>> My point is that the issue was more due to a miss in the patch a4675fbc4a7a
>> rather than a difference of behavior between atom and core.
> 
> The issue is due to the fact that wrmsrl_on_cpu() is used in atom_set_pstate().
> 
> Moreover, core_set_pstate() doesn't use wrmsrl_on_cpu(), so in fact it
> is different from atom_set_pstate() in that respect.
> 
> Now, why and when that difference was introduced doesn't really
> matter.  What matters is whether or not it makes sense and what to do
> about it.
> 
> To me, it doesn't make sense.  wrmsrl() should be used on both Core
> and Atom to update the MSR in intel_pstate_adjust_busy_pstate().  In
> turn, wrmsrl_on_cpu() should be used by both of them on init/exit.
> That's exactly what happens with my patch applied, with a twist that
> on init/exit the P-state is always set to the minimum, so it is not
> even necessary to pass the pstate argument between functions in that
> case.
> 
>> The commit message is a bit misleading around this.
>> The wrmrl_on_cpu() is needed on both core and atom during init.
> 
> Yes, it is, but how is that related to the changelog of this patch?

Telling what you are saying in this email in answer to me would make the thing 
more clear IMO. 
1) the error seen is a side effect of the previous change, so the issue 
   was not existing before
2) the explanation would be more clear that during the init/exit wmsrl_on_cpu()
   and other wise wmsrl is the one to be used.


> 
> Thanks,
> Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-21  9:31                   ` Stephane Gasparini
@ 2016-03-21 14:09                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-21 14:09 UTC (permalink / raw)
  To: Stephane Gasparini
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Josh Boyer,
	Philippe Longepe, Len Brown, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

On Monday, March 21, 2016 10:31:37 AM Stephane Gasparini wrote:
> 
> —
> Steph
> 
> 
> 
> 
> > On Mar 18, 2016, at 10:44 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > 
> > On Fri, Mar 18, 2016 at 7:32 PM, Stephane Gasparini
> > <stephane.gasparini@linux.intel.com> wrote:
> >> 
> >> —
> >> Steph
> >> 
> >> 
> >> 
> >> 
> >>> On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> >>> 
> >>> On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
> >>>> Rafael,
> >>>> 
> >>>> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
> >>>> both
> >>>> changed to use wrmsrl ?
> >>> Initial Atom support was experimental as there were no users, till
> >>> Chrome started using. So it was just a miss.
> >>> 
> >>> We should never have to use wrmsrl_on_cpu. But looks like
> >>> cpufreq_driver.init() can't guarantee that.
> >>> 
> >>>> BTW, what is the interest of setting the pstate to LFM during
> >>>> initialization ?
> >>>> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
> >>>> bothering
> >>>> changing it.
> >>> This is a different issue. BIOS has different configuration option to
> >>> enable fast boot modes which are not necessarily optimized for Linux.
> >>> Some aggressive setting will force system to reboot on boot. So I will
> >>> leave the way it is.
> >>> 
> >>> Thanks,
> >>> Srinivas
> >>> 
> >> 
> >> Still it does not answer my question, why when implementing a4675fbc4a7a
> >> we did changed core for wrmsrl and not atom ?
> > 
> > By mistake?
> > 
> >> My point is that the issue was more due to a miss in the patch a4675fbc4a7a
> >> rather than a difference of behavior between atom and core.
> > 
> > The issue is due to the fact that wrmsrl_on_cpu() is used in atom_set_pstate().
> > 
> > Moreover, core_set_pstate() doesn't use wrmsrl_on_cpu(), so in fact it
> > is different from atom_set_pstate() in that respect.
> > 
> > Now, why and when that difference was introduced doesn't really
> > matter.  What matters is whether or not it makes sense and what to do
> > about it.
> > 
> > To me, it doesn't make sense.  wrmsrl() should be used on both Core
> > and Atom to update the MSR in intel_pstate_adjust_busy_pstate().  In
> > turn, wrmsrl_on_cpu() should be used by both of them on init/exit.
> > That's exactly what happens with my patch applied, with a twist that
> > on init/exit the P-state is always set to the minimum, so it is not
> > even necessary to pass the pstate argument between functions in that
> > case.
> > 
> >> The commit message is a bit misleading around this.
> >> The wrmrl_on_cpu() is needed on both core and atom during init.
> > 
> > Yes, it is, but how is that related to the changelog of this patch?
> 
> Telling what you are saying in this email in answer to me would make the thing 
> more clear IMO. 
> 1) the error seen is a side effect of the previous change, so the issue 
>    was not existing before

That's moot.

Of course, you can argue that being inconsistent about using wmsrl_on_cpu()
vs wmsrl() was a bug by itself, but did it really lead to any user-visible
problems?

Honestly, I doubt it.

The issue has become visible after commit a4675fbc4a7a and that's the reason
for my patch.

> 2) the explanation would be more clear that during the init/exit wmsrl_on_cpu()
>    and other wise wmsrl is the one to be used.

So to me "initialization and cleanup" means exactly "init/exit", so I don't
think it really needs to be more clear than that.

I've added a line about making things consistent between Core and Atom to the
changelog, BTW.  Please have a look at it in my tree.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-21  9:28               ` Stephane Gasparini
@ 2016-03-21 14:11                 ` Rafael J. Wysocki
  2016-03-21 18:58                   ` Srinivas Pandruvada
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-21 14:11 UTC (permalink / raw)
  To: Stephane Gasparini
  Cc: Srinivas Pandruvada, Josh Boyer, Philippe Longepe, Len Brown,
	Viresh Kumar, Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Monday, March 21, 2016 10:28:09 AM Stephane Gasparini wrote:
> 
> —
> Steph
> 
> 
> 
> 
> > On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
> >> Rafael,
> >> 
> >> Why in step 3) both atom_set_pstate() and atom_set_pstate() were not
> >> both
> >> changed to use wrmsrl ?
> > Initial Atom support was experimental as there were no users, till
> > Chrome started using. So it was just a miss.
> > 
> > We should never have to use wrmsrl_on_cpu. But looks like
> > cpufreq_driver.init() can't guarantee that.
> > 
> >> BTW, what is the interest of setting the pstate to LFM during
> >> initialization ?
> >> The BIOS is setting the pstate to either LFM, HFM or BFM, and why
> >> bothering
> >> changing it.
> > This is a different issue. BIOS has different configuration option to
> > enable fast boot modes which are not necessarily optimized for Linux.
> > Some aggressive setting will force system to reboot on boot. So I will
> > leave the way it is.
> 
> Here is my understanding.
> 
> 1) until the driver starts, the CPUS will anyway starts at the P-State set by the BIOS.
> 2) even if you force it to Lowest P-State in init Intel P-State init, if there is load associated
>    to the execution, 10ms after (or may be quicker with the new scheduler based option) the
>    P-State will again set to P0.
> 
> so because 1) and 2) you’ll have basically the following behavior assuming we have high load
> during boot, as this what can cause a reboot due to high frequencies I assume
> 
> a) BIOS set LFM
> 10 ms after init of intel P-State, CPUs will go to Turbo according to load.
> 
> b) BIOS set HFM
> CPU will boot to HFM until we reach intel_state init.
> During 10ms, CPU will be at LFM.
> Due to load they will go up to BFM.
> 
> c) BIOS set to BFM.
> CPU will boot to BFM until we reach intel_state init.
> During 10ms, CPU will be at LFM.
> Due to load they will go up to BFM.
> 
> So I may have miss something but I do not see what is the real benefit of doing this init to LFM 
> that will last for 10ms.
> 
> I still think this initialization is useless and complexity the code.
> 
> Can you point me to case where having this initialization did solve an issue so that I understand 
> the interest of doing this initialization ?

What you're saying above makes sense, but that change wouldn't belong to the
patch in question anyway.

Please consider submitting another patch to make that change if you think it's
worth the effort.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-21 14:11                 ` Rafael J. Wysocki
@ 2016-03-21 18:58                   ` Srinivas Pandruvada
  2016-03-21 22:02                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Srinivas Pandruvada @ 2016-03-21 18:58 UTC (permalink / raw)
  To: Rafael J. Wysocki, Stephane Gasparini
  Cc: Josh Boyer, Philippe Longepe, Len Brown, Viresh Kumar,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Mon, 2016-03-21 at 15:11 +0100, Rafael J. Wysocki wrote:
> On Monday, March 21, 2016 10:28:09 AM Stephane Gasparini wrote:
> > 
> > —
> > Steph
> > 
> > 
> > 
> > 
> > > On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandru
> > > vada@linux.intel.com> wrote:
> > > 
> > > On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
> > > > Rafael,
> > > > 
> > > > Why in step 3) both atom_set_pstate() and atom_set_pstate()
> > > > were not
> > > > both
> > > > changed to use wrmsrl ?
> > > Initial Atom support was experimental as there were no users,
> > > till
> > > Chrome started using. So it was just a miss.
> > > 
> > > We should never have to use wrmsrl_on_cpu. But looks like
> > > cpufreq_driver.init() can't guarantee that.
> > > 
> > > > BTW, what is the interest of setting the pstate to LFM during
> > > > initialization ?
> > > > The BIOS is setting the pstate to either LFM, HFM or BFM, and
> > > > why
> > > > bothering
> > > > changing it.
> > > This is a different issue. BIOS has different configuration
> > > option to
> > > enable fast boot modes which are not necessarily optimized for
> > > Linux.
> > > Some aggressive setting will force system to reboot on boot. So I
> > > will
> > > leave the way it is.
> > 
> > Here is my understanding.
> > 
> > 1) until the driver starts, the CPUS will anyway starts at the P-
> > State set by the BIOS.
> > 2) even if you force it to Lowest P-State in init Intel P-State
> > init, if there is load associated
> >    to the execution, 10ms after (or may be quicker with the new
> > scheduler based option) the
> >    P-State will again set to P0.
> > 
> > so because 1) and 2) you’ll have basically the following behavior
> > assuming we have high load
> > during boot, as this what can cause a reboot due to high
> > frequencies I assume
> > 
> > a) BIOS set LFM
> > 10 ms after init of intel P-State, CPUs will go to Turbo according
> > to load.
> > 
> > b) BIOS set HFM
> > CPU will boot to HFM until we reach intel_state init.
> > During 10ms, CPU will be at LFM.
> > Due to load they will go up to BFM.
> > 
> > c) BIOS set to BFM.
> > CPU will boot to BFM until we reach intel_state init.
> > During 10ms, CPU will be at LFM.
> > Due to load they will go up to BFM.
> > 
> > So I may have miss something but I do not see what is the real
> > benefit of doing this init to LFM 
> > that will last for 10ms.
> > 
> > I still think this initialization is useless and complexity the
> > code.
> > 
> > Can you point me to case where having this initialization did solve
> > an issue so that I understand 
> > the interest of doing this initialization ?
> 
> What you're saying above makes sense, but that change wouldn't belong
> to the
> patch in question anyway.
> 
> Please consider submitting another patch to make that change if you
> think it's
> worth the effort.

I don't think this is worth an effort because of legacy it is carrying.
The very first version had set this to max then later changed to "min".
I can no longer ask Dirk, why he did that.
Also 10 ms is lot of time for thermal trigger, so it is not worth an
effort to go back and cause regression on some system running on
thermal edges.

Thanks,
Srinivas


> 
> Thanks,
> Rafael
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c
  2016-03-21 18:58                   ` Srinivas Pandruvada
@ 2016-03-21 22:02                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2016-03-21 22:02 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Stephane Gasparini, Josh Boyer,
	Philippe Longepe, Len Brown, Viresh Kumar, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

On Mon, Mar 21, 2016 at 7:58 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Mon, 2016-03-21 at 15:11 +0100, Rafael J. Wysocki wrote:
>> On Monday, March 21, 2016 10:28:09 AM Stephane Gasparini wrote:
>> >
>> > —
>> > Steph
>> >
>> >
>> >
>> >
>> > > On Mar 18, 2016, at 6:52 PM, Srinivas Pandruvada <srinivas.pandru
>> > > vada@linux.intel.com> wrote:
>> > >
>> > > On Fri, 2016-03-18 at 17:13 +0100, Stephane Gasparini wrote:
>> > > > Rafael,
>> > > >
>> > > > Why in step 3) both atom_set_pstate() and atom_set_pstate()
>> > > > were not
>> > > > both
>> > > > changed to use wrmsrl ?
>> > > Initial Atom support was experimental as there were no users,
>> > > till
>> > > Chrome started using. So it was just a miss.
>> > >
>> > > We should never have to use wrmsrl_on_cpu. But looks like
>> > > cpufreq_driver.init() can't guarantee that.
>> > >
>> > > > BTW, what is the interest of setting the pstate to LFM during
>> > > > initialization ?
>> > > > The BIOS is setting the pstate to either LFM, HFM or BFM, and
>> > > > why
>> > > > bothering
>> > > > changing it.
>> > > This is a different issue. BIOS has different configuration
>> > > option to
>> > > enable fast boot modes which are not necessarily optimized for
>> > > Linux.
>> > > Some aggressive setting will force system to reboot on boot. So I
>> > > will
>> > > leave the way it is.
>> >
>> > Here is my understanding.
>> >
>> > 1) until the driver starts, the CPUS will anyway starts at the P-
>> > State set by the BIOS.
>> > 2) even if you force it to Lowest P-State in init Intel P-State
>> > init, if there is load associated
>> >    to the execution, 10ms after (or may be quicker with the new
>> > scheduler based option) the
>> >    P-State will again set to P0.
>> >
>> > so because 1) and 2) you’ll have basically the following behavior
>> > assuming we have high load
>> > during boot, as this what can cause a reboot due to high
>> > frequencies I assume
>> >
>> > a) BIOS set LFM
>> > 10 ms after init of intel P-State, CPUs will go to Turbo according
>> > to load.
>> >
>> > b) BIOS set HFM
>> > CPU will boot to HFM until we reach intel_state init.
>> > During 10ms, CPU will be at LFM.
>> > Due to load they will go up to BFM.
>> >
>> > c) BIOS set to BFM.
>> > CPU will boot to BFM until we reach intel_state init.
>> > During 10ms, CPU will be at LFM.
>> > Due to load they will go up to BFM.
>> >
>> > So I may have miss something but I do not see what is the real
>> > benefit of doing this init to LFM
>> > that will last for 10ms.
>> >
>> > I still think this initialization is useless and complexity the
>> > code.
>> >
>> > Can you point me to case where having this initialization did solve
>> > an issue so that I understand
>> > the interest of doing this initialization ?
>>
>> What you're saying above makes sense, but that change wouldn't belong
>> to the
>> patch in question anyway.
>>
>> Please consider submitting another patch to make that change if you
>> think it's worth the effort.
>
> I don't think this is worth an effort because of legacy it is carrying.
> The very first version had set this to max then later changed to "min".
> I can no longer ask Dirk, why he did that.

The max generally may not be a safe initial setting.

> Also 10 ms is lot of time for thermal trigger, so it is not worth an
> effort to go back and cause regression on some system running on
> thermal edges.

I would be very surprised if staying at the BIOS-provided frequency
caused any regressions to happen, but of course at the same time
pstate.current_pstate has to be initialized to a value reflecting the
one in the register.  In order to do that we can either initialize
them both to known safe values representing the same setting as we do
now (and min definitely is the safest choice here), or read the
register and initialize pstate.current_pstate accordingly.

The latter, however, is less attractive, because now we can do the
same thing on init/exit and with that modification the exit path would
be different.

[BTW, note that init doesn't only happen during boot.  It happens on
CPU online too (although in that case whatever is left by exit should
stick) and during system resume which is quite a bit more sensitive.]

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-03-21 22:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17 13:02 intel_pstate oopses and lockdep report with Linux v4.5-1822-g63e30271b04c Josh Boyer
2016-03-17 14:07 ` Rafael J. Wysocki
2016-03-17 14:34   ` Philippe Longepe
2016-03-17 16:44   ` Josh Boyer
2016-03-18  0:20     ` Rafael J. Wysocki
2016-03-18 12:37       ` Josh Boyer
2016-03-18 14:36         ` Rafael J. Wysocki
2016-03-18 16:13           ` Stephane Gasparini
2016-03-18 17:52             ` Srinivas Pandruvada
2016-03-18 18:32               ` Stephane Gasparini
2016-03-18 21:44                 ` Rafael J. Wysocki
2016-03-21  9:31                   ` Stephane Gasparini
2016-03-21 14:09                     ` Rafael J. Wysocki
2016-03-21  9:28               ` Stephane Gasparini
2016-03-21 14:11                 ` Rafael J. Wysocki
2016-03-21 18:58                   ` Srinivas Pandruvada
2016-03-21 22:02                     ` Rafael J. Wysocki
2016-03-18 17:35           ` Josh Boyer
2016-03-18 22:23             ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.