linux-kernel.vger.kernel.org archive mirror
* "Force HWP min perf before offline" triggers unchecked MSR access errors
@ 2019-10-29 20:55 Qian Cai
  2019-10-29 21:47 ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2019-10-29 20:55 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Chen Yu, Len Brown, Viresh Kumar,
	Borislav Petkov, Thomas Gleixner, linux-pm, linux-kernel

The commit af3b7379e2d7 ("cpufreq: intel_pstate: Force HWP min perf before
offline") triggers the error below during CPU hotplug. Reverting it (on top
of linux-next) fixes it.

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144
On-line CPU(s) list: 0-143
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           4
NUMA node(s):        4
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1200.001
CPU max MHz:         3700.0000
CPU min MHz:         1200.0000
BogoMIPS:            6000.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,72-89
NUMA node1 CPU(s):   18-35,90-107
NUMA node2 CPU(s):   36-53,108-125
NUMA node3 CPU(s):   54-71,126-143
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb
stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
hwp hwp_act_window hwp_pkg_req pku ospke md_clear flush_l1d

[17670.190223][T69701] LTP: starting cpuhotplug02 (cpuhotplug02.sh -c 1 -l 1)
[17676.195430][   T15] unchecked MSR access error: WRMSR to 0x1b0 (tried to write 0x00000000000000c0) at rIP: 0xffffffff82b0a97c (__wrmsr_on_cpu+0xbc/0x130)
[17676.209251][   T15] Call Trace:
[17676.212410][   T15]  ? rdmsrl_on_cpu+0xf0/0xf0
[17676.216882][   T15]  generic_exec_single+0x13e/0x1d0
[17676.221876][   T15]  ? rdmsrl_on_cpu+0xf0/0xf0
[17676.226344][   T15]  smp_call_function_single+0x1aa/0x200
[17676.231774][   T15]  ? generic_exec_single+0x1d0/0x1d0
[17676.236942][   T15]  ? rdmsrl_on_cpu+0xb1/0xf0
[17676.241410][   T15]  wrmsrl_on_cpu+0xa6/0xe0
[17676.245705][   T15]  ? wrmsr_on_cpu+0xf0/0xf0
[17676.250091][   T15]  ? kasan_slab_free+0xe/0x10
[17676.254650][   T15]  ? intel_pstate_get_epp+0x168/0x190
[17676.259905][   T15]  ? store_energy_performance_preference+0x370/0x370
[17676.266469][   T15]  intel_pstate_set_epb+0xc8/0x110
[17676.271463][   T15]  ? show_status+0x80/0x80
[17676.275760][   T15]  ? down_write_killable+0x160/0x160
[17676.280927][   T15]  intel_pstate_stop_cpu+0x126/0x150
[17676.286094][   T15]  cpufreq_offline+0x17c/0x3a0
[17676.290737][   T15]  ? cpufreq_offline+0x3a0/0x3a0
[17676.295556][   T15]  cpuhp_cpufreq_offline+0xe/0x20
[17676.300464][   T15]  cpuhp_invoke_callback+0x197/0x1120
[17676.305724][   T15]  ? lock_acquire+0x126/0x280
[17676.310280][   T15]  ? cpuhp_thread_fun+0x69/0x2f0
[17676.315098][   T15]  cpuhp_thread_fun+0x252/0x2f0
[17676.319830][   T15]  ? __cpuhp_state_remove_instance+0x350/0x350
[17676.325876][   T15]  smpboot_thread_fn+0x255/0x440
[17676.330695][   T15]  ? smpboot_register_percpu_thread+0x110/0x110
[17676.336824][   T15]  ? __kasan_check_read+0x11/0x20
[17676.341731][   T15]  ? __kthread_parkme+0xc6/0xe0
[17676.346463][   T15]  ? smpboot_register_percpu_thread+0x110/0x110
[17676.352590][   T15]  kthread+0x1e6/0x210
[17676.356534][   T15]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[17676.362314][   T15]  ret_from_fork+0x3a/0x50
[17676.895221][   T16] IRQ 273: no longer affine to CPU1
[17676.901373][   T16] process 69725 (cpuhotplug_do_s) no longer affine to cpu1




* Re: "Force HWP min perf before offline" triggers unchecked MSR access errors
  2019-10-29 20:55 "Force HWP min perf before offline" triggers unchecked MSR access errors Qian Cai
@ 2019-10-29 21:47 ` Rafael J. Wysocki
  2019-10-29 22:01   ` Qian Cai
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2019-10-29 21:47 UTC (permalink / raw)
  To: Qian Cai
  Cc: Srinivas Pandruvada, Rafael J. Wysocki, Chen Yu, Len Brown,
	Viresh Kumar, Borislav Petkov, Thomas Gleixner, Linux PM,
	Linux Kernel Mailing List

On Tue, Oct 29, 2019 at 9:55 PM Qian Cai <cai@lca.pw> wrote:
>
> The commit af3b7379e2d7 ("cpufreq: intel_pstate: Force HWP min perf before
> offline") triggers an error below while doing CPU hotplug. Reverted it (on the
> top of the linux-next) fixed it.
>
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

The MSR_IA32_ENERGY_PERF_BIAS MSR appears to be not present, which
should be caught by the X86_FEATURE_EPB check in
intel_pstate_set_epb().

Do you run this in a guest perchance?


* Re: "Force HWP min perf before offline" triggers unchecked MSR access errors
  2019-10-29 21:47 ` Rafael J. Wysocki
@ 2019-10-29 22:01   ` Qian Cai
  2019-10-29 22:13     ` Srinivas Pandruvada
  0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2019-10-29 22:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Rafael J. Wysocki, Chen Yu, Len Brown,
	Viresh Kumar, Borislav Petkov, Thomas Gleixner, Linux PM,
	Linux Kernel Mailing List



> On Oct 29, 2019, at 5:47 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> 
> The MSR_IA32_ENERGY_PERF_BIAS MSR appears to be not present, which
> should be caught by the X86_FEATURE_EPB check in
> intel_pstate_set_epb().
> 
> Do you run this in a guest perchance?

No, it is a bare-metal HPE server. The dmesg does say something like
"energy perf bias changed from performance to normal", and the CPU flags
include epb, which I thought would pass the feature check. I could upload
the whole dmesg a bit later if that helps.
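
For reference, the flags line in the lscpu output above lists epb, hwp,
hwp_act_window and hwp_pkg_req, but no hwp_epp. A minimal sketch of a
whole-token flag lookup follows; cpu_flags_contain() is a hypothetical
helper written for illustration, not a kernel or libc function. A plain
substring search would wrongly match "hwp" inside "hwp_act_window", so
the helper checks the characters on both sides of each hit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `c` separates flags (or ends the string). */
static bool is_flag_boundary(char c)
{
	return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: return true if `name` appears in `flags` as a
 * complete whitespace-delimited token, e.g. in the "Flags:" value printed
 * by lscpu or the "flags" line of /proc/cpuinfo.
 */
static bool cpu_flags_contain(const char *flags, const char *name)
{
	size_t len = strlen(name);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool starts_token = (p == flags) || is_flag_boundary(p[-1]);
		bool ends_token = is_flag_boundary(p[len]);

		if (starts_token && ends_token)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}
```

Called on the flags reported above, cpu_flags_contain(flags, "epb") and
cpu_flags_contain(flags, "hwp") hold while
cpu_flags_contain(flags, "hwp_epp") does not, so the driver would be
expected to take the EPB path rather than the EPP path on this machine.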


* Re: "Force HWP min perf before offline" triggers unchecked MSR access errors
  2019-10-29 22:01   ` Qian Cai
@ 2019-10-29 22:13     ` Srinivas Pandruvada
  2019-10-29 22:15       ` Srinivas Pandruvada
  0 siblings, 1 reply; 5+ messages in thread
From: Srinivas Pandruvada @ 2019-10-29 22:13 UTC (permalink / raw)
  To: Qian Cai, Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Chen Yu, Len Brown, Viresh Kumar,
	Borislav Petkov, Thomas Gleixner, Linux PM,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

On Tue, 2019-10-29 at 18:01 -0400, Qian Cai wrote:
> > On Oct 29, 2019, at 5:47 PM, Rafael J. Wysocki <rafael@kernel.org>
> > wrote:
> > 
> > The MSR_IA32_ENERGY_PERF_BIAS MSR appears to be not present, which
> > should be caught by the X86_FEATURE_EPB check in
> > intel_pstate_set_epb().
> > 
> > Do you run this in a guest perchance?
> 
> No, it is a baremetal HPE server. The dmesg does say something like
> energy perf bias changed from performance to normal, and the cpuflag
> contains epb which I thought that would pass the feature check? I
> could upload the whole dmesg a bit later if that helps.

Try the attached change. You have a Skylake server with no EPP support.
This is odd.

Thanks,
Srinivas


[-- Attachment #2: epb_power.diff --]
[-- Type: text/x-patch, Size: 1025 bytes --]

diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c
index ec2638f1df4f..f70f746ed58d 100644
--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -130,6 +130,7 @@ void acpi_thermal_cpufreq_init(int cpu)
 	struct acpi_processor *pr = per_cpu(processors, cpu);
 	int ret;
 
+	memset(&pr->thermal_req, 0, sizeof(pr->thermal_req));
 	ret = dev_pm_qos_add_request(get_cpu_device(cpu),
 				     &pr->thermal_req, DEV_PM_QOS_MAX_FREQUENCY,
 				     INT_MAX);
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 9f02de9a1b47..eab8b048dc9f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -851,7 +851,7 @@ static void intel_pstate_hwp_force_min_perf(int cpu)
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP))
 		value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);
 	else
-		intel_pstate_set_epb(cpu, HWP_EPP_BALANCE_POWERSAVE);
+		intel_pstate_set_epb(cpu, 0x0F);
 
 	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 }


* Re: "Force HWP min perf before offline" triggers unchecked MSR access errors
  2019-10-29 22:13     ` Srinivas Pandruvada
@ 2019-10-29 22:15       ` Srinivas Pandruvada
  0 siblings, 0 replies; 5+ messages in thread
From: Srinivas Pandruvada @ 2019-10-29 22:15 UTC (permalink / raw)
  To: Qian Cai, Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Chen Yu, Len Brown, Viresh Kumar,
	Borislav Petkov, Thomas Gleixner, Linux PM,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

On Tue, 2019-10-29 at 15:13 -0700, Srinivas Pandruvada wrote:
> On Tue, 2019-10-29 at 18:01 -0400, Qian Cai wrote:
> > > On Oct 29, 2019, at 5:47 PM, Rafael J. Wysocki <rafael@kernel.org>
> > > wrote:
> > > 
> > > The MSR_IA32_ENERGY_PERF_BIAS MSR appears to be not present,
> > > which
> > > should be caught by the X86_FEATURE_EPB check in
> > > intel_pstate_set_epb().
> > > 
> > > Do you run this in a guest perchance?
> > 
> > No, it is a baremetal HPE server. The dmesg does say something like
> > energy perf bias changed from performance to normal, and the
> > cpuflag
> > contains epb which I thought that would pass the feature check? I
> > could upload the whole dmesg a bit later if that helps.
> 
> Try the attached change. You have a Skylake server with no EPP
> support.
> This is odd.
> 
Sorry.
Ignore the previous one. It had some unrelated change.

> Thanks,
> Srinivas
> 

[-- Attachment #2: epb_power.diff --]
[-- Type: text/x-patch, Size: 515 bytes --]

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 9f02de9a1b47..eab8b048dc9f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -851,7 +851,7 @@ static void intel_pstate_hwp_force_min_perf(int cpu)
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP))
 		value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);
 	else
-		intel_pstate_set_epb(cpu, HWP_EPP_BALANCE_POWERSAVE);
+		intel_pstate_set_epb(cpu, 0x0F);
 
 	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 }
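
A note on why the one-line change above is likely to matter: the policy
hint in IA32_ENERGY_PERF_BIAS (MSR 0x1b0) occupies bits 3:0 and the
remaining bits are reserved, while HWP_EPP_BALANCE_POWERSAVE (0xC0) is an
8-bit EPP value. The sketch below is illustrative only. The helper names
are hypothetical, and epb_merge_unmasked() assumes the driver ORs the
preference into the MSR value without masking it first, which would
produce the 0xc0 write seen in the trace.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_ENERGY_PERF_BIAS	0x1b0
#define HWP_EPP_BALANCE_POWERSAVE	0xC0	/* EPP hint, 8 bits wide */
#define EPB_MASK			0x0FULL	/* EPB hint lives in bits 3:0 */

/* Hypothetical helper: true if `val` sets no reserved bits of the EPB MSR. */
static bool epb_value_fits(uint64_t val)
{
	return (val & ~EPB_MASK) == 0;
}

/*
 * Assumed model of the failing path: clear the old hint, then OR in the
 * preference as-is. An out-of-range `pref` leaks into reserved bits.
 */
static uint64_t epb_merge_unmasked(uint64_t old, uint64_t pref)
{
	return (old & ~EPB_MASK) | pref;
}

/* Defensive variant: clamp the preference to the 4-bit field first. */
static uint64_t epb_merge_masked(uint64_t old, uint64_t pref)
{
	return (old & ~EPB_MASK) | (pref & EPB_MASK);
}
```

With an old hint of 6 ("normal"), epb_merge_unmasked(6, 0xC0) gives 0xC0,
which fails epb_value_fits(), whereas epb_merge_unmasked(6, 0x0F) gives
0x0F, the maximum power-saving hint, which fits.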

