linux-kernel.vger.kernel.org archive mirror
* [STACK DUMP][5.16-rc?] from (cpufreq: intel_pstate: Process HWP Guaranteed change notification)
@ 2021-11-03 20:33 Steven Rostedt
  2021-11-03 21:58 ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2021-11-03 20:33 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Peter Zijlstra, Srinivas Pandruvada, pablomh,
	Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 11082 bytes --]

Testing Linus's tree, I triggered the following stack trace on reboot:

[ Full console log is attached, from boot up to reboot ]

It's a bit scrambled, but here it is:

Cannot finalize remaining DM devices, continuing.
Failed [  103.358874] kvm: exiting hardware virtualization
to finalize DM d[  103.364580] unchecked MSR access error: WRMSR to 0x773 (tried to write 0x0000000000000000) at rIP: 0xffffffffab06e824 (native_write_msr+0x4/0x20)
evices, ignoring[  103.378700] Call Trace:
.
Syncing f[  103.382504]  <IRQ>
ilesystems and b[  103.385908]  __wrmsr_on_cpu+0x33/0x40
lock devices.
R[  103.390951]  flush_smp_call_function_queue+0x122/0x1a0
ebooting.
[  103.397464]  __sysvec_call_function_single+0x29/0xc0
[  103.403402]  sysvec_call_function_single+0x9d/0xd0
[  103.408192]  </IRQ>
[  103.410305]  asm_sysvec_call_function_single+0x12/0x20
[  103.415482] RIP: 0010:cpuidle_enter_state+0xd6/0x3a0
[  103.420449] Code: 54 e8 de b1 7d ff 49 89 c5 0f 1f 44 00 00 31 ff e8 ff c7 7d ff 45 84 ff 0f 85 df 01 00 00 e8 41 15 8b ff fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d
[  103.439216] RSP: 0018:ffffb7d7800dfea8 EFLAGS: 00000202
[  103.444461] RAX: 0000000000000006 RBX: 0000000000000004 RCX: 000000000000001f
[  103.451604] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffab93c4cf
[  103.458743] RBP: ffff9e9f9abf2f00 R08: 0000001811024b0b R09: 0000000000000018
[  103.465880] R10: 0000000000000001 R11: 000000000000292b R12: ffffffffacc5a980
[  103.473046] R13: 0000001811024b0b R14: 0000000000000004 R15: 0000000000000000
[  103.480184]  ? cpuidle_enter_state+0xcf/0x3a0
[  103.484569]  ? cpuidle_enter_state+0xcf/0x3a0
[  103.488952]  cpuidle_enter+0x29/0x40
[  103.492528]  do_idle+0x220/0x2e0
[  103.495777]  cpu_startup_entry+0x19/0x20
[  103.499704]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.504800] Call Trace:
[  103.507261]  __wrmsr_on_cpu+0x33/0x40
[  103.510951]  flush_smp_call_function_queue+0x122/0x1a0
[  103.516097]  flush_smp_call_function_from_idle+0x6e/0xc0
[  103.521422]  do_idle+0x193/0x2e0
[  103.524653]  cpu_startup_entry+0x19/0x20
[  103.528595]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.533649] Call Trace:
[  103.536122]  __wrmsr_on_cpu+0x33/0x40
[  103.539794]  flush_smp_call_function_queue+0x122/0x1a0
[  103.544932]  flush_smp_call_function_from_idle+0x6e/0xc0
[  103.550256]  do_idle+0x193/0x2e0
[  103.553490]  cpu_startup_entry+0x19/0x20
[  103.557437]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.562530] Call Trace:
[  103.565004]  __wrmsr_on_cpu+0x33/0x40
[  103.568675]  flush_smp_call_function_queue+0x122/0x1a0
[  103.573829]  flush_smp_call_function_from_idle+0x6e/0xc0
[  103.580188]  do_idle+0x193/0x2e0
[  103.584458]  cpu_startup_entry+0x19/0x20
[  103.589398]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.595487] Call Trace:
[  103.598961]  __wrmsr_on_cpu+0x33/0x40
[  103.603598]  flush_smp_call_function_queue+0x122/0x1a0
[  103.609647]  flush_smp_call_function_from_idle+0x6e/0xc0
[  103.615856]  do_idle+0x193/0x2e0
[  103.619983]  cpu_startup_entry+0x19/0x20
[  103.624787]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.630692] Call Trace:
[  103.633978]  __wrmsr_on_cpu+0x33/0x40
[  103.638459]  flush_smp_call_function_queue+0x122/0x1a0
[  103.644401]  flush_smp_call_function_from_idle+0x6e/0xc0
[  103.650475]  do_idle+0x193/0x2e0
[  103.654461]  cpu_startup_entry+0x19/0x20
[  103.659113]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.664887] Call Trace:
[  103.668038]  __wrmsr_on_cpu+0x33/0x40
[  103.672376]  generic_exec_single+0xd4/0x110
[  103.677250]  ? __rdmsr_on_cpu+0x40/0x40
[  103.681764]  smp_call_function_single+0xba/0x190
[  103.687051]  ? __rdmsr_on_cpu+0x40/0x40
[  103.691521]  wrmsrl_on_cpu+0x4f/0x70
[  103.695751]  intel_pstate_disable_hwp_interrupt+0x18/0x60
[  103.701789]  intel_pstate_suspend+0x25/0x50
[  103.706607]  cpufreq_suspend+0x7c/0x120
[  103.711085]  device_shutdown+0x1a/0x1b0
[  103.715535]  __do_sys_reboot.cold+0x2f/0x5b
[  103.720343]  ? vfs_writev+0xc1/0x190
[  103.724534]  ? trace_hardirqs_on+0x2b/0xe0
[  103.729268]  ? call_rcu+0x1f6/0x6b0
[  103.733366]  ? syscall_enter_from_user_mode+0x17/0x70
[  103.738987]  do_syscall_64+0x3b/0x90
[  103.743136]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  103.748766] RIP: 0033:0x7fb707540ef7
[  103.752907] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 af 0c 00 f7 d8 64 89 02 b8
[  103.772899] RSP: 002b:00007ffde6d8d2a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  103.781108] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb707540ef7
[  103.788862] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[  103.796616] RBP: 00007ffde6d8d500 R08: 000000000000000a R09: 0000000000000000
[  103.804367] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffde6d8d338
[  103.812156] R13: 0000000000000001 R14: 0000000000000000 R15: 00005610b428a968
[  103.819955] Call Trace:
[  103.823057]  <IRQ>
[  103.825702]  __wrmsr_on_cpu+0x33/0x40
[  103.829998]  flush_smp_call_function_queue+0x122/0x1a0
[  103.835773]  __sysvec_call_function+0x29/0xc0
[  103.840767]  sysvec_call_function+0x9d/0xd0
[  103.845590]  </IRQ>
[  103.848336]  asm_sysvec_call_function+0x12/0x20
[  103.853503] RIP: 0010:cpuidle_enter_state+0xd6/0x3a0
[  103.859098] Code: 54 e8 de b1 7d ff 49 89 c5 0f 1f 44 00 00 31 ff e8 ff c7 7d ff 45 84 ff 0f 85 df 01 00 00 e8 41 15 8b ff fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d
[  103.879209] RSP: 0018:ffffffffaca03e58 EFLAGS: 00000202
[  103.885139] RAX: 0000000000000006 RBX: 0000000000000004 RCX: 000000000000001f
[  103.892971] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffab93c4cf
[  103.900805] RBP: ffff9e9f9aa32f00 R08: 000000182c26c48b R09: 0000000000000008
[  103.908647] R10: 0000000000000392 R11: 000000000000036f R12: ffffffffacc5a980
[  103.916493] R13: 000000182c26c48b R14: 0000000000000004 R15: 0000000000000000
[  103.924341]  ? cpuidle_enter_state+0xcf/0x3a0
[  103.929418]  ? cpuidle_enter_state+0xcf/0x3a0
[  103.934496]  cpuidle_enter+0x29/0x40
[  103.938794]  do_idle+0x220/0x2e0
[  103.942747]  cpu_startup_entry+0x19/0x20
[  103.947385]  start_kernel+0x92e/0x953
[  103.951775]  secondary_startup_64_no_verify+0xc2/0xcb
[  103.958736] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  103.966762] e1000e: EEE TX LPI TIMER: 00000011
[  104.059640] reboot: Restarting system
[  104.064017] reboot: machine restart

The RIP of cpuidle_enter_state+0xd6/0x3a0 gives me in gdb:

(gdb) li *cpuidle_enter_state+0xd6
0xffffffff8193c4d6 is in cpuidle_enter_state (/work/git/linux-trace.git/drivers/cpuidle/cpuidle.c:259).
254             }
255
256             if (!cpuidle_state_is_coupled(drv, index))
257                     local_irq_enable();
258
259             if (entered_state >= 0) {
260                     s64 diff, delay = drv->states[entered_state].exit_latency_ns;
261                     int i;
262
263                     /*

And the disassembly of that location:

   0xffffffff8193c496 <+150>:   je     0xffffffff8193c6c5 <cpuidle_enter_state+709>
   0xffffffff8193c49c <+156>:   call   0xffffffff811ee680 <start_critical_timings>
   0xffffffff8193c4a1 <+161>:   call   0xffffffff81117810 <sched_clock_idle_wakeup_event>
   0xffffffff8193c4a6 <+166>:   mov    %gs:0x7e6d50bb(%rip),%edi        # 0x11568 <cpu_number>
   0xffffffff8193c4ad <+173>:   call   0xffffffff81117690 <sched_clock_cpu>
   0xffffffff8193c4b2 <+178>:   mov    %rax,%r13
   0xffffffff8193c4b5 <+181>:   nopl   0x0(%rax,%rax,1)
   0xffffffff8193c4ba <+186>:   xor    %edi,%edi
   0xffffffff8193c4bc <+188>:   call   0xffffffff81118cc0 <sched_idle_set_state>
   0xffffffff8193c4c1 <+193>:   test   %r15b,%r15b
   0xffffffff8193c4c4 <+196>:   jne    0xffffffff8193c6a9 <cpuidle_enter_state+681>
   0xffffffff8193c4ca <+202>:   call   0xffffffff811eda10 <trace_hardirqs_on>
   0xffffffff8193c4cf <+207>:   call   *0xffffffff82a467a0
   0xffffffff8193c4d6 <+214>:   test   %r14d,%r14d
   0xffffffff8193c4d9 <+217>:   js     0xffffffff8193c5f0 <cpuidle_enter_state+496>
   0xffffffff8193c4df <+223>:   movslq %r14d,%rax
   0xffffffff8193c4e2 <+226>:   sub    (%rsp),%r13
   0xffffffff8193c4e6 <+230>:   lea    (%rax,%rax,2),%rdx
   0xffffffff8193c4ea <+234>:   lea    (%rax,%rdx,4),%rdx
   0xffffffff8193c4ee <+238>:   lea    (%r12,%rdx,8),%rcx

Just above the RIP there's:

   call   *0xffffffff82a467a0

and 0xffffffff82a467a0 isn't any address I can find.

But I believe that call comes from local_irq_enable(), as when I compile
the file with -E, I have:

 if (!cpuidle_state_is_coupled(drv, index))
  do { trace_hardirqs_on(); arch_local_irq_enable(); } while (0);


And:

[ formatted to read better ]

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) __attribute__((no_instrument_function)) void arch_local_irq_enable(void)
{
    (void)({ unsigned long __edi = __edi, __esi = __esi, __edx = __edx, __ecx = __ecx, __eax = __eax;; ((void)pv_ops.irq.irq_enable.func);
	 asm volatile("# ALT: oldnstr\n"
			 "661:\n\t"
			 "771:\n\t"
			 "999:\n\t"
			 ".pushsection .discard.retpoline_safe\n\t" " "
			 ".quad" " " " 999b\n\t"
             		 ".popsection\n\t"
			 "call *%c[paravirt_opptr];" "\n"
			 "772:\n" ".pushsection .parainstructions,\"a\"\n" " "
			 ".balign 8" " " "\n"
			 " " ".quad" " " " 771b\n"
			 "  .byte " "%c[paravirt_typenum]" "\n"
			 "  .byte 772b-771b\n"
			 "  .short " "%c[paravirt_clobber]" "\n"
			 ".popsection\n" "\n
             662:\n"
			 "# ALT: padding\n"
			 ".skip -(((" "665""1""f-""664""1""f" ")-(" "662b-661b" ")) > 0) * " "((" "665""1""f-""664""1""f" ")-(" "662b-661b" ")),0x90\n"
			 "663" ":\n"
			 ".pushsection .altinstructions,\"a\"\n"
			 " .long 661b - .\n"
			 " .long " "664""1""f - .\n"
			 " .word " "((( 8*3
             2+16)) | (1 << 15))" "\n"
			 " .byte " "663""b-661b" "\n"
			 " .byte " "665""1""f-""664""1""f" "\n"
			 ".popsection\n"
			 ".pushsection .altinstr_replacement, \"ax\"\n"
			 "# ALT: replacement " "1" "\n"
			 "664""1"":\n\t"
			 "sti;" "\n"
			 "665""1" ":\n"
			 ".popsection\n" : "=a"
             (__eax), "+r" (current_stack_pointer) :
		 [paravirt_typenum] "i" ((__builtin_offsetof(struct paravirt_patch_template, irq.irq_enable.func) / sizeof(void *))), [paravirt_opptr] "i" (&(pv_ops.irq.irq_enable.func)), [paravirt_clobber] "i" (((1 << 0))) :
		 "memory", "cc" ); ;
           });
}


I bisected it down to:

 57577c996d73 ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")

Removing the commit makes the issue go away. Adding it back brings it back.

The above gdb was done from this commit.

So I'm guessing that this commit calls some code that does not expect to be
called on powering off, at least in debug mode, and is triggering this
stack dump. Probably because there was some static call or something that
isn't shut down properly.

I also attached the config.

-- Steve

[-- Attachment #2: console-log.gz --]
[-- Type: application/gzip, Size: 28034 bytes --]

[-- Attachment #3: config.gz --]
[-- Type: application/gzip, Size: 37053 bytes --]


* Re: [STACK DUMP][5.16-rc?] from (cpufreq: intel_pstate: Process HWP Guaranteed change notification)
  2021-11-03 20:33 [STACK DUMP][5.16-rc?] from (cpufreq: intel_pstate: Process HWP Guaranteed change notification) Steven Rostedt
@ 2021-11-03 21:58 ` Linus Torvalds
  2021-11-04  4:42   ` Srinivas Pandruvada
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2021-11-03 21:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Srinivas Pandruvada, pablomh, Rafael J. Wysocki

On Wed, Nov 3, 2021 at 1:33 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The RIP of cpuidle_enter_state+0xd6/0x3a0 gives me in gdb:

That's not actually the error address - it's not an oops. That's just
the register data at the interrupt entry.

The error that triggered this was

    unchecked MSR access error:
        WRMSR to 0x773 (tried to write 0x0000000000000000)
        at rIP: 0xffffffffab06e824 (native_write_msr+0x4/0x20)

and then it prints the stack frame - because we shouldn't be doing
random MSR writes.
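
For context, the "unchecked" wording refers to the plain MSR write helpers,
which have no error return; a faulting write is reported by the #GP fixup as
the message above. As a minimal sketch of the distinction, using the generic
wrmsrl()/wrmsrl_safe() helpers purely for illustration (not the code involved
here):

	/* Illustrative only, not kernel source. */
	static void msr_write_example(u32 msr)
	{
		/*
		 * No error return: a fault here is reported as an
		 * "unchecked MSR access error" with a stack dump.
		 */
		wrmsrl(msr, 0);

		/* The _safe variant hands the fault back to the caller. */
		if (wrmsrl_safe(msr, 0))
			pr_debug("MSR %#x not writable on this CPU\n", msr);
	}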

And part of printing the stack frame is printing the register state at
kernel entries when switching to an interrupt or exception stack.

So that RIP you point to is not really interesting, and it looks like
it's an entirely random harmless instruction, it's just that it is
preceded by that

        sti
        nopw

so it's not surprising that it's an interrupt point, because it's
exactly one instruction after interrupts were enabled on that CPU (the
one instruction being the sti shadow that itself doesn't take
interrupts).

So cpuidle_enter_state() itself is entirely harmless, and you see that
in the call chain presumably simply because the CPU was idle when the
interrupt happened. Again, that's a very reasonable place for an
interrupt to happen.

End result: all of this is simply due to an unexpected MSR write, and
the real issue is that interrupt from an IPI that caused the WRMSR
write on another CPU:

    __wrmsr_on_cpu+0x33/0x40
    flush_smp_call_function_queue+0x122/0x1a0
    __sysvec_call_function_single+0x29/0xc0
    sysvec_call_function_single+0x9d/0xd0

and the rest of the call chain is just noise from "whatever that CPU
happened to do when it got the IPI".

You don't get the call chain of whatever actually did the IPI, I'm
afraid. That's the actual _cause_ of all this, but that happens on
another CPU entirely.
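
As a rough sketch of why only the receiving side shows up: wrmsrl_on_cpu()
packages the write and fires a single-target SMP function call, and the
target CPU simply drains its call-function queue, so the sender's call chain
never lands on the target's stack. The names and body below are a simplified
illustration, not the actual arch/x86/lib/msr-smp.c code:

	struct msr_write_req {		/* illustrative, not struct msr_info */
		u32 msr_no;
		u64 value;
	};

	/* Runs on the target CPU, out of flush_smp_call_function_queue(). */
	static void remote_wrmsr(void *info)
	{
		struct msr_write_req *req = info;

		wrmsrl(req->msr_no, req->value);
	}

	static int wrmsrl_on_cpu_sketch(unsigned int cpu, u32 msr_no, u64 value)
	{
		struct msr_write_req req = { .msr_no = msr_no, .value = value };

		/* Sends an IPI to @cpu and waits for remote_wrmsr() to finish. */
		return smp_call_function_single(cpu, remote_wrmsr, &req, 1);
	}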

> I bisected it down to:
>
>  57577c996d73 ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
>
> Removing the commit makes the issue go away. Adding it back brings it back.

That commit does various odd things.

The code seems to depend on the 'hwp_notify_lock' spinlock, but then
it does some work outside that locked region (exactly because that
work triggers an interrupt that needs that lock):

      wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);

and that's exactly what will then warn because that MSR hasn't been
enabled on that CPU.

But it also randomly does some READ_ONCE/WRITE_ONCE magic that all
seems entirely bogus. Those values should be protected by the lock.

The way it cancels delayed work is also very questionable: it does

        spin_lock_irqsave(&hwp_notify_lock, flags);
        if (cpumask_test_and_clear_cpu(cpudata->cpu, &hwp_intr_enable_mask))
                cancel_delayed_work(&cpudata->hwp_notify_work);
        spin_unlock_irqrestore(&hwp_notify_lock, flags);

but if the delayed work was already started on that cpu, then it will
continue running concurrently, even if the caller now thinks it's all
disabled.

You need the "sync" version to make sure the work is also waited on,
but you can't do that under the spinlock that the work in question may
need, of course.
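
A minimal sketch of that ordering, reusing the hwp_notify_lock /
hwp_intr_enable_mask / hwp_notify_work names from intel_pstate.c; this is
only an illustration of the pattern, not a proposed patch:

	static void disable_hwp_notify_sketch(struct cpudata *cpudata)
	{
		unsigned long flags;
		bool was_enabled;

		spin_lock_irqsave(&hwp_notify_lock, flags);
		was_enabled = cpumask_test_and_clear_cpu(cpudata->cpu,
							 &hwp_intr_enable_mask);
		spin_unlock_irqrestore(&hwp_notify_lock, flags);

		/*
		 * Wait for a possibly running work item outside the lock:
		 * the work may itself take hwp_notify_lock, so sleeping on
		 * it with the lock held could deadlock.
		 */
		if (was_enabled)
			cancel_delayed_work_sync(&cpudata->hwp_notify_work);
	}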

> So I'm guessing that this commit calls some code that does not expect to be
> called on powering off, at least in debug mode, and is triggering this
> stack dump. Probably because there was some static call or something that
> isn't shut down properly.

It actually looks more direct than that to me: the code literally
seems to do that wrmsrl_on_cpu() that triggers the IPI that actually
causes the unchecked MSR write.

I don't know this hardware, and I don't know that code, but there are
certainly a few red flags here.

Srinivas?

              Linus


* Re: [STACK DUMP][5.16-rc?] from (cpufreq: intel_pstate: Process HWP Guaranteed change notification)
  2021-11-03 21:58 ` Linus Torvalds
@ 2021-11-04  4:42   ` Srinivas Pandruvada
  0 siblings, 0 replies; 3+ messages in thread
From: Srinivas Pandruvada @ 2021-11-04  4:42 UTC (permalink / raw)
  To: Linus Torvalds, Steven Rostedt
  Cc: LKML, Peter Zijlstra, pablomh, Rafael J. Wysocki

On Wed, 2021-11-03 at 14:58 -0700, Linus Torvalds wrote:
> On Wed, Nov 3, 2021 at 1:33 PM Steven Rostedt <rostedt@goodmis.org>
> wrote:
> > 
> > The RIP of cpuidle_enter_state+0xd6/0x3a0 gives me in gdb:
> 
> That's not actually the error address - it's not an oops. That's just
> the register data at the interrupt entry.
> 
> The error that triggered this was
> 
>     unchecked MSR access error:
>         WRMSR to 0x773 (tried to write 0x0000000000000000)
>         at rIP: 0xffffffffab06e824 (native_write_msr+0x4/0x20)
> 

MSR 0x773 access must be guarded by a CPUID feature check. Sorry, I missed
this check in the offline path.

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 349ddbaef796..4184073be2c4 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1621,7 +1621,8 @@ static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata)
        unsigned long flags;
 
        /* wrmsrl_on_cpu has to be outside spinlock as this can result in IPC */
-       wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
+       if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
+               wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
 
        spin_lock_irqsave(&hwp_notify_lock, flags);
        if (cpumask_test_and_clear_cpu(cpudata->cpu, &hwp_intr_enable_mask))


I will submit a patch to fix this.

Thanks,
Srinivas

> and then it prints the stack frame - because we shouldn't be doing
> random MSR writes.
> 
> And part of printing the stack frame is printing the register state
> at
> kernel entries when switching to an interrupt or exception stack.
> 
> So that RIP you point to is not really interesting, and it looks like
> it's an entirely random harmless instruction, it's just that it is
> preceded by that
> 
>         sti
>         nopw
> 
> so it's not surprising that it's an interrupt point, because it's
> exactly one instruction after interrupts were enabled on that CPU
> (the
> one instruction being the sti shadow that itself doesn't take
> interrupts).
> 
> So cpuidle_enter_state() itself is entirely harmless, and you see
> that
> in the call chain presumably simply because the CPU was idle when the
> interrupt happened. Again, that's a very reasonable place for an
> interrupt to happen.
> 
> End result: all of this is simply due to an unexpected MSR write, and
> the real issue is that interrupt from an IPI that caused the WRMSR
> write on another CPU:
> 
>     __wrmsr_on_cpu+0x33/0x40
>     flush_smp_call_function_queue+0x122/0x1a0
>     __sysvec_call_function_single+0x29/0xc0
>     sysvec_call_function_single+0x9d/0xd0
> 
> and the rest of the call chain is just noise from "whatever that CPU
> happened to do when it got the IPI".
> 
> You don't get the call chain of whatever actually did the IPI, I'm
> afraid. That's the actual _cause_ of all this, but that happens on
> another CPU entirely.
> 
> > I bisected it down to:
> > 
> >  57577c996d73 ("cpufreq: intel_pstate: Process HWP Guaranteed
> > change notification")
> > 
> > Removing the commit makes the issue go away. Adding it back brings
> > it back.
> 
> That commit does various odd things.
> 
> The code seems to depend on the 'hwp_notify_lock' spinlock, but then
> it does some work outside that locked region (exactly because that
> work triggers an interrupt that needs that lock):
> 
>       wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
> 
> and that's exactly what will then warn because that MSR hasn't been
> enabled on that CPU.
> 
> But it also randomly does some READ_ONCE/WRITE_ONCE magic that all
> seems entirely bogus. Those values should be protected by the lock.
> 
> The way it cancels delayed work is also very questionable: it does
> 
>         spin_lock_irqsave(&hwp_notify_lock, flags);
>         if (cpumask_test_and_clear_cpu(cpudata->cpu,
> &hwp_intr_enable_mask))
>                 cancel_delayed_work(&cpudata->hwp_notify_work);
>         spin_unlock_irqrestore(&hwp_notify_lock, flags);
> 
> but if the delayed work was already started on that cpu, then it will
> continue running concurrently, even if the caller now thinks it's all
> disabled.
> 
> You need the "sync" version to make sure the work is also waited on,
> but you can't do that under the spinlock that the work in question
> may
> need, of course.
> 
> > So I'm guessing that this commit calls some code that does not
> > expect to be
> > called on powering off, at least in debug mode, and is triggering
> > this
> > stack dump. Probably because there was some static call or
> > something that
> > isn't shut down properly.
> 
> It actually looks more direct than that to me: the code literally
> seems to do that wrmsrl_on_cpu() that triggers the IPI that actually
> causes the unchecked MSR write.
> 
> I don't know this hardware, and I don't know that code, but there are
> certainly a few red flags here.
> 
> Srinivas?
> 

>               Linus



