S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get

* S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
@ 2021-08-17 14:50 Marek Marczykowski-Górecki
  2021-08-17 15:15 ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-17 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 6957 bytes --]

On Tue, Aug 17, 2021 at 04:04:24PM +0200, Jan Beulich wrote:
> On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
> > On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
> >> On 17/08/2021 14:21, Jan Beulich wrote:
> >>> On 17.08.2021 15:06, Andrew Cooper wrote:
> >>>> Perhaps we want the cpu_down() logic to explicitly invalidate their
> >>>> lazily cached values?
> >>> I'd rather do this on the cpu_up() path (no point clobbering what may
> >>> get further clobbered, and then perhaps not to a value of our liking),
> >>> yet then we can really avoid doing this from a notifier and instead do
> >>> it early enough in xstate_init() (taking care of XSS at the same time).
> > 
> > Funny you mention notifiers. Apparently cpufreq driver does use it to
> > initialize things. And fails to do so:
> > 
> > (XEN) Finishing wakeup from ACPI S3 state.
> > (XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> > (XEN) Enabling non-boot CPUs  ...
> > (XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> > (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
> > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 0000000000000001
> > (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: ffff83025dcc0000
> > (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8:  ffff83020511a820
> > (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 8000000000000000
> > (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 0000000000000001
> > (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000003526e0
> > (XEN) cr3: 0000000049656000   cr2: 0000000000000028
> > (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen code around <ffff82d04024ad2b> (vcpu_runstate_get+0x153/0x244):
> > (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d dd 64 32 00
> > (XEN) Xen stack trace from rsp=ffff830049667c10:
> > (XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 ffff830205118c60
> > (XEN)    0000000000000001 0000000000000010 ffff830049667c80 ffff82d04024ae73
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 ffff82d0402560a9
> > (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 ffff83025dc7a6f0
> > (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 00000001402e9f74
> > (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 000000000025dc81
> > (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 ffff830049667d28
> > (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 ffff82d0402283a4
> > (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 0000000000000004
> > (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e ffff830049667db0
> > (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 0000000000000001
> > (XEN)    0000000000000000 ffff830049667e00 0000000000000001 ffff82d04058a5c0
> > (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 ffff830049667df0
> > (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 0000000000000010
> > (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 0000000000000000
> > (XEN)    0000000000000003 0000000000000200 ffff830049667e50 ffff82d040270bac
> > (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 0000000000000000
> > (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 ffff82d0405701b0
> > (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 ffff830049667eb8
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244

This is xen/common/sched/core.c:322. get_sched_res(v->processor) is
NULL at this point for CPU1.

The only place that can calls set_sched_res() and doesn't expect the
previous value to be valid, is cpu_schedule_up(). For non-BSP its called
_only_ from notifier at CPU_UP_PREPARE (cpu_schedule_callback()), but
that notifier explicitly exclude suspend/resume case:

    static int cpu_schedule_callback(
        struct notifier_block *nfb, unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;
        int rc = 0;

        /*
         * All scheduler related suspend/resume handling needed is done in
         * cpupool.c.
         */
        if ( system_state > SYS_STATE_active )
            return NOTIFY_DONE;

But, nothing in cpupool.c is calling into set_sched_res().

On the other hand, sched_rm_cpu() (which I believe is called as part of
parking the CPU) calls cpu_schedule_down(), which then calls
set_sched_res(cpu, NULL).

In short: scheduler for parked CPUs is not re-initialized during resume.
But cpufreq expects it to be...

> > (XEN)    [<ffff82d04024ae73>] F get_cpu_idle_time+0x57/0x59
> > (XEN)    [<ffff82d0402560a9>] F cpufreq_statistic_init+0x191/0x210
> > (XEN)    [<ffff82d040254cb1>] F cpufreq_add_cpu+0x3cc/0x5bb
> > (XEN)    [<ffff82d04025510e>] F cpufreq.c#cpu_callback+0x27/0x32
> > (XEN)    [<ffff82d040221ba4>] F notifier_call_chain+0x6c/0x96
> > (XEN)    [<ffff82d040203867>] F cpu.c#cpu_notifier_call_chain+0x1b/0x36
> > (XEN)    [<ffff82d040203c51>] F cpu_up+0xaf/0xc8
> > (XEN)    [<ffff82d040203e26>] F enable_nonboot_cpus+0x6b/0x1f8
> > (XEN)    [<ffff82d040270bac>] F power.c#enter_state_helper+0x152/0x60a
> > (XEN)    [<ffff82d0402056aa>] F domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
> > (XEN)    [<ffff82d04022963c>] F tasklet.c#do_tasklet_work+0x76/0xac
> > (XEN)    [<ffff82d040229920>] F do_tasklet+0x58/0x8a
> > (XEN)    [<ffff82d0402e6607>] F domain.c#idle_loop+0x74/0xdd
> > (XEN) 
> > (XEN) Pagetable walk from 0000000000000028:
> > (XEN)  L4[0x000] = 000000025dce1063 ffffffffffffffff
> > (XEN)  L3[0x000] = 000000025dce0063 ffffffffffffffff
> > (XEN)  L2[0x000] = 000000025dcdf063 ffffffffffffffff
> > (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> > (XEN) 
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0000]
> > (XEN) Faulting linear address: 0000000000000028
> > (XEN) ****************************************
> > 
> > This is after adding brutal `this_cpu(xcr0) = 0` in xstate_init().
> 
> And presumably again only with "smt=0"? 

Yes. With smt=1 suspend works fine on this particular machine. At least
with only dom0 running (haven't tried with any domU yet)...

> In any event, for us to not mix
> things, may I ask that you start a new thread for this further issue?

Sure.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread