All of lore.kernel.org
 help / color / mirror / Atom feed
* S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
@ 2021-08-17 14:50 Marek Marczykowski-Górecki
  2021-08-17 15:15 ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-17 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 6957 bytes --]

On Tue, Aug 17, 2021 at 04:04:24PM +0200, Jan Beulich wrote:
> On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
> > On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
> >> On 17/08/2021 14:21, Jan Beulich wrote:
> >>> On 17.08.2021 15:06, Andrew Cooper wrote:
> >>>> Perhaps we want the cpu_down() logic to explicitly invalidate their
> >>>> lazily cached values?
> >>> I'd rather do this on the cpu_up() path (no point clobbering what may
> >>> get further clobbered, and then perhaps not to a value of our liking),
> >>> yet then we can really avoid doing this from a notifier and instead do
> >>> it early enough in xstate_init() (taking care of XSS at the same time).
> > 
> > Funny you mention notifiers. Apparently cpufreq driver does use it to
> > initialize things. And fails to do so:
> > 
> > (XEN) Finishing wakeup from ACPI S3 state.
> > (XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> > (XEN) Enabling non-boot CPUs  ...
> > (XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> > (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
> > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 0000000000000001
> > (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: ffff83025dcc0000
> > (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8:  ffff83020511a820
> > (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 8000000000000000
> > (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 0000000000000001
> > (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000003526e0
> > (XEN) cr3: 0000000049656000   cr2: 0000000000000028
> > (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen code around <ffff82d04024ad2b> (vcpu_runstate_get+0x153/0x244):
> > (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d dd 64 32 00
> > (XEN) Xen stack trace from rsp=ffff830049667c10:
> > (XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 ffff830205118c60
> > (XEN)    0000000000000001 0000000000000010 ffff830049667c80 ffff82d04024ae73
> > (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 ffff82d0402560a9
> > (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 ffff83025dc7a6f0
> > (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 00000001402e9f74
> > (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 000000000025dc81
> > (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 ffff830049667d28
> > (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 ffff82d0402283a4
> > (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 0000000000000004
> > (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e ffff830049667db0
> > (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 0000000000000001
> > (XEN)    0000000000000000 ffff830049667e00 0000000000000001 ffff82d04058a5c0
> > (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 ffff830049667df0
> > (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 0000000000000010
> > (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 0000000000000000
> > (XEN)    0000000000000003 0000000000000200 ffff830049667e50 ffff82d040270bac
> > (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 0000000000000000
> > (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 ffff82d0405701b0
> > (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 ffff830049667eb8
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244

This is xen/common/sched/core.c:322. get_sched_res(v->processor) is
NULL at this point for CPU1.

The only place that can calls set_sched_res() and doesn't expect the
previous value to be valid, is cpu_schedule_up(). For non-BSP its called
_only_ from notifier at CPU_UP_PREPARE (cpu_schedule_callback()), but
that notifier explicitly exclude suspend/resume case:

    static int cpu_schedule_callback(
        struct notifier_block *nfb, unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;
        int rc = 0;

        /*
         * All scheduler related suspend/resume handling needed is done in
         * cpupool.c.
         */
        if ( system_state > SYS_STATE_active )
            return NOTIFY_DONE;

But, nothing in cpupool.c is calling into set_sched_res().

On the other hand, sched_rm_cpu() (which I believe is called as part of
parking the CPU) calls cpu_schedule_down(), which then calls
set_sched_res(cpu, NULL).

In short: scheduler for parked CPUs is not re-initialized during resume.
But cpufreq expects it to be...


> > (XEN)    [<ffff82d04024ae73>] F get_cpu_idle_time+0x57/0x59
> > (XEN)    [<ffff82d0402560a9>] F cpufreq_statistic_init+0x191/0x210
> > (XEN)    [<ffff82d040254cb1>] F cpufreq_add_cpu+0x3cc/0x5bb
> > (XEN)    [<ffff82d04025510e>] F cpufreq.c#cpu_callback+0x27/0x32
> > (XEN)    [<ffff82d040221ba4>] F notifier_call_chain+0x6c/0x96
> > (XEN)    [<ffff82d040203867>] F cpu.c#cpu_notifier_call_chain+0x1b/0x36
> > (XEN)    [<ffff82d040203c51>] F cpu_up+0xaf/0xc8
> > (XEN)    [<ffff82d040203e26>] F enable_nonboot_cpus+0x6b/0x1f8
> > (XEN)    [<ffff82d040270bac>] F power.c#enter_state_helper+0x152/0x60a
> > (XEN)    [<ffff82d0402056aa>] F domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
> > (XEN)    [<ffff82d04022963c>] F tasklet.c#do_tasklet_work+0x76/0xac
> > (XEN)    [<ffff82d040229920>] F do_tasklet+0x58/0x8a
> > (XEN)    [<ffff82d0402e6607>] F domain.c#idle_loop+0x74/0xdd
> > (XEN) 
> > (XEN) Pagetable walk from 0000000000000028:
> > (XEN)  L4[0x000] = 000000025dce1063 ffffffffffffffff
> > (XEN)  L3[0x000] = 000000025dce0063 ffffffffffffffff
> > (XEN)  L2[0x000] = 000000025dcdf063 ffffffffffffffff
> > (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> > (XEN) 
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0000]
> > (XEN) Faulting linear address: 0000000000000028
> > (XEN) ****************************************
> > 
> > This is after adding brutal `this_cpu(xcr0) = 0` in xstate_init().
> 
> And presumably again only with "smt=0"? 

Yes. With smt=1 suspend works fine on this particular machine. At least
with only dom0 running (haven't tried with any domU yet)...

> In any event, for us to not mix
> things, may I ask that you start a new thread for this further issue?

Sure.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
  2021-08-17 14:50 S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get Marek Marczykowski-Górecki
@ 2021-08-17 15:15 ` Juergen Gross
  2021-08-18  6:24   ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Gross @ 2021-08-17 15:15 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Jan Beulich; +Cc: xen-devel, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 5221 bytes --]

On 17.08.21 16:50, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 17, 2021 at 04:04:24PM +0200, Jan Beulich wrote:
>> On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
>>> On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
>>>> On 17/08/2021 14:21, Jan Beulich wrote:
>>>>> On 17.08.2021 15:06, Andrew Cooper wrote:
>>>>>> Perhaps we want the cpu_down() logic to explicitly invalidate their
>>>>>> lazily cached values?
>>>>> I'd rather do this on the cpu_up() path (no point clobbering what may
>>>>> get further clobbered, and then perhaps not to a value of our liking),
>>>>> yet then we can really avoid doing this from a notifier and instead do
>>>>> it early enough in xstate_init() (taking care of XSS at the same time).
>>>
>>> Funny you mention notifiers. Apparently cpufreq driver does use it to
>>> initialize things. And fails to do so:
>>>
>>> (XEN) Finishing wakeup from ACPI S3 state.
>>> (XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
>>> (XEN) Enabling non-boot CPUs  ...
>>> (XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
>>> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>> (XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 0000000000000001
>>> (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: ffff83025dcc0000
>>> (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8:  ffff83020511a820
>>> (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 8000000000000000
>>> (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 0000000000000001
>>> (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000003526e0
>>> (XEN) cr3: 0000000049656000   cr2: 0000000000000028
>>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen code around <ffff82d04024ad2b> (vcpu_runstate_get+0x153/0x244):
>>> (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d dd 64 32 00
>>> (XEN) Xen stack trace from rsp=ffff830049667c10:
>>> (XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 ffff830205118c60
>>> (XEN)    0000000000000001 0000000000000010 ffff830049667c80 ffff82d04024ae73
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 ffff82d0402560a9
>>> (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 ffff83025dc7a6f0
>>> (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 00000001402e9f74
>>> (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 000000000025dc81
>>> (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 ffff830049667d28
>>> (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 ffff82d0402283a4
>>> (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 0000000000000004
>>> (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e ffff830049667db0
>>> (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 0000000000000001
>>> (XEN)    0000000000000000 ffff830049667e00 0000000000000001 ffff82d04058a5c0
>>> (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 ffff830049667df0
>>> (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 0000000000000010
>>> (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 0000000000000000
>>> (XEN)    0000000000000003 0000000000000200 ffff830049667e50 ffff82d040270bac
>>> (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 0000000000000000
>>> (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 ffff82d0405701b0
>>> (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 ffff830049667eb8
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244
> 
> This is xen/common/sched/core.c:322. get_sched_res(v->processor) is
> NULL at this point for CPU1.
> 
> The only place that can calls set_sched_res() and doesn't expect the
> previous value to be valid, is cpu_schedule_up(). For non-BSP its called
> _only_ from notifier at CPU_UP_PREPARE (cpu_schedule_callback()), but
> that notifier explicitly exclude suspend/resume case:
> 
>      static int cpu_schedule_callback(
>          struct notifier_block *nfb, unsigned long action, void *hcpu)
>      {
>          unsigned int cpu = (unsigned long)hcpu;
>          int rc = 0;
> 
>          /*
>           * All scheduler related suspend/resume handling needed is done in
>           * cpupool.c.
>           */
>          if ( system_state > SYS_STATE_active )
>              return NOTIFY_DONE;
> 
> But, nothing in cpupool.c is calling into set_sched_res().
> 
> On the other hand, sched_rm_cpu() (which I believe is called as part of
> parking the CPU) calls cpu_schedule_down(), which then calls
> set_sched_res(cpu, NULL).
> 
> In short: scheduler for parked CPUs is not re-initialized during resume.
> But cpufreq expects it to be...

I'll be looking into that.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
  2021-08-17 15:15 ` Juergen Gross
@ 2021-08-18  6:24   ` Juergen Gross
  2021-08-18  6:41     ` Jan Beulich
  2021-08-18 10:12     ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 6+ messages in thread
From: Juergen Gross @ 2021-08-18  6:24 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Jan Beulich; +Cc: xen-devel, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 5843 bytes --]

Marek,

On 17.08.21 17:15, Juergen Gross wrote:
> On 17.08.21 16:50, Marek Marczykowski-Górecki wrote:
>> On Tue, Aug 17, 2021 at 04:04:24PM +0200, Jan Beulich wrote:
>>> On 17.08.2021 15:48, Marek Marczykowski-Górecki wrote:
>>>> On Tue, Aug 17, 2021 at 02:29:20PM +0100, Andrew Cooper wrote:
>>>>> On 17/08/2021 14:21, Jan Beulich wrote:
>>>>>> On 17.08.2021 15:06, Andrew Cooper wrote:
>>>>>>> Perhaps we want the cpu_down() logic to explicitly invalidate their
>>>>>>> lazily cached values?
>>>>>> I'd rather do this on the cpu_up() path (no point clobbering what may
>>>>>> get further clobbered, and then perhaps not to a value of our 
>>>>>> liking),
>>>>>> yet then we can really avoid doing this from a notifier and 
>>>>>> instead do
>>>>>> it early enough in xstate_init() (taking care of XSS at the same 
>>>>>> time).
>>>>
>>>> Funny you mention notifiers. Apparently cpufreq driver does use it to
>>>> initialize things. And fails to do so:
>>>>
>>>> (XEN) Finishing wakeup from ACPI S3 state.
>>>> (XEN) CPU0: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
>>>> (XEN) Enabling non-boot CPUs  ...
>>>> (XEN) CPU1: xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
>>>> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82d04024ad2b>] vcpu_runstate_get+0x153/0x244
>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000   rbx: ffff830049667c50   rcx: 
>>>> 0000000000000001
>>>> (XEN) rdx: 000000321d74d000   rsi: ffff830049667c50   rdi: 
>>>> ffff83025dcc0000
>>>> (XEN) rbp: ffff830049667c40   rsp: ffff830049667c10   r8:  
>>>> ffff83020511a820
>>>> (XEN) r9:  ffff82d04057ef78   r10: 0180000000000000   r11: 
>>>> 8000000000000000
>>>> (XEN) r12: ffff83025dcc0000   r13: ffff830205118c60   r14: 
>>>> 0000000000000001
>>>> (XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 
>>>> 00000000003526e0
>>>> (XEN) cr3: 0000000049656000   cr2: 0000000000000028
>>>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 
>>>> 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen code around <ffff82d04024ad2b> 
>>>> (vcpu_runstate_get+0x153/0x244):
>>>> (XEN)  48 8b 14 ca 48 8b 04 02 <4c> 8b 70 28 e9 01 ff ff ff 4c 8d 3d 
>>>> dd 64 32 00
>>>> (XEN) Xen stack trace from rsp=ffff830049667c10:
>>>> (XEN)    0000000000000180 ffff83025dcbd410 ffff83020511bf30 
>>>> ffff830205118c60
>>>> (XEN)    0000000000000001 0000000000000010 ffff830049667c80 
>>>> ffff82d04024ae73
>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 
>>>> 0000000000000000
>>>> (XEN)    0000000000000000 0000000000000000 ffff830049667cb8 
>>>> ffff82d0402560a9
>>>> (XEN)    ffff830205118320 0000000000000001 ffff83020511bf30 
>>>> ffff83025dc7a6f0
>>>> (XEN)    0000000000000000 ffff830049667d58 ffff82d040254cb1 
>>>> 00000001402e9f74
>>>> (XEN)    0000000000000000 ffff830049667d10 ffff82d040224eda 
>>>> 000000000025dc81
>>>> (XEN)    000000321d74d000 ffff82d040571278 0000000000000001 
>>>> ffff830049667d28
>>>> (XEN)    ffff82d040228b44 ffff82d0403102cf 0000000000000000 
>>>> ffff82d0402283a4
>>>> (XEN)    ffff82d040459688 ffff82d040459680 ffff82d040459240 
>>>> 0000000000000004
>>>> (XEN)    0000000000000000 ffff830049667d68 ffff82d04025510e 
>>>> ffff830049667db0
>>>> (XEN)    ffff82d040221ba4 0000000000000000 0000000000000001 
>>>> 0000000000000001
>>>> (XEN)    0000000000000000 ffff830049667e00 0000000000000001 
>>>> ffff82d04058a5c0
>>>> (XEN)    ffff830049667dc8 ffff82d040203867 0000000000000001 
>>>> ffff830049667df0
>>>> (XEN)    ffff82d040203c51 ffff82d040459400 0000000000000001 
>>>> 0000000000000010
>>>> (XEN)    ffff830049667e20 ffff82d040203e26 ffff830049667ef8 
>>>> 0000000000000000
>>>> (XEN)    0000000000000003 0000000000000200 ffff830049667e50 
>>>> ffff82d040270bac
>>>> (XEN)    ffff83020116a640 ffff830258ff6000 0000000000000000 
>>>> 0000000000000000
>>>> (XEN)    ffff830049667e70 ffff82d0402056aa ffff830258ff61b8 
>>>> ffff82d0405701b0
>>>> (XEN)    ffff830049667e88 ffff82d04022963c ffff82d0405701a0 
>>>> ffff830049667eb8
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d04024ad2b>] R vcpu_runstate_get+0x153/0x244
>>
>> This is xen/common/sched/core.c:322. get_sched_res(v->processor) is
>> NULL at this point for CPU1.
>>
>> The only place that can calls set_sched_res() and doesn't expect the
>> previous value to be valid, is cpu_schedule_up(). For non-BSP its called
>> _only_ from notifier at CPU_UP_PREPARE (cpu_schedule_callback()), but
>> that notifier explicitly exclude suspend/resume case:
>>
>>      static int cpu_schedule_callback(
>>          struct notifier_block *nfb, unsigned long action, void *hcpu)
>>      {
>>          unsigned int cpu = (unsigned long)hcpu;
>>          int rc = 0;
>>
>>          /*
>>           * All scheduler related suspend/resume handling needed is 
>> done in
>>           * cpupool.c.
>>           */
>>          if ( system_state > SYS_STATE_active )
>>              return NOTIFY_DONE;
>>
>> But, nothing in cpupool.c is calling into set_sched_res().
>>
>> On the other hand, sched_rm_cpu() (which I believe is called as part of
>> parking the CPU) calls cpu_schedule_down(), which then calls
>> set_sched_res(cpu, NULL).
>>
>> In short: scheduler for parked CPUs is not re-initialized during resume.
>> But cpufreq expects it to be...
> 
> I'll be looking into that.

Could you please try the attached patch?


Juergen


[-- Attachment #1.1.2: 0001-xen-sched-fix-get_cpu_idle_time-for-smt-0-suspend-re.patch --]
[-- Type: text/x-patch, Size: 1846 bytes --]

From e38d0816a33fbaa3c0c45bcd6e9d433b1e021222 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Dario Faggioli <dfaggioli@suse.com>
Date: Wed, 18 Aug 2021 08:18:20 +0200
Subject: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
An alternative way to fix the issue would be to keep the sched_resource
of offline cpus allocated like we already do with idle vcpus and units.
This fix would be more intrusive, but it would avoid similar other bugs
like this one.
---
 xen/common/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 6d34764d38..9ac1b01ca8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
     struct vcpu_runstate_info state = { 0 };
     const struct vcpu *v = idle_vcpu[cpu];
 
-    if ( cpu_online(cpu) && v )
+    if ( cpu_online(cpu) && v && get_sched_res(cpu) )
         vcpu_runstate_get(v, &state);
 
     return state.time[RUNSTATE_running];
-- 
2.26.2


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
  2021-08-18  6:24   ` Juergen Gross
@ 2021-08-18  6:41     ` Jan Beulich
  2021-08-18  6:52       ` Juergen Gross
  2021-08-18 10:12     ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2021-08-18  6:41 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Andrew Cooper, Marek Marczykowski-Górecki

On 18.08.2021 08:24, Juergen Gross wrote:
> Could you please try the attached patch?

Seeing the patch, two questions:

1) Can idle_vcpu[cpu] ever be NULL when cpu_online(cpu) returned true?

2) Seeing get_sched_res()'es access to per-CPU data, would it make sense
to move the cpu_online() check into there? While I guess the majority of
users are guaranteed to invoke it for online CPUs, I wonder if there
aren't any further uses on the CPU bringup / teardown code paths.

Jan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
  2021-08-18  6:41     ` Jan Beulich
@ 2021-08-18  6:52       ` Juergen Gross
  0 siblings, 0 replies; 6+ messages in thread
From: Juergen Gross @ 2021-08-18  6:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Marek Marczykowski-Górecki


[-- Attachment #1.1.1: Type: text/plain, Size: 974 bytes --]

On 18.08.21 08:41, Jan Beulich wrote:
> On 18.08.2021 08:24, Juergen Gross wrote:
>> Could you please try the attached patch?
> 
> Seeing the patch, two questions:
> 
> 1) Can idle_vcpu[cpu] ever be NULL when cpu_online(cpu) returned true?

No.

> 2) Seeing get_sched_res()'es access to per-CPU data, would it make sense
> to move the cpu_online() check into there? While I guess the majority of
> users are guaranteed to invoke it for online CPUs, I wonder if there
> aren't any further uses on the CPU bringup / teardown code paths.

As get_sched_res() is private to the scheduler sources, this would imply
a scheduler function being used for an offline cpu, which is rather
unlikely. Especially as get_cpu_idle_time() is the only official
scheduler function taking a physical cpu number as parameter. All other
functions only work with vcpus, and this would require to wake, pause,
... an idle vcpu of an offline cpu to hit the problem.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get
  2021-08-18  6:24   ` Juergen Gross
  2021-08-18  6:41     ` Jan Beulich
@ 2021-08-18 10:12     ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 6+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-18 10:12 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Jan Beulich, xen-devel, Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 2571 bytes --]

On Wed, Aug 18, 2021 at 08:24:39AM +0200, Juergen Gross wrote:
> Marek,
> 
> On 17.08.21 17:15, Juergen Gross wrote:
> > On 17.08.21 16:50, Marek Marczykowski-Górecki wrote:
> > I'll be looking into that.
> 
> Could you please try the attached patch?

It works, thanks!

So, the only other suspend-related issue I'm aware of, is:
https://lore.kernel.org/xen-devel/20210131021526.GB6354@mail-itl/

But I don't have reliable reproducer for that one...


> From e38d0816a33fbaa3c0c45bcd6e9d433b1e021222 Mon Sep 17 00:00:00 2001
> From: Juergen Gross <jgross@suse.com>
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Dario Faggioli <dfaggioli@suse.com>
> Date: Wed, 18 Aug 2021 08:18:20 +0200
> Subject: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> With smt=0 during a suspend/resume cycle of the machine the threads
> which have been parked before will briefly come up again. This can
> result in problems e.g. with cpufreq driver being active as this will
> call into get_cpu_idle_time() for a cpu without initialized scheduler
> data.
> 
> Fix that by letting get_cpu_idle_time() deal with this case.
> 
> Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

> ---
> An alternative way to fix the issue would be to keep the sched_resource
> of offline cpus allocated like we already do with idle vcpus and units.
> This fix would be more intrusive, but it would avoid similar other bugs
> like this one.
> ---
>  xen/common/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 6d34764d38..9ac1b01ca8 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
>      struct vcpu_runstate_info state = { 0 };
>      const struct vcpu *v = idle_vcpu[cpu];
>  
> -    if ( cpu_online(cpu) && v )
> +    if ( cpu_online(cpu) && v && get_sched_res(cpu) )
>          vcpu_runstate_get(v, &state);
>  
>      return state.time[RUNSTATE_running];
> -- 
> 2.26.2
> 






-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-18 10:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17 14:50 S3 resume issue in cpufreq -> get_cpu_idle_time->vcpu_runstate_get Marek Marczykowski-Górecki
2021-08-17 15:15 ` Juergen Gross
2021-08-18  6:24   ` Juergen Gross
2021-08-18  6:41     ` Jan Beulich
2021-08-18  6:52       ` Juergen Gross
2021-08-18 10:12     ` Marek Marczykowski-Górecki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.