On Tue, Sep 06, 2022 at 01:46:55PM +0200, Juergen Gross wrote:
> On 22.08.22 17:34, Juergen Gross wrote:
> > On 21.08.22 18:14, Marek Marczykowski-Górecki wrote:
> > > On Sat, Oct 09, 2021 at 06:28:17PM +0200, Marek Marczykowski-Górecki wrote:
> > > > On Sun, Jan 31, 2021 at 03:15:30AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > On Tue, Sep 29, 2020 at 05:27:48PM +0200, Jürgen Groß wrote:
> > > > > > On 29.09.20 17:16, Marek Marczykowski-Górecki wrote:
> > > > > > > On Tue, Sep 29, 2020 at 05:07:11PM +0200, Jürgen Groß wrote:
> > > > > > > > On 29.09.20 16:27, Marek Marczykowski-Górecki wrote:
> > > > > > > > > On Mon, Mar 23, 2020 at 01:09:49AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > On Thu, Mar 19, 2020 at 01:28:10AM +0100, Dario Faggioli wrote:
> > > > > > > > > > > [Adding Juergen]
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, 2020-03-18 at 23:10 +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > On Wed, Mar 18, 2020 at 02:50:52PM +0000, Andrew Cooper wrote:
> > > > > > > > > > > > > On 18/03/2020 14:16, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > In my test setup (inside KVM with nested virt enabled), I rather
> > > > > > > > > > > > > > frequently get a Xen crash on resume from S3. Full message below.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This is Xen 4.13.0, with some patches, including "sched: fix
> > > > > > > > > > > > > > resuming from S3 with smt=0".
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Contrary to the previous issue, this one does not happen always -
> > > > > > > > > > > > > > I would say in about 40% of cases on this setup, but very rarely
> > > > > > > > > > > > > > on a physical setup.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This is _without_ core scheduling enabled, and also with smt=off.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Do you think it would be any different on xen-unstable? I can
> > > > > > > > > > > > > > try, but it isn't trivial in this setup, so I'd ask first.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Well, Juergen has fixed quite a few issues.
> > > > > > > > > > > 
> > > > > > > > > > > Most of them were triggering with core-scheduling enabled, and I don't
> > > > > > > > > > > recall any of them which looked similar or related to this.
> > > > > > > > > > > 
> > > > > > > > > > > Still, it's possible that the same issue causes different symptoms, and
> > > > > > > > > > > hence that maybe one of the patches would fix this too.
> > > > > > > > > > 
> > > > > > > > > > I've tested on master (d094e95fb7c), and reproduced exactly the same crash
> > > > > > > > > > (pasted below for completeness). But there is more: additionally, in most
> > > > > > > > > > (all?) cases after resume I've got a soft lockup in Linux dom0 in
> > > > > > > > > > smp_call_function_single() - see below. It didn't happen before, and the
> > > > > > > > > > only change was Xen 4.13 -> master.
> > > > > > > > > > 
> > > > > > > > > > Xen crash:
> > > > > > > > > > 
> > > > > > > > > > (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2133
> > > > > > > > > 
> > > > > > > > > Juergen, any idea about this one? This is also happening on the current
> > > > > > > > > stable-4.14 (28855ebcdbfa).
> > > > > > > > Oh, sorry I didn't come back to this issue.
> > > > > > > > 
> > > > > > > > I suspect this is related to stop_machine_run() being called during
> > > > > > > > suspend(), as I'm seeing very sporadic issues when offlining and then
> > > > > > > > onlining cpus with core scheduling being active (it seems as if the
> > > > > > > > dom0 vcpu doing the cpu online activity sometimes is using an old
> > > > > > > > vcpu state).
> > > > > > > 
> > > > > > > Note this is a default Xen 4.14 start, so core scheduling is _not_ active:
> > > > > > 
> > > > > > The similarity in the two failure cases is that multiple cpus are
> > > > > > affected by the operations during stop_machine_run().
> > > > > > 
> > > > > > >       (XEN) Brought up 2 CPUs
> > > > > > >       (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> > > > > > >       (XEN) Adding cpu 0 to runqueue 0
> > > > > > >       (XEN)  First cpu on runqueue, activating
> > > > > > >       (XEN) Adding cpu 1 to runqueue 1
> > > > > > >       (XEN)  First cpu on runqueue, activating
> > > > > > 
> > > > > > > > I wasn't able to catch the real problem despite having tried lots
> > > > > > > > of approaches using debug patches.
> > > > > > > > 
> > > > > > > > Recently I suspected the whole problem could be somehow related to
> > > > > > > > RCU handling, as stop_machine_run() is relying on tasklets which are
> > > > > > > > executing in idle context, and RCU handling is done in idle context,
> > > > > > > > too. So there might be some kind of use-after-free scenario in case
> > > > > > > > some memory is freed via RCU despite it still being used by a tasklet.
> > > > > > > 
> > > > > > > That sounds plausible, even though I don't really know this area of Xen.
> > > > > > > 
> > > > > > > > I "just" need to find some time to verify this suspicion. Any help
> > > > > > > > doing this would be appreciated. :-)
> > > > > > > 
> > > > > > > I do have a setup where I can easily-ish reproduce the issue. If there
> > > > > > > is some debug patch you'd like me to try, I can do that.
> > > > > > 
> > > > > > Thanks. I might come back to that offer, as you are seeing a crash which
> > > > > > will be much easier to analyze. Catching my error case is much harder, as
> > > > > > it surfaces some time after the real problem in a non-destructive way
> > > > > > (usually I'm seeing a failure to load a library in the program which
> > > > > > just did its job via exactly the library claiming not to be loadable).
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > I'm resurrecting this thread as it was recently mentioned elsewhere. I
> > > > > can still reproduce the issue on the recent staging branch (9dc687f155).
> > > > > 
> > > > > It fails after the first resume (not always, but frequently enough to
> > > > > debug it). At least one guest needs to be running - with just (PV) dom0
> > > > > the crash doesn't happen (at least for the ~8 times in a row I tried).
> > > > > If the first resume works, the second will (almost?) always fail, but
> > > > > with different symptoms - dom0 kernel lockups (of at least some of its
> > > > > vcpus). I haven't debugged this one at all yet.
> > > > > 
> > > > > Any help will be appreciated; I can apply some debug patches, change
> > > > > configuration, etc.
> > > > 
> > > > This still happens on 4.14.3. Maybe it is related to freeing percpu
> > > > areas, as it caused other issues with suspend too? Just a thought...
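To make the use-after-free suspicion quoted above concrete: stop_machine_run() drives its work through tasklets that execute in idle-vcpu context, and RCU callbacks are also invoked from idle context, so an object freed via call_rcu() could in principle be reclaimed while a tasklet still holds a pointer into it. A minimal sketch of that hazard follows; struct work, retire_work(), tasklet_body() and process() are made-up names for illustration, not Xen APIs - only call_rcu(), container_of() and xfree() are real:

    /* Illustration only: the suspected RCU-vs-tasklet ordering hazard. */
    struct work {
        struct rcu_head rcu;        /* for deferred freeing */
        int payload;
    };

    static struct work *shared;     /* also reachable from a tasklet */

    static void rcu_free_work(struct rcu_head *head)
    {
        /* Runs from idle context once a grace period has elapsed. */
        xfree(container_of(head, struct work, rcu));
    }

    static void retire_work(struct work *w)
    {
        shared = NULL;
        call_rcu(&w->rcu, rcu_free_work);   /* schedule deferred free */
    }

    /* Tasklet body; tasklets likewise run from idle context. */
    static void tasklet_body(struct work *w)
    {
        /*
         * If the grace period is treated as complete while this tasklet
         * is still running - conceivable when both share the idle
         * context - rcu_free_work() may already have freed *w here:
         */
        process(w->payload);            /* potential use after free */
    }

If that is what happens, the symptom would look exactly as described: corruption that surfaces some time after the real problem, in whatever code happens to reuse the freed memory.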
> > > I have reproduced this on current staging(*). And I can reproduce it
> > > reliably. And also, I got (I believe) a closely related crash with the
> > > credit1 scheduler.
> > > 
> > > (*) It isn't plain staging, it's one with my xhci console patches on
> > > top, including an attempt to make it survive S3. I believe the only
> > > relevant part there is sticking set_timer() into the console resume path
> > > (or just having a timer with a rather short delay registered). The actual
> > > tree is at https://github.com/marmarek/xen/tree/master-xue2-debug,
> > > including quite a lot of debug prints and debug hacks.
> > > 
> > > Specific crash with credit2:
> > > 
> > >      (XEN) Assertion 'sched_unit_master(currunit) == cpu' failed at common/sched/credit.c:928
> > >      (XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
> > >      (XEN) CPU:    0
> > >      (XEN) RIP:    e008:[] credit.c#csched_tick+0x2d4/0x494
> > >      (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v4)
> > >      (XEN) rax: ffff82d0405c4298   rbx: 0000000000000002   rcx: 0000000000000002
> > >      (XEN) rdx: ffff8302517f64d0   rsi: ffff8302515c0fc0   rdi: 0000000000000002
> > >      (XEN) rbp: ffff830256227e38   rsp: ffff830256227de0   r8:  0000000000000004
> > >      (XEN) r9:  ffff8302517ac820   r10: ffff830251745068   r11: 00000088cb734887
> > >      (XEN) r12: ffff83025174de50   r13: ffff8302515c0fa0   r14: ffff83025174df40
> > >      (XEN) r15: ffff8302515c0cc0   cr0: 0000000080050033   cr4: 0000000000372660
> > >      (XEN) cr3: 00000001bacbd000   cr2: 000077e5ec02a318
> > >      (XEN) fsb: 000077e5fe533700   gsb: ffff888255700000   gss: 0000000000000000
> > >      (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > >      (XEN) Xen code around (credit.c#csched_tick+0x2d4/0x494):
> > >      (XEN)  01 00 00 e9 2a 01 00 00 <0f> 0b 0f 0b 0f 0b 48 8b 41 20 0f b7 00 89 45 cc
> > >      (XEN) Xen stack trace from rsp=ffff830256227de0:
> > >      (XEN)    ffff830256227fff 0000000000000000 0000000256227e10 ffff82d04035be90
> > >      (XEN)    ffff830256227ef8 ffff830251745000 ffff82d0405c3280 ffff82d0402431eb
> > >      (XEN)    0000000000000002 00000088c9ba9534 0000000000000000 ffff830256227e60
> > >      (XEN)    ffff82d04022ee53 ffff82d0405c3280 ffff8302963e1320 ffff8302515c0fc0
> > >      (XEN)    ffff830256227ea0 ffff82d04022f73f ffff830256227e80 ffff82d0405c9f00
> > >      (XEN)    ffffffffffffffff ffff82d0405c9f00 ffff830256227fff 0000000000000000
> > >      (XEN)    ffff830256227ed8 ffff82d04022d26c ffff830251745000 0000000000000000
> > >      (XEN)    0000000000000000 ffff830256227fff 0000000000000000 ffff830256227ee8
> > >      (XEN)    ffff82d04022d2ff 00007cfda9dd80e7 ffff82d0402f03c6 ffff88810c005c00
> > >      (XEN)    0000000000000031 0000000000000100 00000000fffffe00 0000000000000031
> > >      (XEN)    0000000000000031 ffffffff82d45d28 0000000000000e2e 0000000000000000
> > >      (XEN)    0000000000000032 00000000ffffef31 0000000000000000 ffff88812244a700
> > >      (XEN)    0000000000000005 ffff88812244a780 000000fa00000000 ffffffff818db55f
> > >      (XEN)    000000000000e033 0000000000000246 ffffc900409b7c50 000000000000e02b
> > >      (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >      (XEN)    0000e01000000000 ffff830251745000 0000000000000000 0000000000372660
> > >      (XEN)    0000000000000000 800000025620b002 000e030300000001 0000000000000000
> > >      (XEN) Xen call trace:
> > >      (XEN)    [] R credit.c#csched_tick+0x2d4/0x494
> > >      (XEN)    [] F timer.c#execute_timer+0x45/0x5c
> > >      (XEN)    [] F timer.c#timer_softirq_action+0x71/0x278
> > >      (XEN)    [] F softirq.c#__do_softirq+0x94/0xbe
> > >      (XEN)    [] F do_softirq+0x13/0x15
> > >      (XEN)    [] F x86_64/entry.S#process_softirqs+0x6/0x20
> > >      (XEN)
> > >      (XEN)
> > >      (XEN) ****************************************
> > >      (XEN) Panic on CPU 0:
> > >      (XEN) Assertion 'sched_unit_master(currunit) == cpu' failed at common/sched/credit.c:928
> > >      (XEN) ****************************************
> > > 
> > > Specific crash with credit1:
> > > 
> > >      (XEN) ASSERT FAIL! cpu 4, sched_unit_master 0, currunit id 4
> > >      (XEN) Assertion 'sched_unit_master(currunit) == cpu' failed at common/sched/credit.c:936
> > >      (XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
> > >      (XEN) CPU:    0
> > >      (XEN) RIP:    e008:[] credit.c#csched_tick+0x2f7/0x4b7
> > >      (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v4)
> > >      (XEN) rax: ffff8302517f64d0   rbx: 0000000000000004   rcx: 0000000000000000
> > >      (XEN) rdx: ffff830256227fff   rsi: 000000000000000a   rdi: ffff82d0404786b8
> > >      (XEN) rbp: ffff830256227e38   rsp: ffff830256227de0   r8:  0000000000000087
> > >      (XEN) r9:  0000000000000001   r10: 0000000000000001   r11: 0000000000000004
> > >      (XEN) r12: ffff83025174de50   r13: ffff830251778100   r14: ffff83025174df40
> > >      (XEN) r15: ffff8302515c0cc0   cr0: 0000000080050033   cr4: 0000000000372660
> > >      (XEN) cr3: 000000024ac10000   cr2: 0000000000000000
> > >      (XEN) fsb: 0000000000000000   gsb: ffff888255700000   gss: 0000000000000000
> > >      (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > >      (XEN) Xen code around (credit.c#csched_tick+0x2f7/0x4b7):
> > >      (XEN)  58 70 0f 84 bf fe ff ff <0f> 0b 0f 0b 0f 0b 48 8b 40 20 0f b7 00 89 45 cc
> > >      (XEN) Xen stack trace from rsp=ffff830256227de0:
> > >      (XEN)    000000000046692a ffff8302515c0bc0 000000044025ae27 002191c0517f64e8
> > >      (XEN)    ffff82d0405c33e0 ffff8302517c8590 ffff82d0405c3280 ffff82d040243265
> > >      (XEN)    0000000000000004 00000026d27a80a7 0000000000000000 ffff830256227e60
> > >      (XEN)    ffff82d04022ee94 ffff82d0405c3280 ffff8302517e7df0 ffff830251778120
> > >      (XEN)    ffff830256227ea0 ffff82d04022f77a ffff830256227e80 ffff82d0405c9f00
> > >      (XEN)    ffffffffffffffff ffff82d0405c9f00 ffff830256227fff 0000000000000000
> > >      (XEN)    ffff830256227ed8 ffff82d04022d26c ffff830251745000 0000000000000000
> > >      (XEN)    0000000000000000 ffff830256227fff 0000000000000000 ffff830256227ee8
> > >      (XEN)    ffff82d04022d2ff 00007cfda9dd80e7 ffff82d0402f0496 0000000000000001
> > >      (XEN)    ffffc90040127ddc ffffc90040127dd0 ffffc90040127dd4 ffffc90040127dd8
> > >      (XEN)    00000000756e6547 0000000000000000 0000000000000000 ffffc90040127ddc
> > >      (XEN)    ffffc90040127dd0 000000000000000d 000000006c65746e 0000000049656e69
> > >      (XEN)    ffffc90040127dd4 ffffc90040127dd8 000000fa00000000 ffffffff8101c5a4
> > >      (XEN)    000000000000e033 0000000000000293 ffffc90040127db8 000000000000e02b
> > >      (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > >      (XEN)    0000e01000000000 ffff830251745000 0000000000000000 0000000000372660
> > >      (XEN)    0000000000000000 800000025620b002 000e030300000001 0000000000000000
> > >      (XEN) Xen call trace:
> > >      (XEN)    [] R credit.c#csched_tick+0x2f7/0x4b7
> > >      (XEN)    [] F timer.c#execute_timer+0x45/0x5c
> > >      (XEN)    [] F timer.c#timer_softirq_action+0x71/0x278
> > >      (XEN)    [] F softirq.c#__do_softirq+0x94/0xbe
> > >      (XEN)    [] F do_softirq+0x13/0x15
> > >      (XEN)    [] F x86_64/entry.S#process_softirqs+0x6/0x20
> > >      (XEN)
> > >      (XEN) ****************************************
> > >      (XEN) Panic on CPU 0:
> > >      (XEN) Assertion 'sched_unit_master(currunit) == cpu' failed at common/sched/credit.c:936
> > >      (XEN) ****************************************
> > > 
> > > I'm not sure about the nature of the credit2 crash, but for credit1 it
> > > looks much simpler to understand. Credit1 registers a timer for each
> > > pcpu (in init_pdata()). Then, the timer function
> > > (csched_tick()->csched_unit_acct()) asserts that it was really called on
> > > the correct CPU. The issue is that the suspend path migrates all timers
> > > to CPU0 (timer.c:cpu_callback->migrate_timers_from_cpu()), and the
> > > resume path doesn't migrate them back. All this while not unregistering
> > > pCPUs from the scheduler (sched_deinit_pdata() / init_pdata() is not
> > > called - confirmed via debug prints).
> > > 
> > > I'm not exactly sure if that's the same issue that applies to credit2,
> > > but I think it's likely. A naive fix could be unregistering all pCPUs
> > > from the scheduler (and registering them back on resume), but this may
> > > mess up CPU pools, and probably a few other things. Any better ideas? I
> > > can test various patches, provide more debug info, etc. - it should be
> > > rather straightforward, given I have a reliable reproducer now.
> > 
> > Unregistering the cpus from the scheduler would break core scheduling.
> > 
> > I'll have a look if the timers can just be deactivated in order not to
> > have them migrated to cpu0.
> 
> Could you test the attached patch, please?

I did a test with only dom0 running, and it works now. It isn't a
comprehensive test, but just dom0 was enough to crash it before, and it
stays working now. Thanks!

Tested-by: Marek Marczykowski-Górecki
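To illustrate the analysis above before the patch itself: credit1's tick boils down to one self-re-arming timer per pcpu whose handler expects to run on the pcpu it was registered for. A condensed sketch using Xen's timer API (init_timer()/set_timer()); the tick_timer variable, the setup_tick() wrapper and the 30ms period are illustrative stand-ins for the real init_pdata() code:

    /* Condensed sketch of the per-pcpu tick described above. */
    static DEFINE_PER_CPU(struct timer, tick_timer);

    static void cf_check csched_tick(void *_cpu)
    {
        unsigned int cpu = (unsigned long)_cpu;

        /*
         * csched_unit_acct() effectively asserts that the tick fired on
         * the pcpu it was registered for:
         *     ASSERT(sched_unit_master(currunit) == cpu);
         * After S3, migrate_timers_from_cpu() has moved this timer to
         * CPU0 while its data still names e.g. cpu 4, so the timer now
         * fires on the wrong pcpu and the assertion triggers.
         */
        ASSERT(cpu == smp_processor_id());

        /* ... accounting work ..., then re-arm on the same pcpu: */
        set_timer(&per_cpu(tick_timer, cpu), NOW() + MILLISECS(30));
    }

    /* init_pdata() sets up one such timer per pcpu, roughly like this: */
    static void setup_tick(unsigned int cpu)
    {
        init_timer(&per_cpu(tick_timer, cpu), csched_tick,
                   (void *)(unsigned long)cpu, cpu);
        set_timer(&per_cpu(tick_timer, cpu), NOW() + MILLISECS(30));
    }

With the patch quoted below, such timers simply stay on their pcpu across suspend, so the handler's assumption holds again after resume.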
> 
> Juergen
> 
> From a2c224bee53e2216d41b34518bb2d11b64d63b6a Mon Sep 17 00:00:00 2001
> From: Juergen Gross
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper
> Cc: George Dunlap
> Cc: Jan Beulich
> Cc: Julien Grall
> Cc: Stefano Stabellini
> Cc: Wei Liu
> Date: Tue, 6 Sep 2022 13:39:48 +0200
> Subject: [PATCH] xen/timer: don't migrate timers away from cpus during suspend
> 
> During a suspend/resume cycle timers on all cpus but cpu 0 will be
> migrated to cpu 0, as the other cpus are taken down.
> 
> This is problematic in case such a timer is related to a specific vcpu,
> as the vcpus are not migrated to another cpu during suspend (migrating
> them would break cpupools and core scheduling).
> 
> In order to avoid the problems just try to keep the timers on their
> cpus. Only migrate them away in case resume failed. Doing so isn't
> problematic, as any vcpu on a cpu not coming back to life would be
> migrated away, too.
> 
> Signed-off-by: Juergen Gross
> ---
>  xen/common/timer.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/common/timer.c b/xen/common/timer.c
> index 9b5016d5ed..6b5473e5f1 100644
> --- a/xen/common/timer.c
> +++ b/xen/common/timer.c
> @@ -637,6 +637,13 @@ static void free_percpu_timers(unsigned int cpu)
>          ASSERT(ts->heap == dummy_heap);
>  }
>  
> +static void deinit_timers(unsigned int cpu)
> +{
> +    migrate_timers_from_cpu(cpu);
> +    if ( !park_offline_cpus )
> +        free_percpu_timers(cpu);
> +}
> +
>  static int cf_check cpu_callback(
>      struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
> @@ -655,13 +662,14 @@ static int cf_check cpu_callback(
>          }
>          break;
>  
> -    case CPU_UP_CANCELED:
>      case CPU_DEAD:
> -    case CPU_RESUME_FAILED:
> -        migrate_timers_from_cpu(cpu);
> +        if ( system_state != SYS_STATE_suspend )
> +            deinit_timers(cpu);
> +        break;
>  
> -        if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
> -            free_percpu_timers(cpu);
> +    case CPU_UP_CANCELED:
> +    case CPU_RESUME_FAILED:
> +        deinit_timers(cpu);
>          break;
>  
>      case CPU_REMOVE:
> -- 
> 2.35.3

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
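A final note on the diagnosis: the "ASSERT FAIL! cpu 4, sched_unit_master 0, currunit id 4" line in the credit1 report above apparently comes from a debug print in Marek's debug tree placed just ahead of the failing assertion. For anyone wanting to reproduce the diagnosis rather than the fix, a sketch of such a debug aid (assuming the locals available in csched_unit_acct(); the exact names may differ):

    /* Hypothetical debug aid just ahead of the assertion: */
    if ( sched_unit_master(currunit) != cpu )
        printk("ASSERT FAIL! cpu %u, sched_unit_master %u, currunit id %u\n",
               cpu, sched_unit_master(currunit), currunit->unit_id);
    ASSERT(sched_unit_master(currunit) == cpu);

Printing the conflicting values before the ASSERT is what makes the failure mode - the timer executing on CPU0 while its data still names cpu 4 - immediately visible in the log.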