From: "Jürgen Groß" <jgross@suse.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Cc: Juergen Gross <jgross@suse.de>,
Dario Faggioli <dfaggioli@suse.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13
Date: Tue, 29 Sep 2020 17:27:48 +0200
Message-ID: <ea53b845-5edf-a61e-62ae-7ababc30b3e0@suse.com>
In-Reply-To: <20200929151627.GE1482@mail-itl>
On 29.09.20 17:16, Marek Marczykowski-Górecki wrote:
> On Tue, Sep 29, 2020 at 05:07:11PM +0200, Jürgen Groß wrote:
>> On 29.09.20 16:27, Marek Marczykowski-Górecki wrote:
>>> On Mon, Mar 23, 2020 at 01:09:49AM +0100, Marek Marczykowski-Górecki wrote:
>>>> On Thu, Mar 19, 2020 at 01:28:10AM +0100, Dario Faggioli wrote:
>>>>> [Adding Juergen]
>>>>>
>>>>> On Wed, 2020-03-18 at 23:10 +0100, Marek Marczykowski-Górecki wrote:
>>>>>> On Wed, Mar 18, 2020 at 02:50:52PM +0000, Andrew Cooper wrote:
>>>>>>> On 18/03/2020 14:16, Marek Marczykowski-Górecki wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In my test setup (inside KVM with nested virt enabled), I rather
>>>>>>>> frequently get a Xen crash on resume from S3. Full message below.
>>>>>>>>
>>>>>>>> This is Xen 4.13.0, with some patches, including "sched: fix
>>>>>>>> resuming from S3 with smt=0".
>>>>>>>>
>>>>>>>> Contrary to the previous issue, this one does not always happen -
>>>>>>>> I would say in about 40% of cases on this setup, but very rarely
>>>>>>>> on a physical setup.
>>>>>>>>
>>>>>>>> This is _without_ core scheduling enabled, and also with smt=off.
>>>>>>>>
>>>>>>>> Do you think it would be any different on xen-unstable? I can
>>>>>>>> try, but it isn't trivial in this setup, so I'd ask first.
>>>>>>>>
>>>>> Well, Juergen has fixed quite a few issues.
>>>>>
>>>>> Most of them were triggering with core-scheduling enabled, and I don't
>>>>> recall any of them which looked similar or related to this.
>>>>>
>>>>> Still, it's possible that the same issue causes different symptoms, and
>>>>> hence that maybe one of the patches would fix this too.
>>>>
>>>> I've tested on master (d094e95fb7c), and reproduced exactly the same crash
>>>> (pasted below for completeness).
>>>> But there is more: additionally, in most (all?) cases after resume I got
>>>> a soft lockup in Linux dom0 in smp_call_function_single() - see below. It
>>>> didn't happen before and the only change was Xen 4.13 -> master.
>>>>
>>>> Xen crash:
>>>>
>>>> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2133
>>>
>>> Juergen, any idea about this one? This is also happening on the current
>>> stable-4.14 (28855ebcdbfa).
>>>
>>
>> Oh, sorry I didn't come back to this issue.
>>
>> I suspect this is related to stop_machine_run() being called during
>> suspend(), as I'm seeing very sporadic issues when offlining and then
>> onlining cpus with core scheduling being active (it seems as if the
>> dom0 vcpu doing the cpu online activity sometimes is using an old
>> vcpu state).
>
> Note this is default Xen 4.14 start, so core scheduling is _not_ active:

The similarity in the two failure cases is that multiple cpus are
affected by the operations during stop_machine_run().

>
> (XEN) Brought up 2 CPUs
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Adding cpu 0 to runqueue 0
> (XEN) First cpu on runqueue, activating
> (XEN) Adding cpu 1 to runqueue 1
> (XEN) First cpu on runqueue, activating
>
>> I wasn't able to catch the real problem despite having tried lots
>> of approaches using debug patches.
>>
>> Recently I suspected the whole problem could be somehow related to
>> RCU handling, as stop_machine_run() is relying on tasklets which are
>> executing in idle context, and RCU handling is done in idle context,
>> too. So there might be some kind of use-after-free scenario in case
>> some memory is freed via RCU despite it still being used by a tasklet.
>
> That sounds plausible, even though I don't really know this area of Xen.
>
>> I "just" need to find some time to verify this suspicion. Any help doing
>> this would be appreciated. :-)
>
> I do have a setup where I can easily-ish reproduce the issue. If there
> is some debug patch you'd like me to try, I can do that.

Thanks. I might come back to that offer, as you are seeing a crash, which
will be much easier to analyze. Catching my error case is much harder, as
it surfaces some time after the real problem and in a non-destructive way
(usually I see a failure to load a library in a program which has just
done its job using exactly the library that is claimed not to be loadable).


Juergen
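
The RCU suspicion discussed in this message can be illustrated with a toy
model. This is only a minimal sketch and not Xen code: the names SchedData,
IdleCpu and simulate are hypothetical, and a "freed" flag stands in for
memory actually being released. It assumes nothing more than what the
message states, namely that both tasklets and RCU handling run in idle
context, and shows why the relative order would matter if a tasklet still
held a reference to data queued for an RCU free. Whether this ordering
actually occurs in Xen is exactly the unverified part.

```python
"""Toy model of a tasklet racing with a deferred (RCU-style) free.

Not Xen code: all names are hypothetical and "freed" is just a flag.
"""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SchedData:
    """Stand-in for data that is released through a deferred free."""
    name: str
    freed: bool = False


@dataclass
class IdleCpu:
    """Idle context that processes both deferred frees and tasklets."""
    rcu_callbacks: List[Callable[[], None]] = field(default_factory=list)
    tasklets: List[Callable[[], str]] = field(default_factory=list)

    def _run_rcu(self) -> None:
        # Run and drop all pending deferred-free callbacks.
        for callback in self.rcu_callbacks:
            callback()
        self.rcu_callbacks.clear()

    def run(self, rcu_first: bool) -> List[str]:
        """Process pending work; rcu_first selects the ordering."""
        if rcu_first:
            self._run_rcu()
        results = [tasklet() for tasklet in self.tasklets]
        self.tasklets.clear()
        if not rcu_first:
            self._run_rcu()
        return results


def simulate(rcu_first: bool) -> str:
    """Return what the tasklet observes for the chosen ordering."""
    cpu = IdleCpu()
    data = SchedData("old state")

    def tasklet() -> str:
        # The tasklet captured `data` before the free was queued.
        return "use-after-free" if data.freed else "ok"

    def deferred_free() -> None:
        data.freed = True

    cpu.tasklets.append(tasklet)
    cpu.rcu_callbacks.append(deferred_free)
    return cpu.run(rcu_first)[0]


if __name__ == "__main__":
    print("tasklet first:", simulate(rcu_first=False))  # prints "ok"
    print("RCU first:    ", simulate(rcu_first=True))   # prints "use-after-free"
```

In the model the stale access is harmless and merely reported; in a real
hypervisor the tasklet would operate on released or reused memory, which
would fit symptoms that surface only some time after the actual problem.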