From: "Jürgen Groß" <jgross@suse.com>
To: Sergey Dyasli <sergey.dyasli@citrix.com>,
Xen-devel <xen-devel@lists.xen.org>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
George Dunlap <George.Dunlap@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
Dario Faggioli <dfaggioli@suse.com>
Subject: Re: [Xen-devel] Live-Patch application failure in core-scheduling mode
Date: Tue, 11 Feb 2020 10:23:45 +0100
Message-ID: <360520f2-397c-2d09-6ee5-8e7809ec20e0@suse.com>
In-Reply-To: <9bff11d5-3b14-d57e-adc9-5d923297c3a0@citrix.com>

On 11.02.20 10:07, Sergey Dyasli wrote:
> On 07/02/2020 08:04, Jürgen Groß wrote:
>> On 06.02.20 15:02, Sergey Dyasli wrote:
>>> On 06/02/2020 11:05, Sergey Dyasli wrote:
>>>> On 06/02/2020 09:57, Jürgen Groß wrote:
>>>>> On 05.02.20 17:03, Sergey Dyasli wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm currently investigating a Live-Patch application failure in core-
>>>>>> scheduling mode and this is an example of what I usually get:
>>>>>> (it's easily reproducible)
>>>>>>
>>>>>> (XEN) [ 342.528305] livepatch: lp: CPU8 - IPIing the other 15 CPUs
>>>>>> (XEN) [ 342.558340] livepatch: lp: Timed out on semaphore in CPU quiesce phase 13/15
>>>>>> (XEN) [ 342.558343] bad cpus: 6 9
>>>>>>
>>>>>> (XEN) [ 342.559293] CPU: 6
>>>>>> (XEN) [ 342.559562] Xen call trace:
>>>>>> (XEN) [ 342.559565] [<ffff82d08023f304>] R common/schedule.c#sched_wait_rendezvous_in+0xa4/0x270
>>>>>> (XEN) [ 342.559568] [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
>>>>>> (XEN) [ 342.559571] [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>>>>> (XEN) [ 342.559574] [<ffff82d080278ec5>] F arch/x86/domain.c#guest_idle_loop+0x35/0x60
>>>>>>
>>>>>> (XEN) [ 342.559761] CPU: 9
>>>>>> (XEN) [ 342.560026] Xen call trace:
>>>>>> (XEN) [ 342.560029] [<ffff82d080241661>] R _spin_lock_irq+0x11/0x40
>>>>>> (XEN) [ 342.560032] [<ffff82d08023f323>] F common/schedule.c#sched_wait_rendezvous_in+0xc3/0x270
>>>>>> (XEN) [ 342.560036] [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
>>>>>> (XEN) [ 342.560039] [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>>>>> (XEN) [ 342.560042] [<ffff82d080279db5>] F arch/x86/domain.c#idle_loop+0x55/0xb0
>>>>>>
>>>>>> The first HT sibling is waiting for the second in the LP-application
>>>>>> context while the second waits for the first in the scheduler context.
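>>>>>>
>>>>>> A minimal user-space sketch of that interlock (illustrative only,
>>>>>> nothing below is Xen code; the thread names, counters and CPU
>>>>>> count are made up), assuming C11 atomics and pthreads:
>>>>>>
>>>>>> #include <pthread.h>
>>>>>> #include <stdatomic.h>
>>>>>>
>>>>>> #define NR_CPUS 2                    /* just the two HT siblings */
>>>>>>
>>>>>> static atomic_int cpus_quiesced;     /* live-patch rendezvous count */
>>>>>> static atomic_int siblings_in_sched; /* core-sched rendezvous count */
>>>>>>
>>>>>> /* First sibling: answered the live-patch IPI, now waits for ALL
>>>>>>  * CPUs to reach the quiesce point. */
>>>>>> static void *sibling_in_lp(void *arg)
>>>>>> {
>>>>>>     atomic_fetch_add(&cpus_quiesced, 1);
>>>>>>     while (atomic_load(&cpus_quiesced) < NR_CPUS)
>>>>>>         ;                            /* its sibling never arrives */
>>>>>>     return NULL;
>>>>>> }
>>>>>>
>>>>>> /* Second sibling: entered schedule() and waits for the first one
>>>>>>  * to join the per-core rendezvous before it can do anything else,
>>>>>>  * including reacting to the live-patch request. */
>>>>>> static void *sibling_in_sched(void *arg)
>>>>>> {
>>>>>>     atomic_fetch_add(&siblings_in_sched, 1);
>>>>>>     while (atomic_load(&siblings_in_sched) < 2)
>>>>>>         ;                            /* the first one is parked above */
>>>>>>     return NULL;
>>>>>> }
>>>>>>
>>>>>> int main(void)
>>>>>> {
>>>>>>     pthread_t a, b;
>>>>>>     pthread_create(&a, NULL, sibling_in_lp, NULL);
>>>>>>     pthread_create(&b, NULL, sibling_in_sched, NULL);
>>>>>>     pthread_join(a, NULL);           /* never returns: neither wait
>>>>>>                                         can complete (the live-patch
>>>>>>                                         path at least times out) */
>>>>>>     pthread_join(b, NULL);
>>>>>>     return 0;
>>>>>> }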
>>>>>>
>>>>>> Any suggestions on how to improve this situation are welcome.
>>>>>
>>>>> Can you test the attached patch, please? It is only tested to boot, so
>>>>> I did no livepatch tests with it.
>>>>
>>>> Thank you for the patch! It seems to fix the issue in my manual testing.
>>>> I'm going to submit automatic LP testing for both thread/core modes.
>>>
>>> Andrew suggested testing late ucode loading as well, and so I did.
>>> It uses stop_machine() to rendezvous CPUs, and it failed with a similar
>>> backtrace for a problematic CPU. But in this case the system crashed
>>> since there is no timeout involved:
>>>
>>> (XEN) [ 155.025168] Xen call trace:
>>> (XEN) [ 155.040095] [<ffff82d0802417f2>] R _spin_unlock_irq+0x22/0x30
>>> (XEN) [ 155.069549] [<ffff82d08023f3c2>] S common/schedule.c#sched_wait_rendezvous_in+0xa2/0x270
>>> (XEN) [ 155.109696] [<ffff82d08023f728>] F common/schedule.c#sched_slave+0x198/0x260
>>> (XEN) [ 155.145521] [<ffff82d080240e1a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>> (XEN) [ 155.180223] [<ffff82d0803716f6>] F x86_64/entry.S#process_softirqs+0x6/0x20
>>>
>>> It looks like your patch provides a workaround for the LP case, but other
>>> cases like stop_machine() remain broken since the underlying issue with
>>> the scheduler is still there.
>>
>> And here is the fix for ucode loading (which was in fact the only case
>> where stop_machine_run() wasn't already called in a tasklet).
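>>
>> Roughly the idea, as an illustrative sketch only (not the actual
>> patch; the ucode_* helper names are invented), assuming Xen's
>> tasklet_init()/tasklet_schedule() and stop_machine_run() interfaces:
>>
>> #include <xen/tasklet.h>
>> #include <xen/stop_machine.h>
>> #include <xen/smp.h>
>>
>> /* Runs on the requested CPU while all other CPUs are held in the
>>  * stop_machine rendezvous. */
>> static int ucode_apply_fn(void *data)
>> {
>>     /* ... apply the new microcode ... */
>>     return 0;
>> }
>>
>> /* Tasklet context: stop_machine_run() is now always entered from a
>>  * tasklet, never from a vCPU that core scheduling may still be
>>  * trying to rendezvous with. */
>> static void ucode_tasklet_fn(void *data)
>> {
>>     stop_machine_run(ucode_apply_fn, data, smp_processor_id());
>> }
>>
>> static struct tasklet ucode_tasklet;
>>
>> /* The hypercall path then merely defers the work: */
>> void ucode_load_deferred(void *data)
>> {
>>     tasklet_init(&ucode_tasklet, ucode_tasklet_fn, data);
>>     tasklet_schedule(&ucode_tasklet);
>> }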
>>
>> I have done a manual test loading new ucode with core scheduling
>> active.
>
> The patch seems to fix the issue, thanks!
> Do you plan to post the 2 patches to the ML now for proper review?
Yes.
Juergen