From: "Jürgen Groß" <jgross@suse.com>
To: Sergey Dyasli <sergey.dyasli@citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <George.Dunlap@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	Dario Faggioli <dfaggioli@suse.com>
Subject: Re: [Xen-devel] Live-Patch application failure in core-scheduling mode
Date: Tue, 11 Feb 2020 10:23:45 +0100
Message-ID: <360520f2-397c-2d09-6ee5-8e7809ec20e0@suse.com>
In-Reply-To: <9bff11d5-3b14-d57e-adc9-5d923297c3a0@citrix.com>

On 11.02.20 10:07, Sergey Dyasli wrote:
> On 07/02/2020 08:04, Jürgen Groß wrote:
>> On 06.02.20 15:02, Sergey Dyasli wrote:
>>> On 06/02/2020 11:05, Sergey Dyasli wrote:
>>>> On 06/02/2020 09:57, Jürgen Groß wrote:
>>>>> On 05.02.20 17:03, Sergey Dyasli wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm currently investigating a Live-Patch application failure in core-
>>>>>> scheduling mode. This is an example of what I usually get
>>>>>> (it's easily reproducible):
>>>>>>
>>>>>>        (XEN) [  342.528305] livepatch: lp: CPU8 - IPIing the other 15 CPUs
>>>>>>        (XEN) [  342.558340] livepatch: lp: Timed out on semaphore in CPU quiesce phase 13/15
>>>>>>        (XEN) [  342.558343] bad cpus: 6 9
>>>>>>
>>>>>>        (XEN) [  342.559293] CPU:    6
>>>>>>        (XEN) [  342.559562] Xen call trace:
>>>>>>        (XEN) [  342.559565]    [<ffff82d08023f304>] R common/schedule.c#sched_wait_rendezvous_in+0xa4/0x270
>>>>>>        (XEN) [  342.559568]    [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
>>>>>>        (XEN) [  342.559571]    [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>>>>>        (XEN) [  342.559574]    [<ffff82d080278ec5>] F arch/x86/domain.c#guest_idle_loop+0x35/0x60
>>>>>>
>>>>>>        (XEN) [  342.559761] CPU:    9
>>>>>>        (XEN) [  342.560026] Xen call trace:
>>>>>>        (XEN) [  342.560029]    [<ffff82d080241661>] R _spin_lock_irq+0x11/0x40
>>>>>>        (XEN) [  342.560032]    [<ffff82d08023f323>] F common/schedule.c#sched_wait_rendezvous_in+0xc3/0x270
>>>>>>        (XEN) [  342.560036]    [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
>>>>>>        (XEN) [  342.560039]    [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>>>>>        (XEN) [  342.560042]    [<ffff82d080279db5>] F arch/x86/domain.c#idle_loop+0x55/0xb0
>>>>>>
>>>>>> The first HT sibling waits for the second in the LP-application context,
>>>>>> while the second waits for the first in the scheduler context.
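
To make the deadlock concrete, here is a small user-space model of the two
wait loops described above. It is not Xen code: the barrier logic, the
thread names and the one-second watchdog are purely illustrative stand-ins
for the livepatch quiesce phase and the core-scheduling sibling rendezvous.

/*
 * Toy model (not Xen code) of the cross-rendezvous deadlock: one
 * "sibling" sits in a livepatch-style quiesce barrier waiting for all
 * threads, while the other sits in a scheduler-style rendezvous waiting
 * for its sibling, so neither barrier can ever complete.  A watchdog
 * reports the stall instead of letting the program hang.
 *
 * Build: gcc -pthread deadlock_model.c -o deadlock_model
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NR_THREADS 2

static atomic_int lp_quiesced;       /* threads parked for "livepatch" */
static atomic_int sched_rendezvous;  /* threads parked for "scheduler" */
static atomic_bool give_up;

/* Livepatch-style quiesce: wait until every thread has checked in. */
static void lp_quiesce_wait(void)
{
    atomic_fetch_add(&lp_quiesced, 1);
    while (atomic_load(&lp_quiesced) < NR_THREADS &&
           !atomic_load(&give_up))
        ;                            /* spin, like the quiesce phase above */
}

/* Scheduler-style rendezvous: wait until the sibling joins. */
static void sched_rendezvous_wait(void)
{
    atomic_fetch_add(&sched_rendezvous, 1);
    while (atomic_load(&sched_rendezvous) < NR_THREADS &&
           !atomic_load(&give_up))
        ;                            /* sibling never arrives: it sits in lp_quiesce_wait() */
}

static void *sibling0(void *arg)
{
    (void)arg;
    lp_quiesce_wait();               /* first sibling: parked in the LP quiesce phase */
    return NULL;
}

static void *sibling1(void *arg)
{
    (void)arg;
    sched_rendezvous_wait();         /* second sibling: parked in the scheduler rendezvous */
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;

    pthread_create(&t0, NULL, sibling0, NULL);
    pthread_create(&t1, NULL, sibling1, NULL);

    sleep(1);                        /* watchdog instead of an endless hang */
    if (atomic_load(&lp_quiesced) < NR_THREADS)
        printf("deadlock: %d/%d threads reached the quiesce barrier\n",
               atomic_load(&lp_quiesced), NR_THREADS);

    atomic_store(&give_up, true);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
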
>>>>>>
>>>>>> Any suggestions on how to improve this situation are welcome.
>>>>>
>>>>> Can you test the attached patch, please? It has only been tested to boot,
>>>>> so I haven't done any livepatch tests with it.
>>>>
>>>> Thank you for the patch! It seems to fix the issue in my manual testing.
>>>> I'm going to submit automatic LP testing for both thread and core modes.
>>>
>>> Andrew suggested testing late ucode loading as well, so I did.
>>> It uses stop_machine() to rendezvous the CPUs, and it failed with a
>>> similar backtrace on a problematic CPU. In this case, however, the
>>> system crashed, since there is no timeout involved:
>>>
>>>       (XEN) [  155.025168] Xen call trace:
>>>       (XEN) [  155.040095]    [<ffff82d0802417f2>] R _spin_unlock_irq+0x22/0x30
>>>       (XEN) [  155.069549]    [<ffff82d08023f3c2>] S common/schedule.c#sched_wait_rendezvous_in+0xa2/0x270
>>>       (XEN) [  155.109696]    [<ffff82d08023f728>] F common/schedule.c#sched_slave+0x198/0x260
>>>       (XEN) [  155.145521]    [<ffff82d080240e1a>] F common/softirq.c#__do_softirq+0x5a/0x90
>>>       (XEN) [  155.180223]    [<ffff82d0803716f6>] F x86_64/entry.S#process_softirqs+0x6/0x20
>>>
>>> It looks like your patch provides a workaround for the LP case, but other
>>> cases like stop_machine() remain broken, since the underlying issue in
>>> the scheduler is still there.
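
The different failure modes follow from the waiting policy: the livepatch
quiesce gives up and reports "Timed out on semaphore", while a
stop_machine()-style rendezvous spins unconditionally, so a CPU stuck in
the scheduler's sibling rendezvous wedges the whole machine. A rough,
self-contained contrast of the two shapes (illustrative only, not taken
from xen/, all names made up):

/* Build: gcc timeout_contrast.c -o timeout_contrast */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static atomic_int cpus_in;

static long elapsed_us(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000000L +
           (now.tv_nsec - start->tv_nsec) / 1000L;
}

/* Livepatch-style quiesce: the wait is bounded and failure is reported,
 * so a stuck sibling only aborts the patch application. */
static bool quiesce_with_timeout(int nr_cpus, long timeout_us)
{
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    atomic_fetch_add(&cpus_in, 1);
    while (atomic_load(&cpus_in) < nr_cpus)
        if (elapsed_us(&start) > timeout_us)
            return false;
    return true;
}

/* stop_machine()-style rendezvous: the wait is unbounded, so the same
 * stuck sibling turns into a hard hang of the whole machine. */
static void rendezvous_without_timeout(int nr_cpus)
{
    atomic_fetch_add(&cpus_in, 1);
    while (atomic_load(&cpus_in) < nr_cpus)
        ;                            /* spin forever */
}

int main(void)
{
    /* Pretend a second CPU exists but never checks in. */
    if (!quiesce_with_timeout(2, 100000))
        printf("quiesce timed out -- patch application aborted\n");

    /* rendezvous_without_timeout(2) would never return here. */
    (void)rendezvous_without_timeout;
    return 0;
}
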
>>
>> And here is the fix for ucode loading (that was in fact the only case
>> where stop_machine_run() wasn't already called in a tasklet).
>>
>> I have manually tested loading new ucode with core scheduling active.
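
The patch itself is attached to the earlier mail and not quoted here, so
purely as a reading aid, here is a minimal user-space sketch of the shape
described above: instead of running the all-CPU rendezvous synchronously in
the caller's context, the request is queued and executed from a deferred
worker, the analogue of a tasklet. All names are made up; this is not the
Xen tasklet API and not the actual patch.

/* Build: gcc -pthread tasklet_model.c -o tasklet_model */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  kick = PTHREAD_COND_INITIALIZER;
static int work_pending;

/* Stand-in for the work done under stop_machine_run(). */
static void load_ucode_rendezvous(void)
{
    printf("rendezvous + ucode update running in worker context\n");
}

/* The "tasklet": a worker that runs queued work outside the caller. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!work_pending)
        pthread_cond_wait(&kick, &lock);
    work_pending = 0;
    pthread_mutex_unlock(&lock);

    load_ucode_rendezvous();
    return NULL;
}

/* What the load path does instead of calling the rendezvous directly:
 * queue the work and return, leaving this context free to take part in
 * the scheduler's own rendezvous. */
static void schedule_ucode_load(void)
{
    pthread_mutex_lock(&lock);
    work_pending = 1;
    pthread_cond_signal(&kick);
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, worker, NULL);
    schedule_ucode_load();            /* caller returns immediately */
    pthread_join(t, NULL);
    return 0;
}
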
> 
> The patch seems to fix the issue, thanks!
> Do you plan to post the 2 patches to the ML now for proper review?

Yes.


Juergen

