From: Julien Grall <Julien.Grall@arm.com>
To: Juergen Gross <jgross@suse.com>,
	Volodymyr Babchuk <vlad.babchuk@gmail.com>
Cc: "Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Liu" <wei.liu2@citrix.com>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"George Dunlap" <george.dunlap@eu.citrix.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Tim Deegan" <tim@xen.org>, "Dario Faggioli" <dfaggioli@suse.com>,
	"Jan Beulich" <jbeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>, nd <nd@arm.com>,
	"Ian Jackson" <ian.jackson@eu.citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v2 0/6] xen: simplify suspend/resume handling
Date: Thu, 28 Mar 2019 13:49:11 +0000
Message-ID: <f9c6f862-589e-8565-d011-004a38a8f3d6@arm.com>
In-Reply-To: <243dbc25-b2bd-82d8-dbd5-722a9287bddd@suse.com>



On 28/03/2019 13:37, Juergen Gross wrote:
> On 28/03/2019 14:33, Julien Grall wrote:
>> Hi,
>>
>> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
>>> Hello Juergen,
>>>
>>> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@suse.com> wrote:
>>>>
>>>> Especially in the scheduler area (schedule.c, cpupool.c), rather
>>>> complex handling is involved when doing suspend and resume.
>>>>
>>>> This can be simplified a lot by not performing a complete cpu down and
>>>> up cycle for the non-boot cpus, but keeping the pure software related
>>>> state and freeing it only in case a cpu didn't come up again during
>>>> resume.
>>>>
>>>> In summary, not only can the complexity be reduced, but failure
>>>> tolerance is even better with this series: with a dedicated hook
>>>> for cpus failing to come up on resume, it is now possible to survive,
>>>> e.g., a cpupool being left without any cpu after resume by moving its
>>>> domains to cpupool0.
>>>>
>>>> Juergen Gross (6):
>>>>     xen/sched: call cpu_disable_scheduler() via cpu notifier
>>>>     xen: add helper for calling notifier_call_chain() to common/cpu.c
>>>>     xen: add new cpu notifier action CPU_RESUME_FAILED
>>>>     xen: don't free percpu areas during suspend
>>>>     xen/cpupool: simplify suspend/resume handling
>>>>     xen/sched: don't disable scheduler on cpus during suspend
>>>>
>>>>    xen/arch/arm/smpboot.c     |   4 -
>>>>    xen/arch/x86/percpu.c      |   3 +-
>>>>    xen/arch/x86/smpboot.c     |   3 -
>>>>    xen/common/cpu.c           |  61 +++++++-------
>>>>    xen/common/cpupool.c       | 131 ++++++++++++-----------------
>>>>    xen/common/schedule.c      | 203 +++++++++++++++++++--------------------------
>>>>    xen/include/xen/cpu.h      |  29 ++++---
>>>>    xen/include/xen/sched-if.h |   1 -
>>>>    8 files changed, 190 insertions(+), 245 deletions(-)
>>>>
>>>
>>> I tested your patch series on an ARM64 platform. We had an issue with hard
>>> affinity: there was an assertion failure in the sched_credit2 code during
>>> suspend if one of the vCPUs is pinned to a non-0 pCPU.
>> When you report an error, please make clear what commit you are using
>> and whether you have patches applied on top.
>>
>> In this case, we have no support for suspend/resume on Arm today, so a
>> bug report about suspend/resume is a bit confusing. It is also more
>> difficult to help when you don't have the full picture, as a bug may be
>> in your code or in upstream Xen.
>>
>> I saw Juergen suggested a fix; please carry it in whatever series you have.
>>
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) PSCI cpu off failed for CPU0 err=-3
>>> (XEN) ****************************************
>>
>> PSCI CPU off failing is never good news. Here, the command has been
>> denied by the PSCI monitor. But... why is CPU off actually called on
>> CPU0? Shouldn't we have turned off the platform instead?
> 
> Could it be that a scheduler lock is no longer reachable as the percpu
> memory of another cpu has been released and allocated again? That would
> be one of the possible results of my series.

The data abort shown before the panic may well be caused by the percpu 
issue. But I don't think it would lead to an attempt to turn off CPU0. 
This looks more like an issue in the machine_halt/machine_restart path.

Indeed, CPU off may rightfully return -3 (DENIED) if the Trusted OS 
resides on this CPU. Technically, we should have checked beforehand that 
the CPU can be turned off, but it looks like we are missing this code. I 
vaguely remember having already pointed out that issue in the past.

Cheers,

> 
> 
> Juergen
> 

-- 
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Thread overview: 23+ messages
2019-03-28 12:06 [PATCH v2 0/6] xen: simplify suspend/resume handling Juergen Gross
2019-03-28 12:06 ` [PATCH v2 1/6] xen/sched: call cpu_disable_scheduler() via cpu notifier Juergen Gross
2019-03-29 17:19   ` Dario Faggioli
2019-04-01 10:34   ` Julien Grall
2019-03-28 12:06 ` [PATCH v2 2/6] xen: add helper for calling notifier_call_chain() to common/cpu.c Juergen Gross
2019-03-29 17:33   ` Dario Faggioli
2019-03-28 12:06 ` [PATCH v2 3/6] xen: add new cpu notifier action CPU_RESUME_FAILED Juergen Gross
2019-03-28 12:06 ` [PATCH v2 4/6] xen: don't free percpu areas during suspend Juergen Gross
2019-03-28 13:39   ` Jan Beulich
2019-03-28 12:06 ` [PATCH v2 5/6] xen/cpupool: simplify suspend/resume handling Juergen Gross
2019-03-28 12:06 ` [PATCH v2 6/6] xen/sched: don't disable scheduler on cpus during suspend Juergen Gross
2019-03-29 17:36   ` Dario Faggioli
2019-03-28 13:01 ` [PATCH v2 0/6] xen: simplify suspend/resume handling Volodymyr Babchuk
2019-03-28 13:21   ` Juergen Gross
2019-03-28 13:23     ` Julien Grall
2019-03-28 13:34     ` Volodymyr Babchuk
2019-03-28 13:33   ` Julien Grall
2019-03-28 13:37     ` Juergen Gross
2019-03-28 13:49       ` Julien Grall [this message]
2019-03-28 13:56     ` Volodymyr Babchuk
2019-03-28 14:43       ` Julien Grall
2019-03-28 14:53       ` Dario Faggioli
2019-03-28 14:57         ` Volodymyr Babchuk
