From: boris.ostrovsky@oracle.com
To: Anchal Agarwal <anchalag@amazon.com>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
hpa@zytor.com, x86@kernel.org, jgross@suse.com,
linux-pm@vger.kernel.org, linux-mm@kvack.org, kamatam@amazon.com,
sstabellini@kernel.org, konrad.wilk@oracle.com,
roger.pau@citrix.com, axboe@kernel.dk, davem@davemloft.net,
rjw@rjwysocki.net, len.brown@intel.com, pavel@ucw.cz,
peterz@infradead.org, eduval@amazon.com, sblbir@amazon.com,
xen-devel@lists.xenproject.org, vkuznets@redhat.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
dwmw@amazon.co.uk, benh@kernel.crashing.org
Subject: Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode
Date: Mon, 14 Sep 2020 20:24:22 -0400
Message-ID: <e9b94104-d20a-b6b2-cbe0-f79b1ed09c98@oracle.com>
In-Reply-To: <20200914214754.GA19975@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
On 9/14/20 5:47 PM, Anchal Agarwal wrote:
> On Sun, Sep 13, 2020 at 11:43:30AM -0400, boris.ostrovsky@oracle.com wrote:
>>
>> On 8/21/20 6:25 PM, Anchal Agarwal wrote:
>>> Though acquiring pm_mutex is still the right thing to do, we may
>>> see a deadlock if PM hibernation is interrupted by Xen suspend.
>>> PM hibernation depends on the xenwatch thread to process xenbus state
>>> transactions, but the thread will sleep waiting for pm_mutex, which is
>>> already held by the PM hibernation context in this scenario. Xen shutdown
>>> code may need some changes to avoid the issue.
>>
>>
>> Is it Xen's shutdown or suspend code that needs to address this? (Or I
>> may not understand what the problem is that you are describing)
>>
> It's Xen suspend code, I think. If we do not take the system_transition_mutex
> in do_suspend, then if hibernation is triggered in parallel to Xen suspend there
> could be issues.
But you *are* taking this mutex to avoid this exact race, aren't you?
> Now this is still theoretical in my case and I haven't been able
> to reproduce such a race. So the approach the original author took was to take
> this lock, which to me seems right.
> And it's Xen suspend and not Xen shutdown. So basically, if this scenario
> happens, I am of the view that one or the other will fail to occur; then how do
> we recover, or avoid this altogether?
>
> Does that answer your question?
>
>>> +
>>> +static int xen_setup_pm_notifier(void)
>>> +{
>>> + if (!xen_hvm_domain() || xen_initial_domain())
>>> + return -ENODEV;
>>
>> I don't think this works anymore.
> What do you mean?
> The first check is for Xen domain types and the other is for architecture support.
> The reason I put this check here is because I wanted to segregate the two.
> I do not want to register this notifier at all for !hvm guests, nor if it's
> an initial control domain.
> The ARM check only lands in the notifier because once the hibernate() API is
> called, it calls pm_notifier_call_chain for PM_HIBERNATION_PREPARE, and this
> will fail for aarch64.
> Once we have support for aarch64 this notifier can go away altogether.
>
> Is there any other reason I may be missing why we should move this check to
> notifier?
Not registering this notifier is equivalent to having it return NOTIFY_OK.
In your earlier versions just returning NOTIFY_OK was not sufficient for
hibernation to proceed since the notifier would also need to set
suspend_mode appropriately. But now your notifier essentially filters
out unsupported configurations. And so if it is not called your
configuration (e.g. PV domain) will be considered supported.
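
To illustrate, here is a minimal userspace model of the argument (not
kernel code). The constants and both functions are stand-ins made up for
this sketch; the only assumption carried over from the kernel is that a
chain with no registered callback vetoes nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's notifier return codes; values are illustrative. */
enum { NOTIFY_OK = 1, NOTIFY_BAD = 2 };

/* Hypothetical callback type: takes the domain properties being filtered on. */
typedef int (*pm_notifier_fn)(bool hvm_domain, bool initial_domain);

/*
 * Models a chain holding at most one callback. With nothing registered
 * (fn == NULL) nobody can object, so the result is NOTIFY_OK.
 */
static int call_chain(pm_notifier_fn fn, bool hvm_domain, bool initial_domain)
{
	return fn ? fn(hvm_domain, initial_domain) : NOTIFY_OK;
}

/*
 * Filtering done inside the notifier: non-HVM guests and the initial
 * domain are rejected instead of being silently treated as supported.
 */
static int xen_pm_notifier_model(bool hvm_domain, bool initial_domain)
{
	if (!hvm_domain || initial_domain)
		return NOTIFY_BAD;
	return NOTIFY_OK;
}
```

With the registration-time check, a PV domain corresponds to
call_chain(NULL, false, false), which yields NOTIFY_OK. With the check
inside the notifier, the same domain gets NOTIFY_BAD.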
>> In the past your notifier would set suspend_mode (or something) but now
>> it really doesn't do anything except report an error in some (ARM) cases.
>>
>> So I think you should move this check into the notifier.
>> (And BTW I still think PM_SUSPEND_PREPARE should return an error too.
>> The fact that we are using "suspend" in xen routine names is irrelevant)
>>
> I may have sent a "not-updated" version of the notifier's function change.
>
> + switch (pm_event) {
> + case PM_HIBERNATION_PREPARE:
> + /* Guest hibernation is not supported for aarch64 currently */
> + if (IS_ENABLED(CONFIG_ARM64)) {
> + ret = NOTIFY_BAD;
> + break;
> + }
> + case PM_RESTORE_PREPARE:
> + case PM_POST_RESTORE:
> + case PM_POST_HIBERNATION:
> + default:
> + ret = NOTIFY_OK;
> + }
There is no difference on x86 between this code and what you sent
earlier. In both instances PM_SUSPEND_PREPARE will return NOTIFY_OK.
On ARM this code will allow suspend to proceed (which is not what we want).
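
As a rough sketch of the behavior being asked for (a userspace model, not
a proposed patch): PM_SUSPEND_PREPARE is rejected everywhere, and
PM_HIBERNATION_PREPARE is rejected on ARM. The enum values are
illustrative stand-ins, and the is_arm64 parameter is a hypothetical
replacement for IS_ENABLED(CONFIG_ARM64) so that both cases can be
exercised.

```c
#include <stdbool.h>

/* Stand-ins for kernel constants: names mirror the kernel's, values do not. */
enum { NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum {
	PM_HIBERNATION_PREPARE = 1,
	PM_POST_HIBERNATION,
	PM_SUSPEND_PREPARE,
	PM_POST_SUSPEND,
	PM_RESTORE_PREPARE,
	PM_POST_RESTORE,
};

/* Hypothetical notifier body; is_arm64 stands in for IS_ENABLED(CONFIG_ARM64). */
static int xen_pm_notifier_sketch(int pm_event, bool is_arm64)
{
	switch (pm_event) {
	case PM_SUSPEND_PREPARE:
		/* Suspend gets an explicit error rather than the default. */
		return NOTIFY_BAD;
	case PM_HIBERNATION_PREPARE:
		/* Guest hibernation is not supported for aarch64 currently. */
		return is_arm64 ? NOTIFY_BAD : NOTIFY_OK;
	default:
		return NOTIFY_OK;
	}
}
```

Each case returns directly, so no result depends on falling through to
the default label.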
-boris
>
> With the above patch PM_SUSPEND_PREPARE will go away altogether. Does that
> resolve this issue? I wanted to get rid of all SUSPEND_* as they are clearly
> not needed here.
> The only reason I kept it there is that if someone tries to trigger hibernation
> on ARM instances they should get an error, as I am not sure about the current
> behavior. There may be a better way to not invoke hibernation on ARM DomUs and
> get rid of this block altogether.
>
> Again, sorry for sending in the half-baked fix. My workspace switch may have
> caused the error.
>>
>>
>> -boris
>>
> Anchal
>>
>>> + return register_pm_notifier(&xen_pm_notifier_block);
>>> +}
>>> +