All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anchal Agarwal <anchalag@amazon.com>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>, "hpa@zytor.com" <hpa@zytor.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"roger.pau@citrix.com" <roger.pau@citrix.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"pavel@ucw.cz" <pavel@ucw.cz>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dwmw@amazon.co.uk" <dwmw@amazon.co.uk>
Subject: Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode
Date: Thu, 3 Jun 2021 23:27:42 +0000	[thread overview]
Message-ID: <20210603232742.GB14368@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com> (raw)
In-Reply-To: <2cb71322-9d3d-395e-293b-24888f5be759@oracle.com>

On Thu, Jun 03, 2021 at 04:11:46PM -0400, Boris Ostrovsky wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 6/2/21 3:37 PM, Anchal Agarwal wrote:
> > On Tue, Jun 01, 2021 at 10:18:36AM -0400, Boris Ostrovsky wrote:
> >>
> > The resume won't fail because in the image the xen_vcpu and xen_vcpu_info are
> > same. These are the same values that got in there during saving of the
> > hibernation image. So whatever xen_vcpu got as a value during boot time registration on resume is
> > essentially lost once the jump into the saved kernel image happens. Interesting
> > part is if KASLR is not enabled boot time vcpup mfn is same as in the image.
> 
> 
> Do you start the your guest right after you've hibernated it? What happens if you create (and keep running) a few other guests in-between? mfn would likely be different then I'd think.
> 
>
Yes, I just run it in loops on a single guest and I am able to see the issue in
20-40 iterations sometime may be sooner. Yeah, you could be right and this could
definitely happen more often depending what's happening on dom0 side.
> > Once you enable KASLR this value changes sometimes and whenever that happens
> > resume gets stuck. Does that make sense?
> >
> > No it does not resume successfully if hypercall fails because I was trying to
> > explicitly reset vcpu and invoke hypercall.
> > I am just wondering why does restore logic fails to work here or probably I am
> > missing a critical piece here.
> 
> 
> If you are not using KASLR then xen_vcpu_info is at the same address every time you boot. So whatever you registered before hibernating stays the same when you boot second time and register again, and so successful comparison in xen_vcpu_setup() works. (Mostly by chance.)
>
That's what I thought so too.
> 
> But if KASLR is on then this comparison not failing should cause xen_vcpu pointer in the loaded image to become bogus because xen_vcpu is now registered for a different xen_vcpu_info address during boot.
> 
The reason for that I think is once you jump into the image that information is
getting lost. But there is  some residue somewhere that's causing the resume to
fail. I haven't been able to pinpoint the exact field value that may be causing
that issue.
Correct me if I am wrong here, but even if hypothetically I put a hack to tell the kernel
somehow re-register vcpu it won't pass because there is no hypercall to
unregister it in first place? Can the resumed kernel use the new values in that
case [Now this is me just throwing wild guesses!!]

> 
> >>> Another line of thought is something what kexec does to come around this problem
> >>> is to abuse soft_reset and issue it during syscore_resume or may be before the image get loaded.
> >>> I haven't experimented with that yet as I am assuming there has to be a way to re-register vcpus during resume.
> >>
> >> Right, that sounds like it should work.
> >>
> > You mean soft reset or re-register vcpu?
> 
> 
> Doing something along the lines of a soft reset. It should allow you to re-register. Not sure how you can use it without Xen changes though.
> 
No not without xen changes. It won't work. I will have xen changes in place to
test that on our infrastructure. 

--
Anchal
> 
> 
> -boris
> 

  reply	other threads:[~2021-06-03 23:27 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 22:22 [PATCH v3 00/11] Fix PM hibernation in Xen guests Anchal Agarwal
2020-08-21 22:22 ` Anchal Agarwal
2020-08-21 22:25 ` [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode Anchal Agarwal
2020-08-21 22:25   ` Anchal Agarwal
2020-09-13 15:43   ` boris.ostrovsky
2020-09-14 21:47     ` Anchal Agarwal
2020-09-14 21:47       ` Anchal Agarwal
2020-09-15  0:24       ` boris.ostrovsky
2020-09-15 18:00         ` Anchal Agarwal
2020-09-15 18:00           ` Anchal Agarwal
2020-09-15 19:58           ` boris.ostrovsky
2020-09-21 21:54             ` Anchal Agarwal
2020-09-21 21:54               ` Anchal Agarwal
2020-09-22 16:18               ` boris.ostrovsky
2020-09-22 23:17                 ` Anchal Agarwal
2020-09-22 23:17                   ` Anchal Agarwal
2020-09-25 19:04                   ` Anchal Agarwal
2020-09-25 19:04                     ` Anchal Agarwal
2020-09-25 20:02                     ` boris.ostrovsky
2020-09-25 22:28                       ` Anchal Agarwal
2020-09-25 22:28                         ` Anchal Agarwal
2020-09-28 18:49                         ` boris.ostrovsky
2020-09-30 21:29                           ` Anchal Agarwal
2020-10-01 12:43                             ` boris.ostrovsky
2021-05-21  5:26                               ` Anchal Agarwal
2021-05-25 22:23                                 ` Boris Ostrovsky
2021-05-26  4:40                                   ` Anchal Agarwal
2021-05-26 18:29                                     ` Boris Ostrovsky
2021-05-28 21:50                                       ` Anchal Agarwal
2021-06-01 14:18                                         ` Boris Ostrovsky
2021-06-02 19:37                                           ` Anchal Agarwal
2021-06-03 20:11                                             ` Boris Ostrovsky
2021-06-03 23:27                                               ` Anchal Agarwal [this message]
2021-06-04  1:49                                                 ` Boris Ostrovsky
2020-09-13 17:07   ` boris.ostrovsky
2020-08-21 22:26 ` [PATCH v3 02/11] xenbus: add freeze/thaw/restore callbacks support Anchal Agarwal
2020-08-21 22:26   ` Anchal Agarwal
2020-09-13 16:11   ` boris.ostrovsky
2020-09-15 19:56     ` Anchal Agarwal
2020-09-15 19:56       ` Anchal Agarwal
2020-08-21 22:26 ` [PATCH v3 03/11] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume Anchal Agarwal
2020-08-21 22:26   ` Anchal Agarwal
2020-08-21 22:27 ` [PATCH v3 04/11] x86/xen: add system core suspend and resume callbacks Anchal Agarwal
2020-08-21 22:27   ` Anchal Agarwal
2020-09-13 17:25   ` boris.ostrovsky
2020-08-21 22:27 ` [PATCH v3 05/11] genirq: Shutdown irq chips in suspend/resume during hibernation Thomas Gleixner
2020-08-22  0:36   ` Thomas Gleixner
2020-08-24 17:25     ` Anchal Agarwal
2020-08-24 17:25       ` Anchal Agarwal
2020-08-25 13:20     ` Christoph Hellwig
2020-08-25 15:25       ` Thomas Gleixner
2020-08-21 22:28 ` [PATCH v3 06/11] xen-blkfront: add callbacks for PM suspend and hibernation Anchal Agarwal
2020-08-21 22:28   ` Anchal Agarwal
2020-08-21 22:29 ` [PATCH v3 07/11] xen-netfront: " Anchal Agarwal
2020-08-21 22:29   ` Anchal Agarwal
2020-08-21 22:29 ` [PATCH v3 08/11] x86/xen: save and restore steal clock during PM hibernation Anchal Agarwal
2020-08-21 22:29   ` Anchal Agarwal
2020-08-21 22:30 ` [PATCH v3 09/11] xen: Introduce wrapper for save/restore sched clock offset Anchal Agarwal
2020-08-21 22:30   ` Anchal Agarwal
2020-08-21 22:30 ` [PATCH v3 10/11] xen: Update sched clock offset to avoid system instability in hibernation Anchal Agarwal
2020-08-21 22:30   ` Anchal Agarwal
2020-09-13 17:52   ` boris.ostrovsky
2020-08-21 22:31 ` [PATCH v3 11/11] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA Anchal Agarwal
2020-08-21 22:31   ` Anchal Agarwal
2020-08-28 18:26 ` [PATCH v3 00/11] Fix PM hibernation in Xen guests Anchal Agarwal
2020-08-28 18:26   ` Anchal Agarwal
2020-08-28 18:29   ` Rafael J. Wysocki
2020-08-28 18:29     ` Rafael J. Wysocki
2020-08-28 18:39     ` Anchal Agarwal
2020-09-11 20:44       ` Anchal Agarwal
2020-09-11 15:19 ` boris.ostrovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210603232742.GB14368@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com \
    --to=anchalag@amazon.com \
    --cc=axboe@kernel.dk \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.co.uk \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=konrad.wilk@oracle.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.