From: Anchal Agarwal <anchalag@amazon.com>
To: <boris.ostrovsky@oracle.com>
Cc: <tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
	<hpa@zytor.com>, <x86@kernel.org>, <jgross@suse.com>,
	<linux-pm@vger.kernel.org>, <linux-mm@kvack.org>,
	<kamatam@amazon.com>, <sstabellini@kernel.org>,
	<konrad.wilk@oracle.com>, <roger.pau@citrix.com>,
	<axboe@kernel.dk>, <davem@davemloft.net>, <rjw@rjwysocki.net>,
	<len.brown@intel.com>, <pavel@ucw.cz>, <peterz@infradead.org>,
	<eduval@amazon.com>, <sblbir@amazon.com>,
	<xen-devel@lists.xenproject.org>, <vkuznets@redhat.com>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<dwmw@amazon.co.uk>, <benh@kernel.crashing.org>
Subject: Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode
Date: Wed, 30 Sep 2020 21:29:44 +0000
Message-ID: <20200930212944.GA3138@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
In-Reply-To: <cc738014-6a79-a5ae-cb2a-a02ff15b4582@oracle.com>

On Mon, Sep 28, 2020 at 02:49:56PM -0400, boris.ostrovsky@oracle.com wrote:
> On 9/25/20 6:28 PM, Anchal Agarwal wrote:
> > On Fri, Sep 25, 2020 at 04:02:58PM -0400, boris.ostrovsky@oracle.com wrote:
> >> On 9/25/20 3:04 PM, Anchal Agarwal wrote:
> >>> On Tue, Sep 22, 2020 at 11:17:36PM +0000, Anchal Agarwal wrote:
> >>>> On Tue, Sep 22, 2020 at 12:18:05PM -0400, boris.ostrovsky@oracle.com wrote:
> >>>>> On 9/21/20 5:54 PM, Anchal Agarwal wrote:
> 
> >>>>> Also, wrt KASLR stuff, that issue is still seen sometimes but I haven't had
> >>>>> bandwidth to dive deep into the issue and fix it.
> >>
> >> So what's the plan there? You first mentioned this issue early this year and judged by your response it is not clear whether you will ever spend time looking at it.
> >>
> > I do want to fix it and did do some debugging earlier this year just haven't
> > gotten back to it. Also, wanted to understand if the issue is a blocker to this
> > series?
> 
> 
> Integrating code with known bugs is less than ideal.
> 
So for this series to be accepted, the KASLR issue needs to be fixed, along
with addressing the other comments of course?
> 
> 3% failure for this feature seems to be a manageable number from the reproducability perspective --- you should be able to script this and each iteration should take way under a minute, no?
> 
>
Yes, it should be doable. The failure rate is not constant, though; 3% is the
maximum I have seen. Also, if at worst it takes a minute per run and I have to
do 2000-3000 runs to produce a failure, that will still be slow. I have to dig
in to see if I can find a better way.
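
To put rough numbers on this, here is a minimal sketch of the arithmetic. It
assumes each hibernate/resume cycle fails independently with a fixed
probability, which may not hold since the rate is not constant. The function
names and the rates other than 3% are illustrative assumptions, not
measurements.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that at least one failure is seen with the given
    confidence, i.e. 1 - (1 - p)**n >= confidence, for independent runs."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def wall_clock_hours(runs, seconds_per_run=60.0):
    """Total time for `runs` iterations at a fixed cost per iteration."""
    return runs * seconds_per_run / 3600.0


if __name__ == "__main__":
    # 0.03 is the maximum rate mentioned above; the lower rates are
    # hypothetical, to show how quickly the cost grows as the rate drops.
    for rate in (0.03, 0.01, 0.001):
        n = runs_needed(rate)
        print(f"rate={rate}: {n} runs, {wall_clock_hours(n):.1f} h at 60 s/run")
```

At a steady 3% this needs on the order of a hundred iterations, but if the
typical rate is far below the maximum, the run count and the wall-clock time
grow roughly in inverse proportion to the rate.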

> > I had some theories when debugging around this like if the random base address picked by kaslr for the
> > resuming kernel mismatches the suspended kernel and just jogging my memory, I didn't find that as the case.
> > Another hunch was if physical address of registered vcpu info at boot is different from what suspended kernel
> > has and that can cause CPU's to get stuck when coming online.
> 
> 
> I'd think if this were the case you'd have 100% failure rate. And we are also re-registering vcpu info on xen restore and I am not aware of any failures due to KASLR.
> 
What I meant there wrt VCPU info was that it is not unregistered during
hibernation, so Xen still remembers the old physical addresses of the VCPU
info registered by the booting kernel. The hibernation kernel may have
different physical addresses for VCPU info, and if there is a mismatch, it may
cause issues with resume. During hibernation, the VCPU info register hypercall
is not invoked again.
> 
> > The issue was only
> > reproducible 3% of the time out of 3000 runs hence its hard to just reproduce this.
> >
> > Moreover, I also wanted to get an insight on if hibernation works correctly with KASLR
> > generally and its only Xen causing the issue?
> 
> 
> With KASLR being on by default I'd be surprised if it didn't.
>
That makes it Xen-specific then. Also, I have not seen the issue on KVM-based
instances.
> 
> -boris
> 
- Anchal


Thread overview: 50+ messages
2020-08-21 22:22 [PATCH v3 00/11] Fix PM hibernation in Xen guests Anchal Agarwal
2020-08-21 22:25 ` [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode Anchal Agarwal
2020-09-13 15:43   ` boris.ostrovsky
2020-09-14 21:47     ` Anchal Agarwal
2020-09-15  0:24       ` boris.ostrovsky
2020-09-15 18:00         ` Anchal Agarwal
2020-09-15 19:58           ` boris.ostrovsky
2020-09-21 21:54             ` Anchal Agarwal
2020-09-22 16:18               ` boris.ostrovsky
2020-09-22 23:17                 ` Anchal Agarwal
2020-09-25 19:04                   ` Anchal Agarwal
2020-09-25 20:02                     ` boris.ostrovsky
2020-09-25 22:28                       ` Anchal Agarwal
2020-09-28 18:49                         ` boris.ostrovsky
2020-09-30 21:29                           ` Anchal Agarwal [this message]
2020-10-01 12:43                             ` boris.ostrovsky
2021-05-21  5:26                               ` Anchal Agarwal
2021-05-25 22:23                                 ` Boris Ostrovsky
2021-05-26  4:40                                   ` Anchal Agarwal
2021-05-26 18:29                                     ` Boris Ostrovsky
2021-05-28 21:50                                       ` Anchal Agarwal
2021-06-01 14:18                                         ` Boris Ostrovsky
2021-06-02 19:37                                           ` Anchal Agarwal
2021-06-03 20:11                                             ` Boris Ostrovsky
2021-06-03 23:27                                               ` Anchal Agarwal
2021-06-04  1:49                                                 ` Boris Ostrovsky
2020-09-13 17:07   ` boris.ostrovsky
2020-08-21 22:26 ` [PATCH v3 02/11] xenbus: add freeze/thaw/restore callbacks support Anchal Agarwal
2020-09-13 16:11   ` boris.ostrovsky
2020-09-15 19:56     ` Anchal Agarwal
2020-08-21 22:26 ` [PATCH v3 03/11] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume Anchal Agarwal
2020-08-21 22:27 ` [PATCH v3 04/11] x86/xen: add system core suspend and resume callbacks Anchal Agarwal
2020-09-13 17:25   ` boris.ostrovsky
2020-08-21 22:27 ` [PATCH v3 05/11] genirq: Shutdown irq chips in suspend/resume during hibernation Thomas Gleixner
2020-08-22  0:36   ` Thomas Gleixner
2020-08-24 17:25     ` Anchal Agarwal
2020-08-25 13:20     ` Christoph Hellwig
2020-08-25 15:25       ` Thomas Gleixner
2020-08-21 22:28 ` [PATCH v3 06/11] xen-blkfront: add callbacks for PM suspend and hibernation Anchal Agarwal
2020-08-21 22:29 ` [PATCH v3 07/11] xen-netfront: " Anchal Agarwal
2020-08-21 22:29 ` [PATCH v3 08/11] x86/xen: save and restore steal clock during PM hibernation Anchal Agarwal
2020-08-21 22:30 ` [PATCH v3 09/11] xen: Introduce wrapper for save/restore sched clock offset Anchal Agarwal
2020-08-21 22:30 ` [PATCH v3 10/11] xen: Update sched clock offset to avoid system instability in hibernation Anchal Agarwal
2020-09-13 17:52   ` boris.ostrovsky
2020-08-21 22:31 ` [PATCH v3 11/11] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA Anchal Agarwal
2020-08-28 18:26 ` [PATCH v3 00/11] Fix PM hibernation in Xen guests Anchal Agarwal
2020-08-28 18:29   ` Rafael J. Wysocki
2020-08-28 18:39     ` Anchal Agarwal
2020-09-11 20:44       ` Anchal Agarwal
2020-09-11 15:19 ` boris.ostrovsky
