All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anchal Agarwal <anchalag@amazon.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Singh, Balbir" <sblbir@amazon.com>,
	"Valentin, Eduardo" <eduval@amazon.com>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"pavel@ucw.cz" <pavel@ucw.cz>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"x86@kernel.org" <x86@kernel.org>,
	"roger.pau@citrix.com" <roger.pau@citrix.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"Kamata, Munehisa" <kamatam@amazon.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"konrad.wilk@oracle.co" <konrad.wilk@oracle.com>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"fllinden@amaozn.com" <fllinden@amazon.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	<anchalag@amazon.com>
Subject: Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation
Date: Wed, 22 Jan 2020 20:07:10 +0000	[thread overview]
Message-ID: <20200122200710.GA3071@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com> (raw)
In-Reply-To: <20200114192952.GA26755@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>

On Tue, Jan 14, 2020 at 07:29:52PM +0000, Anchal Agarwal wrote:
> On Tue, Jan 14, 2020 at 12:30:02AM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 13, 2020 at 10:50 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Mon, Jan 13, 2020 at 11:43:18AM +0000, Singh, Balbir wrote:
> > > > > For your original comment, just wanted to clarify the following:
> > > > >
> > > > > 1. After hibernation, the machine can be resumed on a different but compatible
> > > > > host (these are VM images hibernated)
> > > > > 2. This means the clock between host1 and host2 can/will be different
> > > > >
> > > > > In your comments are you making the assumption that the host(s) is/are the
> > > > > same? Just checking the assumptions being made and being on the same page with
> > > > > them.
> > > >
> > > > I would expect this to be the same problem we have as regular suspend,
> > > > after power off the TSC will have been reset, so resume will have to
> > > > somehow bridge that gap. I've no idea if/how it does that.
> > >
> > > In general, this is done by timekeeping_resume() and the only special
> > > thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
> > > call in tsc_resume().
> > 
> > And I forgot about tsc_restore_sched_clock_state() that gets called
> > via restore_processor_state() on x86, before calling
> > timekeeping_resume().
> >
> In this case tsc_verify_tsc_adjust(true) this does nothing as
> feature bit X86_FEATURE_TSC_ADJUST is not available to guest. 
> I am no expert in this area, but could this be messing things up?
> 
> Thanks,
> Anchal
Gentle nudge on this. I will add more data here in case that helps.

1. Before this patch, tsc is stable but hibernation does not work
100% of the time. I agree if tsc is stable it should not be marked
unstable however, in this case if I run a cpu intensive workload
in the background and trigger reboot-hibernation loop I see a 
workqueue lockup. 

2. The lockup does not hose the system completely,
the reboot-hibernation carries out and system recovers. 
However, as mentioned in the commit message system does 
become unreachable for couple of seconds.

3. Xen suspend/resume seems to save/restore time_memory area in its
xen_arch_pre_suspend and xen_arch_post_suspend. The xen clock value
is saved. xen_sched_clock_offset is set at resume time to ensure a
monotonic clock value

4. Also, the instances do not have InvariantTSC exposed. Feature bit
X86_FEATURE_TSC_ADJUST is not available to guest and xen clocksource
is used by guests.

I am not sure if something needs to be fixed on hibernate path itself
or its very much ties to time handling on xen guest hibernation

Here is a part of log from last hibernation exit to next hibernation
entry. The loop was running for a while so boot to lockup log will be
huge. I am specifically including the timestamps.

...
01h 57m 15.627s(  16ms): [    5.822701] OOM killer enabled.
01h 57m 15.627s(   0ms): [    5.824981] Restarting tasks ... done.
01h 57m 15.627s(   0ms): [    5.836397] PM: hibernation exit
01h 57m 17.636s(2009ms): [    7.844471] PM: hibernation entry
01h 57m 52.725s(35089ms): [   42.934542] BUG: workqueue lockup - pool cpus=0
node=0 flags=0x0 nice=0 stuck for 37s!
01h 57m 52.730s(   5ms): [   42.941468] Showing busy workqueues and worker
pools:
01h 57m 52.734s(   4ms): [   42.945088] workqueue events: flags=0x0
01h 57m 52.737s(   3ms): [   42.948385]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=2/256
01h 57m 52.742s(   5ms): [   42.952838]     pending: vmstat_shepherd,
check_corruption
01h 57m 52.746s(   4ms): [   42.956927] workqueue events_power_efficient:
flags=0x80
01h 57m 52.749s(   3ms): [   42.960731]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=4/256
01h 57m 52.754s(   5ms): [   42.964835]     pending: neigh_periodic_work,
do_cache_clean [sunrpc], neigh_periodic_work, check_lifetime
01h 57m 52.781s(  27ms): [   42.971419] workqueue mm_percpu_wq: flags=0x8
01h 57m 52.781s(   0ms): [   42.974628]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.781s(   0ms): [   42.978901]     pending: vmstat_update
01h 57m 52.781s(   0ms): [   42.981822] workqueue ipv6_addrconf: flags=0x40008
01h 57m 52.781s(   0ms): [   42.985524]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/1
01h 57m 52.781s(   0ms): [   42.989670]     pending: addrconf_verify_work [ipv6]
01h 57m 52.782s(   1ms): [   42.993282] workqueue xfs-conv/xvda1: flags=0xc
01h 57m 52.786s(   4ms): [   42.996708]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=3/256
01h 57m 52.790s(   4ms): [   43.000954]     pending: xfs_end_io [xfs],
xfs_end_io [xfs], xfs_end_io [xfs]
01h 57m 52.795s(   5ms): [   43.005610] workqueue xfs-reclaim/xvda1: flags=0xc
01h 57m 52.798s(   3ms): [   43.008945]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.802s(   4ms): [   43.012675]     pending: xfs_reclaim_worker [xfs]
01h 57m 52.805s(   3ms): [   43.015741] workqueue xfs-sync/xvda1: flags=0x4
01h 57m 52.808s(   3ms): [   43.018723]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.811s(   3ms): [   43.022436]     pending: xfs_log_worker [xfs]
01h 57m 52.814s(   3ms): [   43.043519] Filesystems sync: 35.234 seconds
01h 57m 52.837s(  23ms): [   43.048133] Freezing user space processes ...
(elapsed 0.001 seconds) done.
01h 57m 52.844s(   7ms): [   43.055996] OOM killer disabled.
01h 57m 53.838s( 994ms): [   43.061512] PM: Preallocating image memory... done
(allocated 385859 pages)
01h 57m 53.843s(   5ms): [   44.054720] PM: Allocated 1543436 kbytes in 1.06
seconds (1456.07 MB/s)
01h 57m 53.861s(  18ms): [   44.060885] Freezing remaining freezable tasks ...
(elapsed 0.001 seconds) done.
01h 57m 53.861s(   0ms): [   44.069715] printk: Suspending console(s) (use
no_console_suspend to debug)
01h 57m 56.278s(2417ms): [   44.116601] Disabling non-boot CPUs ...
.....
hibernate-resume loop continues after this. As mentioned above, I loose
connectivity for a while.


Thanks,
Anchal


WARNING: multiple messages have this Message-ID (diff)
From: Anchal Agarwal <anchalag@amazon.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Kamata, Munehisa" <kamatam@amazon.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"pavel@ucw.cz" <pavel@ucw.cz>, "hpa@zytor.com" <hpa@zytor.com>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	"Valentin, Eduardo" <eduval@amazon.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Singh, Balbir" <sblbir@amazon.com>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	anchalag@amazon.com, "bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"konrad.wilk@oracle.co" <konrad.wilk@oracle.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"fllinden@amaozn.com" <fllinden@amazon.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"Woodhouse,  David" <dwmw@amazon.co.uk>,
	"roger.pau@citrix.com" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation
Date: Wed, 22 Jan 2020 20:07:10 +0000	[thread overview]
Message-ID: <20200122200710.GA3071@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com> (raw)
In-Reply-To: <20200114192952.GA26755@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>

On Tue, Jan 14, 2020 at 07:29:52PM +0000, Anchal Agarwal wrote:
> On Tue, Jan 14, 2020 at 12:30:02AM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 13, 2020 at 10:50 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Mon, Jan 13, 2020 at 11:43:18AM +0000, Singh, Balbir wrote:
> > > > > For your original comment, just wanted to clarify the following:
> > > > >
> > > > > 1. After hibernation, the machine can be resumed on a different but compatible
> > > > > host (these are VM images hibernated)
> > > > > 2. This means the clock between host1 and host2 can/will be different
> > > > >
> > > > > In your comments are you making the assumption that the host(s) is/are the
> > > > > same? Just checking the assumptions being made and being on the same page with
> > > > > them.
> > > >
> > > > I would expect this to be the same problem we have as regular suspend,
> > > > after power off the TSC will have been reset, so resume will have to
> > > > somehow bridge that gap. I've no idea if/how it does that.
> > >
> > > In general, this is done by timekeeping_resume() and the only special
> > > thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
> > > call in tsc_resume().
> > 
> > And I forgot about tsc_restore_sched_clock_state() that gets called
> > via restore_processor_state() on x86, before calling
> > timekeeping_resume().
> >
> In this case tsc_verify_tsc_adjust(true) this does nothing as
> feature bit X86_FEATURE_TSC_ADJUST is not available to guest. 
> I am no expert in this area, but could this be messing things up?
> 
> Thanks,
> Anchal
Gentle nudge on this. I will add more data here in case that helps.

1. Before this patch, tsc is stable but hibernation does not work
100% of the time. I agree if tsc is stable it should not be marked
unstable however, in this case if I run a cpu intensive workload
in the background and trigger reboot-hibernation loop I see a 
workqueue lockup. 

2. The lockup does not hose the system completely,
the reboot-hibernation carries out and system recovers. 
However, as mentioned in the commit message system does 
become unreachable for couple of seconds.

3. Xen suspend/resume seems to save/restore time_memory area in its
xen_arch_pre_suspend and xen_arch_post_suspend. The xen clock value
is saved. xen_sched_clock_offset is set at resume time to ensure a
monotonic clock value

4. Also, the instances do not have InvariantTSC exposed. Feature bit
X86_FEATURE_TSC_ADJUST is not available to guest and xen clocksource
is used by guests.

I am not sure if something needs to be fixed on hibernate path itself
or its very much ties to time handling on xen guest hibernation

Here is a part of log from last hibernation exit to next hibernation
entry. The loop was running for a while so boot to lockup log will be
huge. I am specifically including the timestamps.

...
01h 57m 15.627s(  16ms): [    5.822701] OOM killer enabled.
01h 57m 15.627s(   0ms): [    5.824981] Restarting tasks ... done.
01h 57m 15.627s(   0ms): [    5.836397] PM: hibernation exit
01h 57m 17.636s(2009ms): [    7.844471] PM: hibernation entry
01h 57m 52.725s(35089ms): [   42.934542] BUG: workqueue lockup - pool cpus=0
node=0 flags=0x0 nice=0 stuck for 37s!
01h 57m 52.730s(   5ms): [   42.941468] Showing busy workqueues and worker
pools:
01h 57m 52.734s(   4ms): [   42.945088] workqueue events: flags=0x0
01h 57m 52.737s(   3ms): [   42.948385]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=2/256
01h 57m 52.742s(   5ms): [   42.952838]     pending: vmstat_shepherd,
check_corruption
01h 57m 52.746s(   4ms): [   42.956927] workqueue events_power_efficient:
flags=0x80
01h 57m 52.749s(   3ms): [   42.960731]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=4/256
01h 57m 52.754s(   5ms): [   42.964835]     pending: neigh_periodic_work,
do_cache_clean [sunrpc], neigh_periodic_work, check_lifetime
01h 57m 52.781s(  27ms): [   42.971419] workqueue mm_percpu_wq: flags=0x8
01h 57m 52.781s(   0ms): [   42.974628]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.781s(   0ms): [   42.978901]     pending: vmstat_update
01h 57m 52.781s(   0ms): [   42.981822] workqueue ipv6_addrconf: flags=0x40008
01h 57m 52.781s(   0ms): [   42.985524]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/1
01h 57m 52.781s(   0ms): [   42.989670]     pending: addrconf_verify_work [ipv6]
01h 57m 52.782s(   1ms): [   42.993282] workqueue xfs-conv/xvda1: flags=0xc
01h 57m 52.786s(   4ms): [   42.996708]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=3/256
01h 57m 52.790s(   4ms): [   43.000954]     pending: xfs_end_io [xfs],
xfs_end_io [xfs], xfs_end_io [xfs]
01h 57m 52.795s(   5ms): [   43.005610] workqueue xfs-reclaim/xvda1: flags=0xc
01h 57m 52.798s(   3ms): [   43.008945]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.802s(   4ms): [   43.012675]     pending: xfs_reclaim_worker [xfs]
01h 57m 52.805s(   3ms): [   43.015741] workqueue xfs-sync/xvda1: flags=0x4
01h 57m 52.808s(   3ms): [   43.018723]   pwq 0: cpus=0 node=0 flags=0x0 nice=0
active=1/256
01h 57m 52.811s(   3ms): [   43.022436]     pending: xfs_log_worker [xfs]
01h 57m 52.814s(   3ms): [   43.043519] Filesystems sync: 35.234 seconds
01h 57m 52.837s(  23ms): [   43.048133] Freezing user space processes ...
(elapsed 0.001 seconds) done.
01h 57m 52.844s(   7ms): [   43.055996] OOM killer disabled.
01h 57m 53.838s( 994ms): [   43.061512] PM: Preallocating image memory... done
(allocated 385859 pages)
01h 57m 53.843s(   5ms): [   44.054720] PM: Allocated 1543436 kbytes in 1.06
seconds (1456.07 MB/s)
01h 57m 53.861s(  18ms): [   44.060885] Freezing remaining freezable tasks ...
(elapsed 0.001 seconds) done.
01h 57m 53.861s(   0ms): [   44.069715] printk: Suspending console(s) (use
no_console_suspend to debug)
01h 57m 56.278s(2417ms): [   44.116601] Disabling non-boot CPUs ...
.....
hibernate-resume loop continues after this. As mentioned above, I loose
connectivity for a while.


Thanks,
Anchal


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2020-01-22 20:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-07 23:45 [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation Anchal Agarwal
2020-01-07 23:45 ` [Xen-devel] " Anchal Agarwal
2020-01-08 10:50 ` Peter Zijlstra
2020-01-08 10:50   ` [Xen-devel] " Peter Zijlstra
2020-01-10 15:35   ` Eduardo Valentin
2020-01-10 15:35     ` [Xen-devel] " Eduardo Valentin
2020-01-13 10:16     ` Peter Zijlstra
2020-01-13 10:16       ` [Xen-devel] " Peter Zijlstra
2020-01-13 11:43       ` Singh, Balbir
2020-01-13 11:43         ` [Xen-devel] " Singh, Balbir
2020-01-13 11:43         ` Singh, Balbir
2020-01-13 11:48         ` Rafael J. Wysocki
2020-01-13 11:48           ` [Xen-devel] " Rafael J. Wysocki
2020-01-13 11:48           ` Rafael J. Wysocki
2020-01-13 12:42         ` Peter Zijlstra
2020-01-13 12:42           ` [Xen-devel] " Peter Zijlstra
2020-01-13 12:42           ` Peter Zijlstra
2020-01-13 21:50           ` Rafael J. Wysocki
2020-01-13 21:50             ` [Xen-devel] " Rafael J. Wysocki
2020-01-13 21:50             ` Rafael J. Wysocki
2020-01-13 23:30             ` Rafael J. Wysocki
2020-01-13 23:30               ` [Xen-devel] " Rafael J. Wysocki
2020-01-13 23:30               ` Rafael J. Wysocki
2020-01-14 19:29               ` Anchal Agarwal
2020-01-14 19:29                 ` [Xen-devel] " Anchal Agarwal
2020-01-22 20:07                 ` Anchal Agarwal [this message]
2020-01-22 20:07                   ` Anchal Agarwal
2020-01-23 16:27                   ` Boris Ostrovsky
2020-01-23 16:27                     ` [Xen-devel] " Boris Ostrovsky
2020-01-13 13:01         ` Andrew Cooper
2020-01-13 13:01           ` Andrew Cooper
2020-01-13 13:01           ` Andrew Cooper
2020-01-13 13:54           ` David Woodhouse
2020-01-13 13:54             ` David Woodhouse
2020-01-13 13:54             ` David Woodhouse
2020-01-13 15:02           ` Singh, Balbir
2020-01-13 15:02             ` Singh, Balbir
2020-01-13 15:02             ` Singh, Balbir

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200122200710.GA3071@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com \
    --to=anchalag@amazon.com \
    --cc=axboe@kernel.dk \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.co.uk \
    --cc=eduval@amazon.com \
    --cc=fllinden@amazon.com \
    --cc=hpa@zytor.com \
    --cc=kamatam@amazon.com \
    --cc=konrad.wilk@oracle.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=roger.pau@citrix.com \
    --cc=sblbir@amazon.com \
    --cc=sstabellini@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.