Netdev Archive on lore.kernel.org
From: "Singh, Balbir" <sblbir@amazon.com>
To: "peterz@infradead.org" <peterz@infradead.org>,
	"Valentin, Eduardo" <eduval@amazon.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Cc: "boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Agarwal, Anchal" <anchalag@amazon.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"Woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com" 
	<Woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"pavel@ucw.cz" <pavel@ucw.cz>,
	"jgross@suse.com" <jgross@suse.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"x86@kernel.org" <x86@kernel.org>,
	"roger.pau@citrix.com" <roger.pau@citrix.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"fllinden@amaozn.com" <fllinden@amaozn.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"Kamata, Munehisa" <kamatam@amazon.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"konrad.wilk@oracle.co" <konrad.wilk@oracle.co>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation
Date: Mon, 13 Jan 2020 15:02:17 +0000
Message-ID: <34c5d10f1df00345ff7ab2ba91d38a32967b3bce.camel@amazon.com> (raw)
In-Reply-To: <7bb967ca-2a91-6397-9c0a-6eafd43c83ed@citrix.com>

On Mon, 2020-01-13 at 13:01 +0000, Andrew Cooper wrote:
> On 13/01/2020 11:43, Singh, Balbir wrote:
> > On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:
> > > > Hey Peter,
> > > > 
> > > > On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:
> > > > > On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:
> > > > > > From: Eduardo Valentin <eduval@amazon.com>
> > > > > > 
> > > > > > > System instability is seen during resume from hibernation when the
> > > > > > > system is under heavy CPU load. This is due to the sched clock data
> > > > > > > not being updated, so the scheduler thinks that heavy CPU hog
> > > > > > > tasks need more CPU time, causing the system to freeze
> > > > > > during the unfreezing of tasks. For example, threaded irqs,
> > > > > > and kernel processes servicing network interface may be delayed
> > > > > > for several tens of seconds, causing the system to be unreachable.
> > > > > > The fix for this situation is to mark the sched clock as unstable
> > > > > > as early as possible in the resume path, leaving it unstable
> > > > > > for the duration of the resume process. This will force the
> > > > > > scheduler to attempt to align the sched clock across CPUs using
> > > > > > the delta with time of day, updating sched clock data. In a post
> > > > > > hibernation event, we can then mark the sched clock as stable
> > > > > > again, avoiding unnecessary syncs with time of day on systems
> > > > > > in which TSC is reliable.
> > > > > 
> > > > > This makes no frigging sense what so bloody ever. If the clock is
> > > > > stable, we don't care about sched_clock_data. When it is stable you
> > > > > > get a linear function of the TSC without complicated bits on.
> > > > > 
> > > > > When it is unstable, only then do we care about the
> > > > > sched_clock_data.
> > > > > 
> > > > 
> > > > > Yeah, maybe what is not clear here is that we are covering for the
> > > > > situation where clock stability changes over time, e.g. at regular boot
> > > > > the clock is stable, hibernation happens, then restore happens on a
> > > > > non-stable clock.
> > > 
> > > Still confused, who marks the thing unstable? The patch seems to suggest
> > > you do yourself, but it is not at all clear why.
> > > 
> > > If TSC really is unstable, then it needs to remain unstable. If the TSC
> > > > really is stable then there is no point in marking it unstable.
> > > 
> > > Either way something is off, and you're not telling me what.
> > > 
> > 
> > Hi, Peter
> > 
> > For your original comment, just wanted to clarify the following:
> > 
> > > 1. After hibernation, the machine can be resumed on a different but
> > > compatible host (these are hibernated VM images)
> > > 2. This means the clock between host1 and host2 can/will be different
> 
> The guest's TSC value is part of all save/migrate/resume state.  Given
> this bug, I presume you've actually discarded all register state on
> hibernate, and the TSC is starting again from 0?
> 
> The frequency of the new TSC might very likely be different, but the
> scale/offset in the paravirtual clock information should let Linux's
> view of time stay consistent.
> 

I was looking for my old dmesg logs to revalidate this, but I seem to have
lost them. I think Eduardo had a different point, though. I should point out
that I was only adding to the list of potentially missed assumptions.


> > In your comments are you making the assumption that the host(s) is/are the
> > same? Just checking the assumptions being made and being on the same page
> > with them.
> 
> TSCs are a massive source of "fun".  I'm not surprised that there are
> yet more bugs around.
> 
> Does anyone actually know what does/should happen to the real TSC on
> native S4?  The default course of action should be for virtualisation to
> follow suit.
> 
> ~Andrew

Balbir
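
As a rough illustration of the scale/offset point Andrew raises above, here
is a minimal Python model of how a guest can derive nanoseconds from the TSC
using paravirtual clock parameters. This is a sketch, not kernel code. The
field names follow struct pvclock_vcpu_time_info (tsc_timestamp, system_time,
tsc_to_system_mul, tsc_shift); make_info() and the host frequencies used
below are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PvclockInfo:
    tsc_timestamp: int      # guest TSC value when the host filled in this record
    system_time: int        # guest system time (ns) at tsc_timestamp
    tsc_to_system_mul: int  # 32-bit fixed-point multiplier (fraction of 2**32)
    tsc_shift: int          # signed shift applied to the TSC delta before scaling


def scale_delta(delta: int, mul: int, shift: int) -> int:
    """Convert a TSC delta to nanoseconds: shift first, then multiply by mul / 2**32."""
    delta = delta << shift if shift >= 0 else delta >> -shift
    return (delta * mul) >> 32


def pvclock_read_ns(info: PvclockInfo, tsc: int) -> int:
    """Guest view of time: offset plus the scaled TSC delta."""
    delta = tsc - info.tsc_timestamp
    return info.system_time + scale_delta(delta, info.tsc_to_system_mul, info.tsc_shift)


def make_info(tsc_hz: int, tsc_now: int, system_time_ns: int) -> PvclockInfo:
    """Hypothetical helper: what a host could publish for a given TSC frequency."""
    ratio = Fraction(10**9, tsc_hz)  # nanoseconds per TSC tick
    shift = 0
    # Normalise ratio into [0.5, 1) so the multiplier fits in 32 bits.
    while ratio >= 1:
        ratio /= 2
        shift += 1
    while ratio < Fraction(1, 2):
        ratio *= 2
        shift -= 1
    return PvclockInfo(tsc_now, system_time_ns, int(ratio * 2**32), shift)


# host1: 2.5 GHz TSC, guest runs for 10 seconds and then hibernates.
host1 = make_info(2_500_000_000, tsc_now=10**12, system_time_ns=0)
t_hibernate = pvclock_read_ns(host1, 10**12 + 25 * 10**9)

# host2: 3 GHz TSC with an unrelated TSC value; it publishes a new
# scale/offset anchored at the time the guest had reached on host1.
host2 = make_info(3_000_000_000, tsc_now=1000, system_time_ns=t_hibernate)
t_resumed = pvclock_read_ns(host2, 1000)
t_later = pvclock_read_ns(host2, 1000 + 15 * 10**9)  # 5 seconds after resume
```

In this model the guest's time continues from t_hibernate on host2 even
though both the TSC frequency and the raw TSC value changed, provided the
guest picks up the new parameters before it next reads the clock.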


Thread overview: 15+ messages
2020-01-07 23:45 Anchal Agarwal
2020-01-08 10:50 ` Peter Zijlstra
2020-01-10 15:35   ` Eduardo Valentin
2020-01-13 10:16     ` Peter Zijlstra
2020-01-13 11:43       ` Singh, Balbir
2020-01-13 11:48         ` Rafael J. Wysocki
2020-01-13 12:42         ` Peter Zijlstra
2020-01-13 21:50           ` Rafael J. Wysocki
2020-01-13 23:30             ` Rafael J. Wysocki
2020-01-14 19:29               ` Anchal Agarwal
2020-01-22 20:07                 ` Anchal Agarwal
2020-01-23 16:27                   ` Boris Ostrovsky
2020-01-13 13:01         ` [Xen-devel] " Andrew Cooper
2020-01-13 13:54           ` David Woodhouse
2020-01-13 15:02           ` Singh, Balbir [this message]
