From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Singh, Balbir" <sblbir@amazon.com>,
    "peterz@infradead.org" <peterz@infradead.org>,
    "Valentin, Eduardo" <eduval@amazon.com>
Cc: "konrad.wilk@oracle.co" <konrad.wilk@oracle.co>,
    "x86@kernel.org" <x86@kernel.org>,
    "len.brown@intel.com" <len.brown@intel.com>,
    "linux-mm@kvack.org" <linux-mm@kvack.org>,
    "pavel@ucw.cz" <pavel@ucw.cz>,
    "hpa@zytor.com" <hpa@zytor.com>,
    "boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
    "sstabellini@kernel.org" <sstabellini@kernel.org>,
    "fllinden@amaozn.com" <fllinden@amaozn.com>,
    "Kamata, Munehisa" <kamatam@amazon.com>,
    "mingo@redhat.com" <mingo@redhat.com>,
    "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
    "axboe@kernel.dk" <axboe@kernel.dk>,
    "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
    "Agarwal, Anchal" <anchalag@amazon.com>,
    "bp@alien8.de" <bp@alien8.de>,
    "tglx@linutronix.de" <tglx@linutronix.de>,
    "jgross@suse.com" <jgross@suse.com>,
    "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
    "Woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com" <Woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>,
    "rjw@rjwysocki.net" <rjw@rjwysocki.net>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    "vkuznets@redhat.com" <vkuznets@redhat.com>,
    "davem@davemloft.net" <davem@davemloft.net>,
    "Woodhouse, David" <dwmw@amazon.co.uk>,
    "roger.pau@citrix.com" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation
Date: Mon, 13 Jan 2020 13:01:02 +0000
Message-ID: <7bb967ca-2a91-6397-9c0a-6eafd43c83ed@citrix.com>
In-Reply-To: <857b42b2e86b2ae09a23f488daada3b1b2836116.camel@amazon.com>

On 13/01/2020 11:43, Singh, Balbir wrote:
> On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:
>>> Hey Peter,
>>>
>>> On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:
>>>> On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:
>>>>> From: Eduardo Valentin <eduval@amazon.com>
>>>>>
>>>>> System instability are seen during resume from hibernation when system
>>>>> is under heavy CPU load. This is due to the lack of update of sched
>>>>> clock data, and the scheduler would then think that heavy CPU hog
>>>>> tasks need more time in CPU, causing the system to freeze
>>>>> during the unfreezing of tasks. For example, threaded irqs,
>>>>> and kernel processes servicing network interface may be delayed
>>>>> for several tens of seconds, causing the system to be unreachable.
>>>>> The fix for this situation is to mark the sched clock as unstable
>>>>> as early as possible in the resume path, leaving it unstable
>>>>> for the duration of the resume process. This will force the
>>>>> scheduler to attempt to align the sched clock across CPUs using
>>>>> the delta with time of day, updating sched clock data. In a post
>>>>> hibernation event, we can then mark the sched clock as stable
>>>>> again, avoiding unnecessary syncs with time of day on systems
>>>>> in which TSC is reliable.
>>>> This makes no frigging sense what so bloody ever. If the clock is
>>>> stable, we don't care about sched_clock_data. When it is stable you get
>>>> a linear function of the TSC without complicated bits on.
>>>>
>>>> When it is unstable, only then do we care about the sched_clock_data.
>>>>
>>> Yeah, maybe what is not clear here is that we covering for situation
>>> where clock stability changes over time, e.g. at regular boot clock is
>>> stable, hibernation happens, then restore happens in a non-stable clock.
>> Still confused, who marks the thing unstable? The patch seems to suggest
>> you do yourself, but it is not at all clear why.
>>
>> If TSC really is unstable, then it needs to remain unstable. If the TSC
>> really is stable then there is no point in marking is unstable.
>>
>> Either way something is off, and you're not telling me what.
>>
> Hi, Peter
>
> For your original comment, just wanted to clarify the following:
>
> 1. After hibernation, the machine can be resumed on a different but compatible
> host (these are VM images hibernated)
> 2. This means the clock between host1 and host2 can/will be different

The guest's TSC value is part of all save/migrate/resume state. Given
this bug, I presume you've actually discarded all register state on
hibernate, and the TSC is starting again from 0?

The frequency of the new TSC might very likely be different, but the
scale/offset in the paravirtual clock information should let Linux's
view of time stay consistent.

> In your comments are you making the assumption that the host(s) is/are the
> same? Just checking the assumptions being made and being on the same page with
> them.

TSCs are a massive source of "fun". I'm not surprised that there are
yet more bugs around.

Does anyone actually know what does/should happen to the real TSC on
native S4? The default course of action should be for virtualisation
to follow suit.

~Andrew
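The point about the paravirtual clock's scale/offset can be made concrete. The following is a minimal sketch, assuming the pvclock scheme used by Xen and KVM guests, where the hypervisor publishes a TSC timestamp, a system time in nanoseconds, a 32-bit binary-fraction multiplier and a power-of-two shift, and the guest computes `system_time + (((tsc - tsc_timestamp) << shift) * mul >> 32)`. The struct and function names are hypothetical simplifications modelled on Xen's `vcpu_time_info`; the version/flags fields and the retry loop a real reader needs are omitted, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified copy of the per-vCPU time record a hypervisor
 * publishes (field names modelled on Xen's vcpu_time_info; version and
 * flags are left out of this sketch). */
struct pv_time_params {
    uint64_t tsc_timestamp;     /* guest TSC value when the record was written  */
    uint64_t system_time;       /* guest system time (ns) at tsc_timestamp      */
    uint32_t tsc_to_system_mul; /* ns per (shifted) tick, as mul / 2^32         */
    int8_t   tsc_shift;         /* power-of-two pre-scale applied to the delta  */
};

/* Convert a TSC delta to nanoseconds: apply the shift, multiply by the
 * 32-bit fraction, and keep bits [32, 96) of the product. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* Guest time in ns for a raw TSC reading taken after the record was
 * published (an earlier reading would wrap the unsigned subtraction). */
static uint64_t pv_time_ns(const struct pv_time_params *p, uint64_t tsc)
{
    assert(tsc >= p->tsc_timestamp);
    return p->system_time + scale_delta(tsc - p->tsc_timestamp,
                                        p->tsc_to_system_mul, p->tsc_shift);
}
```

For example, a record for a 2 GHz host with `tsc_to_system_mul = 0x80000000` (0.5 ns per tick) and `tsc_shift = 0` maps 2,000,000,000 ticks to 1,000,000,000 ns. If the guest resumes on a 1 GHz host whose TSC has restarted near 0, a hypervisor could publish `tsc_timestamp = 500`, `system_time = 1000000000` and `tsc_shift = 1` with the same multiplier; a reading of 1500 then yields 1,000,001,000 ns, so guest time keeps moving forward although the raw TSC went backwards. Whether the parameters are actually refreshed this way across the hibernate/resume path is the open question in this thread.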