From: Anchal Agarwal <anchalag@amazon.com>
To: <tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
	<hpa@zytor.com>, <x86@kernel.org>, <boris.ostrovsky@oracle.com>,
	<jgross@suse.com>, <linux-pm@vger.kernel.org>, <linux-mm@kvack.org>,
	<kamatam@amazon.com>, <sstabellini@kernel.org>, <konrad.wilk@oracle.co>,
	<roger.pau@citrix.com>, <axboe@kernel.dk>, <davem@davemloft.net>,
	<rjw@rjwysocki.net>, <len.brown@intel.com>, <pavel@ucw.cz>,
	<peterz@infradead.org>, <eduval@amazon.com>, <sblbir@amazon.com>,
	<anchalag@amazon.com>, <xen-devel@lists.xenproject.org>,
	<vkuznets@redhat.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	<Woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>,
	<dwmw@amazon.co.uk>, <fllinden@amaozn.com>
Cc: <anchalag@amazon.com>
Subject: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation
Date: Tue, 7 Jan 2020 23:45:26 +0000
Message-ID: <20200107234526.GA19034@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>

From: Eduardo Valentin <eduval@amazon.com>

System instability is seen during resume from hibernation when the system
is under heavy CPU load. This is due to the lack of update of sched
clock data: the scheduler then thinks that heavy CPU hog tasks need more
time on the CPU, causing the system to freeze during the unfreezing of
tasks. For example, threaded irqs and kernel processes servicing network
interfaces may be delayed for several tens of seconds, causing the system
to be unreachable.

Situations like this can be reported by lockup detectors such as the
workqueue lockup detector:

[root@ip-172-31-67-114 ec2-user]# echo disk > /sys/power/state

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:29:06 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!

The fix for this situation is to mark the sched clock as unstable
as early as possible in the resume path, leaving it unstable
for the duration of the resume process. This forces the
scheduler to attempt to align the sched clock across CPUs using
the delta with time of day, updating sched clock data. In a
post-hibernation event, we can then mark the sched clock as stable
again, avoiding unnecessary syncs with time of day on systems
in which the TSC is reliable.

Reviewed-by: Erik Quanstrom <quanstro@amazon.com>
Reviewed-by: Frank van der Linden <fllinden@amazon.com>
Reviewed-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Munehisa Kamata <kamatam@amazon.com>
Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Eduardo Valentin <eduval@amazon.com>
---
 arch/x86/kernel/tsc.c       | 29 +++++++++++++++++++++++++++++
 include/linux/sched/clock.h |  5 +++++
 kernel/sched/clock.c        |  4 ++--
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7e322e2daaf5..ae77b8bc4e46 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -14,6 +14,7 @@
 #include <linux/percpu.h>
 #include <linux/timex.h>
 #include <linux/static_key.h>
+#include <linux/suspend.h>
 
 #include <asm/hpet.h>
 #include <asm/timer.h>
@@ -1534,3 +1535,31 @@ unsigned long calibrate_delay_is_known(void)
 	return 0;
 }
 #endif
+
+static int tsc_pm_notifier(struct notifier_block *notifier,
+			    unsigned long pm_event, void *unused)
+{
+	switch (pm_event) {
+	case PM_HIBERNATION_PREPARE:
+		clear_sched_clock_stable();
+		break;
+	case PM_POST_HIBERNATION:
+		/* Set back to the default */
+		if (!check_tsc_unstable())
+			set_sched_clock_stable();
+		break;
+	}
+
+	return 0;
+};
+
+static struct notifier_block tsc_pm_notifier_block = {
+	.notifier_call = tsc_pm_notifier,
+};
+
+static int tsc_setup_pm_notifier(void)
+{
+	return register_pm_notifier(&tsc_pm_notifier_block);
+}
+
+subsys_initcall(tsc_setup_pm_notifier);
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index 867d588314e0..902654ac5f7e 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -32,6 +32,10 @@ static inline void clear_sched_clock_stable(void)
 {
 }
 
+static inline void set_sched_clock_stable(void)
+{
+}
+
 static inline void sched_clock_idle_sleep_event(void)
 {
 }
@@ -51,6 +55,7 @@ static inline u64 local_clock(void)
 }
 #else
 extern int sched_clock_stable(void);
+extern void set_sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
 /*
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 1152259a4ca0..374d40e5b1a2 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -116,7 +116,7 @@ static void __scd_stamp(struct sched_clock_data *scd)
 	scd->tick_raw = sched_clock();
 }
 
-static void __set_sched_clock_stable(void)
+void set_sched_clock_stable(void)
 {
 	struct sched_clock_data *scd;
 
@@ -236,7 +236,7 @@ static int __init sched_clock_init_late(void)
 	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
 
 	if (__sched_clock_stable_early)
-		__set_sched_clock_stable();
+		set_sched_clock_stable();
 
 	return 0;
 }
-- 
2.15.3.AMZN
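
For readers who want to reason about the notifier outside a kernel tree, the following is a minimal userspace sketch of the state transitions the patch describes. It is not kernel code: the `sim_` names, the `struct sim_clock` type and the event enum are hypothetical stand-ins for `tsc_pm_notifier()`, `sched_clock_stable()`, `check_tsc_unstable()` and the `PM_HIBERNATION_PREPARE` / `PM_POST_HIBERNATION` events. It models only the stable/unstable flag, not the scheduler's clock alignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PM notifier event codes. */
enum sim_pm_event {
	SIM_HIBERNATION_PREPARE,
	SIM_POST_HIBERNATION,
	SIM_OTHER_EVENT,
};

/* Hypothetical model of the two flags the notifier consults. */
struct sim_clock {
	bool sched_clock_stable;	/* models sched_clock_stable() */
	bool tsc_unstable;		/* models check_tsc_unstable() */
};

/* Mirrors the switch in the patch's tsc_pm_notifier(). */
int sim_tsc_pm_notifier(struct sim_clock *clk, enum sim_pm_event ev)
{
	switch (ev) {
	case SIM_HIBERNATION_PREPARE:
		/* Mark the sched clock unstable for the whole cycle. */
		clk->sched_clock_stable = false;
		break;
	case SIM_POST_HIBERNATION:
		/* Restore the default only if the TSC is still trusted. */
		if (!clk->tsc_unstable)
			clk->sched_clock_stable = true;
		break;
	default:
		/* All other PM events leave the flag untouched. */
		break;
	}

	return 0;
}

/* Runs one prepare/post cycle and returns the final stable flag. */
bool sim_hibernation_cycle(bool tsc_unstable)
{
	struct sim_clock clk = {
		.sched_clock_stable = !tsc_unstable,
		.tsc_unstable = tsc_unstable,
	};

	sim_tsc_pm_notifier(&clk, SIM_HIBERNATION_PREPARE);
	assert(!clk.sched_clock_stable);	/* unstable during the cycle */
	sim_tsc_pm_notifier(&clk, SIM_POST_HIBERNATION);

	return clk.sched_clock_stable;
}
```

Under these assumptions, a cycle on a system with a reliable TSC ends with the clock marked stable again, while a system whose TSC was already flagged unstable stays unstable after the post-hibernation event.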