From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753388AbbDBRkJ (ORCPT ); Thu, 2 Apr 2015 13:40:09 -0400 Received: from mail-am1on0095.outbound.protection.outlook.com ([157.56.112.95]:37588 "EHLO emea01-am1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752976AbbDBRkE (ORCPT ); Thu, 2 Apr 2015 13:40:04 -0400 Authentication-Results: spf=fail (sender IP is 12.216.194.146) smtp.mailfrom=ezchip.com; ezchip.com; dkim=none (message not signed) header.d=none; From: To: Frederic Weisbecker , Don Zickus , Ingo Molnar , Andrew Morton , Andrew Jones , chai wen , Ulrich Obergfell , Fabian Frederick , Aaron Tomlin , Ben Zhang , "Christoph Lameter" , Gilad Ben-Yossef , "Steven Rostedt" , , "Jonathan Corbet" , , Peter Zijlstra CC: Chris Metcalf Subject: [PATCH v3] watchdog: add watchdog_cpumask sysctl to assist nohz Date: Thu, 2 Apr 2015 13:39:28 -0400 Message-ID: <1427996368-2199-1-git-send-email-cmetcalf@ezchip.com> X-Mailer: git-send-email 2.1.2 In-Reply-To: <20150402164845.GD10357@lerouge> References: <1427741465-15747-1-git-send-email-cmetcalf@ezchip.com> <20150331072502.GA16754@gmail.com> <551AE7D4.3020608@ezchip.com> <20150402133502.GA175361@redhat.com> <551D48F9.6090101@ezchip.com> <20150402141527.GD175361@redhat.com> <20150402153827.GC10357@lerouge> <551D6373.2030000@ezchip.com> <20150402164845.GD10357@lerouge> X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:12.216.194.146;CTRY:US;IPV:NLI;EFV:NLI;BMV:1;SFV:NSPM;SFS:(10009020)(6009001)(339900001)(199003)(189002)(106466001)(36756003)(105606002)(50226001)(42186005)(229853001)(47776003)(77156002)(62966003)(6806004)(19580405001)(19580395003)(86152002)(92566002)(46102003)(87936001)(2950100001)(93886004)(86362001)(575784001)(104016003)(50986999)(33646002)(50466002)(76176999)(48376002)(85426001)(921003)(1121003);DIR:OUT;SFP:1101;SCL:1;SRVR:HE1PR02MB0779;H:ld-1.internal.tilera.com;FPR:;SPF:Fail;MLV:sfv;A:1;MX:1;LANG:en; MIME-Version: 1.0 Content-Type: text/plain X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:HE1PR02MB0779; X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(5002010)(5005006);SRVR:HE1PR02MB0779;BCL:0;PCL:0;RULEID:;SRVR:HE1PR02MB0779; X-Forefront-PRVS: 0534947130 X-OriginatorOrg: ezchip.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Apr 2015 17:39:57.6551 (UTC) X-MS-Exchange-CrossTenant-Id: 0fc16e0a-3cd3-4092-8b2f-0a42cff122c3 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=0fc16e0a-3cd3-4092-8b2f-0a42cff122c3;Ip=[12.216.194.146];Helo=[ld-1.internal.tilera.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR02MB0779 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Chris Metcalf Change the default behavior of watchdog so it only runs on the housekeeping cores when nohz_full is enabled at build and boot time. Allow modifying the set of cores the watchdog is currently running on with a new kernel.watchdog_cpumask sysctl. Signed-off-by: Chris Metcalf --- Technically this is only v2, but I accidentally replied to an earlier email after adding v2 to the subject line, so for clarity I'm calling this thread v3. This change depends on my earlier change to add a tick_nohz_full_clear_cpus() API. If folks are OK with my doing so, I can add it to the set of patches I'm planning to ask Linus to pull for 4.1. Documentation/lockup-watchdogs.txt | 6 ++++++ Documentation/sysctl/kernel.txt | 9 +++++++++ include/linux/nmi.h | 1 + include/linux/sched.h | 3 +++ kernel/sysctl.c | 7 +++++++ kernel/watchdog.c | 33 ++++++++++++++++++++++++++++++++- 6 files changed, 58 insertions(+), 1 deletion(-) diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt index ab0baa692c13..82a99eedf904 100644 --- a/Documentation/lockup-watchdogs.txt +++ b/Documentation/lockup-watchdogs.txt @@ -61,3 +61,9 @@ As explained above, a kernel knob is provided that allows administrators to configure the period of the hrtimer and the perf event. The right value for a particular environment is a trade-off between fast response to lockups and detection overhead. + +By default, the watchdog runs on all online cores. However, on a +kernel configured with NO_HZ_FULL, by default the watchdog runs only +on the housekeeping cores, not the cores specified in the "nohz_full" +boot argument. In either case, the set of cores running the watchdog +may be adjusted via the kernel.watchdog_cpumask sysctl. diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 83ab25660fc9..5821dc6bb5c2 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -858,6 +858,15 @@ example. If a system hangs up, try pressing the NMI switch. ============================================================== +watchdog_cpumask: + +This value can be used to control on which cpus the watchdog will run. +The default cpumask specifies every core, but if NO_HZ_FULL is enabled +in the kernel config, and cores are specified with the nohz_full= boot +argument, those cores are excluded by default. + +============================================================== + watchdog_thresh: This value can be used to control the frequency of hrtimer and NMI diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 9b2022ab4d85..cebf36e618e0 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -70,6 +70,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *); u64 hw_nmi_get_sample_period(int watchdog_thresh); extern int watchdog_user_enabled; extern int watchdog_thresh; +extern unsigned long *watchdog_mask_bits; extern int sysctl_softlockup_all_cpu_backtrace; struct ctl_table; extern int proc_dowatchdog(struct ctl_table *, int , diff --git a/include/linux/sched.h b/include/linux/sched.h index 6d77432e14ff..a6f048f4fbeb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -377,6 +377,9 @@ extern void touch_all_softlockup_watchdogs(void); extern int proc_dowatchdog_thresh(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +extern int proc_dowatchdog_mask(struct ctl_table *table, int write, + void __user *buffer, + size_t *lenp, loff_t *ppos); extern unsigned int softlockup_panic; void lockup_detector_init(void); #else diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 88ea2d6e0031..2fb96ffa56d1 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -860,6 +860,13 @@ static struct ctl_table kern_table[] = { .extra2 = &sixty, }, { + .procname = "watchdog_cpumask", + .data = &watchdog_mask_bits, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_dowatchdog_mask, + }, + { .procname = "softlockup_panic", .data = &softlockup_panic, .maxlen = sizeof(int), diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 3174bf8e3538..2140c2d81dc9 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -31,6 +32,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace; #else #define sysctl_softlockup_all_cpu_backtrace 0 #endif +static cpumask_var_t watchdog_mask; +unsigned long *watchdog_mask_bits; static int __read_mostly watchdog_running; static u64 __read_mostly sample_period; @@ -431,6 +434,10 @@ static void watchdog_enable(unsigned int cpu) hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); hrtimer->function = watchdog_timer_fn; + /* Exit if the cpu is not allowed for watchdog. */ + if (!cpumask_test_cpu(cpu, watchdog_mask)) + do_exit(0); + /* Enable the perf event */ watchdog_nmi_enable(cpu); @@ -653,6 +660,8 @@ static void watchdog_disable_all_cpus(void) } } +static DEFINE_MUTEX(watchdog_proc_mutex); + /* * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh */ @@ -662,7 +671,6 @@ int proc_dowatchdog(struct ctl_table *table, int write, { int err, old_thresh, old_enabled; bool old_hardlockup; - static DEFINE_MUTEX(watchdog_proc_mutex); mutex_lock(&watchdog_proc_mutex); old_thresh = ACCESS_ONCE(watchdog_thresh); @@ -700,12 +708,35 @@ out: mutex_unlock(&watchdog_proc_mutex); return err; } + +int proc_dowatchdog_mask(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + int err; + + mutex_lock(&watchdog_proc_mutex); + err = proc_do_large_bitmap(table, write, buffer, lenp, ppos); + if (!err && write && watchdog_user_enabled) { + watchdog_disable_all_cpus(); + watchdog_enable_all_cpus(false); + } + mutex_unlock(&watchdog_proc_mutex); + return err; +} + #endif /* CONFIG_SYSCTL */ void __init lockup_detector_init(void) { set_sample_period(); + alloc_bootmem_cpumask_var(&watchdog_mask); + cpumask_copy(watchdog_mask, cpu_possible_mask); + tick_nohz_full_clear_cpus(watchdog_mask); + + /* The sysctl API requires a variable holding a pointer to the mask. */ + watchdog_mask_bits = cpumask_bits(watchdog_mask); + if (watchdog_user_enabled) watchdog_enable_all_cpus(false); } -- 2.1.2