From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753234AbbDBNgO (ORCPT ); Thu, 2 Apr 2015 09:36:14 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52543 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751718AbbDBNgM (ORCPT ); Thu, 2 Apr 2015 09:36:12 -0400
Date: Thu, 2 Apr 2015 09:35:02 -0400
From: Don Zickus
To: Chris Metcalf
Cc: Ingo Molnar, Andrew Morton, Andrew Jones, chai wen,
	Ulrich Obergfell, Fabian Frederick, Aaron Tomlin, Ben Zhang,
	Christoph Lameter, Frederic Weisbecker, Gilad Ben-Yossef,
	Steven Rostedt, open list
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores
Message-ID: <20150402133502.GA175361@redhat.com>
References: <1427741465-15747-1-git-send-email-cmetcalf@ezchip.com>
	<20150331072502.GA16754@gmail.com>
	<551AE7D4.3020608@ezchip.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <551AE7D4.3020608@ezchip.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 31, 2015 at 02:30:44PM -0400, Chris Metcalf wrote:
> On 03/31/2015 03:25 AM, Ingo Molnar wrote:
> >* cmetcalf@ezchip.com wrote:
> >
> >>From: Chris Metcalf
> >>
> >>Running watchdog can be a helpful debugging feature on regular
> >>cores, but it's incompatible with nohz_full, since it forces
> >>regular scheduling events. Accordingly, just exit out immediately
> >>from any nohz_full core.
> >>
> >>An alternate approach would be to add a flags field or function to
> >>smp_hotplug_thread to control on which cores the percpu threads
> >>are created, but it wasn't clear that much mechanism was useful.
> >>
> >>[...]
> >So what happens if someone wants to enable the lockup detector, with a
> >long timeout, even on nohz-full CPUs? This patch makes that
> >impossible.
> >
> >A better solution would be to tweak the defaults:
> >
> >  - to default the watchdog(s) to disabled when nohz-full is
> >    enabled, even if HARDLOCKUP_DETECTOR=y or DETECT_HUNG_TASK=y, and
> >    allow it to be re-enabled via its sysctl.
>
> That's certainly a reasonable thing to do; it looks like just an #ifdef
> at the top of watchdog.c would suffice. Does this look right?
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8a46d9d8a66f..c8555c211e65 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -25,7 +25,11 @@
>  #include
>  #include
>
> +#ifdef CONFIG_NO_HZ_FULL
> +int watchdog_user_enabled = 0;
> +#else
>  int watchdog_user_enabled = 1;
> +#endif
>  int __read_mostly watchdog_thresh = 10;
>  #ifdef CONFIG_SMP
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>
> It doesn't look like I need to do anything else special to disable
> HARDLOCKUP_DETECTOR, and khungtaskd can happily run on
> a non-nohz core, so that should be OK.
>
> What I was trying to achieve with my proposed patch was kind
> of orthogonal: to allow the watchdog to run on standard cores,
> but not run on nohz cores, so we could benefit from it on the
> cores where it was safe for it to run. Do you see value in this,
> or better to just enable/disable all watchdog threads collectively?

Hmm, I am not sure I am a big fan of this approach. I know RHEL keeps
the watchdogs enabled for customers and it would be a regression if we
disabled it. And at the same time, I could see RHEL leaning towards
enabling CONFIG_NO_HZ_FULL, which would just delay this problem a number
of years until RHEL-8 gets around to ramping up.

So I guess I would prefer to figure out a better co-existing solution
now.

Can I ask how the NO_HZ_FULL technology works from userspace? Is there
a system command that has to be sent? How does the kernel know to turn
off ticks and trust userspace to do the right thing?

Cheers,
Don

>
> --
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
>