From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753234AbbDBNgO (ORCPT <rfc822;w@1wt.eu>);
	Thu, 2 Apr 2015 09:36:14 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52543 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751718AbbDBNgM (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 2 Apr 2015 09:36:12 -0400
Date: Thu, 2 Apr 2015 09:35:02 -0400
From: Don Zickus <dzickus@redhat.com>
To: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>, Andrew Morton <akpm@linux-foundation.org>,
        Andrew Jones <drjones@redhat.com>,
        chai wen <chaiw.fnst@cn.fujitsu.com>,
        Ulrich Obergfell <uobergfe@redhat.com>,
        Fabian Frederick <fabf@skynet.be>, Aaron Tomlin <atomlin@redhat.com>,
        Ben Zhang <benzh@chromium.org>, Christoph Lameter <cl@linux.com>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Gilad Ben-Yossef <gilad@benyossef.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores
Message-ID: <20150402133502.GA175361@redhat.com>
References: <1427741465-15747-1-git-send-email-cmetcalf@ezchip.com>
 <20150331072502.GA16754@gmail.com>
 <551AE7D4.3020608@ezchip.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <551AE7D4.3020608@ezchip.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 31, 2015 at 02:30:44PM -0400, Chris Metcalf wrote:
> On 03/31/2015 03:25 AM, Ingo Molnar wrote:
> >* cmetcalf@ezchip.com <cmetcalf@ezchip.com> wrote:
> >
> >>From: Chris Metcalf <cmetcalf@ezchip.com>
> >>
> >>Running watchdog can be a helpful debugging feature on regular
> >>cores, but it's incompatible with nohz_full, since it forces
> >>regular scheduling events.  Accordingly, just exit out immediately
> >>from any nohz_full core.
> >>
> >>An alternate approach would be to add a flags field or function to
> >>smp_hotplug_thread to control on which cores the percpu threads
> >>are created, but it wasn't clear that much mechanism was useful.
> >>
> >>[...]
> >So what happens if someone wants to enable the lockup detector, with a
> >long timeout, even on nohz-full CPUs? This patch makes that
> >impossible.
> >
> >A better solution would be to tweak the defaults:
> >
> >  - to default the watchdog(s) to disabled when nohz-full is
> >    enabled, even if HARDLOCKUP_DETECTOR=y or DETECT_HUNG_TASK=y, and
> >    allow it to be re-enabled via its sysctl.
> 
> That's certainly a reasonable thing to do; it looks like just an #ifdef
> at the top of watchdog.c would suffice.  Does this look right?
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8a46d9d8a66f..c8555c211e65 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -25,7 +25,11 @@
>  #include <linux/kvm_para.h>
>  #include <linux/perf_event.h>
> +#ifdef CONFIG_NO_HZ_FULL
> +int watchdog_user_enabled = 0;
> +#else
>  int watchdog_user_enabled = 1;
> +#endif
>  int __read_mostly watchdog_thresh = 10;
>  #ifdef CONFIG_SMP
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> 
> It doesn't look like I need to do anything else special to disable
> HARDLOCKUP_DETECTOR, and khungtaskd can happily run on
> a non-nohz core, so that should be OK.
> 
> What I was trying to achieve with my proposed patch was kind
> of orthogonal: to allow the watchdog to run on standard cores,
> but not run on nohz cores, so we could benefit from it on the
> cores where it was safe for it to run.  Do you see value in this,
> or better to just enable/disable all watchdog threads collectively?


Hmm, I am not sure I am a big fan of this approach.  I know RHEL keeps the
watchdogs enabled for customers, and it would be a regression if we disabled
them.  At the same time, I could see RHEL leaning towards enabling
CONFIG_NO_HZ_FULL, so this approach would just delay the problem a number
of years until RHEL-8 gets around to ramping up.

So I guess I would prefer to figure out now a solution that lets the two
co-exist.

Can I ask how the NO_HZ_FULL technology works from userspace?  Is there a
system command that has to be sent?  How does the kernel know to turn off
ticks and trust userspace to do the right thing?

Cheers,
Don

> 
> -- 
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
>