From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934467AbcCNOpd (ORCPT ); Mon, 14 Mar 2016 10:45:33 -0400
Received: from prod-mail-xrelay07.akamai.com ([23.79.238.175]:41796 "EHLO
        prod-mail-xrelay07.akamai.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S934283AbcCNOpb (ORCPT );
        Mon, 14 Mar 2016 10:45:31 -0400
Subject: Re: [PATCH] watchdog: don't run proc_watchdog_update if new value is same as old
To: Don Zickus
References: <1457826627-21727-1-git-send-email-johunt@akamai.com> <20160314143426.GK194535@redhat.com>
Cc: akpm@linux-foundation.org, uobergfe@redhat.com, linux-kernel@vger.kernel.org
From: Josh Hunt
Message-ID: <56E6CE86.5070606@akamai.com>
Date: Mon, 14 Mar 2016 09:45:26 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <20160314143426.GK194535@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/14/2016 09:34 AM, Don Zickus wrote:
> On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote:
>> While working on a script to restore all sysctl params before a series of
>> tests I found that writing any value into the
>> /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
>> causes them to call proc_watchdog_update(). Not only that, but when I
>> wrote to these proc files in a loop I could easily trigger a soft lockup.
>>
>> [ 955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [ 955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [ 955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [ 955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [ 959.788319] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 30s! [swapper/4:0]
>> [ 959.788325] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 30s! [swapper/5:0]
>>
>> There doesn't appear to be a reason for doing this work every time a
>> write occurs, so only do the work when the values change.
>
> Hi Josh,
>
> Thanks for the patch. I have no objections to it, but Uli and myself were
> interested in the reason for the softlockups. Uli is going to provide a
> test patch to see if his theory is correct. That way we fix the underlying
> issue and then apply your patch on top. Make sense?

Yep. Sounds good. I meant to mention I didn't diagnose the soft-lockup.
If you provide a patch I'm happy to test. I can also attempt to debug
that part more if needed.

Josh

>
> Cheers,
> Don
>
>>
>> Signed-off-by: Josh Hunt
>> ---
>>  kernel/watchdog.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index b3ace6e..9acb29f 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -923,6 +923,9 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
>>                   * both lockup detectors are disabled if proc_watchdog_update()
>>                   * returns an error.
>>                   */
>> +                if (old == new)
>> +                        goto out;
>> +
>>                  err = proc_watchdog_update();
>>          }
>>  out:
>> @@ -967,7 +970,7 @@ int proc_soft_watchdog(struct ctl_table *table, int write,
>>  int proc_watchdog_thresh(struct ctl_table *table, int write,
>>                           void __user *buffer, size_t *lenp, loff_t *ppos)
>>  {
>> -        int err, old;
>> +        int err, old, new;
>>
>>          get_online_cpus();
>>          mutex_lock(&watchdog_proc_mutex);
>> @@ -987,6 +990,10 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
>>          /*
>>           * Update the sample period. Restore on failure.
>>           */
>> +        new = ACCESS_ONCE(watchdog_thresh);
>> +        if (old == new)
>> +                goto out;
>> +
>>          set_sample_period();
>>          err = proc_watchdog_update();
>>          if (err) {
>> --
>> 1.7.9.5
>>
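
For anyone skimming the thread, the idea in the quoted patch (skip the
expensive update when a write leaves the value unchanged) can be sketched
in userspace roughly as below. This is only an illustration, not the
kernel code: struct tunable, expensive_update() and tunable_write() are
hypothetical names, and expensive_update() merely counts calls where the
kernel would run proc_watchdog_update().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tunable such as watchdog_thresh. */
struct tunable {
	int value;        /* current setting */
	int update_calls; /* how many times the expensive update ran */
};

/*
 * Hypothetical stand-in for proc_watchdog_update(). It only counts
 * invocations and always reports success.
 */
static int expensive_update(struct tunable *t)
{
	t->update_calls++;
	return 0;
}

/*
 * Store new_value, but run the expensive update only when the value
 * actually changed. On failure the old value is restored, mirroring
 * the "Restore on failure" comment in the quoted patch.
 */
static int tunable_write(struct tunable *t, int new_value)
{
	int old, err;

	assert(t != NULL);

	old = t->value;
	t->value = new_value;

	/* Same value written again: nothing to reconfigure. */
	if (old == new_value)
		return 0;

	err = expensive_update(t);
	if (err)
		t->value = old;

	return err;
}
```

With this shape, writing the same value in a tight loop never reaches
expensive_update(); only a write that changes the value does. It says
nothing about why the repeated updates caused the soft lockups, which
is still the open question above.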