From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934467AbcCNOpd (ORCPT <rfc822;w@1wt.eu>);
	Mon, 14 Mar 2016 10:45:33 -0400
Received: from prod-mail-xrelay07.akamai.com ([23.79.238.175]:41796 "EHLO
	prod-mail-xrelay07.akamai.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S934283AbcCNOpb (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 14 Mar 2016 10:45:31 -0400
Subject: Re: [PATCH] watchdog: don't run proc_watchdog_update if new value is
 same as old
To: Don Zickus <dzickus@redhat.com>
References: <1457826627-21727-1-git-send-email-johunt@akamai.com>
 <20160314143426.GK194535@redhat.com>
Cc: akpm@linux-foundation.org, uobergfe@redhat.com,
        linux-kernel@vger.kernel.org
From: Josh Hunt <johunt@akamai.com>
Message-ID: <56E6CE86.5070606@akamai.com>
Date: Mon, 14 Mar 2016 09:45:26 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <20160314143426.GK194535@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/14/2016 09:34 AM, Don Zickus wrote:
> On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote:
>> While working on a script to restore all sysctl params before a series of
>> tests I found that writing any value into any of the
>> /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
>> files triggers a call to proc_watchdog_update(). Not only that, but when I
>> wrote to these proc files in a loop I could easily trigger a soft lockup.
>>
>> [  955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [  955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [  955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [  955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>> [  959.788319] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 30s! [swapper/4:0]
>> [  959.788325] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 30s! [swapper/5:0]
>>
>> There doesn't appear to be a reason for doing this work every time a
>> write occurs, so only do the work when the values change.
>
> Hi Josh,
>
> Thanks for the patch.  I have no objections to it, but Uli and I were
> interested in the reason for the softlockups.  Uli is going to provide a
> test patch to see if his theory is correct.  That way we fix the underlying
> issue and then apply your patch on top. Make sense?

Yep. Sounds good. I meant to mention I didn't diagnose the soft-lockup. 
If you provide a patch I'm happy to test. I can also attempt to debug 
that part more if needed.

Josh

>
> Cheers,
> Don
>
>>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
>> ---
>>   kernel/watchdog.c |    9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index b3ace6e..9acb29f 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -923,6 +923,9 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
>>   		 * both lockup detectors are disabled if proc_watchdog_update()
>>   		 * returns an error.
>>   		 */
>> +		if (old == new)
>> +			goto out;
>> +
>>   		err = proc_watchdog_update();
>>   	}
>>   out:
>> @@ -967,7 +970,7 @@ int proc_soft_watchdog(struct ctl_table *table, int write,
>>   int proc_watchdog_thresh(struct ctl_table *table, int write,
>>   			 void __user *buffer, size_t *lenp, loff_t *ppos)
>>   {
>> -	int err, old;
>> +	int err, old, new;
>>
>>   	get_online_cpus();
>>   	mutex_lock(&watchdog_proc_mutex);
>> @@ -987,6 +990,10 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
>>   	/*
>>   	 * Update the sample period. Restore on failure.
>>   	 */
>> +	new = ACCESS_ONCE(watchdog_thresh);
>> +	if (old == new)
>> +		goto out;
>> +
>>   	set_sample_period();
>>   	err = proc_watchdog_update();
>>   	if (err) {
>> --
>> 1.7.9.5
>>