Message-ID: <551D6373.2030000@ezchip.com>
Date: Thu, 2 Apr 2015 11:42:43 -0400
From: Chris Metcalf
To: Frederic Weisbecker, Don Zickus
CC: Ingo Molnar, Andrew Morton, Andrew Jones, chai wen, Ulrich Obergfell, Fabian Frederick, Aaron Tomlin, Ben Zhang, Christoph Lameter, Gilad Ben-Yossef, Steven Rostedt, open list
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores
References: <1427741465-15747-1-git-send-email-cmetcalf@ezchip.com> <20150331072502.GA16754@gmail.com> <551AE7D4.3020608@ezchip.com> <20150402133502.GA175361@redhat.com> <551D48F9.6090101@ezchip.com> <20150402141527.GD175361@redhat.com> <20150402153827.GC10357@lerouge>
In-Reply-To: <20150402153827.GC10357@lerouge>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/02/2015 11:38 AM, Frederic Weisbecker wrote:
> On Thu, Apr 02, 2015 at 10:15:27AM -0400, Don Zickus wrote:
>> On Thu, Apr 02, 2015 at 09:49:45AM -0400, Chris Metcalf wrote:
>>>> Can I ask how the NO_HZ_FULL technology works from userspace? Is there a
>>>> system command that has to be sent? How does the kernel know to turn off
>>>> ticks and trust userspace to do the right thing?
>>> The NO_HZ_FULL option, when configured into the kernel, lets
>>> you boot with "nohz_full=1-15" (or whatever cpumask you like),
>>> typically in conjunction with "isolcpus=1-15". At this point no tasks
>>> will run on those cores until explicitly placed there by affinity, and
>>> once there and running in userspace, the kernel will automatically
>>> get out of their way and not interrupt at all. This lets those tasks
>>> run with 100.000% of the cpu, which is a requirement for many
>>> user-space device drivers running high throughput devices.
>>> (This is typically the use case for the tile architecture customers.)
>>>
>>> So, other than a boot flag, there are no system commands or
>>> other APIs to deal with.
>> Ah, I am starting to understand your approach in the original patch better.
>>
>>> Part of the requirement, though, is that there can be only one task
>>> bound and runnable on that cpu, otherwise the kernel has to be
>>> involved to do the context-switching off of the scheduler tick.
>>> This is why having the standard watchdog kernel thread doesn't
>>> work in this context.
>> So, there is no preemption happening, which means the softlockup is rather
>> pointless.
> Still useful actually because nohz full only takes effect when a single task runs
> on the CPU. But there can still be more than 1 task running, just nohz full will
> be disabled. It all happens dynamically.
>
>> Can interrupts be disabled or handled on that cpu? I am trying
>> to see if the hardlockup detector becomes rather silly on those cpus too.
> No interrupts aren't disabled on these CPUs. Now the goal is to avoid them:
> migrate irqs, nohz full, etc...
>
> But there can be irqs. And actually there is at least 1 tick every second in
> order to keep the scheduler stats moving forward. We plan to get rid of it but
> anyway the point is that IRQ can happen on nohz full CPUs.
>
>>> I continue to suspect that the right model here is to disable the
>>> watchdog specifically on the cores that the user has tagged with
>>> the nohz_full boot argument. I agree that there might be a case
>>> to be made for leaving the watchdog conditionally (as suggested
>>> by Ingo) but it should be possible to have the watchdogs on
>>> the nohz_full cores be turned off completely if desired.
>> I think I might be slowly coming around to your thoughts. I might request a
>> different patch though based on the answers above. Maybe even create a
>> subset of the online cpus for the watchdog to work off of. The watchdog
>> would copy the online cpu mask, mask off the nohz cpus and just function
>> that way. It would print loud messages for each nohz cpu it was masking
>> off.
> All agreed with that! We should at least keep the watchdog running on
> non-nohz-full CPUs. And also allow to re-enable it everywhere when needed,
> in case we have a lockup to chase on nohz full CPUs.
>
>> Then perhaps as a debug aid, expose a /proc/sys/kernel/watchdog_cpumask for
>> folks to modify in case they want to enable the watchdog on the nohz cpus.
> That sounds like a good idea.

OK, I will respin v2 of the patch as follows:

- Provide a watchdog_cpumask as suggested by Don.
- On a non-NO_HZ_FULL build, it defaults to cpu_possible as normal
- On a NO_HZ_FULL build, it defaults to the housekeeping cpus
- If the mask is modified, we disable and then re-enable
  the watchdog, so that the watchdog init code can exit() the
  appropriate threads as they start up

This should address the various concerns that have been raised.

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
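
For illustration only, here is a minimal userspace sketch of the default-mask rule discussed in this thread (take the candidate cpus, mask off the nohz_full ones, and report each cpu that was dropped). It is not code from the patch: a uint64_t stands in for a kernel cpumask, and the names cpu_bits_t, default_watchdog_mask() and report_masked_cpus() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel cpumask: bit N set means cpu N is in the mask. */
typedef uint64_t cpu_bits_t;

/*
 * Default watchdog mask as sketched in the thread: start from the
 * candidate cpus and remove any that were listed in nohz_full=.
 * With no nohz_full cpus (nohz_full == 0) the candidates come back unchanged.
 */
static cpu_bits_t default_watchdog_mask(cpu_bits_t candidates, cpu_bits_t nohz_full)
{
    return candidates & ~nohz_full;
}

/* Print one message per cpu being left out; returns how many were masked off. */
static int report_masked_cpus(cpu_bits_t candidates, cpu_bits_t nohz_full)
{
    cpu_bits_t skipped = candidates & nohz_full;
    int count = 0;

    for (int cpu = 0; cpu < 64; cpu++) {
        if (skipped & ((cpu_bits_t)1 << cpu)) {
            printf("watchdog: not running on nohz_full cpu %d\n", cpu);
            count++;
        }
    }
    return count;
}
```

With a 16-cpu system booted with "nohz_full=1-15", the candidates would be 0xFFFF and the nohz_full bits 0xFFFE, so only cpu 0 stays in the watchdog mask. Writing a wider mask back (the /proc/sys/kernel/watchdog_cpumask idea) would simply replace the result of default_watchdog_mask().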