Date: Thu, 11 Aug 2016 00:16:58 +0200
From: Frederic Weisbecker
To: Christoph Lameter
Cc: Chris Metcalf, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra, Andrew Morton, Rik van Riel, Tejun Heo, Thomas Gleixner, "Paul E. McKenney", Viresh Kumar, Catalin Marinas, Will Deacon, Andy Lutomirski, Daniel Lezcano, linux-doc@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: clocksource_watchdog causing scheduling of timers every second (was [v13] support "task_isolation" mode)
Message-ID: <20160810221656.GC19757@lerouge>
References: <1468529299-27929-1-git-send-email-cmetcalf@mellanox.com> <7a3f66f7-5011-7d59-2e0e-f57e4e42e6b6@mellanox.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 27, 2016 at 08:55:28AM -0500, Christoph Lameter wrote:
> On Mon, 25 Jul 2016, Christoph Lameter wrote:
>
> > Guess so. I will have a look at this when I get some time again.
>
> Ok so the problem is the clocksource_watchdog() function in
> kernel/time/clocksource.c. This function is active if
> CONFIG_CLOCKSOURCE_WATCHDOG is defined. It will check the timesources of
> each processor for being within bounds and then reschedule itself on the
> next one.
>
> The purpose of the function seems to be to determine *if* a clocksource is
> unstable. It does not mean that the clocksource *is* unstable.
>
> The critical piece of code is this:
>
>         /*
>          * Cycle through CPUs to check if the CPUs stay synchronized
>          * to each other.
>          */
>         next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
>         if (next_cpu >= nr_cpu_ids)
>                 next_cpu = cpumask_first(cpu_online_mask);
>         watchdog_timer.expires += WATCHDOG_INTERVAL;
>         add_timer_on(&watchdog_timer, next_cpu);
>
>
> Should we just cycle through the cpus that are not isolated? Otherwise we
> need to have some means to check the clocksources for accuracy remotely
> (probably impossible for TSC etc).
>
> The WATCHDOG_INTERVAL is 1 second so this causes an interrupt every
> second.
>
> Note that we are running with the patch that removes the 1 HZ minimum time
> tick. With an older kernel code base (redhat) we can keep the kernel quiet
> for minutes. The clocksource watchdog causes timers to fire again.

I had similar issues. This seems to happen when the TSC is considered not
reliable (which doesn't necessarily mean unstable; I think it has to do with
some x86 CPU feature flag).

IIRC, this _has_ to execute on all online CPUs because the TSC of every
running CPU is concerned. I personally override that by passing the
tsc=reliable kernel parameter. Of course, use it at your own risk.

But ultimately I don't think we can offload that to housekeeping-only CPUs.
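
For reference, here is a rough userspace sketch of what "cycle through the
cpus that are not isolated" could look like. It is not kernel code: the
64-bit masks, NR_CPUS, mask_next_from() and next_watchdog_cpu() are all
made-up names for illustration, standing in for the cpumask_next() /
cpumask_first() pattern quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical limit for this userspace model */

/* Lowest CPU at or above 'start' whose bit is set in 'mask', or NR_CPUS if none. */
static int mask_next_from(uint64_t mask, int start)
{
	for (int cpu = start; cpu < NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Pick the CPU the watchdog timer would move to next: the first CPU after
 * 'cur' that is online and not isolated, wrapping around to the lowest
 * eligible CPU when the end of the mask is reached.
 */
static int next_watchdog_cpu(int cur, uint64_t online, uint64_t isolated)
{
	uint64_t eligible = online & ~isolated;
	int next;

	assert(eligible != 0);	/* assumes at least one housekeeping CPU */
	assert(cur >= 0 && cur < NR_CPUS);

	next = mask_next_from(eligible, cur + 1);
	if (next >= NR_CPUS)
		next = mask_next_from(eligible, 0);
	return next;
}
```

With CPUs 0-3 online and CPUs 2-3 isolated, the timer would only bounce
between CPU 0 and CPU 1. The catch is the one raised above: the TSCs of the
skipped CPUs are then never compared against the watchdog clocksource.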