From: Nicolai Stange
To: Thomas Gleixner
Cc: Ingo Molnar, "H. Peter Anvin", x86@kernel.org, John Stultz,
    Borislav Petkov, Paolo Bonzini, Viresh Kumar, Hidehiro Kawai,
    "Peter Zijlstra (Intel)", "Christopher S. Hall", Adrian Hunter,
    Suresh Siddha, linux-kernel@vger.kernel.org, Nicolai Stange
Subject: [PATCH 0/4] avoid double timer interrupt with nohz and Intel TSC
Date: Sun, 10 Jul 2016 14:23:41 +0200
Message-Id: <20160710122345.13061-1-nicstange@gmail.com>
X-Mailer: git-send-email 2.9.0

With a single task running on a NOHZ CPU on an Intel Haswell, I noticed
that I got not just the one expected local_timer APIC interrupt per
second, but at least two.

Further tracing showed that the first one precedes the programmed
deadline by up to ~50us and hence does nothing except reprogram the TSC
deadline clockevent device to trigger again shortly thereafter.

FYI, the trace looks like this:
  <...>-2938  [007] d.h.   420.753164: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   420.753164: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   420.753184: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   420.753184: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   420.753195: tick_sched_timer <-__hrtimer_run_queues
  <...>-2938  [007] d.h.   421.752170: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   421.752171: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   421.752202: local_timer_entry: vector=239
  <...>-2938  [007] d.h.   421.752202: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-2938  [007] d.h.   421.752202: tick_sched_timer <-__hrtimer_run_queues

It turns out that this too-early programmed TSC deadline is caused by
inaccuracies in some frequency calculations, which become significant
when the timer periods are large, as is the case for nohz with one task
(delta = 10^9ns).

The first three patches address inaccuracies entering the TSC deadline
clockevent device's frequency.

The fourth patch is the most important one as it addresses the error of
largest relative magnitude. That error is caused by the assumption in
the clockevents core that the ratio of the monotonic clock's frequency
to that of the clockevent device is constant. Since the monotonic
clock's frequency gets dynamically adjusted in order to compensate for
NTP errors, this assumption does not hold.

With this patchset applied, the trace looks like this:
  <...>-23609 [007] d.h.  1811.586658: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1811.586680: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1811.586680: tick_sched_timer <-__hrtimer_run_queues
  <...>-23609 [007] d.h.  1812.585659: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1812.585666: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1812.585666: tick_sched_timer <-__hrtimer_run_queues
  <...>-23609 [007] d.h.  1813.584661: local_timer_entry: vector=239
  <...>-23609 [007] d.h.  1813.584668: __hrtimer_run_queues <-hrtimer_interrupt
  <...>-23609 [007] d.h.  1813.584668: tick_sched_timer <-__hrtimer_run_queues

Please note that the first three TSC patches might not be necessary to
get this result. In fact, [3/4] ("arch, x86, tsc: inform TSC deadline
clockevent device about") is somewhat counterproductive in the sense
that on my system, it usually corrects the TSC deadline device's
frequency towards lower values and thus facilitates the too-early
interrupt behaviour initially described.
However, I decided to send them along with the fourth patch because
- I tested the fourth patch in this setting
- I believe that greater accuracy of the TSC deadline device is
  worthwhile on its own

Applicable to linux-next-20160708. The individual patches don't depend
on each other.

Nicolai Stange (4):
  arch, x86, tsc deadline clockevent dev: reduce frequency roundoff
    error
  arch, x86, tsc deadline clockevent dev: reduce TSC_DIVISOR to 2
  arch, x86, tsc: inform TSC deadline clockevent device about
    recalibration
  kernel/time/clockevents: compensate for monotonic clock's dynamic
    frequency

 arch/x86/include/asm/apic.h |  1 +
 arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++--
 arch/x86/kernel/tsc.c       |  4 ++++
 kernel/time/clockevents.c   |  1 +
 kernel/time/timekeeping.c   | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/time/timekeeping.h   |  1 +
 6 files changed, 84 insertions(+), 2 deletions(-)

-- 
2.9.0
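
A back-of-the-envelope sketch, in Python, of why the frequency error described in this cover letter only becomes visible for long timer periods. To first order, if the clockevent device is programmed with a conversion factor that is off by some relative amount, the interrupt arrives early by roughly the period times that relative error. The helper names and the 4 ms comparison period are assumptions made for illustration; the 10^9 ns period and the ~50us figure are taken from the description above. The resulting 50 ppm is merely what those two numbers imply for the combined error, not a measured NTP adjustment.

```python
NSEC_PER_SEC = 10**9
PPM = 10**6


def early_by_ns(delta_ns: int, rel_error_ppm: int) -> int:
    """First-order estimate of how early the event fires, in ns.

    delta_ns: programmed timer period in nanoseconds.
    rel_error_ppm: relative error of the ns -> device cycles conversion,
                   in parts per million (hypothetical parameter).
    """
    return delta_ns * rel_error_ppm // PPM


def implied_error_ppm(delta_ns: int, early_ns: int) -> int:
    """Relative error (ppm) implied by an observed early wakeup."""
    return early_ns * PPM // delta_ns


if __name__ == "__main__":
    # ~50us early on a one-second nohz period, as reported above.
    ppm = implied_error_ppm(NSEC_PER_SEC, 50_000)
    print("implied relative error [ppm]:", ppm)
    print("early by [ns] for a 1 s period:", early_by_ns(NSEC_PER_SEC, ppm))
    # Assumed 4 ms period for comparison (not from the original mail).
    print("early by [ns] for a 4 ms period:", early_by_ns(4_000_000, ppm))
```

Under these assumptions the same relative error shifts a 4 ms deadline by only 200 ns, which would plausibly go unnoticed, whereas on a one-second period it amounts to the ~50us seen in the first trace.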