From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754919AbdKCVMH (ORCPT <rfc822;w@1wt.eu>);
        Fri, 3 Nov 2017 17:12:07 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45616 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752114AbdKCVMF (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 3 Nov 2017 17:12:05 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com AF075C0587FA
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lcapitulino@redhat.com
Date: Fri, 3 Nov 2017 17:12:03 -0400
From: Luiz Capitulino <lcapitulino@redhat.com>
To: fweisbec@gmail.com
Cc: tglx@linutronix.de, mtosatti@redhat.com, williams@redhat.com,
        nicstange@gmail.com, linux-kernel@vger.kernel.org
Subject: [nohz_full/apic] multiple timer interrupts a second
Message-ID: <20171103170703.46d72a31@redhat.com>
Organization: Red Hat
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 03 Nov 2017 21:12:05 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


[CC'ing lkml this time]

Hi,

I've observed that smp_apic_timer_interrupt() is sometimes called
two or more times a second on a nohz_full core which has a single
task taking 100% of the core. In one of the calls, hrtimer_interrupt()
runs tick_sched_timer(), but in others it doesn't call any handler.
Here's an example (Linus HEAD f34157878):

<...>-1831  [008]  1060.115578: funcgraph_entry:                   |  smp_apic_timer_interrupt() {
<...>-1831  [008]  1060.115578: funcgraph_entry:                   |    hrtimer_interrupt() {
<...>-1831  [008]  1060.115578: hrtimer_expire_entry: hrtimer=0xffff8edfefd12d60 function=tick_sched_timer now=1060079001980
<...>-1831  [008]  1060.115578: funcgraph_entry:        1.172 us   |      tick_sched_timer();
<...>-1831  [008]  1060.115580: funcgraph_exit:         1.757 us   |    }
<...>-1831  [008]  1060.115580: hrtimer_start:        hrtimer=0xffff8edfefd12d60 function=tick_sched_timer expires=1061079000000 softexpires=1061079000000
<...>-1831  [008]  1060.115581: funcgraph_exit:         3.026 us   |  }
<...>-1831  [008]  1061.115577: funcgraph_entry:                   |  smp_apic_timer_interrupt() {
<...>-1831  [008]  1061.115577: funcgraph_entry:        0.261 us   |    hrtimer_interrupt(); <---------- NO handler called
<...>-1831  [008]  1061.115578: funcgraph_exit:         1.349 us   |  }
<...>-1831  [008]  1061.115579: funcgraph_entry:                   |  smp_apic_timer_interrupt() {
<...>-1831  [008]  1061.115579: funcgraph_entry:                   |    hrtimer_interrupt() {
<...>-1831  [008]  1061.115579: hrtimer_expire_entry: hrtimer=0xffff8edfefd12d60 function=tick_sched_timer now=1061079001473
<...>-1831  [008]  1061.115580: funcgraph_entry:        1.413 us   |      tick_sched_timer();
<...>-1831  [008]  1061.115581: funcgraph_exit:         2.124 us   |    }
<...>-1831  [008]  1061.115582: hrtimer_start:        hrtimer=0xffff8edfefd12d60 function=tick_sched_timer expires=1062079000000 softexpires=1062079000000
<...>-1831  [008]  1061.115582: funcgraph_exit:         3.255 us   |  }

Is this expected for some reason?

I guess what's happening is that the deadline timer is firing
earlier than expected. From a few dozen to a few hundreds
nanoseconds earlier. When this happens, hrtimer_interrupt()
skips calling the hrtimer handler (since it's early) and the
apic is programmed to fire in the next microsecond.

On further research I saw that Nicolai tried to fix a very similar
problem last year:

commit 1a9e4c564ab174e53ed86def922804a5ddc63e7d
Author: Nicolai Stange <nicstange@gmail.com>
Date:   Thu Jul 14 17:22:54 2016 +0200

    x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
    
    I noticed the following bug/misbehavior on certain Intel systems: with a
    single task running on a NOHZ CPU on an Intel Haswell, I recognized
    that I did not only get the one expected local_timer APIC interrupt, but
    two per second at minimum. (!)

Maybe this issue is still present?