From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 10 Jul 2021 02:52:43 +0200
From: Frederic Weisbecker
To: Nicolas Saenz Julienne
Cc: He Zhe, anna-maria@linutronix.de, linux-kernel@vger.kernel.org,
	tglx@linutronix.de
Subject: Re: [PATCH] timers: Fix get_next_timer_interrupt() with no timers pending
Message-ID:
	<20210710005243.GA23956@lothringen>
References: <20200723151641.12236-1-frederic@kernel.org>
	<20210708153620.GA6716@lothringen>
	<20210709084303.GA17239@lothringen>
	<11e85cd8-40ac-09fe-e1fe-0eafa351072c@windriver.com>
	<4409fa71931446d9cabd849431ee0098c9b31292.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4409fa71931446d9cabd849431ee0098c9b31292.camel@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 09, 2021 at 04:13:25PM +0200, Nicolas Saenz Julienne wrote:
> 31cd0e119d50 ("timers: Recalculate next timer interrupt only when
> necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
> function no longer consistently returns KTIME_MAX with no timers
> pending.
>
> In order to decide if there are any timers pending we check whether the
> next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
> Unfortunately, the next expiry time and the timer base clock are no
> longer updated in unison. The former changes upon certain timer
> operations (enqueue, expire, detach), whereas the latter keeps track of
> jiffies as they move forward. Ultimately breaking the logic above.
>
> A simplified example:
>
> - Upon entering get_next_timer_interrupt() with:
>
> 	jiffies = 1
> 	base->clk = 0;
> 	base->next_expiry = NEXT_TIMER_MAX_DELTA;
>
>   'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
>   returns KTIME_MAX.
>
> - 'base->clk' is updated to the jiffies value.
>
> - The next time we enter get_next_timer_interrupt(), taking into account
>   no timer operations happened:
>
> 	base->clk = 1;
> 	base->next_expiry = NEXT_TIMER_MAX_DELTA;
>
>   'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
>   returns a valid expire time, which is incorrect.
>
> This ultimately might unnecessarily rearm sched's timer on nohz_full
> setups, and add latency to the system[1].
>
> So, introduce 'base->timers_pending'[2], update it every time
> 'base->next_expiry' changes, and use it in get_next_timer_interrupt().
>
> [1] See tick_nohz_stop_tick().
> [2] A quick pahole check on x86_64 and arm64 shows it doesn't make
>     'struct timer_base' any bigger.
>
> Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary")
> Signed-off-by: Nicolas Saenz Julienne

Very good catch. And the fix looks good:

    Acked-by: Frederic Weisbecker

I guess later we can turn this .timers_pending into .timers_count and that
would spare us the costly call to __next_timer_interrupt() up to the last
level after the last timer is dequeued.

Anyway, thanks a lot!
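
For anyone following along, the simplified example from the changelog can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct model_base, MODEL_MAX_DELTA and the helper names
are hypothetical stand-ins invented for illustration, not the kernel's
struct timer_base, NEXT_TIMER_MAX_DELTA or actual functions.

```c
#include <stdbool.h>

/* Illustrative value only; not the kernel's NEXT_TIMER_MAX_DELTA. */
#define MODEL_MAX_DELTA 1000UL

/* Hypothetical stand-in for the fields of interest in a timer base. */
struct model_base {
	unsigned long clk;		/* follows jiffies */
	unsigned long next_expiry;	/* changes only on timer operations */
	bool timers_pending;		/* explicit "any timer queued?" flag */
};

/* Old-style check: infer "no timers" from next_expiry relative to clk. */
static inline bool old_no_timers(const struct model_base *b)
{
	return b->next_expiry == b->clk + MODEL_MAX_DELTA;
}

/* Flag-based check: independent of how far clk has moved. */
static inline bool new_no_timers(const struct model_base *b)
{
	return !b->timers_pending;
}

/* The clock catches up with jiffies; next_expiry is left untouched. */
static inline void model_forward_clk(struct model_base *b,
				     unsigned long jiffies)
{
	b->clk = jiffies;
}
```

Starting from clk = 0, next_expiry = MODEL_MAX_DELTA and timers_pending =
false, both checks report that no timers are pending. After
model_forward_clk(&b, 1), old_no_timers() turns false even though nothing
was enqueued, while new_no_timers() still reports that no timers are pending.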