Date: Mon, 19 Aug 2019 20:29:47 +0200 (CEST)
From: Thomas Gleixner
To: Arul Jeniston
cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, arul_mc@dell.com
Subject: Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
References: <20190816083246.169312-1-arul.jeniston@gmail.com>

Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:

> > hits ktime_get() or whether it happens concurrent on a different CPU.
> > ktime_get() can never use inconsistent tk data for calculating the time.
>
> Agreed. I think, I am not making my point clear here.
> Do you mean to say ktime_get() would always return incremental time
> irrespective of isr and multi-processors?

Yes. The only exception is when the TSC is either jumping or not fully in
sync between cores.

> If yes, this is where, I have difference of understanding.

And your understanding is still wrong. I explain it to you _once_ more:

The side which updates the timekeeper:

  - increments the sequence count _BEFORE_ it changes any data. After that
    increment the sequence count is odd, i.e. bit 0 is set.

  - updates data (base, last, mult, shift ....)

  - increments it again _AFTER_ it updated data. After that increment the
    sequence count is even, i.e. bit 0 is cleared.

The read out side:

  start:
  - reads the sequence count

  - if sequence count is odd (update in progress) go back to start

  - reads base from timekeeper data

  - reads TSC and calculates the delta with timekeeper data (last, mult,
    shift ...), i.e. timekeeping_get_ns().

  - reads the sequence count again. If it is even and the same as read
    above, the data is valid and consistent and the result is returned.

    If the sequence count is different to the original value it goes back
    to start.
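The two sides described above can be sketched in plain userspace C11. This
is only an illustration of the scheme, not the kernel's code: struct
tk_sketch, tk_update(), tk_read_ns() and the demo_* helpers are made-up
names, the kernel has its own seqcount primitives which are not reproduced
here, and the clocksource read is passed in as a callback so that an update
can be made to land in the middle of a read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the timekeeper data, NOT the kernel's struct. */
struct tk_sketch {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t base;		/* time in ns at the last update */
	_Atomic uint64_t last;		/* clocksource cycles at the last update */
	_Atomic uint32_t mult;
	_Atomic uint32_t shift;
};

/* Update side. Assumes updaters are serialized against each other. */
static void tk_update(struct tk_sketch *tk, uint64_t base, uint64_t last,
		      uint32_t mult, uint32_t shift)
{
	unsigned int s = atomic_load_explicit(&tk->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no other update in flight */
	/* First increment, before any data changes: count becomes odd. */
	atomic_store_explicit(&tk->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&tk->base, base, memory_order_relaxed);
	atomic_store_explicit(&tk->last, last, memory_order_relaxed);
	atomic_store_explicit(&tk->mult, mult, memory_order_relaxed);
	atomic_store_explicit(&tk->shift, shift, memory_order_relaxed);

	/* Second increment, after the data is written: count is even again. */
	atomic_store_explicit(&tk->seq, s + 2, memory_order_release);
}

/* Read out side. Only returns a result computed from consistent data. */
static uint64_t tk_read_ns(struct tk_sketch *tk, uint64_t (*read_cycles)(void))
{
	for (;;) {
		unsigned int s1, s2;
		uint64_t base, last, delta, ns;
		uint32_t mult, shift;

		s1 = atomic_load_explicit(&tk->seq, memory_order_acquire);
		if (s1 & 1)
			continue;	/* update in progress, back to start */

		base = atomic_load_explicit(&tk->base, memory_order_relaxed);
		last = atomic_load_explicit(&tk->last, memory_order_relaxed);
		mult = atomic_load_explicit(&tk->mult, memory_order_relaxed);
		shift = atomic_load_explicit(&tk->shift, memory_order_relaxed);

		/* May be computed from stale data; validated below. */
		delta = read_cycles() - last;
		ns = base + ((delta * mult) >> shift);

		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&tk->seq, memory_order_relaxed);
		if (s1 == s2)
			return ns;	/* count unchanged: result is valid */
		/* Count changed: discard ns and try again. */
	}
}

/* Demo only: a fake clocksource whose first call also runs an update. */
static struct tk_sketch demo_tk;
static int demo_calls;

static uint64_t demo_cycles(void)
{
	if (demo_calls++ == 0)
		tk_update(&demo_tk, 2000, 200, 1, 0);
	return 250;
}
```

With demo_tk first set up via tk_update(&demo_tk, 1000, 100, 1, 0), a call
to tk_read_ns(&demo_tk, demo_cycles) computes 1150 in its first pass from
the old data, sees that the count went from 2 to 4, throws that value away
and returns 2050 from the second pass.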
It does not matter at all if timekeeping_get_ns() occasionally returns a
wrong value due to timekeeper data being updated concurrently. The result
is discarded and never returned to the caller. It tries again.

All places which update the timekeeper issue the sequence count increment
protection and are properly serialized against each other.

So there is no occasional point where ktime_get() would return random crap
due to being interrupted by an update or due to a concurrent update on a
different CPU.

This is a protection mechanism which is well understood in computer
science (seqlock with lockless readers) and it has worked in kernel
timekeeping for way more than a decade without any issue except when the
underlying hardware clocksource (TSC in that case) misbehaves. There is no
way to protect the code against this and we are not going to do anything
about it simply because we can't.

The fact that you can observe the (cycles < last) condition is not proving
anything. Just looking at the (cycles < last) condition is wrong. You need
to prove that the result is returned from ktime_get() without a retry
despite the sequence counter being changed. I doubt you can.

If you can prove that the condition is met _AND_ the sequence counter has
NOT changed, then you have proven that the TSC on your machine is not
correctly synchronized or otherwise returning crap values.

You can make up further weird theories about the incorrectness of that
code, but these theories won't become magically true.

Thanks,

	tglx