From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 265F4C433B4 for ; Thu, 22 Apr 2021 07:41:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DAE72613D5 for ; Thu, 22 Apr 2021 07:41:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235049AbhDVHmK (ORCPT ); Thu, 22 Apr 2021 03:42:10 -0400 Received: from mga11.intel.com ([192.55.52.93]:28542 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229629AbhDVHmF (ORCPT ); Thu, 22 Apr 2021 03:42:05 -0400 IronPort-SDR: CU858dNtWOfcK9f11iIqAanr8G/vNgogRCNZVBvr5FILC7MPspYNoNVHBb9npaY/9pO6bQWjXO cdcXYf/32bzA== X-IronPort-AV: E=McAfee;i="6200,9189,9961"; a="192655705" X-IronPort-AV: E=Sophos;i="5.82,242,1613462400"; d="scan'208";a="192655705" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2021 00:41:31 -0700 IronPort-SDR: rR8R41KGJ9UZ9H0Q4EhJ8nwmQmVsv8czyA61v0GVlXTdhfZwF4hEPcEvpUbNa/LeVaUqSPbKxA Ui7QMMrN9RVw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,242,1613462400"; d="scan'208";a="455682013" Received: from shbuild999.sh.intel.com (HELO localhost) ([10.239.147.94]) by fmsmga002.fm.intel.com with ESMTP; 22 Apr 2021 00:41:27 -0700 Date: Thu, 22 Apr 2021 15:41:26 +0800 From: Feng Tang To: Xing Zhengjun , paulmck@kernel.org Cc: paulmck@kernel.org, Thomas Gleixner , John Stultz , Stephen Boyd , Jonathan Corbet , Mark Rutland , Marc Zyngier , Andi Kleen , Chris Mason , LKML , lkp@lists.01.org, lkp@intel.com Subject: Re: [LKP] Re: [clocksource] 6c52b5f3cf: stress-ng.opcode.ops_per_sec -14.4% regression Message-ID: <20210422074126.GA85095@shbuild999.sh.intel.com> References: <20210420064934.GE31773@xsang-OptiPlex-9020> <20210420134331.GM975577@paulmck-ThinkPad-P17-Gen-1> <20210420140552.GA3158164@paulmck-ThinkPad-P17-Gen-1> <04f4752e-6c5a-8439-fe75-6363d212c7b2@intel.com> <20210421134224.GR975577@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Paul, On Thu, Apr 22, 2021 at 02:58:27PM +0800, Xing Zhengjun wrote: > > > On 4/21/2021 9:42 PM, Paul E. McKenney wrote: > >On Wed, Apr 21, 2021 at 02:07:19PM +0800, Xing, Zhengjun wrote: > >> > >>On 4/20/2021 10:05 PM, Paul E. McKenney wrote: > >>>On Tue, Apr 20, 2021 at 06:43:31AM -0700, Paul E. McKenney wrote: > >>>>On Tue, Apr 20, 2021 at 02:49:34PM +0800, kernel test robot wrote: > >>>>>Greeting, > >>>>> > >>>>>FYI, we noticed a -14.4% regression of stress-ng.opcode.ops_per_sec due to commit: > >>>>> > >>>>> > >>>>>commit: 6c52b5f3cfefd6e429efc4413fd25e3c394e959f ("clocksource: Reduce WATCHDOG_THRESHOLD") > >>>>>https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.04.13a > >>>>> > >>>>> > >>>>>in testcase: stress-ng > >>>>>on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory > >>>>>with following parameters: > >>>>> > >>>>> nr_threads: 10% > >>>>> disk: 1HDD > >>>>> testtime: 60s > >>>>> fs: ext4 > >>>>> class: os > >>>>> test: opcode > >>>>> cpufreq_governor: performance > >>>>> ucode: 0x5003006 > >>>>Hmmm... I will try a less-aggressive reduction. Thank you for testing! > >>>But wait... This code is only running twice per second. It is very > >>>hard to believe that a clock-read retry twice per second is worth 2% of > >>>performance, let alone 14.4%. > >>> > >>>Is something else perhaps going on here? > >>> > >>>For example, did this run enable any of the new diagnositic clocksource.* > >>>kernel parameters? > >>> > >>> Thanx, Paul > >>I attached the kernel log, the following logs are related with the > >>clocksource. > >>[    3.453206] clocksource: timekeeping watchdog on CPU1: Marking > >>clocksource 'tsc-early' as unstable because the skew is too large: > >>[    3.455197] clocksource:                       'hpet' wd_now: 288fcc0 > >>wd_last: 1a8b333 mask: ffffffff > >>[    3.455199] clocksource:                       'tsc-early' cs_now: > >>1def309ebfdee cs_last: 1def2bd70d92c mask: ffffffffffffffff > >>[    3.455201] clocksource:                       No current clocksource. > >>[    3.457197] tsc: Marking TSC unstable due to clocksource watchdog > >> > >>6c52b5f3cf reduced WATCHDOG_THRESHOLD, then in clocksource_watchdog, the > >>warning logs are print, the TSC is marked as unstable. > >>/* Check the deviation from the watchdog clocksource. */ > >Aha, so this system really does have an unstable TSC! Which means that > >the patch is operating as designed. > > > >Or are you saying that this is a false positive? > > > > Thanx, Paul > > It happened during boot and before TSC calibration > (tsc_refine_calibration_work()), so on some machines "abs(cs_nsec - wd_nsec) > > WATCHDOG_THRESHOLD", WATCHDOG_THRESHOLD is set too small at that time. > After TSC calibrated, abs(cs_nsec - wd_nsec) should be very small, > WATCHDOG_THRESHOLD for here is ok. So I suggest increasing the > WATCHDOG_THRESHOLD before TSC calibration, for example, the clocks be skewed > by more than 1% to be marked unstable. As Zhengjun measuered, this is a Cascade Lake platform, and it has 2 times calibration of tsc, the first one of early quick calibration gives 2100 MHz, while the later accurate calibration gives 2095 MHz, so there is about 2.5/1000 deviation for the first number, which just exceeds the 1/1000 threshold you set :) Following is the tsc freq info from kernel log [ 0.000000] DMI: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 0.000000] tsc: Detected 2100.000 MHz processor ... [ 13.859982] tsc: Refined TSC clocksource calibration: 2095.077 MHz Thanks, Feng From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1712627951442034487==" MIME-Version: 1.0 From: Feng Tang To: lkp@lists.01.org Subject: Re: [clocksource] 6c52b5f3cf: stress-ng.opcode.ops_per_sec -14.4% regression Date: Thu, 22 Apr 2021 15:41:26 +0800 Message-ID: <20210422074126.GA85095@shbuild999.sh.intel.com> In-Reply-To: List-Id: --===============1712627951442034487== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi Paul, On Thu, Apr 22, 2021 at 02:58:27PM +0800, Xing Zhengjun wrote: > = > = > On 4/21/2021 9:42 PM, Paul E. McKenney wrote: > >On Wed, Apr 21, 2021 at 02:07:19PM +0800, Xing, Zhengjun wrote: > >> > >>On 4/20/2021 10:05 PM, Paul E. McKenney wrote: > >>>On Tue, Apr 20, 2021 at 06:43:31AM -0700, Paul E. McKenney wrote: > >>>>On Tue, Apr 20, 2021 at 02:49:34PM +0800, kernel test robot wrote: > >>>>>Greeting, > >>>>> > >>>>>FYI, we noticed a -14.4% regression of stress-ng.opcode.ops_per_sec = due to commit: > >>>>> > >>>>> > >>>>>commit: 6c52b5f3cfefd6e429efc4413fd25e3c394e959f ("clocksource: Redu= ce WATCHDOG_THRESHOLD") > >>>>>https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git d= ev.2021.04.13a > >>>>> > >>>>> > >>>>>in testcase: stress-ng > >>>>>on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz= with 192G memory > >>>>>with following parameters: > >>>>> > >>>>> nr_threads: 10% > >>>>> disk: 1HDD > >>>>> testtime: 60s > >>>>> fs: ext4 > >>>>> class: os > >>>>> test: opcode > >>>>> cpufreq_governor: performance > >>>>> ucode: 0x5003006 > >>>>Hmmm... I will try a less-aggressive reduction. Thank you for testi= ng! > >>>But wait... This code is only running twice per second. It is very > >>>hard to believe that a clock-read retry twice per second is worth 2% of > >>>performance, let alone 14.4%. > >>> > >>>Is something else perhaps going on here? > >>> > >>>For example, did this run enable any of the new diagnositic clocksourc= e.* > >>>kernel parameters? > >>> > >>> Thanx, Paul > >>I attached the kernel log, the following logs are related with the > >>clocksource. > >>[=C2=A0=C2=A0=C2=A0 3.453206] clocksource: timekeeping watchdog on CPU1= : Marking > >>clocksource 'tsc-early' as unstable because the skew is too large: > >>[=C2=A0=C2=A0=C2=A0 3.455197] clocksource:=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 'hpet' wd_now: 288fcc0 > >>wd_last: 1a8b333 mask: ffffffff > >>[=C2=A0=C2=A0=C2=A0 3.455199] clocksource:=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 'tsc-early' cs_now: > >>1def309ebfdee cs_last: 1def2bd70d92c mask: ffffffffffffffff > >>[=C2=A0=C2=A0=C2=A0 3.455201] clocksource:=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 No current clocksource. > >>[=C2=A0=C2=A0=C2=A0 3.457197] tsc: Marking TSC unstable due to clocksou= rce watchdog > >> > >>6c52b5f3cf reduced WATCHDOG_THRESHOLD, then in clocksource_watchdog, the > >>warning logs are print, the TSC is marked as unstable. > >>/* Check the deviation from the watchdog clocksource. */ > >Aha, so this system really does have an unstable TSC! Which means that > >the patch is operating as designed. > > > >Or are you saying that this is a false positive? > > > > Thanx, Paul > = > It happened during boot and before TSC calibration > (tsc_refine_calibration_work()), so on some machines "abs(cs_nsec - wd_ns= ec) > > WATCHDOG_THRESHOLD", WATCHDOG_THRESHOLD is set too small at that time. > After TSC calibrated, abs(cs_nsec - wd_nsec) should be very small, > WATCHDOG_THRESHOLD for here is ok. So I suggest increasing the > WATCHDOG_THRESHOLD before TSC calibration, for example, the clocks be ske= wed > by more than 1% to be marked unstable. As Zhengjun measuered, this is a Cascade Lake platform, and it has 2 times calibration of tsc, the first one of early quick calibration gives 2100 MHz, while the later accurate calibration gives 2095 MHz, so there is about 2.5/1000 deviation for the first number, which just exceeds the 1/1000 threshold you set :) Following is the tsc freq info from kernel log [ 0.000000] DMI: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0= 2.01.0008.031920191559 03/19/2019 [ 0.000000] tsc: Detected 2100.000 MHz processor ... [ 13.859982] tsc: Refined TSC clocksource calibration: 2095.077 MHz Thanks, Feng --===============1712627951442034487==--