From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Gleixner
To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "H. Peter Anvin", Feng Tang, Bill Gray, Jirka Hladky
Subject: Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
Date: Mon, 02 May 2022 09:56:01 +0200
Message-ID: <87a6c0nzam.ffs@tglx>

On Fri, Apr 29 2022 at 13:41, Waiman Long wrote:
> On 4/27/22 18:38, Thomas Gleixner wrote:
>> I grant you that it does not matter for the loop under the assumption
>> that the loop runs at constant frequency, but is that a guarantee that
>> it does not matter later on?
> Yes, that is my point: frequency doesn't matter as long as the frequency
> remains the same. Of course, all bets are off if the frequency really
> changes.

Frequency changes after boot.

>> If you overcompensate by a factor of 3 because the upcoming CPU ran at
>> the lowest frequency, then it might become visible later when everything
>> runs at full speed.
> I don't think the overhead will be directly proportional to the CPU
> frequency. A 3X increase in frequency will certainly lower the
> overhead, but it won't drop to 1/3. Maybe 1/2 at most.

It's very close to proportional according to my experiments.

>>> How about we halve the measured sync_overhead as compensation to avoid
>>> over-estimation? That probably increases the chance that we need a
>>> second adjustment of the TSC warp.
>> Half of what?
>
> What I mean is
>
> @@ -533,7 +551,7 @@ void check_tsc_sync_target(void)
>          * really depends on CPU, node distance and frequency. Add the
>          * estimated sync overhead to the adjustment value.
>          */
> -       cur->adjusted += cur_max_warp + sync_overhead;
> +       cur->adjusted += cur_max_warp + sync_overhead/2;

The point is? Make it magically half while it still jumps around:

>>> With this patch applied, the measured overhead on the same Cooper Lake
>>> system on different reboot runs varies from 104 to 326.
>> Half of something which jumps around? Not convinced. :)

...

> I have applied your patch with some modifications, and below is the
> relevant part of the boot log on a Cooper Lake system with this TSC
> sync problem.
>
> [    0.008858] smpboot: CPU 36 Converting physical 0 to logical die 2
> [    0.008858] Sync overhead: 230
> [    0.008858] Sync overhead: 547 A: 149597 M: 149596 F: 2500016
> [    0.008858] TSC ADJUST compensate: CPU36 observed 76342 warp (overhead 230). Adjust: 76457
> [    0.008858] smpboot: CPU 54 Converting physical 0 to logical die 3
> [    0.008858] Sync overhead: 178
> [    0.008858] Sync overhead: 658 A: 177970 M: 177968 F: 2500028
> [    0.008858] TSC ADJUST compensate: CPU54 observed 76568 warp (overhead 178). Adjust: 76657
>
> BTW, CPUs 36 and 54 are the first CPUs of sockets 2 and 3 respectively.
> It is always these CPUs that need TSC adjustment.
>
> I have no idea why the sync overhead actually increases in the
> subsequent measurement, as the frequency was set at 2.5GHz, I think.

See, that's the thing I'm worried about. An allegedly remote lock takes
178 cycles, but a loop of cache-local lock operations takes 658 cycles
on average. That does not make sense, and I haven't seen that behaviour
on any of the machines I ran this on. The average was always less than
the initial measurement.

> I tried to offline all the CPUs in a socket and then online the first
> CPU as suggested. However, I was not able to cause the tsc_sync loop
> to run.

It runs for sure when the first CPU of a socket goes online, except when
TSC is already disabled. It does not matter whether that CPU was online
before or not.

> BTW, without my patch the same system boots up with the following
> log:

We know that by now.

> How about we add the full sync_overhead at the bootup stage, but halve
> sync_overhead after boot in case the tsc_sync loop runs again for newly
> onlined CPUs? Or any other scheme that you think is appropriate.

Can you please stop grasping for straws?
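To put numbers on it: no fixed divisor can be right when the measurement
itself moves by a factor of three. A toy model only: the 'true' overhead
below is an assumed stand-in, since it is exactly the quantity that
cannot be observed at runtime, and the measured values are the
104/230/326 spread from your reports. This is not tsc_sync.c code.

#include <stdio.h>

int main(void)
{
	const long long true_overhead = 200;		/* assumed, unobservable */
	const long long measured[] = { 104, 230, 326 };	/* reported spread */
	unsigned int i;

	for (i = 0; i < sizeof(measured) / sizeof(measured[0]); i++) {
		/* residual error with full vs. halved compensation */
		long long full = measured[i] - true_overhead;
		long long half = measured[i] / 2 - true_overhead;

		printf("measured %3lld: full %+4lld  halved %+4lld cycles\n",
		       measured[i], full, half);
	}
	return 0;
}

Whatever divisor you pick just moves the error window around. It does
not shrink it.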
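And the wobble itself is trivially reproducible from user space. A
minimal sketch, assumed construction for illustration only and nothing
to do with the actual check_tsc_sync_target() internals, which times
serialized lock/unlock round trips with RDTSC:

/* Illustrative only: average cost of a cache-local lock round trip,
 * measured with serialized RDTSC. Build with: gcc -O2 sketch.c -lpthread
 */
#include <stdio.h>
#include <pthread.h>
#include <x86intrin.h>

#define LOOPS 32

int main(void)
{
	pthread_spinlock_t lock;
	unsigned long long start, delta, sum = 0;
	int i;

	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

	for (i = 0; i < LOOPS; i++) {
		_mm_lfence();			/* order RDTSC vs. earlier ops */
		start = __rdtsc();
		pthread_spin_lock(&lock);
		pthread_spin_unlock(&lock);
		_mm_lfence();
		delta = __rdtsc() - start;
		sum += delta;
	}
	printf("average: %llu cycles over %d round trips\n", sum / LOOPS, LOOPS);
	pthread_spin_destroy(&lock);
	return 0;
}

Run that a couple of times and watch the numbers move with cache state,
frequency and whatever else the machine is doing. That's the jitter any
compensation scheme has to survive.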
I'm as interested as you are in getting this solved, but I'm not
interested in 'works for me' and 'we don't know why it works' solutions,
simply because they end up on my desk again sooner rather than later.

Thanks,

        tglx