From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BCD8FC55186 for ; Fri, 24 Apr 2020 01:31:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A6DA220704 for ; Fri, 24 Apr 2020 01:31:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726108AbgDXBbp (ORCPT ); Thu, 23 Apr 2020 21:31:45 -0400 Received: from mga17.intel.com ([192.55.52.151]:5811 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725884AbgDXBbp (ORCPT ); Thu, 23 Apr 2020 21:31:45 -0400 IronPort-SDR: CfhctWYX4BNoeXcTE7cVZpGeIMNMLr0299aWgXpf0wRgcDmULLzZPzlnuYfw7oqIAAwba21dOb C/EOr9FbQpyQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Apr 2020 18:31:45 -0700 IronPort-SDR: LMmKJlpb9Vy006lzjkYj3xTycGbie0zo9z4nUNvGCMA8CO4ZHuoN9ZvcadC6ZO0xLbTKr+RpmR 9ii7/5jV0QaQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,309,1583222400"; d="scan'208";a="335183763" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by orsmga001.jf.intel.com with ESMTP; 23 Apr 2020 18:31:43 -0700 Date: Thu, 23 Apr 2020 18:32:22 -0700 From: Ricardo Neri To: Giovanni Gherdovich Cc: Srinivas Pandruvada , Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Borislav Petkov , Len Brown , "Rafael J . Wysocki" , x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Mel Gorman , Doug Smythies , Like Xu , Neil Rickert , Chris Wilson Subject: Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Message-ID: <20200424013222.GA26355@ranerica-svr.sc.intel.com> References: <20200416054745.740-1-ggherdovich@suse.cz> <20200416054745.740-2-ggherdovich@suse.cz> <20200422171547.GA11942@ranerica-svr.sc.intel.com> <1587629164.28094.11.camel@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1587629164.28094.11.camel@suse.cz> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 23, 2020 at 10:06:04AM +0200, Giovanni Gherdovich wrote: > On Wed, 2020-04-22 at 10:15 -0700, Ricardo Neri wrote: > > On Thu, Apr 16, 2020 at 07:47:42AM +0200, Giovanni Gherdovich wrote: > > > Some hypervisors such as VMWare ESXi 5.5 advertise support for > > > X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular, > > > MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base > > > clock frequency of the CPU (highest non-turbo frequency), producing a > > > division by zero when computing the ratio turbo_freq/base_freq necessary > > > for frequency invariant accounting. > > > > > > It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate > > > data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance > > > couldn't be done in principle (not that it would make a lot of sense in a > > > VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This > > > appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise > > > that feature. > > > > > > Signed-off-by: Giovanni Gherdovich > > > Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") > > > --- > > > arch/x86/kernel/smpboot.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > > index fe3ab9632f3b..3a318ec9bc17 100644 > > > --- a/arch/x86/kernel/smpboot.c > > > +++ b/arch/x86/kernel/smpboot.c > > > @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void) > > > return false; > > > > > > out: > > > + /* > > > + * Some hypervisors advertise X86_FEATURE_APERFMPERF > > > + * but then fill all MSR's with zeroes. > > > + */ > > > + if (!base_freq) { > > > + pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n"); > > > + return false; > > > + } > > > > It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In > > such case, turbo_freq will be also zero. If that is the case, > > arch_max_freq_ratio will be zero and we will see a division by zero > > exception in arch_scale_freq_tick() because mcnt is multiplied by > > arch_max_freq_ratio(). > > Thanks Ricardo for clarifying this. > > Follow-up question: when I see an all-zeros MSR_TURBO_RATIO_LIMIT, can I > assume the CPU doesn't support turbo boost? Or is it possible that such a CPU > has turbo boost, just the turbo ratios aren't declared in the MSR? > > Some context: this feature (called "frequency invariance") wants to know > what's the max clock freq a CPU can have at any time (it needs it for some > scheduler calculations). This is hard to know precisely, because turbo can > kick in at any time and depends on many factors. So it settles for an > "average maximum frequency", which I decided the 4 cores turbo is a good > estimate for. Now, if an all-zeros MSR_TURBO_RATIO_LIMIT means "turbo boost > unsupported", this is actually the easy case because then I know exactly what > the max freq is (base frequency). If, on the other hand, an all-zeros MSR > means "there may or may not be turbo, and you don't know how much" then I must > disable frequency invariance. I'd say that there can be cases in which the CPU has turbo boost and yet the turbo ratios are not declared in MSR_TURBO_RATIO_LIMIT. Hence, frequency invariance should be disabled. Thanks and BR, Ricardo