From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A,
	SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C931AC433E0
	for <linux-kernel@archiver.kernel.org>; Fri,  5 Mar 2021 02:33:36 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 9A7C964EC9
	for <linux-kernel@archiver.kernel.org>; Fri,  5 Mar 2021 02:33:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229791AbhCECdc (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 4 Mar 2021 21:33:32 -0500
Received: from mga06.intel.com ([134.134.136.31]:38809 "EHLO mga06.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S229436AbhCECdb (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 4 Mar 2021 21:33:31 -0500
IronPort-SDR: J5ZrL8MbjQAIS+PNlNF2fc4GMVOYa64tAfkOP2STYc+/W8rU7pxgFw1q3+f4bXYPhrRxxGpyLZ
 7PDr+hT1em+A==
X-IronPort-AV: E=McAfee;i="6000,8403,9913"; a="248941328"
X-IronPort-AV: E=Sophos;i="5.81,224,1610438400"; 
   d="scan'208";a="248941328"
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
  by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Mar 2021 18:33:30 -0800
IronPort-SDR: ZBEGOxWl1Occbip9Q3/yVi4Hx23VPJWDUFNza22g7byXr/dQhCIv+jUgZ6NB6D1WIeO0qGXVVC
 vjEQtDRhraiA==
X-IronPort-AV: E=Sophos;i="5.81,224,1610438400"; 
   d="scan'208";a="401093201"
Received: from likexu-mobl1.ccr.corp.intel.com (HELO [10.238.4.93]) ([10.238.4.93])
  by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Mar 2021 18:33:26 -0800
Subject: Re: [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for
 Arch LBR
To:     Sean Christopherson <seanjc@google.com>
Cc:     Peter Zijlstra <peterz@infradead.org>,
        Paolo Bonzini <pbonzini@redhat.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>,
        Kan Liang <kan.liang@linux.intel.com>,
        Dave Hansen <dave.hansen@intel.com>, wei.w.wang@intel.com,
        Borislav Petkov <bp@alien8.de>, kvm@vger.kernel.org,
        x86@kernel.org, linux-kernel@vger.kernel.org,
        Like Xu <like.xu@linux.intel.com>
References: <20210303135756.1546253-1-like.xu@linux.intel.com>
 <20210303135756.1546253-6-like.xu@linux.intel.com>
 <YD/APUcINwvP53VZ@google.com>
 <890a6f34-812a-5937-8761-d448a04f67d7@intel.com>
 <YEEG48erESM0+3CB@google.com>
From:   "Xu, Like" <like.xu@intel.com>
Message-ID: <5be999eb-64d7-de0e-254b-82711acc5e24@intel.com>
Date:   Fri, 5 Mar 2021 10:33:24 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.8.0
MIME-Version: 1.0
In-Reply-To: <YEEG48erESM0+3CB@google.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/3/5 0:12, Sean Christopherson wrote:
> On Thu, Mar 04, 2021, Xu, Like wrote:
>> Hi Sean,
>>
>> Thanks for your detailed review on the patch set.
>>
>> On 2021/3/4 0:58, Sean Christopherson wrote:
>>> On Wed, Mar 03, 2021, Like Xu wrote:
>>>> @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>>>>    	return true;
>>>>    }
>>>> +/*
>>>> + * Check if the requested depth values is supported
>>>> + * based on the bits [0:7] of the guest cpuid.1c.eax.
>>>> + */
>>>> +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
>>>> +{
>>>> +	struct kvm_cpuid_entry2 *best;
>>>> +
>>>> +	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
>>>> +	if (best && depth && !(depth % 8))
>>> This is still wrong, it fails to weed out depth > 64.
>> How come ? The testcases depth = {65, 127, 128} get #GP as expected.
> @depth is a u64, throw in a number that is a multiple of 8 and >= 520, and the
> "(1ULL << (depth / 8 - 1))" will trigger undefined behavior due to shifting
> beyond the capacity of a ULL / u64.

Extra:

when we say "undefined behavior" if shifting beyond the capacity of a ULL,
do you mean that the actual behavior depends on the machine, architecture 
or compiler?

>
> Adding the "< 64" check would also allow dropping the " & 0xff" since the check
> would ensure the shift doesn't go beyond bit 7.  I'm not sure the cleverness is
> worth shaving a cycle, though.

Finally how about:

     if (best && depth && (depth < 65) && !(depth & 7))
         return best->eax & BIT_ULL(depth / 8 - 1);

     return false;

Do you see the room for optimization ？

>
>>> Not that this is a hot path, but it's probably worth double checking that the
>>> compiler generates simple code for "depth % 8", e.g. it can be "depth & 7)".
>> Emm, the "%" operation is quite normal over kernel code.
> So is "&" :-)  I was just pointing out that the compiler should optimize this,
> and it did.
>
>> if (best && depth && !(depth % 8))
>>     10659:       48 85 c0                test   rax,rax
>>     1065c:       74 c7                   je     10625 <intel_pmu_set_msr+0x65>
>>     1065e:       4d 85 e4                test   r12,r12
>>     10661:       74 c2                   je     10625 <intel_pmu_set_msr+0x65>
>>     10663:       41 f6 c4 07             test   r12b,0x7
>>     10667:       75 bc                   jne    10625 <intel_pmu_set_msr+0x65>
>>
>> It looks like the compiler does the right thing.
>> Do you see the room for optimization ？
>>
>>>> +		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));
> Actually, looking at this again, I would explicitly use BIT() instead of 1ULL
> (or BIT_ULL), since the shift must be 7 or less.
>
>>>> +
>>>> +	return false;
>>>> +}
>>>> +