Subject: Re: [PATCH v4 11/12] KVM: x86/svm/pmu: Add AMD PerfMonV2 support
From: Like Xu
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Sandipan Das
Date: Fri, 7 Apr 2023 15:08:30 +0800
References: <20230214050757.9623-1-likexu@tencent.com> <20230214050757.9623-12-likexu@tencent.com>

On 7/4/2023 9:35 am, Sean Christopherson wrote:
> On Tue, Feb 14, 2023, Like Xu wrote:
>> +        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
>> +                if (!msr_info->host_initiated)
>> +                        return 0; /* Writes are ignored */
> 
> Where is the "writes ignored" behavior documented?  I can't find anything in the
> APM that defines write behavior.

KVM would follow the real hardware behavior when the specification stays
silent on such details.

> 
>> 
>>                  pmu->global_status = data;
>>                  return 0;
>>          case MSR_CORE_PERF_GLOBAL_CTRL:
>>                  if (!kvm_valid_perf_global_ctrl(pmu, data))
>>                          return 1;
>> -
>> +                fallthrough;
> 
> This _definitely_ needs a comment.  Hmm, and I would prefer to reverse these, i.e.
> 
>         case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
>                 data &= ~pmu->global_ctrl_mask;
>                 fallthrough;
>         case MSR_CORE_PERF_GLOBAL_CTRL:
>                 if (!kvm_valid_perf_global_ctrl(pmu, data))
>                         return 1;
> 
> It's a bit arbitrary, but either Intel or AMD is going to end up with extra code,
> and IMO skipping a validity check is more alarming than skipping clearing of
> reserved bits, i.e. will look like a bug to future readers.
> 
>> +        case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
>> +                data &= ~pmu->global_ctrl_mask;
>>                  if (pmu->global_ctrl != data) {
>>                          diff = pmu->global_ctrl ^ data;
>>                          pmu->global_ctrl = data;
>> @@ -616,7 +625,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>          case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>>                  if (data & pmu->global_ovf_ctrl_mask)
>>                          return 1;
>> -
>> +                fallthrough;
> 
> Here too.  Argh, the APM doesn't actually define what happens on reserved bits,
> it just says "WO".  I vote to be conservative and ignore writes to reserved bits.
> And then we can have one comment for the whole block, e.g.
> 
>         /*
>          * Note, AMD ignores writes to read-only PMU MSRs/bits, whereas Intel
>          * generates #GP on attempts to write reserved bits or RO MSRs.
>          */
>         switch (msr) {
>         case MSR_CORE_PERF_GLOBAL_STATUS:
>                 if (!msr_info->host_initiated)
>                         return 1; /* RO MSR */
>                 fallthrough;
>         case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
>                 if (!msr_info->host_initiated)
>                         break;
> 
>                 pmu->global_status = data;
>                 break;
>         case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
>                 data &= ~pmu->global_ctrl_mask;
>                 fallthrough;
>         case MSR_CORE_PERF_GLOBAL_CTRL:
>                 if (!kvm_valid_perf_global_ctrl(pmu, data))
>                         return 1;
> 
>                 if (pmu->global_ctrl != data) {
>                         diff = pmu->global_ctrl ^ data;
>                         pmu->global_ctrl = data;
>                         reprogram_counters(pmu, diff);
>                 }
>                 break;
>         case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
>                 fallthrough;
>         case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>                 if (data & pmu->global_ovf_ctrl_mask)
>                         return 1;
> 
>                 if (!msr_info->host_initiated)
>                         pmu->global_status &= ~data;
>                 break;
>         default:
>                 kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
>                 return static_call(kvm_x86_pmu_set_msr)(vcpu, msr_info);
>         }
> 
>         return 0;

AMD doesn't generate #GP on attempts to write PMU RO MSRs and reserved bits.
How about this:

        /*
         * Note, AMD ignores writes to reserved bits and read-only PMU MSRs,
         * whereas Intel generates #GP on attempts to write reserved/RO MSRs.
         */
        switch (msr) {
        case MSR_CORE_PERF_GLOBAL_STATUS:
                if (!msr_info->host_initiated ||
                    (data & pmu->global_ovf_ctrl_mask))
                        return 1; /* RO MSR */
                fallthrough;
        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
                if (!msr_info->host_initiated)
                        break;

                pmu->global_status = data;
                break;
        case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
                data &= ~pmu->global_ctrl_mask;
                fallthrough;
        case MSR_CORE_PERF_GLOBAL_CTRL:
                if (!kvm_valid_perf_global_ctrl(pmu, data))
                        return 1;

                if (pmu->global_ctrl != data) {
                        diff = pmu->global_ctrl ^ data;
                        pmu->global_ctrl = data;
                        reprogram_counters(pmu, diff);
                }
                break;
        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
                if (data & pmu->global_ovf_ctrl_mask)
                        return 1;
                fallthrough;
        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
                if (!msr_info->host_initiated)
                        pmu->global_status &= ~data;
                break;
        default:
                kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
                return static_call(kvm_x86_pmu_set_msr)(vcpu, msr_info);
        }

        return 0;

> 
>> @@ -164,20 +181,34 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>  {
>>          struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +        struct kvm_cpuid_entry2 *entry;
>> +        union cpuid_0x80000022_ebx ebx;
>> 
>> -        if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
>> +        pmu->version = 1;
>> +        if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
>> +                pmu->version = 2;
>> +                entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
> 
> No need for the intermediate "entry".
> 
>> +                ebx.full = entry->ebx;
> 
> Oof, at first glance this looks like a potential null-pointer deref bug.  I
> believe we can do
> 
>         /*
>          * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
>          * CPUID entry is guaranteed to be non-NULL.
>          */
>         BUILD_BUG_ON(x86_feature_cpuid(X86_FEATURE_PERFMON_V2).function != 0x80000022 ||
>                      x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
>         ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
> 
>> +                pmu->nr_arch_gp_counters = min_t(unsigned int,
>> +                                                 ebx.split.num_core_pmc,
>> +                                                 kvm_pmu_cap.num_counters_gp);
>> +        } else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
>>                  pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
> 
> This needs to be sanitized, no?  E.g. if KVM only has access to 4 counters, but
> userspace sets X86_FEATURE_PERFCTR_CORE anyways.  Hrm, unless I'm missing something,
> that's a pre-existing bug.

Now your point is that if user space sets more capability than KVM can support,
KVM should constrain it. Your previous preference was that user space may set
capabilities even if KVM doesn't support them, as long as that doesn't break KVM
or the host, and the guest will eat its own.

> 
> If I'm right, can you add a patch to cap nr_arch_gp_counters at
> kvm_pmu_cap.num_counters_gp in the common flow, i.e. after this if-else block?
> Then there is no change needed in this patch, e.g. we'll naturally end up with:
> 
>         union cpuid_0x80000022_ebx ebx;
> 
>         pmu->version = 1;
>         if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
>                 pmu->version = 2;
>                 /*
>                  * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
>                  * CPUID entry is guaranteed to be non-NULL.
>                  */
>                 BUILD_BUG_ON(x86_feature_cpuid(X86_FEATURE_PERFMON_V2).function != 0x80000022 ||
>                              x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
>                 ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
>                 pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
>         } else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
>                 pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
>         } else {
>                 pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
>         }
> 
>         pmu->nr_arch_gp_counters = min_t(unsigned int,
>                                          pmu->nr_arch_gp_counters,
>                                          kvm_pmu_cap.num_counters_gp);
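
FWIW, the shared GLOBAL_CTRL path above relies on reprogram_counters() to pick
up the enable-bit changes. Roughly (a sketch from memory, not necessarily the
exact helper in this series):

        static inline void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
        {
                int bit;

                if (!diff)
                        return;

                /* Mark each counter whose global enable bit changed ... */
                for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
                        set_bit(bit, pmu->reprogram_pmi);

                /* ... and have KVM reprogram them before the next VM-entry. */
                kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
        }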