From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E43B9C761A6 for ; Thu, 30 Mar 2023 01:02:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229718AbjC3BCB (ORCPT ); Wed, 29 Mar 2023 21:02:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229448AbjC3BB6 (ORCPT ); Wed, 29 Mar 2023 21:01:58 -0400 Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com [IPv6:2607:f8b0:4864:20::102f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7094A4ED4; Wed, 29 Mar 2023 18:01:56 -0700 (PDT) Received: by mail-pj1-x102f.google.com with SMTP id a16so15757144pjs.4; Wed, 29 Mar 2023 18:01:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680138116; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ambahkN5XdW7qXHNDINIC24o08jAaGeXQsgkARkrmIU=; b=ouosHywmC5LEy+i6XmI8rHhLKmQifTAE77Oxt7BFzarPRrLZvQuUz025xKtrDyUL0B 8nQp5ULIftg9mlK2GghSMJDHlSK5J/Y6pl/xgZp1YKNdQ05pOmdw9vmldaSpVyRF72OT woVRtqPOt3Y8ZzmMg2ZtJs4SSAwuGQtcERvWxOJBXisUZQ8NiMoPhWpRCABUta7Yofu/ P+M4svfKhwuuk2fMfOAHtnzF4NDd6gwI9eVPkzxtW1ImPpgtXPFwoq2Dl0eGZizqUAD9 zjvOM+Y4POvGIuHfnu7poGMB5zDRFDrs3nBUMtLW/o66ad/meUcHboIAl8WNjvGsb91G LMXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680138116; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ambahkN5XdW7qXHNDINIC24o08jAaGeXQsgkARkrmIU=; b=xkzpj8Sh0lGsBptpN4mcBnzOc/BbFc/95pXBxnLjUSUoyWuS669UhmezI+bhON+VYV uZqRfeF4R+qNrOeA3KaG0pUkhjR1n2EoyTEBO65TrpfF8NtXAPTgSJ/k3RocRzWnXd30 MCrwJDxiQzFFxC4LjWt1XFhXA3pTr9c0FxpJDLCm6RphTfrwYMfDsrUEp7nXZze0EcWa y2Hi8Co1/ktHzF7d0T+gD9jOhNsGmaVaILy+8XsO9l8Qa4/iuVsYC189PQXJAds8k29l D4wY83kEz+JnAmGkLIDG635ZZV84WKqJkrueaAIww78sSBVxVPjWWyGMNF+1iJ/TJAT1 oHtA== X-Gm-Message-State: AAQBX9e7Fh+WZOSH1fSiVdWWZvJgPNCj7etTF0cu9wzkajpHxYb8nTwf lNQwiB+f2qfZ2h7dXUkal/M= X-Google-Smtp-Source: AKy350bfnPPh8o+VeAI2wWqoxyNfA776Yqe7apoBE1tgQRdwqiR1e1GA/MchzzDgZvEtnVO2zrrCpQ== X-Received: by 2002:a17:90b:4f4d:b0:23f:7c82:2463 with SMTP id pj13-20020a17090b4f4d00b0023f7c822463mr23109684pjb.9.1680138115714; Wed, 29 Mar 2023 18:01:55 -0700 (PDT) Received: from localhost ([192.55.54.55]) by smtp.gmail.com with ESMTPSA id rj3-20020a17090b3e8300b0023f685f7914sm2020152pjb.49.2023.03.29.18.01.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Mar 2023 18:01:55 -0700 (PDT) Date: Wed, 29 Mar 2023 18:01:53 -0700 From: Isaku Yamahata To: Zhi Wang Cc: isaku.yamahata@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Sean Christopherson Subject: Re: [PATCH v13 019/113] KVM: TDX: create/destroy VM structure Message-ID: <20230330010153.GB1112017@ls.amr.corp.intel.com> References: <7c011a5c9dd92cfb9074297af22d132a4e57fd11.1678643052.git.isaku.yamahata@intel.com> <20230326140936.00003397@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20230326140936.00003397@gmail.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, thanks for review. On Sun, Mar 26, 2023 at 02:09:36PM +0300, Zhi Wang wrote: > On Sun, 12 Mar 2023 10:55:43 -0700 > isaku.yamahata@intel.com wrote: > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index 8b02e605cfb5..3ede8a726b47 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -5,8 +5,9 @@ > > > > #include "capabilities.h" > > #include "x86_ops.h" > > -#include "x86.h" > > #include "tdx.h" > > +#include "tdx_ops.h" > > +#include "x86.h" > > > > #undef pr_fmt > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > @@ -46,11 +47,276 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) > > return r; > > } > > > > +struct tdx_info { > > + u8 nr_tdcs_pages; > > +}; > > + > > +/* Info about the TDX module. */ > > +static struct tdx_info tdx_info; > > + > > +/* > > + * Some TDX SEAMCALLs (TDH.MNG.CREATE, TDH.PHYMEM.CACHE.WB, > > + * TDH.MNG.KEY.RECLAIMID, TDH.MNG.KEY.FREEID etc) tries to acquire a global lock > > + * internally in TDX module. If failed, TDX_OPERAND_BUSY is returned without > > + * spinning or waiting due to a constraint on execution time. It's caller's > > + * responsibility to avoid race (or retry on TDX_OPERAND_BUSY). Use this mutex > > + * to avoid race in TDX module because the kernel knows better about scheduling. > > + */ > > +static DEFINE_MUTEX(tdx_lock); > > +static struct mutex *tdx_mng_key_config_lock; > > + > > +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) > > +{ > > + return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); > > +} > > + > > +static inline bool is_td_created(struct kvm_tdx *kvm_tdx) > > +{ > > + return kvm_tdx->tdr_pa; > > +} > > + > > +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) > > +{ > > + tdx_guest_keyid_free(kvm_tdx->hkid); > > + kvm_tdx->hkid = 0; > > +} > > + > > +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) > > +{ > > + return kvm_tdx->hkid > 0; > > +} > > + > > int tdx_hardware_enable(void) > > { > > return tdx_cpu_enable(); > > } > > > > +static void tdx_clear_page(unsigned long page_pa) > > +{ > > + const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); > > + void *page = __va(page_pa); > > + unsigned long i; > > + > > + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { > > + clear_page(page); > > + return; > > + } > > Is it possbile to have a TDX machine without MOVDIR64B support? I am not sure > if there is any other way for the kernel to clear the posioned cache line. > > If not, there should be a warn/bug at least and check if MOVDIR64B support > when initializing the TDX. Because the latest TDX specification uses movdir64b, it's safe for TDX to assume movdir64b. I'll add the check to TDX initialization part and drop it from here. > > +void tdx_mmu_release_hkid(struct kvm *kvm) > > +{ > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > > + cpumask_var_t packages; > > + bool cpumask_allocated; > > + u64 err; > > + int ret; > > + int i; > > + > > + if (!is_hkid_assigned(kvm_tdx)) > > + return; > > + > > + if (!is_td_created(kvm_tdx)) > > + goto free_hkid; > > + > > + cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL); > > + cpus_read_lock(); > > + for_each_online_cpu(i) { > > + if (cpumask_allocated && > > + cpumask_test_and_set_cpu(topology_physical_package_id(i), > > + packages)) > > + continue; > > Is this necessary to check cpumask_allocated in the while loop? if cpumask > is not succefully allocated, wouldn't it be better to bail out just after > it? No because we can't return error here. It's better to do in-efficiently freeing resources instead of leak. We can move the check out of loop. But it would be ugly (if () {cpu loop} else {cpu loop} ) and this function isn't performance critical. Also I think it's okay to depend on compiler optimization for loop invariant. My compiler didn't optimize it in this case, though. > > + > > + /* > > + * We can destroy multiple the guest TDs simultaneously. > > + * Prevent tdh_phymem_cache_wb from returning TDX_BUSY by > > + * serialization. > > + */ > > + mutex_lock(&tdx_lock); > > + ret = smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1); > > + mutex_unlock(&tdx_lock); > > + if (ret) > > + break; > > + } > > + cpus_read_unlock(); > > + free_cpumask_var(packages); > > + > > + mutex_lock(&tdx_lock); > > + err = tdh_mng_key_freeid(kvm_tdx->tdr_pa); > > + mutex_unlock(&tdx_lock); > > + if (WARN_ON_ONCE(err)) { > > + pr_tdx_error(TDH_MNG_KEY_FREEID, err, NULL); > > + pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n", > > + kvm_tdx->hkid); > > + return; > > + } > > + > > +free_hkid: > > + tdx_hkid_free(kvm_tdx); > > +} > > + > > +void tdx_vm_free(struct kvm *kvm) > > +{ > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > > + int i; > > + > > + /* Can't reclaim or free TD pages if teardown failed. */ > > + if (is_hkid_assigned(kvm_tdx)) > > + return; > > + > > Better to explain why, as it is common to think even the teardown failed, we > should still try to reclaim the pages as many as we can. Ok, here is the updated comment. /* * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong * heavily with TDX module. Give up freeing TD pages. As the function * already warned, don't warn it again. */ -- Isaku Yamahata