From: Erdem Aktas
Date: Wed, 23 Mar 2022 13:17:45 -0700
Subject: Re: [RFC PATCH v5 073/104] KVM: TDX: track LP tdx vcpu run and teardown vcpus on descroing the guest TD
To: Isaku Yamahata
Cc: "Yamahata, Isaku", "open list:KERNEL VIRTUAL MACHINE (KVM)", open list, Paolo Bonzini, Jim Mattson, Connor Kuehl, Sean Christopherson
In-Reply-To: <20220323190812.GH1964605@ls.amr.corp.intel.com>
References: <6e096d8509ef40ce3e25c1e132643e772641241b.1646422845.git.isaku.yamahata@intel.com> <20220323190812.GH1964605@ls.amr.corp.intel.com>

So tdh_vp_flush() should always succeed while the VM is being torn down.
Thanks, Isaku, for the explanation; I think it would be great to add the
error message.
-Erdem

On Wed, Mar 23, 2022 at 12:08 PM Isaku Yamahata wrote:
>
> On Tue, Mar 22, 2022 at 05:54:45PM -0700,
> Erdem Aktas wrote:
>
> > On Fri, Mar 4, 2022 at 11:50 AM wrote:
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index a6b1a8ce888d..690298fb99c7 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -48,6 +48,14 @@ struct tdx_capabilities tdx_caps;
> > >  static DEFINE_MUTEX(tdx_lock);
> > >  static struct mutex *tdx_mng_key_config_lock;
> > >
> > > +/*
> > > + * A per-CPU list of TD vCPUs associated with a given CPU.  Used when a CPU
> > > + * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs.
> > > + * Protected by interrupt mask.  This list is manipulated in process context
> > > + * of vcpu and IPI callback.  See tdx_flush_vp_on_cpu().
> > > + */
> > > +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
> > > +
> > >  static u64 hkid_mask __ro_after_init;
> > >  static u8 hkid_start_pos __ro_after_init;
> > >
> > > @@ -87,6 +95,8 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> > >
> > >  static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > >  {
> > > +	list_del(&to_tdx(vcpu)->cpu_list);
> > > +
> > >  	/*
> > >  	 * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
> > >  	 * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
> > > @@ -97,6 +107,22 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > >  	vcpu->cpu = -1;
> > >  }
> > >
> > > +void tdx_hardware_enable(void)
> > > +{
> > > +	INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
> > > +}
> > > +
> > > +void tdx_hardware_disable(void)
> > > +{
> > > +	int cpu = raw_smp_processor_id();
> > > +	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
> > > +	struct vcpu_tdx *tdx, *tmp;
> > > +
> > > +	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
> > > +	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
> > > +		tdx_disassociate_vp(&tdx->vcpu);
> > > +}
> > > +
> > >  static void tdx_clear_page(unsigned long page)
> > >  {
> > >  	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > > @@ -230,9 +256,11 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > >  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > >  	cpumask_var_t packages;
> > >  	bool cpumask_allocated;
> > > +	struct kvm_vcpu *vcpu;
> > >  	u64 err;
> > >  	int ret;
> > >  	int i;
> > > +	unsigned long j;
> > >
> > >  	if (!is_hkid_assigned(kvm_tdx))
> > >  		return;
> > > @@ -248,6 +276,17 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > >  		return;
> > >  	}
> > >
> > > +	kvm_for_each_vcpu(j, vcpu, kvm)
> > > +		tdx_flush_vp_on_cpu(vcpu);
> > > +
> > > +	mutex_lock(&tdx_lock);
> > > +	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
> >
> > Hi Isaku,
>
> Hi.
>
> > I am wondering about the impact of failures in these functions. Is
> > there any other function which recovers from failures here?
> > When I look at the tdx_flush_vp function, it seems like it can fail
> > due to task migration, so tdx_flush_vp_on_cpu might also fail, and if
> > it fails, tdh_mng_vpflushdone returns an error. Since tdx_vm_teardown
> > does not return any error, how can the VMM free the keyid used in
> > this TD? Will it be forever in "used" state?
> > Also, if tdx_vm_teardown fails, kvm_tdx->hkid is never set to -1,
> > which will prevent tdx_vcpu_free from freeing and reclaiming the
> > resources allocated for the vcpu.
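For reference, tdx_flush_vp_on_cpu() is not in the quoted hunk; below is a
rough sketch of how I assume it works, an IPI-based flush of the vCPU on
its associated CPU. tdx_flush_vp() and the tdvpr field are my guesses from
context, not necessarily the patch's actual code:

/* Sketch of a fragment in arch/x86/kvm/vmx/tdx.c context. */
#include <linux/smp.h>
#include <linux/kvm_host.h>

static void tdx_flush_vp(void *arg)
{
	struct kvm_vcpu *vcpu = arg;

	/* Runs on the target CPU in IPI context, interrupts disabled. */
	if (vcpu->cpu != raw_smp_processor_id())
		return;		/* raced with migration; nothing to flush */

	tdh_vp_flush(to_tdx(vcpu)->tdvpr.pa);	/* field name is a guess */
	tdx_disassociate_vp(vcpu);
}

static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu)
{
	int cpu = vcpu->cpu;

	/* vcpu->cpu == -1 means the vCPU is not associated with any CPU. */
	if (cpu == -1)
		return;

	smp_call_function_single(cpu, tdx_flush_vp, vcpu, 1);
}

With wait=1, smp_call_function_single() does not return until the callback
has run on the remote CPU, which would match the synchronous use in
tdx_mmu_prezap() above.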
>
> mmu_prezap() is called via the release callback of the mmu notifier when
> the last mmu reference of this process is dropped.  That is after all kvm
> vcpu fds and the kvm vm fd were closed.  The vcpu will never run, but we
> still hold the kvm_vcpu structures.  There is no race between
> tdh_vp_flush()/tdh_mng_vpflushdone() here and process migration.
> tdh_vp_flush()/tdh_mng_vpflushdone() should succeed.
>
> The CPU check in tdx_flush_vp() is for vcpu_load(), which may race with
> process migration.
>
> Anyway, what if one of those TDX seamcalls fails?  The HKID is leaked and
> will never be used again because there is no good way to free and reuse
> an HKID safely.  Such a failure is due to an unknown issue and is
> probably a bug.
>
> One mitigation is to add pr_err() when an HKID leak happens.  I'll add
> such a message on the next respin.
>
> thanks,
> --
> Isaku Yamahata
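For the record, a sketch of what that pr_err() mitigation could look like
at the tdh_mng_vpflushdone() call site; the message wording, the early
return, and the abbreviated function body are assumptions, not the actual
respin:

void tdx_mmu_prezap(struct kvm *kvm)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
	u64 err;

	/* ... vCPU flushing as in the quoted hunk ... */

	mutex_lock(&tdx_lock);
	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
	mutex_unlock(&tdx_lock);
	if (err) {
		/*
		 * No safe way to free or reuse the HKID once a teardown
		 * SEAMCALL has failed: leave kvm_tdx->hkid assigned so the
		 * key is never handed out again, and make the leak visible.
		 */
		pr_err("tdh_mng_vpflushdone() failed (0x%llx): HKID %d leaked\n",
		       err, kvm_tdx->hkid);
		return;
	}

	/* ... continue teardown: free the HKID, reclaim TD pages, etc. ... */
}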