From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4e9db353de15333e17e023c91e2e0b4ec3d880c7.camel@redhat.com>
Subject: Re: [PATCH 2/2] KVM: nVMX: fix for disappearing L1->L2 event injection on L1 migration
From: Maxim Levitsky
To: Sean Christopherson
Cc: kvm@vger.kernel.org, Joerg Roedel, Wanpeng Li,
    "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)",
    "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)",
    Vitaly Kuznetsov, "H. Peter Anvin", Sean Christopherson,
    Paolo Bonzini, Ingo Molnar, Borislav Petkov, Jim Mattson,
    Thomas Gleixner
Date: Thu, 07 Jan 2021 04:38:11 +0200
References: <20210106105306.450602-1-mlevitsk@redhat.com>
    <20210106105306.450602-3-mlevitsk@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > If migration happens while an L2 entry with an injected event is
> > pending, we weren't including the event in the migration state, so it
> > would be lost, leading to an L2 hang.
>
> But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> should be set in the migration state, i.e. it should naturally be copied to
> vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
> not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...

You are completely right.
The injected event is indeed carried over that way, since vmc(b|s)12 is part
of the migrated state. We can safely disregard both of these patches, as well
as the parallel pair of patches for SVM.

I am almost sure that the real root cause of this bug was that we weren't
restoring the nested-run-pending flag, and I even happened to fix that in
this patch series.

This is the trace of the bug (I removed the timestamps to make it easier
to read):

kvm_exit: vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
kvm_nested_vmrun: rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
^^^ this is the injection
kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
kvm_fpu: unload
kvm_userspace_exit: reason KVM_EXIT_INTR (10)

============================================================================
migration happens here
============================================================================

...
kvm_async_pf_ready: token 0xffffffff gva 0
kvm_apic_accept_irq: apicid 0 vec 243 (Fixed|edge)
kvm_nested_intr_vmexit: rip: 0x000000000000fff0
^^^^^ this is the nested vmexit that shouldn't have happened, since nested
run is pending; it erased the event_inj field, which had been migrated
correctly, just like you say.
kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
...

We did notice that this vmexit had a weird RIP, and I later explained it to
myself: it is the default RIP that we put into the vmcb, not yet updated,
since it is only updated just prior to VM entry.

My test has already survived about 170 iterations (it usually crashes after
20-40 iterations). I am leaving the stress test running all night; let's see
if it survives.

V2 of the patches is on the way.

Thanks again for the help!
Best regards,
	Maxim Levitsky

>
> > Fix this by queueing the injected event in similar manner to how we queue
> > interrupted injections.
> >
> > This can be reproduced by running an IO intense task in L2,
> > and repeatedly migrating the L1.
> >
> > Suggested-by: Paolo Bonzini
> > Signed-off-by: Maxim Levitsky
> > ---
> >  arch/x86/kvm/vmx/nested.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e2f26564a12de..2ea0bb14f385f 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2355,12 +2355,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> >  	 * Interrupt/Exception Fields
> >  	 */
> >  	if (vmx->nested.nested_run_pending) {
> > -		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> > -			     vmcs12->vm_entry_intr_info_field);
> > -		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> > -			     vmcs12->vm_entry_exception_error_code);
> > -		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> > -			     vmcs12->vm_entry_instruction_len);
> > +		if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
> > +			vmx_process_injected_event(&vmx->vcpu,
> > +						   vmcs12->vm_entry_intr_info_field,
> > +						   vmcs12->vm_entry_instruction_len,
> > +						   vmcs12->vm_entry_exception_error_code);
> > +
> >  		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
> >  			     vmcs12->guest_interruptibility_info);
> >  		vmx->loaded_vmcs->nmi_known_unmasked =
> > --
> > 2.26.2
> >