Subject: Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation
From: Igor Druzhinin
To: Andrew Cooper
Cc: wl@xen.org, jbeulich@suse.com, roger.pau@citrix.com
Date: Fri, 22 May 2020 11:05:39 +0100
Message-ID: <506f21d4-ed81-2cd5-46af-162407553c91@citrix.com>
References: <1590097438-28829-1-git-send-email-igor.druzhinin@citrix.com>
List-Id: Xen developer discussion

On 22/05/2020 10:45, Andrew Cooper wrote:
> On 21/05/2020 22:43, Igor Druzhinin wrote:
>> If a recalculation NPT fault hasn't been handled explicitly in
>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>> US bit has been re-instated in PTE and any real fault would be correctly
>> re-raised next time.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access. This type of fault isn't described
>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>> EPT misconfig exit on Intel) which results in domain crash.
>>
>> Signed-off-by: Igor Druzhinin
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..f0d0bd3 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>          /* inject #VMEXIT(NPF) into guest. */
>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>          return;
>> +    case 0:
>> +        /* If a recalculation page fault hasn't been handled - just retry. */
>> +        if ( pfec & PFEC_user_mode )
>> +            return;
>
> This smells like it is a recipe for livelocks.
>
> Everything should have been handled properly by the call to
> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>
> It is legitimate for the MMIO mapping to end up being transiently
> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
> fix it up suggests that the bug is there.
>
> Do you have the complete NPT walk to the bad mapping?  Do we have
> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?

It does fix it up. The problem is that currently in SVM we enter
svm_do_nested_pgfault() immediately after p2m_pt_handle_deferred_changes()
has finished.

Yes, we don't have _PAGE_USER initially and, yes, it's fixed up correctly
in p2m_pt_handle_deferred_changes(), but svm_do_nested_pgfault() doesn't
know about it.

Please read my second email about alternatives that would resolve the
issue you're worried about.

Igor
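
For readers following the thread, the retry argument above can be sketched
as a toy model. This is only an illustration under stated assumptions: the
struct and every toy_* name are hypothetical and are not Xen code, and the
"user" flag merely stands in for _PAGE_USER. It shows that once the fix-up
has re-instated the bit a retried access no longer faults, while an entry
that was never marked for recalculation keeps faulting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a leaf entry; NOT Xen's real p2m/NPT layout. */
struct toy_pte {
    bool user;         /* plays the role of _PAGE_USER */
    bool needs_recalc; /* entry was marked for recalculation */
};

/*
 * Hypothetical fix-up step: returns 1 if the entry was recalculated,
 * 0 if there was nothing to do.
 */
static int toy_fixup(struct toy_pte *pte)
{
    if (!pte->needs_recalc)
        return 0;

    pte->needs_recalc = false;
    pte->user = true; /* re-instate the user bit */
    return 1;
}

/* Toy access check: an access faults while the user bit is clear. */
static bool toy_access_faults(const struct toy_pte *pte)
{
    return !pte->user;
}

/*
 * Fault once, run the fix-up, then retry the access.
 * Returns true if the retried access still faults.
 */
static bool toy_fault_persists_after_retry(struct toy_pte *pte)
{
    if (!toy_access_faults(pte))
        return false; /* no fault in the first place */

    toy_fixup(pte);
    return toy_access_faults(pte);
}

/* Small self-check exercising both cases. */
static void toy_self_check(void)
{
    struct toy_pte recalc = { .user = false, .needs_recalc = true };
    struct toy_pte real   = { .user = false, .needs_recalc = false };

    /* Recalculation fault: gone after the fix-up and a retry. */
    assert(!toy_fault_persists_after_retry(&recalc));
    /* Genuine fault: the fix-up changes nothing, so it is raised again. */
    assert(toy_fault_persists_after_retry(&real));
}
```

The model deliberately leaves out the generic fault handler; it only
captures the state change made by the fix-up, which is the piece of
information the thread says svm_do_nested_pgfault() is missing.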