From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 27 Jan 2021 10:23:14 -0800
From: Sean Christopherson
To: David Stevens
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
    linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.cs.columbia.edu, Huacai Chen, Aleksandar Markovic,
    linux-mips@vger.kernel.org, Paul Mackerras, kvm-ppc@vger.kernel.org,
    Christian Borntraeger, Janosch Frank, David Hildenbrand,
    Cornelia Huck, Claudio Imbrenda
Subject: Re: [PATCH v2] KVM: x86/mmu: consider the hva in mmu_notifier retry
References: <20210127024504.613844-1-stevensd@google.com>
In-Reply-To: <20210127024504.613844-1-stevensd@google.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MkeFThl52DvUx8zi"
Content-Disposition: inline

--MkeFThl52DvUx8zi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Jan 27, 2021, David Stevens wrote:
> From: David Stevens
>
> Track the range being invalidated by mmu_notifier and skip page fault
> retries if the fault address is not affected by the in-progress
> invalidation. Handle concurrent invalidations by finding the minimal
> range which includes all ranges being invalidated. Although the combined
> range may include unrelated addresses and cannot be shrunk as individual
> invalidation operations complete, it is unlikely the marginal gains of
> proper range tracking are worth the additional complexity.
>
> The primary benefit of this change is the reduction in the likelihood of
> extreme latency when handling a page fault due to another thread having
> been preempted while modifying host virtual addresses.
>
> Signed-off-by: David Stevens
> ---
> v1 -> v2:
>  - improve handling of concurrent invalidation requests by unioning
>    ranges, instead of just giving up and using [0, ULONG_MAX).

Ooh, even better.
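
The actual unioning lives in the kvm_main.c hunk, which isn't quoted
below, but from the changelog it presumably boils down to something like
this in kvm_mmu_notifier_invalidate_range_start() (a sketch, not the
actual hunk; the range parameter and the mmu_notifier_range_start/end
fields are assumed from the mmu_notifier_retry_hva() snippet further
down):

	kvm->mmu_notifier_count++;
	if (likely(kvm->mmu_notifier_count == 1)) {
		kvm->mmu_notifier_range_start = range->start;
		kvm->mmu_notifier_range_end = range->end;
	} else {
		/*
		 * Fully tracking multiple concurrent ranges has diminishing
		 * returns, so widen to the minimal range spanning all
		 * in-progress invalidations instead.
		 */
		kvm->mmu_notifier_range_start =
			min(kvm->mmu_notifier_range_start, range->start);
		kvm->mmu_notifier_range_end =
			max(kvm->mmu_notifier_range_end, range->end);
	}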
>  - add lockdep check
>  - code comments and formatting
>
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
>  arch/x86/kvm/mmu/mmu.c                 | 16 ++++++++------
>  arch/x86/kvm/mmu/paging_tmpl.h         |  7 ++++---
>  include/linux/kvm_host.h               | 27 +++++++++++++++++++++++-
>  virt/kvm/kvm_main.c                    | 29 ++++++++++++++++++++++----
>  6 files changed, 67 insertions(+), 16 deletions(-)

...

> @@ -3717,7 +3720,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
>
> -	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
> +	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
> +			 write, &map_writable))
>  		return RET_PF_RETRY;
>
>  	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
> @@ -3725,7 +3729,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>
>  	r = RET_PF_RETRY;
>  	spin_lock(&vcpu->kvm->mmu_lock);
> -	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
> +	if (mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))

'hva' will be uninitialized at this point if the gfn did not resolve to a
memslot, i.e. when handling an MMIO page fault.

On the plus side, that's an opportunity for another optimization as there
is no need to retry MMIO page faults on mmu_notifier invalidations.
Including the attached patch as a prereq to this will avoid consuming an
uninitialized 'hva'.

>  		goto out_unlock;
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)

...

>  void kvm_release_pfn_clean(kvm_pfn_t pfn);
>  void kvm_release_pfn_dirty(kvm_pfn_t pfn);
> @@ -1203,6 +1206,28 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
>  		return 1;
>  	return 0;
>  }
> +
> +static inline int mmu_notifier_retry_hva(struct kvm *kvm,
> +					 unsigned long mmu_seq,
> +					 unsigned long hva)
> +{
> +#ifdef CONFIG_LOCKDEP
> +	lockdep_is_held(&kvm->mmu_lock);

No need to manually do the #ifdef, just use lockdep_assert_held instead of
lockdep_is_held.
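
A bare lockdep_is_held() call just computes a value and throws it away,
whereas lockdep_assert_held() actually asserts, and it already compiles to
a no-op when CONFIG_LOCKDEP=n. I.e. the three lines above should collapse
to just:

	lockdep_assert_held(&kvm->mmu_lock);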
> +#endif
> +	/*
> +	 * If mmu_notifier_count is non-zero, then the range maintained by
> +	 * kvm_mmu_notifier_invalidate_range_start contains all addresses that
> +	 * might be being invalidated. Note that it may include some false
> +	 * positives, due to shortcuts when handling concurrent invalidations.
> +	 */
> +	if (unlikely(kvm->mmu_notifier_count) &&
> +	    kvm->mmu_notifier_range_start <= hva &&
> +	    hva < kvm->mmu_notifier_range_end)

Uber nit: I find this easier to read if 'hva' is on the left-hand side for
both checks, i.e.

	if (unlikely(kvm->mmu_notifier_count) &&
	    hva >= kvm->mmu_notifier_range_start &&
	    hva < kvm->mmu_notifier_range_end)

> +		return 1;
> +	if (kvm->mmu_notifier_seq != mmu_seq)
> +		return 1;
> +	return 0;
> +}
>  #endif
>
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING

--MkeFThl52DvUx8zi
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="0001-KVM-x86-mmu-Skip-mmu_notifier-check-when-handling-MM.patch"

>From a1bfdc6fe16582440815cfecc656313dff993003 Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Wed, 27 Jan 2021 10:04:45 -0800
Subject: [PATCH] KVM: x86/mmu: Skip mmu_notifier check when handling MMIO
 page fault

Don't retry a page fault due to an mmu_notifier invalidation when
handling a page fault for a GPA that did not resolve to a memslot, i.e.
an MMIO page fault.  Invalidations from the mmu_notifier signal a change
in a host virtual address (HVA) mapping; without a memslot, there is no
HVA and thus no possibility that the invalidation is relevant to the
page fault being handled.

Note, the MMIO vs. memslot generation checks handle the case where a
pending memslot will create a memslot overlapping the faulting GPA.
The mmu_notifier checks are orthogonal to memslot updates.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c         | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..9ac0a727015d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3725,7 +3725,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	r = RET_PF_RETRY;
 	spin_lock(&vcpu->kvm->mmu_lock);
-	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
+	if (!is_noslot_pfn(pfn) && mmu_notifier_retry(vcpu->kvm, mmu_seq))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 50e268eb8e1a..ab54263d857c 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -869,7 +869,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 
 	r = RET_PF_RETRY;
 	spin_lock(&vcpu->kvm->mmu_lock);
-	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
+	if (!is_noslot_pfn(pfn) && mmu_notifier_retry(vcpu->kvm, mmu_seq))
 		goto out_unlock;
 
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
-- 
2.30.0.280.ga3ce27912f-goog

--MkeFThl52DvUx8zi--