Date: Thu, 28 May 2020 11:54:10 +1000
From: Paul Mackerras
To: kvm@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org, Laurent Vivier, David Gibson, Nick Piggin
Subject: [PATCH 2/2] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes
Message-ID: <20200528015410.GE307798@thinks.paulus.ozlabs.org>
References: <20200528015331.GD307798@thinks.paulus.ozlabs.org>
In-Reply-To: <20200528015331.GD307798@thinks.paulus.ozlabs.org>

There is a potential race condition between hypervisor page faults
and flushing a memslot.  It is possible for a page fault to read the
memslot before it is updated and then write a PTE to the
partition-scoped page tables after kvmppc_radix_flush_memslot() has
completed.  (Note that this race has never been explicitly observed.)

To close this race, it is sufficient to increment the MMU sequence
number while kvm->mmu_lock is held.  That will cause
mmu_notifier_retry() to return true, and the page fault will then
return to the guest without inserting a PTE.
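For context, a simplified sketch of the fault-side half of this
protocol, modelled on the pattern used in the Book3S HV page fault
path (sample kvm->mmu_notifier_seq before reading the memslot,
re-check it under kvm->mmu_lock before installing the PTE).  This is
an illustration only, not the verbatim kernel code; the function name
and elided steps are hypothetical:

	/* Sketch: how a page fault detects a concurrent memslot flush. */
	static int fault_side_sketch(struct kvm *kvm, unsigned long gpa)
	{
		unsigned long mmu_seq;

		/* Sample the sequence number before reading the memslot. */
		mmu_seq = kvm->mmu_notifier_seq;
		smp_rmb();

		/* ... look up the memslot, translate gpa to a host pfn ... */

		spin_lock(&kvm->mmu_lock);
		if (mmu_notifier_retry(kvm, mmu_seq)) {
			/*
			 * The memslot was flushed or invalidated after we
			 * sampled mmu_seq; the increment added by this patch
			 * makes this check fire, so drop the lock and let the
			 * guest retry instead of inserting a stale PTE.
			 */
			spin_unlock(&kvm->mmu_lock);
			return -EAGAIN;
		}
		/* ... safe to insert the PTE into the partition-scoped tables ... */
		spin_unlock(&kvm->mmu_lock);
		return 0;
	}

Because kvmppc_radix_flush_memslot() now bumps the sequence number
under the same lock, any fault that sampled mmu_seq before the flush
is guaranteed to see the retry condition.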
Signed-off-by: Paul Mackerras
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bc3f795..aa41183 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1130,6 +1130,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
 					 kvm->arch.lpid);
 		gpa += PAGE_SIZE;
 	}
+	/*
+	 * Increase the mmu notifier sequence number to prevent any page
+	 * fault that read the memslot earlier from writing a PTE.
+	 */
+	kvm->mmu_notifier_seq++;
 	spin_unlock(&kvm->mmu_lock);
 }
-- 
2.7.4