Date: Wed, 18 Aug 2021 11:38:02 +0100
Message-ID: <87r1errqb9.wl-maz@kernel.org>
From: Marc Zyngier
To: Oliver Upton
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, Peter Shier, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei, Suzuki K Poulose
Subject: Re: [PATCH 2/4] KVM: arm64: Handle PSCI resets before userspace touches vCPU state
In-Reply-To: <20210818085047.1005285-3-oupton@google.com>
References: <20210818085047.1005285-1-oupton@google.com> <20210818085047.1005285-3-oupton@google.com>
X-Mailing-List: kvm@vger.kernel.org

On Wed, 18 Aug 2021 09:50:45 +0100,
Oliver Upton wrote:
>
> The CPU_ON PSCI call takes a payload that KVM uses to configure a
> destination vCPU to run. This payload is non-architectural state and not
> exposed through any existing UAPI. Effectively, we have a race between
> CPU_ON and userspace saving/restoring a guest: if the target vCPU isn't
> ran again before the VMM saves its state, the requested PC and context
> ID are lost. When restored, the target vCPU will be runnable and start
> executing at its old PC.
>
> We can avoid this race by making sure the reset payload is serviced
> before userspace can access a vCPU's state. This is, of course, a hairy
> ugly hack. A benefit of such a hack, though, is that we've managed to
> massage the reset state into the architected state, thereby making it
> migratable without forcing userspace to play our game with a UAPI
> addition.

I don't think it is that bad. In a way, it is similar to the "resync
pending exception state" dance that we do on vcpu exit to userspace.
One thing to note is that it only works because this is done from the
vcpu thread itself.
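
To make the race and the proposed cure concrete, here is a toy model in
plain C. It is only a sketch and not kernel code: struct toy_vcpu and all
the toy_* helpers are made up for illustration, and the reset is reduced
to copying the CPU_ON payload (entry point and context ID) into the
registers that userspace can see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU; not the kernel's struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t pc;		/* architected state, visible to userspace */
	uint64_t x0;		/* receives the context ID on reset */
	bool reset_pending;	/* models a pending KVM_REQ_VCPU_RESET */
	uint64_t reset_pc;	/* CPU_ON payload: entry point */
	uint64_t reset_ctx;	/* CPU_ON payload: context ID */
};

/* Models CPU_ON: stash the payload and flag a reset, nothing more. */
static inline void toy_psci_cpu_on(struct toy_vcpu *v, uint64_t entry,
				   uint64_t ctx)
{
	assert(v != NULL);
	v->reset_pc = entry;
	v->reset_ctx = ctx;
	v->reset_pending = true;
}

/* Fold a pending reset payload into the architected state. */
static inline void toy_service_reset(struct toy_vcpu *v)
{
	assert(v != NULL);
	if (!v->reset_pending)
		return;
	v->reset_pending = false;
	v->pc = v->reset_pc;
	v->x0 = v->reset_ctx;
}

/* Userspace read path: service the reset first, so no stale PC leaks. */
static inline uint64_t toy_get_pc(struct toy_vcpu *v)
{
	toy_service_reset(v);
	return v->pc;
}

/* Userspace write path: service the reset first, so the write survives. */
static inline void toy_set_pc(struct toy_vcpu *v, uint64_t pc)
{
	toy_service_reset(v);
	v->pc = pc;
}
```

In this model, dropping the toy_service_reset() calls from the two
accessors brings the problem back: toy_get_pc() would hand out the old PC
while the payload is still parked in reset_pc, and a value written with
toy_set_pc() would be clobbered when the reset is finally serviced.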
>
> Fixes: 358b28f09f0a ("arm/arm64: KVM: Allow a VCPU to fully reset itself")
> Signed-off-by: Oliver Upton
> ---
> I really hate this, but my imagination is failing me on any other way to
> cure the race without cluing in userspace. Any ideas?
>
>  arch/arm64/kvm/arm.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 0de4b41c3706..6b124c29c663 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1216,6 +1216,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		if (copy_from_user(&reg, argp, sizeof(reg)))
>  			break;
>
> +		/*
> +		 * ugly hack. We could owe a reset due to PSCI and not yet
> +		 * serviced it. Prevent userspace from reading/writing state
> +		 * that will be clobbered by the eventual handling of the reset
> +		 * bit.

This reads a bit odd. You are taking care of two potential issues in
one go here:

- userspace writes won't be overwritten by a pending reset as they
  will take place after said reset
- userspace reads will reflect the state of the freshly reset CPU
  instead of some stale state

> +		 */
> +		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
> +			kvm_reset_vcpu(vcpu);
> +
>  		if (ioctl == KVM_SET_ONE_REG)
>  			r = kvm_arm_set_reg(vcpu, &reg);
>  		else

Otherwise, well spotted.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.