Date: Wed, 09 Jun 2021 11:23:34 +0100
Message-ID: <877dj3z68p.wl-maz@kernel.org>
From: Marc Zyngier
To: Oliver Upton
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, Paolo Bonzini, Sean Christopherson, Peter Shier, Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata, Alexandru Elisei, James Morse, Suzuki K Poulose
Subject: Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
In-Reply-To: <20210608214742.1897483-3-oupton@google.com>
References: <20210608214742.1897483-1-oupton@google.com> <20210608214742.1897483-3-oupton@google.com>
X-Mailing-List: kvm@vger.kernel.org

Hi Oliver,

Please Cc the KVM/arm64 reviewers (now added). Also, please consider
subscribing to the kvmarm mailing list so that I don't have to
manually approve your posts ;-).

On Tue, 08 Jun 2021 22:47:34 +0100,
Oliver Upton wrote:
>
> ARMv8 provides for a virtual counter-timer offset that is added to guest
> views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> provided userspace with any perception of this, and instead affords a
> value-based scheme of migrating the virtual counter-timer by directly
> reading/writing the guest's CNTVCT_EL0. This is problematic because
> counters continue to elapse while the register is being written, meaning
> it is possible for drift to sneak in to the guest's time scale. This is
> exacerbated by the fact that KVM will calculate an appropriate
> CNTVOFF_EL2 every time the register is written, which will be broadcast
> to all virtual CPUs. The only possible way to avoid causing guest time
> to drift is to restore counter-timers by offset.

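To make the two restore schemes in the quoted text concrete, here is a
minimal user-space sketch. It is a toy model and not kernel code: the
helper names (guest_cntvct, offset_from_value, restore_example) and the
tick values are made up for illustration, and the only thing it assumes
is the relationship CNTVCT = CNTPCT - CNTVOFF.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the guest's virtual count is the physical count minus an offset. */
static uint64_t guest_cntvct(uint64_t cntpct, uint64_t cntvoff)
{
	return cntpct - cntvoff;
}

/*
 * Value-based restore: the offset is derived from whatever the physical
 * counter reads when the write lands, so every write yields a new offset.
 */
static uint64_t offset_from_value(uint64_t cntpct_at_write, uint64_t saved_cntvct)
{
	return cntpct_at_write - saved_cntvct;
}

/* Hypothetical worked example; all tick values are arbitrary. */
static void restore_example(void)
{
	const uint64_t saved = 1000;	/* guest count captured at save time */

	/* Two writes of the same saved value, landing 40 ticks apart. */
	uint64_t off_a = offset_from_value(5000, saved);
	uint64_t off_b = offset_from_value(5040, saved);

	/* The second write moves the offset, so the guest loses 40 ticks... */
	assert(off_b - off_a == 40);
	/* ...but right after the write it reads exactly the saved value. */
	assert(guest_cntvct(5040, off_b) == saved);

	/* An offset restored as-is does not depend on when the write lands. */
	assert(guest_cntvct(5040, off_a) == saved + 40);
	/* A stale or miscomputed offset can put the guest below the saved count. */
	assert(guest_cntvct(5040, off_a + 100) < saved);
}
```

Calling restore_example() from a main() exercises both cases: the
value-based writes differ by the 40 ticks that elapsed between them,
while the offset-based restore is only as good as the offset it is given.
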
Well, the current method has one huge advantage: time can never go
backward from the guest PoV if you restore what you have saved. Yes,
time can elapse, but you don't even need to migrate to observe that.

>
> Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> of the virtual counter-timers to userspace, allowing it to define its
> own heuristics for managing vCPU offsets.

I'm not really in favour of inventing a completely new API, for
multiple reasons:

- CNTVOFF is an EL2 concept. I'd rather not expose it as such as it
  becomes really confusing with NV (which does expose its own CNTVOFF
  via the ONE_REG interface)

- You seem to allow each vcpu to get its own offset. I don't think
  that's right. The architecture defines that all PEs have the same
  view of the counters, and an EL1 guest should be given that
  illusion.

- by having a parallel save/restore interface, you make it harder to
  reason about what happens with concurrent calls to both interfaces

- the userspace API is already horribly bloated, and I'm not overly
  keen on adding more if we can avoid it.

I'd rather you extend the current ONE_REG interface and make it modal,
either allowing the restore of an absolute value or an offset for
CNTVCT_EL0. This would also keep a consistent behaviour when restoring
vcpus. The same logic would apply to the physical offset.

As for how to make it modal, we have plenty of bits left in the
ONE_REG encoding. Pick one, and make that a "relative" attribute. This
will result in some minor surgery in the get/set code paths, but at
least no entirely new mechanism.

One question though: how do you plan to reliably compute the offset?
As far as I can see, it is subject to the same issues you described
above (while the guest is being restored, time flies), and you have
the added risk of exposing a counter going backward from a guest
perspective.

Thanks,

	M.

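For illustration, the modal ONE_REG idea described above could look
roughly like the sketch below. Everything in it is hypothetical: the
DEMO_REG_RELATIVE bit, the DEMO_REG_CNTVCT id and the demo_vm structure
are invented for the example and do not correspond to the real KVM
register encoding or data structures. It only shows one register id
being accessed either as an absolute count or as an offset, with a
single offset shared by the whole VM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit: NOT part of the real KVM ONE_REG encoding. */
#define DEMO_REG_RELATIVE	(1ULL << 40)
/* Hypothetical register id standing in for CNTVCT_EL0. */
#define DEMO_REG_CNTVCT		0x1234ULL

struct demo_vm {
	uint64_t cntpct;	/* stand-in for the physical counter */
	uint64_t cntvoff;	/* one offset shared by every vcpu of the VM */
};

/* Returns 0 on success, -1 for an unknown register id. */
static int demo_set_reg(struct demo_vm *vm, uint64_t id, uint64_t val)
{
	assert(vm);
	if ((id & ~DEMO_REG_RELATIVE) != DEMO_REG_CNTVCT)
		return -1;

	if (id & DEMO_REG_RELATIVE)
		vm->cntvoff = val;		/* relative: val is the offset */
	else
		vm->cntvoff = vm->cntpct - val;	/* absolute: val is the count */
	return 0;
}

/* Reads back either the offset or the current guest count, by mode. */
static int demo_get_reg(const struct demo_vm *vm, uint64_t id, uint64_t *val)
{
	assert(vm && val);
	if ((id & ~DEMO_REG_RELATIVE) != DEMO_REG_CNTVCT)
		return -1;

	*val = (id & DEMO_REG_RELATIVE) ? vm->cntvoff
					: vm->cntpct - vm->cntvoff;
	return 0;
}
```

Both modes go through the same get/set path and end up updating the
same per-VM offset, which is the property the modal approach is meant
to preserve. How userspace computes a safe offset in the first place is
left open here, as in the question above.
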
-- Without deviation from the norm, progress is not possible.