Date: Wed, 14 Aug 2019 15:19:02 +0100
Message-ID: <8636i3omnd.wl-maz@kernel.org>
From: Marc Zyngier
To: Alexander Graf
Subject: Re: [PATCH 0/9] arm64: Stolen time support
In-Reply-To: <8ca5c106-7c12-4c6e-6d81-a90f281a9894@amazon.com>
References: <20190802145017.42543-1-steven.price@arm.com>
 <20190803190522.5fec8f7d@why>
 <6789f477-8ab5-cc54-1ad2-8627917b07c9@arm.com>
 <8ca5c106-7c12-4c6e-6d81-a90f281a9894@amazon.com>
Cc: kvm@vger.kernel.org, linux-doc@vger.kernel.org, Catalin Marinas,
 linux-kernel@vger.kernel.org, Russell King, Paolo Bonzini, Steven Price,
 Will Deacon, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org

On Wed, 14 Aug 2019 14:02:25 +0100,
Alexander Graf wrote:
>
> On 05.08.19 15:06, Steven Price wrote:
> > On 03/08/2019 19:05, Marc Zyngier wrote:
> >> On Fri, 2 Aug 2019 15:50:08 +0100
> >> Steven Price wrote:
> >>
> >> Hi Steven,
> >>
> >>> This series adds support for paravirtualized time for arm64 guests and
> >>> KVM hosts following the specification in Arm's document DEN 0057A:
> >>>
> >>> https://developer.arm.com/docs/den0057/a
> >>>
> >>> It implements support for stolen time, allowing the guest to
> >>> identify time when it is forcibly not executing.
> >>>
> >>> It doesn't implement support for Live Physical Time (LPT) as there are
> >>> some concerns about the overheads and approach in the above
> >>> specification, and I expect an updated version of the specification to
> >>> be released soon with just the stolen time parts.
> >>
> >> Thanks for posting this.
> >>
> >> My current concern with this series is around the fact that we allocate
> >> memory from the kernel on behalf of the guest. It is the first example
> >> of such a thing in the ARM port, and I can't really say I'm fond of it.
> >>
> >> x86 seems to get away with it by having the memory allocated from
> >> userspace, which I tend to like more. Yes, put_user is more
> >> expensive than a straight store, but this isn't done too often either.
> >>
> >> What is the rationale for your current approach?
> >
> > As I see it there are 3 approaches that can be taken here:
> >
> > 1. Hypervisor allocates memory and adds it to the virtual machine. This
> > means that everything to do with the 'device' is encapsulated behind the
> > KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
> > stolen time structure to be fast it cannot be a trapping region and has
> > to be backed by real memory - in this case allocated by the host kernel.
> >
> > 2. Host user space allocates memory. Similar to above, but this time
> > user space needs to manage the memory region as well as the usual
> > KVM_CREATE_DEVICE dance. I've no objection to this, but it means
> > kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
> > to size the memory region).
>
> You ideally want to get the host overhead for a VM to as little as you
> can. I'm not terribly fond of the idea of reserving a full page just
> because we're too afraid of having the guest donate memory.

Well, reduce the amount of memory you give to the guest by one page,
and allocate that page to the stolen time device. Problem solved!

Seriously, if you're worried about the allocation of a single page,
you should first look at how many holes we have in the vcpu structure,
for example (even better, with the 8.4 NV patches applied). Just
fixing that would give you that page back *per vcpu*.

> > 3. Guest kernel "donates" the memory to the hypervisor for the
> > structure. As far as I'm aware this is what x86 does. The problems I see
> > with this approach are:
> >
> > a) kexec becomes much more tricky - there needs to be a disabling
> > mechanism for the guest to stop the hypervisor scribbling on memory
> > before starting the new kernel.
>
> I wouldn't call "quiesce a device" much more tricky. We have to do
> that for other devices as well today.

And since there is no standard way of doing it, we keep inventing
weird and wonderful ways of doing so -- cue the terrible GICv3 LPI
situation, and all the various hacks to keep existing IOMMU mappings
around across firmware/kernel handovers as well as kexec.

>
> > b) If there is more than one entity that is interested in the
> > information (e.g. firmware and kernel) then this requires some form of
> > arbitration in the guest because the hypervisor doesn't want to have to
> > track an arbitrary number of regions to update.
>
> Why would FW care?

Exactly. It doesn't care. Not caring means it doesn't know about the
page the guest has allocated for stolen time, and starts using it for
its own purposes. Hello, memory corruption. Same thing goes if you
reboot into a kernel that is not stolen-time aware.

>
> > c) Performance can suffer if the host kernel doesn't have a suitably
> > aligned/sized area to use. As you say - put_user() is more expensive.
>
> Just define the interface to always require natural alignment when
> donating a memory location?
>
> > The structure is updated on every return to the VM.
>
> If you really do suffer from put_user(), there are alternatives. You
> could just map the page on the registration hcall and then leave it
> pinned until the vcpu gets destroyed again.

put_user() should be cheap enough. It is one of the things we tend to
optimise anyway. And yes, worst case, we pin the page.

>
> > Of course x86 does prove the third approach can work, but I'm not sure
> > which is actually better. Avoiding the kexec cancellation requirements was
> > the main driver of the current approach. Although many of the
>
> I really don't understand the problem with kexec cancellation. Worst
> case, let guest FW set it up for you and propagate only the address
> down via ACPI/DT. That way you can mark the respective memory as
> reserved too.

We already went down that road with the LPI hack. I'm not going there
again if we can avoid it. And it turns out that we can. Just allocate
the stolen time page as a separate memblock, give it to KVM for that
purpose.

Your suggestion of letting the guest firmware set something up only
works if whatever you're booting after that understands it. If it
doesn't, you're screwed.

> But even with a Linux only mechanism, just take a look at
> arch/x86/kernel/kvmclock.c. All they do to remove the map is to hook
> into machine_crash_shutdown() and machine_shutdown().

I'm not going to take something that is Linux specific. It has to work
for all guests, at all times, whether they know about the hypervisor
service or not.

	M.

>
>
> Alex
>
> > conversations about this were also tied up with Live Physical Time which
> > adds its own complications.
> >
> > Steve
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> >

-- 
Jazz is not dead, it just smells funny.
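Illustrative note on the "natural alignment" point raised in the thread
(guest-donated memory, option 3). The sketch below shows, in C, what such
a check could look like on the receiving side. Everything in it is an
assumption made for illustration: the record layout, the 4 KiB page size
and all names (stolen_time_record, naturally_aligned, within_one_page)
are hypothetical and are not taken from the patch series or from the
specification text quoted above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical per-vcpu record; the layout is illustrative only. */
struct stolen_time_record {
	uint32_t revision;
	uint32_t attributes;
	uint64_t stolen_time_ns;
	uint8_t  padding[48];   /* pad the record to a power-of-two size */
};

static_assert(sizeof(struct stolen_time_record) == 64,
	      "record is expected to be 64 bytes");

/* True if size is a non-zero power of two. */
static bool is_pow2(uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/*
 * Natural alignment: the guest physical address is a multiple of the
 * (power-of-two) size of the object placed there.
 */
static bool naturally_aligned(uint64_t gpa, uint64_t size)
{
	return is_pow2(size) && (gpa & (size - 1)) == 0;
}

/* True if the range [gpa, gpa + size) does not cross a page boundary. */
static bool within_one_page(uint64_t gpa, uint64_t size)
{
	return size != 0 && size <= PAGE_SIZE_BYTES &&
	       (gpa % PAGE_SIZE_BYTES) + size <= PAGE_SIZE_BYTES;
}
```

Under these assumptions, a naturally aligned record whose size is a power
of two no larger than a page can never straddle two pages, so a host that
chose to pin the backing page would only ever need to pin one.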