From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 19 Apr 2022 14:51:05 +0100
From: Alexandru Elisei
To: will@kernel.org, mark.rutland@arm.com, linux-arm-kernel@lists.infradead.org, maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.cs.columbia.edu
Subject: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

The approach I've taken so far in adding support for SPE in KVM [1] relies
on pinning the entire VM memory to avoid SPE triggering stage 2 faults
altogether. I've taken this approach because:

1. SPE reports the guest VA on a stage 2 fault, similar to stage 1 faults,
and at the moment KVM has no way to resolve the VA to IPA translation. The
AT instruction is not useful here, because PAR_EL1 doesn't report the IPA
in the case of a stage 2 fault on a stage 1 translation table walk.

2. The stage 2 fault is reported asynchronously via an interrupt, which
means there will be a window where profiling is stopped from the moment SPE
triggers the fault to the moment the PE takes the interrupt. This blackout
window is obviously not present when running on bare metal, as there is no
second stage of address translation being performed.

I've been thinking about this approach and I am considering translating the
VA reported by SPE to the IPA instead, thus treating SPE stage 2 data
aborts more like regular (MMU) data aborts. As I see it, this approach has
several merits over memory pinning:

- The stage 1 translation table walker is also needed for nested
virtualization, to emulate AT S1* instructions executed by the L1 guest
hypervisor.

- Walking the guest's translation tables is less of a departure from the
way KVM manages physical memory for a virtual machine today.
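To make the idea more concrete, here is a rough, self-contained sketch of
what a software stage 1 walk could look like for the simplest configuration
(4KB granule, 48-bit VA), ignoring permissions, hierarchical attributes and
the reserved level 3 encoding. All the names here (walk_stage1, read_guest,
the descriptor masks) are made up for illustration; in KVM the IPA read
would go through the memslots, and the real walker would have to handle the
other granule sizes and the full descriptor checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified VMSAv8-64 descriptor bits, 4KB granule, 48-bit VA/OA. */
#define DESC_VALID	(1ULL << 0)
#define DESC_TABLE	(1ULL << 1)	/* at levels 0-2: table vs block */
#define ADDR_MASK	0x0000fffffffff000ULL

/* Stand-in for reading 8 bytes of guest memory at an IPA. */
typedef uint64_t (*read_ipa_fn)(uint64_t ipa);

/*
 * Translate a guest VA to an IPA by walking the guest's stage 1 tables.
 * Returns false on a translation fault (invalid descriptor).
 */
static bool walk_stage1(uint64_t ttbr, uint64_t va, read_ipa_fn read_ipa,
			uint64_t *ipa)
{
	uint64_t table = ttbr & ADDR_MASK;
	int level;

	for (level = 0; level <= 3; level++) {
		/* 9 index bits per level: VA[47:39] down to VA[20:12]. */
		unsigned int shift = 12 + 9 * (3 - level);
		uint64_t idx = (va >> shift) & 0x1ff;
		uint64_t desc = read_ipa(table + idx * 8);

		if (!(desc & DESC_VALID))
			return false;

		if (level == 3 || !(desc & DESC_TABLE)) {
			/* Page or block: output address plus VA offset. */
			uint64_t oa_mask = ~0ULL << shift;

			*ipa = (desc & ADDR_MASK & oa_mask) | (va & ~oa_mask);
			return true;
		}

		table = desc & ADDR_MASK;
	}
	return false;
}

/* Fake guest memory holding the translation tables, for demonstration. */
static uint64_t guest_mem[4 * 512];

static uint64_t read_guest(uint64_t ipa)
{
	return guest_mem[ipa / 8];
}
```

The callback keeps the walker itself independent of how guest memory is
reached, which is also roughly the shape an AT S1* emulation helper for
nested virtualization would want.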
I had a discussion with Mark offline about this approach and he expressed a
very sensible concern: when a guest is profiling, there is a blackout
window where profiling is stopped which doesn't happen on bare metal (point
2 above).

My questions are:

1. Is having this blackout window, regardless of its size, unacceptable? If
it is, then I'll continue with the memory pinning approach.

2. If having a blackout window is acceptable, how large can this window be
before it becomes too much?

I can try to take some performance measurements to evaluate the blackout
window when using a stage 1 walker in relation to the buffer write speed on
different hardware. I have access to an N1SDP machine and an Ampere Altra
for this.

[1] https://lore.kernel.org/all/20211117153842.302159-1-alexandru.elisei@arm.com/

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm