From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 27 Jul 2022 16:06:30 +0000
From: Oliver Upton
To: Alexandru Elisei
Cc: Marc Zyngier, Will Deacon, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory
References: <20220419141012.GB6143@willie-the-truck>
	<875yjiyka4.wl-maz@kernel.org>
	<874jz2yja5.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

On Wed, Jul 27, 2022 at 11:38:53AM +0100, Alexandru Elisei wrote:
> Hi Marc,
>
> On Wed, Jul 27, 2022 at 10:52:34AM +0100, Marc Zyngier wrote:
> > On Wed, 27 Jul 2022 10:30:59 +0100,
> > Marc Zyngier wrote:
> > >
> > > On Tue, 26 Jul 2022 18:51:21 +0100,
> > > Oliver Upton wrote:
> > > >
> > > > Doesn't pinning the buffer also imply pinning the stage 1 tables
> > > > responsible for its translation as well?
> > > > I agree that pinning the buffer is likely the best way forward, as
> > > > pinning the whole of guest memory is entirely impractical.
> >
> > Huh, I just realised that you were talking about S1. I don't think we
> > need to do this. As long as the translation falls into a mapped
> > region (pinned or not), we don't need to worry.

Right, but my issue is what happens when a fragment of the S1 becomes
unmapped at S2. We were discussing the idea of faulting once on the buffer
at the beginning of profiling, but it seems to me that the same thing could
just as easily happen at runtime and get tripped up by what Alex points out
below:

> PMBSR_EL1.DL might be set to 1 as a result of a stage 2 fault reported by
> SPE, which means the last record written is incomplete. Records have a
> variable size, so it's impossible for KVM to revert to the end of the last
> known good record without parsing the buffer (references here [1]). And
> even if KVM would know the size of a record, there's this bit in the Arm
> ARM which worries me (ARM DDI 0487H.a, page D10-5177):
>
> "The architecture does not require that a sample record is written
> sequentially by the SPU, only that:
> [..]
> - On a Profiling Buffer management interrupt, PMBSR_EL1.DL indicates
>   whether PMBPTR_EL1 points to the first byte after the last complete
>   sample record."
>
> So there might be gaps in the buffer, meaning that the entire buffer would
> have to be discarded if DL is set as a result of a stage 2 fault.

I'm attempting to avoid thrashing with more threads, so I'm going to summon
back some context from your original reply, Marc:

> > > > Live migration also throws a wrench in this. IOW, there are still
> > > > potential sources of blackout unattributable to guest manipulation
> > > > of the SPU.
> > >
> > > Can you shed some light on this? I appreciate that you can't play the
> > > R/O trick on the SPE buffer as it invalidates the above discussion,
> > > but it should be relatively easy to track these pages and never reset
> > > them as clean until the vcpu is stopped. Unless you foresee other
> > > issues?

Right, we can play tricks on pre-copy to avoid write-protecting the SPE
buffer. My concern was more around post-copy, where userspace could've
decided to leave the buffer behind and demand it back on the resulting S2
fault.

> > > To be clear, I don't worry too much about these blind windows. The
> > > architecture doesn't really give us the right tools to make it work
> > > reliably, making this a best effort only. Unless we pin the whole
> > > guest and forego migration and other fault-driven mechanisms.
> > >
> > > Maybe that is a choice we need to give to the user: cheap, fast,
> > > reliable. Pick two.

As long as we crisply document the errata in KVM's virtualized SPE (and
inform the guest), that sounds reasonable. I'm just uneasy about proceeding
with an implementation with so many gotchas unless all parties involved are
aware of the quirks.
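
To make the DL problem Alex describes above a little more concrete, here is
a rough, untested sketch of what any handler (whether that's KVM trying to
hide the stage 2 fault, or the guest driver receiving the forwarded
management event) is forced to do once DL comes into play. None of this is
from a posted series; the helpers, state layout, and field names are
invented for illustration, only the PMBSR_EL1.DL semantics come from the
Arm ARM:

#include <linux/bits.h>
#include <linux/types.h>

#define PMBSR_EL1_DL		BIT(19)	/* Data Loss */

struct kvm_vcpu;

/* Invented shadow state, purely for the sake of the example. */
struct kvm_spe_cpu {
	u64 pmbptr_el1;		/* shadowed guest write pointer */
	u64 pmbsr_el1;		/* syndrome captured when the event fired */
	u64 last_drained;	/* first byte not yet consumed */
};

/* Hypothetical helpers, stand-ins for whatever a real series would add. */
void kvm_spe_discard_buffer(struct kvm_vcpu *vcpu, struct kvm_spe_cpu *spe);
void kvm_spe_keep_records_up_to(struct kvm_vcpu *vcpu, u64 limit);
int kvm_spe_map_buffer_at_stage2(struct kvm_vcpu *vcpu);

static int kvm_spe_handle_buf_event(struct kvm_vcpu *vcpu,
				    struct kvm_spe_cpu *spe)
{
	if (spe->pmbsr_el1 & PMBSR_EL1_DL) {
		/*
		 * Data Loss: records are variable length and need not be
		 * written sequentially, so there may be gaps anywhere
		 * between last_drained and pmbptr_el1. Nothing in that
		 * span can be trusted; drop all of it.
		 */
		kvm_spe_discard_buffer(vcpu, spe);
	} else {
		/*
		 * DL clear: PMBPTR_EL1 points to the first byte after the
		 * last complete record, so everything before it is good.
		 */
		kvm_spe_keep_records_up_to(vcpu, spe->pmbptr_el1);
	}

	/* Resolve the stage 2 fault so profiling can be restarted. */
	return kvm_spe_map_buffer_at_stage2(vcpu);
}

The uncomfortable part is the DL branch: there's no way to salvage a partial
buffer without parsing variable-length records, which is exactly the parsing
Alex points out KVM can't reasonably do.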
--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm