From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 29 Jun 2022 20:15:07 +0100
From: Catalin Marinas
To: Peter Collingbourne
Cc: kvmarm@lists.cs.columbia.edu, Marc Zyngier, kvm@vger.kernel.org,
 Andy Lutomirski, Linux ARM, Michael Roth, Chao Peng, Will Deacon,
 Evgenii Stepanov, Steven Price
Subject: Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled
References: <20220623234944.141869-1-pcc@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: kvm@vger.kernel.org

On Tue, Jun 28, 2022 at 11:54:51AM -0700, Peter Collingbourne wrote:
> On Tue, Jun 28, 2022 at 10:58 AM Catalin Marinas wrote:
> > That's why it would be interesting to see the effect of using
> > DC GZVA instead of DC ZVA for page zeroing.
> >
> > I suspect on Android you'd notice the fork() penalty a bit more with
> > all the copy-on-write having to copy tags. But we can't tell until we
> > do some benchmarks. If the penalty is indeed significant, we'll go
> > back to assessing the races here.
>
> Okay, I can try to measure it. I do feel rather strongly though that
> we should try to avoid tagging pages as much as possible, even
> ignoring the potential performance implications.
>
> Here's one more idea: we can tag pages eagerly as you propose, but
> introduce an opt-out. For example, we introduce a MAP_NOMTE flag,
> which would prevent tag initialization as well as causing any future
> attempt to mprotect(PROT_MTE) to fail. Allocators that know that the
> memory will not be used for MTE in the future can set this flag. For
> example, Scudo can start setting this flag once MTE has been disabled,
> as it has no mechanism for turning MTE back on once disabled. That
> way we would end up with no tags on the heap in processes with MTE
> disabled. Mappings with MAP_NOMTE would not be permitted in the guest
> memory space of MTE-enabled guests. For executables mapped by the
> kernel we may consider adding a bit to the ELF program headers to
> enable MAP_NOMTE.

I don't like such negative flags, and we should aim for minimal changes
to code that doesn't care about MTE. If there's a performance penalty
with zeroing the tags, we'll keep looking at the lazy tag
initialisation.

In the meantime, I'll think some more about the lazy stuff. We need at
least mte_sync_tags() fixed to set PG_mte_tagged only after the tags
have been updated (this fixes the CoW + mprotect() race but probably
breaks concurrent MAP_SHARED mprotect()). We'd have to add some
barriers (maybe in a new function, set_page_tagged()). Some cases, like
restoring from swap (both private and shared), have the page lock held.
KVM doesn't seem to take any page lock, so it can race with the VMM.

Anyway, I doubt we can get away with a single bit. We can't make use of
PG_locked either, since set_pte_at() is called with the page either
locked or unlocked.

-- 
Catalin
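The ordering fix discussed above (have mte_sync_tags() publish PG_mte_tagged only after the tags are written, with barriers in a set_page_tagged() helper) can be sketched in portable userspace C. This is only an illustrative model, not the kernel implementation: `struct fake_page`, `sync_tags()` and the use of C11 release/acquire atomics are stand-ins for the real page flag machinery and kernel barriers.

```c
/*
 * Sketch of the proposed ordering: the "tagged" flag (standing in for
 * PG_mte_tagged) is published only after the tags themselves are
 * written, with release/acquire ordering so a reader that observes the
 * flag set also observes the tag writes. All names are illustrative.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

struct fake_page {
    unsigned char tags[64];   /* stand-in for the page's MTE tag storage */
    atomic_bool   tagged;     /* stand-in for the PG_mte_tagged flag */
};

/* Models the suggested set_page_tagged(): release store so the tag
 * writes above it cannot be reordered past the flag update. */
static void set_page_tagged(struct fake_page *p)
{
    atomic_store_explicit(&p->tagged, true, memory_order_release);
}

/* Models the fixed mte_sync_tags(): initialise the tags first (as
 * DC GZVA would zero a whole page's tags), then publish the flag. */
static void sync_tags(struct fake_page *p)
{
    memset(p->tags, 0, sizeof(p->tags)); /* tag writes first... */
    set_page_tagged(p);                  /* ...flag published last */
}

/* Reader side: the acquire load pairs with the release store, so
 * seeing tagged == true guarantees the tag writes are visible. */
static bool page_tagged(const struct fake_page *p)
{
    return atomic_load_explicit(&p->tagged, memory_order_acquire);
}
```

As the email notes, this single release/acquire pair is enough for the one-shot CoW + mprotect() case, but it cannot express "tagging in progress", which is why concurrent MAP_SHARED faults likely need more than one bit.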