From: "Jason A. Donenfeld"
Date: Wed, 2 Mar 2022 12:26:27 +0100
Subject: Re: propagating vmgenid outward and upward
To: "Michael S. Tsirkin"
Cc: Laszlo Ersek, LKML, KVM list, QEMU Developers, linux-hyperv@vger.kernel.org, Linux Crypto Mailing List, Alexander Graf, "Michael Kelley (LINUX)", Greg Kroah-Hartman, adrian@parity.io, Daniel P. Berrangé, Dominik Brodowski, Jann Horn, "Rafael J. Wysocki", "Brown, Len", Pavel Machek, Linux PM, Colm MacCarthaigh, "Theodore Ts'o", Arnd Bergmann

Hey Michael,

Thanks for the benchmark.

On Wed, Mar 2, 2022 at 9:30 AM Michael S. Tsirkin wrote:
> So yes, the overhead is higher by 50% which seems a lot but it's from a
> very small number, so I don't see why it's a show stopper, it's not by a
> factor of 10 such that we should sacrifice safety by default. Maybe a
> kernel flag that removes the read replacing it with an interrupt will
> do.
>
> In other words, premature optimization is the root of all evil.

Unfortunately I don't think it's as simple as that for several reasons.
First, I'm pretty confident a beefy Intel machine can mostly hide non-dependent comparisons in the memory access and have the problem mostly go away. But this is much less the case on, say, an in-order MIPS32r2, which isn't just "some crappy ISA I'm using for the sake of argument," but actually the platform on which a lot of networking and WireGuard stuff runs, so I do care about it. There, we have 4 reads/comparisons which can't pipeline nearly as well.

There's also the atomicity aspect, which I think makes your benchmark not quite accurate. Those 16 bytes could change between the first and second word (or between the Nth and N+1th word for N<=3 on 32-bit). What if in that case the word you read second doesn't change, but the word you read first did? So then you find yourself having to do a hi-lo-hi dance. And then consider the 32-bit case, where that's even more annoying. This is just one of those things that comes up when you compare the semantics of a "large unique ID" and "word-sized counter", as general topics. (My suggestion is that vmgenid provide both.)

Finally, there's a slight storage aspect, where adding 16 bytes to a per-key struct is a little bit heavier than adding 4 bytes and might bust a cache line without sufficient care, care which always has some cost in one way or another.

So I just don't know if it's realistic to impose a 16-byte per-packet comparison all the time like that. I'm familiar with WireGuard obviously, but there's also cifs and maybe even wifi and bluetooth, and who knows what else, to care about too.

Then there's the userspace discussion. I can't imagine a 16-byte hotpath comparison being accepted as implementable.

> And I feel if linux
> DTRT and reads the 16 bytes then hypervisor vendors will be motivated to
> improve and add a 4 byte unique one. As long as linux is interrupt
> driven there's no motivation for change.

I reeeeeally don't want to get pulled into the politics of this on the hypervisor side.
I assume an improved thing would begin with QEMU and Firecracker or something collaborating because they're both open source and Amazon people seem interested. And then pressure builds for Microsoft and VMware to do it on their side. And then we get this all nicely implemented in the kernel. In the meantime, though, I'm not going to refuse to address the problem entirely just because the virtual hardware is less than perfect; I'd rather make the most with what we've got while still being somewhat reasonable from an implementation perspective.

Jason
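
The atomicity point in the message above (a 16-byte ID read as four words on a 32-bit CPU can change partway through the read) may be easier to see in a small C sketch. This is only an illustration under stated assumptions, not code from the kernel, WireGuard, or any hypervisor: struct vm_gen, read_id(), id_differs_naive(), id_differs_stable(), counter_differs() and the race_hook callback are all hypothetical names. The hook stands in for a concurrent writer and fires right after word 0 has been read, so the race becomes deterministic. The "stable" variant re-reads the whole ID until two passes agree, which is one possible generalization of the hi-lo-hi dance rather than the only way to do it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ID_WORDS 4 /* 16 bytes as a 32-bit CPU sees them */

/* Hypothetical shared state; a real guest would map memory written by the
 * hypervisor. The word-sized counter is the suggested addition, assumed
 * here to be bumped whenever the ID changes. */
struct vm_gen {
	volatile uint32_t id[ID_WORDS];
	volatile uint32_t counter;
};

/* Stand-in for a concurrent writer, invoked between two word reads. */
typedef void (*race_hook)(struct vm_gen *g);

/* Simulated writer: alters only word 0 and bumps the counter. */
void writer_changes_word0(struct vm_gen *g)
{
	g->id[0] ^= 0xdeadbeefu;
	g->counter = g->counter + 1;
}

/* One pass over the ID, word by word; the hook fires after word 0. */
void read_id(struct vm_gen *g, uint32_t out[ID_WORDS], race_hook hook)
{
	for (int i = 0; i < ID_WORDS; i++) {
		out[i] = g->id[i];
		if (i == 0 && hook)
			hook(g);
	}
}

/* Naive check: a single pass. A write that lands after word 0 was read
 * and touches only word 0 yields a snapshot equal to the cached ID. */
bool id_differs_naive(struct vm_gen *g, const uint32_t cached[ID_WORDS],
		      race_hook hook)
{
	uint32_t now[ID_WORDS];

	read_id(g, now, hook);
	return memcmp(now, cached, sizeof(now)) != 0;
}

/* Re-read until two consecutive passes agree, then compare. */
bool id_differs_stable(struct vm_gen *g, const uint32_t cached[ID_WORDS],
		       race_hook hook)
{
	uint32_t a[ID_WORDS], b[ID_WORDS];

	do {
		read_id(g, a, hook);
		hook = NULL; /* the simulated writer fires only once */
		read_id(g, b, NULL);
	} while (memcmp(a, b, sizeof(a)) != 0);

	return memcmp(a, cached, sizeof(a)) != 0;
}

/* Word-sized counter: one load, one comparison, no tearing to handle. */
bool counter_differs(const struct vm_gen *g, uint32_t cached)
{
	return g->counter != cached;
}
```

With the ID starting as {1, 2, 3, 4} and the same values cached, passing writer_changes_word0 as the hook makes id_differs_naive() report "unchanged" for that call even though word 0 has just been rewritten, while id_differs_stable() in the same scenario notices the change at the cost of at least eight loads. counter_differs() needs a single load, and the cached value it compares against occupies 4 bytes in a per-key struct instead of 16.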