From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 0/5] x86: Implement support for unaccepted memory
To: "Kirill A. Shutemov"
Cc: "Kirill A. Shutemov", Borislav Petkov, Andy Lutomirski,
 Sean Christopherson, Andrew Morton, Joerg Roedel, Andi Kleen,
 Kuppuswamy Sathyanarayanan, David Rientjes, Vlastimil Babka,
 Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini,
 Ingo Molnar, Varad Gautam, Dario Faggioli, x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev,
 linux-kernel@vger.kernel.org
References: <20210810062626.1012-1-kirill.shutemov@linux.intel.com>
 <4b80289a-07a4-bf92-9946-b0a8afb27326@intel.com>
 <20210810151548.4exag5uj73bummsr@black.fi.intel.com>
From: Dave Hansen
Message-ID: <82b8836f-a467-e5ff-08f3-704a85b9faa0@intel.com>
Date: Tue, 10 Aug 2021 08:51:01 -0700
In-Reply-To: <20210810151548.4exag5uj73bummsr@black.fi.intel.com>

On 8/10/21 8:15 AM, Kirill A. Shutemov wrote:
> On Tue, Aug 10, 2021 at 07:08:58AM -0700, Dave Hansen wrote:
>> On 8/9/21 11:26 PM, Kirill A. Shutemov wrote:
>>> UEFI Specification version 2.9 introduces the concept of memory
>>> acceptance: some Virtual Machine platforms, such as Intel TDX or
>>> AMD SEV-SNP, require memory to be accepted before it can be used by
>>> the guest. Accepting happens via a protocol specific to the Virtual
>>> Machine platform.
>>>
>>> Accepting memory is costly and it makes the VMM allocate memory for
>>> the accepted guest physical address range. We don't want to accept
>>> all memory upfront.
>>
>> This could use a bit more explanation.  Any VM is likely to *eventually*
>> touch all its memory, so it's not like a VMM has a long-term advantage
>> by delaying this.
>>
>> So, it must have to do with resource use at boot.  Is this to help boot
>> times?
>
> Yes, boot time is the main motivation.
>
> But I'm also going to look at long-term VM behaviour with a fixed memory
> footprint. I think if a workload allocates/frees memory within the same
> amount, we can keep memory beyond that size unaccepted. A few tweaks will
> likely be required, such as disabling page shuffling on free to keep
> unaccepted memory at the tail of the free list. More investigation needed.

OK, so this is predicated on the idea that a guest will not use all of
its assigned RAM and that the host will put that RAM to good use
elsewhere.  Right?

That's undercut by a few factors:
1. Many real-world cloud providers do not overcommit RAM.  If the guest
   does not use the RAM, it goes to waste.  (Yes, there are providers
   that overcommit, but we're talking generally about places where this
   proposal is useful.)
2. Long-term, RAM fills up with page cache in many scenarios.

So, this is really only beneficial for long-term host physical memory
use if:
1. The host is overcommitting, and
2. The guest never uses all of its RAM.

Seeing as TDX doesn't support swap and can't coexist with persistent
memory, the only recourse for folks overcommitting TDX VMs when they run
out of RAM is to kill VMs.  I can't imagine that a lot of folks are
going to do this.

In other words, I buy the boot speed argument.  But, I don't buy the
"this saves memory long term" argument at all.

>> I had expected this series, but I also expected it to be connected to
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT somehow.  Could you explain a bit how
>> this problem is different and demands a totally orthogonal solution?
>>
>> For instance, what prevents us from declaring: "Memory is accepted at
>> the time that its 'struct page' is initialized"?  Then, we use all the
>> infrastructure we already have for DEFERRED_STRUCT_PAGE_INIT.
>
> That was my first thought too, and I tried it just to realize that it is
> not what we want. If we accepted pages on 'struct page' init, it would
> mean the host has to allocate all memory assigned to the guest at boot,
> even if the guest actually uses only a small portion of it.
>
> Also, deferred page init only allows scaling memory acceptance across
> multiple CPUs; it doesn't allow getting to userspace before we are done
> with it. See wait_for_completion(&pgdat_init_all_done_comp).

That's good information.  It's a refinement of the "I want to boot
faster" requirement.  What you want is not just going _faster_, but
being able to run userspace before full acceptance has completed.

Would you be able to quantify how fast TDX page acceptance is?  Are we
talking about MB/s, GB/s, or TB/s?  This series is rather bereft of
numbers for a feature that is making a performance claim.

Let's say we have a 128GB VM.  How much faster does this approach reach
userspace than if all memory was accepted up front?  How much memory
_could_ have been accepted at the point userspace starts running?
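To make it concrete, here's the kind of back-of-envelope math I'd want
the cover letter to do.  The throughputs below are invented purely for
illustration (they are exactly the measurements I'm asking for, not
anything derived from TDX):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical acceptance throughputs -- NOT measured TDX numbers. */
		const double rates_gb_per_sec[] = { 0.5, 2.0, 10.0 };
		const double vm_size_gb = 128.0;	/* the 128GB VM from above */
		int i;

		for (i = 0; i < 3; i++)
			printf("at %4.1f GB/s: %6.1f s to accept %.0f GB up front\n",
			       rates_gb_per_sec[i],
			       vm_size_gb / rates_gb_per_sec[i],
			       vm_size_gb);
		return 0;
	}

If real acceptance throughput is at the low end of a range like that,
deferring acceptance obviously buys a lot of boot time; if it's at the
high end, the argument gets much weaker.  Either way, the series should
say which it is.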