From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B18BC433EF for ; Mon, 11 Apr 2022 06:38:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3754C6B0072; Mon, 11 Apr 2022 02:38:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 325926B0073; Mon, 11 Apr 2022 02:38:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1EEB86B0074; Mon, 11 Apr 2022 02:38:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 11EF56B0072 for ; Mon, 11 Apr 2022 02:38:24 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id D29A323E6A for ; Mon, 11 Apr 2022 06:38:23 +0000 (UTC) X-FDA: 79343644086.14.2AF543E Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by imf28.hostedemail.com (Postfix) with ESMTP id B6437C0009 for ; Mon, 11 Apr 2022 06:38:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649659102; x=1681195102; h=message-id:date:mime-version:to:cc:references:from: subject:in-reply-to:content-transfer-encoding; bh=uARV/jTosTxQq4l76DhMLrvvCZEgajnIMtFFAQBCWU4=; b=hBjPvb/CZr6BWf7348ddsoHlUxGxhwjS0sG0AiglXXeT3YYHJvECfEEC y5353W2hbkgzRiBzNSggLYyLw2ZynqZNByfaxomWZFGJGEtZmkTCGW4N2 u+s81lDEFCp4SN1Yw32p5UdjxkUtGxZSsyyOXIrVWHehEs00TWuKX8qTI VD58I5jdthCHxFvWzlBXHe3+Yz+G3K5QTvIoKglU+E8uQqitOn0svhVLc xuUZ5y4RJUuVf2xP+ATKlDmsxRhiolal/tRmKjdxjJQEmUpj7EOYMsF+A FGC1FboTNnwesAxrBiBqrR/LjzIODkHUGh28KGP1Ci2/rHGATvmNPnxNr A==; X-IronPort-AV: E=McAfee;i="6400,9594,10313"; a="324958191" X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="324958191" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2022 23:38:21 -0700 X-IronPort-AV: E=Sophos;i="5.90,251,1643702400"; d="scan'208";a="525311978" Received: from srkondle-mobl.amr.corp.intel.com (HELO [10.212.113.6]) ([10.212.113.6]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2022 23:38:03 -0700 Message-ID: <6c976344-fdd6-95cd-2cb0-b0e817bf0392@intel.com> Date: Sun, 10 Apr 2022 23:38:08 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Content-Language: en-US To: "Kirill A. Shutemov" Cc: "Kirill A. Shutemov" , Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel , Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, Mike Rapoport References: <20220405234343.74045-1-kirill.shutemov@linux.intel.com> <20220405234343.74045-2-kirill.shutemov@linux.intel.com> <767c2100-c171-1fd3-6a92-0af2e4bf3067@intel.com> <20220409155423.iv2arckmvavvpegt@box.shutemov.name> From: Dave Hansen Subject: Re: [PATCHv4 1/8] mm: Add support for unaccepted memory In-Reply-To: <20220409155423.iv2arckmvavvpegt@box.shutemov.name> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Stat-Signature: 3t7akoew3sqfriws7rckmc36am8zx31p Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="hBjPvb/C"; spf=none (imf28.hostedemail.com: domain of dave.hansen@intel.com has no SPF policy when checking 134.134.136.100) smtp.mailfrom=dave.hansen@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B6437C0009 X-HE-Tag: 1649659102-893453 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 4/9/22 08:54, Kirill A. Shutemov wrote: > On Fri, Apr 08, 2022 at 11:55:43AM -0700, Dave Hansen wrote: >>> The page allocator is modified to accept pages on the first allocation. >>> PageUnaccepted() is used to indicate that the page requires acceptance. >> >> Does this consume an actual page flag or is it aliased? > > It is encoded as a page type in mapcount of unallocated memory. It is not > aliased with PageOffline() as I did before. > > I will mention that it is a new page type. Guess I should have looked at the code. :) Are we just increasingly using the StudlyNames() for anything to do with pages? >>> + /* >>> + * PageUnaccepted() indicates that the page has to be "accepted" before it can >>> + * be used. Page allocator has to call accept_page() before returning the page >>> + * to the caller. >>> + */ >> >> Let's talk about "used" with a bit more detail. Maybe: >> >> /* >> * PageUnaccepted() indicates that the page has to be "accepted" before >> * it can be read or written. The page allocator must to call >> * accept_page() before touching the page or returning it to the caller. >> */ > > I guess s/must to call/must call/, right? Yep. ... >>> + /* >>> + * Check if the page needs to be marked as PageUnaccepted(). >>> + * Used for the new pages added to the buddy allocator for the first >>> + * time. >>> + */ >>> + if (!unaccepted && (fpi_flags & FPI_UNACCEPTED)) >>> + unaccepted = page_is_unaccepted(page, order); >> >> if (page_needs_acceptance && (fpi_flags & FPI_UNACCEPTED)) >> page_needs_acceptance = page_is_unaccepted(page, order); >> >>> + if (unaccepted) >>> + __SetPageUnaccepted(page); >> >> This is getting hard for me to follow. >> >> There are: >> 1. Pages that come in here with PageUnaccepted()==1 >> 2. Pages that come in here with PageUnaccepted()==0, but a buddy that >> was PageUnaccepted()==1 >> >> In either of those cases, the bitmap will be consulted to see if the >> page is *truly* unaccepted or not. But, I'm struggling to figure out >> how a page could end up in one of those scenarios and *not* be >> page_is_unaccepted(). >> >> There are three pieces of information that come in: >> 1. PageUnaccepted(page) >> 2. PageUnaccepted(buddies[]) >> 3. the bitmap > > 1 and 2 are the same conceptionally: merged-in pieces of the resulting page. > >> >> and one piece of information going out: >> >> PageUnaccepted(page); >> >> I think I need a more coherent description of how those four things fit >> together. > > The page gets marked as PageUnaccepted() if any of merged-in pages is > PageUnaccepted(). > > For new pages, just being added to buddy allocator, consult > page_is_unaccepted(). FPI_UNACCEPTED indicates that the page is new and > page_is_unaccepted() check is required. > > Avoid calling page_is_unaccepted() if it is known that the page needs > acceptance anyway. It can be costly. > > Is it good enough explanation? Yeah, elaborating on the slow and fast paths makes a lot of sense. > FPI_UNACCEPTED is not a good name. Any help with a better one? > FPI_CHECK_UNACCEPTED? Maybe even something like FPI_UNACCEPTED_SLOWPATH. Then you can say that when this is passed in the pages might not have PageUnaccepted() set and the slow bitmap needs to be consulted. >>> if (fpi_flags & FPI_TO_TAIL) >>> to_tail = true; >>> else if (is_shuffle_order(order)) >>> @@ -1149,7 +1192,8 @@ static inline void __free_one_page(struct page *page, >>> static inline bool page_expected_state(struct page *page, >>> unsigned long check_flags) >>> { >>> - if (unlikely(atomic_read(&page->_mapcount) != -1)) >>> + if (unlikely(atomic_read(&page->_mapcount) != -1) && >>> + !PageUnaccepted(page)) >>> return false; >> >> That probably deserves a comment, and maybe its own if() statement. > > Own if does not work. PageUnaccepted() is encoded in _mapcount. > > What about this: > > /* > * page->_mapcount is expected to be -1. > * > * There is an exception for PageUnaccepted(). The page type can be set > * for pages on free list. Page types are encoded in _mapcount. > * > * PageUnaccepted() will get cleared in post_alloc_hook(). > */ > if (unlikely((atomic_read(&page->_mapcount) | PG_unaccepted) != -1)) > return false; > > ? That's better. But, aren't the PG_* names usually reserved for real page->flags bits? That naming might be part of my confusion. >>> add_to_free_list(&page[size], zone, high, migratetype); >>> set_buddy_order(&page[size], high); >>> } >>> @@ -2396,6 +2446,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order, >>> */ >>> kernel_unpoison_pages(page, 1 << order); >>> >>> + if (PageUnaccepted(page)) >>> + accept_page(page, order); >>> + >>> /* >>> * As memory initialization might be integrated into KASAN, >>> * KASAN unpoisoning and memory initializion code must be >> >> Is accepted memory guaranteed to be zeroed? Do we want to skip the >> __GFP_ZERO behavior later in this function? Or is that just a silly >> over-optimization? > > For TDX, it is true that the memory gets cleared on acceptance, but I > don't we can say the same for any possible implementation. > > I would rather leave __GFP_ZERO for peace of mind. Clearing the cache-hot > page for the second time shouldn't be a big deal comparing to acceptance > cost. Sure, fair enough.