From: Vlastimil Babka
Date: Thu, 12 Jan 2023 12:59:06 +0100
Subject: Re: [PATCHv8 02/14] mm: Add support for unaccepted memory
To: Kirill A. Shutemov
Cc: Kirill A. Shutemov, Borislav Petkov, Andy Lutomirski,
 Sean Christopherson, Andrew Morton, Joerg Roedel, Ard Biesheuvel,
 Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes, Tom Lendacky,
 Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar,
 Dario Faggioli, Dave Hansen, Mike Rapoport, David Hildenbrand,
 Mel Gorman, marcelo.cerri@canonical.com, tim.gardner@canonical.com,
 khalid.elmously@canonical.com, philip.cox@canonical.com,
 aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev,
 linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, Mike Rapoport
Message-ID: <2b6c77bd-bead-7bfb-bf07-63e9ca837c58@suse.cz>
In-Reply-To: <20221224164639.pb3hrvbxtlodgm5e@box.shutemov.name>
References: <20221207014933.8435-1-kirill.shutemov@linux.intel.com>
 <20221207014933.8435-3-kirill.shutemov@linux.intel.com>
 <20221209192616.dg4cbe7mgh3axv5h@box.shutemov.name>
 <3ab6ea38-5a9b-af4f-3c94-b75dce682bc1@suse.cz>
 <20221224164639.pb3hrvbxtlodgm5e@box.shutemov.name>

On 12/24/22 17:46, Kirill A. Shutemov wrote:
> On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
>> On 12/9/22 20:26, Kirill A. Shutemov wrote:
>> >> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> >> >  		/*
>> >> >  		 * Watermark failed for this zone, but see if we can
>> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> >> >  
>> >> >  				return page;
>> >> >  			} else {
>> >> > +				if (try_to_accept_memory(zone))
>> >> > +					goto try_this_zone;
>> >> 
>> >> On the other hand, here we failed the full rmqueue(), including the
>> >> potentially fragmenting fallbacks, so I'm worried that before we finally
>> >> fail all of that and resort to accepting more memory, we already fragmented
>> >> the already accepted memory, more than necessary.
>> > 
>> > I'm not sure I follow. We accept memory in pageblock chunks.
>> > Do we want to allocate from a free pageblock if we have other memory to
>> > tap from? It doesn't make sense to me.
>> 
>> The fragmentation avoidance based on migratetype does work with pageblock
>> granularity, so yeah, if you accept a single pageblock worth of memory and
>> then (through __rmqueue_fallback()) end up serving both movable and
>> unmovable allocations from it, the whole fragmentation avoidance mechanism
>> is defeated and you end up with unmovable allocations (e.g. page tables)
>> scattered over many pageblocks and inability to allocate any huge pages.
>> 
>> >> So one way to prevent would be to move the acceptance into rmqueue() to
>> >> happen before __rmqueue_fallback(), which I originally had in mind and maybe
>> >> suggested that previously.
>> > 
>> > I guess it should be pretty straight forward to fail __rmqueue_fallback()
>> > if there's non-empty unaccepted_pages list and steer to
>> > try_to_accept_memory() this way.
>> 
>> That could be a way indeed. We do have ALLOC_NOFRAGMENT which could be
>> possible to employ here.
>> But maybe the zone_watermark_fast() modification would be simpler yet
>> sufficient. It makes sense to me that we'd try to keep a high watermark
>> worth of pre-accepted memory. zone_watermark_fast() would fail at low
>> watermark, so we could try accepting (high-low) at a time instead of single
>> pageblock.
> 
> Looks like we already have __zone_watermark_unusable_free() that seems
> match use-case rather closely. We only need switch unaccepted memory to
> per-zone accounting.

Could work. I'd still suggest also making try_to_accept_memory() accept up
to the high watermark, not just a single pageblock.

> The fixup below suppose to do the trick, but I'm not sure how to test
> fragmentation avoidance properly.
> 
> Any suggestions?

Haven't done that for years, maybe Mel knows better. But from what I remember,
I'd compare /proc/pagetypeinfo with and without memory accepting, and collect
the mm_page_alloc_extfrag tracepoint.
If there are more of these events happening, it's bad. Ideally with a
workload that stresses both userspace (movable) allocations and kernel
allocations. Again, Mel might have suggestions for a mmtest?

> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index ca6f0590be21..1bd2d245edee 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  #ifdef CONFIG_UNACCEPTED_MEMORY
>  			     ,
> -			     nid, K(node_page_state(pgdat, NR_UNACCEPTED))
> +			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
>  #endif
>  			    );
>  	len += hugetlb_report_node_meminfo(buf, len, nid);
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 789b77c7b6df..e9c05b4c457c 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  
>  #ifdef CONFIG_UNACCEPTED_MEMORY
>  	show_val_kb(m, "Unaccepted: ",
> -		    global_node_page_state(NR_UNACCEPTED));
> +		    global_zone_page_state(NR_UNACCEPTED));
>  #endif
>  
>  	hugetlb_report_meminfo(m);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9c762e8175fc..8b5800cd4424 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -152,6 +152,9 @@ enum zone_stat_item {
>  	NR_ZSPAGES,		/* allocated in zsmalloc */
>  #endif
>  	NR_FREE_CMA_PAGES,
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> +	NR_UNACCEPTED,
> +#endif
>  	NR_VM_ZONE_STAT_ITEMS };
>  
>  enum node_stat_item {
> @@ -198,9 +201,6 @@ enum node_stat_item {
>  	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
>  	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
>  	NR_KERNEL_STACK_KB,	/* measured in KiB */
> -#ifdef CONFIG_UNACCEPTED_MEMORY
> -	NR_UNACCEPTED,
> -#endif
>  #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
>  	NR_KERNEL_SCS_KB,	/* measured in KiB */
>  #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e80e8d398863..404b267332a9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
>  
>  	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
>  	__mod_zone_freepage_state(zone, -1 << order, migratetype);
> -	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
> +	__mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	if (last)
> @@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
>  	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
>  	list_add_tail(&page->lru, &zone->unaccepted_pages);
>  	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
> +	__mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	if (first)
> @@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
>  	if (!(alloc_flags & ALLOC_CMA))
>  		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
>  #endif
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> +	unusable_free += zone_page_state(z, NR_UNACCEPTED);
> +#endif
>  
>  	return unusable_free;
>  }
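
To illustrate the "accept up to the high watermark" idea discussed in this
thread, here is a minimal userspace sketch. It is not kernel code and not the
patch itself: struct zone_model, unusable_free(), watermark_ok() and
accept_up_to_high() are hypothetical names made up for this example. It only
assumes what the quoted fixup shows, namely that unaccepted pages are counted
in the zone's free pages and are then subtracted as unusable in the watermark
check. lowmem_reserve, CMA, highatomic reserves and the per-order free list
checks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct zone (not the kernel layout). */
struct zone_model {
	long nr_free;		/* free pages, including unaccepted ones */
	long nr_unaccepted;	/* free pages that are not accepted yet */
	long wmark_low;		/* low watermark, in pages */
	long wmark_high;	/* high watermark, in pages */
	long pageblock_pages;	/* acceptance granularity, in pages */
};

/*
 * Models the effect of the fixup on __zone_watermark_unusable_free():
 * unaccepted pages count as unusable for watermark purposes.
 */
static long unusable_free(const struct zone_model *z, unsigned int order)
{
	long unusable = (1L << order) - 1;

	unusable += z->nr_unaccepted;
	return unusable;
}

/* Simplified watermark check: usable free pages must exceed the mark. */
static bool watermark_ok(const struct zone_model *z, unsigned int order,
			 long mark)
{
	return z->nr_free - unusable_free(z, order) > mark;
}

/*
 * Accept whole pageblocks until the high watermark is met or no unaccepted
 * memory is left. Returns the number of pages accepted.
 */
static long accept_up_to_high(struct zone_model *z)
{
	long accepted = 0;

	assert(z->pageblock_pages > 0);

	while (z->nr_unaccepted > 0 && !watermark_ok(z, 0, z->wmark_high)) {
		long chunk = z->pageblock_pages;

		if (chunk > z->nr_unaccepted)
			chunk = z->nr_unaccepted;

		/* nr_free stays the same: pages become usable, not new. */
		z->nr_unaccepted -= chunk;
		accepted += chunk;
	}

	return accepted;
}
```

In this model a zone that fails the low watermark check gets several
pageblocks accepted in one go, so the next allocations do not immediately
fall back to other migratetypes. Whether this batching actually reduces
mm_page_alloc_extfrag events would still have to be measured on a real kernel.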