Message-ID: <3ab6ea38-5a9b-af4f-3c94-b75dce682bc1@suse.cz>
Date: Fri, 9 Dec 2022 23:23:50 +0100
From: Vlastimil Babka
Subject: Re: [PATCHv8 02/14] mm: Add support for unaccepted memory
To: "Kirill A. Shutemov"
Cc: "Kirill A. Shutemov", Borislav Petkov, Andy Lutomirski,
 Sean Christopherson, Andrew Morton, Joerg Roedel, Ard Biesheuvel,
 Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes, Tom Lendacky,
 Thomas Gleixner, Peter Zijlstra, Paolo Bonzini, Ingo Molnar,
 Dario Faggioli, Dave Hansen, Mike Rapoport, David Hildenbrand,
 Mel Gorman, marcelo.cerri@canonical.com, tim.gardner@canonical.com,
 khalid.elmously@canonical.com, philip.cox@canonical.com,
 aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev,
 linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, Mike Rapoport
In-Reply-To: <20221209192616.dg4cbe7mgh3axv5h@box.shutemov.name>
References: <20221207014933.8435-1-kirill.shutemov@linux.intel.com>
 <20221207014933.8435-3-kirill.shutemov@linux.intel.com>
 <20221209192616.dg4cbe7mgh3axv5h@box.shutemov.name>
On 12/9/22 20:26, Kirill A. Shutemov wrote:
>> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> >  			/*
>> >  			 * Watermark failed for this zone, but see if we can
>> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> >  
>> >  			return page;
>> >  		} else {
>> > +			if (try_to_accept_memory(zone))
>> > +				goto try_this_zone;
>>
>> On the other hand, here we failed the full rmqueue(), including the
>> potentially fragmenting fallbacks, so I'm worried that before we
>> finally fail all of that and resort to accepting more memory, we have
>> already fragmented the already accepted memory, more than necessary.
>
> I'm not sure I follow. We accept memory in pageblock chunks. Do we want
> to allocate from a free pageblock if we have other memory to tap from?
> It doesn't make sense to me.

The fragmentation avoidance based on migratetype does work with pageblock
granularity, so yeah, if you accept a single pageblock worth of memory
and then (through __rmqueue_fallback()) end up serving both movable and
unmovable allocations from it, the whole fragmentation avoidance
mechanism is defeated, and you end up with unmovable allocations (e.g.
page tables) scattered over many pageblocks and an inability to allocate
any huge pages.

>> So one way to prevent that would be to move the acceptance into
>> rmqueue(), to happen before __rmqueue_fallback(), which I originally
>> had in mind and maybe suggested previously.
>
> I guess it should be pretty straightforward to fail
> __rmqueue_fallback() if there's a non-empty unaccepted_pages list and
> steer to try_to_accept_memory() this way.

That could be a way indeed. We do have ALLOC_NOFRAGMENT, which it might
be possible to employ here.

But maybe the zone_watermark_fast() modification would be simpler yet
sufficient. It makes sense to me that we'd try to keep a high watermark
worth of pre-accepted memory. zone_watermark_fast() would fail at the low
watermark, so we could try accepting (high - low) at a time, instead of a
single pageblock.

> But I still don't understand why.

To avoid what I described above.

>> But maybe a less intrusive and more robust way would be to track how
>> much memory is unaccepted, and actually decrement that amount from the
>> free memory in zone_watermark_fast() in order to force earlier failure
>> of that check, and thus to accept more memory and give us a buffer of
>> truly accepted and available memory up to the high watermark, which
>> should hopefully prevent most of the fallbacks. Then the code I
>> flagged above as currently unnecessary would make perfect sense.
>
> The next patch adds per-node unaccepted memory accounting. We can move
> it per-zone if it would help.

Right.

>> And maybe Mel will have some ideas as well.
>
> I don't have much expertise in the page allocator. Any input is
> valuable.
>
>> > +
>> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> >  	/* Try again if zone has deferred pages */
>> >  	if (static_branch_unlikely(&deferred_pages)) {
>> > @@ -6935,6 +7050,10 @@ static void __meminit zone_init_free_lists(struct zone *zone)
>> >  		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
>> >  		zone->free_area[order].nr_free = 0;
>> >  	}
>> > +
>> > +#ifdef CONFIG_UNACCEPTED_MEMORY
>> > +	INIT_LIST_HEAD(&zone->unaccepted_pages);
>> > +#endif
>> >  }
>> > 
>> >  /*
>> > 
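
To make the zone_watermark_fast() idea above more concrete, here is a
minimal sketch (completely untested; it assumes a per-zone NR_UNACCEPTED
vmstat counter for the pages sitting on zone->unaccepted_pages, which the
posted series does not have, since it accounts unaccepted memory
per-node):

/*
 * Untested sketch: treat still-unaccepted memory as not free for the
 * fast watermark check, so the check fails while unaccepted memory
 * remains and we accept more of it before __rmqueue_fallback() gets a
 * chance to fragment the already accepted pageblocks.  NR_UNACCEPTED
 * is an assumed per-zone counter, illustrative only.
 */
static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
				       unsigned long mark, int highest_zoneidx,
				       unsigned int alloc_flags, gfp_t gfp_mask)
{
	long free_pages = zone_page_state(z, NR_FREE_PAGES);

#ifdef CONFIG_UNACCEPTED_MEMORY
	/* Unaccepted memory is counted as free but not allocatable yet. */
	free_pages -= zone_page_state(z, NR_UNACCEPTED);
#endif

	/* ... the existing fast-path checks, using the reduced count ... */
	return __zone_watermark_ok(z, order, mark, highest_zoneidx,
				   alloc_flags, free_pages);
}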
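
On the accept side, try_to_accept_memory() could then accept enough in
one go to bring the truly free (i.e. accepted) memory back up to the high
watermark, instead of a single pageblock per failed check. Again just a
sketch under the same assumed NR_UNACCEPTED counter, and
try_to_accept_memory_one() is a placeholder name for the existing code
that pops one range off zone->unaccepted_pages and accepts it:

static bool try_to_accept_memory(struct zone *zone)
{
	long to_accept;
	bool ret = false;

	/* How much do we need to accept to reach the high watermark? */
	to_accept = high_wmark_pages(zone) -
		    (zone_page_state(zone, NR_FREE_PAGES) -
		     zone_page_state(zone, NR_UNACCEPTED));

	/* Accept a pageblock at a time until we get there. */
	while (to_accept > 0) {
		if (!try_to_accept_memory_one(zone))
			break;
		ret = true;
		to_accept -= pageblock_nr_pages;
	}

	return ret;
}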