From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38CF9C433EF for ; Thu, 26 May 2022 19:46:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7D5BB8D0003; Thu, 26 May 2022 15:46:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 733D18D0001; Thu, 26 May 2022 15:46:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5A9A78D0003; Thu, 26 May 2022 15:46:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 49EC98D0001 for ; Thu, 26 May 2022 15:46:49 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 1EE27207B2 for ; Thu, 26 May 2022 19:46:49 +0000 (UTC) X-FDA: 79508926938.10.46CD155 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) by imf18.hostedemail.com (Postfix) with ESMTP id 2C87A1C0021 for ; Thu, 26 May 2022 19:46:30 +0000 (UTC) Received: by mail-pl1-f171.google.com with SMTP id a17so2303183plb.4 for ; Thu, 26 May 2022 12:46:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :cc:references:from:in-reply-to:content-transfer-encoding; bh=10Sx9S31e2idpPzTS8bdIYASB+W2MY1RjKj3b50jxno=; b=BhWNPt+Fs+dkDlGSEJf+F2zzckhK5r4u/mwh0dONw5POvJTOhTrYg+5hgv7x7vkzXo N8vrUh9AaoNUN+i4bCfRTy5pVYdZCp4muwGl5DM8/sUrwrlW/arMbLzUPohjFx/T6QtE toMt2+2rc+2ZIjqT/JZEDDUxBxWwV73CuwYTJogb9hMdzGFCzI3Yogh8Ukpn46DvTB5e 8euJeIO1oUShbtMlSJrYygh0rP0h+9CFehB5aEXSxdXk64gUaL7o6egRLlvFnA7T+BDY UPUQXu8wKrDr93Xm+wbOGQNFJvL4NHBsHn6oi0h+sLO+IIbnB0gSUvPDvqdQxgdu1tIX msbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=10Sx9S31e2idpPzTS8bdIYASB+W2MY1RjKj3b50jxno=; b=g00kiVMuCWbTl1ipjkw0ShuKIYdckdTvjDCCr78MZ2t1YgA8KvmGIXC7aUxYetFkXa XqUjWfi62E5enXsI9EaOjmy47FQH5t7hRH20ez79IH2P9sB9GzJR3jwoVkkDH/hQxOGm E6vwoLpffgvS8KgoxR6AByl0iyPVKFOeB9IdaQqwFv8DAWIF4OB+zOYV16goIoC8HkgT b1yySeLRHY9GBhd6CFAaukxQIVj4DmZdACYmloZO2W3J9ZByWkg42bd84u5iRH3pIhdT dqplNWuWmZ5dB1DVatPPTn8Xwco/9gqv0eB0AVU/rfMkkOwEl87hxDiWOMYhbLaaaKo6 sEMw== X-Gm-Message-State: AOAM532FqFIp1CPxO8Pu3os+leny00FC85Wus0DZxsynK8kT8SOX6+TK FT0qXDd4uiQu4SM7NSNuoMc= X-Google-Smtp-Source: ABdhPJxwvObOqF0ec6vutPjsD1M0dTjHDj1dhQuRd4QPqe95Y4WSFjdIzdOEhubaAyzDa8kP4aEOFQ== X-Received: by 2002:a17:90a:8b98:b0:1e2:5df6:52b8 with SMTP id z24-20020a17090a8b9800b001e25df652b8mr3235837pjn.28.1653594407329; Thu, 26 May 2022 12:46:47 -0700 (PDT) Received: from [192.168.1.50] (ip70-191-40-110.oc.oc.cox.net. [70.191.40.110]) by smtp.gmail.com with ESMTPSA id q2-20020a170902b10200b0015e8d4eb1f7sm2010994plr.65.2022.05.26.12.46.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 26 May 2022 12:46:46 -0700 (PDT) Message-ID: Date: Thu, 26 May 2022 12:46:44 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.0 Subject: Re: [PATCH v11 3/6] mm: make alloc_contig_range work at pageblock granularity Content-Language: en-US To: Zi Yan Cc: David Hildenbrand , linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka , Mel Gorman , Eric Ren , Mike Rapoport , Oscar Salvador , Christophe Leroy , Andrew Morton , kernel test robot , Qian Cai References: <20220425143118.2850746-1-zi.yan@sent.com> <20220425143118.2850746-4-zi.yan@sent.com> <23A7297E-6C84-4138-A9FE-3598234004E6@nvidia.com> <180aaa57-28d8-30f0-e843-ea52e3a180a8@gmail.com> <805CA8E8-9216-4CB3-BC8E-DC89456FDF3C@nvidia.com> From: Doug Berger In-Reply-To: <805CA8E8-9216-4CB3-BC8E-DC89456FDF3C@nvidia.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 2C87A1C0021 X-Stat-Signature: kj5pdefhruoccu8zirzay1pjudwy9oq3 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=BhWNPt+F; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf18.hostedemail.com: domain of opendmb@gmail.com designates 209.85.214.171 as permitted sender) smtp.mailfrom=opendmb@gmail.com X-Rspam-User: X-HE-Tag: 1653594390-53216 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 5/26/2022 10:34 AM, Zi Yan wrote: > On 25 May 2022, at 17:11, Zi Yan wrote: > >> On 25 May 2022, at 17:03, Doug Berger wrote: >> >>> On 5/25/2022 10:53 AM, Zi Yan wrote: >>>> On 25 May 2022, at 13:41, Doug Berger wrote: >>>> >>>>> I am seeing some free memory accounting problems with linux-next that I have bisected to this commit (i.e. b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity"). >>>>> >>>>> On an arm64 SMP platform with 4GB total memory and the default 16MB default CMA pool, I am seeing the following after boot with a sysrq Show Memory (e.g. 'echo m > /proc/sysrq-trigger'): >>>>> >>>>> [ 16.015906] sysrq: Show Memory >>>>> [ 16.019039] Mem-Info: >>>>> [ 16.021348] active_anon:14604 inactive_anon:919 isolated_anon:0 >>>>> [ 16.021348] active_file:0 inactive_file:0 isolated_file:0 >>>>> [ 16.021348] unevictable:0 dirty:0 writeback:0 >>>>> [ 16.021348] slab_reclaimable:3662 slab_unreclaimable:3333 >>>>> [ 16.021348] mapped:928 shmem:15146 pagetables:63 bounce:0 >>>>> [ 16.021348] kernel_misc_reclaimable:0 >>>>> [ 16.021348] free:976766 free_pcp:991 free_cma:7017 >>>>> [ 16.056937] Node 0 active_anon:58416kB inactive_anon:3676kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3712kB dirty:0kB writeback:0kB shmem:60584kB writeback_tmp:0kB kernel_stack:1200kB pagetables:252kB all_unreclaimable? no >>>>> [ 16.081526] DMA free:3041036kB boost:0kB min:6036kB low:9044kB high:12052kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145728kB managed:3029992kB mlocked:0kB bounce:0kB free_pcp:636kB local_pcp:0kB free_cma:28068kB >>>>> [ 16.108650] lowmem_reserve[]: 0 0 944 944 >>>>> [ 16.112746] Normal free:866028kB boost:0kB min:1936kB low:2900kB high:3864kB reserved_highatomic:0KB active_anon:58416kB inactive_anon:3676kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:967352kB mlocked:0kB bounce:0kB free_pcp:3328kB local_pcp:864kB free_cma:0kB >>>>> [ 16.140393] lowmem_reserve[]: 0 0 0 0 >>>>> [ 16.144133] DMA: 7*4kB (UMC) 4*8kB (M) 3*16kB (M) 3*32kB (MC) 5*64kB (M) 4*128kB (MC) 5*256kB (UMC) 7*512kB (UM) 5*1024kB (UM) 9*2048kB (UMC) 732*4096kB (MC) = 3027724kB >>>>> [ 16.159609] Normal: 149*4kB (UM) 95*8kB (UME) 26*16kB (UME) 8*32kB (ME) 2*64kB (UE) 1*128kB (M) 2*256kB (ME) 2*512kB (ME) 2*1024kB (UM) 0*2048kB 210*4096kB (M) = 866028kB >>>>> [ 16.175165] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB >>>>> [ 16.183937] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB >>>>> [ 16.192533] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB >>>>> [ 16.201040] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB >>>>> [ 16.209374] 15146 total pagecache pages >>>>> [ 16.213246] 0 pages in swap cache >>>>> [ 16.216595] Swap cache stats: add 0, delete 0, find 0/0 >>>>> [ 16.221867] Free swap = 0kB >>>>> [ 16.224780] Total swap = 0kB >>>>> [ 16.227693] 1048576 pages RAM >>>>> [ 16.230694] 0 pages HighMem/MovableOnly >>>>> [ 16.234564] 49240 pages reserved >>>>> [ 16.237825] 4096 pages cma reserved >>>>> >>>>> Some anomolies in the above are: >>>>> free_cma:7017 with only 4096 pages cma reserved >>>>> DMA free:3041036kB with only managed:3029992kB >>>>> >>>>> I'm not sure what is going on here, but I am suspicious of split_free_page() since del_page_from_free_list doesn't affect migrate_type accounting, but __free_one_page() can. >>>>> Also PageBuddy(page) is being checked without zone->lock in isolate_single_pageblock(). >>>>> >>>>> Please investigate this as well. >>>> >>>> >>>> Can you try this patch https://lore.kernel.org/linux-mm/20220524194756.1698351-1-zi.yan@sent.com/ >>>> and see if it fixes the issue? >>>> >>>> Thanks. >>>> >>> The last hunk didn't apply directly to this commit, but I was able to apply the patch to linux-next/master with no improvement to the free memory accounting (actually anecdotaly worse): >>> >>> [ 6.236828] sysrq: Show Memory >>> [ 6.239973] Mem-Info: >>> [ 6.242290] active_anon:14594 inactive_anon:924 isolated_anon:0 >>> [ 6.242290] active_file:0 inactive_file:0 isolated_file:0 >>> [ 6.242290] unevictable:0 dirty:0 writeback:0 >>> [ 6.242290] slab_reclaimable:3671 slab_unreclaimable:3575 >>> [ 6.242290] mapped:935 shmem:15147 pagetables:63 bounce:0 >>> [ 6.242290] kernel_misc_reclaimable:0 >>> [ 6.242290] free:1059009 free_pcp:1067 free_cma:90112 >>> [ 6.278048] Node 0 active_anon:58376kB inactive_anon:3844kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3740kB dirty:0kB writeback:0kB shmem:60588kB writeback_tmp:0kB kernel_stack:1216kB pagetables:252kB all_unreclaimable? no >>> [ 6.279422] arm-scmi brcm_scmi@0: timed out in resp(caller: scmi_perf_level_set+0xe0/0x110) >>> [ 6.302501] DMA free:3372200kB boost:0kB min:6032kB low:9040kB high:12048kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145728kB managed:3029800kB mlocked:0kB bounce:0kB free_pcp:636kB local_pcp:0kB free_cma:360448kB >>> [ 6.302515] lowmem_reserve[]: 0 0 944 >>> [ 6.310894] cpufreq: __target_index: Failed to change cpu frequency: -110 >>> [ 6.337920] 944 >>> [ 6.337925] Normal free:863584kB boost:0kB min:1940kB low:2904kB high:3868kB reserved_highatomic:0KB active_anon:58376kB inactive_anon:3896kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:967352kB mlocked:0kB bounce:0kB free_pcp:3492kB local_pcp:828kB free_cma:0kB >>> [ 6.377782] lowmem_reserve[]: 0 0 0 0 >>> [ 6.381461] DMA: 4*4kB (UM) 5*8kB (M) 3*16kB (M) 2*32kB (M) 6*64kB (M) 5*128kB (M) 6*256kB (UM) 5*512kB (UM) 4*1024kB (M) 10*2048kB (UMC) 732*4096kB (MC) = 3028136kB >>> [ 6.396324] Normal: 84*4kB (U) 94*8kB (UM) 260*16kB (UME) 149*32kB (UM) 99*64kB (UME) 39*128kB (UM) 12*256kB (U) 3*512kB (UME) 2*1024kB (UM) 0*2048kB 204*4096kB (M) = 863584kB >>> [ 6.412054] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB >>> [ 6.420770] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB >>> [ 6.429312] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB >>> [ 6.437767] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB >>> [ 6.446047] 15147 total pagecache pages >>> [ 6.449890] 0 pages in swap cache >>> [ 6.453210] Swap cache stats: add 0, delete 0, find 0/0 >>> [ 6.458445] Free swap = 0kB >>> [ 6.461331] Total swap = 0kB >>> [ 6.464217] 1048576 pages RAM >>> [ 6.467190] 0 pages HighMem/MovableOnly >>> [ 6.471032] 49288 pages reserved >>> [ 6.474267] 4096 pages cma reserved >>> >>> Regards, >>> Doug >> >> I will look into it. Thanks for reporting it. > > Hi Doug, > > Can you try the patch below? It takes out free pages under zone lock now > and modifies page stats properly. Thanks. > > > diff --git a/mm/internal.h b/mm/internal.h > index 64e61b032dac..c0f8fbe0445b 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -374,8 +374,8 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align, > phys_addr_t min_addr, > int nid, bool exact_nid); > > -void split_free_page(struct page *free_page, > - int order, unsigned long split_pfn_offset); > +int split_free_page(struct page *free_page, > + unsigned int order, unsigned long split_pfn_offset); > > #if defined CONFIG_COMPACTION || defined CONFIG_CMA > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index bc93a82e51e6..6f6e4649ac21 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1100,30 +1100,44 @@ static inline void __free_one_page(struct page *page, > * @order: the order of the page > * @split_pfn_offset: split offset within the page > * > + * Return -ENOENT if the free page is changed, otherwise 0 > + * > * It is used when the free page crosses two pageblocks with different migratetypes > * at split_pfn_offset within the page. The split free page will be put into > * separate migratetype lists afterwards. Otherwise, the function achieves > * nothing. > */ > -void split_free_page(struct page *free_page, > - int order, unsigned long split_pfn_offset) > +int split_free_page(struct page *free_page, > + unsigned int order, unsigned long split_pfn_offset) > { > struct zone *zone = page_zone(free_page); > unsigned long free_page_pfn = page_to_pfn(free_page); > unsigned long pfn; > unsigned long flags; > int free_page_order; > + int mt; > + int ret = 0; > > if (split_pfn_offset == 0) > - return; > + return ret; > > spin_lock_irqsave(&zone->lock, flags); > + > + if (!PageBuddy(free_page) || buddy_order(free_page) != order) { > + ret = -ENOENT; > + goto out; > + } > + > + mt = get_pageblock_migratetype(free_page); > + if (likely(!is_migrate_isolate(mt))) > + __mod_zone_freepage_state(zone, -(1UL << order), mt); > + > del_page_from_free_list(free_page, zone, order); > for (pfn = free_page_pfn; > pfn < free_page_pfn + (1UL << order);) { > int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn); > > - free_page_order = min_t(int, > + free_page_order = min_t(unsigned int, > pfn ? __ffs(pfn) : order, > __fls(split_pfn_offset)); This part of the patch doesn't agree with any version of page_alloc.c I have, but I was able to manually apply the change. > __free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order, > @@ -1134,7 +1148,9 @@ void split_free_page(struct page *free_page, > if (split_pfn_offset == 0) > split_pfn_offset = (1UL << order) - (pfn - free_page_pfn); > } > +out: > spin_unlock_irqrestore(&zone->lock, flags); > + return ret; > } > /* > * A bad page could be due to a number of fields. Instead of multiple branches, > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index c643c8420809..f539ccf7fb44 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -300,7 +300,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) > * the in-use page then splitting the free page. > */ > static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > - gfp_t gfp_flags, bool isolate_before) > + gfp_t gfp_flags, bool isolate_before, bool skip_isolation) > { > unsigned char saved_mt; > unsigned long start_pfn; > @@ -327,11 +327,16 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > zone->zone_start_pfn); > > saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock)); > - ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags, > - isolate_pageblock, isolate_pageblock + pageblock_nr_pages); > > - if (ret) > - return ret; > + if (skip_isolation) > + VM_BUG_ON(!is_migrate_isolate(saved_mt)); > + else { > + ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags, > + isolate_pageblock, isolate_pageblock + pageblock_nr_pages); > + > + if (ret) > + return ret; > + } > > /* > * Bail out early when the to-be-isolated pageblock does not form > @@ -367,8 +372,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > int order = buddy_order(page); > > if (pfn + (1UL << order) > boundary_pfn) > - split_free_page(page, order, boundary_pfn - pfn); > - pfn += (1UL << order); > + /* free page changed before split, check it again */ > + if (split_free_page(page, order, boundary_pfn - pfn)) > + continue; > + > + pfn += 1UL << order; > continue; > } > /* > @@ -463,7 +471,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags, > return 0; > failed: > /* restore the original migratetype */ > - unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt); > + if (!skip_isolation) > + unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt); > return -EBUSY; > } > > @@ -522,14 +531,18 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, > unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages); > unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages); > int ret; > + bool skip_isolation = false; > > /* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */ > - ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false); > + ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false, skip_isolation); > if (ret) > return ret; > > + if (isolate_start == isolate_end - pageblock_nr_pages) > + skip_isolation = true; > + > /* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */ > - ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true); > + ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true, skip_isolation); > if (ret) { > unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype); > return ret; > > > -- > Best Regards, > Yan, Zi This patch does appear to fix the problem I observed. I'll poke it a little more, but so far it looks good. Thanks!, Doug