From: Zi Yan <ziy@nvidia.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>,
David Hildenbrand <david@redhat.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
iommu@lists.linux-foundation.org,
Eric Ren <renzhengeek@gmail.com>,
Robin Murphy <robin.murphy@arm.com>,
Christoph Hellwig <hch@lst.de>, Vlastimil Babka <vbabka@suse.cz>,
Marek Szyprowski <m.szyprowski@samsung.com>
Subject: Re: [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity
Date: Fri, 04 Feb 2022 10:19:55 -0500 [thread overview]
Message-ID: <E188FF7F-F52E-4DB6-B258-B9CCD7058E56@nvidia.com> (raw)
In-Reply-To: <Yf0wpFmtckRRzLFg@localhost.localdomain>
On 4 Feb 2022, at 8:56, Oscar Salvador wrote:
> On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> It is done by restoring pageblock types and splitting >pageblock_order free
>> pages after isolating at MAX_ORDER-1 granularity and migrating pages
>> away at pageblock granularity. The reason for this process is that
>> during isolation, some pages, either free or in-use, might have >pageblock
>> sizes and isolating part of them can cause free accounting issues.
>> Restoring the migratetypes of the pageblocks not in the interesting
>> range later is much easier.
>
> Hi Zi Yan,
>
> Due to time constraints I only glanced over it, so some comments below
> about stuff that caught my eye:
Thanks for looking at the patches!
>
>> +static inline void split_free_page_into_pageblocks(struct page *free_page,
>> + int order, struct zone *zone)
>> +{
>> + unsigned long pfn;
>> +
>> + spin_lock(&zone->lock);
>> + del_page_from_free_list(free_page, zone, order);
>> + for (pfn = page_to_pfn(free_page);
>> + pfn < page_to_pfn(free_page) + (1UL << order);
>
> It might make sense to have an end_pfn variable so that it does not have to
> be constantly evaluated. Or maybe the compiler is clever enough to only
> evaluate it once.
Sure. Will add end_pfn.
>
>> + pfn += pageblock_nr_pages) {
>> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
>> + mt, FPI_NONE);
>> + }
>> + spin_unlock(&zone->lock);
>
> It is possible that free_page's order is already pageblock_order, so I
> would add a one-liner upfront to catch that case and return, otherwise
> we do the delete_from_freelist-and-free_it_again dance.
Makes sense. Will do.
>
>> + /* Save the migratepages of the pageblocks before start and after end */
>> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
>> + + (isolate_end - alloc_end) / pageblock_nr_pages;
>> + saved_mt =
>> + kmalloc_array(num_pageblock_to_save,
>> + sizeof(unsigned char), GFP_KERNEL);
>> + if (!saved_mt)
>> + return -ENOMEM;
>> +
>> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
>> +
>> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
>
> I really hope we can put all this magic within start_isolate_page_range,
> and the counterparts in undo_isolate_page_range.
>
That is my plan too.
> Also, I kinda dislike the &saved_mt thing. I thought about some other
> approaches but nothing that wasn't too specific for this case, and I
> guess we want that function to be as generic as possible.
>
I do not like it either. This whole save-and-restore thing should go away
once I make MIGRATE_ISOLATE a standalone bit, so that the original
migratetypes will not be overwritten during isolation. Hopefully, I can
work on it soon and get rid of this hunk completely.
>> + /*
>> + * Split free page spanning [alloc_end, isolate_end) and put the
>> + * pageblocks in the right migratetype list
>> + */
>> + for (outer_end = alloc_end; outer_end < isolate_end;) {
>> + unsigned long begin_pfn = outer_end;
>> +
>> + order = 0;
>> + while (!PageBuddy(pfn_to_page(outer_end))) {
>> + if (++order >= MAX_ORDER) {
>> + outer_end = begin_pfn;
>> + break;
>> + }
>> + outer_end &= ~0UL << order;
>> + }
>> +
>> + if (outer_end != begin_pfn) {
>> + order = buddy_order(pfn_to_page(outer_end));
>> +
>> + /*
>> + * split the free page starting at outer_end and put the pageblocks
>> + * in the right migratetype list
>> + */
>> + VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
>
> How could this possibily happen?
Right. It is not possible: outer_end is begin_pfn aligned down to an order
boundary, and the buddy page found there spans at least that order, so its
end can never fall at or before begin_pfn. Will remove it.
>
>> + {
>> + struct page *free_page = pfn_to_page(outer_end);
>> +
>> + split_free_page_into_pageblocks(free_page, order, cc.zone);
>> + }
>> + outer_end += 1UL << order;
>> + } else
>> + outer_end = begin_pfn + 1;
>> }
>
> I think there are cases we could optimize for. If the page has already been
> split in pageblocks by the outer_start loop, we could skip this outer_end
> logic altogether.
>
> E.g: An order-10 page is split in two pageblocks. There's nothing else
> to be done, right? We could skip this.
Yes. I will think about it more and do some optimization.
--
Best Regards,
Yan, Zi
Thread overview: 23+ messages
2022-01-19 19:06 [PATCH v4 0/7] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-01-19 19:06 ` [PATCH v4 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Zi Yan
2022-01-24 14:02 ` Mel Gorman
2022-01-24 16:12 ` Zi Yan
2022-01-24 16:43 ` Mel Gorman
2022-01-19 19:06 ` [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-01-25 6:23 ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 3/7] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-01-24 9:55 ` Oscar Salvador
2022-01-24 17:17 ` Zi Yan
2022-01-25 13:19 ` Oscar Salvador
2022-01-25 13:21 ` Oscar Salvador
2022-01-25 16:31 ` Zi Yan
2022-02-02 12:18 ` Oscar Salvador
2022-02-02 12:25 ` David Hildenbrand
2022-02-02 16:25 ` Zi Yan
2022-02-02 16:35 ` Oscar Salvador
2022-01-19 19:06 ` [PATCH v4 4/7] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-02-04 13:56 ` Oscar Salvador
2022-02-04 15:19 ` Zi Yan [this message]
2022-01-19 19:06 ` [PATCH v4 5/7] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-01-19 19:06 ` [PATCH v4 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-01-19 19:06 ` [PATCH v4 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned Zi Yan