Re: [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range

From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, osalvador@suse.de, richard.weiyang@gmail.com,
	vbabka@suse.cz, rientjes@google.com
Subject: Re: [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range
Date: Tue, 8 Sep 2020 11:26:49 +0200	[thread overview]
Message-ID: <29b37895-5824-be62-9d11-5e920069b03c@redhat.com> (raw)
In-Reply-To: <20200907073237.GG30144@dhcp22.suse.cz>

>> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> 
> I believe we should be going an opposite direction and define a more
> understandable and usable semantic for start_isolate_page_range. We do
> not want callers to scratch their heads to call all caches they might
> need to flush.

I still prefer temporarily disabling PCP, or using a different mechanism
to teach PCP to no work on isolated pageblocks (while the latter seems
to be harder to achieve). I can spot that Vlastimil already tried to
implement it - I'll have a look shortly.

Having that said, this patch improves the situation in case we cannot
get temporarily disabling of PCP implemented - instead of flushing at
random place, we flush at less random places :)

But it is far from ideal. It's just another of the special cases that
developed over time and instead of properly fixing it, we work around it.

Note: I think the flush semantics of drain_all_pages() is very different
from lru_add_drain_all(). The latter does not mess with isolated page
that have just been freed (as it's not part of the core buddy / buddy
extension).

> 
>> ---
>>  mm/memory_hotplug.c | 15 +--------------
>>  mm/page_alloc.c     |  2 ++
>>  mm/page_isolation.c | 40 ++++++++++++----------------------------
>>  3 files changed, 15 insertions(+), 42 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index b11a269e2356..5a2ed1a94555 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  	}
>>  
>>  	do {
>> +		drain_all_pages(zone);
>>  		pfn = start_pfn;
>>  		do {
>>  			if (signal_pending(current)) {
>> @@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  		/* check again */
>>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>>  					    NULL, check_pages_isolated_cb);
>> -		/*
>> -		 * per-cpu pages are drained in start_isolate_page_range, but if
>> -		 * there are still pages that are not free, make sure that we
>> -		 * drain again, because when we isolated range we might
>> -		 * have raced with another thread that was adding pages to pcp
>> -		 * list.
>> -		 *
>> -		 * Forward progress should be still guaranteed because
>> -		 * pages on the pcp list can only belong to MOVABLE_ZONE
>> -		 * because has_unmovable_pages explicitly checks for
>> -		 * PageBuddy on freed pages on other zones.
>> -		 */
>> -		if (ret)
>> -			drain_all_pages(zone);
>>  	} while (ret);
>>  
>>  	/* Ok, all of our target is isolated.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fab5e97dc9ca..6d6a501a103e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>  	if (ret < 0)
>>  		return ret;
>>  
>> +	drain_all_pages(cc.zone);
>> +
>>  	/*
>>  	 * In case of -EBUSY, we'd like to know which page causes problem.
>>  	 * So, just fall through. test_pages_isolated() has a tracepoint
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 63a3db10a8c0..8dfa6c6c668d 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  {
>>  	struct page *unmovable = NULL;
>>  	struct zone *zone;
>> -	unsigned long flags;
>> -	int ret = -EBUSY;
>> +	unsigned long flags, nr_pages;
>> +	int ret = -EBUSY, mt;
>>  
>>  	zone = page_zone(page);
>>  
>> @@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  	 * We just check MOVABLE pages.
>>  	 */
>>  	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
>> -	if (!unmovable) {
>> -		unsigned long nr_pages;
>> -		int mt = get_pageblock_migratetype(page);
>> -
>> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> -		zone->nr_isolate_pageblock++;
>> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>> -									NULL);
>> -
>> -		__mod_zone_freepage_state(zone, -nr_pages, mt);
>> -		ret = 0;
>> -	}
>> +	if (unmovable)
>> +		goto out;
>>  
>> +	mt = get_pageblock_migratetype(page);
>> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> +	zone->nr_isolate_pageblock++;
>> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
>> +	__mod_zone_freepage_state(zone, -nr_pages, mt);
>> +	ret = 0;
>>  out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>> -	if (!ret) {
>> -		drain_all_pages(zone);
>> -	} else {
>> +	if (ret) {
>>  		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>>  

There are patches in -mm/-next by me that already cleanup/reshuffle that
code heavily, so better just keep the changes minimal or built up on top
of -next.

>>  		if ((isol_flags & REPORT_FAILURE) && unmovable)
>> @@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>   * pageblocks we may have modified and return -EBUSY to caller. This
>>   * prevents two threads from simultaneously working on overlapping ranges.
>>   *
>> - * Please note that there is no strong synchronization with the page allocator
>> - * either. Pages might be freed while their page blocks are marked ISOLATED.
>> - * In some cases pages might still end up on pcp lists and that would allow
>> - * for their allocation even when they are in fact isolated already. Depending
>> - * on how strong of a guarantee the caller needs drain_all_pages might be needed
>> - * (e.g. __offline_pages will need to call it after check for isolated range for
>> - * a next retry).
>> - *
>>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>>   * of range cannot be isolated.
>>   */
>> @@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>  	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>>  
>> -	for (pfn = start_pfn;
>> -	     pfn < end_pfn;
>> -	     pfn += pageblock_nr_pages) {
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {

unnecessary code churn.

>>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>>  		if (page) {
>>  			if (set_migratetype_isolate(page, migratetype, flags)) {
>> -- 
>> 2.25.1
>>
> 

-- 
Thanks,

David / dhildenb