linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Oscar Salvador <osalvador@suse.de>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [RFC 5/5] mm, page_alloc: disable pcplists during page isolation
Date: Thu, 10 Sep 2020 13:05:13 +0200	[thread overview]
Message-ID: <3d3b53db-aeaa-ff24-260b-36427fac9b1c@suse.cz> (raw)
In-Reply-To: <5c5753b4-8cb9-fa02-a0fa-d5ca22731cbb@redhat.com>

On 9/10/20 12:29 PM, David Hildenbrand wrote:
> On 09.09.20 13:55, Vlastimil Babka wrote:
>> On 9/9/20 1:36 PM, Michal Hocko wrote:
>>> On Wed 09-09-20 12:48:54, Vlastimil Babka wrote:
>>>> Here's a version that will apply on top of next-20200908. The first 4 patches need no change.
>>>> For callers that pair start_isolate_page_range() with
>>>> undo_isolated_page_range() properly, this is transparent. Currently that's
>>>> alloc_contig_range(). __offline_pages() doesn't call undo_isolated_page_range()
>>>> in the succes case, so it has to be carful to handle restoring pcp->high and batch
>>>> and unlocking pcp_batch_high_lock.
>>>
>>> I was hoping that it would be possible to have this completely hidden
>>> inside start_isolate_page_range code path.
>> 
>> I hoped so too, but we can't know the moment when all processes that were in the
>> critical part of freeing pages to pcplists have moved on (they might have been
>> rescheduled).
>> We could change free_unref_page() to disable IRQs sooner, before
>> free_unref_page_prepare(), or at least the get_pfnblock_migratetype() part. Then
>> after the single drain, we should be safe, AFAICS?
> 
> At least moving it before getting the migratetype should not be that severe?
> 
>> RT guys might not be happy though, but it's much simpler than this patch. I
>> still like some of the cleanups in 1-4 though tbh :)
> 
> It would certainly make this patch much simpler. Do you have a prototype
> lying around?

Below is the critical part, while writing it I realized that there's also
free_unref_page_list() where it's more ugly.

So free_unref_page() simply moves the "determine migratetype" part under
disabled interrupts. For free_unref_page_list() we optimistically determine them
without disabling interrupts, and with disabled interrupts we check if zone has
isolated pageblocks and thus we should not trust that value anymore. That's same
pattern as free_pcppages_bulk uses.

But unfortunately it means an extra page_zone() plus a test for each page on the
list :/

----8<----
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index defefed79cfb..57e2a341c95c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3103,18 +3103,21 @@ void mark_free_pages(struct zone *zone)
 }
 #endif /* CONFIG_PM */
 
-static bool free_unref_page_prepare(struct page *page, unsigned long pfn)
+static bool free_unref_page_prepare(struct page *page)
 {
-	int migratetype;
-
 	if (!free_pcp_prepare(page))
 		return false;
 
-	migratetype = get_pfnblock_migratetype(page, pfn);
-	set_pcppage_migratetype(page, migratetype);
 	return true;
 }
 
+static inline void free_unref_page_set_migratetype(struct page *page, unsigned long pfn)
+{
+	int migratetype	= get_pfnblock_migratetype(page, pfn);
+
+	set_pcppage_migratetype(page, migratetype);
+}
+
 static void free_unref_page_commit(struct page *page, unsigned long pfn)
 {
 	struct zone *zone = page_zone(page);
@@ -3156,10 +3159,17 @@ void free_unref_page(struct page *page)
 	unsigned long flags;
 	unsigned long pfn = page_to_pfn(page);
 
-	if (!free_unref_page_prepare(page, pfn))
+	if (!free_unref_page_prepare(page))
 		return;
 
+	/*
+	 * by disabling interrupts before reading pageblock's migratetype,
+	 * we can guarantee that after changing a pageblock to MIGRATE_ISOLATE
+	 * and drain_all_pages(), there's nobody who would read the old
+	 * migratetype and put a page from isoalted pageblock to pcplists
+	 */
 	local_irq_save(flags);
+	free_unref_page_set_migratetype(page, pfn);
 	free_unref_page_commit(page, pfn);
 	local_irq_restore(flags);
 }
@@ -3176,9 +3186,10 @@ void free_unref_page_list(struct list_head *list)
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
-		if (!free_unref_page_prepare(page, pfn))
+		if (!free_unref_page_prepare(page))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		free_unref_page_set_migratetype(page, pfn);
 	}
 
 	local_irq_save(flags);
@@ -3187,6 +3198,8 @@ void free_unref_page_list(struct list_head *list)
 
 		set_page_private(page, 0);
 		trace_mm_page_free_batched(page);
+		if (has_isolate_pageblock(page_zone(page)))
+			free_unref_page_set_migratetype(page, pfn);
 		free_unref_page_commit(page, pfn);
 
 		/*



  reply	other threads:[~2020-09-10 11:07 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 16:36 [RFC 0/5] disable pcplists during page isolation Vlastimil Babka
2020-09-07 16:36 ` [RFC 1/5] mm, page_alloc: clean up pageset high and batch update Vlastimil Babka
2020-09-10  8:31   ` Oscar Salvador
2020-09-10  8:34     ` Oscar Salvador
2020-09-10 10:47     ` David Hildenbrand
2020-09-17 16:05     ` Vlastimil Babka
2020-09-07 16:36 ` [RFC 2/5] mm, page_alloc: calculate pageset high and batch once per zone Vlastimil Babka
2020-09-10  9:00   ` Oscar Salvador
2020-09-10 10:56   ` David Hildenbrand
2020-09-07 16:36 ` [RFC 3/5] mm, page_alloc(): remove setup_pageset() Vlastimil Babka
2020-09-10  9:23   ` Oscar Salvador
2020-09-10  9:57     ` Oscar Salvador
2020-09-18  9:37     ` Vlastimil Babka
2020-09-10 11:00   ` David Hildenbrand
2020-09-07 16:36 ` [RFC 4/5] mm, page_alloc: cache pageset high and batch in struct zone Vlastimil Babka
2020-09-10 11:30   ` Oscar Salvador
2020-09-18 12:02     ` Vlastimil Babka
2020-09-07 16:36 ` [RFC 5/5] mm, page_alloc: disable pcplists during page isolation Vlastimil Babka
2020-09-09 10:48   ` Vlastimil Babka
2020-09-09 11:36     ` Michal Hocko
2020-09-09 11:44       ` David Hildenbrand
2020-09-09 12:57         ` David Hildenbrand
2020-09-09 11:55       ` Vlastimil Babka
2020-09-10 10:29         ` David Hildenbrand
2020-09-10 11:05           ` Vlastimil Babka [this message]
2020-09-10 12:42             ` David Hildenbrand
2020-09-09 12:53       ` Pavel Tatashin
2020-09-08 18:29 ` [RFC 0/5] " David Hildenbrand
2020-09-09 10:54   ` Vlastimil Babka
2020-09-09 11:27     ` osalvador
2020-09-09 11:29       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d3b53db-aeaa-ff24-260b-36427fac9b1c@suse.cz \
    --to=vbabka@suse.cz \
    --cc=david@redhat.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@soleen.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).