linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
@ 2023-02-14 22:32 SeongJae Park
  2023-02-15 13:16 ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: SeongJae Park @ 2023-02-14 22:32 UTC (permalink / raw)
  To: akpm; +Cc: david, osalvador, linux-mm, linux-kernel, SeongJae Park

In the usual case, do_migrate_range() returns the migrate_pages() return
value, where zero means complete success.  If no page could be isolated,
however, it returns the isolate_{lru,movable}_page() return value, or
zero if all pfns were invalid, hugetlb, or hwpoisoned.  So a zero return
from do_migrate_range() means either complete success or one of these
special cases of total isolation failure.

Actually, the return value is not checked by any caller, so it might be
better to simply make it a void function.  However, there is a TODO for
checking the return value.

Make the return value easier to understand: document it, and return
-ENOENT when no page was isolated so that zero means only success.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
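For illustration only (not part of the patch): a small user-space sketch
of the ambiguity described in the changelog.  The enum, the helper names
and the -EBUSY value are hypothetical stand-ins; only the "-ENOENT unless
something was isolated" step mirrors the diff below.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-pfn outcomes; these are not kernel types. */
enum pfn_outcome {
	PFN_SKIPPED,		/* a case the changelog says leaves the result at zero */
	PFN_ISOLATED,		/* isolation succeeded, page queued for migration */
	PFN_ISOLATE_FAILED,	/* isolation returned an error */
};

/* Shared loop: returns the last isolation result, counts isolated pages. */
static int scan(const enum pfn_outcome *pfns, size_t n, size_t *isolated)
{
	int ret = 0;
	size_t i;

	*isolated = 0;
	for (i = 0; i < n; i++) {
		switch (pfns[i]) {
		case PFN_SKIPPED:
			continue;
		case PFN_ISOLATED:
			ret = 0;
			(*isolated)++;
			break;
		case PFN_ISOLATE_FAILED:
			ret = -EBUSY;	/* illustrative error value */
			break;
		}
	}
	return ret;
}

/* Model of the current behaviour: zero can also mean "nothing was isolated". */
int model_current(const enum pfn_outcome *pfns, size_t n, int migrate_ret)
{
	size_t isolated;
	int ret = scan(pfns, n, &isolated);

	if (isolated)
		ret = migrate_ret;	/* stands in for the migrate_pages() result */
	return ret;
}

/* Model with this patch: -ENOENT unless at least one page was isolated. */
int model_patched(const enum pfn_outcome *pfns, size_t n, int migrate_ret)
{
	size_t isolated;
	int ret;

	(void)scan(pfns, n, &isolated);
	ret = -ENOENT;
	if (isolated)
		ret = migrate_ret;
	return ret;
}
```

With a range made up only of skipped pfns, model_current() returns zero
while model_patched() returns -ENOENT; once at least one page is
isolated, both return whatever the migration step reports.
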
 mm/memory_hotplug.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a1e8c3e9ab08..db2c02d502a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1620,6 +1620,12 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 	return 0;
 }
 
+/*
+ * migrate pages in the given pfn range.
+ *
+ * Returns the number of {normal folio, large folio, hugetlb} that were not
+ * migrated, or an error code.
+ */
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1685,6 +1691,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		}
 		put_page(page);
 	}
+	ret = -ENOENT;
 	if (!list_empty(&source)) {
 		nodemask_t nmask = node_states[N_MEMORY];
 		struct migration_target_control mtc = {
-- 
2.25.1



* Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
  2023-02-14 22:32 [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success SeongJae Park
@ 2023-02-15 13:16 ` David Hildenbrand
  2023-02-15 18:03   ` SeongJae Park
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2023-02-15 13:16 UTC (permalink / raw)
  To: SeongJae Park, akpm; +Cc: osalvador, linux-mm, linux-kernel

On 14.02.23 23:32, SeongJae Park wrote:
> do_migrate_range() returns migrate_pages() return value, which zero
> means perfect success, in usual cases.  If all pages are failed to be
> isolated, however, it returns isolate_{lru,movable}_page() return
> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned.  So
> do_migrate_range() returning zero means either perfect success, or
> special cases of isolation total failure.
> 
> Actually, the return value is not checked by any caller, so it might be
> better to simply make it a void function.  However, there is a TODO for
> checking the return value.

I'd prefer to not add more dead code ;) Let's not return an error instead.

It's still unclear which kind of fatal migration issues we actually care 
about and how to really detect them.

-- 
Thanks,

David / dhildenb



* Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
  2023-02-15 13:16 ` David Hildenbrand
@ 2023-02-15 18:03   ` SeongJae Park
  2023-02-15 20:00     ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: SeongJae Park @ 2023-02-15 18:03 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: SeongJae Park, akpm, osalvador, linux-mm, linux-kernel

On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 14.02.23 23:32, SeongJae Park wrote:
> > do_migrate_range() returns migrate_pages() return value, which zero
> > means perfect success, in usual cases.  If all pages are failed to be
> > isolated, however, it returns isolate_{lru,movable}_page() return
> > values, or zero if all pfn were invalid, were hugetlb or hwpoisoned.  So
> > do_migrate_range() returning zero means either perfect success, or
> > special cases of isolation total failure.
> > 
> > Actually, the return value is not checked by any caller, so it might be
> > better to simply make it a void function.  However, there is a TODO for
> > checking the return value.
> 
> I'd prefer to not add more dead code ;) Let's not return an error instead.

Makes sense, I will send the next spin soon.

> 
> It's still unclear which kind of fatal migration issues we actually care 
> about and how to really detect them.

What do you think about treating a hit of the isolation/migration rate
limit (migrate_rs) in do_migrate_range() as fatal?  The code already warns
on that event, so it is definitely a bad sign.

If that's not bad enough to be treated as fatal, I think we could add yet
another rate limit whose hit is considered fatal.


Thanks,
SJ



* Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
  2023-02-15 18:03   ` SeongJae Park
@ 2023-02-15 20:00     ` David Hildenbrand
  2023-02-15 22:33       ` SeongJae Park
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2023-02-15 20:00 UTC (permalink / raw)
  To: SeongJae Park; +Cc: akpm, osalvador, linux-mm, linux-kernel

On 15.02.23 19:03, SeongJae Park wrote:
> On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
>> On 14.02.23 23:32, SeongJae Park wrote:
>>> do_migrate_range() returns migrate_pages() return value, which zero
>>> means perfect success, in usual cases.  If all pages are failed to be
>>> isolated, however, it returns isolate_{lru,movable}_page() return
>>> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned.  So
>>> do_migrate_range() returning zero means either perfect success, or
>>> special cases of isolation total failure.
>>>
>>> Actually, the return value is not checked by any caller, so it might be
>>> better to simply make it a void function.  However, there is a TODO for
>>> checking the return value.
>>
>> I'd prefer to not add more dead code ;) Let's not return an error instead.
> 
> Makes sense, I will send next spin soon.
> 
>>
>> It's still unclear which kind of fatal migration issues we actually care
>> about and how to really detect them.
> 
> What do you think about treating the isolation/migration rate limit
> (migrate_rs) hit in do_migrate_range() as fatal?  It warns for the event
> already, so definitely a bad sign.
> 
> If that's not that bad enough to be treated as fatal, I think we could have yet
> another rate limit to be considered fatal.

IIRC, there are some setups where offlining might take several minutes 
(e.g., heavy O_DIRECT load) and that's to be expected.

The existing code warns to aid debugging but keeps trying, so the
ratelimit is there to avoid producing too much debug output rather than
to indicate that something is fatal.
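
A rough user-space sketch of that pattern, only to illustrate the point:
the warning is rate limited, the retrying is not.  struct rl_state,
rl_allow() and retry_until_done() are hypothetical names and a simplified
model, not the kernel's ratelimit implementation.

```c
/* Simplified rate limiter: at most 'burst' messages per 'interval' ticks. */
struct rl_state {
	long interval;
	int burst;
	int printed;
	long begin;
};

/* Returns 1 if the caller may emit a message at time 'now', else 0. */
int rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the message budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/*
 * Retries an operation that fails 'fail_count' times before succeeding.
 * Every failure is retried; only the warnings are rate limited.
 * Returns the number of attempts and stores the warnings emitted.
 */
int retry_until_done(int fail_count, struct rl_state *rs, int *warnings)
{
	int attempts = 0;
	long now = 0;	/* fake clock, one tick per failed attempt */

	*warnings = 0;
	for (;;) {
		attempts++;
		if (attempts > fail_count)
			return attempts;	/* finally succeeded */
		if (rl_allow(rs, now))
			(*warnings)++;		/* stands in for a pr_warn() */
		now++;
	}
}
```

In this model a suppressed warning only means less output; it says
nothing about whether the loop should give up.
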

-- 
Thanks,

David / dhildenb



* Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
  2023-02-15 20:00     ` David Hildenbrand
@ 2023-02-15 22:33       ` SeongJae Park
  0 siblings, 0 replies; 5+ messages in thread
From: SeongJae Park @ 2023-02-15 22:33 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: SeongJae Park, akpm, osalvador, linux-mm, linux-kernel

On Wed, 15 Feb 2023 21:00:50 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 15.02.23 19:03, SeongJae Park wrote:
> > On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:
> > 
> >> On 14.02.23 23:32, SeongJae Park wrote:
> >>> do_migrate_range() returns migrate_pages() return value, which zero
> >>> means perfect success, in usual cases.  If all pages are failed to be
> >>> isolated, however, it returns isolate_{lru,movable}_page() return
> >>> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned.  So
> >>> do_migrate_range() returning zero means either perfect success, or
> >>> special cases of isolation total failure.
> >>>
> >>> Actually, the return value is not checked by any caller, so it might be
> >>> better to simply make it a void function.  However, there is a TODO for
> >>> checking the return value.
> >>
> >> I'd prefer to not add more dead code ;) Let's not return an error instead.
> > 
> > Makes sense, I will send next spin soon.
> > 
> >>
> >> It's still unclear which kind of fatal migration issues we actually care
> >> about and how to really detect them.
> > 
> > What do you think about treating the isolation/migration rate limit
> > (migrate_rs) hit in do_migrate_range() as fatal?  It warns for the event
> > already, so definitely a bad sign.
> > 
> > If that's not that bad enough to be treated as fatal, I think we could have yet
> > another rate limit to be considered fatal.
> 
> IIRC, there are some setups where offlining might take several minutes 
> (e.g., heavy O_DIRECT load) and that's to be expected.
> 
> So the existing code warns for better debugging, but keeps trying. So 
> the ratelimit is rather to not produce too much debug output, not to 
> really indicate that something is fatal.

Thank you for the clarification, David!


Thanks,
SJ


