From: Andrew Morton <akpm@linux-foundation.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: <linux-mm@kvack.org>, Oscar Salvador <OSalvador@suse.com>,
	Baoquan He <bhe@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH 4/5] mm, memory_hotplug: print reason for the offlining failure
Date: Wed, 7 Nov 2018 14:04:13 -0800
Message-ID: <20181107140413.2c0061e440123be76bf419bf@linux-foundation.org>
In-Reply-To: <20181107101830.17405-5-mhocko@kernel.org>

On Wed,  7 Nov 2018 11:18:29 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> The memory offlining failure reporting is inconsistent and insufficient.
> Some error paths simply do not report the failure to the log at all.
> When we do report, there are no details about the reason for the failure,
> and there are several possible reasons, which makes memory offlining
> failures hard to debug.
> 
> Make sure that the
> 	memory offlining [mem %#010llx-%#010llx] failed
> message is printed for all failures and also provide a short textual
> reason for the failure, e.g.
> 
> [ 1984.506184] rac1 kernel: memory offlining [mem 0x82600000000-0x8267fffffff] failed due to signal backoff
> 
> This tells us that the offlining failed because a signal was pending,
> i.e. user intervention.
> 
> ...

Some of these messages will come out looking a bit odd.

> @@ -1573,7 +1576,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  				       MIGRATE_MOVABLE, true);
>  	if (ret) {
>  		mem_hotplug_done();
> -		return ret;
> +		reason = "failed to isolate range";

"memory offlining [mem ...] failed due to failed to isolate range"

> +		goto failed_removal
>  	}
>  
>  	arg.start_pfn = start_pfn;
> @@ -1582,15 +1586,19 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  
>  	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
>  	ret = notifier_to_errno(ret);
> -	if (ret)
> -		goto failed_removal;
> +	if (ret) {
> +		reason = "notifiers failure";

"memory offlining [mem ...] failed due to notifiers failure"

> @@ -1607,8 +1615,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	 * actually in order to make hugetlbfs's object counting consistent.
>  	 */
>  	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -	if (ret)
> -		goto failed_removal;
> +	if (ret) {
> +		reason = "fails to disolve hugetlb pages";

"memory offlining [mem ...] failed due to fails to disolve hugetlb pages"


Fix:

--- a/mm/memory_hotplug.c~mm-memory_hotplug-print-reason-for-the-offlining-failure-fix
+++ a/mm/memory_hotplug.c
@@ -1576,7 +1576,7 @@ static int __ref __offline_pages(unsigne
 				       MIGRATE_MOVABLE, true);
 	if (ret) {
 		mem_hotplug_done();
-		reason = "failed to isolate range";
-		goto failed_removal
+		reason = "failure to isolate range";
+		goto failed_removal;
 	}
 
@@ -1587,7 +1587,7 @@ static int __ref __offline_pages(unsigne
 	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
-		reason = "notifiers failure";
+		reason = "notifier failure";
 		goto failed_removal_isolated;
 	}
 
@@ -1616,7 +1616,7 @@ repeat:
 	 */
 	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
 	if (ret) {
-		reason = "fails to disolve hugetlb pages";
+		reason = "failure to dissolve huge pages";
 		goto failed_removal_isolated;
 	}
 	/* check again */
_
