linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] mm: page_alloc: dump migrate-failed pages
@ 2021-03-11 18:30 Minchan Kim
  2021-03-11 18:53 ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Minchan Kim @ 2021-03-11 18:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, John Dias, Michal Hocko, David Hildenbrand,
	Jason Baron, Minchan Kim

Currently, debugging CMA allocation failures is quite limited.
The most commong source of these failures seems to be page
migration which doesn't provide any useful information on the
reason of the failure by itself. alloc_contig_range can report
those failures as it holds a list of migrate-failed pages.

page refcount, mapcount with page flags on dump_page are
helpful information to deduce the culprit. Furthermore,
dump_page_owner was super helpful to find long term pinner
who initiated the page allocation.

The reason it approach with dynamic debug is the debug message
could emit lots of noises as alloc_contig_range calls more
frequently since it's a best effort allocator.

There are two ifdefery conditions to support common dyndbg options:

- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
It aims for supporting the feature with only specific file
with adding ccflags.

- CONFIG_DYNAMIC_DEBUG
It aims for supporting the feature with system wide globally.

A simple example to enable the feature:

Admin could enable the dump like this(by default, disabled)

	echo "func alloc_contig_dump_pages +p" > control

Admin could disable it.

	echo "func alloc_contig_dump_pages =_" > control

Detail goes Documentation/admin-guide/dynamic-debug-howto.rst

A concern is utility functions in dump_page uses inconsistent
loglevels.

__dump_page: KERN_WARNING
__dump_page_owner: KERN_ALERT
        stack_trace_print: KERN_DEFAULT

There are bunch of places to use the inconsistent loglevel
utility functions(e.g., just grep dump_page/strace_trace_print).
It's unfortunate but here we are. It could be addressed
different patchset.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
* from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@kernel.org
  * add dyndgb usage comment - akpm
  * use dumpstack instead of warn_on - david

* from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@kernel.org/
  * remove ratelimit - mhocko

* from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@kernel.org/
  * use dynamic debugging with system wide instead of per-call site - mhocko

 mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..76fc202cb105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
 				pageblock_nr_pages));
 }
 
+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+/*
+ * usage)
+ * 	dyndbg_dir="/sys/kernel/debug/dynamic_debug"
+ * To enable:
+ * 	echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
+ * To disable:
+ * 	echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
+ * For detail, read dynamic-debug-howto.rst
+ */
+static void alloc_contig_dump_pages(struct list_head *page_list)
+{
+	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+			"migrate failure");
+
+	if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
+		struct page *page;
+
+		dump_stack();
+		list_for_each_entry(page, page_list, lru)
+			dump_page(page, "migration failure");
+	}
+}
+#else
+static inline void alloc_contig_dump_pages(struct list_head *page_list)
+{
+}
+#endif
+
 /* [start, end) must belong to a single zone. */
 static int __alloc_contig_migrate_range(struct compact_control *cc,
 					unsigned long start, unsigned long end)
@@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
 	}
 	if (ret < 0) {
+		alloc_contig_dump_pages(&cc->migratepages);
 		putback_movable_pages(&cc->migratepages);
 		return ret;
 	}
-- 
2.31.0.rc2.261.g7f71774620-goog



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] mm: page_alloc: dump migrate-failed pages
  2021-03-11 18:30 [PATCH v4] mm: page_alloc: dump migrate-failed pages Minchan Kim
@ 2021-03-11 18:53 ` David Hildenbrand
  2021-03-11 19:27   ` Minchan Kim
  0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2021-03-11 18:53 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-mm, LKML, John Dias, Michal Hocko, Jason Baron

On 11.03.21 19:30, Minchan Kim wrote:
> Currently, debugging CMA allocation failures is quite limited.
> The most commong source of these failures seems to be page

s/commong/common/

> migration which doesn't provide any useful information on the
> reason of the failure by itself. alloc_contig_range can report
> those failures as it holds a list of migrate-failed pages.
> 
> page refcount, mapcount with page flags on dump_page are
> helpful information to deduce the culprit. Furthermore,
> dump_page_owner was super helpful to find long term pinner
> who initiated the page allocation.

Maybe simply "The information logged by dump_page() has already proven 
helpful for debugging allocation issues, like identifying long-term 
pinnings on ZONE_MOVABLE or MIGRATE_CMA."

> 
> The reason it approach with dynamic debug is the debug message
> could emit lots of noises as alloc_contig_range calls more
> frequently since it's a best effort allocator.

"Let's use the dynamic debugging infrastructure, such that we avoid
flooding the logs and creating a lot of noise on frequent 
alloc_contig_range() calls. This information is helpful for debugging only."

 >>>

> 
> There are two ifdefery conditions to support common dyndbg options:
> 
> - CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
> It aims for supporting the feature with only specific file
> with adding ccflags.
> 
> - CONFIG_DYNAMIC_DEBUG
> It aims for supporting the feature with system wide globally.
> 
> A simple example to enable the feature:
> 
> Admin could enable the dump like this(by default, disabled)
> 
> 	echo "func alloc_contig_dump_pages +p" > control
> 
> Admin could disable it.
> 
> 	echo "func alloc_contig_dump_pages =_" > control
> 
> Detail goes Documentation/admin-guide/dynamic-debug-howto.rst

<<< I'd drop that completely and only mention:

"For details on dynamic debugging, see 
Documentation/admin-guide/dynamic-debug-howto.rst."

As you have usage in the code itself, I think you don't have to be 
repetitive. The ifdeffery seems to be common (e.g., 
include/linux/netdevice.) for dynamic debugging users, so I don't see 
the need to describe that in detail.

 >>>

> 
> A concern is utility functions in dump_page uses inconsistent
> loglevels.
> 
> __dump_page: KERN_WARNING
> __dump_page_owner: KERN_ALERT
>          stack_trace_print: KERN_DEFAULT
> 
> There are bunch of places to use the inconsistent loglevel
> utility functions(e.g., just grep dump_page/strace_trace_print).
> It's unfortunate but here we are. It could be addressed
> different patchset.

<<< I'd drop that completely and mention

"In the future, we might want to make the loglevels used inside 
dump_page() consistent and eventually rework the way we log the 
information here. See [1]"

Where [1] is a link to the discussion.

> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
> * from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@kernel.org
>    * add dyndgb usage comment - akpm
>    * use dumpstack instead of warn_on - david
> 
> * from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@kernel.org/
>    * remove ratelimit - mhocko
> 
> * from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@kernel.org/
>    * use dynamic debugging with system wide instead of per-call site - mhocko
> 
>   mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
> 

Minor nits:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..76fc202cb105 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
>   				pageblock_nr_pages));
>   }
>   
> +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> +	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> +/*
> + * usage)

"usage)" looks wrong here. Did you mean "Usage:"

> + * 	dyndbg_dir="/sys/kernel/debug/dynamic_debug"
> + * To enable:
> + * 	echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
> + * To disable:
> + * 	echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
> + * For detail, read dynamic-debug-howto.rst

Maybe simply

"See admin-guide/dynamic-debug-howto.rst"

> + */
> +static void alloc_contig_dump_pages(struct list_head *page_list)
> +{
> +	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> +			"migrate failure");

You can fit that into a single line.

> +
> +	if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
> +		struct page *page;
> +
> +		dump_stack();
> +		list_for_each_entry(page, page_list, lru)
> +			dump_page(page, "migration failure");
> +	}
> +}
> +#else
> +static inline void alloc_contig_dump_pages(struct list_head *page_list)
> +{
> +}
> +#endif
> +
>   /* [start, end) must belong to a single zone. */
>   static int __alloc_contig_migrate_range(struct compact_control *cc,
>   					unsigned long start, unsigned long end)
> @@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>   	}
>   	if (ret < 0) {
> +		alloc_contig_dump_pages(&cc->migratepages);
>   		putback_movable_pages(&cc->migratepages);
>   		return ret;
>   	}
> 

As I said, for my taste good enough for now. I would certainly preferred 
what Michal suggested (e.g., doing it via debug loglevels), but this 
gets the job done and is not too ugly.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] mm: page_alloc: dump migrate-failed pages
  2021-03-11 18:53 ` David Hildenbrand
@ 2021-03-11 19:27   ` Minchan Kim
  0 siblings, 0 replies; 3+ messages in thread
From: Minchan Kim @ 2021-03-11 19:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, linux-mm, LKML, John Dias, Michal Hocko, Jason Baron

On Thu, Mar 11, 2021 at 07:53:01PM +0100, David Hildenbrand wrote:
> On 11.03.21 19:30, Minchan Kim wrote:
> > Currently, debugging CMA allocation failures is quite limited.
> > The most commong source of these failures seems to be page
> 
> s/commong/common/

Oops.

> 
> > migration which doesn't provide any useful information on the
> > reason of the failure by itself. alloc_contig_range can report
> > those failures as it holds a list of migrate-failed pages.
> > 
> > page refcount, mapcount with page flags on dump_page are
> > helpful information to deduce the culprit. Furthermore,
> > dump_page_owner was super helpful to find long term pinner
> > who initiated the page allocation.
> 
> Maybe simply "The information logged by dump_page() has already proven
> helpful for debugging allocation issues, like identifying long-term pinnings
> on ZONE_MOVABLE or MIGRATE_CMA."

Will take it.

> 
> > 
> > The reason it approach with dynamic debug is the debug message
> > could emit lots of noises as alloc_contig_range calls more
> > frequently since it's a best effort allocator.
> 
> "Let's use the dynamic debugging infrastructure, such that we avoid
> flooding the logs and creating a lot of noise on frequent
> alloc_contig_range() calls. This information is helpful for debugging only."

Sure.

> 
> >>>
> 
> > 
> > There are two ifdefery conditions to support common dyndbg options:
> > 
> > - CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
> > It aims for supporting the feature with only specific file
> > with adding ccflags.
> > 
> > - CONFIG_DYNAMIC_DEBUG
> > It aims for supporting the feature with system wide globally.
> > 
> > A simple example to enable the feature:
> > 
> > Admin could enable the dump like this(by default, disabled)
> > 
> > 	echo "func alloc_contig_dump_pages +p" > control
> > 
> > Admin could disable it.
> > 
> > 	echo "func alloc_contig_dump_pages =_" > control
> > 
> > Detail goes Documentation/admin-guide/dynamic-debug-howto.rst
> 
> <<< I'd drop that completely and only mention:
> 
> "For details on dynamic debugging, see
> Documentation/admin-guide/dynamic-debug-howto.rst."

I'd like to keep it since many developer are not familiar with
dynamic debug and the syntax was not straightforward to me
when I read the doc. I believe that's why Andrew asked me to
add simple comment on the comment. Including simple how to use
in the description makes reviewer more comfortable to start
the detail doc if they want to read and test.

> 
> As you have usage in the code itself, I think you don't have to be
> repetitive. The ifdeffery seems to be common (e.g.,
> include/linux/netdevice.) for dynamic debugging users, so I don't see the
> need to describe that in detail.
> 
> >>>
> 
> > 
> > A concern is utility functions in dump_page uses inconsistent
> > loglevels.
> > 
> > __dump_page: KERN_WARNING
> > __dump_page_owner: KERN_ALERT
> >          stack_trace_print: KERN_DEFAULT
> > 
> > There are bunch of places to use the inconsistent loglevel
> > utility functions(e.g., just grep dump_page/strace_trace_print).
> > It's unfortunate but here we are. It could be addressed
> > different patchset.
> 
> <<< I'd drop that completely and mention
> 
> "In the future, we might want to make the loglevels used inside dump_page()
> consistent and eventually rework the way we log the information here. See
> [1]"
> 
> Where [1] is a link to the discussion.
> 
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> > * from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@kernel.org
> >    * add dyndgb usage comment - akpm
> >    * use dumpstack instead of warn_on - david
> > 
> > * from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@kernel.org/
> >    * remove ratelimit - mhocko
> > 
> > * from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@kernel.org/
> >    * use dynamic debugging with system wide instead of per-call site - mhocko
> > 
> >   mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
> >   1 file changed, 31 insertions(+)
> > 
> 
> Minor nits:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e4b29ee2b1e..76fc202cb105 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
> >   				pageblock_nr_pages));
> >   }
> > +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> > +	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> > +/*
> > + * usage)
> 
> "usage)" looks wrong here. Did you mean "Usage:"

Sure.

> 
> > + * 	dyndbg_dir="/sys/kernel/debug/dynamic_debug"
> > + * To enable:
> > + * 	echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
> > + * To disable:
> > + * 	echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
> > + * For detail, read dynamic-debug-howto.rst
> 
> Maybe simply
> 
> "See admin-guide/dynamic-debug-howto.rst"

Yes, for code comment, it would be enough to point out where the detail
is. 

> 
> > + */
> > +static void alloc_contig_dump_pages(struct list_head *page_list)
> > +{
> > +	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> > +			"migrate failure");
> 
> You can fit that into a single line.

Sure.

> 
> > +
> > +	if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
> > +		struct page *page;
> > +
> > +		dump_stack();
> > +		list_for_each_entry(page, page_list, lru)
> > +			dump_page(page, "migration failure");
> > +	}
> > +}
> > +#else
> > +static inline void alloc_contig_dump_pages(struct list_head *page_list)
> > +{
> > +}
> > +#endif
> > +
> >   /* [start, end) must belong to a single zone. */
> >   static int __alloc_contig_migrate_range(struct compact_control *cc,
> >   					unsigned long start, unsigned long end)
> > @@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> >   				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> >   	}
> >   	if (ret < 0) {
> > +		alloc_contig_dump_pages(&cc->migratepages);
> >   		putback_movable_pages(&cc->migratepages);
> >   		return ret;
> >   	}
> > 
> 
> As I said, for my taste good enough for now. I would certainly preferred
> what Michal suggested (e.g., doing it via debug loglevels), but this gets

Sure, that is definitely improvement. Someday, I will post the clean up.

> the job done and is not too ugly.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-11 19:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 18:30 [PATCH v4] mm: page_alloc: dump migrate-failed pages Minchan Kim
2021-03-11 18:53 ` David Hildenbrand
2021-03-11 19:27   ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).