linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] debug_pagealloc improvements through page_owner
@ 2019-08-16 10:13 Vlastimil Babka
  2019-08-16 10:13 ` [PATCH 1/3] mm, page_owner: record page owner for each subpage Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-16 10:13 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

The debug_pagealloc functionality serves a similar purpose on the page
allocator level that slub_debug does on the kmalloc level, which is to detect
bad users. One notable feature that slub_debug has is storing stack traces of
who last allocated and freed the object. On page level we track allocations via
page_owner, but that info is discarded when freeing, and we don't track freeing
at all. This series improves those aspects. With both debug_pagealloc and
page_owner enabled, we can then get bug reports such as the example in Patch 3.

SLUB debug tracking additionaly stores cpu, pid and timestamp. This could be
added later, if deemed useful enough to justify the additional page_ext
structure size.

Vlastimil Babka (3):
  mm, page_owner: record page owner for each subpage
  mm, page_owner: keep owner info when freeing the page
  mm, page_owner, debug_pagealloc: save and dump freeing stack trace

 .../admin-guide/kernel-parameters.txt         |   2 +
 include/linux/page_ext.h                      |   1 +
 mm/Kconfig.debug                              |   4 +-
 mm/page_owner.c                               | 123 +++++++++++++-----
 4 files changed, 96 insertions(+), 34 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-16 10:13 [PATCH 0/3] debug_pagealloc improvements through page_owner Vlastimil Babka
@ 2019-08-16 10:13 ` Vlastimil Babka
  2019-08-16 14:04   ` Kirill A. Shutemov
  2019-08-16 10:14 ` [PATCH 2/3] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
  2019-08-16 10:14 ` [PATCH 3/3] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
  2 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-16 10:13 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

Currently, page owner info is only recorded for the first page of a high-order
allocation, and copied to tail pages in the event of a split page. With the
plan to keep previous owner info after freeing the page, it would be benefical
to record page owner for each subpage upon allocation. This increases the
overhead for high orders, but that should be acceptable for a debugging option.

The order stored for each subpage is the order of the whole allocation. This
makes it possible to calculate the "head" pfn and to recognize "tail" pages
(quoted because not all high-order allocations are compound pages with true
head and tail pages). When reading the page_owner debugfs file, keep skipping
the "tail" pages so that stats gathered by existing scripts don't get inflated.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index addcbb2ae4e4..813fcb70547b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
-	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
+static inline void __set_page_owner_handle(struct page *page,
+	struct page_ext *page_ext, depot_stack_handle_t handle,
+	unsigned int order, gfp_t gfp_mask)
 {
 	struct page_owner *page_owner;
+	int i;
 
-	page_owner = get_page_owner(page_ext);
-	page_owner->handle = handle;
-	page_owner->order = order;
-	page_owner->gfp_mask = gfp_mask;
-	page_owner->last_migrate_reason = -1;
+	for (i = 0; i < (1 << order); i++) {
+		page_owner = get_page_owner(page_ext);
+		page_owner->handle = handle;
+		page_owner->order = order;
+		page_owner->gfp_mask = gfp_mask;
+		page_owner->last_migrate_reason = -1;
+		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 
-	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		page_ext = lookup_page_ext(page + i);
+	}
 }
 
 noinline void __set_page_owner(struct page *page, unsigned int order,
@@ -178,7 +183,7 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
 		return;
 
 	handle = save_stack(gfp_mask);
-	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
+	__set_page_owner_handle(page, page_ext, handle, order, gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -204,8 +209,11 @@ void __split_page_owner(struct page *page, unsigned int order)
 
 	page_owner = get_page_owner(page_ext);
 	page_owner->order = 0;
-	for (i = 1; i < (1 << order); i++)
-		__copy_page_owner(page, page + i);
+	for (i = 1; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		page_owner = get_page_owner(page_ext);
+		page_owner->order = 0;
+	}
 }
 
 void __copy_page_owner(struct page *oldpage, struct page *newpage)
@@ -483,6 +491,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 
 		page_owner = get_page_owner(page_ext);
 
+		/*
+		 * Don't print "tail" pages of high-order allocations as that
+		 * would inflate the stats.
+		 */
+		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
+			continue;
+
 		/*
 		 * Access to page_ext->handle isn't synchronous so we should
 		 * be careful to access it.
@@ -562,7 +577,8 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;
 
 			/* Found early allocated page */
-			__set_page_owner_handle(page_ext, early_handle, 0, 0);
+			__set_page_owner_handle(page, page_ext, early_handle,
+						0, 0);
 			count++;
 		}
 		cond_resched();
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] mm, page_owner: keep owner info when freeing the page
  2019-08-16 10:13 [PATCH 0/3] debug_pagealloc improvements through page_owner Vlastimil Babka
  2019-08-16 10:13 ` [PATCH 1/3] mm, page_owner: record page owner for each subpage Vlastimil Babka
@ 2019-08-16 10:14 ` Vlastimil Babka
  2019-08-16 10:14 ` [PATCH 3/3] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
  2 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-16 10:14 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

For debugging purposes it might be useful to keep the owner info even after
page has been freed, and include it in e.g. dump_page() when detecting a bad
page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner
info has been set at least once" and add new PAGE_EXT_OWNER_ACTIVE for tracking
whether page is supposed to be currently tracked allocated or free. Adjust
dump_page() accordingly, distinguishing free and allocated pages. In the
page_owner debugfs file, keep printing only allocated pages so that existing
scripts are not confused, and also because free pages are irrelevant for the
memory statistics or leak detection that's the typical use case of the file,
anyway.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page_ext.h |  1 +
 mm/page_owner.c          | 34 ++++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 09592951725c..682fd465df06 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -18,6 +18,7 @@ struct page_ext_operations {
 
 enum page_ext_flags {
 	PAGE_EXT_OWNER,
+	PAGE_EXT_OWNER_ACTIVE,
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 813fcb70547b..4a48e018dbdf 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -111,7 +111,7 @@ void __reset_page_owner(struct page *page, unsigned int order)
 		page_ext = lookup_page_ext(page + i);
 		if (unlikely(!page_ext))
 			continue;
-		__clear_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 	}
 }
 
@@ -168,6 +168,7 @@ static inline void __set_page_owner_handle(struct page *page,
 		page_owner->gfp_mask = gfp_mask;
 		page_owner->last_migrate_reason = -1;
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__set_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 
 		page_ext = lookup_page_ext(page + i);
 	}
@@ -243,6 +244,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage)
 	 * the new page, which will be freed.
 	 */
 	__set_bit(PAGE_EXT_OWNER, &new_ext->flags);
+	__set_bit(PAGE_EXT_OWNER_ACTIVE, &new_ext->flags);
 }
 
 void pagetypeinfo_showmixedcount_print(struct seq_file *m,
@@ -302,7 +304,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 			if (unlikely(!page_ext))
 				continue;
 
-			if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
+			if (!test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
 				continue;
 
 			page_owner = get_page_owner(page_ext);
@@ -413,21 +415,26 @@ void __dump_page_owner(struct page *page)
 	mt = gfpflags_to_migratetype(gfp_mask);
 
 	if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
-		pr_alert("page_owner info is not active (free page?)\n");
+		pr_alert("page_owner info is not present (never set?)\n");
 		return;
 	}
 
+	if (test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
+		pr_alert("page_owner tracks the page as allocated\n");
+	else
+		pr_alert("page_owner tracks the page as freed\n");
+
+	pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
+		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
+
 	handle = READ_ONCE(page_owner->handle);
 	if (!handle) {
-		pr_alert("page_owner info is not active (free page?)\n");
-		return;
+		pr_alert("page_owner allocation stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		stack_trace_print(entries, nr_entries, 0);
 	}
 
-	nr_entries = stack_depot_fetch(handle, &entries);
-	pr_alert("page allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
-		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
-	stack_trace_print(entries, nr_entries, 0);
-
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -489,6 +496,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
 			continue;
 
+		/*
+		 * Although we do have the info about past allocation of free
+		 * pages, it's not relevant for current memory usage.
+		 */
+		if (!test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
+			continue;
+
 		page_owner = get_page_owner(page_ext);
 
 		/*
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] mm, page_owner, debug_pagealloc: save and dump freeing stack trace
  2019-08-16 10:13 [PATCH 0/3] debug_pagealloc improvements through page_owner Vlastimil Babka
  2019-08-16 10:13 ` [PATCH 1/3] mm, page_owner: record page owner for each subpage Vlastimil Babka
  2019-08-16 10:14 ` [PATCH 2/3] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
@ 2019-08-16 10:14 ` Vlastimil Babka
  2 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-16 10:14 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

The debug_pagealloc functionality is useful to catch buggy page allocator users
that cause e.g. use after free or double free. When page inconsistency is
detected, debugging is often simpler by knowing the call stack of process that
last allocated and freed the page. When page_owner is also enabled, we record
the allocation stack trace, but not freeing.

This patch therefore adds recording of freeing process stack trace to page
owner info, if both page_owner and debug_pagealloc are configured and enabled.
With only page_owner enabled, this info is not useful for the memory leak
debugging use case. dump_page() is adjusted to print the info. An example
result of calling __free_pages() twice may look like this (note the page last free
stack trace):

BUG: Bad page state in process bash  pfn:13d8f8
page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1affff800000000()
raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
 prep_new_page+0x143/0x150
 get_page_from_freelist+0x289/0x380
 __alloc_pages_nodemask+0x13c/0x2d0
 khugepaged+0x6e/0xc10
 kthread+0xf9/0x130
 ret_from_fork+0x3a/0x50
page last free stack trace:
 free_pcp_prepare+0x134/0x1e0
 free_unref_page+0x18/0x90
 khugepaged+0x7b/0xc10
 kthread+0xf9/0x130
 ret_from_fork+0x3a/0x50
Modules linked in:
CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xc0
 bad_page.cold+0xba/0xbf
 rmqueue_pcplist.isra.0+0x6c5/0x6d0
 rmqueue+0x2d/0x810
 get_page_from_freelist+0x191/0x380
 __alloc_pages_nodemask+0x13c/0x2d0
 __get_free_pages+0xd/0x30
 __pud_alloc+0x2c/0x110
 copy_page_range+0x4f9/0x630
 dup_mmap+0x362/0x480
 dup_mm+0x68/0x110
 copy_process+0x19e1/0x1b40
 _do_fork+0x73/0x310
 __x64_sys_clone+0x75/0x80
 do_syscall_64+0x6e/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f10af854a10
...

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 .../admin-guide/kernel-parameters.txt         |  2 +
 mm/Kconfig.debug                              |  4 +-
 mm/page_owner.c                               | 53 ++++++++++++++-----
 3 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..e813a17d622e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -809,6 +809,8 @@
 			enables the feature at boot time. By default, it is
 			disabled and the system will work mostly the same as a
 			kernel built without CONFIG_DEBUG_PAGEALLOC.
+			Note: to get most of debug_pagealloc error reports, it's
+			useful to also enable the page_owner functionality.
 			on: enable the feature
 
 	debugpat	[X86] Enable PAT debugging
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 82b6a20898bd..327b3ebf23bf 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -21,7 +21,9 @@ config DEBUG_PAGEALLOC
 	  Also, the state of page tracking structures is checked more often as
 	  pages are being allocated and freed, as unexpected state changes
 	  often happen for same reasons as memory corruption (e.g. double free,
-	  use-after-free).
+	  use-after-free). The error reports for these checks can be augmented
+	  with stack traces of last allocation and freeing of the page, when
+	  PAGE_OWNER is also selected and enabled on boot.
 
 	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
 	  fill the pages with poison patterns after free_pages() and verify
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 4a48e018dbdf..dee931184788 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -24,6 +24,9 @@ struct page_owner {
 	short last_migrate_reason;
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t free_handle;
+#endif
 };
 
 static bool page_owner_disabled = true;
@@ -102,19 +105,6 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-void __reset_page_owner(struct page *page, unsigned int order)
-{
-	int i;
-	struct page_ext *page_ext;
-
-	for (i = 0; i < (1 << order); i++) {
-		page_ext = lookup_page_ext(page + i);
-		if (unlikely(!page_ext))
-			continue;
-		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
-	}
-}
-
 static inline bool check_recursive_alloc(unsigned long *entries,
 					 unsigned int nr_entries,
 					 unsigned long ip)
@@ -154,6 +144,32 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
+void __reset_page_owner(struct page *page, unsigned int order)
+{
+	int i;
+	struct page_ext *page_ext;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t handle = 0;
+	struct page_owner *page_owner;
+
+	if (debug_pagealloc_enabled())
+		handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+#endif
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		if (unlikely(!page_ext))
+			continue;
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
+#ifdef CONFIG_DEBUG_PAGEALLOC
+		if (debug_pagealloc_enabled()) {
+			page_owner = get_page_owner(page_ext);
+			page_owner->free_handle = handle;
+		}
+#endif
+	}
+}
+
 static inline void __set_page_owner_handle(struct page *page,
 	struct page_ext *page_ext, depot_stack_handle_t handle,
 	unsigned int order, gfp_t gfp_mask)
@@ -435,6 +451,17 @@ void __dump_page_owner(struct page *page)
 		stack_trace_print(entries, nr_entries, 0);
 	}
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	handle = READ_ONCE(page_owner->free_handle);
+	if (!handle) {
+		pr_alert("page_owner free stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		pr_alert("page last free stack trace:\n");
+		stack_trace_print(entries, nr_entries, 0);
+	}
+#endif
+
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-16 10:13 ` [PATCH 1/3] mm, page_owner: record page owner for each subpage Vlastimil Babka
@ 2019-08-16 14:04   ` Kirill A. Shutemov
  2019-08-19 11:46     ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-08-16 14:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
> Currently, page owner info is only recorded for the first page of a high-order
> allocation, and copied to tail pages in the event of a split page. With the
> plan to keep previous owner info after freeing the page, it would be benefical
> to record page owner for each subpage upon allocation. This increases the
> overhead for high orders, but that should be acceptable for a debugging option.
> 
> The order stored for each subpage is the order of the whole allocation. This
> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> (quoted because not all high-order allocations are compound pages with true
> head and tail pages). When reading the page_owner debugfs file, keep skipping
> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Hm. That's all reasonable, but I have a question: do you see how page
owner thing works for THP now?

I don't see anything in split_huge_page() path (do not confuse it with
split_page() path) that would copy the information to tail pages. Do you?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-16 14:04   ` Kirill A. Shutemov
@ 2019-08-19 11:46     ` Vlastimil Babka
  2019-08-19 11:55       ` Kirill A. Shutemov
  0 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-19 11:46 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox


On 8/16/19 4:04 PM, Kirill A. Shutemov wrote:
> On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
>> Currently, page owner info is only recorded for the first page of a high-order
>> allocation, and copied to tail pages in the event of a split page. With the
>> plan to keep previous owner info after freeing the page, it would be benefical
>> to record page owner for each subpage upon allocation. This increases the
>> overhead for high orders, but that should be acceptable for a debugging option.
>>
>> The order stored for each subpage is the order of the whole allocation. This
>> makes it possible to calculate the "head" pfn and to recognize "tail" pages
>> (quoted because not all high-order allocations are compound pages with true
>> head and tail pages). When reading the page_owner debugfs file, keep skipping
>> the "tail" pages so that stats gathered by existing scripts don't get inflated.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Hm. That's all reasonable, but I have a question: do you see how page
> owner thing works for THP now?
> 
> I don't see anything in split_huge_page() path (do not confuse it with
> split_page() path) that would copy the information to tail pages. Do you?
 
You're right, it's missing. This patch fixes that and can be added e.g.
at the end of the series.

----8<----
From 56ac1b41559eecf52a2d453c49ce66dbbb227c64 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 19 Aug 2019 13:38:29 +0200
Subject: [PATCH] mm, page_owner: handle THP splits correctly

THP splitting path is missing the split_page_owner() call that split_page()
has. As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages. Furthermore when the former head page is freed, the remaining
former tail pages are not listed in the page_owner file at all. This patch
fixes that by adding the split_page_owner() call into __split_huge_page().

Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 738065f765ab..d727a0401484 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -32,6 +32,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/page_owner.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	remap_page(head);
 
+	split_page_owner(head, HPAGE_PMD_ORDER);
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-19 11:46     ` Vlastimil Babka
@ 2019-08-19 11:55       ` Kirill A. Shutemov
  2019-08-19 11:57         ` Kirill A. Shutemov
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-08-19 11:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, linux-mm, Andrew Morton, linux-kernel,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Mon, Aug 19, 2019 at 11:46:37AM +0000, Vlastimil Babka wrote:
> 
> On 8/16/19 4:04 PM, Kirill A. Shutemov wrote:
> > On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
> >> Currently, page owner info is only recorded for the first page of a high-order
> >> allocation, and copied to tail pages in the event of a split page. With the
> >> plan to keep previous owner info after freeing the page, it would be benefical
> >> to record page owner for each subpage upon allocation. This increases the
> >> overhead for high orders, but that should be acceptable for a debugging option.
> >>
> >> The order stored for each subpage is the order of the whole allocation. This
> >> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> >> (quoted because not all high-order allocations are compound pages with true
> >> head and tail pages). When reading the page_owner debugfs file, keep skipping
> >> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Hm. That's all reasonable, but I have a question: do you see how page
> > owner thing works for THP now?
> > 
> > I don't see anything in split_huge_page() path (do not confuse it with
> > split_page() path) that would copy the information to tail pages. Do you?
>  
> You're right, it's missing. This patch fixes that and can be added e.g.
> at the end of the series.

I would rather put it the first. Possbily with stable@.

> ----8<----
> From 56ac1b41559eecf52a2d453c49ce66dbbb227c64 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Mon, 19 Aug 2019 13:38:29 +0200
> Subject: [PATCH] mm, page_owner: handle THP splits correctly
> 
> THP splitting path is missing the split_page_owner() call that split_page()
> has. As a result, split THP pages are wrongly reported in the page_owner file
> as order-9 pages. Furthermore when the former head page is freed, the remaining
> former tail pages are not listed in the page_owner file at all. This patch
> fixes that by adding the split_page_owner() call into __split_huge_page().
> 
> Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/huge_memory.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 738065f765ab..d727a0401484 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -32,6 +32,7 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/oom.h>
>  #include <linux/numa.h>
> +#include <linux/page_owner.h>
>  
>  #include <asm/tlb.h>
>  #include <asm/pgalloc.h>
> @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  
>  	remap_page(head);
>  
> +	split_page_owner(head, HPAGE_PMD_ORDER);
> +

I think it has to be before remap_page(). This way nobody would be able to
mess with the page until it has valid page_owner.

>  	for (i = 0; i < HPAGE_PMD_NR; i++) {
>  		struct page *subpage = head + i;
>  		if (subpage == page)
> -- 
> 2.22.0
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-19 11:55       ` Kirill A. Shutemov
@ 2019-08-19 11:57         ` Kirill A. Shutemov
  2019-08-19 13:11           ` Vlastimil Babka
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-08-19 11:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, linux-mm, Andrew Morton, linux-kernel,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Mon, Aug 19, 2019 at 11:55:51AM +0000, Kirill A. Shutemov wrote:
> > @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >  
> >  	remap_page(head);
> >  
> > +	split_page_owner(head, HPAGE_PMD_ORDER);
> > +
> 
> I think it has to be before remap_page(). This way nobody would be able to
> mess with the page until it has valid page_owner.

Or rather next to ClearPageCompound().

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] mm, page_owner: record page owner for each subpage
  2019-08-19 11:57         ` Kirill A. Shutemov
@ 2019-08-19 13:11           ` Vlastimil Babka
  0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2019-08-19 13:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, linux-mm, Andrew Morton, linux-kernel,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On 8/19/19 1:57 PM, Kirill A. Shutemov wrote:
> On Mon, Aug 19, 2019 at 11:55:51AM +0000, Kirill A. Shutemov wrote:
>>> @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>  
>>>  	remap_page(head);
>>>  
>>> +	split_page_owner(head, HPAGE_PMD_ORDER);
>>> +
>>
>> I think it has to be before remap_page(). This way nobody would be able to
>> mess with the page until it has valid page_owner.
> 
> Or rather next to ClearPageCompound().

OK, will wait for more feedback and repost with that patch first, and as
you suggest.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-08-19 13:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16 10:13 [PATCH 0/3] debug_pagealloc improvements through page_owner Vlastimil Babka
2019-08-16 10:13 ` [PATCH 1/3] mm, page_owner: record page owner for each subpage Vlastimil Babka
2019-08-16 14:04   ` Kirill A. Shutemov
2019-08-19 11:46     ` Vlastimil Babka
2019-08-19 11:55       ` Kirill A. Shutemov
2019-08-19 11:57         ` Kirill A. Shutemov
2019-08-19 13:11           ` Vlastimil Babka
2019-08-16 10:14 ` [PATCH 2/3] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
2019-08-16 10:14 ` [PATCH 3/3] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).