All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] debug_pagealloc improvements through page_owner
@ 2019-08-20 13:18 Vlastimil Babka
  2019-08-20 13:18 ` [PATCH v2 1/4] mm, page_owner: handle THP splits correctly Vlastimil Babka
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Vlastimil Babka @ 2019-08-20 13:18 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

v2: also fix THP split handling (added Patch 1) per Kirill

The debug_pagealloc functionality serves a similar purpose on the page
allocator level that slub_debug does on the kmalloc level, which is to detect
bad users. One notable feature that slub_debug has is storing stack traces of
who last allocated and freed the object. On page level we track allocations via
page_owner, but that info is discarded when freeing, and we don't track freeing
at all. This series improves those aspects. With both debug_pagealloc and
page_owner enabled, we can then get bug reports such as the example in Patch 4.

SLUB debug tracking additionaly stores cpu, pid and timestamp. This could be
added later, if deemed useful enough to justify the additional page_ext
structure size.

Vlastimil Babka (4):
  mm, page_owner: handle THP splits correctly
  mm, page_owner: record page owner for each subpage
  mm, page_owner: keep owner info when freeing the page
  mm, page_owner, debug_pagealloc: save and dump freeing stack trace

 .../admin-guide/kernel-parameters.txt         |   2 +
 include/linux/page_ext.h                      |   1 +
 mm/Kconfig.debug                              |   4 +-
 mm/huge_memory.c                              |   4 +
 mm/page_owner.c                               | 123 +++++++++++++-----
 5 files changed, 100 insertions(+), 34 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] mm, page_owner: handle THP splits correctly
  2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
@ 2019-08-20 13:18 ` Vlastimil Babka
  2019-08-20 15:16   ` Sasha Levin
  2019-08-20 13:18 ` [PATCH v2 2/4] mm, page_owner: record page owner for each subpage Vlastimil Babka
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-08-20 13:18 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka, Kirill A . Shutemov, stable

THP splitting path is missing the split_page_owner() call that split_page()
has. As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages. Furthermore when the former head page is freed, the remaining
former tail pages are not listed in the page_owner file at all. This patch
fixes that by adding the split_page_owner() call into __split_huge_page().

Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 738065f765ab..de1f15969e27 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -32,6 +32,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/page_owner.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -2516,6 +2517,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	}
 
 	ClearPageCompound(head);
+
+	split_page_owner(head, HPAGE_PMD_ORDER);
+
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
 		/* Additional pin to swap cache */
-- 
2.22.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] mm, page_owner: record page owner for each subpage
  2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
  2019-08-20 13:18 ` [PATCH v2 1/4] mm, page_owner: handle THP splits correctly Vlastimil Babka
@ 2019-08-20 13:18 ` Vlastimil Babka
  2019-09-24 11:31   ` Kirill A. Shutemov
  2019-08-20 13:18 ` [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-08-20 13:18 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

Currently, page owner info is only recorded for the first page of a high-order
allocation, and copied to tail pages in the event of a split page. With the
plan to keep previous owner info after freeing the page, it would be benefical
to record page owner for each subpage upon allocation. This increases the
overhead for high orders, but that should be acceptable for a debugging option.

The order stored for each subpage is the order of the whole allocation. This
makes it possible to calculate the "head" pfn and to recognize "tail" pages
(quoted because not all high-order allocations are compound pages with true
head and tail pages). When reading the page_owner debugfs file, keep skipping
the "tail" pages so that stats gathered by existing scripts don't get inflated.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index addcbb2ae4e4..813fcb70547b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
-	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
+static inline void __set_page_owner_handle(struct page *page,
+	struct page_ext *page_ext, depot_stack_handle_t handle,
+	unsigned int order, gfp_t gfp_mask)
 {
 	struct page_owner *page_owner;
+	int i;
 
-	page_owner = get_page_owner(page_ext);
-	page_owner->handle = handle;
-	page_owner->order = order;
-	page_owner->gfp_mask = gfp_mask;
-	page_owner->last_migrate_reason = -1;
+	for (i = 0; i < (1 << order); i++) {
+		page_owner = get_page_owner(page_ext);
+		page_owner->handle = handle;
+		page_owner->order = order;
+		page_owner->gfp_mask = gfp_mask;
+		page_owner->last_migrate_reason = -1;
+		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 
-	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		page_ext = lookup_page_ext(page + i);
+	}
 }
 
 noinline void __set_page_owner(struct page *page, unsigned int order,
@@ -178,7 +183,7 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
 		return;
 
 	handle = save_stack(gfp_mask);
-	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
+	__set_page_owner_handle(page, page_ext, handle, order, gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -204,8 +209,11 @@ void __split_page_owner(struct page *page, unsigned int order)
 
 	page_owner = get_page_owner(page_ext);
 	page_owner->order = 0;
-	for (i = 1; i < (1 << order); i++)
-		__copy_page_owner(page, page + i);
+	for (i = 1; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		page_owner = get_page_owner(page_ext);
+		page_owner->order = 0;
+	}
 }
 
 void __copy_page_owner(struct page *oldpage, struct page *newpage)
@@ -483,6 +491,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 
 		page_owner = get_page_owner(page_ext);
 
+		/*
+		 * Don't print "tail" pages of high-order allocations as that
+		 * would inflate the stats.
+		 */
+		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
+			continue;
+
 		/*
 		 * Access to page_ext->handle isn't synchronous so we should
 		 * be careful to access it.
@@ -562,7 +577,8 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;
 
 			/* Found early allocated page */
-			__set_page_owner_handle(page_ext, early_handle, 0, 0);
+			__set_page_owner_handle(page, page_ext, early_handle,
+						0, 0);
 			count++;
 		}
 		cond_resched();
-- 
2.22.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page
  2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
  2019-08-20 13:18 ` [PATCH v2 1/4] mm, page_owner: handle THP splits correctly Vlastimil Babka
  2019-08-20 13:18 ` [PATCH v2 2/4] mm, page_owner: record page owner for each subpage Vlastimil Babka
@ 2019-08-20 13:18 ` Vlastimil Babka
  2019-09-24 11:35   ` Kirill A. Shutemov
  2019-08-20 13:18 ` [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
  2019-08-22 23:03 ` [PATCH v2 0/4] debug_pagealloc improvements through page_owner Andrew Morton
  4 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-08-20 13:18 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

For debugging purposes it might be useful to keep the owner info even after
page has been freed, and include it in e.g. dump_page() when detecting a bad
page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner
info has been set at least once" and add new PAGE_EXT_OWNER_ACTIVE for tracking
whether page is supposed to be currently tracked allocated or free. Adjust
dump_page() accordingly, distinguishing free and allocated pages. In the
page_owner debugfs file, keep printing only allocated pages so that existing
scripts are not confused, and also because free pages are irrelevant for the
memory statistics or leak detection that's the typical use case of the file,
anyway.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page_ext.h |  1 +
 mm/page_owner.c          | 34 ++++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 09592951725c..682fd465df06 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -18,6 +18,7 @@ struct page_ext_operations {
 
 enum page_ext_flags {
 	PAGE_EXT_OWNER,
+	PAGE_EXT_OWNER_ACTIVE,
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 813fcb70547b..4a48e018dbdf 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -111,7 +111,7 @@ void __reset_page_owner(struct page *page, unsigned int order)
 		page_ext = lookup_page_ext(page + i);
 		if (unlikely(!page_ext))
 			continue;
-		__clear_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 	}
 }
 
@@ -168,6 +168,7 @@ static inline void __set_page_owner_handle(struct page *page,
 		page_owner->gfp_mask = gfp_mask;
 		page_owner->last_migrate_reason = -1;
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		__set_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
 
 		page_ext = lookup_page_ext(page + i);
 	}
@@ -243,6 +244,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage)
 	 * the new page, which will be freed.
 	 */
 	__set_bit(PAGE_EXT_OWNER, &new_ext->flags);
+	__set_bit(PAGE_EXT_OWNER_ACTIVE, &new_ext->flags);
 }
 
 void pagetypeinfo_showmixedcount_print(struct seq_file *m,
@@ -302,7 +304,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 			if (unlikely(!page_ext))
 				continue;
 
-			if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
+			if (!test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
 				continue;
 
 			page_owner = get_page_owner(page_ext);
@@ -413,21 +415,26 @@ void __dump_page_owner(struct page *page)
 	mt = gfpflags_to_migratetype(gfp_mask);
 
 	if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) {
-		pr_alert("page_owner info is not active (free page?)\n");
+		pr_alert("page_owner info is not present (never set?)\n");
 		return;
 	}
 
+	if (test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
+		pr_alert("page_owner tracks the page as allocated\n");
+	else
+		pr_alert("page_owner tracks the page as freed\n");
+
+	pr_alert("page last allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
+		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
+
 	handle = READ_ONCE(page_owner->handle);
 	if (!handle) {
-		pr_alert("page_owner info is not active (free page?)\n");
-		return;
+		pr_alert("page_owner allocation stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		stack_trace_print(entries, nr_entries, 0);
 	}
 
-	nr_entries = stack_depot_fetch(handle, &entries);
-	pr_alert("page allocated via order %u, migratetype %s, gfp_mask %#x(%pGg)\n",
-		 page_owner->order, migratetype_names[mt], gfp_mask, &gfp_mask);
-	stack_trace_print(entries, nr_entries, 0);
-
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -489,6 +496,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags))
 			continue;
 
+		/*
+		 * Although we do have the info about past allocation of free
+		 * pages, it's not relevant for current memory usage.
+		 */
+		if (!test_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags))
+			continue;
+
 		page_owner = get_page_owner(page_ext);
 
 		/*
-- 
2.22.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace
  2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
                   ` (2 preceding siblings ...)
  2019-08-20 13:18 ` [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
@ 2019-08-20 13:18 ` Vlastimil Babka
  2019-09-24 11:42   ` Kirill A. Shutemov
  2019-08-22 23:03 ` [PATCH v2 0/4] debug_pagealloc improvements through page_owner Andrew Morton
  4 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-08-20 13:18 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: linux-kernel, Kirill A. Shutemov, Michal Hocko, Mel Gorman,
	Matthew Wilcox, Vlastimil Babka

The debug_pagealloc functionality is useful to catch buggy page allocator users
that cause e.g. use after free or double free. When page inconsistency is
detected, debugging is often simpler by knowing the call stack of process that
last allocated and freed the page. When page_owner is also enabled, we record
the allocation stack trace, but not freeing.

This patch therefore adds recording of freeing process stack trace to page
owner info, if both page_owner and debug_pagealloc are configured and enabled.
With only page_owner enabled, this info is not useful for the memory leak
debugging use case. dump_page() is adjusted to print the info. An example
result of calling __free_pages() twice may look like this (note the page last free
stack trace):

BUG: Bad page state in process bash  pfn:13d8f8
page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1affff800000000()
raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
 prep_new_page+0x143/0x150
 get_page_from_freelist+0x289/0x380
 __alloc_pages_nodemask+0x13c/0x2d0
 khugepaged+0x6e/0xc10
 kthread+0xf9/0x130
 ret_from_fork+0x3a/0x50
page last free stack trace:
 free_pcp_prepare+0x134/0x1e0
 free_unref_page+0x18/0x90
 khugepaged+0x7b/0xc10
 kthread+0xf9/0x130
 ret_from_fork+0x3a/0x50
Modules linked in:
CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xc0
 bad_page.cold+0xba/0xbf
 rmqueue_pcplist.isra.0+0x6c5/0x6d0
 rmqueue+0x2d/0x810
 get_page_from_freelist+0x191/0x380
 __alloc_pages_nodemask+0x13c/0x2d0
 __get_free_pages+0xd/0x30
 __pud_alloc+0x2c/0x110
 copy_page_range+0x4f9/0x630
 dup_mmap+0x362/0x480
 dup_mm+0x68/0x110
 copy_process+0x19e1/0x1b40
 _do_fork+0x73/0x310
 __x64_sys_clone+0x75/0x80
 do_syscall_64+0x6e/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f10af854a10
...

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 .../admin-guide/kernel-parameters.txt         |  2 +
 mm/Kconfig.debug                              |  4 +-
 mm/page_owner.c                               | 53 ++++++++++++++-----
 3 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..e813a17d622e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -809,6 +809,8 @@
 			enables the feature at boot time. By default, it is
 			disabled and the system will work mostly the same as a
 			kernel built without CONFIG_DEBUG_PAGEALLOC.
+			Note: to get most of debug_pagealloc error reports, it's
+			useful to also enable the page_owner functionality.
 			on: enable the feature
 
 	debugpat	[X86] Enable PAT debugging
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 82b6a20898bd..327b3ebf23bf 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -21,7 +21,9 @@ config DEBUG_PAGEALLOC
 	  Also, the state of page tracking structures is checked more often as
 	  pages are being allocated and freed, as unexpected state changes
 	  often happen for same reasons as memory corruption (e.g. double free,
-	  use-after-free).
+	  use-after-free). The error reports for these checks can be augmented
+	  with stack traces of last allocation and freeing of the page, when
+	  PAGE_OWNER is also selected and enabled on boot.
 
 	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
 	  fill the pages with poison patterns after free_pages() and verify
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 4a48e018dbdf..dee931184788 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -24,6 +24,9 @@ struct page_owner {
 	short last_migrate_reason;
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t free_handle;
+#endif
 };
 
 static bool page_owner_disabled = true;
@@ -102,19 +105,6 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-void __reset_page_owner(struct page *page, unsigned int order)
-{
-	int i;
-	struct page_ext *page_ext;
-
-	for (i = 0; i < (1 << order); i++) {
-		page_ext = lookup_page_ext(page + i);
-		if (unlikely(!page_ext))
-			continue;
-		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
-	}
-}
-
 static inline bool check_recursive_alloc(unsigned long *entries,
 					 unsigned int nr_entries,
 					 unsigned long ip)
@@ -154,6 +144,32 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
+void __reset_page_owner(struct page *page, unsigned int order)
+{
+	int i;
+	struct page_ext *page_ext;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	depot_stack_handle_t handle = 0;
+	struct page_owner *page_owner;
+
+	if (debug_pagealloc_enabled())
+		handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+#endif
+
+	for (i = 0; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		if (unlikely(!page_ext))
+			continue;
+		__clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
+#ifdef CONFIG_DEBUG_PAGEALLOC
+		if (debug_pagealloc_enabled()) {
+			page_owner = get_page_owner(page_ext);
+			page_owner->free_handle = handle;
+		}
+#endif
+	}
+}
+
 static inline void __set_page_owner_handle(struct page *page,
 	struct page_ext *page_ext, depot_stack_handle_t handle,
 	unsigned int order, gfp_t gfp_mask)
@@ -435,6 +451,17 @@ void __dump_page_owner(struct page *page)
 		stack_trace_print(entries, nr_entries, 0);
 	}
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	handle = READ_ONCE(page_owner->free_handle);
+	if (!handle) {
+		pr_alert("page_owner free stack trace missing\n");
+	} else {
+		nr_entries = stack_depot_fetch(handle, &entries);
+		pr_alert("page last free stack trace:\n");
+		stack_trace_print(entries, nr_entries, 0);
+	}
+#endif
+
 	if (page_owner->last_migrate_reason != -1)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
-- 
2.22.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] mm, page_owner: handle THP splits correctly
  2019-08-20 13:18 ` [PATCH v2 1/4] mm, page_owner: handle THP splits correctly Vlastimil Babka
@ 2019-08-20 15:16   ` Sasha Levin
  0 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2019-08-20 15:16 UTC (permalink / raw)
  To: Sasha Levin, Vlastimil Babka, linux-mm; +Cc: linux-kernel, , stable, stable

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: a9627bc5e34e mm/page_owner: introduce split_page_owner and replace manual handling.

The bot has tested the following trees: v5.2.9, v4.19.67, v4.14.139, v4.9.189.

v5.2.9: Build OK!
v4.19.67: Failed to apply! Possible dependencies:
    426dcd4b600f ("hexagon: switch to NO_BOOTMEM")
    46515fdb1adf ("ixgbe: move common Rx functions to ixgbe_txrx_common.h")
    57c8a661d95d ("mm: remove include/linux/bootmem.h")
    6471f52af786 ("alpha: switch to NO_BOOTMEM")
    98fa15f34cb3 ("mm: replace all open encodings for NUMA_NO_NODE")
    d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support")
    e0a9317d9004 ("hexagon: use generic dma_noncoherent_ops")
    f406f222d4b2 ("hexagon: implement the sync_sg_for_device DMA operation")

v4.14.139: Failed to apply! Possible dependencies:
    01417c6cc7dc ("powerpc/64: Change soft_enabled from flag to bitmask")
    0b63acf4a0eb ("powerpc/64: Move set_soft_enabled() and rename")
    1696d0fb7fcd ("powerpc/64: Set DSCR default initially from SPR")
    1af19331a3a1 ("powerpc/64s: Relax PACA address limitations")
    4890aea65ae7 ("powerpc/64: Allocate pacas per node")
    4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask")
    8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
    98fa15f34cb3 ("mm: replace all open encodings for NUMA_NO_NODE")
    9f83e00f4cc1 ("powerpc/64: Improve inline asm in arch_local_irq_disable")
    b5c1bd62c054 ("powerpc/64: Fix arch_local_irq_disable() prototype")
    c2e480ba8227 ("powerpc/64: Add #defines for paca->soft_enabled flags")
    ff967900c9d4 ("powerpc/64: Fix latency tracing for lazy irq replay")

v4.9.189: Failed to apply! Possible dependencies:
    010426079ec1 ("sched/headers: Prepare for new header dependencies before moving more code to <linux/sched/mm.h>")
    2077be6783b5 ("arm64: Use __pa_symbol for kernel symbols")
    39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
    3f07c0144132 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>")
    68db0cf10678 ("sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>")
    7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
    869dcfd10dfe ("arm64: Add cast for virt_to_pfn")
    9164bb4a18df ("sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>")
    98fa15f34cb3 ("mm: replace all open encodings for NUMA_NO_NODE")
    9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call")
    bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/4] debug_pagealloc improvements through page_owner
  2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
                   ` (3 preceding siblings ...)
  2019-08-20 13:18 ` [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
@ 2019-08-22 23:03 ` Andrew Morton
  2019-09-20 23:34   ` Andrew Morton
  4 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2019-08-22 23:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Kirill A. Shutemov, Michal Hocko,
	Mel Gorman, Matthew Wilcox

On Tue, 20 Aug 2019 15:18:24 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> v2: also fix THP split handling (added Patch 1) per Kirill
> 
> The debug_pagealloc functionality serves a similar purpose on the page
> allocator level that slub_debug does on the kmalloc level, which is to detect
> bad users. One notable feature that slub_debug has is storing stack traces of
> who last allocated and freed the object. On page level we track allocations via
> page_owner, but that info is discarded when freeing, and we don't track freeing
> at all. This series improves those aspects. With both debug_pagealloc and
> page_owner enabled, we can then get bug reports such as the example in Patch 4.
> 
> SLUB debug tracking additionaly stores cpu, pid and timestamp. This could be
> added later, if deemed useful enough to justify the additional page_ext
> structure size.

Thanks.  I split [1/1] out of the series as a bugfix and turned this
into a three-patch series.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/4] debug_pagealloc improvements through page_owner
  2019-08-22 23:03 ` [PATCH v2 0/4] debug_pagealloc improvements through page_owner Andrew Morton
@ 2019-09-20 23:34   ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2019-09-20 23:34 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Thu, 22 Aug 2019 16:03:44 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 20 Aug 2019 15:18:24 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> > v2: also fix THP split handling (added Patch 1) per Kirill
> > 
> > The debug_pagealloc functionality serves a similar purpose on the page
> > allocator level that slub_debug does on the kmalloc level, which is to detect
> > bad users. One notable feature that slub_debug has is storing stack traces of
> > who last allocated and freed the object. On page level we track allocations via
> > page_owner, but that info is discarded when freeing, and we don't track freeing
> > at all. This series improves those aspects. With both debug_pagealloc and
> > page_owner enabled, we can then get bug reports such as the example in Patch 4.
> > 
> > SLUB debug tracking additionaly stores cpu, pid and timestamp. This could be
> > added later, if deemed useful enough to justify the additional page_ext
> > structure size.
> 
> Thanks.  I split [1/1] out of the series as a bugfix and turned this
> into a three-patch series.
> 

None of which anyone has yet reviewed :(



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] mm, page_owner: record page owner for each subpage
  2019-08-20 13:18 ` [PATCH v2 2/4] mm, page_owner: record page owner for each subpage Vlastimil Babka
@ 2019-09-24 11:31   ` Kirill A. Shutemov
  2019-09-24 15:10     ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2019-09-24 11:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Tue, Aug 20, 2019 at 03:18:26PM +0200, Vlastimil Babka wrote:
> Currently, page owner info is only recorded for the first page of a high-order
> allocation, and copied to tail pages in the event of a split page. With the
> plan to keep previous owner info after freeing the page, it would be benefical
> to record page owner for each subpage upon allocation. This increases the
> overhead for high orders, but that should be acceptable for a debugging option.
> 
> The order stored for each subpage is the order of the whole allocation. This
> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> (quoted because not all high-order allocations are compound pages with true
> head and tail pages). When reading the page_owner debugfs file, keep skipping
> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index addcbb2ae4e4..813fcb70547b 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>  	return handle;
>  }
>  
> -static inline void __set_page_owner_handle(struct page_ext *page_ext,
> -	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
> +static inline void __set_page_owner_handle(struct page *page,
> +	struct page_ext *page_ext, depot_stack_handle_t handle,
> +	unsigned int order, gfp_t gfp_mask)
>  {
>  	struct page_owner *page_owner;
> +	int i;
>  
> -	page_owner = get_page_owner(page_ext);
> -	page_owner->handle = handle;
> -	page_owner->order = order;
> -	page_owner->gfp_mask = gfp_mask;
> -	page_owner->last_migrate_reason = -1;
> +	for (i = 0; i < (1 << order); i++) {
> +		page_owner = get_page_owner(page_ext);
> +		page_owner->handle = handle;
> +		page_owner->order = order;
> +		page_owner->gfp_mask = gfp_mask;
> +		page_owner->last_migrate_reason = -1;
> +		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>  
> -	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> +		page_ext = lookup_page_ext(page + i);

Isn't it off-by-one? You are calculating page_ext for the next page,
right? And cant we just do page_ext++ here instead?

> +	}
>  }
>  
>  noinline void __set_page_owner(struct page *page, unsigned int order,
> @@ -178,7 +183,7 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
>  		return;
>  
>  	handle = save_stack(gfp_mask);
> -	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
> +	__set_page_owner_handle(page, page_ext, handle, order, gfp_mask);
>  }
>  
>  void __set_page_owner_migrate_reason(struct page *page, int reason)
> @@ -204,8 +209,11 @@ void __split_page_owner(struct page *page, unsigned int order)
>  
>  	page_owner = get_page_owner(page_ext);
>  	page_owner->order = 0;
> -	for (i = 1; i < (1 << order); i++)
> -		__copy_page_owner(page, page + i);
> +	for (i = 1; i < (1 << order); i++) {
> +		page_ext = lookup_page_ext(page + i);

Again, page_ext++?

> +		page_owner = get_page_owner(page_ext);
> +		page_owner->order = 0;
> +	}
>  }
>  
>  void __copy_page_owner(struct page *oldpage, struct page *newpage)
> @@ -483,6 +491,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
>  
>  		page_owner = get_page_owner(page_ext);
>  
> +		/*
> +		 * Don't print "tail" pages of high-order allocations as that
> +		 * would inflate the stats.
> +		 */
> +		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
> +			continue;
> +
>  		/*
>  		 * Access to page_ext->handle isn't synchronous so we should
>  		 * be careful to access it.
> @@ -562,7 +577,8 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  				continue;
>  
>  			/* Found early allocated page */
> -			__set_page_owner_handle(page_ext, early_handle, 0, 0);
> +			__set_page_owner_handle(page, page_ext, early_handle,
> +						0, 0);
>  			count++;
>  		}
>  		cond_resched();
> -- 
> 2.22.0
> 
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page
  2019-08-20 13:18 ` [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
@ 2019-09-24 11:35   ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2019-09-24 11:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Tue, Aug 20, 2019 at 03:18:27PM +0200, Vlastimil Babka wrote:
> For debugging purposes it might be useful to keep the owner info even after
> page has been freed, and include it in e.g. dump_page() when detecting a bad
> page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner
> info has been set at least once" and add new PAGE_EXT_OWNER_ACTIVE for tracking
> whether page is supposed to be currently tracked allocated or free.

Why not PAGE_EXT_OWNER_ALLOCED if it means "allocated"? Active is somewhat
loaded term for a page.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace
  2019-08-20 13:18 ` [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
@ 2019-09-24 11:42   ` Kirill A. Shutemov
  2019-09-24 15:15     ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2019-09-24 11:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Tue, Aug 20, 2019 at 03:18:28PM +0200, Vlastimil Babka wrote:
> The debug_pagealloc functionality is useful to catch buggy page allocator users
> that cause e.g. use after free or double free. When page inconsistency is
> detected, debugging is often simpler by knowing the call stack of process that
> last allocated and freed the page. When page_owner is also enabled, we record
> the allocation stack trace, but not freeing.
> 
> This patch therefore adds recording of freeing process stack trace to page
> owner info, if both page_owner and debug_pagealloc are configured and enabled.
> With only page_owner enabled, this info is not useful for the memory leak
> debugging use case. dump_page() is adjusted to print the info. An example
> result of calling __free_pages() twice may look like this (note the page last free
> stack trace):
> 
> BUG: Bad page state in process bash  pfn:13d8f8
> page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
> flags: 0x1affff800000000()
> raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
> raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> page_owner tracks the page as freed
> page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
>  prep_new_page+0x143/0x150
>  get_page_from_freelist+0x289/0x380
>  __alloc_pages_nodemask+0x13c/0x2d0
>  khugepaged+0x6e/0xc10
>  kthread+0xf9/0x130
>  ret_from_fork+0x3a/0x50
> page last free stack trace:
>  free_pcp_prepare+0x134/0x1e0
>  free_unref_page+0x18/0x90
>  khugepaged+0x7b/0xc10
>  kthread+0xf9/0x130
>  ret_from_fork+0x3a/0x50
> Modules linked in:
> CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  dump_stack+0x85/0xc0
>  bad_page.cold+0xba/0xbf
>  rmqueue_pcplist.isra.0+0x6c5/0x6d0
>  rmqueue+0x2d/0x810
>  get_page_from_freelist+0x191/0x380
>  __alloc_pages_nodemask+0x13c/0x2d0
>  __get_free_pages+0xd/0x30
>  __pud_alloc+0x2c/0x110
>  copy_page_range+0x4f9/0x630
>  dup_mmap+0x362/0x480
>  dup_mm+0x68/0x110
>  copy_process+0x19e1/0x1b40
>  _do_fork+0x73/0x310
>  __x64_sys_clone+0x75/0x80
>  do_syscall_64+0x6e/0x1e0
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f10af854a10
> ...
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  .../admin-guide/kernel-parameters.txt         |  2 +
>  mm/Kconfig.debug                              |  4 +-
>  mm/page_owner.c                               | 53 ++++++++++++++-----
>  3 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 47d981a86e2f..e813a17d622e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -809,6 +809,8 @@
>  			enables the feature at boot time. By default, it is
>  			disabled and the system will work mostly the same as a
>  			kernel built without CONFIG_DEBUG_PAGEALLOC.
> +			Note: to get most of debug_pagealloc error reports, it's
> +			useful to also enable the page_owner functionality.
>  			on: enable the feature
>  
>  	debugpat	[X86] Enable PAT debugging
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 82b6a20898bd..327b3ebf23bf 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -21,7 +21,9 @@ config DEBUG_PAGEALLOC
>  	  Also, the state of page tracking structures is checked more often as
>  	  pages are being allocated and freed, as unexpected state changes
>  	  often happen for same reasons as memory corruption (e.g. double free,
> -	  use-after-free).
> +	  use-after-free). The error reports for these checks can be augmented
> +	  with stack traces of last allocation and freeing of the page, when
> +	  PAGE_OWNER is also selected and enabled on boot.
>  
>  	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
>  	  fill the pages with poison patterns after free_pages() and verify
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 4a48e018dbdf..dee931184788 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -24,6 +24,9 @@ struct page_owner {
>  	short last_migrate_reason;
>  	gfp_t gfp_mask;
>  	depot_stack_handle_t handle;
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +	depot_stack_handle_t free_handle;
> +#endif

I think it's possible to add space for the second stack handle at runtime:
adjust page_owner_ops->size inside the ->need(). The second stack might be
useful beyond CONFIG_DEBUG_PAGEALLOC. We probably should not tie these
features.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] mm, page_owner: record page owner for each subpage
  2019-09-24 11:31   ` Kirill A. Shutemov
@ 2019-09-24 15:10     ` Vlastimil Babka
  2019-09-24 15:16       ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-09-24 15:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On 9/24/19 1:31 PM, Kirill A. Shutemov wrote:
> On Tue, Aug 20, 2019 at 03:18:26PM +0200, Vlastimil Babka wrote:
>> Currently, page owner info is only recorded for the first page of a high-order
>> allocation, and copied to tail pages in the event of a split page. With the
>> plan to keep previous owner info after freeing the page, it would be benefical
>> to record page owner for each subpage upon allocation. This increases the
>> overhead for high orders, but that should be acceptable for a debugging option.
>>
>> The order stored for each subpage is the order of the whole allocation. This
>> makes it possible to calculate the "head" pfn and to recognize "tail" pages
>> (quoted because not all high-order allocations are compound pages with true
>> head and tail pages). When reading the page_owner debugfs file, keep skipping
>> the "tail" pages so that stats gathered by existing scripts don't get inflated.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
>>  1 file changed, 28 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index addcbb2ae4e4..813fcb70547b 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>>  	return handle;
>>  }
>>  
>> -static inline void __set_page_owner_handle(struct page_ext *page_ext,
>> -	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
>> +static inline void __set_page_owner_handle(struct page *page,
>> +	struct page_ext *page_ext, depot_stack_handle_t handle,
>> +	unsigned int order, gfp_t gfp_mask)
>>  {
>>  	struct page_owner *page_owner;
>> +	int i;
>>  
>> -	page_owner = get_page_owner(page_ext);
>> -	page_owner->handle = handle;
>> -	page_owner->order = order;
>> -	page_owner->gfp_mask = gfp_mask;
>> -	page_owner->last_migrate_reason = -1;
>> +	for (i = 0; i < (1 << order); i++) {
>> +		page_owner = get_page_owner(page_ext);
>> +		page_owner->handle = handle;
>> +		page_owner->order = order;
>> +		page_owner->gfp_mask = gfp_mask;
>> +		page_owner->last_migrate_reason = -1;
>> +		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>>  
>> -	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>> +		page_ext = lookup_page_ext(page + i);
> 
> Isn't it off-by-one? You are calculating page_ext for the next page,
> right?

You're right, thanks!

> And cant we just do page_ext++ here instead?

Unfortunately no, as that implies sizeof(page_ext), which only declares
unsigned long flags; and the rest is runtime-determined.
Perhaps I could add a wrapper named e.g. page_ext_next() that would use
get_entry_size() internally and hide the necessary casts to void * and back?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace
  2019-09-24 11:42   ` Kirill A. Shutemov
@ 2019-09-24 15:15     ` Vlastimil Babka
  2019-09-24 15:24       ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2019-09-24 15:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On 9/24/19 1:42 PM, Kirill A. Shutemov wrote:
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -24,6 +24,9 @@ struct page_owner {
>>  	short last_migrate_reason;
>>  	gfp_t gfp_mask;
>>  	depot_stack_handle_t handle;
>> +#ifdef CONFIG_DEBUG_PAGEALLOC
>> +	depot_stack_handle_t free_handle;
>> +#endif
> 
> I think it's possible to add space for the second stack handle at runtime:
> adjust page_owner_ops->size inside the ->need().

Uh that would complicate the code needlessly? The extra memory overhead
isn't that much for a scenario that's already a debugging one
(page_owner), I was more concerned about the cpu overhead, thus the
static key.

> The second stack might be
> useful beyond CONFIG_DEBUG_PAGEALLOC. We probably should not tie these
> features.

Yeah I generalized it later, as KASAN guys wanted the same thing:

https://lore.kernel.org/linux-arm-kernel/d98bf550-367d-0744-025a-52307248ec82@suse.cz/

Perhaps as it's still mmotm I can squash it here.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] mm, page_owner: record page owner for each subpage
  2019-09-24 15:10     ` Vlastimil Babka
@ 2019-09-24 15:16       ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2019-09-24 15:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Tue, Sep 24, 2019 at 05:10:59PM +0200, Vlastimil Babka wrote:
> On 9/24/19 1:31 PM, Kirill A. Shutemov wrote:
> > On Tue, Aug 20, 2019 at 03:18:26PM +0200, Vlastimil Babka wrote:
> >> Currently, page owner info is only recorded for the first page of a high-order
> >> allocation, and copied to tail pages in the event of a split page. With the
> >> plan to keep previous owner info after freeing the page, it would be benefical
> >> to record page owner for each subpage upon allocation. This increases the
> >> overhead for high orders, but that should be acceptable for a debugging option.
> >>
> >> The order stored for each subpage is the order of the whole allocation. This
> >> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> >> (quoted because not all high-order allocations are compound pages with true
> >> head and tail pages). When reading the page_owner debugfs file, keep skipping
> >> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >>  mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
> >>  1 file changed, 28 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/mm/page_owner.c b/mm/page_owner.c
> >> index addcbb2ae4e4..813fcb70547b 100644
> >> --- a/mm/page_owner.c
> >> +++ b/mm/page_owner.c
> >> @@ -154,18 +154,23 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
> >>  	return handle;
> >>  }
> >>  
> >> -static inline void __set_page_owner_handle(struct page_ext *page_ext,
> >> -	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
> >> +static inline void __set_page_owner_handle(struct page *page,
> >> +	struct page_ext *page_ext, depot_stack_handle_t handle,
> >> +	unsigned int order, gfp_t gfp_mask)
> >>  {
> >>  	struct page_owner *page_owner;
> >> +	int i;
> >>  
> >> -	page_owner = get_page_owner(page_ext);
> >> -	page_owner->handle = handle;
> >> -	page_owner->order = order;
> >> -	page_owner->gfp_mask = gfp_mask;
> >> -	page_owner->last_migrate_reason = -1;
> >> +	for (i = 0; i < (1 << order); i++) {
> >> +		page_owner = get_page_owner(page_ext);
> >> +		page_owner->handle = handle;
> >> +		page_owner->order = order;
> >> +		page_owner->gfp_mask = gfp_mask;
> >> +		page_owner->last_migrate_reason = -1;
> >> +		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> >>  
> >> -	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> >> +		page_ext = lookup_page_ext(page + i);
> > 
> > Isn't it off-by-one? You are calculating page_ext for the next page,
> > right?
> 
> You're right, thanks!
> 
> > And cant we just do page_ext++ here instead?
> 
> Unfortunately no, as that implies sizeof(page_ext), which only declares
> unsigned long flags; and the rest is runtime-determined.
> Perhaps I could add a wrapper named e.g. page_ext_next() that would use
> get_entry_size() internally and hide the necessary casts to void * and back?

Yeah, it looks less costly than calling lookup_page_ext() on each
iteration. And looks like it can be inlined if we make 'extra_mem' visible
(under different name) outside page_ext.c.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace
  2019-09-24 15:15     ` Vlastimil Babka
@ 2019-09-24 15:24       ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2019-09-24 15:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov,
	Michal Hocko, Mel Gorman, Matthew Wilcox

On Tue, Sep 24, 2019 at 05:15:01PM +0200, Vlastimil Babka wrote:
> On 9/24/19 1:42 PM, Kirill A. Shutemov wrote:
> >> --- a/mm/page_owner.c
> >> +++ b/mm/page_owner.c
> >> @@ -24,6 +24,9 @@ struct page_owner {
> >>  	short last_migrate_reason;
> >>  	gfp_t gfp_mask;
> >>  	depot_stack_handle_t handle;
> >> +#ifdef CONFIG_DEBUG_PAGEALLOC
> >> +	depot_stack_handle_t free_handle;
> >> +#endif
> > 
> > I think it's possible to add space for the second stack handle at runtime:
> > adjust page_owner_ops->size inside the ->need().
> 
> Uh that would complicate the code needlessly? The extra memory overhead
> isn't that much for a scenario that's already a debugging one
> (page_owner), I was more concerned about the cpu overhead, thus the
> static key.

Okay, fair enough.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-09-24 15:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 13:18 [PATCH v2 0/4] debug_pagealloc improvements through page_owner Vlastimil Babka
2019-08-20 13:18 ` [PATCH v2 1/4] mm, page_owner: handle THP splits correctly Vlastimil Babka
2019-08-20 15:16   ` Sasha Levin
2019-08-20 13:18 ` [PATCH v2 2/4] mm, page_owner: record page owner for each subpage Vlastimil Babka
2019-09-24 11:31   ` Kirill A. Shutemov
2019-09-24 15:10     ` Vlastimil Babka
2019-09-24 15:16       ` Kirill A. Shutemov
2019-08-20 13:18 ` [PATCH v2 3/4] mm, page_owner: keep owner info when freeing the page Vlastimil Babka
2019-09-24 11:35   ` Kirill A. Shutemov
2019-08-20 13:18 ` [PATCH v2 4/4] mm, page_owner, debug_pagealloc: save and dump freeing stack trace Vlastimil Babka
2019-09-24 11:42   ` Kirill A. Shutemov
2019-09-24 15:15     ` Vlastimil Babka
2019-09-24 15:24       ` Kirill A. Shutemov
2019-08-22 23:03 ` [PATCH v2 0/4] debug_pagealloc improvements through page_owner Andrew Morton
2019-09-20 23:34   ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.