linux-kernel.vger.kernel.org archive mirror
* [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page
@ 2021-08-19  5:41 Yang Shi
  2021-08-19  5:41 ` [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage Yang Shi
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Yang Shi @ 2021-08-19  5:41 UTC (permalink / raw)
  To: naoya.horiguchi, osalvador, tdmackey, david, willy, akpm, corbet
  Cc: shy828301, linux-mm, linux-kernel

In the current implementation of soft offline, if a non-LRU page is met,
all the slab caches are dropped in an attempt to free the page before
offlining it.  But if the page is not a slab page, all the effort is
wasted.  Even if it is a slab page, there is no guarantee that the page
can be freed at all.

Moreover, the side effects and cost are quite high.  It not only drops
the slab caches, but may also drop a significant amount of page cache
associated with the inode caches.  It could throw away most of the
working set just to offline a single page.  And the offline is not
guaranteed to succeed at all; I really doubt the success rate for
real-life workloads.

Even worse, the system may lock up and become unusable, since
releasing the page cache may queue a huge amount of work for memcg
release.

We actually ran into such an unpleasant case in our production
environment.  First, the workqueue running memory_failure_work_func
locked up, as shown below:

BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 53s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=14/256 refcnt=15
    in-flight: 409271:memory_failure_work_func
    pending: kfree_rcu_work, kfree_rcu_monitor, kfree_rcu_work, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, drain_local_stock, kfree_rcu_work
workqueue mm_percpu_wq: flags=0x8
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_update
workqueue cgroup_destroy: flags=0x0
  pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 refcnt=12072
    pending: css_release_work_fn

There were over 12K css_release_work_fn items queued, and this caused a
few lockups due to contention on the worker pool lock with IRQs
disabled, for example:

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
Modules linked in: amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_DSCP iptable_mangle kvm_amd bpfilter vfat fat acpi_ipmi i2c_piix4 usb_storage ipmi_si k10temp i2c_core ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel xfs libcrc32c crc32c_intel mlx5_core mlxfw nvme xhci_pci ptp nvme_core pps_core xhci_hcd
CPU: 1 PID: 205500 Comm: kworker/1:0 Tainted: G             L    5.10.32-t1.el7.twitter.x86_64 #1
Hardware name: TYAN F5AMT /z        /S8026GM2NRE-CGN, BIOS V8.030 03/30/2021
Workqueue: events memory_failure_work_func
RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
Code: 41 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f6 c4 01 75 04 c6 47
RSP: 0018:ffff9b2ac278f900 EFLAGS: 00000002
RAX: 0000000000480101 RBX: ffff8ce98ce71800 RCX: 0000000000000084
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8ce98ce6a140
RBP: 00000000000284c8 R08: ffffd7248dcb6808 R09: 0000000000000000
R10: 0000000000000003 R11: ffff9b2ac278f9b0 R12: 0000000000000001
R13: ffff8cb44dab9c00 R14: ffffffffbd1ce6a0 R15: ffff8cacaa37f068
FS:  0000000000000000(0000) GS:ffff8ce98ce40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcf6e8cb000 CR3: 0000000a0c60a000 CR4: 0000000000350ee0
Call Trace:
 __queue_work+0xd6/0x3c0
 queue_work_on+0x1c/0x30
 uncharge_batch+0x10e/0x110
 mem_cgroup_uncharge_list+0x6d/0x80
 release_pages+0x37f/0x3f0
 __pagevec_release+0x1c/0x50
 __invalidate_mapping_pages+0x348/0x380
 ? xfs_alloc_buftarg+0xa4/0x120 [xfs]
 inode_lru_isolate+0x10a/0x160
 ? iput+0x1d0/0x1d0
 __list_lru_walk_one+0x7b/0x170
 ? iput+0x1d0/0x1d0
 list_lru_walk_one+0x4a/0x60
 prune_icache_sb+0x37/0x50
 super_cache_scan+0x123/0x1a0
 do_shrink_slab+0x10c/0x2c0
 shrink_slab+0x1f1/0x290
 drop_slab_node+0x4d/0x70
 soft_offline_page+0x1ac/0x5b0
 ? dev_mce_log+0xee/0x110
 ? notifier_call_chain+0x39/0x90
 memory_failure_work_func+0x6a/0x90
 process_one_work+0x19e/0x340
 ? process_one_work+0x340/0x340
 worker_thread+0x30/0x360
 ? process_one_work+0x340/0x340
 kthread+0x116/0x130

The lockup made the machine quite unusable.  It also threw away most of
the working set: the reclaimable slab caches were reduced from 12G
to 300MB, and the page cache shrank from 17G to 4G.

But the most disappointing thing is that all this effort did not offline
the page; it just returned:

soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()

It seems the aggressive behavior for non-LRU pages did not pay off, so
it does not make much sense to keep it, considering the terrible side
effects.
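
To make the change in behavior concrete, here is a minimal user-space
sketch of shake_page() before and after this patch.  It is a toy model,
not the kernel code: the toy_* names, the struct fields and the drop
counter are all made up for illustration, and the drain step only
approximates the part of shake_page() that the diff below does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a NUMA node; only counts node-wide cache drops. */
struct toy_node {
	int slab_drops;		/* how often the "drop_slab_node" stand-in ran */
};

/* Toy stand-in for struct page (hypothetical fields). */
struct toy_page {
	bool is_huge;		/* models PageHuge() */
	bool on_lru;		/* models PageLRU() */
	bool in_pagevec;	/* still sitting in a per-CPU LRU batch */
};

/* Models lru_add_drain_all(): batched pages move onto the LRU. */
static void toy_lru_drain(struct toy_page *p)
{
	if (p->in_pagevec) {
		p->in_pagevec = false;
		p->on_lru = true;
	}
}

/* Before the patch: fall back to a node-wide drop when access != 0. */
static void toy_shake_page_old(struct toy_page *p, struct toy_node *n,
			       int access)
{
	if (p->is_huge)
		return;

	toy_lru_drain(p);
	if (p->on_lru)
		return;

	/* Expensive, and useless unless the page is a freeable slab page. */
	if (access)
		n->slab_drops++;
}

/* After the patch: only the cheap drain is attempted. */
static void toy_shake_page_new(struct toy_page *p, struct toy_node *n)
{
	(void)n;		/* the node's caches are never touched */

	if (p->is_huge)
		return;

	toy_lru_drain(p);
}
```

With a page that never reaches the LRU, the old variant called with
access=1 records one node-wide drop while the new variant records none.
A page that was merely waiting in a per-CPU batch ends up on the LRU in
both variants without any drop.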

Reported-by: David Mackey <tdmackey@twitter.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
v2: * Rebased on top of https://lore.kernel.org/linux-mm/CAHbLzkpAEZRTmnOnjVHYHGJ7ApjdC8eDh53DAnTHsG185QGOfQ@mail.gmail.com/T/#t (Naoya Horiguchi)
    * Added comment about possible future optimization when handling
      slab page (David Hildenbrand)
    * Added patch #3 to call dump_page (Matthew Wilcox)

 include/linux/mm.h   |  2 +-
 mm/hwpoison-inject.c |  2 +-
 mm/memory-failure.c  | 18 ++++++++----------
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7e43d1b01e0a..a3cc83d64564 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3203,7 +3203,7 @@ extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
-extern void shake_page(struct page *p, int access);
+extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 1ae1ebc2b9b1..aff4d27ec235 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -30,7 +30,7 @@ static int hwpoison_inject(void *data, u64 val)
 	if (!hwpoison_filter_enable)
 		goto inject;
 
-	shake_page(hpage, 0);
+	shake_page(hpage);
 	/*
 	 * This implies unable to support non-LRU pages.
 	 */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 665316c7ea40..7cfa134b1370 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -282,9 +282,9 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
 
 /*
  * Unknown page type encountered. Try to check whether it can turn PageLRU by
- * lru_add_drain_all, or a free page by reclaiming slabs when possible.
+ * lru_add_drain_all.
  */
-void shake_page(struct page *p, int access)
+void shake_page(struct page *p)
 {
 	if (PageHuge(p))
 		return;
@@ -296,11 +296,9 @@ void shake_page(struct page *p, int access)
 	}
 
 	/*
-	 * Only call drop_slab_node here (which would also shrink
-	 * other caches) if access is not potentially fatal.
+	 * TODO: Could shrink slab caches here if a lightweight range-based
+	 * shrinker will be available.
 	 */
-	if (access)
-		drop_slab_node(page_to_nid(p));
 }
 EXPORT_SYMBOL_GPL(shake_page);
 
@@ -1205,7 +1203,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 			 * page, retry.
 			 */
 			if (pass++ < 3) {
-				shake_page(p, 1);
+				shake_page(p);
 				goto try_again;
 			}
 			ret = -EIO;
@@ -1222,7 +1220,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		 */
 		if (pass++ < 3) {
 			put_page(p);
-			shake_page(p, 1);
+			shake_page(p);
 			count_increased = false;
 			goto try_again;
 		}
@@ -1369,7 +1367,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * shake_page() again to ensure that it's flushed.
 	 */
 	if (mlocked)
-		shake_page(hpage, 0);
+		shake_page(hpage);
 
 	/*
 	 * Now that the dirty bit has been propagated to the
@@ -1723,7 +1721,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
-	shake_page(p, 0);
+	shake_page(p);
 
 	lock_page(p);
 
-- 
2.26.2



* [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage
  2021-08-19  5:41 [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page Yang Shi
@ 2021-08-19  5:41 ` Yang Shi
  2021-08-20  9:34   ` David Hildenbrand
  2021-08-19  5:41 ` [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page Yang Shi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Yang Shi @ 2021-08-19  5:41 UTC (permalink / raw)
  To: naoya.horiguchi, osalvador, tdmackey, david, willy, akpm, corbet
  Cc: shy828301, linux-mm, linux-kernel

The hwpoison support for huge pages, both hugetlb and THP, has been in
the kernel for a while, so the statement in the documentation is
obsolete.  Remove it.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
v2: * Collected ack from Naoya Horiguchi

 Documentation/vm/hwpoison.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index a5c884293dac..89b5f7a52077 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -180,7 +180,6 @@ Limitations
 ===========
 - Not all page types are supported and never will. Most kernel internal
   objects cannot be recovered, only LRU pages for now.
-- Right now hugepage support is missing.
 
 ---
 Andi Kleen, Oct 2009
-- 
2.26.2



* [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-19  5:41 [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page Yang Shi
  2021-08-19  5:41 ` [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage Yang Shi
@ 2021-08-19  5:41 ` Yang Shi
  2021-08-20  6:48   ` HORIGUCHI NAOYA(堀口 直也)
  2021-09-22 19:37   ` Luck, Tony
  2021-08-20  7:04 ` [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page HORIGUCHI NAOYA(堀口 直也)
  2021-08-20  7:08 ` David Hildenbrand
  3 siblings, 2 replies; 14+ messages in thread
From: Yang Shi @ 2021-08-19  5:41 UTC (permalink / raw)
  To: naoya.horiguchi, osalvador, tdmackey, david, willy, akpm, corbet
  Cc: shy828301, linux-mm, linux-kernel

Currently only a very simple message is shown for an unhandlable page,
e.g. a non-LRU page, like:
soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()

It is not very helpful for further debugging; calling dump_page() could
show more useful information.

Call dump_page() in get_any_page() in order to avoid duplicating the
call in a couple of different places.  It may be called with pcp
disabled and with the memory hotplug lock held, but that should not be a
big deal since the hwpoison handler is not called very often.
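
The control flow can be sketched in user space as follows.  This is
only an illustration under assumptions: the toy_* helpers, the struct
fields and the success value 1 are hypothetical, and only the three-pass
retry, the -EIO result and the single dump at the out label mirror the
diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy page: becomes handlable after a number of shakes, or never. */
struct toy_page {
	int handlable_after;	/* shakes needed; negative means never */
	int shakes;		/* how often toy_shake() has run */
	int dumps;		/* how often toy_dump() has run */
};

/* Hypothetical check: can a reference be taken on the page right now? */
static bool toy_try_get(const struct toy_page *p)
{
	return p->handlable_after >= 0 && p->shakes >= p->handlable_after;
}

/* Stand-in for shake_page(). */
static void toy_shake(struct toy_page *p)
{
	p->shakes++;
}

/* Stand-in for dump_page(): report once and count the call. */
static void toy_dump(struct toy_page *p, const char *reason)
{
	p->dumps++;
	fprintf(stderr, "toy page dump: %s (shakes=%d)\n", reason, p->shakes);
}

/* Bounded retry with one exit point, so the dump lives in one place. */
static int toy_get_any_page(struct toy_page *p)
{
	int pass = 0;
	int ret;

try_again:
	if (toy_try_get(p)) {
		ret = 1;	/* toy success value */
		goto out;
	}
	if (pass++ < 3) {
		toy_shake(p);
		goto try_again;
	}
	ret = -EIO;
out:
	if (ret == -EIO)
		toy_dump(p, "hwpoison: unhandlable page");

	return ret;
}
```

Because every failure funnels through the out label, callers do not need
their own dump call: a page that becomes handlable within three shakes
is never dumped, and a page that never does is dumped exactly once.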

Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7cfa134b1370..60df8fcd0444 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
+	if (ret == -EIO)
+		dump_page(p, "hwpoison: unhandlable page");
+
 	return ret;
 }
 
-- 
2.26.2



* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-19  5:41 ` [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page Yang Shi
@ 2021-08-20  6:48   ` HORIGUCHI NAOYA(堀口 直也)
  2021-08-20 18:40     ` Yang Shi
  2021-09-22 19:37   ` Luck, Tony
  1 sibling, 1 reply; 14+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-08-20  6:48 UTC (permalink / raw)
  To: Yang Shi
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> Currently just very simple message is shown for unhandlable page, e.g.
> non-LRU page, like:
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> 
> It is not very helpful for further debug, calling dump_page() could show
> more useful information.
> 
> Calling dump_page() in get_any_page() in order to not duplicate the call
> in a couple of different places.  It may be called with pcp disabled and
> holding memory hotplug lock, it should be not a big deal since hwpoison
> handler is not called very often.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/memory-failure.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7cfa134b1370..60df8fcd0444 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
>  		ret = -EIO;
>  	}
>  out:
> +	if (ret == -EIO)
> +		dump_page(p, "hwpoison: unhandlable page");
> +

I feel that the 4 callers of get_hwpoison_page() are in different contexts,
so it might be better to decide for each of them separately whether to add
dump_page() or not.
soft_offline_page() still prints out the "%s: %#lx: unknown page type: %lx (%pGp)"
message, which might be a duplicate, so this printk() may be dropped.
In memory_failure_hugetlb() and memory_failure(), we can call dump_page() after
action_result().  unpoison_memory() doesn't need dump_page() at all because
it deals with an already hwpoisoned page.

Thanks,
Naoya Horiguchi


* Re: [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page
  2021-08-19  5:41 [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page Yang Shi
  2021-08-19  5:41 ` [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage Yang Shi
  2021-08-19  5:41 ` [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page Yang Shi
@ 2021-08-20  7:04 ` HORIGUCHI NAOYA(堀口 直也)
  2021-08-20  7:08 ` David Hildenbrand
  3 siblings, 0 replies; 14+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-08-20  7:04 UTC (permalink / raw)
  To: Yang Shi
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Wed, Aug 18, 2021 at 10:41:14PM -0700, Yang Shi wrote:
> In the current implementation of soft offline, if non-LRU page is met,
> all the slab caches will be dropped to free the page then offline.  But
> if the page is not slab page all the effort is wasted in vain.  Even
> though it is a slab page, it is not guaranteed the page could be freed
> at all.
> 
> However the side effect and cost is quite high.  It does not only drop
> the slab caches, but also may drop a significant amount of page caches
> which are associated with inode caches.  It could make the most
> workingset gone in order to just offline a page.  And the offline is not
> guaranteed to succeed at all, actually I really doubt the success rate
> for real life workload.
> 
> Furthermore the worse consequence is the system may be locked up and
> unusable since the page cache release may incur huge amount of works
> queued for memcg release.
> 
> Actually we ran into such unpleasant case in our production environment.
> Firstly, the workqueue of memory_failure_work_func is locked up as
> below:
> 
> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 53s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=14/256 refcnt=15
>     in-flight: 409271:memory_failure_work_func
>     pending: kfree_rcu_work, kfree_rcu_monitor, kfree_rcu_work, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, drain_local_stock, kfree_rcu_work
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_update
> workqueue cgroup_destroy: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 refcnt=12072
>     pending: css_release_work_fn
> 
> There were over 12K css_release_work_fn queued, and this caused a few
> lockups due to the contention of worker pool lock with IRQ disabled, for
> example:
> 
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Modules linked in: amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_DSCP iptable_mangle kvm_amd bpfilter vfat fat acpi_ipmi i2c_piix4 usb_storage ipmi_si k10temp i2c_core ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel xfs libcrc32c crc32c_intel mlx5_core mlxfw nvme xhci_pci ptp nvme_core pps_core xhci_hcd
> CPU: 1 PID: 205500 Comm: kworker/1:0 Tainted: G             L    5.10.32-t1.el7.twitter.x86_64 #1
> Hardware name: TYAN F5AMT /z        /S8026GM2NRE-CGN, BIOS V8.030 03/30/2021
> Workqueue: events memory_failure_work_func
> RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
> Code: 41 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f6 c4 01 75 04 c6 47
> RSP: 0018:ffff9b2ac278f900 EFLAGS: 00000002
> RAX: 0000000000480101 RBX: ffff8ce98ce71800 RCX: 0000000000000084
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8ce98ce6a140
> RBP: 00000000000284c8 R08: ffffd7248dcb6808 R09: 0000000000000000
> R10: 0000000000000003 R11: ffff9b2ac278f9b0 R12: 0000000000000001
> R13: ffff8cb44dab9c00 R14: ffffffffbd1ce6a0 R15: ffff8cacaa37f068
> FS:  0000000000000000(0000) GS:ffff8ce98ce40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcf6e8cb000 CR3: 0000000a0c60a000 CR4: 0000000000350ee0
> Call Trace:
>  __queue_work+0xd6/0x3c0
>  queue_work_on+0x1c/0x30
>  uncharge_batch+0x10e/0x110
>  mem_cgroup_uncharge_list+0x6d/0x80
>  release_pages+0x37f/0x3f0
>  __pagevec_release+0x1c/0x50
>  __invalidate_mapping_pages+0x348/0x380
>  ? xfs_alloc_buftarg+0xa4/0x120 [xfs]
>  inode_lru_isolate+0x10a/0x160
>  ? iput+0x1d0/0x1d0
>  __list_lru_walk_one+0x7b/0x170
>  ? iput+0x1d0/0x1d0
>  list_lru_walk_one+0x4a/0x60
>  prune_icache_sb+0x37/0x50
>  super_cache_scan+0x123/0x1a0
>  do_shrink_slab+0x10c/0x2c0
>  shrink_slab+0x1f1/0x290
>  drop_slab_node+0x4d/0x70
>  soft_offline_page+0x1ac/0x5b0
>  ? dev_mce_log+0xee/0x110
>  ? notifier_call_chain+0x39/0x90
>  memory_failure_work_func+0x6a/0x90
>  process_one_work+0x19e/0x340
>  ? process_one_work+0x340/0x340
>  worker_thread+0x30/0x360
>  ? process_one_work+0x340/0x340
>  kthread+0x116/0x130
> 
> The lockup made the machine is quite unusable.  And it also made the
> most workingset gone, the reclaimabled slab caches were reduced from 12G
> to 300MB, the page caches were decreased from 17G to 4G.
> 
> But the most disappointing thing is all the effort doesn't make the page
> offline, it just returns:
> 
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> 
> It seems the aggressive behavior for non-LRU page didn't pay back, so it
> doesn't make too much sense to keep it considering the terrible side
> effect.
> 
> Reported-by: David Mackey <tdmackey@twitter.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
> v2: * Rebased on top of https://lore.kernel.org/linux-mm/CAHbLzkpAEZRTmnOnjVHYHGJ7ApjdC8eDh53DAnTHsG185QGOfQ@mail.gmail.com/T/#t (Naoya Horiguchi)
>     * Added comment about possible future optimization when handling
>       slab page (David Hildenbrand)
>     * Added patch #3 to call dump_page (Matthew Wilcox)

Thank you :)

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>


* Re: [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page
  2021-08-19  5:41 [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page Yang Shi
                   ` (2 preceding siblings ...)
  2021-08-20  7:04 ` [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page HORIGUCHI NAOYA(堀口 直也)
@ 2021-08-20  7:08 ` David Hildenbrand
  3 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2021-08-20  7:08 UTC (permalink / raw)
  To: Yang Shi, naoya.horiguchi, osalvador, tdmackey, willy, akpm, corbet
  Cc: linux-mm, linux-kernel

On 19.08.21 07:41, Yang Shi wrote:
> In the current implementation of soft offline, if non-LRU page is met,
> all the slab caches will be dropped to free the page then offline.  But
> if the page is not slab page all the effort is wasted in vain.  Even
> though it is a slab page, it is not guaranteed the page could be freed
> at all.
> 
> However the side effect and cost is quite high.  It does not only drop
> the slab caches, but also may drop a significant amount of page caches
> which are associated with inode caches.  It could make the most
> workingset gone in order to just offline a page.  And the offline is not
> guaranteed to succeed at all, actually I really doubt the success rate
> for real life workload.
> 
> Furthermore the worse consequence is the system may be locked up and
> unusable since the page cache release may incur huge amount of works
> queued for memcg release.
> 
> Actually we ran into such unpleasant case in our production environment.
> Firstly, the workqueue of memory_failure_work_func is locked up as
> below:
> 
> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 53s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=14/256 refcnt=15
>      in-flight: 409271:memory_failure_work_func
>      pending: kfree_rcu_work, kfree_rcu_monitor, kfree_rcu_work, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, drain_local_stock, kfree_rcu_work
> workqueue mm_percpu_wq: flags=0x8
>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>      pending: vmstat_update
> workqueue cgroup_destroy: flags=0x0
>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 refcnt=12072
>      pending: css_release_work_fn
> 
> There were over 12K css_release_work_fn queued, and this caused a few
> lockups due to the contention of worker pool lock with IRQ disabled, for
> example:
> 
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Modules linked in: amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_DSCP iptable_mangle kvm_amd bpfilter vfat fat acpi_ipmi i2c_piix4 usb_storage ipmi_si k10temp i2c_core ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel xfs libcrc32c crc32c_intel mlx5_core mlxfw nvme xhci_pci ptp nvme_core pps_core xhci_hcd
> CPU: 1 PID: 205500 Comm: kworker/1:0 Tainted: G             L    5.10.32-t1.el7.twitter.x86_64 #1
> Hardware name: TYAN F5AMT /z        /S8026GM2NRE-CGN, BIOS V8.030 03/30/2021
> Workqueue: events memory_failure_work_func
> RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
> Code: 41 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f6 c4 01 75 04 c6 47
> RSP: 0018:ffff9b2ac278f900 EFLAGS: 00000002
> RAX: 0000000000480101 RBX: ffff8ce98ce71800 RCX: 0000000000000084
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8ce98ce6a140
> RBP: 00000000000284c8 R08: ffffd7248dcb6808 R09: 0000000000000000
> R10: 0000000000000003 R11: ffff9b2ac278f9b0 R12: 0000000000000001
> R13: ffff8cb44dab9c00 R14: ffffffffbd1ce6a0 R15: ffff8cacaa37f068
> FS:  0000000000000000(0000) GS:ffff8ce98ce40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcf6e8cb000 CR3: 0000000a0c60a000 CR4: 0000000000350ee0
> Call Trace:
>   __queue_work+0xd6/0x3c0
>   queue_work_on+0x1c/0x30
>   uncharge_batch+0x10e/0x110
>   mem_cgroup_uncharge_list+0x6d/0x80
>   release_pages+0x37f/0x3f0
>   __pagevec_release+0x1c/0x50
>   __invalidate_mapping_pages+0x348/0x380
>   ? xfs_alloc_buftarg+0xa4/0x120 [xfs]
>   inode_lru_isolate+0x10a/0x160
>   ? iput+0x1d0/0x1d0
>   __list_lru_walk_one+0x7b/0x170
>   ? iput+0x1d0/0x1d0
>   list_lru_walk_one+0x4a/0x60
>   prune_icache_sb+0x37/0x50
>   super_cache_scan+0x123/0x1a0
>   do_shrink_slab+0x10c/0x2c0
>   shrink_slab+0x1f1/0x290
>   drop_slab_node+0x4d/0x70
>   soft_offline_page+0x1ac/0x5b0
>   ? dev_mce_log+0xee/0x110
>   ? notifier_call_chain+0x39/0x90
>   memory_failure_work_func+0x6a/0x90
>   process_one_work+0x19e/0x340
>   ? process_one_work+0x340/0x340
>   worker_thread+0x30/0x360
>   ? process_one_work+0x340/0x340
>   kthread+0x116/0x130
> 
> The lockup made the machine is quite unusable.  And it also made the
> most workingset gone, the reclaimabled slab caches were reduced from 12G
> to 300MB, the page caches were decreased from 17G to 4G.
> 
> But the most disappointing thing is all the effort doesn't make the page
> offline, it just returns:
> 
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> 
> It seems the aggressive behavior for non-LRU page didn't pay back, so it
> doesn't make too much sense to keep it considering the terrible side
> effect.
> 
> Reported-by: David Mackey <tdmackey@twitter.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage
  2021-08-19  5:41 ` [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage Yang Shi
@ 2021-08-20  9:34   ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2021-08-20  9:34 UTC (permalink / raw)
  To: Yang Shi, naoya.horiguchi, osalvador, tdmackey, willy, akpm, corbet
  Cc: linux-mm, linux-kernel

On 19.08.21 07:41, Yang Shi wrote:
> The hwpoison support for huge page, both hugetlb and THP, has been in
> kernel for a while, the statement in document is obsolete, correct it.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
> v2: * Collected ack from Naoya Horiguchi
> 
>   Documentation/vm/hwpoison.rst | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index a5c884293dac..89b5f7a52077 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -180,7 +180,6 @@ Limitations
>   ===========
>   - Not all page types are supported and never will. Most kernel internal
>     objects cannot be recovered, only LRU pages for now.
> -- Right now hugepage support is missing.
>   
>   ---
>   Andi Kleen, Oct 2009
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-20  6:48   ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-08-20 18:40     ` Yang Shi
  2021-08-23  5:05       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Shi @ 2021-08-20 18:40 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Thu, Aug 19, 2021 at 11:48 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > Currently just very simple message is shown for unhandlable page, e.g.
> > non-LRU page, like:
> > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> >
> > It is not very helpful for further debug, calling dump_page() could show
> > more useful information.
> >
> > Calling dump_page() in get_any_page() in order to not duplicate the call
> > in a couple of different places.  It may be called with pcp disabled and
> > holding memory hotplug lock, it should be not a big deal since hwpoison
> > handler is not called very often.
> >
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  mm/memory-failure.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 7cfa134b1370..60df8fcd0444 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
> >               ret = -EIO;
> >       }
> >  out:
> > +     if (ret == -EIO)
> > +             dump_page(p, "hwpoison: unhandlable page");
> > +
>
> I feel that 4 callers of get_hwpoison_page() are in the different context,
> so it might be better to consider them separately to add dump_page() or not.
> soft_offline_page() still prints out "%s: %#lx: unknown page type: %lx (%pGp)"

I have no strong opinion on whether to keep or remove it.

> message, which might be duplicate so this printk() may be dropped.
> In memory_failure_hugetlb() and memory_failure(), we can call dump_page() after
> action_result().  unpoison_memory() doesn't need dump_page() at all because
> it's related to already hwpoisoned page.

I don't have a strong opinion either on whether dump_page() is called
before or after the action; that just moves the dumped page information
around that printk.

For unpoison_memory(), I think it is harmless to have dump_page()
called, right? If get_hwpoison_page() can't return -EIO there, then
dump_page() won't be called at all; if it can, then this is exactly
the case where we want dump_page() to help debugging.

So IMHO calling dump_page() in get_any_page() when -EIO is returned
could work well for all the cases and avoid duplicating the call.

>
> Thanks,
> Naoya Horiguchi


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-20 18:40     ` Yang Shi
@ 2021-08-23  5:05       ` HORIGUCHI NAOYA(堀口 直也)
  2021-08-23 17:47         ` Yang Shi
  0 siblings, 1 reply; 14+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-08-23  5:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Fri, Aug 20, 2021 at 11:40:24AM -0700, Yang Shi wrote:
> On Thu, Aug 19, 2021 at 11:48 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> >
> > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > Currently just very simple message is shown for unhandlable page, e.g.
> > > non-LRU page, like:
> > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > It is not very helpful for further debug, calling dump_page() could show
> > > more useful information.
> > >
> > > Calling dump_page() in get_any_page() in order to not duplicate the call
> > > in a couple of different places.  It may be called with pcp disabled and
> > > holding memory hotplug lock, it should be not a big deal since hwpoison
> > > handler is not called very often.
> > >
> > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > Cc: Oscar Salvador <osalvador@suse.de>
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > ---
> > >  mm/memory-failure.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index 7cfa134b1370..60df8fcd0444 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
> > >               ret = -EIO;
> > >       }
> > >  out:
> > > +     if (ret == -EIO)
> > > +             dump_page(p, "hwpoison: unhandlable page");
> > > +
> >
> > I feel that 4 callers of get_hwpoison_page() are in the different context,
> > so it might be better to consider them separately to add dump_page() or not.
> > soft_offline_page() still prints out "%s: %#lx: unknown page type: %lx (%pGp)"
> 
> No strong opinion to keep or remove it.

Reading the explanation below, I think that calling dump_page() in the
original place is fine.  So let's remove "else if (ret == 0)" block in
soft_offline_page().

> 
> > message, which might be duplicate so this printk() may be dropped.
> > In memory_failure_hugetlb() and memory_failure(), we can call dump_page() after
> > action_result().  unpoison_memory() doesn't need dump_page() at all because
> > it's related to already hwpoisoned page.
> 
> I don't have a strong opinion either to have the dump_page() called
> either before action or after action, it just moves around the dumped
> page information around that printk.
> 
> For unpoison_memory(), I think it is harmless to have dump_page()
> called, right? If get_hwpoison_page() can't return -EIO, then the
> dump_page() won't be called at all, if it is possible then this is
> exactly why we call dump_page() to help debug.
> 
> So IMHO calling dump_page() in get_any_page when -EIO is returned
> could work for all the cases well and avoid duplicating the call.

Fair enough. So could you repost 3/3 with the above change in soft_offline_page()?

Thanks,
Naoya Horiguchi


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-23  5:05       ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-08-23 17:47         ` Yang Shi
  2021-08-23 23:17           ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Shi @ 2021-08-23 17:47 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Sun, Aug 22, 2021 at 10:05 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Fri, Aug 20, 2021 at 11:40:24AM -0700, Yang Shi wrote:
> > On Thu, Aug 19, 2021 at 11:48 PM HORIGUCHI NAOYA(堀口 直也)
> > <naoya.horiguchi@nec.com> wrote:
> > >
> > > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > > Currently just very simple message is shown for unhandlable page, e.g.
> > > > non-LRU page, like:
> > > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > > >
> > > > It is not very helpful for further debug, calling dump_page() could show
> > > > more useful information.
> > > >
> > > > Calling dump_page() in get_any_page() in order to not duplicate the call
> > > > in a couple of different places.  It may be called with pcp disabled and
> > > > holding memory hotplug lock, it should be not a big deal since hwpoison
> > > > handler is not called very often.
> > > >
> > > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > > Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > > Cc: Oscar Salvador <osalvador@suse.de>
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  mm/memory-failure.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > index 7cfa134b1370..60df8fcd0444 100644
> > > > --- a/mm/memory-failure.c
> > > > +++ b/mm/memory-failure.c
> > > > @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
> > > >               ret = -EIO;
> > > >       }
> > > >  out:
> > > > +     if (ret == -EIO)
> > > > +             dump_page(p, "hwpoison: unhandlable page");
> > > > +
> > >
> > > I feel that 4 callers of get_hwpoison_page() are in the different context,
> > > so it might be better to consider them separately to add dump_page() or not.
> > > soft_offline_page() still prints out "%s: %#lx: unknown page type: %lx (%pGp)"
> >
> > No strong opinion to keep or remove it.
>
> Reading the explanation below, I think that calling dump_page() in the
> original place is fine.  So let's remove "else if (ret == 0)" block in
> soft_offline_page().

The "else if (ret == 0)" block is used to handle the free page case
IIUC. I suppose you mean the "else if (ret == -EIO)" block, which just
calls printk.
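
To make the two branches concrete, here is a rough userspace sketch of
how soft_offline_page() dispatches on the return value before this
change. The enum and function names are hypothetical, and the "ret > 0"
branch is my assumption about the surrounding code; only the
"ret == 0" (free page) and "ret == -EIO" (printk only) branches are
discussed above.

```c
#include <errno.h>

/* Hypothetical labels for what each branch does. */
enum sketch_action {
	SKETCH_OFFLINE_IN_USE,	/* assumed: ret > 0, page is in use */
	SKETCH_OFFLINE_FREE,	/* ret == 0, free page */
	SKETCH_PRINT_UNKNOWN,	/* ret == -EIO, only prints "unknown page type" */
	SKETCH_PASS_THROUGH,	/* any other error is returned as-is */
};

/* Maps the return value of the page lookup to the branch that is taken. */
static enum sketch_action sketch_soft_offline_branch(int ret)
{
	if (ret > 0)
		return SKETCH_OFFLINE_IN_USE;
	else if (ret == 0)
		return SKETCH_OFFLINE_FREE;
	else if (ret == -EIO)
		return SKETCH_PRINT_UNKNOWN;

	return SKETCH_PASS_THROUGH;
}
```

With dump_page() already called for -EIO in get_any_page(), only the
SKETCH_PRINT_UNKNOWN branch is redundant; the free page branch has to
stay.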

>
> >
> > > message, which might be duplicate so this printk() may be dropped.
> > > In memory_failure_hugetlb() and memory_failure(), we can call dump_page() after
> > > action_result().  unpoison_memory() doesn't need dump_page() at all because
> > > it's related to already hwpoisoned page.
> >
> > I don't have a strong opinion either to have the dump_page() called
> > either before action or after action, it just moves around the dumped
> > page information around that printk.
> >
> > For unpoison_memory(), I think it is harmless to have dump_page()
> > called, right? If get_hwpoison_page() can't return -EIO, then the
> > dump_page() won't be called at all, if it is possible then this is
> > exactly why we call dump_page() to help debug.
> >
> > So IMHO calling dump_page() in get_any_page when -EIO is returned
> > could work for all the cases well and avoid duplicating the call.
>
> Fair enough. So could you repost 3/3 with the above change in soft_offline_page()?
>
> Thanks,
> Naoya Horiguchi


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-23 17:47         ` Yang Shi
@ 2021-08-23 23:17           ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 14+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-08-23 23:17 UTC (permalink / raw)
  To: Yang Shi
  Cc: osalvador, tdmackey, david, willy, akpm, corbet, linux-mm, linux-kernel

On Mon, Aug 23, 2021 at 10:47:03AM -0700, Yang Shi wrote:
> On Sun, Aug 22, 2021 at 10:05 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 11:40:24AM -0700, Yang Shi wrote:
> > > On Thu, Aug 19, 2021 at 11:48 PM HORIGUCHI NAOYA(堀口 直也)
> > > <naoya.horiguchi@nec.com> wrote:
> > > >
> > > > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > > > Currently just very simple message is shown for unhandlable page, e.g.
> > > > > non-LRU page, like:
> > > > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > > > >
> > > > > It is not very helpful for further debug, calling dump_page() could show
> > > > > more useful information.
> > > > >
> > > > > Calling dump_page() in get_any_page() in order to not duplicate the call
> > > > > in a couple of different places.  It may be called with pcp disabled and
> > > > > holding memory hotplug lock, it should be not a big deal since hwpoison
> > > > > handler is not called very often.
> > > > >
> > > > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > > > Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > > > Cc: Oscar Salvador <osalvador@suse.de>
> > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > ---
> > > > >  mm/memory-failure.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > > index 7cfa134b1370..60df8fcd0444 100644
> > > > > --- a/mm/memory-failure.c
> > > > > +++ b/mm/memory-failure.c
> > > > > @@ -1228,6 +1228,9 @@ static int get_any_page(struct page *p, unsigned long flags)
> > > > >               ret = -EIO;
> > > > >       }
> > > > >  out:
> > > > > +     if (ret == -EIO)
> > > > > +             dump_page(p, "hwpoison: unhandlable page");
> > > > > +
> > > >
> > > > I feel that 4 callers of get_hwpoison_page() are in the different context,
> > > > so it might be better to consider them separately to add dump_page() or not.
> > > > soft_offline_page() still prints out "%s: %#lx: unknown page type: %lx (%pGp)"
> > >
> > > No strong opinion to keep or remove it.
> >
> > Reading the explanation below, I think that calling dump_page() in the
> > original place is fine.  So let's remove "else if (ret == 0)" block in
> > soft_offline_page().
> 
> The "else if (ret == 0)" block is used to handle free page IIUC. I'm
> supposed you mean the "else if (ret == -EIO)" block which just calls
> printk.

Sorry, you're right. I mis-copied the line.

- Naoya Horiguchi


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-08-19  5:41 ` [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page Yang Shi
  2021-08-20  6:48   ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-09-22 19:37   ` Luck, Tony
  2021-09-22 19:58     ` Yang Shi
  1 sibling, 1 reply; 14+ messages in thread
From: Luck, Tony @ 2021-09-22 19:37 UTC (permalink / raw)
  To: Yang Shi
  Cc: naoya.horiguchi, osalvador, tdmackey, david, willy, akpm, corbet,
	linux-mm, linux-kernel

On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> Currently just very simple message is shown for unhandlable page, e.g.
> non-LRU page, like:
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> 
> It is not very helpful for further debug, calling dump_page() could show
> more useful information.

Looks like your code already caught something. An error injection
test may have injected into a shared library. Though I'm not sure that
the refcount/mapcount in the dump agrees with that diagnosis from the
author of this test.

Here's what appeared on the console:

[ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
[ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.321804] page dumped because: hwpoison: unhandlable page
[ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
[ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
[ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
[ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned

-Tony


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-09-22 19:37   ` Luck, Tony
@ 2021-09-22 19:58     ` Yang Shi
  2021-09-22 20:37       ` Yang Shi
  0 siblings, 1 reply; 14+ messages in thread
From: Yang Shi @ 2021-09-22 19:58 UTC (permalink / raw)
  To: Luck, Tony
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Oscar Salvador, tdmackey, David Hildenbrand, Matthew Wilcox,
	Andrew Morton, Jonathan Corbet, Linux MM,
	Linux Kernel Mailing List

On Wed, Sep 22, 2021 at 12:37 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > Currently just very simple message is shown for unhandlable page, e.g.
> > non-LRU page, like:
> > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> >
> > It is not very helpful for further debug, calling dump_page() could show
> > more useful information.
>
> Looks like your code already caught something. An error injection
> test may have injected into a shared library. Though I'm not sure that
> the refcount/mapcount in the dump agrees with that diagnosis from the
> author of this test.

The messages from dump_page() are (untangled from the interleaved mce logs):

[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
[ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000

The page flags say it is a "reserved" page and mapping is NULL. It
doesn't look like a user page or a movable page, so hwpoison can't
handle it, which is why the messages are dumped.
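
As a cross-check of that reading, here is a small userspace sketch that
decodes the raw flags word and the pfn from the dump. The bit positions
(12 for reserved, 23 for hwpoison) are assumptions inferred from this
particular dump; they depend on the kernel config and are not a stable
interface. The 4K page size is likewise assumed, and all names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, inferred from the dump above; config dependent. */
#define SKETCH_PG_RESERVED	12
#define SKETCH_PG_HWPOISON	23
/* Assumed 4K pages. */
#define SKETCH_PAGE_SHIFT	12

/* Returns true if the given bit is set in the raw page flags word. */
static bool sketch_flag_set(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1;
}

/* Converts a page frame number to the physical address of the page. */
static uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

Feeding it flags 0x57ffffc0801000 reports both bits as set, and pfn
0xcef2747 maps to 0xcef2747000, the address in the mce messages.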

>
> Here's what appeared on the console:
>
> [ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> [ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> [ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> [ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4818.321804] page dumped because: hwpoison: unhandlable page
> [ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
> [ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
> [ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
> [ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
> [ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
>
> -Tony


* Re: [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page
  2021-09-22 19:58     ` Yang Shi
@ 2021-09-22 20:37       ` Yang Shi
  0 siblings, 0 replies; 14+ messages in thread
From: Yang Shi @ 2021-09-22 20:37 UTC (permalink / raw)
  To: Luck, Tony
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Oscar Salvador, tdmackey, David Hildenbrand, Matthew Wilcox,
	Andrew Morton, Jonathan Corbet, Linux MM,
	Linux Kernel Mailing List

On Wed, Sep 22, 2021 at 12:58 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 12:37 PM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > Currently just very simple message is shown for unhandlable page, e.g.
> > > non-LRU page, like:
> > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > It is not very helpful for further debug, calling dump_page() could show
> > > more useful information.
> >
> > Looks like your code already caught something. An error injection
> > test may have injected into a shared library. Though I'm not sure that
> > the refcount/mapcount in the dump agrees with that diagnosis from the
> > author of this test.
>
> The messages from dump_page() are (unwind them from mce logs):
>
> [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0xcef2747
> [ 4817.646860] flags:
> 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8
> 0000000000000000
> [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff
> 0000000000000000

Missed one line from the dump:

[ 4818.321804] page dumped because: hwpoison: unhandlable page

Anyway, dump_page() is only called when an unhandlable page is encountered.

>
> The page flags tell it is a "reserved" page and mapping is NULL. It
> doesn't seem like a user page or movable page, so hwpoision can't
> handle it so that the messages are dumped.
>
> >
> > Here's what appeared on the console:
> >
> > [ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> > [ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > [ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> > [ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > [ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.321804] page dumped because: hwpoison: unhandlable page
> > [ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
> > [ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
> > [ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
> > [ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
> >
> > -Tony


end of thread, other threads:[~2021-09-22 20:37 UTC | newest]

Thread overview: 14+ messages
2021-08-19  5:41 [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page Yang Shi
2021-08-19  5:41 ` [v2 PATCH 2/3] doc: hwpoison: correct the support for hugepage Yang Shi
2021-08-20  9:34   ` David Hildenbrand
2021-08-19  5:41 ` [v2 PATCH 3/3] mm: hwpoison: dump page for unhandlable page Yang Shi
2021-08-20  6:48   ` HORIGUCHI NAOYA(堀口 直也)
2021-08-20 18:40     ` Yang Shi
2021-08-23  5:05       ` HORIGUCHI NAOYA(堀口 直也)
2021-08-23 17:47         ` Yang Shi
2021-08-23 23:17           ` HORIGUCHI NAOYA(堀口 直也)
2021-09-22 19:37   ` Luck, Tony
2021-09-22 19:58     ` Yang Shi
2021-09-22 20:37       ` Yang Shi
2021-08-20  7:04 ` [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page HORIGUCHI NAOYA(堀口 直也)
2021-08-20  7:08 ` David Hildenbrand
