linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][RFC v3] PM / Hibernate: Feed the wathdog when creating snapshot
@ 2017-08-17  4:04 Chen Yu
  2017-08-17 11:28 ` Michal Hocko
  0 siblings, 1 reply; 3+ messages in thread
From: Chen Yu @ 2017-08-17  4:04 UTC (permalink / raw)
  To: linux-mm
  Cc: Chen Yu, Andrew Morton, Michal Hocko, Mel Gorman,
	Vlastimil Babka, Rafael J. Wysocki, Len Brown, Dan Williams,
	linux-kernel

There is a problem that when counting the pages for creating
the hibernation snapshot will take significant amount of
time, especially on system with large memory. Since the counting
job is performed with irq disabled, this might lead to NMI lockup.
The following warning were found on a system with 1.5TB DRAM:

[ 1124.758184] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 1124.768721] OOM killer disabled.
[ 1124.847009] PM: Preallocating image memory...
[ 1139.392042] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
[ 1139.392076] CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1
[ 1139.392077] task: ffff9f01971ac000 task.stack: ffffb1a3f325c000
[ 1139.392083] RIP: 0010:memory_bm_find_bit+0xf4/0x100
[ 1139.392084] RSP: 0018:ffffb1a3f325fc20 EFLAGS: 00000006
[ 1139.392084] RAX: 0000000000000000 RBX: 0000000013b83000 RCX: ffff9fbe89caf000
[ 1139.392085] RDX: ffffb1a3f325fc30 RSI: 0000000000003200 RDI: ffff9fbeaffffe80
[ 1139.392085] RBP: ffffb1a3f325fc40 R08: 0000000013b80000 R09: ffff9fbe89c54878
[ 1139.392085] R10: ffffb1a3f325fc2c R11: 0000000013b83200 R12: 0000000000000400
[ 1139.392086] R13: fffffd552e0c0000 R14: ffff9fc1bffd31e0 R15: 0000000000000202
[ 1139.392086] FS:  00007f3189704180(0000) GS:ffff9fbec8ec0000(0000) knlGS:0000000000000000
[ 1139.392087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1139.392087] CR2: 00000085da0f7398 CR3: 000001771cf9a000 CR4: 00000000007406e0
[ 1139.392088] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1139.392088] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1139.392088] PKRU: 55555554
[ 1139.392089] Call Trace:
[ 1139.392092]  ? memory_bm_set_bit+0x29/0x60
[ 1139.392094]  swsusp_set_page_free+0x2b/0x30
[ 1139.392098]  mark_free_pages+0x147/0x1c0
[ 1139.392099]  count_data_pages+0x41/0xa0
[ 1139.392101]  hibernate_preallocate_memory+0x80/0x450
[ 1139.392102]  hibernation_snapshot+0x58/0x410
[ 1139.392103]  hibernate+0x17c/0x310
[ 1139.392104]  state_store+0xdf/0xf0
[ 1139.392107]  kobj_attr_store+0xf/0x20
[ 1139.392111]  sysfs_kf_write+0x37/0x40
[ 1139.392113]  kernfs_fop_write+0x11c/0x1a0
[ 1139.392117]  __vfs_write+0x37/0x170
[ 1139.392121]  ? handle_mm_fault+0xd8/0x230
[ 1139.392122]  vfs_write+0xb1/0x1a0
[ 1139.392123]  SyS_write+0x55/0xc0
[ 1139.392126]  entry_SYSCALL_64_fastpath+0x1a/0xa5

So avoid the NMI lockup by feeding the watchdog every 1000 pages.

Reported-by: Jan Filipcewicz <jan.filipcewicz@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 mm/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d00f74..0266eb6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -66,6 +66,7 @@
 #include <linux/kthread.h>
 #include <linux/memcontrol.h>
 #include <linux/ftrace.h>
+#include <linux/nmi.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -2531,11 +2532,14 @@ void drain_all_pages(struct zone *zone)
 
 #ifdef CONFIG_HIBERNATION
 
+/* Touch watchdog for every WD_INTERVAL_PAGE pages. */
+#define WD_INTERVAL_PAGE	1000
+
 void mark_free_pages(struct zone *zone)
 {
 	unsigned long pfn, max_zone_pfn;
 	unsigned long flags;
-	unsigned int order, t;
+	unsigned int order, t, page_num = 0;
 	struct page *page;
 
 	if (zone_is_empty(zone))
@@ -2548,6 +2552,9 @@ void mark_free_pages(struct zone *zone)
 		if (pfn_valid(pfn)) {
 			page = pfn_to_page(pfn);
 
+			if (!((page_num++) % WD_INTERVAL_PAGE))
+				touch_nmi_watchdog();
+
 			if (page_zone(page) != zone)
 				continue;
 
@@ -2555,14 +2562,19 @@ void mark_free_pages(struct zone *zone)
 				swsusp_unset_page_free(page);
 		}
 
+	page_num = 0;
+
 	for_each_migratetype_order(order, t) {
 		list_for_each_entry(page,
 				&zone->free_area[order].free_list[t], lru) {
 			unsigned long i;
 
 			pfn = page_to_pfn(page);
-			for (i = 0; i < (1UL << order); i++)
+			for (i = 0; i < (1UL << order); i++) {
+				if (!((page_num++) % WD_INTERVAL_PAGE))
+					touch_nmi_watchdog();
 				swsusp_set_page_free(pfn_to_page(pfn + i));
+			}
 		}
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH][RFC v3] PM / Hibernate: Feed the wathdog when creating snapshot
  2017-08-17  4:04 [PATCH][RFC v3] PM / Hibernate: Feed the wathdog when creating snapshot Chen Yu
@ 2017-08-17 11:28 ` Michal Hocko
  2017-08-19 10:07   ` Chen Yu
  0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2017-08-17 11:28 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Rafael J. Wysocki, Len Brown, Dan Williams, linux-kernel

On Thu 17-08-17 12:04:34, Chen Yu wrote:
[...]
>  #ifdef CONFIG_HIBERNATION
>  
> +/* Touch watchdog for every WD_INTERVAL_PAGE pages. */
> +#define WD_INTERVAL_PAGE	1000

traversing 1000 pages should never take too much time so this could be
overly aggressive. 100k pages could be acceptable as well. But I haven't
measure that so I might be easily wrong here. So this is just my 2c

> +
>  void mark_free_pages(struct zone *zone)
>  {
>  	unsigned long pfn, max_zone_pfn;
>  	unsigned long flags;
> -	unsigned int order, t;
> +	unsigned int order, t, page_num = 0;
>  	struct page *page;
>  
>  	if (zone_is_empty(zone))
> @@ -2548,6 +2552,9 @@ void mark_free_pages(struct zone *zone)
>  		if (pfn_valid(pfn)) {
>  			page = pfn_to_page(pfn);
>  
> +			if (!((page_num++) % WD_INTERVAL_PAGE))
> +				touch_nmi_watchdog();
> +
>  			if (page_zone(page) != zone)
>  				continue;
>  
> @@ -2555,14 +2562,19 @@ void mark_free_pages(struct zone *zone)
>  				swsusp_unset_page_free(page);
>  		}
>  
> +	page_num = 0;
> +

this part doesn't make much sense to me. You are still inside the same
IRQ disabled section. So why would you want to start counting from 0
again. Not that this would make any difference in real life but the code
is not logical

>  	for_each_migratetype_order(order, t) {
>  		list_for_each_entry(page,
>  				&zone->free_area[order].free_list[t], lru) {
>  			unsigned long i;
>  
>  			pfn = page_to_pfn(page);
> -			for (i = 0; i < (1UL << order); i++)
> +			for (i = 0; i < (1UL << order); i++) {
> +				if (!((page_num++) % WD_INTERVAL_PAGE))
> +					touch_nmi_watchdog();
>  				swsusp_set_page_free(pfn_to_page(pfn + i));
> +			}
>  		}
>  	}
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -- 
> 2.7.4

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH][RFC v3] PM / Hibernate: Feed the wathdog when creating snapshot
  2017-08-17 11:28 ` Michal Hocko
@ 2017-08-19 10:07   ` Chen Yu
  0 siblings, 0 replies; 3+ messages in thread
From: Chen Yu @ 2017-08-19 10:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Rafael J. Wysocki, Len Brown, Dan Williams, linux-kernel

On Thu, Aug 17, 2017 at 01:28:06PM +0200, Michal Hocko wrote:
> On Thu 17-08-17 12:04:34, Chen Yu wrote:
> [...]
> >  #ifdef CONFIG_HIBERNATION
> >  
> > +/* Touch watchdog for every WD_INTERVAL_PAGE pages. */
> > +#define WD_INTERVAL_PAGE	1000
> 
> traversing 1000 pages should never take too much time so this could be
> overly aggressive. 100k pages could be acceptable as well. But I haven't
> measure that so I might be easily wrong here. So this is just my 2c
>
After checking the log:
[ 1144.690405] done (allocated 6590003 pages)
[ 1144.694971] PM: Allocated 26360012 kbytes in 19.89 seconds (1325.28 MB/s)

The default NMI timeout is 10 seconds AFAIK, in case the user might modify it to
1 second, a safe interval page could be 6590003/20 = 320k pages, but there
might be other machines running in a lower freq, 100k should be more robust.
I'll change it to 100k.
> > +
> >  void mark_free_pages(struct zone *zone)
> >  {
> >  	unsigned long pfn, max_zone_pfn;
> >  	unsigned long flags;
> > -	unsigned int order, t;
> > +	unsigned int order, t, page_num = 0;
> >  	struct page *page;
> >  
> >  	if (zone_is_empty(zone))
> > @@ -2548,6 +2552,9 @@ void mark_free_pages(struct zone *zone)
> >  		if (pfn_valid(pfn)) {
> >  			page = pfn_to_page(pfn);
> >  
> > +			if (!((page_num++) % WD_INTERVAL_PAGE))
> > +				touch_nmi_watchdog();
> > +
> >  			if (page_zone(page) != zone)
> >  				continue;
> >  
> > @@ -2555,14 +2562,19 @@ void mark_free_pages(struct zone *zone)
> >  				swsusp_unset_page_free(page);
> >  		}
> >  
> > +	page_num = 0;
> > +
> 
> this part doesn't make much sense to me. You are still inside the same
> IRQ disabled section. So why would you want to start counting from 0
> again. Not that this would make any difference in real life but the code
> is not logical
> 
Ok, will delete this.
> >  	for_each_migratetype_order(order, t) {
> >  		list_for_each_entry(page,
> >  				&zone->free_area[order].free_list[t], lru) {
> >  			unsigned long i;
> >  
> >  			pfn = page_to_pfn(page);
> > -			for (i = 0; i < (1UL << order); i++)
> > +			for (i = 0; i < (1UL << order); i++) {
> > +				if (!((page_num++) % WD_INTERVAL_PAGE))
> > +					touch_nmi_watchdog();
> >  				swsusp_set_page_free(pfn_to_page(pfn + i));
> > +			}
> >  		}
> >  	}
> >  	spin_unlock_irqrestore(&zone->lock, flags);
> > -- 
> > 2.7.4
> 
> -- 
> Michal Hocko
> SUSE Labs
Thanks,
	Yu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-08-19 10:05 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-17  4:04 [PATCH][RFC v3] PM / Hibernate: Feed the wathdog when creating snapshot Chen Yu
2017-08-17 11:28 ` Michal Hocko
2017-08-19 10:07   ` Chen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).