linux-mm.kvack.org archive mirror
* [PATCH 0/3] mm, memory_hotplug: fix few soft lockups in memory hotadd
@ 2017-09-18 12:14 Michal Hocko
  2017-09-18 12:14 ` [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Michal Hocko @ 2017-09-18 12:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, LKML, linux-mm, Johannes Thumshirn, Michal Hocko

Hi,
Johannes has noticed a few soft lockups when adding a large nvdimm
device. All of them were caused by a long loop without any explicit
cond_resched, which is a problem for !PREEMPT kernels. The fix is quite
straightforward: make sure that cond_resched gets called from time to
time.

Could you consider these for merging?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages
  2017-09-18 12:14 [PATCH 0/3] mm, memory_hotplug: fix few soft lockups in memory hotadd Michal Hocko
@ 2017-09-18 12:14 ` Michal Hocko
  2017-09-18 12:16   ` Johannes Thumshirn
  2017-09-18 12:14 ` [PATCH 2/3] mm, page_alloc: add scheduling point to memmap_init_zone Michal Hocko
  2017-09-18 12:14 ` [PATCH 3/3] memremap: add scheduling point to devm_memremap_pages Michal Hocko
  2 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-09-18 12:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, LKML, linux-mm, Michal Hocko, Johannes Thumshirn

From: Michal Hocko <mhocko@suse.com>

__add_pages gets a pfn range to add and there is no upper bound for a
single call. The range is usually a memory block aligned size for
regular memory hotplug (smaller sizes are common for memory ballooning
drivers), or the whole NUMA node for physical memory online. There is no
explicit scheduling point in that code path though.

This can lead to long latencies while __add_pages is executed and we
have even seen a soft lockup report during nvdimm initialization with
a !PREEMPT kernel:

[   33.588806] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [kworker/u641:3:832]
[...]
[   33.588875] Workqueue: events_unbound async_run_entry_fn
[   33.588876] task: ffff881809270f40 ti: ffff881809274000 task.ti: ffff881809274000
[   33.588878] RIP: 0010:[<ffffffff81608c01>]  [<ffffffff81608c01>] _raw_spin_unlock_irqrestore+0x11/0x20
[   33.588883] RSP: 0018:ffff881809277b10  EFLAGS: 00000286
[...]
[   33.588900] Call Trace:
[   33.588906]  [<ffffffff81603a45>] sparse_add_one_section+0x13d/0x18e
[   33.588909]  [<ffffffff815fde8a>] __add_pages+0x10a/0x1d0
[   33.588916]  [<ffffffff810634ca>] arch_add_memory+0x4a/0xc0
[   33.588920]  [<ffffffff8118b22d>] devm_memremap_pages+0x29d/0x430
[   33.588931]  [<ffffffffa042e50d>] pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
[   33.589001]  [<ffffffffa14ad984>] nvdimm_bus_probe+0x64/0x110 [libnvdimm]
[   33.589008]  [<ffffffff8146a337>] driver_probe_device+0x1f7/0x420
[   33.589012]  [<ffffffff814682f2>] bus_for_each_drv+0x52/0x80
[   33.589014]  [<ffffffff8146a020>] __device_attach+0xb0/0x130
[   33.589017]  [<ffffffff81469447>] bus_probe_device+0x87/0xa0
[   33.589020]  [<ffffffff8146741c>] device_add+0x3fc/0x5f0
[   33.589029]  [<ffffffffa14acffe>] nd_async_device_register+0xe/0x40 [libnvdimm]
[   33.589047]  [<ffffffff8109e1c3>] async_run_entry_fn+0x43/0x150
[   33.589073]  [<ffffffff8109594e>] process_one_work+0x14e/0x410
[   33.589086]  [<ffffffff810961a6>] worker_thread+0x116/0x490
[   33.589089]  [<ffffffff8109b677>] kthread+0xc7/0xe0
[   33.589119]  [<ffffffff816094bf>] ret_from_fork+0x3f/0x70
[   33.590756] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

Fix this by adding cond_resched once per memory section in the given
pfn range. Each section is a constant amount of work which is not too
expensive by itself, but many of them add up.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
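For readers who want to see the shape of the loop outside the kernel,
here is a minimal userspace sketch of "one scheduling point per memory
section". It is not the kernel code: sched_yield() is only a stand-in
for cond_resched(), add_pages_sketch() and resched_point() are
hypothetical names, and PFN_SECTION_SHIFT assumes the x86_64 defaults
(128 MiB sections, 4 KiB pages).

```c
#include <assert.h>
#include <sched.h>

/* Assumption: x86_64 defaults, i.e. 2^15 pages per memory section. */
#define PFN_SECTION_SHIFT 15UL

/* Hypothetical userspace stand-in for the kernel's cond_resched(). */
static void resched_point(unsigned long *count)
{
	(*count)++;
	sched_yield();
}

/*
 * Walk every section touched by [start_pfn, start_pfn + nr_pages) and
 * offer to reschedule once per section. Returns the number of
 * scheduling points that were hit.
 */
static unsigned long add_pages_sketch(unsigned long start_pfn,
				      unsigned long nr_pages)
{
	unsigned long start_sec, end_sec, sec, points = 0;

	assert(nr_pages > 0);
	start_sec = start_pfn >> PFN_SECTION_SHIFT;
	end_sec = (start_pfn + nr_pages - 1) >> PFN_SECTION_SHIFT;

	for (sec = start_sec; sec <= end_sec; sec++) {
		/* The per-section work would go here. */
		resched_point(&points);
	}
	return points;
}
```

Under these assumptions a 1 TiB range (2^28 pages) covers 8192
sections, so the loop offers to reschedule 8192 times instead of never.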
 mm/memory_hotplug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 459bbc182d10..73b56fa49b6f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -328,6 +328,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		if (err && (err != -EEXIST))
 			break;
 		err = 0;
+		cond_resched();
 	}
 	vmemmap_populate_print_last();
 out:
-- 
2.14.1

* [PATCH 2/3] mm, page_alloc: add scheduling point to memmap_init_zone
  2017-09-18 12:14 [PATCH 0/3] mm, memory_hotplug: fix few soft lockups in memory hotadd Michal Hocko
  2017-09-18 12:14 ` [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages Michal Hocko
@ 2017-09-18 12:14 ` Michal Hocko
  2017-09-18 12:17   ` Johannes Thumshirn
  2017-09-18 12:14 ` [PATCH 3/3] memremap: add scheduling point to devm_memremap_pages Michal Hocko
  2 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-09-18 12:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, LKML, linux-mm, Michal Hocko, Johannes Thumshirn

From: Michal Hocko <mhocko@suse.com>

memmap_init_zone gets a pfn range to initialize and it can be really
large, resulting in a soft lockup on non-preemptible kernels:

[   65.585596] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
[...]
[   65.585818] task: ffff88ecd7e902c0 ti: ffff88eca4e50000 task.ti: ffff88eca4e50000
[   65.585819] RIP: 0010:[<ffffffff815ff545>]  [<ffffffff815ff545>] move_pfn_range_to_zone+0x185/0x1d0
[...]
[   65.585843] Call Trace:
[   65.585853]  [<ffffffff8118b657>] devm_memremap_pages+0x2c7/0x430
[   65.585862]  [<ffffffffa02d650d>] pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
[   65.585893]  [<ffffffffa14bb984>] nvdimm_bus_probe+0x64/0x110 [libnvdimm]
[   65.585904]  [<ffffffff8146b257>] driver_probe_device+0x1f7/0x420
[   65.585910]  [<ffffffff81469212>] bus_for_each_drv+0x52/0x80
[   65.585913]  [<ffffffff8146af40>] __device_attach+0xb0/0x130
[   65.585916]  [<ffffffff8146a367>] bus_probe_device+0x87/0xa0
[   65.585919]  [<ffffffff814682fc>] device_add+0x3fc/0x5f0
[   65.585924]  [<ffffffffa14baffe>] nd_async_device_register+0xe/0x40 [libnvdimm]
[   65.585927]  [<ffffffff8109e413>] async_run_entry_fn+0x43/0x150
[   65.585933]  [<ffffffff81095b8e>] process_one_work+0x14e/0x410
[   65.585937]  [<ffffffff810963f6>] worker_thread+0x116/0x490
[   65.585939]  [<ffffffff8109b8c7>] kthread+0xc7/0xe0
[   65.585943]  [<ffffffff8160a57f>] ret_from_fork+0x3f/0x70

Fix this by adding a scheduling point once per page block.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
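As an illustration only, the "once per page block" placement can be
modelled in userspace as below. The loop still visits every pfn, but it
only yields on the first pfn of each pageblock. init_range_sketch() is
a hypothetical name, sched_yield() stands in for cond_resched(), and
PAGEBLOCK_NR_PAGES assumes 512 pages per pageblock (2 MiB with 4 KiB
pages); the real value depends on the architecture and configuration.

```c
#include <assert.h>
#include <sched.h>

/* Assumption: 512 pages per pageblock. Must be a power of two. */
#define PAGEBLOCK_NR_PAGES 512UL

/*
 * Visit every pfn in [start_pfn, start_pfn + size) and yield whenever
 * a pfn is the first one of a pageblock. Returns the number of
 * scheduling points that were hit.
 */
static unsigned long init_range_sketch(unsigned long start_pfn,
				       unsigned long size)
{
	unsigned long pfn, end_pfn = start_pfn + size, points = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!(pfn & (PAGEBLOCK_NR_PAGES - 1))) {
			/* Pageblock-aligned pfn: block-level setup, then yield. */
			points++;
			sched_yield();
		}
		/* Per-page initialization would go here. */
	}
	return points;
}
```

Tying the yield to the alignment check that the loop already performs
means no extra counter is needed.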
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc36755a21cf..41e93dfc702e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5320,6 +5320,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 
 			__init_single_page(page, pfn, zone, nid);
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			cond_resched();
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
-- 
2.14.1

* [PATCH 3/3] memremap: add scheduling point to devm_memremap_pages
  2017-09-18 12:14 [PATCH 0/3] mm, memory_hotplug: fix few soft lockups in memory hotadd Michal Hocko
  2017-09-18 12:14 ` [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages Michal Hocko
  2017-09-18 12:14 ` [PATCH 2/3] mm, page_alloc: add scheduling point to memmap_init_zone Michal Hocko
@ 2017-09-18 12:14 ` Michal Hocko
  2017-09-18 12:17   ` Johannes Thumshirn
  2 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-09-18 12:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, LKML, linux-mm, Michal Hocko, Johannes Thumshirn

From: Michal Hocko <mhocko@suse.com>

devm_memremap_pages initializes struct pages in for_each_device_pfn
and that can take quite some time. We have even seen a soft lockup
triggering on a non-preemptible kernel:

[  125.583233] NMI watchdog: BUG: soft lockup - CPU#61 stuck for 22s! [kworker/u641:11:1808]
[...]
[  125.583467] RIP: 0010:[<ffffffff8118b6b7>]  [<ffffffff8118b6b7>] devm_memremap_pages+0x327/0x430
[...]
[  125.583488] Call Trace:
[  125.583496]  [<ffffffffa016550d>] pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
[  125.583528]  [<ffffffffa14ae984>] nvdimm_bus_probe+0x64/0x110 [libnvdimm]
[  125.583536]  [<ffffffff8146b257>] driver_probe_device+0x1f7/0x420
[  125.583540]  [<ffffffff81469212>] bus_for_each_drv+0x52/0x80
[  125.583543]  [<ffffffff8146af40>] __device_attach+0xb0/0x130
[  125.583546]  [<ffffffff8146a367>] bus_probe_device+0x87/0xa0
[  125.583548]  [<ffffffff814682fc>] device_add+0x3fc/0x5f0
[  125.583553]  [<ffffffffa14adffe>] nd_async_device_register+0xe/0x40 [libnvdimm]
[  125.583556]  [<ffffffff8109e413>] async_run_entry_fn+0x43/0x150
[  125.583561]  [<ffffffff81095b8e>] process_one_work+0x14e/0x410
[  125.583563]  [<ffffffff810963f6>] worker_thread+0x116/0x490
[  125.583565]  [<ffffffff8109b8c7>] kthread+0xc7/0xe0
[  125.583569]  [<ffffffff8160a57f>] ret_from_fork+0x3f/0x70

Fix this by adding cond_resched every 1024 pages.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
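A rough userspace model of the "every 1024 pages" batching, for
illustration only: init_device_pages_sketch() is a hypothetical name
and sched_yield() stands in for cond_resched(). The sketch uses an
unsigned long counter where the patch uses an int; that is a choice
made for the example, not something taken from the patch.

```c
#include <assert.h>
#include <sched.h>

/* Yield interval, matching the 1024 pages named in the changelog. */
#define RESCHED_BATCH 1024UL

/*
 * Process nr_pages pages and yield after every RESCHED_BATCH of them.
 * Returns the number of scheduling points that were hit.
 */
static unsigned long init_device_pages_sketch(unsigned long nr_pages)
{
	unsigned long n, i = 0, points = 0;

	for (n = 0; n < nr_pages; n++) {
		/* Per-page setup would go here. */
		if (!(++i % RESCHED_BATCH)) {
			points++;
			sched_yield();
		}
	}
	return points;
}
```

A separate counter is needed here because the loop iterates over
device pfns rather than over a unit that is already batched.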
 kernel/memremap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 6bcbfbf1a8fd..403ab9cdb949 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -350,7 +350,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 	pgprot_t pgprot = PAGE_KERNEL;
 	struct dev_pagemap *pgmap;
 	struct page_map *page_map;
-	int error, nid, is_ram;
+	int error, nid, is_ram, i = 0;
 
 	align_start = res->start & ~(SECTION_SIZE - 1);
 	align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -448,6 +448,8 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
 		list_del(&page->lru);
 		page->pgmap = pgmap;
 		percpu_ref_get(ref);
+		if (!(++i % 1024))
+			cond_resched();
 	}
 	devres_add(dev, page_map);
 	return __va(res->start);
-- 
2.14.1

* Re: [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages
  2017-09-18 12:14 ` [PATCH 1/3] mm, memory_hotplug: add scheduling point to __add_pages Michal Hocko
@ 2017-09-18 12:16   ` Johannes Thumshirn
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2017-09-18 12:16 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Dan Williams, LKML, linux-mm, Michal Hocko

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nurnberg
GF: Felix Imendorffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nurnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH 2/3] mm, page_alloc: add scheduling point to memmap_init_zone
  2017-09-18 12:14 ` [PATCH 2/3] mm, page_alloc: add scheduling point to memmap_init_zone Michal Hocko
@ 2017-09-18 12:17   ` Johannes Thumshirn
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2017-09-18 12:17 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Dan Williams, LKML, linux-mm, Michal Hocko

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>

* Re: [PATCH 3/3] memremap: add scheduling point to devm_memremap_pages
  2017-09-18 12:14 ` [PATCH 3/3] memremap: add scheduling point to devm_memremap_pages Michal Hocko
@ 2017-09-18 12:17   ` Johannes Thumshirn
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2017-09-18 12:17 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Dan Williams, LKML, linux-mm, Michal Hocko

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>