linux-mm.kvack.org archive mirror
* [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
@ 2020-04-01 10:41 David Hildenbrand
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 10:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Alexander Duyck, Andrew Morton,
	Baoquan He, Daniel Jordan, Kirill Tkhai, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Shile Zhang, Yiqian Wei

Two fixes for misleading RCU stall messages and watchdog soft lockups
seen during boot on kernels built without CONFIG_PREEMPT when a node or
zone is huge.

David Hildenbrand (2):
  mm/page_alloc: fix RCU stalls during deferred page initialization
  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()

 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.25.1




* [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
@ 2020-04-01 10:41 ` David Hildenbrand
  2020-04-01 13:18   ` Pavel Tatashin
                     ` (4 more replies)
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
  2020-04-01 14:10 ` [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
  2 siblings, 5 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 10:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Yiqian Wei, Andrew Morton,
	Kirill Tkhai, Shile Zhang, Pavel Tatashin, Daniel Jordan,
	Michal Hocko, Alexander Duyck, Baoquan He, Oscar Salvador

With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
happen that we get RCU stalls detected when booting up.

[   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
[   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
[   60.475000] Sending NMI from CPU 0 to CPUs 1:
[    1.760091] NMI backtrace for cpu 1
[    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
[    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
[    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
[    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
[    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
[    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
[    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
[    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
[    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
[    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
[    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
[    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
[    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.760091] Call Trace:
[    1.760091]  deferred_init_pages+0x8f/0xbf
[    1.760091]  deferred_init_memmap+0x184/0x29d
[    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
[    1.760091]  kthread+0x112/0x130
[    1.760091]  ? kthread_flush_work_fn+0x10/0x10
[    1.760091]  ret_from_fork+0x35/0x40
[   89.123011] node 0 initialised, 1055935372 pages in 88650ms
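
As a rough cross-check of the last log line above, the page count and
elapsed time can be converted into bytes and a per-millisecond rate. This
is only a sketch: the 4 KiB page size is an assumption (the usual x86_64
base page size, not stated in the log), and ASSUMED_PAGE_SIZE,
pages_to_bytes() and pages_per_ms() are hypothetical names used for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; the log itself does not state the page size. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Convert a page count into bytes under the assumed page size. */
static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages * ASSUMED_PAGE_SIZE;
}

/* Average number of pages initialised per millisecond (integer division). */
static uint64_t pages_per_ms(uint64_t pages, uint64_t ms)
{
	assert(ms > 0);
	return pages / ms;
}
```

With the values from the log, pages_to_bytes(1055935372) gives
4325111283712 bytes (about 3.9 TiB) and pages_per_ms(1055935372, 88650)
gives 11911, so the single pgdatinit0 thread is busy for well over a
minute.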

The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU. Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.

Adding the cond_resched() makes RCU happy.
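
The pattern can be sketched in userspace, with the caveat that this is an
analogy and not the kernel code: sched_yield() stands in for
cond_resched(), and struct fake_page, init_one() and init_in_batches() are
hypothetical names. The work is split into chunks and the CPU is offered
back to the scheduler between chunks, which is roughly where the patch
places its cond_resched().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct page; not the kernel's layout. */
struct fake_page {
	uint64_t flags;
	uint32_t refcount;
};

/* Hypothetical analogue of __init_single_page(): reset one entry. */
static void init_one(struct fake_page *p)
{
	p->flags = 0;
	p->refcount = 1;
}

/*
 * Initialise n entries in chunks of `batch`, yielding the CPU after each
 * chunk. sched_yield() is only a userspace stand-in for cond_resched().
 * Returns the number of yield points that were hit.
 */
static size_t init_in_batches(struct fake_page *pages, size_t n, size_t batch)
{
	size_t yields = 0;

	assert(batch > 0);
	for (size_t done = 0; done < n; ) {
		/* Clamp the last chunk so it never runs past n. */
		size_t end = (n - done < batch) ? n : done + batch;

		for (; done < end; done++)
			init_one(&pages[done]);

		sched_yield();	/* kernel analogue: cond_resched() */
		yields++;
	}
	return yields;
}
```

For example, initialising 10 entries with a batch size of 4 hits three
yield points (after 4, 8 and 10 entries). A larger batch means fewer
yields but longer stretches without one.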

Reported-by: Yiqian Wei <yiwei@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca1453204e66..084cabffc90d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1877,6 +1877,7 @@ static int __init deferred_init_memmap(void *data)
 			prev_nr_pages = nr_pages;
 			pgdat->first_deferred_pfn = spfn;
 			pgdat_resize_unlock(pgdat, &flags);
+			cond_resched();
 			goto again;
 		}
 	}
-- 
2.25.1




* [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
@ 2020-04-01 10:41 ` David Hildenbrand
  2020-04-01 13:17   ` Pavel Tatashin
                     ` (4 more replies)
  2020-04-01 14:10 ` [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
  2 siblings, 5 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 10:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Pavel Tatashin, Daniel Jordan, Michal Hocko,
	Alexander Duyck, Baoquan He, Oscar Salvador

Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.

[  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[  105.608933] Modules linked in:
[  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
[  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
[  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
[  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
[  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
[  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
[  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
[  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
[  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
[  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
[  105.608933] Call Trace:
[  105.608933]  set_zone_contiguous+0x56/0x70
[  105.608933]  page_alloc_init_late+0x166/0x176
[  105.608933]  kernel_init_freeable+0xfa/0x255
[  105.608933]  ? rest_init+0xaa/0xaa
[  105.608933]  kernel_init+0xa/0x106
[  105.608933]  ret_from_fork+0x35/0x40

The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU. Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.
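
The shape of the loop in question can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the kernel
implementation: block_ok_fn is a hypothetical stand-in for
__pageblock_pfn_to_page(), sched_yield() stands in for cond_resched(), and
the range is walked in plain fixed-size steps, whereas the kernel aligns
block boundaries to the pageblock size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical per-block check, standing in for __pageblock_pfn_to_page(). */
typedef bool (*block_ok_fn)(unsigned long start_pfn, unsigned long end_pfn);

/* Example predicate: every block is valid. */
static bool no_holes(unsigned long start_pfn, unsigned long end_pfn)
{
	(void)start_pfn;
	(void)end_pfn;
	return true;
}

/* Example predicate: pretend pfn 1024 is a hole. */
static bool hole_at_1024(unsigned long start_pfn, unsigned long end_pfn)
{
	return !(start_pfn <= 1024 && 1024 < end_pfn);
}

/*
 * Walk [start_pfn, end_pfn) one block at a time. Stop at the first block
 * that fails the check; otherwise yield after each block so a very large
 * range cannot monopolise the CPU.
 */
static bool range_is_contiguous(unsigned long start_pfn, unsigned long end_pfn,
				unsigned long block_pages, block_ok_fn block_ok)
{
	assert(block_pages > 0);
	for (unsigned long pfn = start_pfn; pfn < end_pfn; ) {
		unsigned long block_end = (end_pfn - pfn < block_pages) ?
					  end_pfn : pfn + block_pages;

		if (!block_ok(pfn, block_end))
			return false;

		sched_yield();	/* kernel analogue: cond_resched() */
		pfn = block_end;
	}
	return true;
}
```

The yield sits after the early-return check, mirroring the placement in
the diff below: a zone with a hole exits immediately, and only the long,
hole-free walk pays for the extra scheduling points.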

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 084cabffc90d..cc4f07d52939 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zone)
 		if (!__pageblock_pfn_to_page(block_start_pfn,
 					     block_end_pfn, zone))
 			return;
+		cond_resched();
 	}
 
 	/* We confirm that there is no hole */
-- 
2.25.1




* Re: [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
@ 2020-04-01 13:17   ` Pavel Tatashin
  2020-04-01 13:45   ` Pankaj Gupta
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Pavel Tatashin @ 2020-04-01 13:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, linux-mm, Andrew Morton, Kirill Tkhai, Shile Zhang,
	Daniel Jordan, Michal Hocko, Alexander Duyck, Baoquan He,
	Oscar Salvador

On Wed, Apr 1, 2020 at 6:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
> e.g., while booting up.
>
> [  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [  105.608933] Modules linked in:
> [  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
> [  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
> [  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
> [  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
> [  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
> [  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
> [  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
> [  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
> [  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
> [  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
> [  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
> [  105.608933] Call Trace:
> [  105.608933]  set_zone_contiguous+0x56/0x70
> [  105.608933]  page_alloc_init_late+0x166/0x176
> [  105.608933]  kernel_init_freeable+0xfa/0x255
> [  105.608933]  ? rest_init+0xaa/0xaa
> [  105.608933]  kernel_init+0xa/0x106
> [  105.608933]  ret_from_fork+0x35/0x40
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>



* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
@ 2020-04-01 13:18   ` Pavel Tatashin
  2020-04-01 13:49   ` Baoquan He
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Pavel Tatashin @ 2020-04-01 13:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Daniel Jordan, Michal Hocko, Alexander Duyck,
	Baoquan He, Oscar Salvador

On Wed, Apr 1, 2020 at 6:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
> happen that we get RCU stalls detected when booting up.
>
> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
> [    1.760091] NMI backtrace for cpu 1
> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.760091] Call Trace:
> [    1.760091]  deferred_init_pages+0x8f/0xbf
> [    1.760091]  deferred_init_memmap+0x184/0x29d
> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
> [    1.760091]  kthread+0x112/0x130
> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
> [    1.760091]  ret_from_fork+0x35/0x40
> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Adding the cond_resched() makes RCU happy.
>
> Reported-by: Yiqian Wei <yiwei@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>



* Re: [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
  2020-04-01 13:17   ` Pavel Tatashin
@ 2020-04-01 13:45   ` Pankaj Gupta
  2020-04-01 13:50   ` Baoquan He
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Pankaj Gupta @ 2020-04-01 13:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Kirill Tkhai, Shile Zhang,
	Pavel Tatashin, Daniel Jordan, Michal Hocko, Alexander Duyck,
	Baoquan He, Oscar Salvador

> Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
> e.g., while booting up.
>
> [  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [  105.608933] Modules linked in:
> [  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
> [  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
> [  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
> [  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
> [  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
> [  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
> [  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
> [  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
> [  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
> [  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
> [  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
> [  105.608933] Call Trace:
> [  105.608933]  set_zone_contiguous+0x56/0x70
> [  105.608933]  page_alloc_init_late+0x166/0x176
> [  105.608933]  kernel_init_freeable+0xfa/0x255
> [  105.608933]  ? rest_init+0xaa/0xaa
> [  105.608933]  kernel_init+0xa/0x106
> [  105.608933]  ret_from_fork+0x35/0x40
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 084cabffc90d..cc4f07d52939 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zone)
>                 if (!__pageblock_pfn_to_page(block_start_pfn,
>                                              block_end_pfn, zone))
>                         return;
> +               cond_resched();
>         }
>
>         /* We confirm that there is no hole */
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

> 2.25.1
>
>



* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
  2020-04-01 13:18   ` Pavel Tatashin
@ 2020-04-01 13:49   ` Baoquan He
  2020-04-01 13:58   ` Shile Zhang
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Baoquan He @ 2020-04-01 13:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Pavel Tatashin, Daniel Jordan, Michal Hocko,
	Alexander Duyck, Oscar Salvador

On 04/01/20 at 12:41pm, David Hildenbrand wrote:
> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
> happen that we get RCU stalls detected when booting up.
> 
> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
> [    1.760091] NMI backtrace for cpu 1
> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.760091] Call Trace:
> [    1.760091]  deferred_init_pages+0x8f/0xbf
> [    1.760091]  deferred_init_memmap+0x184/0x29d
> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
> [    1.760091]  kthread+0x112/0x130
> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
> [    1.760091]  ret_from_fork+0x35/0x40
> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
> 
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
> 
> Adding the cond_resched() makes RCU happy.
> 
> Reported-by: Yiqian Wei <yiwei@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ca1453204e66..084cabffc90d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1877,6 +1877,7 @@ static int __init deferred_init_memmap(void *data)
>  			prev_nr_pages = nr_pages;
>  			pgdat->first_deferred_pfn = spfn;
>  			pgdat_resize_unlock(pgdat, &flags);
> +			cond_resched();
>  			goto again;

Reviewed-by: Baoquan He <bhe@redhat.com>




* Re: [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
  2020-04-01 13:17   ` Pavel Tatashin
  2020-04-01 13:45   ` Pankaj Gupta
@ 2020-04-01 13:50   ` Baoquan He
  2020-04-01 13:59   ` Shile Zhang
  2020-04-01 15:45   ` Michal Hocko
  4 siblings, 0 replies; 21+ messages in thread
From: Baoquan He @ 2020-04-01 13:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Kirill Tkhai, Shile Zhang,
	Pavel Tatashin, Daniel Jordan, Michal Hocko, Alexander Duyck,
	Oscar Salvador

On 04/01/20 at 12:41pm, David Hildenbrand wrote:
> Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
> e.g., while booting up.
> 
> [  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [  105.608933] Modules linked in:
> [  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
> [  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
> [  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
> [  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
> [  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
> [  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
> [  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
> [  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
> [  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
> [  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
> [  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
> [  105.608933] Call Trace:
> [  105.608933]  set_zone_contiguous+0x56/0x70
> [  105.608933]  page_alloc_init_late+0x166/0x176
> [  105.608933]  kernel_init_freeable+0xfa/0x255
> [  105.608933]  ? rest_init+0xaa/0xaa
> [  105.608933]  kernel_init+0xa/0x106
> [  105.608933]  ret_from_fork+0x35/0x40
> 
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 084cabffc90d..cc4f07d52939 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zone)
>  		if (!__pageblock_pfn_to_page(block_start_pfn,
>  					     block_end_pfn, zone))
>  			return;
> +		cond_resched();
>  	}

Reviewed-by: Baoquan He <bhe@redhat.com>




* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
  2020-04-01 13:18   ` Pavel Tatashin
  2020-04-01 13:49   ` Baoquan He
@ 2020-04-01 13:58   ` Shile Zhang
  2020-04-01 14:09   ` Pankaj Gupta
  2020-04-01 15:45   ` Michal Hocko
  4 siblings, 0 replies; 21+ messages in thread
From: Shile Zhang @ 2020-04-01 13:58 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Pavel Tatashin, Daniel Jordan, Michal Hocko, Alexander Duyck,
	Baoquan He, Oscar Salvador



On 2020/4/1 18:41, David Hildenbrand wrote:
> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
> happen that we get RCU stalls detected when booting up.
>
> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
> [    1.760091] NMI backtrace for cpu 1
> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.760091] Call Trace:
> [    1.760091]  deferred_init_pages+0x8f/0xbf
> [    1.760091]  deferred_init_memmap+0x184/0x29d
> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
> [    1.760091]  kthread+0x112/0x130
> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
> [    1.760091]  ret_from_fork+0x35/0x40
> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Adding the cond_resched() makes RCU happy.
>
> Reported-by: Yiqian Wei <yiwei@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   mm/page_alloc.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ca1453204e66..084cabffc90d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1877,6 +1877,7 @@ static int __init deferred_init_memmap(void *data)
>   			prev_nr_pages = nr_pages;
>   			pgdat->first_deferred_pfn = spfn;
>   			pgdat_resize_unlock(pgdat, &flags);
> +			cond_resched();
>   			goto again;
>   		}
>   	}

Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>





* Re: [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
                     ` (2 preceding siblings ...)
  2020-04-01 13:50   ` Baoquan He
@ 2020-04-01 13:59   ` Shile Zhang
  2020-04-01 15:45   ` Michal Hocko
  4 siblings, 0 replies; 21+ messages in thread
From: Shile Zhang @ 2020-04-01 13:59 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, Andrew Morton, Kirill Tkhai, Pavel Tatashin,
	Daniel Jordan, Michal Hocko, Alexander Duyck, Baoquan He,
	Oscar Salvador



On 2020/4/1 18:41, David Hildenbrand wrote:
> Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
> e.g., while booting up.
>
> [  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [  105.608933] Modules linked in:
> [  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
> [  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
> [  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
> [  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
> [  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
> [  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
> [  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
> [  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
> [  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
> [  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
> [  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
> [  105.608933] Call Trace:
> [  105.608933]  set_zone_contiguous+0x56/0x70
> [  105.608933]  page_alloc_init_late+0x166/0x176
> [  105.608933]  kernel_init_freeable+0xfa/0x255
> [  105.608933]  ? rest_init+0xaa/0xaa
> [  105.608933]  kernel_init+0xa/0x106
> [  105.608933]  ret_from_fork+0x35/0x40
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   mm/page_alloc.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 084cabffc90d..cc4f07d52939 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zone)
>   		if (!__pageblock_pfn_to_page(block_start_pfn,
>   					     block_end_pfn, zone))
>   			return;
> +		cond_resched();
>   	}
>   
>   	/* We confirm that there is no hole */

Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
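
For illustration only, the shape of the loop touched by the diff above can be
sketched in userspace C. This is not the kernel code: yield_stub() stands in
for cond_resched(), block_is_valid_stub() stands in for
__pageblock_pfn_to_page(), and the pageblock size and the single "hole" pfn
are hypothetical values chosen for the example. It shows one yield per
pageblock and an early return when a block fails the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pageblock size in pages; the real value is config dependent. */
#define PAGEBLOCK_NR_PAGES_STUB 512UL

/* Userspace stand-in for cond_resched(); it only counts how often we yield. */
static unsigned long yield_calls;

static void yield_stub(void)
{
	yield_calls++;
}

/*
 * Stand-in for __pageblock_pfn_to_page(): a block is treated as invalid
 * if the (hypothetical) hole pfn falls inside [start, end).
 */
static bool block_is_valid_stub(unsigned long start, unsigned long end,
				unsigned long hole_pfn)
{
	return !(hole_pfn >= start && hole_pfn < end);
}

/* Walk the zone one pageblock at a time, yielding after each valid block. */
static bool zone_is_contiguous(unsigned long zone_start,
			       unsigned long zone_end,
			       unsigned long hole_pfn)
{
	unsigned long block_start = zone_start;

	assert(zone_start <= zone_end);

	while (block_start < zone_end) {
		unsigned long block_end = block_start + PAGEBLOCK_NR_PAGES_STUB;

		if (block_end > zone_end)
			block_end = zone_end;

		if (!block_is_valid_stub(block_start, block_end, hole_pfn))
			return false;

		/* One scheduling point per pageblock, as in the patch. */
		yield_stub();
		block_start = block_end;
	}

	return true;
}
```

With a 2048-page zone and no hole, the sketch yields once for each of the
four blocks; with a hole in the second block it yields once and then returns
early.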




* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
                     ` (2 preceding siblings ...)
  2020-04-01 13:58   ` Shile Zhang
@ 2020-04-01 14:09   ` Pankaj Gupta
  2020-04-01 15:45   ` Michal Hocko
  4 siblings, 0 replies; 21+ messages in thread
From: Pankaj Gupta @ 2020-04-01 14:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Pavel Tatashin, Daniel Jordan, Michal Hocko,
	Alexander Duyck, Baoquan He, Oscar Salvador

> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
> happen that we get RCU stalls detected when booting up.
>
> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
> [    1.760091] NMI backtrace for cpu 1
> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.760091] Call Trace:
> [    1.760091]  deferred_init_pages+0x8f/0xbf
> [    1.760091]  deferred_init_memmap+0x184/0x29d
> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
> [    1.760091]  kthread+0x112/0x130
> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
> [    1.760091]  ret_from_fork+0x35/0x40
> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
>
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
>
> Adding the cond_resched() makes RCU happy.
>
> Reported-by: Yiqian Wei <yiwei@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ca1453204e66..084cabffc90d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1877,6 +1877,7 @@ static int __init deferred_init_memmap(void *data)
>                         prev_nr_pages = nr_pages;
>                         pgdat->first_deferred_pfn = spfn;
>                         pgdat_resize_unlock(pgdat, &flags);
> +                       cond_resched();
>                         goto again;
>                 }
>         }
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

> 2.25.1
>
>



* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 10:41 [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
@ 2020-04-01 14:10 ` David Hildenbrand
  2020-04-01 14:31   ` Pankaj Gupta
  2 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 14:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Alexander Duyck, Andrew Morton, Baoquan He,
	Daniel Jordan, Kirill Tkhai, Michal Hocko, Oscar Salvador,
	Pavel Tatashin, Shile Zhang, Yiqian Wei

On 01.04.20 12:41, David Hildenbrand wrote:
> Two fixes for misleading stall messages / soft lockups with huge nodes /
> zones during boot without CONFIG_PREEMPT.
> 
> David Hildenbrand (2):
>   mm/page_alloc: fix RCU stalls during deferred page initialization
>   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
> 
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
page init"

https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 14:10 ` [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
@ 2020-04-01 14:31   ` Pankaj Gupta
  2020-04-01 14:45     ` Daniel Jordan
  0 siblings, 1 reply; 21+ messages in thread
From: Pankaj Gupta @ 2020-04-01 14:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Baoquan He, Daniel Jordan, Kirill Tkhai, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Shile Zhang, Yiqian Wei

> On 01.04.20 12:41, David Hildenbrand wrote:
> > Two fixes for misleading stall messages / soft lockups with huge nodes /
> > zones during boot without CONFIG_PREEMPT.
> >
> > David Hildenbrand (2):
> >   mm/page_alloc: fix RCU stalls during deferred page initialization
> >   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
> >
> >  mm/page_alloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
>
> Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
> page init"
>
> https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com

Thanks! Took me some time to figure it out.

Pankaj

>
> --
> Thanks,
>
> David / dhildenb
>
>



* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 14:31   ` Pankaj Gupta
@ 2020-04-01 14:45     ` Daniel Jordan
  2020-04-01 15:54       ` David Hildenbrand
  2020-04-01 18:06       ` Andrew Morton
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel Jordan @ 2020-04-01 14:45 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: David Hildenbrand, linux-kernel, linux-mm, Alexander Duyck,
	Andrew Morton, Baoquan He, Daniel Jordan, Kirill Tkhai,
	Michal Hocko, Oscar Salvador, Pavel Tatashin, Shile Zhang,
	Yiqian Wei

On Wed, Apr 01, 2020 at 04:31:51PM +0200, Pankaj Gupta wrote:
> > On 01.04.20 12:41, David Hildenbrand wrote:
> > > Two fixes for misleading stall messages / soft lockups with huge nodes /
> > > zones during boot without CONFIG_PREEMPT.
> > >
> > > David Hildenbrand (2):
> > >   mm/page_alloc: fix RCU stalls during deferred page initialization
> > >   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
> > >
> > >  mm/page_alloc.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> >
> > Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
> > page init"
> >
> > https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
> 
> Thanks! Took me some time to figure it out.

FYI, I'm planning to post an alternate version of that fix, hopefully today if
all goes well with my testing.



* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
                     ` (3 preceding siblings ...)
  2020-04-01 14:09   ` Pankaj Gupta
@ 2020-04-01 15:45   ` Michal Hocko
  2020-04-01 15:47     ` David Hildenbrand
  4 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-04-01 15:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Pavel Tatashin, Daniel Jordan, Alexander Duyck,
	Baoquan He, Oscar Salvador

On Wed 01-04-20 12:41:55, David Hildenbrand wrote:
> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
> happen that we get RCU stalls detected when booting up.
> 
> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
> [    1.760091] NMI backtrace for cpu 1
> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.760091] Call Trace:
> [    1.760091]  deferred_init_pages+0x8f/0xbf
> [    1.760091]  deferred_init_memmap+0x184/0x29d
> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
> [    1.760091]  kthread+0x112/0x130
> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
> [    1.760091]  ret_from_fork+0x35/0x40
> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
> 
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
> 
> Adding the cond_resched() makes RCU happy.

I believe the patch you depend on is the wrong way to go, so please let's
wait until that settles down. But your cond_resched makes perfect
sense. Just have it called once every $FOO pages - e.g. hotplug does it
once per section. This is not bound to SPARSEMEM, so you would have to use
a different constant, but something along those lines would work.
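
A minimal userspace sketch of that idea, under stated assumptions: the batch
size RESCHED_BATCH_PAGES is a hypothetical placeholder for $FOO,
cond_resched_stub() stands in for cond_resched(), and init_one_page_stub()
stands in for the per-page initialisation work. None of these names come from
the kernel tree.

```c
#include <assert.h>

/* Hypothetical batch size standing in for $FOO; not taken from the thread. */
#define RESCHED_BATCH_PAGES 32768UL

/* Userspace stand-in for cond_resched(); it only counts the calls. */
static unsigned long resched_calls;

static void cond_resched_stub(void)
{
	resched_calls++;
}

/* Stand-in for the per-page work (e.g. initialising one struct page). */
static void init_one_page_stub(unsigned long pfn)
{
	(void)pfn;
}

/*
 * Process [start_pfn, end_pfn) and offer a scheduling point once per
 * 'batch' pages instead of once per page or once per huge range.
 * Returns the number of pages processed.
 */
static unsigned long init_range_batched(unsigned long start_pfn,
					unsigned long end_pfn,
					unsigned long batch)
{
	unsigned long nr_pages = 0;
	unsigned long pfn;

	assert(batch > 0);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		init_one_page_stub(pfn);
		if (++nr_pages % batch == 0)
			cond_resched_stub();
	}

	return nr_pages;
}
```

Calling init_range_batched(0, 100000, RESCHED_BATCH_PAGES) processes 100000
pages and hits the scheduling point three times. The right constant for the
real loop is exactly the open question here.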

> 
> Reported-by: Yiqian Wei <yiwei@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ca1453204e66..084cabffc90d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1877,6 +1877,7 @@ static int __init deferred_init_memmap(void *data)
>  			prev_nr_pages = nr_pages;
>  			pgdat->first_deferred_pfn = spfn;
>  			pgdat_resize_unlock(pgdat, &flags);
> +			cond_resched();
>  			goto again;
>  		}
>  	}
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
                     ` (3 preceding siblings ...)
  2020-04-01 13:59   ` Shile Zhang
@ 2020-04-01 15:45   ` Michal Hocko
  4 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2020-04-01 15:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Kirill Tkhai, Shile Zhang,
	Pavel Tatashin, Daniel Jordan, Alexander Duyck, Baoquan He,
	Oscar Salvador

On Wed 01-04-20 12:41:56, David Hildenbrand wrote:
> Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
> e.g., while booting up.
> 
> [  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [  105.608933] Modules linked in:
> [  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
> [  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
> [  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
> [  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
> [  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
> [  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
> [  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
> [  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
> [  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
> [  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
> [  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
> [  105.608933] Call Trace:
> [  105.608933]  set_zone_contiguous+0x56/0x70
> [  105.608933]  page_alloc_init_late+0x166/0x176
> [  105.608933]  kernel_init_freeable+0xfa/0x255
> [  105.608933]  ? rest_init+0xaa/0xaa
> [  105.608933]  kernel_init+0xa/0x106
> [  105.608933]  ret_from_fork+0x35/0x40
> 
> The issue becomes visible when having a lot of memory (e.g., 4TB)
> assigned to a single NUMA node - a system that can easily be created
> using QEMU. Inside VMs on a hypervisor with quite some memory
> overcommit, this is fairly easy to trigger.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 084cabffc90d..cc4f07d52939 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zone)
>  		if (!__pageblock_pfn_to_page(block_start_pfn,
>  					     block_end_pfn, zone))
>  			return;
> +		cond_resched();
>  	}
>  
>  	/* We confirm that there is no hole */
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization
  2020-04-01 15:45   ` Michal Hocko
@ 2020-04-01 15:47     ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 15:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Yiqian Wei, Andrew Morton, Kirill Tkhai,
	Shile Zhang, Pavel Tatashin, Daniel Jordan, Alexander Duyck,
	Baoquan He, Oscar Salvador

On 01.04.20 17:45, Michal Hocko wrote:
> On Wed 01-04-20 12:41:55, David Hildenbrand wrote:
>> With CONFIG_DEFERRED_STRUCT_PAGE_INIT and without CONFIG_PREEMPT, it can
>> happen that we get RCU stalls detected when booting up.
>>
>> [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>> [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
>> [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
>> [   60.475000] Sending NMI from CPU 0 to CPUs 1:
>> [    1.760091] NMI backtrace for cpu 1
>> [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
>> [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
>> [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
>> [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
>> [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
>> [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
>> [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
>> [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
>> [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
>> [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
>> [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
>> [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
>> [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    1.760091] Call Trace:
>> [    1.760091]  deferred_init_pages+0x8f/0xbf
>> [    1.760091]  deferred_init_memmap+0x184/0x29d
>> [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
>> [    1.760091]  kthread+0x112/0x130
>> [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
>> [    1.760091]  ret_from_fork+0x35/0x40
>> [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
>>
>> The issue becomes visible when having a lot of memory (e.g., 4TB)
>> assigned to a single NUMA node - a system that can easily be created
>> using QEMU. Inside VMs on a hypervisor with quite some memory
>> overcommit, this is fairly easy to trigger.
>>
>> Adding the cond_resched() makes RCU happy.
> 
> I believe the patch you depend on is a wrong way to go so please let's
> wait until that settles down. 

I saw an RB in a reply and thought this would get picked up fairly soon.
But sure, let's see what that will look like. Thanks

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 14:45     ` Daniel Jordan
@ 2020-04-01 15:54       ` David Hildenbrand
  2020-04-01 16:10         ` Daniel Jordan
  2020-04-01 18:06       ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 15:54 UTC (permalink / raw)
  To: Daniel Jordan, Pankaj Gupta
  Cc: linux-kernel, linux-mm, Alexander Duyck, Andrew Morton,
	Baoquan He, Kirill Tkhai, Michal Hocko, Oscar Salvador,
	Pavel Tatashin, Shile Zhang, Yiqian Wei

On 01.04.20 16:45, Daniel Jordan wrote:
> On Wed, Apr 01, 2020 at 04:31:51PM +0200, Pankaj Gupta wrote:
>>> On 01.04.20 12:41, David Hildenbrand wrote:
>>>> Two fixes for misleading stall messages / soft lockups with huge nodes /
>>>> zones during boot without CONFIG_PREEMPT.
>>>>
>>>> David Hildenbrand (2):
>>>>   mm/page_alloc: fix RCU stalls during deferred page initialization
>>>>   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
>>>>
>>>>  mm/page_alloc.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>
>>> Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
>>> page init"
>>>
>>> https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
>>
>> Thanks! Took me some time to figure it out.
> 
> FYI, I'm planning to post an alternate version of that fix, hopefully today if
> all goes well with my testing.
> 

Cool, please CC me :)

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 15:54       ` David Hildenbrand
@ 2020-04-01 16:10         ` Daniel Jordan
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Jordan @ 2020-04-01 16:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Daniel Jordan, Pankaj Gupta, linux-kernel, linux-mm,
	Alexander Duyck, Andrew Morton, Baoquan He, Kirill Tkhai,
	Michal Hocko, Oscar Salvador, Pavel Tatashin, Shile Zhang,
	Yiqian Wei

On Wed, Apr 01, 2020 at 05:54:40PM +0200, David Hildenbrand wrote:
> On 01.04.20 16:45, Daniel Jordan wrote:
> > On Wed, Apr 01, 2020 at 04:31:51PM +0200, Pankaj Gupta wrote:
> >>> On 01.04.20 12:41, David Hildenbrand wrote:
> >>>> Two fixes for misleading stall messages / soft lockups with huge nodes /
> >>>> zones during boot without CONFIG_PREEMPT.
> >>>>
> >>>> David Hildenbrand (2):
> >>>>   mm/page_alloc: fix RCU stalls during deferred page initialization
> >>>>   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
> >>>>
> >>>>  mm/page_alloc.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>
> >>> Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
> >>> page init"
> >>>
> >>> https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
> >>
> >> Thanks! Took me some time to figure it out.
> > 
> > FYI, I'm planning to post an alternate version of that fix, hopefully today if
> > all goes well with my testing.
> > 
> 
> Cool, please CC me :)

Sure, in fact you already were! :)



* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 14:45     ` Daniel Jordan
  2020-04-01 15:54       ` David Hildenbrand
@ 2020-04-01 18:06       ` Andrew Morton
  2020-04-01 18:29         ` David Hildenbrand
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2020-04-01 18:06 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Pankaj Gupta, David Hildenbrand, linux-kernel, linux-mm,
	Alexander Duyck, Baoquan He, Kirill Tkhai, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Shile Zhang, Yiqian Wei

On Wed, 1 Apr 2020 10:45:29 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> On Wed, Apr 01, 2020 at 04:31:51PM +0200, Pankaj Gupta wrote:
> > > On 01.04.20 12:41, David Hildenbrand wrote:
> > > > Two fixes for misleading stall messages / soft lockups with huge nodes /
> > > > zones during boot without CONFIG_PREEMPT.
> > > >
> > > > David Hildenbrand (2):
> > > >   mm/page_alloc: fix RCU stalls during deferred page initialization
> > > >   mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
> > > >
> > > >  mm/page_alloc.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > >
> > > Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
> > > page init"
> > >
> > > https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
> > 
> > Thanks! Took me some time to figure it out.
> 
> FYI, I'm planning to post an alternate version of that fix, hopefully today if
> all goes well with my testing.

I assume you'll redo this two-patch series to apply on top of this
forthcoming patch?




* Re: [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs
  2020-04-01 18:06       ` Andrew Morton
@ 2020-04-01 18:29         ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-04-01 18:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, Pankaj Gupta, David Hildenbrand, linux-kernel,
	linux-mm, Alexander Duyck, Baoquan He, Kirill Tkhai,
	Michal Hocko, Oscar Salvador, Pavel Tatashin, Shile Zhang,
	Yiqian Wei



> On 01.04.2020 at 20:06, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Wed, 1 Apr 2020 10:45:29 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> 
>> On Wed, Apr 01, 2020 at 04:31:51PM +0200, Pankaj Gupta wrote:
>>>>> On 01.04.20 12:41, David Hildenbrand wrote:
>>>>>> Two fixes for misleading stall messages / soft lockups with huge nodes /
>>>>>> zones during boot without CONFIG_PREEMPT.
>>>>>> 
>>>>>> David Hildenbrand (2):
>>>>>>  mm/page_alloc: fix RCU stalls during deferred page initialization
>>>>>>  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
>>>>>> 
>>>>>> mm/page_alloc.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>> 
>>>>> 
>>>>> Patch #1 requires "[PATCH v3] mm: fix tick timer stall during deferred
>>>>> page init"
>>>>> 
>>>>> https://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
>>> 
>>> Thanks! Took me some time to figure it out.
>> 
>> FYI, I'm planning to post an alternate version of that fix, hopefully today if
>> all goes well with my testing.
> 
> I assume you'll redo this two-patch series to apply on top of this
> forthcoming patch?
> 

Yes, will wait until the old one in -next has been replaced by a revised one. Thanks!





Thread overview: 21+ messages
2020-04-01 10:41 [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
2020-04-01 10:41 ` [PATCH v1 1/2] mm/page_alloc: fix RCU stalls during deferred page initialization David Hildenbrand
2020-04-01 13:18   ` Pavel Tatashin
2020-04-01 13:49   ` Baoquan He
2020-04-01 13:58   ` Shile Zhang
2020-04-01 14:09   ` Pankaj Gupta
2020-04-01 15:45   ` Michal Hocko
2020-04-01 15:47     ` David Hildenbrand
2020-04-01 10:41 ` [PATCH v1 2/2] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() David Hildenbrand
2020-04-01 13:17   ` Pavel Tatashin
2020-04-01 13:45   ` Pankaj Gupta
2020-04-01 13:50   ` Baoquan He
2020-04-01 13:59   ` Shile Zhang
2020-04-01 15:45   ` Michal Hocko
2020-04-01 14:10 ` [PATCH v1 0/2] mm/page_alloc: fix stalls/soft lockups with huge VMs David Hildenbrand
2020-04-01 14:31   ` Pankaj Gupta
2020-04-01 14:45     ` Daniel Jordan
2020-04-01 15:54       ` David Hildenbrand
2020-04-01 16:10         ` Daniel Jordan
2020-04-01 18:06       ` Andrew Morton
2020-04-01 18:29         ` David Hildenbrand
