linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* 5.0-rc kernel hangs on early boot
@ 2019-02-13  8:25 Yury Norov
  2019-02-13 11:14 ` Mel Gorman
  2019-02-13 11:18 ` Will Deacon
  0 siblings, 2 replies; 9+ messages in thread
From: Yury Norov @ 2019-02-13  8:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, Catalin Marinas, yury.norov, Linus Torvalds,
	Will Deacon, linux-kernel, Michal Hocko, linux-arm-kernel,
	David Rientjes, Andrew Morton, Zi Yan, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 2203 bytes --]

Hi Mel, all,

My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
Backtrace is not too verbose:
(gdb) i threads
  Id   Target Id         Frame
* 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
    at arch/arm64/lib/delay.c:49
  2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
  3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
  4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
(gdb) bt
#0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Reverting the patch
1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") together with following patch
73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
helps me to boot normally. 

Some system information is below, and config is attached.

Thanks,
Yury

$ qemu-system-aarch64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.9)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

$ cat runlinux.sh
TAP=tap0
BRIDGE=br0
NETWORK="-device virtio-net-device,netdev=netdev0 -netdev tap,ifname=$TAP,script=no,id=netdev0"

sudo ip tuntap add mode tap dev $TAP
sudo ip link set dev $TAP promisc on
sudo ip link set dev $TAP up
sudo ip link set dev $TAP master $BRIDGE

qemu-system-aarch64 -machine virt -cpu cortex-a57 -nographic -smp 4 -m 1024 \
	-machine virt -cpu cortex-a57 -nographic -smp 4 -m 1024 \
	-machine virt -cpu cortex-a57 -nographic -smp 4  -m 1024 \
	-global virtio-blk-device.scsi=off -device virtio-scsi-device,id=scsi \
	-drive file=img/ubuntu-core-14.04.1-core-arm64.img,id=coreimg,cache=unsafe,if=none -device scsi-hd,drive=coreimg \
	-kernel /home/yury/work/linux/arch/arm64/boot/Image \
	--append "rcu_nocbs=2-3 irqaffinity=0 task_isolation_debug console=ttyAMA0 root=/dev/sda nohz_full=2-3 isolcpus=2-3 task_isolation=2-3" \
	-initrd initrd.img-3.13.0-62-generic \
	$NETWORK \
	-redir tcp:2222::22 \
	-s \
	$@

sudo ip link set dev $TAP nomaster
sudo ip link set dev $TAP down
sudo ip link set dev $TAP promisc off
sudo ip tuntap del mode tap dev $TAP

[-- Attachment #2: congig.tar.gz --]
[-- Type: application/gzip, Size: 35647 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13  8:25 5.0-rc kernel hangs on early boot Yury Norov
@ 2019-02-13 11:14 ` Mel Gorman
  2019-02-13 11:51   ` Yury Norov
  2019-02-13 11:18 ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2019-02-13 11:14 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrea Arcangeli, Catalin Marinas, Linus Torvalds, Will Deacon,
	linux-kernel, Michal Hocko, linux-arm-kernel, David Rientjes,
	Andrew Morton, Zi Yan, Vlastimil Babka

On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> Hi Mel, all,
> 
> My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> Backtrace is not too verbose:
> (gdb) i threads
>   Id   Target Id         Frame
> * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
>     at arch/arm64/lib/delay.c:49
>   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
>   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
>   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> (gdb) bt
> #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Reverting the patch
> 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> fragmentation event occurs") together with following patch
> 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> helps me to boot normally. 
> 

Well, that's a bad start to any day. Thanks for tracking it down. Does
the following patch help? I can't test it properly as I didn't recreate
your boot image or initrd but this appears to get past the initial boot
phase at least.

---8<---
mm, page_alloc: Fix a division by zero error when boosting watermarks

Yury Norov reported that an arm64 KVM instance could not boot since after
v5.0-rc1 and could addressed by reverting the patches

1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")

The problem is that a division by zero error is possible if boosting occurs
either very early in boot or if the high watermark is very small. This
patch checks for the conditions and avoids boosting in those cases.

Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Reported-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d295c9bc01a8..ae7e4ba5b9f5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2170,6 +2170,11 @@ static inline void boost_watermark(struct zone *zone)
 
 	max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
 			watermark_boost_factor, 10000);
+
+	/* high watermark be be uninitialised or very small */
+	if (!max_boost)
+		return;
+
 	max_boost = max(pageblock_nr_pages, max_boost);
 
 	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13  8:25 5.0-rc kernel hangs on early boot Yury Norov
  2019-02-13 11:14 ` Mel Gorman
@ 2019-02-13 11:18 ` Will Deacon
  2019-02-13 11:21   ` Mel Gorman
  2019-02-13 11:55   ` Yury Norov
  1 sibling, 2 replies; 9+ messages in thread
From: Will Deacon @ 2019-02-13 11:18 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrea Arcangeli, Vlastimil Babka, Catalin Marinas, Mel Gorman,
	linux-kernel, Zi Yan, David Rientjes, Andrew Morton,
	Michal Hocko, Linus Torvalds, linux-arm-kernel

Hi Yury,

On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> Backtrace is not too verbose:
> (gdb) i threads
>   Id   Target Id         Frame
> * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
>     at arch/arm64/lib/delay.c:49
>   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
>   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
>   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> (gdb) bt
> #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> Reverting the patch
> 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> fragmentation event occurs") together with following patch
> 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> helps me to boot normally. 
> 
> Some system information is below, and config is attached.

FWIW, running with your command-line and .config under KVM with earlycon
leads to an early page allocation failure followed by a NULL dereference
during boot if only 1G is configured (log below). For the mm folks, it's
probably worth pointing out that you're using 64k pages.

Can you confirm that the kernel boots successfully if you pass more memory to
qemu (e.g. 2gb)?

Will

--->8

[    0.000000] swapper: page allocation failure: order:0, mode:0x0(), nodemask=(null),cpuset=(null),mems_allowed=0-3
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc64.9.0+ #1
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] Call trace:
[    0.000000]  dump_backtrace+0x0/0x178
[    0.000000]  show_stack+0x24/0x30
[    0.000000]  dump_stack+0x90/0xb4
[    0.000000]  warn_alloc+0x100/0x188
[    0.000000]  __alloc_pages_nodemask+0xd48/0xd88
[    0.000000]  alloc_pages_current+0x8c/0xf8
[    0.000000]  allocate_slab+0x370/0x4a8
[    0.000000]  new_slab+0x98/0xc0
[    0.000000]  ___slab_alloc+0x27c/0x510
[    0.000000]  __slab_alloc+0x50/0x68
[    0.000000]  kmem_cache_alloc+0x1d0/0x1f8
[    0.000000]  bootstrap+0x34/0x170
[    0.000000]  kmem_cache_init+0x9c/0x130
[    0.000000]  start_kernel+0x240/0x464
[    0.000000] Mem-Info:
[    0.000000] active_anon:0 inactive_anon:0 isolated_anon:0
[    0.000000]  active_file:0 inactive_file:0 isolated_file:0
[    0.000000]  unevictable:0 dirty:0 writeback:0 unstable:0
[    0.000000]  slab_reclaimable:0 slab_unreclaimable:1
[    0.000000]  mapped:0 shmem:0 pagetables:0 bounce:0
[    0.000000]  free:7737 free_pcp:0 free_cma:0
[    0.000000] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[    0.000000] Node 0 DMA32 free:495168kB min:524288kB low:524288kB high:524288kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:495232kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    0.000000] lowmem_reserve[]: 0 0 0
[    0.000000] Node 0 DMA32: 1*64kB (U) 4*128kB (U) 2*256kB (U) 3*512kB (U) 3*1024kB (U) 1*2048kB (U) 3*4096kB (U) 2*8192kB (U) 2*16384kB (U) 3*32768kB (U) 3*65536kB (U) 1*131072kB (U) 0*262144kB 0*524288kB = 495168kB
[    0.000000] 0 total pagecache pages
[    0.000000] 0 pages in swap cache
[    0.000000] Swap cache stats: add 0, delete 0, find 0/0
[    0.000000] Free swap  = 0kB
[    0.000000] Total swap = 0kB
[    0.000000] 16384 pages RAM
[    0.000000] 0 pages HighMem/MovableOnly
[    0.000000] 8646 pages reserved
[    0.000000] 8192 pages cma reserved
[    0.000000] SLUB: Unable to allocate memory on node -1, gfp=0x408000(GFP_NOWAIT|__GFP_ZERO)
[    0.000000]   cache: kmem_cache, object size: 336, buffer size: 384, default order: 0, min order: 0
[    0.000000]   node 0: slabs: 0, objs: 0, free: 0
[    0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000045
[    0.000000]   Exception class = DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000045
[    0.000000]   CM = 0, WnR = 1
[    0.000000] [0000000000000000] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 96000045 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc64.9.0+ #1
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
[    0.000000] pc : __memcpy+0x110/0x180
[    0.000000] lr : bootstrap+0x48/0x170
[    0.000000] sp : ffff00001104ff20
[    0.000000] x29: ffff00001104ff20 x28: 0000000080ea0018 
[    0.000000] x27: 0000000000000000 x26: 0000000000000000 
[    0.000000] x25: 0000000000000000 x24: ffff000010f80008 
[    0.000000] x23: ffff000010f92490 x22: ffff000011070000 
[    0.000000] x21: ffff000010f92490 x20: ffff000011a1f000 
[    0.000000] x19: 0000000000000000 x18: 0000000000000010 
[    0.000000] x17: 0000000000000000 x16: 0000000000000000 
[    0.000000] x15: ffffffffffffffff x14: 00000000ffffffff 
[    0.000000] x13: 00000000000000aa x12: 000000aa000000aa 
[    0.000000] x11: 0000000d00000000 x10: 0000015000000180 
[    0.000000] x9 : 0000000000000005 x8 : 0000000040002000 
[    0.000000] x7 : ffff000011030e80 x6 : 0000000000000000 
[    0.000000] x5 : 0000000000000000 x4 : 0000000000000000 
[    0.000000] x3 : 0000000000000000 x2 : 00000000000000d0 
[    0.000000] x1 : ffff000010f924d0 x0 : 0000000000000000 
[    0.000000] Process swapper (pid: 0, stack limit = 0x(____ptrval____))
[    0.000000] Call trace:
[    0.000000]  __memcpy+0x110/0x180
[    0.000000]  kmem_cache_init+0x9c/0x130
[    0.000000]  start_kernel+0x240/0x464
[    0.000000] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7) 
[    0.000000] ---[ end trace a6053f496cb6dec0 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:18 ` Will Deacon
@ 2019-02-13 11:21   ` Mel Gorman
  2019-02-13 11:25     ` Will Deacon
  2019-02-13 11:55   ` Yury Norov
  1 sibling, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2019-02-13 11:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrea Arcangeli, Yury Norov, Vlastimil Babka, Catalin Marinas,
	linux-kernel, Michal Hocko, Zi Yan, David Rientjes,
	Andrew Morton, Linus Torvalds, linux-arm-kernel

On Wed, Feb 13, 2019 at 11:18:44AM +0000, Will Deacon wrote:
> Hi Yury,
> 
> On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > Backtrace is not too verbose:
> > (gdb) i threads
> >   Id   Target Id         Frame
> > * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> >     at arch/arm64/lib/delay.c:49
> >   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> >   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> >   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> > 
> > Reverting the patch
> > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > fragmentation event occurs") together with following patch
> > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > helps me to boot normally. 
> > 
> > Some system information is below, and config is attached.
> 
> FWIW, running with your command-line and .config under KVM with earlycon
> leads to an early page allocation failure followed by a NULL dereference
> during boot if only 1G is configured (log below). For the mm folks, it's
> probably worth pointing out that you're using 64k pages.
> 

Thanks Will.

While I agree that going OOM early is a problem and would explain why
the boosting logic was hit at all, it's still the case that the boosting
should not divide by zero. Even if the booting is broken due to a lack
of memory, I'd still not prefer to crash due to 1c30844d2dfe272d58c.

-- 
Mel Gorman
SUSE Labs

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:21   ` Mel Gorman
@ 2019-02-13 11:25     ` Will Deacon
  2019-02-13 11:29       ` Mel Gorman
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-02-13 11:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, Yury Norov, Vlastimil Babka, Catalin Marinas,
	linux-kernel, Michal Hocko, Zi Yan, David Rientjes,
	Andrew Morton, Linus Torvalds, linux-arm-kernel

On Wed, Feb 13, 2019 at 11:21:41AM +0000, Mel Gorman wrote:
> On Wed, Feb 13, 2019 at 11:18:44AM +0000, Will Deacon wrote:
> > On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > > Backtrace is not too verbose:
> > > (gdb) i threads
> > >   Id   Target Id         Frame
> > > * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> > >     at arch/arm64/lib/delay.c:49
> > >   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> > >   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> > >   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > > (gdb) bt
> > > #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> > > 
> > > Reverting the patch
> > > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > > fragmentation event occurs") together with following patch
> > > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > > helps me to boot normally. 
> > > 
> > > Some system information is below, and config is attached.
> > 
> > FWIW, running with your command-line and .config under KVM with earlycon
> > leads to an early page allocation failure followed by a NULL dereference
> > during boot if only 1G is configured (log below). For the mm folks, it's
> > probably worth pointing out that you're using 64k pages.
> > 
> 
> Thanks Will.
> 
> While I agree that going OOM early is a problem and would explain why
> the boosting logic was hit at all, it's still the case that the boosting
> should not divide by zero. Even if the booting is broken due to a lack
> of memory, I'd still not prefer to crash due to 1c30844d2dfe272d58c.

Yup, sorry, our previous mails crossed paths. Your patch looks sensible in
its own right, I'm just left wondering why we're OOM so early during boot!

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:25     ` Will Deacon
@ 2019-02-13 11:29       ` Mel Gorman
  0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2019-02-13 11:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrea Arcangeli, Yury Norov, Vlastimil Babka, Catalin Marinas,
	linux-kernel, Michal Hocko, Zi Yan, David Rientjes,
	Andrew Morton, Linus Torvalds, linux-arm-kernel

On Wed, Feb 13, 2019 at 11:25:21AM +0000, Will Deacon wrote:
> > Thanks Will.
> > 
> > While I agree that going OOM early is a problem and would explain why
> > the boosting logic was hit at all, it's still the case that the boosting
> > should not divide by zero. Even if the booting is broken due to a lack
> > of memory, I'd still not prefer to crash due to 1c30844d2dfe272d58c.
> 
> Yup, sorry, our previous mails crossed paths. Your patch looks sensible in
> its own right, I'm just left wondering why we're OOM so early during boot!
> 

I completely agree that it's worth pinning down both issues.

-- 
Mel Gorman
SUSE Labs

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:14 ` Mel Gorman
@ 2019-02-13 11:51   ` Yury Norov
  2019-02-13 13:19     ` Mel Gorman
  0 siblings, 1 reply; 9+ messages in thread
From: Yury Norov @ 2019-02-13 11:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, Catalin Marinas, Yury Norov, Linus Torvalds,
	Will Deacon, linux-kernel, Michal Hocko, linux-arm-kernel,
	David Rientjes, Andrew Morton, Zi Yan, Vlastimil Babka

On Wed, Feb 13, 2019 at 11:14:09AM +0000, Mel Gorman wrote:
> On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > Hi Mel, all,
> > 
> > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > Backtrace is not too verbose:
> > (gdb) i threads
> >   Id   Target Id         Frame
> > * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> >     at arch/arm64/lib/delay.c:49
> >   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> >   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> >   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> > 
> > Reverting the patch
> > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > fragmentation event occurs") together with following patch
> > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > helps me to boot normally. 
> > 
> 
> Well, that's a bad start to any day. Thanks for tracking it down. Does
> the following patch help? I can't test it properly as I didn't recreate
> your boot image or initrd but this appears to get past the initial boot
> phase at least.

Hi Mel,

The patch works for me. The day gets better indeed. :-)

Tested-by: Yury Norov <yury.norov@gmail.com>

Yury

> ---8<---
> mm, page_alloc: Fix a division by zero error when boosting watermarks
> 
> Yury Norov reported that an arm64 KVM instance could not boot since after
> v5.0-rc1 and could addressed by reverting the patches
> 
> 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> 
> The problem is that a division by zero error is possible if boosting occurs
> either very early in boot or if the high watermark is very small. This
> patch checks for the conditions and avoids boosting in those cases.
> 
> Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
> Reported-by: Yury Norov <yury.norov@gmail.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> ---
>  mm/page_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d295c9bc01a8..ae7e4ba5b9f5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2170,6 +2170,11 @@ static inline void boost_watermark(struct zone *zone)
>  
>  	max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
>  			watermark_boost_factor, 10000);
> +
> +	/* high watermark be be uninitialised or very small */
> +	if (!max_boost)
> +		return;
> +
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
>  	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:18 ` Will Deacon
  2019-02-13 11:21   ` Mel Gorman
@ 2019-02-13 11:55   ` Yury Norov
  1 sibling, 0 replies; 9+ messages in thread
From: Yury Norov @ 2019-02-13 11:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrea Arcangeli, Yury Norov, Vlastimil Babka, Catalin Marinas,
	Mel Gorman, linux-kernel, Zi Yan, David Rientjes, Andrew Morton,
	Michal Hocko, Linus Torvalds, linux-arm-kernel

On Wed, Feb 13, 2019 at 11:18:44AM +0000, Will Deacon wrote:
> Hi Yury,
> 
> On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > Backtrace is not too verbose:
> > (gdb) i threads
> >   Id   Target Id         Frame
> > * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> >     at arch/arm64/lib/delay.c:49
> >   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> >   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> >   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> > 
> > Reverting the patch
> > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > fragmentation event occurs") together with following patch
> > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > helps me to boot normally. 
> > 
> > Some system information is below, and config is attached.
> 
> FWIW, running with your command-line and .config under KVM with earlycon
> leads to an early page allocation failure followed by a NULL dereference
> during boot if only 1G is configured (log below). For the mm folks, it's
> probably worth pointing out that you're using 64k pages.
> 
> Can you confirm that the kernel boots successfully if you pass more memory to
> qemu (e.g. 2gb)?

2G workaround works, I confirm. Also, Mel's fix provided in this thread works
as well.

Thanks,
Yury

> Will
> 
> --->8
> 
> [    0.000000] swapper: page allocation failure: order:0, mode:0x0(), nodemask=(null),cpuset=(null),mems_allowed=0-3
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc64.9.0+ #1
> [    0.000000] Hardware name: linux,dummy-virt (DT)
> [    0.000000] Call trace:
> [    0.000000]  dump_backtrace+0x0/0x178
> [    0.000000]  show_stack+0x24/0x30
> [    0.000000]  dump_stack+0x90/0xb4
> [    0.000000]  warn_alloc+0x100/0x188
> [    0.000000]  __alloc_pages_nodemask+0xd48/0xd88
> [    0.000000]  alloc_pages_current+0x8c/0xf8
> [    0.000000]  allocate_slab+0x370/0x4a8
> [    0.000000]  new_slab+0x98/0xc0
> [    0.000000]  ___slab_alloc+0x27c/0x510
> [    0.000000]  __slab_alloc+0x50/0x68
> [    0.000000]  kmem_cache_alloc+0x1d0/0x1f8
> [    0.000000]  bootstrap+0x34/0x170
> [    0.000000]  kmem_cache_init+0x9c/0x130
> [    0.000000]  start_kernel+0x240/0x464
> [    0.000000] Mem-Info:
> [    0.000000] active_anon:0 inactive_anon:0 isolated_anon:0
> [    0.000000]  active_file:0 inactive_file:0 isolated_file:0
> [    0.000000]  unevictable:0 dirty:0 writeback:0 unstable:0
> [    0.000000]  slab_reclaimable:0 slab_unreclaimable:1
> [    0.000000]  mapped:0 shmem:0 pagetables:0 bounce:0
> [    0.000000]  free:7737 free_pcp:0 free_cma:0
> [    0.000000] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [    0.000000] Node 0 DMA32 free:495168kB min:524288kB low:524288kB high:524288kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:495232kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [    0.000000] lowmem_reserve[]: 0 0 0
> [    0.000000] Node 0 DMA32: 1*64kB (U) 4*128kB (U) 2*256kB (U) 3*512kB (U) 3*1024kB (U) 1*2048kB (U) 3*4096kB (U) 2*8192kB (U) 2*16384kB (U) 3*32768kB (U) 3*65536kB (U) 1*131072kB (U) 0*262144kB 0*524288kB = 495168kB
> [    0.000000] 0 total pagecache pages
> [    0.000000] 0 pages in swap cache
> [    0.000000] Swap cache stats: add 0, delete 0, find 0/0
> [    0.000000] Free swap  = 0kB
> [    0.000000] Total swap = 0kB
> [    0.000000] 16384 pages RAM
> [    0.000000] 0 pages HighMem/MovableOnly
> [    0.000000] 8646 pages reserved
> [    0.000000] 8192 pages cma reserved
> [    0.000000] SLUB: Unable to allocate memory on node -1, gfp=0x408000(GFP_NOWAIT|__GFP_ZERO)
> [    0.000000]   cache: kmem_cache, object size: 336, buffer size: 384, default order: 0, min order: 0
> [    0.000000]   node 0: slabs: 0, objs: 0, free: 0
> [    0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [    0.000000] Mem abort info:
> [    0.000000]   ESR = 0x96000045
> [    0.000000]   Exception class = DABT (current EL), IL = 32 bits
> [    0.000000]   SET = 0, FnV = 0
> [    0.000000]   EA = 0, S1PTW = 0
> [    0.000000] Data abort info:
> [    0.000000]   ISV = 0, ISS = 0x00000045
> [    0.000000]   CM = 0, WnR = 1
> [    0.000000] [0000000000000000] user address but active_mm is swapper
> [    0.000000] Internal error: Oops: 96000045 [#1] SMP
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc64.9.0+ #1
> [    0.000000] Hardware name: linux,dummy-virt (DT)
> [    0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
> [    0.000000] pc : __memcpy+0x110/0x180
> [    0.000000] lr : bootstrap+0x48/0x170
> [    0.000000] sp : ffff00001104ff20
> [    0.000000] x29: ffff00001104ff20 x28: 0000000080ea0018 
> [    0.000000] x27: 0000000000000000 x26: 0000000000000000 
> [    0.000000] x25: 0000000000000000 x24: ffff000010f80008 
> [    0.000000] x23: ffff000010f92490 x22: ffff000011070000 
> [    0.000000] x21: ffff000010f92490 x20: ffff000011a1f000 
> [    0.000000] x19: 0000000000000000 x18: 0000000000000010 
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000 
> [    0.000000] x15: ffffffffffffffff x14: 00000000ffffffff 
> [    0.000000] x13: 00000000000000aa x12: 000000aa000000aa 
> [    0.000000] x11: 0000000d00000000 x10: 0000015000000180 
> [    0.000000] x9 : 0000000000000005 x8 : 0000000040002000 
> [    0.000000] x7 : ffff000011030e80 x6 : 0000000000000000 
> [    0.000000] x5 : 0000000000000000 x4 : 0000000000000000 
> [    0.000000] x3 : 0000000000000000 x2 : 00000000000000d0 
> [    0.000000] x1 : ffff000010f924d0 x0 : 0000000000000000 
> [    0.000000] Process swapper (pid: 0, stack limit = 0x(____ptrval____))
> [    0.000000] Call trace:
> [    0.000000]  __memcpy+0x110/0x180
> [    0.000000]  kmem_cache_init+0x9c/0x130
> [    0.000000]  start_kernel+0x240/0x464
> [    0.000000] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7) 
> [    0.000000] ---[ end trace a6053f496cb6dec0 ]---
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 5.0-rc kernel hangs on early boot
  2019-02-13 11:51   ` Yury Norov
@ 2019-02-13 13:19     ` Mel Gorman
  0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2019-02-13 13:19 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrea Arcangeli, Catalin Marinas, Linus Torvalds, Will Deacon,
	linux-kernel, Michal Hocko, linux-arm-kernel, David Rientjes,
	Andrew Morton, Zi Yan, Vlastimil Babka

On Wed, Feb 13, 2019 at 02:51:15PM +0300, Yury Norov wrote:
> On Wed, Feb 13, 2019 at 11:14:09AM +0000, Mel Gorman wrote:
> > On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > > Hi Mel, all,
> > > 
> > > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > > Backtrace is not too verbose:
> > > (gdb) i threads
> > >   Id   Target Id         Frame
> > > * 1    Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> > >     at arch/arm64/lib/delay.c:49
> > >   2    Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> > >   3    Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> > >   4    Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > > (gdb) bt
> > > #0  0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> > > 
> > > Reverting the patch
> > > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > > fragmentation event occurs") together with following patch
> > > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > > helps me to boot normally. 
> > > 
> > 
> > Well, that's a bad start to any day. Thanks for tracking it down. Does
> > the following patch help? I can't test it properly as I didn't recreate
> > your boot image or initrd but this appears to get past the initial boot
> > phase at least.
> 
> Hi Mel,
> 
> The patch works for me. The day gets better indeed. :-)
> 
> Tested-by: Yury Norov <yury.norov@gmail.com>
> 

Thanks! I've resent the patch to Andrew so hopefully it'll be picked up
before 5.0 is released.

-- 
Mel Gorman
SUSE Labs

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-02-13 13:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13  8:25 5.0-rc kernel hangs on early boot Yury Norov
2019-02-13 11:14 ` Mel Gorman
2019-02-13 11:51   ` Yury Norov
2019-02-13 13:19     ` Mel Gorman
2019-02-13 11:18 ` Will Deacon
2019-02-13 11:21   ` Mel Gorman
2019-02-13 11:25     ` Will Deacon
2019-02-13 11:29       ` Mel Gorman
2019-02-13 11:55   ` Yury Norov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).