* [REGRESSION] kmemleak: commit c566586818 causes failure to boot
@ 2019-10-14  2:26 Theodore Y. Ts'o
  2019-10-14  7:03 ` Catalin Marinas
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-14  2:26 UTC
  To: catalin.marinas; +Cc: Linus Torvalds, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2927 bytes --]

Commit c566586818 ("mm: kmemleak: use the memory pool for early
allocations") causes my test kernels to fail to boot under both KVM
and Google Compute Engine.  A git bisect localized it to
c566586818, and I confirmed this by building v5.4-rc3, which fails to
boot in the same way under KVM.  When I reverted c566586818 the kernel
booted successfully.

The symptoms are that the boot hangs after:

[    2.844808] hctosys: unable to open rtc device (rtc0)

and then about 25 seconds later, we get the following warning:

[   28.237938] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:7]
[   28.239345] irq event stamp: 198897938
[   28.240017] hardirqs last  enabled at (198897937): [<ffffffffa0f0e9c3>] _raw_write_unlock_irqrestore+0x43/0x47
[   28.241979] hardirqs last disabled at (198897938): [<ffffffffa040180a>] trace_hardirqs_off_thunk+0x1a/0x20
[   28.243930] softirqs last  enabled at (198876302): [<ffffffffa120032a>] __do_softirq+0x32a/0x42a
[   28.247350] softirqs last disabled at (198876295): [<ffffffffa04b84e3>] irq_exit+0xb3/0xc0
[   28.250080] CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.4.0-rc3-xfstests-00403-g4f5cafb5cb84 #1225
[   28.253081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   28.254885] Workqueue: events kmemleak_do_cleanup
[   28.255570] RIP: 0010:_raw_write_unlock_irqrestore+0x45/0x47
[   28.256401] Code: e8 b0 4d 60 ff 48 89 ef e8 d8 a6 60 ff f6 c7 02 75 11 53 9d e8 dc b1 68 ff 65 ff 0d cd 73 10 5f 5b 5d c3 e8 ed b0 68 ff 53 9d <eb> ed 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[   28.260440] RSP: 0000:ffff98984006fdf8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[   28.262258] RAX: ffff94d7fd23a1c0 RBX: 0000000000000246 RCX: 0000000000000006
[   28.264238] RDX: 0000000000000007 RSI: ffff94d7fd23a9c0 RDI: ffff94d7fd23a1c0
[   28.267333] RBP: ffffffffa1c94bc0 R08: 00000006931c1cf2 R09: 0000000000000000
[   28.269871] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   28.272175] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa1c94aa8
[   28.274649] FS:  0000000000000000(0000) GS:ffff94d7fd800000(0000) knlGS:0000000000000000
[   28.277758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.279638] CR2: 0000000000000000 CR3: 000000005fc12001 CR4: 0000000000360ef0
[   28.282367] Call Trace:
[   28.283075]  find_and_remove_object+0x7f/0x90
[   28.284335]  delete_object_full+0xc/0x20
[   28.285488]  __kmemleak_do_cleanup+0x63/0x100
[   28.286913]  process_one_work+0x246/0x570
[   28.288801]  worker_thread+0x50/0x3b0
[   28.290406]  ? process_one_work+0x570/0x570
[   28.291497]  kthread+0x126/0x140
[   28.292316]  ? kthread_delayed_work_timer_fn+0xa0/0xa0
[   28.294262]  ret_from_fork+0x3a/0x50
[   28.837921] rcu: INFO: rcu_sched self-detected stall on CPU
    ...

I've attached the log from the KVM session and the config.gz used to
build the kernels.

					- Ted

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 17215 bytes --]

[-- Attachment #3: log.201910132216.gz --]
[-- Type: application/gzip, Size: 10103 bytes --]

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14  2:26 [REGRESSION] kmemleak: commit c566586818 causes failure to boot Theodore Y. Ts'o
@ 2019-10-14  7:03 ` Catalin Marinas
  2019-10-14 11:50   ` Theodore Y. Ts'o
  2019-10-14 15:57   ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Catalin Marinas @ 2019-10-14  7:03 UTC
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton

Hi Ted,

On Sun, Oct 13, 2019 at 10:26:33PM -0400, Theodore Y. Ts'o wrote:
> Commit c566586818 ("mm: kmemleak: use the memory pool for early
> allocations") causes my test kernels to fail to boot under both KVM
> and Google Compute Engine.  A git bisect localized it to
> c566586818, and I confirmed this by building v5.4-rc3, which fails to
> boot in the same way under KVM.  When I reverted c566586818 the kernel
> booted successfully.

Thanks for the report. I have a fix already:

http://lkml.kernel.org/r/20191004134624.46216-1-catalin.marinas@arm.com

I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
to be in mainline yet.

Linus, could you please merge the patch above? I can send it again if
it's easier.

Thanks.

-- 
Catalin

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14  7:03 ` Catalin Marinas
@ 2019-10-14 11:50   ` Theodore Y. Ts'o
  2019-10-14 12:51     ` Catalin Marinas
  2019-10-14 15:57   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-14 11:50 UTC
  To: Catalin Marinas; +Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 14, 2019 at 08:03:14AM +0100, Catalin Marinas wrote:
> Thanks for the report. I have a fix already:
> 
> http://lkml.kernel.org/r/20191004134624.46216-1-catalin.marinas@arm.com
> 
> I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
> to be in mainline yet.

Thanks for the pointer to the fix!  Does that mean that the workaround
is to increase the kmemleak pool size?  I had been using the default
(16000) and it seems surprising that it wasn't enough to even get
the kernel through a standard boot sequence.  Should we perhaps
increase the default mempool size?

							- Ted

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14 11:50   ` Theodore Y. Ts'o
@ 2019-10-14 12:51     ` Catalin Marinas
  2019-10-14 13:45       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Catalin Marinas @ 2019-10-14 12:51 UTC
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 14, 2019 at 07:50:21AM -0400, Theodore Y. Ts'o wrote:
> On Mon, Oct 14, 2019 at 08:03:14AM +0100, Catalin Marinas wrote:
> > Thanks for the report. I have a fix already:
> > 
> > http://lkml.kernel.org/r/20191004134624.46216-1-catalin.marinas@arm.com
> > 
> > I was hoping Andrew had sent it to Linus before -rc3 but it doesn't seem
> > to be in mainline yet.
> 
> Thanks for the pointer to the fix!  Does that mean that the workaround
> is to increase the kmemleak pool size?  I had been using the default
> (16000) and it seems surprising that it wasn't enough to even get
> the kernel through a standard boot sequence.  Should we perhaps
> increase the default mempool size?

In your case, CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, so kmemleak disables
itself irrespective of the pool size and trips over the bug. Even with
default off, the clean-up still runs, since kmemleak needs to track early
allocations in case it is turned on by the kmemleak=on cmdline option.

So I think 16000 is sufficient in your case; it was the default-off
setting that triggered the bug (well, unless you find in the logs
"kmemleak: Memory pool empty, consider increasing
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE").
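
For anyone wanting to check both conditions mechanically, here is a
rough sketch in Python. It is not part of any kernel tooling; the helper
names (read_text, config_enabled, pool_exhausted) are made up for
illustration. It only assumes a kernel config and a boot log stored as
plain text or .gz files.

```python
import gzip
import re
from pathlib import Path

# Message prefix quoted in this thread; the rest of the line may vary.
POOL_EMPTY_MSG = "kmemleak: Memory pool empty"


def read_text(path):
    """Read a plain-text or gzip-compressed file (e.g. config.gz) as text."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", errors="replace") as f:
        return f.read()


def config_enabled(config_text, option):
    """Return True if the kernel config contains the exact line '<option>=y'."""
    pattern = rf"^{re.escape(option)}=y$"
    return re.search(pattern, config_text, re.MULTILINE) is not None


def pool_exhausted(log_text):
    """Return True if the boot log contains the pool-empty message."""
    return POOL_EMPTY_MSG in log_text
```

With a config where
config_enabled(text, "CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF") is true and a
log where pool_exhausted(text) is false, the pool size is unlikely to be
the problem and the default-off path is the one to look at.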

-- 
Catalin

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14 12:51     ` Catalin Marinas
@ 2019-10-14 13:45       ` Theodore Y. Ts'o
  0 siblings, 0 replies; 7+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-14 13:45 UTC
  To: Catalin Marinas; +Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 14, 2019 at 01:51:15PM +0100, Catalin Marinas wrote:
> In your case, CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, so kmemleak disables
> itself irrespective of the pool size and trips over the bug. Even with
> default off, the clean-up still runs, since kmemleak needs to track early
> allocations in case it is turned on by the kmemleak=on cmdline option.
> 
> So I think 16000 is sufficient in your case; it was the default-off
> setting that triggered the bug (well, unless you find in the logs
> "kmemleak: Memory pool empty, consider increasing
> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE").

Ah, got it, thanks for the clarification!

					- Ted

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14  7:03 ` Catalin Marinas
  2019-10-14 11:50   ` Theodore Y. Ts'o
@ 2019-10-14 15:57   ` Linus Torvalds
  2019-10-14 16:27     ` Catalin Marinas
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2019-10-14 15:57 UTC
  To: Catalin Marinas
  Cc: Theodore Y. Ts'o, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 14, 2019 at 12:03 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> Linus, could you please merge the patch above? I can send it again if
> it's easier.

I took it.

Generally I prefer having patches (re-)sent to me explicitly rather
than getting a link to it, so for next time...

            Linus

* Re: [REGRESSION] kmemleak: commit c566586818 causes failure to boot
  2019-10-14 15:57   ` Linus Torvalds
@ 2019-10-14 16:27     ` Catalin Marinas
  0 siblings, 0 replies; 7+ messages in thread
From: Catalin Marinas @ 2019-10-14 16:27 UTC
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 14, 2019 at 08:57:41AM -0700, Linus Torvalds wrote:
> On Mon, Oct 14, 2019 at 12:03 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > Linus, could you please merge the patch above? I can send it again if
> > it's easier.
> 
> I took it.

Thanks.

> Generally I prefer having patches (re-)sent to me explicitly rather
> than getting a link to it, so for next time...

Noted.

-- 
Catalin
