All of lore.kernel.org
 help / color / mirror / Atom feed
* tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
@ 2017-08-29  8:50 Mike Galbraith
  2017-08-29  8:58 ` Ingo Molnar
  2017-08-29 15:58 ` Kees Cook
  0 siblings, 2 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29  8:50 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar

Greetings,

Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
by build dependencies (aa5d1b81, df340524) which I foolishly tried to
skip, leading git bisect to end up handing me a list of commits that
might be busted.  During take 2, I added those two as required.

Symptom is a few splats as below, with box finally hanging.  Network
comes up, but neither ssh nor console login is possible.

[    4.105048] ------------[ cut here ]------------
[    4.106072] WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
[    4.107969] Modules linked in: autofs4(E)
[    4.109328] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G            E   4.13.0.g44e89e4-tip-default #27
[    4.111075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    4.114119] task: ffff88018ee743c0 task.stack: ffffc90000cc0000
[    4.115481] RIP: 0010:netlink_sock_destruct+0x82/0xa0
[    4.116698] RSP: 0018:ffff880246103eb0 EFLAGS: 00010206
[    4.117997] RAX: 0000000000000300 RBX: ffff880236f1f000 RCX: 000077ff80000000
[    4.120657] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
[    4.123145] RBP: ffff880236f1f000 R08: 000400010000b630 R09: 0000b6290000b621
[    4.125139] R10: 000400010000b630 R11: 0000b6290000b621 R12: 0000000000000202
[    4.126866] R13: ffffffff81cf1440 R14: ffff88018ee743c0 R15: ffffffff815e0fd0
[    4.128731] FS:  0000000000000000(0000) GS:ffff880246100000(0000) knlGS:0000000000000000
[    4.130206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.131581] CR2: 000055c7ab255df0 CR3: 0000000236fd9001 CR4: 00000000001606e0
[    4.133066] Call Trace:
[    4.133919]  <IRQ>
[    4.134836]  __sk_destruct+0x21/0x190
[    4.136016]  rcu_process_callbacks+0x23e/0x880
[    4.137050]  ? rebalance_domains+0x182/0x2b0
[    4.138050]  __do_softirq+0xc8/0x287
[    4.139174]  irq_exit+0xd5/0xe0
[    4.140252]  smp_apic_timer_interrupt+0x64/0x140
[    4.141880]  apic_timer_interrupt+0x96/0xa0
[    4.143290]  </IRQ>
[    4.144214] RIP: 0010:native_safe_halt+0x2/0x10
[    4.145990] RSP: 0018:ffffc90000cc3ed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[    4.147449] RAX: ffffffff816d4820 RBX: ffff88018ee743c0 RCX: 0000000000000000
[    4.148626] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    4.151061] RBP: 0000000000000004 R08: 000000008e8d302a R09: 0000000000000000
[    4.153687] R10: 0000000000000006 R11: 0000000000000005 R12: ffff88018ee743c0
[    4.155587] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    4.157462]  ? __sched_text_end+0x5/0x5
[    4.158869]  default_idle+0x18/0x110
[    4.160348]  do_idle+0x15e/0x1f0
[    4.161734]  cpu_startup_entry+0x5f/0x70
[    4.163211]  start_secondary+0x14c/0x180
[    4.164716]  secondary_startup_64+0xa5/0xa5
[    4.165730] Code: 00 00 85 c0 75 25 8b 83 44 01 00 00 85 c0 75 10 48 83 bb e0 02 00 00 00 75 02 5b c3 0f ff 5b c3 0f ff 0f 1f 80 00 00 00 00 eb e5 <0f> ff eb d7 48 89 de 48 c7 c7 e0 e6 ab 81 31 c0 5b e9 25 ca af 
[    4.168787] ---[ end trace 79aa32f0718d3fc7 ]---

git bisect start
# good: [9c3a815f471a84811cf8021cf64aae3b8081dfde] page waitqueue: always add new entries at the end
git bisect good 9c3a815f471a84811cf8021cf64aae3b8081dfde
# bad: [44e89e4e15d60159fc09e8f1cbbcd952729edef7] Merge branch 'WIP.x86/fpu'
git bisect bad 44e89e4e15d60159fc09e8f1cbbcd952729edef7
# good: [702e97621ec7e7a36034ebd7a446af04f59d6dee] Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
git bisect good 702e97621ec7e7a36034ebd7a446af04f59d6dee
# bad: [4318414db869639c928a4ffc100585efbb5552a9] Merge branch 'locking/core'
git bisect bad 4318414db869639c928a4ffc100585efbb5552a9
# good: [438a13906508d7453ee93ff3afe25ef72b99140d] Merge branch 'efi/core'
git bisect good 438a13906508d7453ee93ff3afe25ef72b99140d
# bad: [e26f34a407aec9c65bce2bc0c838fabe4f051fc6] locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
git bisect bad e26f34a407aec9c65bce2bc0c838fabe4f051fc6
# good: [d89e588ca4081615216cc25f2489b0281ac0bfe9] locking: Introduce smp_mb__after_spinlock()
git bisect good d89e588ca4081615216cc25f2489b0281ac0bfe9
# good: [383a4bc88849b804385162e81bf704f8f9690a87] locking/lockdep: Make print_circular_bug() aware of crossrelease
git bisect good 383a4bc88849b804385162e81bf704f8f9690a87
# good: [907dc16d7e23ec81a126c9585435494fa1b3a4b7] locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
git bisect good 907dc16d7e23ec81a126c9585435494fa1b3a4b7
# bad: [0f0a22260d613b4ee3f483ee1ea6fa27f92a9e40] locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
git bisect bad 0f0a22260d613b4ee3f483ee1ea6fa27f92a9e40
# bad: [d0541b0fa64b36665d6261079974a26943c75009] locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
git bisect bad d0541b0fa64b36665d6261079974a26943c75009
# bad: [7a46ec0e2f4850407de5e1d19a44edee6efa58ec] locking/refcounts, x86/asm: Implement fast refcount overflow protection
git bisect bad 7a46ec0e2f4850407de5e1d19a44edee6efa58ec
# first bad commit: [7a46ec0e2f4850407de5e1d19a44edee6efa58ec] locking/refcounts, x86/asm: Implement fast refcount overflow protection

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29  8:50 tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection Mike Galbraith
@ 2017-08-29  8:58 ` Ingo Molnar
  2017-08-29  9:21   ` Mike Galbraith
  2017-08-29 15:58 ` Kees Cook
  1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2017-08-29  8:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Kees Cook, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin


* Mike Galbraith <efault@gmx.de> wrote:

> Greetings,
> 
> Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
> by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> skip, leading git bisect to end up handing me a list of commits that
> might be busted.  During take 2, I added those two as required.
> 
> Symptom is a few splats as below, with box finally hanging.  Network
> comes up, but neither ssh nor console login is possible.

Just to double check, does the patch below also cure the regression?
If we don't find the bug it might be safer than a full revert.

Thanks,

	Ingo

 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d04809921993..3b83ef8f1e9b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -55,7 +55,7 @@ config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MMIO_FLUSH
 	select ARCH_HAS_PMEM_API		if X86_64
-	select ARCH_HAS_REFCOUNT
+#	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29  8:58 ` Ingo Molnar
@ 2017-08-29  9:21   ` Mike Galbraith
  2017-08-29  9:27     ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin

On Tue, 2017-08-29 at 10:58 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > Greetings,
> > 
> > Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
> > by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> > skip, leading git bisect to end up handing me a list of commits that
> > might be busted.  During take 2, I added those two as required.
> > 
> > Symptom is a few splats as below, with box finally hanging.  Network
> > comes up, but neither ssh nor console login is possible.
> 
> Just to double check, does the patch below also cure the regression?
> If we don't find the bug it might be safer than a full revert.

I reverted it, and vbox now boots.  Should I still try the below?

> 
> Thanks,
> 
> 	Ingo
> 
>  arch/x86/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d04809921993..3b83ef8f1e9b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -55,7 +55,7 @@ config X86
>  	select ARCH_HAS_KCOV			if X86_64
>  	select ARCH_HAS_MMIO_FLUSH
>  	select ARCH_HAS_PMEM_API		if X86_64
> -	select ARCH_HAS_REFCOUNT
> +#	select ARCH_HAS_REFCOUNT
>  	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
>  	select ARCH_HAS_SET_MEMORY
>  	select ARCH_HAS_SG_CHAIN

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29  9:21   ` Mike Galbraith
@ 2017-08-29  9:27     ` Ingo Molnar
  2017-08-29 10:03       ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2017-08-29  9:27 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Kees Cook, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin


* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-08-29 at 10:58 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > Greetings,
> > > 
> > > Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
> > > by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> > > skip, leading git bisect to end up handing me a list of commits that
> > > might be busted.  During take 2, I added those two as required.
> > > 
> > > Symptom is a few splats as below, with box finally hanging.  Network
> > > comes up, but neither ssh nor console login is possible.
> > 
> > Just to double check, does the patch below also cure the regression?
> > If we don't find the bug it might be safer than a full revert.
> 
> I reverted it, and vbox now boots.  Should I still try the below?

Would be nice, so I can stick a Tested-by on it and such!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29  9:27     ` Ingo Molnar
@ 2017-08-29 10:03       ` Mike Galbraith
  2017-08-29 15:39         ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin

On Tue, 2017-08-29 at 11:27 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Tue, 2017-08-29 at 10:58 +0200, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> > > 
> > > > Greetings,
> > > > 
> > > > Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
> > > > by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> > > > skip, leading git bisect to end up handing me a list of commits that
> > > > might be busted.  During take 2, I added those two as required.
> > > > 
> > > > Symptom is a few splats as below, with box finally hanging.  Network
> > > > comes up, but neither ssh nor console login is possible.
> > > 
> > > Just to double check, does the patch below also cure the regression?
> > > If we don't find the bug it might be safer than a full revert.
> > 
> > I reverted it, and vbox now boots.  Should I still try the below?
> 
> Would be nice, so I can stick a Tested-by on it and such!

Done, works.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 10:03       ` Mike Galbraith
@ 2017-08-29 15:39         ` Kees Cook
  2017-08-29 16:34           ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-29 15:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin

On Tue, Aug 29, 2017 at 3:03 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-08-29 at 11:27 +0200, Ingo Molnar wrote:
>> * Mike Galbraith <efault@gmx.de> wrote:
>>
>> > On Tue, 2017-08-29 at 10:58 +0200, Ingo Molnar wrote:
>> > > * Mike Galbraith <efault@gmx.de> wrote:
>> > >
>> > > > Greetings,
>> > > >
>> > > > Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
>> > > > by build dependencies (aa5d1b81, df340524) which I foolishly tried to
>> > > > skip, leading git bisect to end up handing me a list of commits that
>> > > > might be busted.  During take 2, I added those two as required.
>> > > >
>> > > > Symptom is a few splats as below, with box finally hanging.  Network
>> > > > comes up, but neither ssh nor console login is possible.
>> > >
>> > > Just to double check, does the patch below also cure the regression?
>> > > If we don't find the bug it might be safer than a full revert.
>> >
>> > I reverted it, and vbox now boots.  Should I still try the below?
>>
>> Would be nice, so I can stick a Tested-by on it and such!
>
> Done, works.

Was this the x86 refcount implementation, or a atomic -> refcount conversion?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29  8:50 tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection Mike Galbraith
  2017-08-29  8:58 ` Ingo Molnar
@ 2017-08-29 15:58 ` Kees Cook
  2017-08-29 16:55   ` Mike Galbraith
  1 sibling, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-29 15:58 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Tue, Aug 29, 2017 at 1:50 AM, Mike Galbraith <efault@gmx.de> wrote:
> Take 2 of KVM bisect as you work fingered $subject.  Take 1 was stymied
> by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> skip, leading git bisect to end up handing me a list of commits that
> might be busted.  During take 2, I added those two as required.
>
> Symptom is a few splats as below, with box finally hanging.  Network
> comes up, but neither ssh nor console login is possible.
>
> [    4.105048] ------------[ cut here ]------------
> [    4.106072] WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> [    4.107969] Modules linked in: autofs4(E)
> [    4.109328] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G            E   4.13.0.g44e89e4-tip-default #27
> [    4.111075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [    4.114119] task: ffff88018ee743c0 task.stack: ffffc90000cc0000
> [    4.115481] RIP: 0010:netlink_sock_destruct+0x82/0xa0
> [    4.116698] RSP: 0018:ffff880246103eb0 EFLAGS: 00010206
> [    4.117997] RAX: 0000000000000300 RBX: ffff880236f1f000 RCX: 000077ff80000000
> [    4.120657] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
> [    4.123145] RBP: ffff880236f1f000 R08: 000400010000b630 R09: 0000b6290000b621
> [    4.125139] R10: 000400010000b630 R11: 0000b6290000b621 R12: 0000000000000202
> [    4.126866] R13: ffffffff81cf1440 R14: ffff88018ee743c0 R15: ffffffff815e0fd0
> [    4.128731] FS:  0000000000000000(0000) GS:ffff880246100000(0000) knlGS:0000000000000000
> [    4.130206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.131581] CR2: 000055c7ab255df0 CR3: 0000000236fd9001 CR4: 00000000001606e0
> [    4.133066] Call Trace:
> [    4.133919]  <IRQ>
> [    4.134836]  __sk_destruct+0x21/0x190
> [    4.136016]  rcu_process_callbacks+0x23e/0x880
> [    4.137050]  ? rebalance_domains+0x182/0x2b0
> [    4.138050]  __do_softirq+0xc8/0x287
> [    4.139174]  irq_exit+0xd5/0xe0
> [    4.140252]  smp_apic_timer_interrupt+0x64/0x140
> [    4.141880]  apic_timer_interrupt+0x96/0xa0
> [    4.143290]  </IRQ>
> [    4.144214] RIP: 0010:native_safe_halt+0x2/0x10
> [    4.145990] RSP: 0018:ffffc90000cc3ed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [    4.147449] RAX: ffffffff816d4820 RBX: ffff88018ee743c0 RCX: 0000000000000000
> [    4.148626] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [    4.151061] RBP: 0000000000000004 R08: 000000008e8d302a R09: 0000000000000000
> [    4.153687] R10: 0000000000000006 R11: 0000000000000005 R12: ffff88018ee743c0
> [    4.155587] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    4.157462]  ? __sched_text_end+0x5/0x5
> [    4.158869]  default_idle+0x18/0x110
> [    4.160348]  do_idle+0x15e/0x1f0
> [    4.161734]  cpu_startup_entry+0x5f/0x70
> [    4.163211]  start_secondary+0x14c/0x180
> [    4.164716]  secondary_startup_64+0xa5/0xa5
> [    4.165730] Code: 00 00 85 c0 75 25 8b 83 44 01 00 00 85 c0 75 10 48 83 bb e0 02 00 00 00 75 02 5b c3 0f ff 5b c3 0f ff 0f 1f 80 00 00 00 00 eb e5 <0f> ff eb d7 48 89 de 48 c7 c7 e0 e6 ab 81 31 c0 5b e9 25 ca af
> [    4.168787] ---[ end trace 79aa32f0718d3fc7 ]---

Ah-ha, found the tip-bot commit now that disables the x86 refcount
implementation. Can you boot with CONFIG_REFCOUNT_FULL=y?

Can you send me your .config and details on the machine?

Also, what is above the "cut here" line (which is very misleading, as
things above that line tend to be very important)?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 15:39         ` Kees Cook
@ 2017-08-29 16:34           ` Mike Galbraith
  0 siblings, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29 16:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, LKML, Ingo Molnar, Josh Poimboeuf, Peter Zijlstra,
	Thomas Gleixner, Linus Torvalds, Elena Reshetova, Arnd Bergmann,
	H. Peter Anvin

On Tue, 2017-08-29 at 08:39 -0700, Kees Cook wrote:
> 
> Was this the x86 refcount implementation, or a atomic -> refcount conversion?

mmm.... <oija board> yes.

# CONFIG_REFCOUNT_FULL is not set

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 15:58 ` Kees Cook
@ 2017-08-29 16:55   ` Mike Galbraith
  2017-08-29 18:10     ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29 16:55 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

[-- Attachment #1: Type: text/plain, Size: 30822 bytes --]

On Tue, 2017-08-29 at 08:58 -0700, Kees Cook wrote:
> 
> Ah-ha, found the tip-bot commit now that disables the x86 refcount
> implementation. Can you boot with CONFIG_REFCOUNT_FULL=y?

Will do in the A.M.

> Can you send me your .config and details on the machine?

Box is a garden variety i4790 desktop box running opensuse-42.3, VM is
a full clone (in 64GB fs) thereof with identical topology, hanging off
the host like a leech (NFS mounts a few spots), and sucking up half of
the host's ram.

The attached config may not be exactly what was used during the
bisection, timestamp says it's from the initial (failed) bisection.

> Also, what is above the "cut here" line (which is very misleading, as
> things above that line tend to be very important)?

Everything up to and including first splat.

[    0.000000] random: get_random_bytes called from start_kernel+0x2f/0x4a0 with crng_init=0
[    0.000000] Linux version 4.13.0.g44e89e4-tip-default (root@homer) (gcc version 4.8.5 (SUSE Linux)) #27 SMP Tue Aug 29 05:57:19 CEST 2017
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0.g44e89e4-tip-default root=UUID=6c5eef27-03f0-4d2a-882f-ca9900c1ed1e nortsched hpc_cpusets nr_cpus=8 skew_tick=1 audit=0 ftrace_dump_on_oops console=ttyS0,115200N8 console=tty rtkthreads=1 rtworkqueues=2 resume=/dev/vda1 showopts panic=60 ignore_loglevel cgroup_disable=memory crashkernel=256M
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffd8fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffd9000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000002463fffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] random: fast init done
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    0.000000] Hypervisor detected: KVM
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x246400 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000C0000000 mask 3FFC0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.000000] e820: last_pfn = 0xbffd9 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000f6a50-0x000f6a5f] mapped at [fffffffffd200a50]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] Using GB pages for direct mapping
[    0.000000] BRK [0x02ded000, 0x02dedfff] PGTABLE
[    0.000000] BRK [0x02dee000, 0x02deefff] PGTABLE
[    0.000000] BRK [0x02def000, 0x02deffff] PGTABLE
[    0.000000] BRK [0x02df0000, 0x02df0fff] PGTABLE
[    0.000000] RAMDISK: [mem 0x34df3000-0x366f0fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F6850 000014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 0x00000000BFFE14CF 00002C (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 0x00000000BFFE13AB 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 0x00000000BFFE0040 00136B (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACS 0x00000000BFFE0000 000040
[    0.000000] ACPI: APIC 0x00000000BFFE141F 0000B0 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x00000002463fffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x2463d4000-0x2463fdfff]
[    0.000000] Reserving 256MB of memory at 576MB for crashkernel (System RAM: 8291MB)
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 2:46354001, primary cpu clock
[    0.000000] kvm-clock: using sched offset of 43934323442 cycles
[    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000002463fffff]
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x00000000bffd8fff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x00000002463fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x00000002463fffff]
[    0.000000] On node 0 totalpages: 2122615
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3998 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 12224 pages used for memmap
[    0.000000]   DMA32 zone: 782297 pages, LIFO batch:31
[    0.000000]   Normal zone: 20880 pages used for memmap
[    0.000000]   Normal zone: 1336320 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbffd9000-0xbfffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xc0000000-0xfeffbfff]
[    0.000000] PM: Registered nosave memory: [mem 0xfeffc000-0xfeffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xff000000-0xfffbffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
[    0.000000] e820: [mem 0xc0000000-0xfeffbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
[    0.000000] percpu: Embedded 38 pages/cpu @ffff880246000000 s116632 r8192 d30824 u262144
[    0.000000] pcpu-alloc: s116632 r8192 d30824 u262144 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
[    0.000000] KVM setup async PF for cpu 0
[    0.000000] kvm-stealtime: cpu 0, msr 24600d940
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 2089426
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0.g44e89e4-tip-default root=UUID=6c5eef27-03f0-4d2a-882f-ca9900c1ed1e nortsched hpc_cpusets nr_cpus=8 skew_tick=1 audit=0 ftrace_dump_on_oops console=ttyS0,115200N8 console=tty rtkthreads=1 rtworkqueues=2 resume=/dev/vda1 showopts panic=60 ignore_loglevel cgroup_disable=memory crashkernel=256M
[    0.000000] audit: disabled (until reboot)
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 5120420K/8490460K available (7015K kernel code, 2230K rwdata, 3288K rodata, 2508K init, 10644K bss, 519524K reserved, 0K cma-reserved)
[    0.000000] ftrace: allocating 30458 entries in 119 pages
[    0.004000] Hierarchical RCU implementation.
[    0.004000] 	RCU event tracing is enabled.
[    0.004000] 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=8.
[    0.004000] 	Tasks RCU enabled.
[    0.004000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.004000] NR_IRQS: 524544, nr_irqs: 488, preallocated irqs: 16
[    0.004000] 	Offload RCU callbacks from CPUs: .
[    0.004000] Console: colour dummy device 80x25
[    0.004000] console [tty0] enabled
[    0.004000] console [ttyS0] enabled
[    0.004007] tsc: Detected 3591.684 MHz processor
[    0.004580] Calibrating delay loop (skipped) preset value.. 7183.36 BogoMIPS (lpj=14366736)
[    0.004848] pid_max: default: 32768 minimum: 301
[    0.005370] ACPI: Core revision 20170531
[    0.006834] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    0.008108] Security Framework initialized
[    0.008572] AppArmor: AppArmor initialized
[    0.014186] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.017500] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.018390] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.019230] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.020423] Disabling memory control group subsystem
[    0.021052] CPU: Physical Processor ID: 0
[    0.021498] CPU: Processor Core ID: 0
[    0.021849] mce: CPU supports 10 MCE banks
[    0.022277] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.022905] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.023635] Freeing SMP alternatives memory: 28K
[    0.024744] smpboot: Max logical packages: 2
[    0.025366] x2apic enabled
[    0.025776] Switched APIC routing to physical x2apic.
[    0.026831] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.028000] smpboot: CPU0: Intel Core Processor (Haswell, no TSX) (family: 0x6, model: 0x3c, stepping: 0x1)
[    0.028000] Performance Events: unsupported p6 CPU model 60 no PMU driver, software events only.
[    0.028000] Hierarchical SRCU implementation.
[    0.028000] smp: Bringing up secondary CPUs ...
[    0.028000] x86: Booting SMP configuration:
[    0.028000] .... node  #0, CPUs:      #1
[    0.004000] kvm-clock: cpu 1, msr 2:46354041, secondary cpu clock
[    0.029062] KVM setup async PF for cpu 1
[    0.029062] kvm-stealtime: cpu 1, msr 24604d940
[    0.029062]  #2
[    0.004000] kvm-clock: cpu 2, msr 2:46354081, secondary cpu clock
[    0.032078] KVM setup async PF for cpu 2
[    0.032857] kvm-stealtime: cpu 2, msr 24608d940
[    0.033471]  #3
[    0.004000] kvm-clock: cpu 3, msr 2:463540c1, secondary cpu clock
[    0.033471] KVM setup async PF for cpu 3
[    0.033471] kvm-stealtime: cpu 3, msr 2460cd940
[    0.036282]  #4
[    0.004000] kvm-clock: cpu 4, msr 2:46354101, secondary cpu clock
[    0.038146] KVM setup async PF for cpu 4
[    0.038146] kvm-stealtime: cpu 4, msr 24610d940
[    0.040171]  #5
[    0.004000] kvm-clock: cpu 5, msr 2:46354141, secondary cpu clock
[    0.041058] KVM setup async PF for cpu 5
[    0.041058] kvm-stealtime: cpu 5, msr 24614d940
[    0.041058]  #6
[    0.004000] kvm-clock: cpu 6, msr 2:46354181, secondary cpu clock
[    0.041443] KVM setup async PF for cpu 6
[    0.041443] kvm-stealtime: cpu 6, msr 24618d940
[    0.044201]  #7
[    0.004000] kvm-clock: cpu 7, msr 2:463541c1, secondary cpu clock
[    0.045213] KVM setup async PF for cpu 7
[    0.045213] kvm-stealtime: cpu 7, msr 2461cd940
[    0.045213] smp: Brought up 1 node, 8 CPUs
[    0.045213] smpboot: Total of 8 processors activated (57466.94 BogoMIPS)
[    0.064290] node 0 initialised, 712629 pages in 8ms
[    0.064910] devtmpfs: initialized
[    0.064910] x86/mm: Memory block size: 128MB
[    0.065460] evm: security.selinux
[    0.068003] evm: security.ima
[    0.068345] evm: security.capability
[    0.068754] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.068754] futex hash table entries: 2048 (order: 5, 131072 bytes)
[    0.069326] pinctrl core: initialized pinctrl subsystem
[    0.072002] RTC time:  4:46:01, date: 08/29/17
[    0.072002] NET: Registered protocol family 16
[    0.072025] cpuidle: using governor ladder
[    0.072745] cpuidle: using governor menu
[    0.073023] PCCT header not found.
[    0.074139] ACPI: bus type PCI registered
[    0.074736] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.076098] PCI: Using configuration type 1 for base access
[    0.080042] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.080854] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.084103] ACPI: Added _OSI(Module Device)
[    0.084521] ACPI: Added _OSI(Processor Device)
[    0.085051] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.085526] ACPI: Added _OSI(Processor Aggregator Device)
[    0.088002] ACPI: Interpreter enabled
[    0.088002] ACPI: (supports S0 S5)
[    0.088002] ACPI: Using IOAPIC for interrupt routing
[    0.088002] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.092006] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.092006] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.092009] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.096003] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.096271] acpiphp: Slot [3] registered
[    0.096800] acpiphp: Slot [4] registered
[    0.097325] acpiphp: Slot [5] registered
[    0.097890] acpiphp: Slot [7] registered
[    0.098403] acpiphp: Slot [8] registered
[    0.100030] acpiphp: Slot [9] registered
[    0.100730] acpiphp: Slot [10] registered
[    0.101506] acpiphp: Slot [11] registered
[    0.102325] acpiphp: Slot [12] registered
[    0.103020] acpiphp: Slot [13] registered
[    0.103750] acpiphp: Slot [14] registered
[    0.104023] acpiphp: Slot [15] registered
[    0.104554] acpiphp: Slot [16] registered
[    0.105120] acpiphp: Slot [17] registered
[    0.105683] acpiphp: Slot [18] registered
[    0.106213] acpiphp: Slot [19] registered
[    0.106770] acpiphp: Slot [20] registered
[    0.107187] acpiphp: Slot [21] registered
[    0.107644] acpiphp: Slot [22] registered
[    0.108017] acpiphp: Slot [23] registered
[    0.108564] acpiphp: Slot [24] registered
[    0.109246] acpiphp: Slot [25] registered
[    0.109928] acpiphp: Slot [26] registered
[    0.110662] acpiphp: Slot [27] registered
[    0.111339] acpiphp: Slot [28] registered
[    0.112024] acpiphp: Slot [29] registered
[    0.112565] acpiphp: Slot [30] registered
[    0.113232] acpiphp: Slot [31] registered
[    0.114244] PCI host bridge to bus 0000:00
[    0.115284] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.116004] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.117497] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.118410] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[    0.120003] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.120928] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.122424] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.124464] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.131174] pci 0000:00:01.1: reg 0x20: [io  0xc240-0xc24f]
[    0.134614] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.136004] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.136796] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.137768] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.139062] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.140223] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.141080] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    0.141996] pci 0000:00:02.0: [1b36:0100] type 00 class 0x030000
[    0.220045] pci 0000:00:02.0: reg 0x10: [mem 0xf4000000-0xf7ffffff]
[    0.292044] pci 0000:00:02.0: reg 0x14: [mem 0xf8000000-0xfbffffff]
[    0.364046] pci 0000:00:02.0: reg 0x18: [mem 0xfc054000-0xfc055fff]
[    0.440063] pci 0000:00:02.0: reg 0x1c: [io  0xc140-0xc15f]
[    0.660045] pci 0000:00:02.0: reg 0x30: [mem 0xfc040000-0xfc04ffff pref]
[    0.660420] pci 0000:00:03.0: [10ec:8139] type 00 class 0x020000
[    0.662486] pci 0000:00:03.0: reg 0x10: [io  0xc000-0xc0ff]
[    0.664812] pci 0000:00:03.0: reg 0x14: [mem 0xfc056000-0xfc0560ff]
[    0.674813] pci 0000:00:03.0: reg 0x30: [mem 0xfc000000-0xfc03ffff pref]
[    0.675766] pci 0000:00:04.0: [8086:2668] type 00 class 0x040300
[    0.676865] pci 0000:00:04.0: reg 0x10: [mem 0xfc050000-0xfc053fff]
[    0.684239] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000
[    0.686377] pci 0000:00:05.0: reg 0x10: [io  0xc160-0xc17f]
[    0.688733] pci 0000:00:05.0: reg 0x14: [mem 0xfc057000-0xfc057fff]
[    0.697736] pci 0000:00:06.0: [8086:2934] type 00 class 0x0c0300
[    0.702725] pci 0000:00:06.0: reg 0x20: [io  0xc180-0xc19f]
[    0.706417] pci 0000:00:06.1: [8086:2935] type 00 class 0x0c0300
[    0.711077] pci 0000:00:06.1: reg 0x20: [io  0xc1a0-0xc1bf]
[    0.713276] pci 0000:00:06.2: [8086:2936] type 00 class 0x0c0300
[    0.718925] pci 0000:00:06.2: reg 0x20: [io  0xc1c0-0xc1df]
[    0.721501] pci 0000:00:06.7: [8086:293a] type 00 class 0x0c0320
[    0.724003] pci 0000:00:06.7: reg 0x10: [mem 0xfc058000-0xfc058fff]
[    0.729964] pci 0000:00:07.0: [8086:2922] type 00 class 0x010601
[    0.738781] pci 0000:00:07.0: reg 0x20: [io  0xc1e0-0xc1ff]
[    0.741489] pci 0000:00:07.0: reg 0x24: [mem 0xfc059000-0xfc059fff]
[    0.744263] pci 0000:00:08.0: [1af4:1001] type 00 class 0x010000
[    0.752004] pci 0000:00:08.0: reg 0x10: [io  0xc100-0xc13f]
[    0.764004] pci 0000:00:08.0: reg 0x14: [mem 0xfc05a000-0xfc05afff]
[    0.808256] pci 0000:00:09.0: [1af4:1002] type 00 class 0x00ff00
[    0.809619] pci 0000:00:09.0: reg 0x10: [io  0xc200-0xc21f]
[    0.816995] pci 0000:00:0a.0: [1af4:1005] type 00 class 0x00ff00
[    0.818311] pci 0000:00:0a.0: reg 0x10: [io  0xc220-0xc23f]
[    0.824197] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.824951] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.825617] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.827211] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.827862] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.828571] ACPI: Enabled 2 GPEs in block 00 to 0F
[    0.829373] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[    0.829373] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.832153] pci 0000:00:02.0: vgaarb: bridge control possible
[    0.832153] vgaarb: loaded
[    0.832153] EDAC MC: Ver: 3.0.0
[    0.832632] PCI: Using ACPI for IRQ routing
[    0.832632] PCI: pci_cache_line_size set to 64 bytes
[    0.833081] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.833606] e820: reserve RAM buffer [mem 0xbffd9000-0xbfffffff]
[    0.834056] e820: reserve RAM buffer [mem 0x246400000-0x247ffffff]
[    0.834752] NetLabel: Initializing
[    0.836002] NetLabel:  domain hash size = 128
[    0.836434] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.836988] NetLabel:  unlabeled traffic allowed by default
[    0.837600] clocksource: Switched to clocksource kvm-clock
[    0.850790] VFS: Disk quotas dquot_6.6.0
[    0.851198] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.851995] AppArmor: AppArmor Filesystem Enabled
[    0.852481] pnp: PnP ACPI init
[    0.852834] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.853496] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.854231] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.855341] pnp 00:03: [dma 2]
[    0.855885] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.856584] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.857539] pnp: PnP ACPI: found 5 devices
[    0.864116] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.865102] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    0.865636] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    0.866907] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    0.868366] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfebfffff window]
[    0.869743] NET: Registered protocol family 2
[    0.870774] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    0.871826] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.873018] TCP: Hash tables configured (established 65536 bind 65536)
[    0.874001] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    0.874874] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    0.875927] NET: Registered protocol family 1
[    0.876549] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.877214] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.877987] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.878844] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.903397] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
[    0.948475] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.993466] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[    1.038314] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    1.062027] PCI: CLS 0 bytes, default 64
[    1.062545] Unpacking initramfs...
[    2.834009] Freeing initrd memory: 25592K
[    2.834425] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    2.835061] software IO TLB [mem 0xbbfd9000-0xbffd9000] (64MB) mapped at [ffff8800bbfd9000-ffff8800bffd8fff]
[    2.837080] Scanning for low memory corruption every 60 seconds
[    2.838302] Initialise system trusted keyrings
[    2.838786] Key type blacklist registered
[    2.839323] workingset: timestamp_bits=36 max_order=21 bucket_order=0
[    2.840126] zbud: loaded
[    2.948413] Key type asymmetric registered
[    2.948811] Asymmetric key parser 'x509' registered
[    2.949378] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[    2.950645] io scheduler noop registered
[    2.951039] io scheduler deadline registered
[    2.951651] io scheduler cfq registered (default)
[    2.952302] io scheduler mq-deadline registered
[    2.952905] io scheduler kyber registered
[    2.953455] io scheduler bfq registered
[    2.954180] vesafb: mode is 1024x768x32, linelength=4096, pages=0
[    2.954922] vesafb: scrolling: redraw
[    2.955348] vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    2.956030] vesafb: framebuffer at 0xf4000000, mapped to 0xffffc90001800000, using 3072k, total 3072k
[    2.958913] Console: switching to colour frame buffer device 128x48
[    2.960700] fb0: VESA VGA frame buffer device
[    2.961326] intel_idle: does not run on family 6 model 60
[    2.962315] GHES: HEST is not enabled!
[    2.962894] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    2.985629] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    2.988273] Non-volatile memory driver v1.3
[    2.988842] Linux agpgart interface v0.103
[    2.990074] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    2.992368] serio: i8042 KBD port at 0x60,0x64 irq 1
[    2.993058] serio: i8042 AUX port at 0x60,0x64 irq 12
[    2.994011] mousedev: PS/2 mouse device common for all mice
[    2.994864] rtc_cmos 00:00: RTC can wake from S4
[    2.995800] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    2.997454] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram
[    2.999299] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    3.000553] ledtrig-cpu: registered to indicate activity on CPUs
[    3.001597] hidraw: raw HID events driver (C) Jiri Kosina
[    3.004054] NET: Registered protocol family 10
[    3.012148] Segment Routing with IPv6
[    3.013116] sched_clock: Marking stable (3013055919, 0)->(3458256594, -445200675)
[    3.019865] registered taskstats version 1
[    3.020832] Loading compiled-in X.509 certificates
[    3.052780] Loaded X.509 cert 'Build time autogenerated kernel key: 89af5e258045113417ab7b7a442dad2aa89ea573'
[    3.054831] zswap: loaded using pool lzo/zbud
[    3.059297] page_owner is disabled
[    3.060209] Key type trusted registered
[    3.062528] Key type encrypted registered
[    3.063749] AppArmor: AppArmor sha1 policy hashing enabled
[    3.066707] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    3.069300] evm: HMAC attrs: 0x1
[    3.070741]   Magic number: 13:848:768
[    3.071841] PM: Checking hibernation image partition /dev/vda1
[    3.840092] tsc: Refined TSC clocksource calibration: 3591.682 MHz
[    3.841654] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c5a15be52, max_idle_ns: 440795280454 ns
[    4.060759] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[    4.063024] PM: Hibernation image not present or could not be loaded.
[    4.066410] Freeing unused kernel memory: 2508K
[    4.067835] Write protecting the kernel read-only data: 12288k
[    4.070492] Freeing unused kernel memory: 1152K
[    4.073084] Freeing unused kernel memory: 808K
[    4.076158] autofs4: module verification failed: signature and/or required key missing - tainting kernel
[    4.080854] systemd[1]: systemd 228 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
[    4.084205] systemd[1]: Detected virtualization kvm.
[    4.085443] systemd[1]: Detected architecture x86-64.
[    4.086715] systemd[1]: Running in initial RAM disk.
[    4.091632] systemd[1]: Set hostname to <vbox>.
[    4.105048] ------------[ cut here ]------------
[    4.106072] WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
[    4.107969] Modules linked in: autofs4(E)
[    4.109328] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G            E   4.13.0.g44e89e4-tip-default #27
[    4.111075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    4.114119] task: ffff88018ee743c0 task.stack: ffffc90000cc0000
[    4.115481] RIP: 0010:netlink_sock_destruct+0x82/0xa0
[    4.116698] RSP: 0018:ffff880246103eb0 EFLAGS: 00010206
[    4.117997] RAX: 0000000000000300 RBX: ffff880236f1f000 RCX: 000077ff80000000
[    4.120657] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
[    4.123145] RBP: ffff880236f1f000 R08: 000400010000b630 R09: 0000b6290000b621
[    4.125139] R10: 000400010000b630 R11: 0000b6290000b621 R12: 0000000000000202
[    4.126866] R13: ffffffff81cf1440 R14: ffff88018ee743c0 R15: ffffffff815e0fd0
[    4.128731] FS:  0000000000000000(0000) GS:ffff880246100000(0000) knlGS:0000000000000000
[    4.130206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.131581] CR2: 000055c7ab255df0 CR3: 0000000236fd9001 CR4: 00000000001606e0
[    4.133066] Call Trace:
[    4.133919]  <IRQ>
[    4.134836]  __sk_destruct+0x21/0x190
[    4.136016]  rcu_process_callbacks+0x23e/0x880
[    4.137050]  ? rebalance_domains+0x182/0x2b0
[    4.138050]  __do_softirq+0xc8/0x287
[    4.139174]  irq_exit+0xd5/0xe0
[    4.140252]  smp_apic_timer_interrupt+0x64/0x140
[    4.141880]  apic_timer_interrupt+0x96/0xa0
[    4.143290]  </IRQ>
[    4.144214] RIP: 0010:native_safe_halt+0x2/0x10
[    4.145990] RSP: 0018:ffffc90000cc3ed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[    4.147449] RAX: ffffffff816d4820 RBX: ffff88018ee743c0 RCX: 0000000000000000
[    4.148626] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    4.151061] RBP: 0000000000000004 R08: 000000008e8d302a R09: 0000000000000000
[    4.153687] R10: 0000000000000006 R11: 0000000000000005 R12: ffff88018ee743c0
[    4.155587] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    4.157462]  ? __sched_text_end+0x5/0x5
[    4.158869]  default_idle+0x18/0x110
[    4.160348]  do_idle+0x15e/0x1f0
[    4.161734]  cpu_startup_entry+0x5f/0x70
[    4.163211]  start_secondary+0x14c/0x180
[    4.164716]  secondary_startup_64+0xa5/0xa5
[    4.165730] Code: 00 00 85 c0 75 25 8b 83 44 01 00 00 85 c0 75 10 48 83 bb e0 02 00 00 00 75 02 5b c3 0f ff 5b c3 0f ff 0f 1f 80 00 00 00 00 eb e5 <0f> ff eb d7 48 89 de 48 c7 c7 e0 e6 ab 81 31 c0 5b e9 25 ca af 
[    4.168787] ---[ end trace 79aa32f0718d3fc7 ]---
[    4.178947] systemd[1]: Created slice System Slice.
[    4.180743] systemd[1]: Listening on udev Control Socket.
[    4.183118] systemd[1]: Listening on Journal Socket (/dev/log).
[    4.184775] systemd[1]: Listening on udev Kernel Socket.
[    4.186355] systemd[1]: Reached target Swap.
[    4.196794] SCSI subsystem initialized
[    4.200879] alua: device handler registered
[    4.217790] ------------[ cut here ]------------

[-- Attachment #2: config.xz --]
[-- Type: application/x-xz, Size: 25204 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 16:55   ` Mike Galbraith
@ 2017-08-29 18:10     ` Mike Galbraith
  2017-08-29 18:41       ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-29 18:10 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Tue, 2017-08-29 at 18:55 +0200, Mike Galbraith wrote:
> On Tue, 2017-08-29 at 08:58 -0700, Kees Cook wrote:
> > 
> > Ah-ha, found the tip-bot commit now that disables the x86 refcount
> > implementation. Can you boot with CONFIG_REFCOUNT_FULL=y?
> 
> Will do in the A.M.

(It's A.M. somewhere..)  That boots fine.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 18:10     ` Mike Galbraith
@ 2017-08-29 18:41       ` Kees Cook
  2017-08-30  5:02         ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-29 18:41 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Tue, Aug 29, 2017 at 11:10 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-08-29 at 18:55 +0200, Mike Galbraith wrote:
>> On Tue, 2017-08-29 at 08:58 -0700, Kees Cook wrote:
>> >
>> > Ah-ha, found the tip-bot commit now that disables the x86 refcount
>> > implementation. Can you boot with CONFIG_REFCOUNT_FULL=y?
>>
>> Will do in the A.M.
>
> (It's A.M. somewhere..)  That boots fine.

Okay, thanks! I think we've seen this before, but couldn't reproduce
it. The issue is:

static void netlink_sock_destruct(struct sock *sk)
{
...
        WARN_ON(refcount_read(&sk->sk_wmem_alloc));
...
}

Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
disabled)?

I'll try again to reproduce this...

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-29 18:41       ` Kees Cook
@ 2017-08-30  5:02         ` Mike Galbraith
  2017-08-30 16:35           ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-30  5:02 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
> disabled)?

Nogo.

...
[^[[0;32m  OK  ^[[0m] Mounted /abuild.
[^[[0;32m  OK  ^[[0m] Mounted /homer.
[^[[0;32m  OK  ^[[0m] Mounted /home/git.
[^[[0;32m  OK  ^[[0m] Mounted /usr/local/src.
[^[[0;32m  OK  ^[[0m] Mounted /usr/local/gcc.
[^[[0;32m  OK  ^[[0m] Mounted /usr/local/lib/albumcovers.
[^[[0;32m  OK  ^[[0m] Mounted /usr/local/lib/mp3.
[^[[0;32m  OK  ^[[0m] Mounted /usr/local/ltp.
[^[[0;32m  OK  ^[[0m] Started SuSEfirewall2 phase 2.
         Starting Locale Service...
[   44.901304] ------------[ cut here ]------------
[   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
[   44.902679] Modules linked in: nf_log_ipv6(E) xt_comment(E) nf_log_ipv4(E) rpcsec_gss_krb5(E) nfsv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) dns_resolver(E) nfs(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ipt_REJECT(E) xt_pkttype(E) xt_tcpudp(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) joydev(E) snd(E) ppdev(E) soundcore(E) parport_pc(E) crct10dif_pclmul(E) 8139too(E) crc32_pclmul(E) ghash_clmulni_intel(E) 8139cp(E) pcbc(E) aesni_intel(E)
[   44.908297]  i2c_piix4(E) mii(E) parport(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) pcspkr(E) cryptd(E) button(E) qemu_fw_cfg(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) hid_generic(E) usbhid(E) ata_piix(E) virtio_balloon(E) virtio_rng(E) virtio_blk(E) virtio_console(E) qxl(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ttm(E) libahci(E) uhci_hcd(E) ehci_pci(E) crc32c_intel(E) ehci_hcd(E) libata(E) serio_raw(E) virtio_pci(E) virtio_ring(E) drm(E) usbcore(E) virtio(E) floppy(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[   44.913190] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W   E   4.13.0.g94a2d62-tip-default #45
[   44.915217] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[   44.916073] task: ffff88018ee78400 task.stack: ffffc90000cc8000
[   44.916742] RIP: 0010:netlink_sock_destruct+0x82/0xa0
[   44.917291] RSP: 0018:ffff880246143eb0 EFLAGS: 00010206
[   44.917818] RAX: 0000000000000300 RBX: ffff8802370a6800 RCX: 000077ff80000000
[   44.918497] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
[   44.919093] RBP: ffff8802370a6800 R08: 0000000000000001 R09: 0000000000011000
[   44.919826] R10: 0000000000001104 R11: 0000000000011bf8 R12: 0000000000000202
[   44.920486] R13: ffffffff81cf1440 R14: ffff88018ee78400 R15: ffffffff815e0f30
[   44.921138] FS:  0000000000000000(0000) GS:ffff880246140000(0000) knlGS:0000000000000000
[   44.921861] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   44.922419] CR2: 0000563972a82768 CR3: 000000011d25a005 CR4: 00000000001606e0
[   44.923079] Call Trace:
[   44.923503]  <IRQ>
[   44.923786]  __sk_destruct+0x21/0x190
[   44.924160]  rcu_process_callbacks+0x23e/0x880
[   44.924583]  ? rebalance_domains+0xf4/0x2b0
[   44.925147]  __do_softirq+0xc8/0x287
[   44.925620]  irq_exit+0xd5/0xe0
[   44.926072]  smp_apic_timer_interrupt+0x64/0x140
[   44.926602]  apic_timer_interrupt+0x96/0xa0
[   44.927138]  </IRQ>
[   44.927512] RIP: 0010:native_safe_halt+0x2/0x10
[   44.928104] RSP: 0018:ffffc90000ccbed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[   44.928897] RAX: ffffffff816d46e0 RBX: ffff88018ee78400 RCX: 0000000000000000
[   44.931217] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   44.932064] RBP: 0000000000000005 R08: 000000008e8d32c4 R09: 0000000000000000
[   44.932757] R10: 0000000000000006 R11: 0000000000000005 R12: ffff88018ee78400
[   44.933423] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   44.934187]  ? __sched_text_end+0x5/0x5
[   44.934609]  default_idle+0x18/0x110
[   44.935041]  do_idle+0x15e/0x1f0
[   44.935455]  cpu_startup_entry+0x5f/0x70
[   44.935934]  start_secondary+0x14c/0x180
[   44.936445]  secondary_startup_64+0xa5/0xa5
[   44.936933] Code: 00 00 85 c0 75 25 8b 83 44 01 00 00 85 c0 75 10 48 83 bb e0 02 00 00 00 75 02 5b c3 0f ff 5b c3 0f ff 0f 1f 80 00 00 00 00 eb e5 <0f> ff eb d7 48 89 de 48 c7 c7 d8 e6 ab 81 31 c0 5b e9 d5 ca af 
[   44.938829] ---[ end trace bcf2d20b852804b6 ]---
[^[[0;32m  OK  ^[[0m] Started Locale Service.
[^[[0;32m  OK  ^[[0m] Started Postfix Mail Transport Agent.
[^[[0;32m  OK  ^[[0m] Started Command Scheduler.
[^[[0;32m  OK  ^[[0m] Started X Display Manager.
[^[[0;32m  OK  ^[[0m] Started Virtualization daemon.
[^[[0;32m  OK  ^[[0m] Reached target Multi-User System.
[^[[0;32m  OK  ^[[0m] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[^[[0;32m  OK  ^[[0m] Started Update UTMP about System Runlevel Changes.
...zzzzzz

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30  5:02         ` Mike Galbraith
@ 2017-08-30 16:35           ` Kees Cook
  2017-08-30 17:13             ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-30 16:35 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
>> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
>> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
>> disabled)?
>
> Nogo.

Thanks for checking!

> [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0

This is so odd if 14afee4b6092 is reverted. What is line 374 for you
in net/netlink/af_netlink.c?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 16:35           ` Kees Cook
@ 2017-08-30 17:13             ` Mike Galbraith
  2017-08-30 17:32               ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-30 17:13 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
> >> disabled)?
> >
> > Nogo.
> 
> Thanks for checking!
> 
> > [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> 
> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
> in net/netlink/af_netlink.c?

374         WARN_ON(atomic_read(&sk->sk_rmem_alloc));

That line is unchanged by 14afee4b6092.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 17:13             ` Mike Galbraith
@ 2017-08-30 17:32               ` Kees Cook
  2017-08-30 17:55                 ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-30 17:32 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
>> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
>> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
>> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
>> >> disabled)?
>> >
>> > Nogo.
>>
>> Thanks for checking!
>>
>> > [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
>>
>> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
>> in net/netlink/af_netlink.c?
>
> 374         WARN_ON(atomic_read(&sk->sk_rmem_alloc));
>
> That line is unchanged by 14afee4b6092.

Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the
warn from the next line with the refcount usage... I will keep
digging. Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 17:32               ` Kees Cook
@ 2017-08-30 17:55                 ` Mike Galbraith
  2017-08-30 19:19                   ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-30 17:55 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, 2017-08-30 at 10:32 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
> >> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
> >> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
> >> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
> >> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
> >> >> disabled)?
> >> >
> >> > Nogo.
> >>
> >> Thanks for checking!
> >>
> >> > [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> >>
> >> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
> >> in net/netlink/af_netlink.c?
> >
> > 374         WARN_ON(atomic_read(&sk->sk_rmem_alloc));
> >
> > That line is unchanged by 14afee4b6092.
> 
> Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the
> warn from the next line with the refcount usage... I will keep
> digging. Thanks!

I just double checked freshly pulled tip (rapidly moving target), and
it's definitely nogo with CONFIG_ARCH_HAS_REFCOUNT=y and 14afee4b6092
reverted.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 17:55                 ` Mike Galbraith
@ 2017-08-30 19:19                   ` Kees Cook
  2017-08-30 19:46                     ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-30 19:19 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, Aug 30, 2017 at 10:55 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 10:32 -0700, Kees Cook wrote:
>> On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
>> >> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
>> >> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
>> >> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
>> >> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
>> >> >> disabled)?
>> >> >
>> >> > Nogo.
>> >>
>> >> Thanks for checking!
>> >>
>> >> > [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
>> >>
>> >> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
>> >> in net/netlink/af_netlink.c?
>> >
>> > 374         WARN_ON(atomic_read(&sk->sk_rmem_alloc));
>> >
>> > That line is unchanged by 14afee4b6092.
>>
>> Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the
>> warn from the next line with the refcount usage... I will keep
>> digging. Thanks!
>
> I just double checked freshly pulled tip (rapidly moving target), and
> it's definitely nogo with CONFIG_ARCH_HAS_REFCOUNT=y and 14afee4b6092
> reverted.

Okay, thanks for double-checking. I'd be curious to see if it would
help to revert any of these too:

fb5c2c17a556 net: convert packet_fanout.sk_ref from atomic_t to refcount_t
b4217b82893c net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t
c122e14df2d6 net: convert net.passive from atomic_t to refcount_t
edcb691871b2 net: convert inet_frag_queue.refcnt from atomic_t to refcount_t
717d1e993ad8 net: convert fib_rule.refcnt from atomic_t to refcount_t
8c9814b97002 net: convert unix_address.refcnt from atomic_t to refcount_t
433cea4d9bbb net: convert netpoll_info.refcnt from atomic_t to refcount_t
7658b36f1b31 net: convert in_device.refcnt from atomic_t to refcount_t
8851ab526791 net: convert ip_mc_list.refcnt from atomic_t to refcount_t
41c6d650f653 net: convert sock.sk_refcnt from atomic_t to refcount_t
14afee4b6092 net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
2638595afccf net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t
633547973ffc net: convert sk_buff.users from atomic_t to refcount_t
53869cebce4b net: convert nf_bridge_info.use from atomic_t to refcount_t
6343944bc105 net: convert neigh_params.refcnt from atomic_t to refcount_t
9f23743017d1 net: convert neighbour.refcnt from atomic_t to refcount_t
1cc9a98b59ba net: convert inet_peer.refcnt from atomic_t to refcount_t

What I find so strange about this is that it's tripping a warning in
an atomic_t state, and works fine with FULL or "none". x86-refcount
has almost identical behavior to "none" (though obviously, this seems
like those differences must be the source of the bug), but FULL is
much more aggressive. So some silent x86-refcount issue is causing a
side-effect in af_netlink... Something very subtle is happening here.
(So subtle I can't reproduce it yet, sadly...)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 19:19                   ` Kees Cook
@ 2017-08-30 19:46                     ` Kees Cook
  2017-08-31  2:09                       ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-30 19:46 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, Aug 30, 2017 at 12:19 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Aug 30, 2017 at 10:55 AM, Mike Galbraith <efault@gmx.de> wrote:
>> On Wed, 2017-08-30 at 10:32 -0700, Kees Cook wrote:
>>> On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> > On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
>>> >> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@gmx.de> wrote:
>>> >> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
>>> >> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
>>> >> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
>>> >> >> disabled)?
>>> >> >
>>> >> > Nogo.
>>> >>
>>> >> Thanks for checking!
>>> >>
>>> >> > [   44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
>>> >>
>>> >> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
>>> >> in net/netlink/af_netlink.c?
>>> >
>>> > 374         WARN_ON(atomic_read(&sk->sk_rmem_alloc));
>>> >
>>> > That line is unchanged by 14afee4b6092.
>>>
>>> Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the
>>> warn from the next line with the refcount usage... I will keep
>>> digging. Thanks!
>>
>> I just double checked freshly pulled tip (rapidly moving target), and
>> it's definitely nogo with CONFIG_ARCH_HAS_REFCOUNT=y and 14afee4b6092
>> reverted.

With CONFIG_ARCH_HAS_REFCOUNT=y and this patch, do you get an earlier splat?

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index c076f710de4c..569d97a4c3e8 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -72,6 +72,8 @@ bool ex_handler_refcount(const struct
exception_table_entry *fixup,
                bool zero = regs->flags & X86_EFLAGS_ZF;

                refcount_error_report(regs, zero ? "hit zero" : "overflow");
+       } else {
+               refcount_error_report(regs, "silent saturation");
        }

        return true;

The difference between none, FULL, and x86-refcount is the saturation
semantics. none has, obviously, none, and FULL pins values either to
zero or UINT_MAX, depending on various things. x86-refcount saturates
to INT_MIN / 2, but it should only get there after overflow or
unexpected zeroing (both cases would WARN). So if there's some other
path to hitting saturation that could put some refcount_t into a state
that is different from none and FULL, and maybe that side-effect
results in the af_netlink WARN. (I can't see _how_ yet, though.)

So if the above patch does NOT throw a new WARN, then that'll be even weirder.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-30 19:46                     ` Kees Cook
@ 2017-08-31  2:09                       ` Mike Galbraith
  2017-08-31  2:27                         ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-31  2:09 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, 2017-08-30 at 12:46 -0700, Kees Cook wrote:
> 
> With CONFIG_ARCH_HAS_REFCOUNT=y and this patch, do you get an earlier splat?

Yup, first gripe below.

[    2.448393] refcount_t silent saturation at skb_unref.part.36+0x12/0x1a in (haveged)[136], uid/euid: 0/0
[    2.454975] ------------[ cut here ]------------
[    2.456452] WARNING: CPU: 1 PID: 136 at kernel/panic.c:612 refcount_error_report+0xa0/0xa4
[    2.456452] Modules linked in: scsi_mod(E+) autofs4(E)
[    2.456456] CPU: 1 PID: 136 Comm: (haveged) Tainted: G            E   4.13.0.g152d54a-tip-default #47
[    2.456456] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    2.456457] task: ffff880137f65640 task.stack: ffffc9000093c000
[    2.456458] RIP: 0010:refcount_error_report+0xa0/0xa4
[    2.456459] RSP: 0018:ffffc9000093fa38 EFLAGS: 00010282
[    2.456459] RAX: 000000000000005c RBX: ffffffff81a33864 RCX: 0000000000000830
[    2.456459] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000246
[    2.456460] RBP: ffffc9000093fb88 R08: 000000008e8d4f61 R09: 0000000000000000
[    2.456462] R10: 0000000000000000 R11: ffff880137f61600 R12: ffff880137f65640
[    2.456463] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000006
[    2.456464] FS:  00007f5a33a14880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[    2.456464] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.456464] CR2: 00007ffe37ca0c88 CR3: 0000000137f74005 CR4: 00000000001606e0
[    2.456467] Call Trace:
[    2.456478]  ex_handler_refcount+0x63/0x70
[    2.456479]  fixup_exception+0x32/0x40
[    2.456482]  do_trap+0x11b/0x180
[    2.456493]  do_error_trap+0x70/0xd0
[    2.456497]  ? skb_unref.part.36+0x10/0x1a
[    2.456502]  ? idr_get_free+0xcb/0x2f0
[    2.456505]  invalid_op+0x1e/0x30
[    2.456510] RIP: 0010:skb_unref.part.36+0x12/0x1a
[    2.456510] RSP: 0018:ffffc9000093fc30 EFLAGS: 00010202
[    2.456510] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff880137df9ee4
[    2.456511] RDX: ffff880137df0518 RSI: ffff880137df9e00 RDI: ffff880137df9e00
[    2.456511] RBP: ffff880137df9e00 R08: ffff880137e30018 R09: ffffffff816b91f0
[    2.456511] R10: 0000000000000017 R11: 00000000fffffff4 R12: 0000000000000000
[    2.456512] R13: ffff88013a4e9000 R14: 0000000000000000 R15: ffff880137e30000
[    2.456513]  ? cleanup_uevent_env+0x10/0x10
[    2.456520]  kfree_skb+0x3f/0xa0
[    2.456525]  netlink_broadcast_filtered+0x2c8/0x420
[    2.456530]  ? cleanup_uevent_env+0x10/0x10
[    2.456531]  kobject_uevent_env+0x476/0x650
[    2.456540]  device_add+0x41e/0x5f0
[    2.456550]  netdev_register_kobject+0x8e/0x170
[    2.456553]  register_netdevice+0x27a/0x3d0
[    2.456557]  register_netdev+0x16/0x30
[    2.456562]  loopback_net_init+0x48/0xa0
[    2.456567]  ops_init+0x39/0x110
[    2.456568]  setup_net+0x85/0x130
[    2.456569]  copy_net_ns+0xb9/0x1f0
[    2.456571]  create_new_namespaces+0x11a/0x1b0
[    2.456577]  unshare_nsproxy_namespaces+0x55/0xa0
[    2.456578]  SyS_unshare+0x18d/0x330
[    2.456581]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[    2.456584] RIP: 0033:0x7f5a320669f7
[    2.456585] RSP: 002b:00007ffcd1e10908 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[    2.456585] RAX: ffffffffffffffda RBX: 000000000159ae48 RCX: 00007f5a320669f7
[    2.456586] RDX: 000000000000000b RSI: 00007ffcd1e10910 RDI: 0000000040000000
[    2.456588] RBP: 00007ffcd1e10cf0 R08: 0000000000000001 R09: 00000000015b8d20
[    2.456588] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[    2.456588] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000000
[    2.456589] Code: 10 09 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 f0 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 80 70 a3 81 31 c0 e8 bd f1 05 00 <0f> ff eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 
[    2.456599] ---[ end trace d3215cc8334d8520 ]---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  2:09                       ` Mike Galbraith
@ 2017-08-31  2:27                         ` Kees Cook
  2017-08-31  3:12                           ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-31  2:27 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, Aug 30, 2017 at 7:09 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 12:46 -0700, Kees Cook wrote:
>>
>> With CONFIG_ARCH_HAS_REFCOUNT=y and this patch, do you get an earlier splat?
>
> Yup, first gripe below.
>
> [    2.448393] refcount_t silent saturation at skb_unref.part.36+0x12/0x1a in (haveged)[136], uid/euid: 0/0
> [    2.454975] ------------[ cut here ]------------
> [    2.456452] WARNING: CPU: 1 PID: 136 at kernel/panic.c:612 refcount_error_report+0xa0/0xa4
> [    2.456452] Modules linked in: scsi_mod(E+) autofs4(E)
> [    2.456456] CPU: 1 PID: 136 Comm: (haveged) Tainted: G            E   4.13.0.g152d54a-tip-default #47
> [    2.456456] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [    2.456457] task: ffff880137f65640 task.stack: ffffc9000093c000
> [    2.456458] RIP: 0010:refcount_error_report+0xa0/0xa4
> [    2.456459] RSP: 0018:ffffc9000093fa38 EFLAGS: 00010282
> [    2.456459] RAX: 000000000000005c RBX: ffffffff81a33864 RCX: 0000000000000830
> [    2.456459] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000246
> [    2.456460] RBP: ffffc9000093fb88 R08: 000000008e8d4f61 R09: 0000000000000000
> [    2.456462] R10: 0000000000000000 R11: ffff880137f61600 R12: ffff880137f65640
> [    2.456463] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000006
> [    2.456464] FS:  00007f5a33a14880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [    2.456464] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.456464] CR2: 00007ffe37ca0c88 CR3: 0000000137f74005 CR4: 00000000001606e0
> [    2.456467] Call Trace:
> [    2.456478]  ex_handler_refcount+0x63/0x70
> [    2.456479]  fixup_exception+0x32/0x40
> [    2.456482]  do_trap+0x11b/0x180
> [    2.456493]  do_error_trap+0x70/0xd0
> [    2.456497]  ? skb_unref.part.36+0x10/0x1a
> [    2.456502]  ? idr_get_free+0xcb/0x2f0
> [    2.456505]  invalid_op+0x1e/0x30
> [    2.456510] RIP: 0010:skb_unref.part.36+0x12/0x1a
> [    2.456510] RSP: 0018:ffffc9000093fc30 EFLAGS: 00010202
> [    2.456510] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff880137df9ee4
> [    2.456511] RDX: ffff880137df0518 RSI: ffff880137df9e00 RDI: ffff880137df9e00
> [    2.456511] RBP: ffff880137df9e00 R08: ffff880137e30018 R09: ffffffff816b91f0
> [    2.456511] R10: 0000000000000017 R11: 00000000fffffff4 R12: 0000000000000000
> [    2.456512] R13: ffff88013a4e9000 R14: 0000000000000000 R15: ffff880137e30000
> [    2.456513]  ? cleanup_uevent_env+0x10/0x10
> [    2.456520]  kfree_skb+0x3f/0xa0
> [    2.456525]  netlink_broadcast_filtered+0x2c8/0x420
> [    2.456530]  ? cleanup_uevent_env+0x10/0x10
> [    2.456531]  kobject_uevent_env+0x476/0x650
> [    2.456540]  device_add+0x41e/0x5f0
> [    2.456550]  netdev_register_kobject+0x8e/0x170
> [    2.456553]  register_netdevice+0x27a/0x3d0
> [    2.456557]  register_netdev+0x16/0x30
> [    2.456562]  loopback_net_init+0x48/0xa0
> [    2.456567]  ops_init+0x39/0x110
> [    2.456568]  setup_net+0x85/0x130
> [    2.456569]  copy_net_ns+0xb9/0x1f0
> [    2.456571]  create_new_namespaces+0x11a/0x1b0
> [    2.456577]  unshare_nsproxy_namespaces+0x55/0xa0
> [    2.456578]  SyS_unshare+0x18d/0x330
> [    2.456581]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [    2.456584] RIP: 0033:0x7f5a320669f7
> [    2.456585] RSP: 002b:00007ffcd1e10908 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
> [    2.456585] RAX: ffffffffffffffda RBX: 000000000159ae48 RCX: 00007f5a320669f7
> [    2.456586] RDX: 000000000000000b RSI: 00007ffcd1e10910 RDI: 0000000040000000
> [    2.456588] RBP: 00007ffcd1e10cf0 R08: 0000000000000001 R09: 00000000015b8d20
> [    2.456588] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
> [    2.456588] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000000
> [    2.456589] Code: 10 09 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 f0 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 80 70 a3 81 31 c0 e8 bd f1 05 00 <0f> ff eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89
> [    2.456599] ---[ end trace d3215cc8334d8520 ]---

Interesting! Can you try with 633547973ffc3 ("net: convert
sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
running haveged will help me trigger this on my system...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  2:27                         ` Kees Cook
@ 2017-08-31  3:12                           ` Mike Galbraith
  2017-08-31  4:01                             ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-31  3:12 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, Ingo Molnar, Reshetova, Elena

On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:

> Interesting! Can you try with 633547973ffc3 ("net: convert
> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> running haveged will help me trigger this on my system...

With that (plus 230cd1279d001 fix to it) reverted, vbox boots.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  3:12                           ` Mike Galbraith
@ 2017-08-31  4:01                             ` Kees Cook
  2017-08-31  4:10                               ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-31  4:01 UTC (permalink / raw)
  To: Mike Galbraith, David S. Miller
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>
>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> running haveged will help me trigger this on my system...
>
> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.

Wonderful! Thank you so much for helping track this down.

So, it seems that sk_buff.users will need some more special attention
before we can convert it to refcount.

x86-refcount will saturate with refcount_dec_and_test() if the result
is negative. But that would mean at least starting at 0. FULL should
have WARNed in this case, so I remain slightly confused why it was
missed by FULL.

Ingo, I'm not sure the best path for this. It seems we need to revert
230cd1279d001 and 633547973ffc3 and then we can restore
ARCH_HAS_REFCOUNT.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:01                             ` Kees Cook
@ 2017-08-31  4:10                               ` Kees Cook
  2017-08-31  4:38                                 ` Mike Galbraith
  2017-08-31 13:58                                 ` Mike Galbraith
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-08-31  4:10 UTC (permalink / raw)
  To: Mike Galbraith, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
>> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>>
>>> Interesting! Can you try with 633547973ffc3 ("net: convert
>>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>>> running haveged will help me trigger this on my system...
>>
>> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
>
> Wonderful! Thank you so much for helping track this down.
>
> So, it seems that sk_buff.users will need some more special attention
> before we can convert it to refcount.
>
> x86-refcount will saturate with refcount_dec_and_test() if the result
> is negative. But that would mean at least starting at 0. FULL should
> have WARNed in this case, so I remain slightly confused why it was
> missed by FULL.

Actually, if this is a race condition it's possible that FULL is slow
enough to miss it...

I bet something briefly takes the refcount negative, and with
unchecked atomics, it come back up positive again during the race.
FULL may miss the race, and x86-refcount will catch it and saturate...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:10                               ` Kees Cook
@ 2017-08-31  4:38                                 ` Mike Galbraith
  2017-08-31 13:58                                 ` Mike Galbraith
  1 sibling, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-08-31  4:38 UTC (permalink / raw)
  To: Kees Cook, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
> >>
> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> >>> running haveged will help me trigger this on my system...
> >>
> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
> >
> > Wonderful! Thank you so much for helping track this down.
> >
> > So, it seems that sk_buff.users will need some more special attention
> > before we can convert it to refcount.
> >
> > x86-refcount will saturate with refcount_dec_and_test() if the result
> > is negative. But that would mean at least starting at 0. FULL should
> > have WARNed in this case, so I remain slightly confused why it was
> > missed by FULL.
> 
> Actually, if this is a race condition it's possible that FULL is slow
> enough to miss it...
> 
> I bet something briefly takes the refcount negative, and with
> unchecked atomics, it come back up positive again during the race.
> FULL may miss the race, and x86-refcount will catch it and saturate...

Hm, I'll go have a stare.. not that that's likely to turn anything up,
memory ordering stares usually inducing a zombie like state.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:10                               ` Kees Cook
  2017-08-31  4:38                                 ` Mike Galbraith
@ 2017-08-31 13:58                                 ` Mike Galbraith
  2017-08-31 17:00                                   ` Kees Cook
  2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 2 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-08-31 13:58 UTC (permalink / raw)
  To: Kees Cook, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
> >>
> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> >>> running haveged will help me trigger this on my system...
> >>
> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
> >
> > Wonderful! Thank you so much for helping track this down.
> >
> > So, it seems that sk_buff.users will need some more special attention
> > before we can convert it to refcount.
> >
> > x86-refcount will saturate with refcount_dec_and_test() if the result
> > is negative. But that would mean at least starting at 0. FULL should
> > have WARNed in this case, so I remain slightly confused why it was
> > missed by FULL.
> 
> Actually, if this is a race condition it's possible that FULL is slow
> enough to miss it...
> 
> I bet something briefly takes the refcount negative, and with
> unchecked atomics, it come back up positive again during the race.
> FULL may miss the race, and x86-refcount will catch it and saturate...

(gdb) list *in6_dev_get+0x1e
0xffffffff8166d3de is in in6_dev_get (./arch/x86/include/asm/refcount.h:52).
47                      : "cc", "cx");
48      }
49
50      static __always_inline void refcount_inc(refcount_t *r)
51      {
52              asm volatile(LOCK_PREFIX "incl %0\n\t"
53                      REFCOUNT_CHECK_LT_ZERO
54                      : [counter] "+m" (r->refs.counter)
55                      : : "cc", "cx");
56

gdb) list *in6_dev_get+0x10
0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
313     {
314             struct inet6_dev *idev;
315
316             rcu_read_lock();
317             idev = rcu_dereference(dev->ip6_ptr);
318             if (idev)
319                     refcount_inc(&idev->refcnt);
320             rcu_read_unlock();
321             return idev;
322 

That's from kernel with no revert, but your silent saturation patch
still applied, AND built with gcc-6.3.1.  Kernel traps, but it boots
and works, as does kernel built with gcc-7.0.1.  Remove your silent
saturation patch, kernel doesn't notice a thing, just works.

With gcc-4.8.5, trap means you're as good as dead, with the other two,
trap means the intended.  Compiler, constraints, dark elves.. pick one.

Full first splat from bootable gcc-6.3.1 built kernel.

[    1.293962] NET: Registered protocol family 10
[    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0
[    1.295616] ------------[ cut here ]------------
[    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
[    1.296950] Modules linked in:
[    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
[    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
[    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
[    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
[    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
[    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
[    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
[    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
[    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
[    1.313301] Call Trace:
[    1.314012]  ex_handler_refcount+0x63/0x70
[    1.314893]  fixup_exception+0x32/0x40
[    1.315737]  do_trap+0x8c/0x170
[    1.316519]  do_error_trap+0x70/0xd0
[    1.317340]  ? in6_dev_get+0x23/0x104
[    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
[    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
[    1.320098]  ? set_debug_rodata+0x11/0x11
[    1.320964]  invalid_op+0x1e/0x30
[    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
[    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
[    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
[    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
[    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
[    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
[    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
[    1.330547]  ? set_debug_rodata+0x11/0x11
[    1.331392]  ip6_route_init_special_entries+0x2a/0x89
[    1.332369]  addrconf_init+0x9e/0x203
[    1.333173]  inet6_init+0x1af/0x365
[    1.333956]  ? af_unix_init+0x4e/0x4e
[    1.334753]  do_one_initcall+0x4e/0x190
[    1.335555]  ? set_debug_rodata+0x11/0x11
[    1.336369]  kernel_init_freeable+0x189/0x20e
[    1.337230]  ? rest_init+0xd0/0xd0
[    1.337999]  kernel_init+0xa/0xf7
[    1.338744]  ret_from_fork+0x25/0x30
[    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56 
[    1.342243] ---[ end trace b5d40c0fccce776c ]---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 13:58                                 ` Mike Galbraith
@ 2017-08-31 17:00                                   ` Kees Cook
  2017-08-31 17:19                                     ` Mike Galbraith
  2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-31 17:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 6:58 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
>> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
>> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
>> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>> >>
>> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> >>> running haveged will help me trigger this on my system...
>> >>
>> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
>> >
>> > Wonderful! Thank you so much for helping track this down.
>> >
>> > So, it seems that sk_buff.users will need some more special attention
>> > before we can convert it to refcount.
>> >
>> > x86-refcount will saturate with refcount_dec_and_test() if the result
>> > is negative. But that would mean at least starting at 0. FULL should
>> > have WARNed in this case, so I remain slightly confused why it was
>> > missed by FULL.
>>
>> Actually, if this is a race condition it's possible that FULL is slow
>> enough to miss it...
>>
>> I bet something briefly takes the refcount negative, and with
>> unchecked atomics, it come back up positive again during the race.
>> FULL may miss the race, and x86-refcount will catch it and saturate...
>
> (gdb) list *in6_dev_get+0x1e
> 0xffffffff8166d3de is in in6_dev_get (./arch/x86/include/asm/refcount.h:52).
> 47                      : "cc", "cx");
> 48      }
> 49
> 50      static __always_inline void refcount_inc(refcount_t *r)
> 51      {
> 52              asm volatile(LOCK_PREFIX "incl %0\n\t"
> 53                      REFCOUNT_CHECK_LT_ZERO
> 54                      : [counter] "+m" (r->refs.counter)
> 55                      : : "cc", "cx");
> 56
>
> gdb) list *in6_dev_get+0x10
> 0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
> 313     {
> 314             struct inet6_dev *idev;
> 315
> 316             rcu_read_lock();
> 317             idev = rcu_dereference(dev->ip6_ptr);
> 318             if (idev)
> 319                     refcount_inc(&idev->refcnt);
> 320             rcu_read_unlock();
> 321             return idev;
> 322
>
> That's from kernel with no revert, but your silent saturation patch
> still applied, AND built with gcc-6.3.1.  Kernel traps, but it boots
> and works, as does kernel built with gcc-7.0.1.  Remove your silent
> saturation patch, kernel doesn't notice a thing, just works.
>
> With gcc-4.8.5, trap means you're as good as dead, with the other two,
> trap means the intended.  Compiler, constraints, dark elves.. pick one.

Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:

4.8.5: WARN, eventual kernel hang
6.3.1, 7.0.1: WARN, but continues working

> Full first splat from bootable gcc-6.3.1 built kernel.
>
> [    1.293962] NET: Registered protocol family 10
> [    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0

That's an _increment_ saturation? Which means the result must be
negative, so it started from least -2.

> [    1.295616] ------------[ cut here ]------------
> [    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
> [    1.296950] Modules linked in:
> [    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
> [    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
> [    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
> [    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
> [    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
> [    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
> [    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
> [    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
> [    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
> [    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
> [    1.313301] Call Trace:
> [    1.314012]  ex_handler_refcount+0x63/0x70
> [    1.314893]  fixup_exception+0x32/0x40
> [    1.315737]  do_trap+0x8c/0x170
> [    1.316519]  do_error_trap+0x70/0xd0
> [    1.317340]  ? in6_dev_get+0x23/0x104
> [    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
> [    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
> [    1.320098]  ? set_debug_rodata+0x11/0x11
> [    1.320964]  invalid_op+0x1e/0x30
> [    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
> [    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
> [    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
> [    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
> [    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
> [    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
> [    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
> [    1.330547]  ? set_debug_rodata+0x11/0x11
> [    1.331392]  ip6_route_init_special_entries+0x2a/0x89
> [    1.332369]  addrconf_init+0x9e/0x203
> [    1.333173]  inet6_init+0x1af/0x365
> [    1.333956]  ? af_unix_init+0x4e/0x4e
> [    1.334753]  do_one_initcall+0x4e/0x190
> [    1.335555]  ? set_debug_rodata+0x11/0x11
> [    1.336369]  kernel_init_freeable+0x189/0x20e
> [    1.337230]  ? rest_init+0xd0/0xd0
> [    1.337999]  kernel_init+0xa/0xf7
> [    1.338744]  ret_from_fork+0x25/0x30
> [    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56
> [    1.342243] ---[ end trace b5d40c0fccce776c ]---

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 17:00                                   ` Kees Cook
@ 2017-08-31 17:19                                     ` Mike Galbraith
  2017-08-31 18:45                                       ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-08-31 17:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> 
> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> 
> 4.8.5: WARN, eventual kernel hang
> 6.3.1, 7.0.1: WARN, but continues working

Yeah, that's correct.  I find that troubling, simply because this gcc
version has been through one hell of a lot of kernels with me.  Yeah, I
know, that doesn't exempt it from having bugs, but color me suspicious.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 17:19                                     ` Mike Galbraith
@ 2017-08-31 18:45                                       ` Kees Cook
  2017-09-01  6:57                                         ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-08-31 18:45 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>>
>> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>>
>> 4.8.5: WARN, eventual kernel hang
>> 6.3.1, 7.0.1: WARN, but continues working
>
> Yeah, that's correct.  I find that troubling, simply because this gcc
> version has been through one hell of a lot of kernels with me.  Yeah, I
> know, that doesn't exempt it from having bugs, but color me suspicious.

I still can't hit this with a 4.8.5 build. :(

With _RATELIMIT removed, this should, in theory, report whatever goes
negative first...

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index c076f710de4c..d4fc6b91c0e6 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -72,6 +72,10 @@ bool ex_handler_refcount(const struct
exception_table_entry *fixup,
                bool zero = regs->flags & X86_EFLAGS_ZF;

                refcount_error_report(regs, zero ? "hit zero" : "overflow");
+       } else if (regs->flags & X86_EFLAGS_SF) {
+               refcount_error_report(regs, "result was negative");
+       } else {
+               refcount_error_report(regs, "unknown state");
        }

        return true;
diff --git a/kernel/panic.c b/kernel/panic.c
index bdd18afa19a4..966ade491543 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -605,7 +605,7 @@ EXPORT_SYMBOL(__stack_chk_fail);
 #ifdef CONFIG_ARCH_HAS_REFCOUNT
 void refcount_error_report(struct pt_regs *regs, const char *err)
 {
-       WARN_RATELIMIT(1, "refcount_t %s at %pB in %s[%d], uid/euid: %u/%u\n",
+       WARN(1, "refcount_t %s at %pB in %s[%d], uid/euid: %u/%u\n",
                err, (void *)instruction_pointer(regs),
                current->comm, task_pid_nr(current),
                from_kuid_munged(&init_user_ns, current_uid()),

And if it is still a refcount_inc(), and only on gcc 4.8.5, then I
think we need to study the resulting assembly...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 13:58                                 ` Mike Galbraith
  2017-08-31 17:00                                   ` Kees Cook
@ 2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 0 replies; 36+ messages in thread
From: Kees Cook @ 2017-08-31 19:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 6:58 AM, Mike Galbraith <efault@gmx.de> wrote:
> gdb) list *in6_dev_get+0x10
> 0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
> 313     {
> 314             struct inet6_dev *idev;
> 315
> 316             rcu_read_lock();
> 317             idev = rcu_dereference(dev->ip6_ptr);
> 318             if (idev)
> 319                     refcount_inc(&idev->refcnt);
> 320             rcu_read_unlock();
> 321             return idev;
> 322

And this is a completely different refcount from the other that
tripped. This one is quite simple, too, though I see it uses
refcount_dec(), which is a path to saturation. I've sent a patch to
try to clarify this further...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 18:45                                       ` Kees Cook
@ 2017-09-01  6:57                                         ` Mike Galbraith
  2017-09-01 13:09                                           ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-09-01  6:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >>
> >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> >>
> >> 4.8.5: WARN, eventual kernel hang
> >> 6.3.1, 7.0.1: WARN, but continues working
> >
> > Yeah, that's correct.  I find that troubling, simply because this gcc
> > version has been through one hell of a lot of kernels with me.  Yeah, I
> > know, that doesn't exempt it from having bugs, but color me suspicious.
> 
> I still can't hit this with a 4.8.5 build. :(
> 
> With _RATELIMIT removed, this should, in theory, report whatever goes
> negative first...

I applied the other patch you posted, and built with gcc-6.3.1 to
remove the gcc-4.8.5 aspect.  Look below the resulting splat.

[    1.293962] NET: Registered protocol family 10
[    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0
[    1.295616] ------------[ cut here ]------------
[    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
[    1.296950] Modules linked in:
[    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
[    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
[    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
[    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
[    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
[    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
[    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
[    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
[    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
[    1.313301] Call Trace:
[    1.314012]  ex_handler_refcount+0x63/0x70
[    1.314893]  fixup_exception+0x32/0x40
[    1.315737]  do_trap+0x8c/0x170
[    1.316519]  do_error_trap+0x70/0xd0
[    1.317340]  ? in6_dev_get+0x23/0x104
[    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
[    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
[    1.320098]  ? set_debug_rodata+0x11/0x11
[    1.320964]  invalid_op+0x1e/0x30
[    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
[    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
[    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
[    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
[    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
[    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
[    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
[    1.330547]  ? set_debug_rodata+0x11/0x11
[    1.331392]  ip6_route_init_special_entries+0x2a/0x89
[    1.332369]  addrconf_init+0x9e/0x203
[    1.333173]  inet6_init+0x1af/0x365
[    1.333956]  ? af_unix_init+0x4e/0x4e
[    1.334753]  do_one_initcall+0x4e/0x190
[    1.335555]  ? set_debug_rodata+0x11/0x11
[    1.336369]  kernel_init_freeable+0x189/0x20e
[    1.337230]  ? rest_init+0xd0/0xd0
[    1.337999]  kernel_init+0xa/0xf7
[    1.338744]  ret_from_fork+0x25/0x30
[    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56 
[    1.342243] ---[ end trace b5d40c0fccce776c ]---

Back yours out, and...

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
       swapper/0-1     [000] ...1     1.974114: in6_dev_getx: refs.counter:-1073741824
       swapper/0-1     [000] ...1     1.974116: in6_dev_getx: refs.counter:-1073741824

---
 arch/x86/include/asm/refcount.h |    9 +++++++++
 include/net/addrconf.h          |   12 ++++++++++++
 net/ipv6/route.c                |    4 ++--
 3 files changed, 23 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -55,6 +55,15 @@ static __always_inline void refcount_inc
 		: : "cc", "cx");
 }
 
+static __always_inline void refcount_inc_x(refcount_t *r)
+{
+	trace_printk("refs.counter:%d\n", r->refs.counter);
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
 	return idev;
 }
 
+static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
+{
+	struct inet6_dev *idev;
+
+	rcu_read_lock();
+	idev = rcu_dereference(dev->ip6_ptr);
+	if (idev)
+		refcount_inc_x(&idev->refcnt);
+	rcu_read_unlock();
+	return idev;
+}
+
 static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
 {
 	struct inet6_dev *idev = __in6_dev_get(dev);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4044,9 +4044,9 @@ void __init ip6_route_init_special_entri
 	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
   #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
 	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #endif
 }
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01  6:57                                         ` Mike Galbraith
@ 2017-09-01 13:09                                           ` Mike Galbraith
  2017-09-01 17:12                                             ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-09-01 13:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> > >>
> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> > >>
> > >> 4.8.5: WARN, eventual kernel hang
> > >> 6.3.1, 7.0.1: WARN, but continues working
> > >
> > > Yeah, that's correct.  I find that troubling, simply because this gcc
> > > version has been through one hell of a lot of kernels with me.  Yeah, I
> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> > 
> > I still can't hit this with a 4.8.5 build. :(
> > 
> > With _RATELIMIT removed, this should, in theory, report whatever goes
> > negative first...
> 
> I applied the other patch you posted, and built with gcc-6.3.1 to
> remove the gcc-4.8.5 aspect.  Look below the resulting splat.

Grr, that one has a in6_dev_getx() line missing for the first
increment, where things go pear shaped.

With that added, looking at counter both before, and after incl, with a
trace_printk() in the exception handler showing it doing its saturate
thing, irqs disabled across the whole damn refcount_inc(), and even
booting box nr_cpus=1 for extra credit...

HTH can that first refcount_inc() get there?

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
         systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
         systemd-1     [000] d..1     1.937296: in6_dev_getx: PRE refs.counter:-1073741824
         systemd-1     [000] d..1     1.937297: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937297: in6_dev_getx: POST refs.counter:-1073741824
         systemd-1     [000] d..1     1.937297: in6_dev_getx: PRE refs.counter:-1073741824
         systemd-1     [000] d..1     1.937298: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937299: in6_dev_getx: POST refs.counter:-1073741824

---
 arch/x86/include/asm/refcount.h |   14 ++++++++++++++
 arch/x86/mm/extable.c           |    1 +
 include/net/addrconf.h          |   12 ++++++++++++
 net/ipv6/route.c                |    6 +++---
 4 files changed, 30 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -55,6 +55,20 @@ static __always_inline void refcount_inc
 		: : "cc", "cx");
 }
 
+static __always_inline void refcount_inc_x(refcount_t *r)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	trace_printk("PRE refs.counter:%d\n", r->refs.counter);
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+	trace_printk("POST refs.counter:%d\n", r->refs.counter);
+	local_irq_restore(flags);
+}
+
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
 {
 	/* First unconditionally saturate the refcount. */
 	*(int *)regs->cx = INT_MIN / 2;
+	trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);
 
 	/*
 	 * Strictly speaking, this reports the fixup destination, not
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
 	return idev;
 }
 
+static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
+{
+	struct inet6_dev *idev;
+
+	rcu_read_lock();
+	idev = rcu_dereference(dev->ip6_ptr);
+	if (idev)
+		refcount_inc_x(&idev->refcnt);
+	rcu_read_unlock();
+	return idev;
+}
+
 static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
 {
 	struct inet6_dev *idev = __in6_dev_get(dev);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4041,12 +4041,12 @@ void __init ip6_route_init_special_entri
 	 * the loopback reference in rt6_info will not be taken, do it
 	 * manually for init_net */
 	init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
 	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #endif
 }
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 13:09                                           ` Mike Galbraith
@ 2017-09-01 17:12                                             ` Kees Cook
  2017-09-01 17:52                                               ` Mike Galbraith
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-09-01 17:12 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> > >>
>> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>> > >>
>> > >> 4.8.5: WARN, eventual kernel hang
>> > >> 6.3.1, 7.0.1: WARN, but continues working
>> > >
>> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>> >
>> > I still can't hit this with a 4.8.5 build. :(
>> >
>> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> > negative first...
>>
>> I applied the other patch you posted, and built with gcc-6.3.1 to
>> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>
> Grr, that one has a in6_dev_getx() line missing for the first
> increment, where things go pear shaped.
>
> With that added, looking at counter both before, and after incl, with a
> trace_printk() in the exception handler showing it doing its saturate
> thing, irqs disabled across the whole damn refcount_inc(), and even
> booting box nr_cpus=1 for extra credit...
>
> HTH can that first refcount_inc() get there?
>
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824

O_o

Can you paste the disassembly of in6_dev_getx? I can't understand how
we're landing in the exception handler.

>          systemd-1     [000] d..1     1.937296: in6_dev_getx: PRE refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937297: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937297: in6_dev_getx: POST refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937297: in6_dev_getx: PRE refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937298: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937299: in6_dev_getx: POST refs.counter:-1073741824
>
> ---
>  arch/x86/include/asm/refcount.h |   14 ++++++++++++++
>  arch/x86/mm/extable.c           |    1 +
>  include/net/addrconf.h          |   12 ++++++++++++
>  net/ipv6/route.c                |    6 +++---
>  4 files changed, 30 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
>                 : : "cc", "cx");
>  }
>
> +static __always_inline void refcount_inc_x(refcount_t *r)
> +{
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +       trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> +       asm volatile(LOCK_PREFIX "incl %0\n\t"
> +               REFCOUNT_CHECK_LT_ZERO
> +               : [counter] "+m" (r->refs.counter)
> +               : : "cc", "cx");

Does this need an explicit "memory" added to the clobbers line here?
This isn't present in the atomic_inc() implementation, but maybe
something confuses gcc in this case into ignoring the "+m" marking?

> +       trace_printk("POST refs.counter:%d\n", r->refs.counter);
> +       local_irq_restore(flags);
> +}
> +
>  static __always_inline void refcount_dec(refcount_t *r)
>  {
>         asm volatile(LOCK_PREFIX "decl %0\n\t"
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
>  {
>         /* First unconditionally saturate the refcount. */
>         *(int *)regs->cx = INT_MIN / 2;
> +       trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);

Just for fun, can you print out *(int *)regs->cx before the assignment too?

>
>         /*
>          * Strictly speaking, this reports the fixup destination, not
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
>         return idev;
>  }
>
> +static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
> +{
> +       struct inet6_dev *idev;
> +
> +       rcu_read_lock();
> +       idev = rcu_dereference(dev->ip6_ptr);
> +       if (idev)
> +               refcount_inc_x(&idev->refcnt);
> +       rcu_read_unlock();
> +       return idev;
> +}
> +
>  static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
>  {
>         struct inet6_dev *idev = __in6_dev_get(dev);
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4041,12 +4041,12 @@ void __init ip6_route_init_special_entri
>          * the loopback reference in rt6_info will not be taken, do it
>          * manually for init_net */
>         init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>    #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>         init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>         init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>    #endif
>  }
>

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 17:12                                             ` Kees Cook
@ 2017-09-01 17:52                                               ` Mike Galbraith
  2017-09-01 18:58                                                 ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Mike Galbraith @ 2017-09-01 17:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >> > >>
> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> >> > >>
> >> > >> 4.8.5: WARN, eventual kernel hang
> >> > >> 6.3.1, 7.0.1: WARN, but continues working
> >> > >
> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> >> >
> >> > I still can't hit this with a 4.8.5 build. :(
> >> >
> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
> >> > negative first...
> >>
> >> I applied the other patch you posted, and built with gcc-6.3.1 to
> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
> >
> > Grr, that one has a in6_dev_getx() line missing for the first
> > increment, where things go pear shaped.
> >
> > With that added, looking at counter both before, and after incl, with a
> > trace_printk() in the exception handler showing it doing its saturate
> > thing, irqs disabled across the whole damn refcount_inc(), and even
> > booting box nr_cpus=1 for extra credit...
> >
> > HTH can that first refcount_inc() get there?
> >
> > # tracer: nop
> > #
> > #                              _-----=> irqs-off
> > #                             / _----=> need-resched
> > #                            | / _---=> hardirq/softirq
> > #                            || / _--=> preempt-depth
> > #                            ||| /     delay
> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> > #              | |       |   ||||       |         |
> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
> 
> O_o
> 
> Can you paste the disassembly of in6_dev_getx? I can't understand how
> we're landing in the exception handler.

I was hoping you'd say that.

   0xffffffff816b2f72 <+0>:     push   %rbp
   0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
   0xffffffff816b2f76 <+4>:     push   %r12
   0xffffffff816b2f78 <+6>:     push   %rbx
   0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
   0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
   0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
   0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
   0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
   0xffffffff816b2f93 <+33>:    mov    %rax,%r12
   0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
   0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
   0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
   0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
   0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
   0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
   0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
   0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
   0xffffffff816b2fc6 <+84>:    (bad)  
   0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
   0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
   0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
   0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
   0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
   0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
   0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
   0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
   0xffffffff816b2ff5 <+131>:   pop    %rbx
   0xffffffff816b2ff6 <+132>:   pop    %r12
   0xffffffff816b2ff8 <+134>:   pop    %rbp
   0xffffffff816b2ff9 <+135>:   retq

I don't get the section business at all, +75 looks to me like we're
gonna trap no matter what.. as we appear to be doing.

> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
> >                 : : "cc", "cx");
> >  }
> >
> > +static __always_inline void refcount_inc_x(refcount_t *r)
> > +{
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +       trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> > +       asm volatile(LOCK_PREFIX "incl %0\n\t"
> > +               REFCOUNT_CHECK_LT_ZERO
> > +               : [counter] "+m" (r->refs.counter)
> > +               : : "cc", "cx");
> 
> Does this need an explicit "memory" added to the clobbers line here?
> This isn't present in the atomic_inc() implementation, but maybe
> something confuses gcc in this case into ignoring the "+m" marking?

I thought about adding that (hail mary), but resisted.

> > +       trace_printk("POST refs.counter:%d\n", r->refs.counter);
> > +       local_irq_restore(flags);
> > +}
> > +
> >  static __always_inline void refcount_dec(refcount_t *r)
> >  {
> >         asm volatile(LOCK_PREFIX "decl %0\n\t"
> > --- a/arch/x86/mm/extable.c
> > +++ b/arch/x86/mm/extable.c
> > @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
> >  {
> >         /* First unconditionally saturate the refcount. */
> >         *(int *)regs->cx = INT_MIN / 2;
> > +       trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);
> 
> Just for fun, can you print out *(int *)regs->cx before the assignment too?

Sure, tomorrow.

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 17:52                                               ` Mike Galbraith
@ 2017-09-01 18:58                                                 ` Kees Cook
  2017-09-01 19:24                                                   ` Mike Galbraith
  2017-09-01 19:40                                                   ` Kees Cook
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-09-01 18:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> >> > >>
>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>> >> > >>
>> >> > >> 4.8.5: WARN, eventual kernel hang
>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>> >> > >
>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>> >> >
>> >> > I still can't hit this with a 4.8.5 build. :(
>> >> >
>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> >> > negative first...
>> >>
>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>> >
>> > Grr, that one has a in6_dev_getx() line missing for the first
>> > increment, where things go pear shaped.
>> >
>> > With that added, looking at counter both before, and after incl, with a
>> > trace_printk() in the exception handler showing it doing its saturate
>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>> > booting box nr_cpus=1 for extra credit...
>> >
>> > HTH can that first refcount_inc() get there?
>> >
>> > # tracer: nop
>> > #
>> > #                              _-----=> irqs-off
>> > #                             / _----=> need-resched
>> > #                            | / _---=> hardirq/softirq
>> > #                            || / _--=> preempt-depth
>> > #                            ||| /     delay
>> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> > #              | |       |   ||||       |         |
>> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
>>
>> O_o
>>
>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>> we're landing in the exception handler.
>
> I was hoping you'd say that.
>
>    0xffffffff816b2f72 <+0>:     push   %rbp
>    0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
>    0xffffffff816b2f76 <+4>:     push   %r12
>    0xffffffff816b2f78 <+6>:     push   %rbx
>    0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
>    0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
>    0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
>    0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
>    0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
>    0xffffffff816b2f93 <+33>:    mov    %rax,%r12
>    0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
>    0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
>    0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
>    0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
>    0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
>    0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
>    0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
>    0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
>    0xffffffff816b2fc6 <+84>:    (bad)
>    0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
>    0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
>    0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
>    0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
>    0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
>    0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
>    0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
>    0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
>    0xffffffff816b2ff5 <+131>:   pop    %rbx
>    0xffffffff816b2ff6 <+132>:   pop    %r12
>    0xffffffff816b2ff8 <+134>:   pop    %rbp
>    0xffffffff816b2ff9 <+135>:   retq
>
> I don't get the section business at all, +75 looks to me like we're
> gonna trap no matter what.. as we appear to be doing.

The section stuff is supposed to be a trick to push the error case off
into the .text.unlikely area to avoid needing a jmp over the handler
and with possibly some redundancy removal done by the compiler (though
this appears to be rather limited) if it notices a bunch of error
paths are the same. However, in your disassembly, it's inline (!!) in
the code, as if "pushsection" and "popsection" were entirely ignored.

And when I make my own in6_dev_getx(), I see the same disassembly:

   0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
   0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
   0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
   0xffffffff818a758b <+197>:   (bad)

Which is VERY different from how it looks in other places!

e.g. from lkdtm_REFCOUNT_INC_SATURATED:

   0xffffffff815657df <+47>:    lock incl -0xc(%rbp)
   0xffffffff815657e3 <+51>:    js     0xffffffff81565cac
...
   0xffffffff81565cac:  lea    -0xc(%rbp),%rcx
   0xffffffff81565cb0:  (bad)

So, at least I can reproduce this in the build now. I must not be
exercising these paths. FWIW, this is with Ubuntu's 6.3.0 gcc.

I'll try to figure out what's going on here...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 18:58                                                 ` Kees Cook
@ 2017-09-01 19:24                                                   ` Mike Galbraith
  2017-09-01 19:40                                                   ` Kees Cook
  1 sibling, 0 replies; 36+ messages in thread
From: Mike Galbraith @ 2017-09-01 19:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 11:58 -0700, Kees Cook wrote:
> 
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notices a bunch of error
> paths are the same. However, in your disassembly, it's inline (!!) in
> the code, as if "pushsection" and "popsection" were entirely ignored.
> 
> And when I make my own in6_dev_getx(), I see the same disassembly:
> 
>    0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
>    0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
>    0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
>    0xffffffff818a758b <+197>:   (bad)
> 
> Which is VERY different from how it looks in other places!
> 
> e.g. from lkdtm_REFCOUNT_INC_SATURATED:
> 
>    0xffffffff815657df <+47>:    lock incl -0xc(%rbp)
>    0xffffffff815657e3 <+51>:    js     0xffffffff81565cac
> ...
>    0xffffffff81565cac:  lea    -0xc(%rbp),%rcx
>    0xffffffff81565cb0:  (bad)
> 
> So, at least I can reproduce this in the build now. I must not be
> exercising these paths. FWIW, this is with Ubuntu's 6.3.0 gcc.
> 
> I'll try to figure out what's going on here...

Heh, make in6_dev_getx() __always_inline.

       swapper/0-1     [000] d..1     1.438587: ip6_route_init_special_entries: PRE refs.counter:3
       swapper/0-1     [000] d..1     1.438590: ip6_route_init_special_entries: POST refs.counter:4
       swapper/0-1     [000] d..1     1.438591: ip6_route_init_special_entries: PRE refs.counter:4
       swapper/0-1     [000] d..1     1.438592: ip6_route_init_special_entries: POST refs.counter:5
       swapper/0-1     [000] d..1     1.438592: ip6_route_init_special_entries: PRE refs.counter:5
       swapper/0-1     [000] d..1     1.438593: ip6_route_init_special_entries: POST refs.counter:6

	-Mike

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 18:58                                                 ` Kees Cook
  2017-09-01 19:24                                                   ` Mike Galbraith
@ 2017-09-01 19:40                                                   ` Kees Cook
  1 sibling, 0 replies; 36+ messages in thread
From: Kees Cook @ 2017-09-01 19:40 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 11:58 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith <efault@gmx.de> wrote:
>> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>>> >> > >>
>>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>>> >> > >>
>>> >> > >> 4.8.5: WARN, eventual kernel hang
>>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>>> >> > >
>>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>>> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>>> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>>> >> >
>>> >> > I still can't hit this with a 4.8.5 build. :(
>>> >> >
>>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>>> >> > negative first...
>>> >>
>>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>>> >
>>> > Grr, that one has a in6_dev_getx() line missing for the first
>>> > increment, where things go pear shaped.
>>> >
>>> > With that added, looking at counter both before, and after incl, with a
>>> > trace_printk() in the exception handler showing it doing its saturate
>>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>>> > booting box nr_cpus=1 for extra credit...
>>> >
>>> > HTH can that first refcount_inc() get there?
>>> >
>>> > # tracer: nop
>>> > #
>>> > #                              _-----=> irqs-off
>>> > #                             / _----=> need-resched
>>> > #                            | / _---=> hardirq/softirq
>>> > #                            || / _--=> preempt-depth
>>> > #                            ||| /     delay
>>> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>> > #              | |       |   ||||       |         |
>>> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>>> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>>> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
>>>
>>> O_o
>>>
>>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>>> we're landing in the exception handler.
>>
>> I was hoping you'd say that.
>>
>>    0xffffffff816b2f72 <+0>:     push   %rbp
>>    0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
>>    0xffffffff816b2f76 <+4>:     push   %r12
>>    0xffffffff816b2f78 <+6>:     push   %rbx
>>    0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
>>    0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
>>    0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
>>    0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
>>    0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
>>    0xffffffff816b2f93 <+33>:    mov    %rax,%r12
>>    0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
>>    0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
>>    0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
>>    0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
>>    0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
>>    0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
>>    0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
>>    0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
>>    0xffffffff816b2fc6 <+84>:    (bad)
>>    0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
>>    0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
>>    0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
>>    0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
>>    0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
>>    0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
>>    0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
>>    0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
>>    0xffffffff816b2ff5 <+131>:   pop    %rbx
>>    0xffffffff816b2ff6 <+132>:   pop    %r12
>>    0xffffffff816b2ff8 <+134>:   pop    %rbp
>>    0xffffffff816b2ff9 <+135>:   retq
>>
>> I don't get the section business at all, +75 looks to me like we're
>> gonna trap no matter what.. as we appear to be doing.
>
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notices a bunch of error
> paths are the same. However, in your disassembly, it's inline (!!) in
> the code, as if "pushsection" and "popsection" were entirely ignored.
>
> And when I make my own in6_dev_getx(), I see the same disassembly:
>
>    0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
>    0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
>    0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
>    0xffffffff818a758b <+197>:   (bad)
>
> Which is VERY different from how it looks in other places!

Found it.

If the compiler already pushed the entire function into
.text.unlikely, x86-refcount's .pushsection doesn't do any good
(obviously). Durrr.

        .section        .text.unlikely,"ax",@progbits
        .type   in6_dev_getx, @function
in6_dev_getx:
.LFB4673:
        .loc 2 4128 0
        .cfi_startproc
...
        lock; incl 480(%rbx)
        js 111f
        .pushsection .text.unlikely
111:    lea 480(%rbx), %rcx
112:    .byte 0x0f, 0xff
.popsection
113:

I will get this fixed. Thank you again for helping track this down!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-09-01 19:40 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29  8:50 tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection Mike Galbraith
2017-08-29  8:58 ` Ingo Molnar
2017-08-29  9:21   ` Mike Galbraith
2017-08-29  9:27     ` Ingo Molnar
2017-08-29 10:03       ` Mike Galbraith
2017-08-29 15:39         ` Kees Cook
2017-08-29 16:34           ` Mike Galbraith
2017-08-29 15:58 ` Kees Cook
2017-08-29 16:55   ` Mike Galbraith
2017-08-29 18:10     ` Mike Galbraith
2017-08-29 18:41       ` Kees Cook
2017-08-30  5:02         ` Mike Galbraith
2017-08-30 16:35           ` Kees Cook
2017-08-30 17:13             ` Mike Galbraith
2017-08-30 17:32               ` Kees Cook
2017-08-30 17:55                 ` Mike Galbraith
2017-08-30 19:19                   ` Kees Cook
2017-08-30 19:46                     ` Kees Cook
2017-08-31  2:09                       ` Mike Galbraith
2017-08-31  2:27                         ` Kees Cook
2017-08-31  3:12                           ` Mike Galbraith
2017-08-31  4:01                             ` Kees Cook
2017-08-31  4:10                               ` Kees Cook
2017-08-31  4:38                                 ` Mike Galbraith
2017-08-31 13:58                                 ` Mike Galbraith
2017-08-31 17:00                                   ` Kees Cook
2017-08-31 17:19                                     ` Mike Galbraith
2017-08-31 18:45                                       ` Kees Cook
2017-09-01  6:57                                         ` Mike Galbraith
2017-09-01 13:09                                           ` Mike Galbraith
2017-09-01 17:12                                             ` Kees Cook
2017-09-01 17:52                                               ` Mike Galbraith
2017-09-01 18:58                                                 ` Kees Cook
2017-09-01 19:24                                                   ` Mike Galbraith
2017-09-01 19:40                                                   ` Kees Cook
2017-08-31 19:28                                   ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.