Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
       [not found]                           ` <1504149176.23109.9.camel@gmx.de>
@ 2017-08-31  4:01                             ` Kees Cook
  2017-08-31  4:10                               ` Kees Cook
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-08-31  4:01 UTC (permalink / raw)
  To: Mike Galbraith, David S. Miller
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>
>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> running haveged will help me trigger this on my system...
>
> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.

Wonderful! Thank you so much for helping track this down.

So, it seems that sk_buff.users will need some more special attention
before we can convert it to refcount.

x86-refcount will saturate with refcount_dec_and_test() if the result
is negative. But that would mean at least starting at 0. FULL should
have WARNed in this case, so I remain slightly confused why it was
missed by FULL.

Ingo, I'm not sure the best path for this. It seems we need to revert
230cd1279d001 and 633547973ffc3 and then we can restore
ARCH_HAS_REFCOUNT.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:01                             ` tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection Kees Cook
@ 2017-08-31  4:10                               ` Kees Cook
  2017-08-31  4:38                                 ` Mike Galbraith
  2017-08-31 13:58                                 ` Mike Galbraith
  0 siblings, 2 replies; 15+ messages in thread
From: Kees Cook @ 2017-08-31  4:10 UTC (permalink / raw)
  To: Mike Galbraith, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
>> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>>
>>> Interesting! Can you try with 633547973ffc3 ("net: convert
>>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>>> running haveged will help me trigger this on my system...
>>
>> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
>
> Wonderful! Thank you so much for helping track this down.
>
> So, it seems that sk_buff.users will need some more special attention
> before we can convert it to refcount.
>
> x86-refcount will saturate with refcount_dec_and_test() if the result
> is negative. But that would mean at least starting at 0. FULL should
> have WARNed in this case, so I remain slightly confused why it was
> missed by FULL.

Actually, if this is a race condition it's possible that FULL is slow
enough to miss it...

I bet something briefly takes the refcount negative, and with
unchecked atomics, it come back up positive again during the race.
FULL may miss the race, and x86-refcount will catch it and saturate...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:10                               ` Kees Cook
@ 2017-08-31  4:38                                 ` Mike Galbraith
  2017-08-31 13:58                                 ` Mike Galbraith
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2017-08-31  4:38 UTC (permalink / raw)
  To: Kees Cook, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
> >>
> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> >>> running haveged will help me trigger this on my system...
> >>
> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
> >
> > Wonderful! Thank you so much for helping track this down.
> >
> > So, it seems that sk_buff.users will need some more special attention
> > before we can convert it to refcount.
> >
> > x86-refcount will saturate with refcount_dec_and_test() if the result
> > is negative. But that would mean at least starting at 0. FULL should
> > have WARNed in this case, so I remain slightly confused why it was
> > missed by FULL.
> 
> Actually, if this is a race condition it's possible that FULL is slow
> enough to miss it...
> 
> I bet something briefly takes the refcount negative, and with
> unchecked atomics, it come back up positive again during the race.
> FULL may miss the race, and x86-refcount will catch it and saturate...

Hm, I'll go have a stare.. not that that's likely to turn anything up,
memory ordering stares usually inducing a zombie like state.

	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31  4:10                               ` Kees Cook
  2017-08-31  4:38                                 ` Mike Galbraith
@ 2017-08-31 13:58                                 ` Mike Galbraith
  2017-08-31 17:00                                   ` Kees Cook
  2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 2 replies; 15+ messages in thread
From: Mike Galbraith @ 2017-08-31 13:58 UTC (permalink / raw)
  To: Kees Cook, David S. Miller, Peter Zijlstra
  Cc: LKML, Ingo Molnar, Reshetova, Elena, Network Development

On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
> >>
> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
> >>> running haveged will help me trigger this on my system...
> >>
> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
> >
> > Wonderful! Thank you so much for helping track this down.
> >
> > So, it seems that sk_buff.users will need some more special attention
> > before we can convert it to refcount.
> >
> > x86-refcount will saturate with refcount_dec_and_test() if the result
> > is negative. But that would mean at least starting at 0. FULL should
> > have WARNed in this case, so I remain slightly confused why it was
> > missed by FULL.
> 
> Actually, if this is a race condition it's possible that FULL is slow
> enough to miss it...
> 
> I bet something briefly takes the refcount negative, and with
> unchecked atomics, it come back up positive again during the race.
> FULL may miss the race, and x86-refcount will catch it and saturate...

(gdb) list *in6_dev_get+0x1e
0xffffffff8166d3de is in in6_dev_get (./arch/x86/include/asm/refcount.h:52).
47                      : "cc", "cx");
48      }
49
50      static __always_inline void refcount_inc(refcount_t *r)
51      {
52              asm volatile(LOCK_PREFIX "incl %0\n\t"
53                      REFCOUNT_CHECK_LT_ZERO
54                      : [counter] "+m" (r->refs.counter)
55                      : : "cc", "cx");
56

gdb) list *in6_dev_get+0x10
0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
313     {
314             struct inet6_dev *idev;
315
316             rcu_read_lock();
317             idev = rcu_dereference(dev->ip6_ptr);
318             if (idev)
319                     refcount_inc(&idev->refcnt);
320             rcu_read_unlock();
321             return idev;
322 

That's from kernel with no revert, but your silent saturation patch
still applied, AND built with gcc-6.3.1.  Kernel traps, but it boots
and works, as does kernel built with gcc-7.0.1.  Remove your silent
saturation patch, kernel doesn't notice a thing, just works.

With gcc-4.8.5, trap means you're as good as dead, with the other two,
trap means the intended.  Compiler, constraints, dark elves.. pick one.

Full first splat from bootable gcc-6.3.1 built kernel.

[    1.293962] NET: Registered protocol family 10
[    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0
[    1.295616] ------------[ cut here ]------------
[    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
[    1.296950] Modules linked in:
[    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
[    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
[    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
[    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
[    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
[    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
[    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
[    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
[    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
[    1.313301] Call Trace:
[    1.314012]  ex_handler_refcount+0x63/0x70
[    1.314893]  fixup_exception+0x32/0x40
[    1.315737]  do_trap+0x8c/0x170
[    1.316519]  do_error_trap+0x70/0xd0
[    1.317340]  ? in6_dev_get+0x23/0x104
[    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
[    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
[    1.320098]  ? set_debug_rodata+0x11/0x11
[    1.320964]  invalid_op+0x1e/0x30
[    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
[    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
[    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
[    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
[    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
[    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
[    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
[    1.330547]  ? set_debug_rodata+0x11/0x11
[    1.331392]  ip6_route_init_special_entries+0x2a/0x89
[    1.332369]  addrconf_init+0x9e/0x203
[    1.333173]  inet6_init+0x1af/0x365
[    1.333956]  ? af_unix_init+0x4e/0x4e
[    1.334753]  do_one_initcall+0x4e/0x190
[    1.335555]  ? set_debug_rodata+0x11/0x11
[    1.336369]  kernel_init_freeable+0x189/0x20e
[    1.337230]  ? rest_init+0xd0/0xd0
[    1.337999]  kernel_init+0xa/0xf7
[    1.338744]  ret_from_fork+0x25/0x30
[    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56 
[    1.342243] ---[ end trace b5d40c0fccce776c ]---

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 13:58                                 ` Mike Galbraith
@ 2017-08-31 17:00                                   ` Kees Cook
  2017-08-31 17:19                                     ` Mike Galbraith
  2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-08-31 17:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 6:58 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2017-08-30 at 21:10 -0700, Kees Cook wrote:
>> On Wed, Aug 30, 2017 at 9:01 PM, Kees Cook <keescook@chromium.org> wrote:
>> > On Wed, Aug 30, 2017 at 8:12 PM, Mike Galbraith <efault@gmx.de> wrote:
>> >> On Wed, 2017-08-30 at 19:27 -0700, Kees Cook wrote:
>> >>
>> >>> Interesting! Can you try with 633547973ffc3 ("net: convert
>> >>> sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
>> >>> running haveged will help me trigger this on my system...
>> >>
>> >> With that (plus 230cd1279d001 fix to it) reverted, vbox boots.
>> >
>> > Wonderful! Thank you so much for helping track this down.
>> >
>> > So, it seems that sk_buff.users will need some more special attention
>> > before we can convert it to refcount.
>> >
>> > x86-refcount will saturate with refcount_dec_and_test() if the result
>> > is negative. But that would mean at least starting at 0. FULL should
>> > have WARNed in this case, so I remain slightly confused why it was
>> > missed by FULL.
>>
>> Actually, if this is a race condition it's possible that FULL is slow
>> enough to miss it...
>>
>> I bet something briefly takes the refcount negative, and with
>> unchecked atomics, it come back up positive again during the race.
>> FULL may miss the race, and x86-refcount will catch it and saturate...
>
> (gdb) list *in6_dev_get+0x1e
> 0xffffffff8166d3de is in in6_dev_get (./arch/x86/include/asm/refcount.h:52).
> 47                      : "cc", "cx");
> 48      }
> 49
> 50      static __always_inline void refcount_inc(refcount_t *r)
> 51      {
> 52              asm volatile(LOCK_PREFIX "incl %0\n\t"
> 53                      REFCOUNT_CHECK_LT_ZERO
> 54                      : [counter] "+m" (r->refs.counter)
> 55                      : : "cc", "cx");
> 56
>
> gdb) list *in6_dev_get+0x10
> 0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
> 313     {
> 314             struct inet6_dev *idev;
> 315
> 316             rcu_read_lock();
> 317             idev = rcu_dereference(dev->ip6_ptr);
> 318             if (idev)
> 319                     refcount_inc(&idev->refcnt);
> 320             rcu_read_unlock();
> 321             return idev;
> 322
>
> That's from kernel with no revert, but your silent saturation patch
> still applied, AND built with gcc-6.3.1.  Kernel traps, but it boots
> and works, as does kernel built with gcc-7.0.1.  Remove your silent
> saturation patch, kernel doesn't notice a thing, just works.
>
> With gcc-4.8.5, trap means you're as good as dead, with the other two,
> trap means the intended.  Compiler, constraints, dark elves.. pick one.

Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:

4.8.5: WARN, eventual kernel hang
6.3.1, 7.0.1: WARN, but continues working

> Full first splat from bootable gcc-6.3.1 built kernel.
>
> [    1.293962] NET: Registered protocol family 10
> [    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0

That's an _increment_ saturation? Which means the result must be
negative, so it started from least -2.

> [    1.295616] ------------[ cut here ]------------
> [    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
> [    1.296950] Modules linked in:
> [    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
> [    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
> [    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
> [    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
> [    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
> [    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
> [    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
> [    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
> [    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
> [    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
> [    1.313301] Call Trace:
> [    1.314012]  ex_handler_refcount+0x63/0x70
> [    1.314893]  fixup_exception+0x32/0x40
> [    1.315737]  do_trap+0x8c/0x170
> [    1.316519]  do_error_trap+0x70/0xd0
> [    1.317340]  ? in6_dev_get+0x23/0x104
> [    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
> [    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
> [    1.320098]  ? set_debug_rodata+0x11/0x11
> [    1.320964]  invalid_op+0x1e/0x30
> [    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
> [    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
> [    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
> [    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
> [    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
> [    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
> [    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
> [    1.330547]  ? set_debug_rodata+0x11/0x11
> [    1.331392]  ip6_route_init_special_entries+0x2a/0x89
> [    1.332369]  addrconf_init+0x9e/0x203
> [    1.333173]  inet6_init+0x1af/0x365
> [    1.333956]  ? af_unix_init+0x4e/0x4e
> [    1.334753]  do_one_initcall+0x4e/0x190
> [    1.335555]  ? set_debug_rodata+0x11/0x11
> [    1.336369]  kernel_init_freeable+0x189/0x20e
> [    1.337230]  ? rest_init+0xd0/0xd0
> [    1.337999]  kernel_init+0xa/0xf7
> [    1.338744]  ret_from_fork+0x25/0x30
> [    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56
> [    1.342243] ---[ end trace b5d40c0fccce776c ]---

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 17:00                                   ` Kees Cook
@ 2017-08-31 17:19                                     ` Mike Galbraith
  2017-08-31 18:45                                       ` Kees Cook
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2017-08-31 17:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> 
> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> 
> 4.8.5: WARN, eventual kernel hang
> 6.3.1, 7.0.1: WARN, but continues working

Yeah, that's correct.  I find that troubling, simply because this gcc
version has been through one hell of a lot of kernels with me.  Yeah, I
know, that doesn't exempt it from having bugs, but color me suspicious.

	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 17:19                                     ` Mike Galbraith
@ 2017-08-31 18:45                                       ` Kees Cook
  2017-09-01  6:57                                         ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-08-31 18:45 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>>
>> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>>
>> 4.8.5: WARN, eventual kernel hang
>> 6.3.1, 7.0.1: WARN, but continues working
>
> Yeah, that's correct.  I find that troubling, simply because this gcc
> version has been through one hell of a lot of kernels with me.  Yeah, I
> know, that doesn't exempt it from having bugs, but color me suspicious.

I still can't hit this with a 4.8.5 build. :(

With _RATELIMIT removed, this should, in theory, report whatever goes
negative first...

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index c076f710de4c..d4fc6b91c0e6 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -72,6 +72,10 @@ bool ex_handler_refcount(const struct
exception_table_entry *fixup,
                bool zero = regs->flags & X86_EFLAGS_ZF;

                refcount_error_report(regs, zero ? "hit zero" : "overflow");
+       } else if (regs->flags & X86_EFLAGS_SF) {
+               refcount_error_report(regs, "result was negative");
+       } else {
+               refcount_error_report(regs, "unknown state");
        }

        return true;
diff --git a/kernel/panic.c b/kernel/panic.c
index bdd18afa19a4..966ade491543 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -605,7 +605,7 @@ EXPORT_SYMBOL(__stack_chk_fail);
 #ifdef CONFIG_ARCH_HAS_REFCOUNT
 void refcount_error_report(struct pt_regs *regs, const char *err)
 {
-       WARN_RATELIMIT(1, "refcount_t %s at %pB in %s[%d], uid/euid: %u/%u\n",
+       WARN(1, "refcount_t %s at %pB in %s[%d], uid/euid: %u/%u\n",
                err, (void *)instruction_pointer(regs),
                current->comm, task_pid_nr(current),
                from_kuid_munged(&init_user_ns, current_uid()),

And if it is still a refcount_inc(), and only on gcc 4.8.5, then I
think we need to study the resulting assembly...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 13:58                                 ` Mike Galbraith
  2017-08-31 17:00                                   ` Kees Cook
@ 2017-08-31 19:28                                   ` Kees Cook
  1 sibling, 0 replies; 15+ messages in thread
From: Kees Cook @ 2017-08-31 19:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, Aug 31, 2017 at 6:58 AM, Mike Galbraith <efault@gmx.de> wrote:
> gdb) list *in6_dev_get+0x10
> 0xffffffff8166d3d0 is in in6_dev_get (./include/net/addrconf.h:318).
> 313     {
> 314             struct inet6_dev *idev;
> 315
> 316             rcu_read_lock();
> 317             idev = rcu_dereference(dev->ip6_ptr);
> 318             if (idev)
> 319                     refcount_inc(&idev->refcnt);
> 320             rcu_read_unlock();
> 321             return idev;
> 322

And this is a completely different refcount from the other that
tripped. This one is quite simple, too, though I see it uses
refcount_dec(), which is a path to saturation. I've sent a patch to
try to clarify this further...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-08-31 18:45                                       ` Kees Cook
@ 2017-09-01  6:57                                         ` Mike Galbraith
  2017-09-01 13:09                                           ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2017-09-01  6:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >>
> >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> >>
> >> 4.8.5: WARN, eventual kernel hang
> >> 6.3.1, 7.0.1: WARN, but continues working
> >
> > Yeah, that's correct.  I find that troubling, simply because this gcc
> > version has been through one hell of a lot of kernels with me.  Yeah, I
> > know, that doesn't exempt it from having bugs, but color me suspicious.
> 
> I still can't hit this with a 4.8.5 build. :(
> 
> With _RATELIMIT removed, this should, in theory, report whatever goes
> negative first...

I applied the other patch you posted, and built with gcc-6.3.1 to
remove the gcc-4.8.5 aspect.  Look below the resulting splat.

[    1.293962] NET: Registered protocol family 10
[    1.294635] refcount_t silent saturation at in6_dev_get+0x25/0x104 in swapper/0[1], uid/euid: 0/0
[    1.295616] ------------[ cut here ]------------
[    1.296120] WARNING: CPU: 0 PID: 1 at kernel/panic.c:612 refcount_error_report+0x94/0x9e
[    1.296950] Modules linked in:
[    1.297276] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0.g152d54a-tip-default #53
[    1.299179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    1.300743] task: ffff88013ab84040 task.stack: ffffc9000062c000
[    1.301825] RIP: 0010:refcount_error_report+0x94/0x9e
[    1.302804] RSP: 0018:ffffc9000062fc10 EFLAGS: 00010282
[    1.303791] RAX: 0000000000000055 RBX: ffffffff81a34274 RCX: ffffffff81c605e8
[    1.304991] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[    1.306189] RBP: ffffc9000062fd58 R08: 0000000000000000 R09: 0000000000000175
[    1.307392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88013ab84040
[    1.308583] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff81a256c8
[    1.309768] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[    1.311052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.312100] CR2: 00007f4631fe8df0 CR3: 0000000137d09003 CR4: 00000000001606f0
[    1.313301] Call Trace:
[    1.314012]  ex_handler_refcount+0x63/0x70
[    1.314893]  fixup_exception+0x32/0x40
[    1.315737]  do_trap+0x8c/0x170
[    1.316519]  do_error_trap+0x70/0xd0
[    1.317340]  ? in6_dev_get+0x23/0x104
[    1.318172]  ? netlink_broadcast_filtered+0x2bd/0x430
[    1.319156]  ? kmem_cache_alloc_trace+0xce/0x5d0
[    1.320098]  ? set_debug_rodata+0x11/0x11
[    1.320964]  invalid_op+0x1e/0x30
[    1.322520] RIP: 0010:in6_dev_get+0x25/0x104
[    1.323631] RSP: 0018:ffffc9000062fe00 EFLAGS: 00010202
[    1.324614] RAX: ffff880137de2400 RBX: ffff880137df4600 RCX: ffff880137de24f0
[    1.325793] RDX: ffff88013a5e4000 RSI: 00000000fffffe00 RDI: ffff88013a5e4000
[    1.326964] RBP: 00000000000000d1 R08: 0000000000000000 R09: ffff880137de7600
[    1.328150] R10: 0000000000000000 R11: ffff8801398a4df8 R12: 0000000000000000
[    1.329374] R13: ffffffff82137872 R14: 014200ca00000000 R15: 0000000000000000
[    1.330547]  ? set_debug_rodata+0x11/0x11
[    1.331392]  ip6_route_init_special_entries+0x2a/0x89
[    1.332369]  addrconf_init+0x9e/0x203
[    1.333173]  inet6_init+0x1af/0x365
[    1.333956]  ? af_unix_init+0x4e/0x4e
[    1.334753]  do_one_initcall+0x4e/0x190
[    1.335555]  ? set_debug_rodata+0x11/0x11
[    1.336369]  kernel_init_freeable+0x189/0x20e
[    1.337230]  ? rest_init+0xd0/0xd0
[    1.337999]  kernel_init+0xa/0xf7
[    1.338744]  ret_from_fork+0x25/0x30
[    1.339500] Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 f0 0a 00 00 45 8b 84 24 10 09 00 00 41 89 c1 48 89 de 48 c7 c7 60 7a a3 81 e8 07 de 05 00 <0f> ff 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56 
[    1.342243] ---[ end trace b5d40c0fccce776c ]---

Back yours out, and...

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
       swapper/0-1     [000] ...1     1.974114: in6_dev_getx: refs.counter:-1073741824
       swapper/0-1     [000] ...1     1.974116: in6_dev_getx: refs.counter:-1073741824

---
 arch/x86/include/asm/refcount.h |    9 +++++++++
 include/net/addrconf.h          |   12 ++++++++++++
 net/ipv6/route.c                |    4 ++--
 3 files changed, 23 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -55,6 +55,15 @@ static __always_inline void refcount_inc
 		: : "cc", "cx");
 }
 
+static __always_inline void refcount_inc_x(refcount_t *r)
+{
+	trace_printk("refs.counter:%d\n", r->refs.counter);
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+}
+
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
 	return idev;
 }
 
+static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
+{
+	struct inet6_dev *idev;
+
+	rcu_read_lock();
+	idev = rcu_dereference(dev->ip6_ptr);
+	if (idev)
+		refcount_inc_x(&idev->refcnt);
+	rcu_read_unlock();
+	return idev;
+}
+
 static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
 {
 	struct inet6_dev *idev = __in6_dev_get(dev);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4044,9 +4044,9 @@ void __init ip6_route_init_special_entri
 	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
   #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
 	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #endif
 }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01  6:57                                         ` Mike Galbraith
@ 2017-09-01 13:09                                           ` Mike Galbraith
  2017-09-01 17:12                                             ` Kees Cook
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2017-09-01 13:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> > >>
> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> > >>
> > >> 4.8.5: WARN, eventual kernel hang
> > >> 6.3.1, 7.0.1: WARN, but continues working
> > >
> > > Yeah, that's correct.  I find that troubling, simply because this gcc
> > > version has been through one hell of a lot of kernels with me.  Yeah, I
> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> > 
> > I still can't hit this with a 4.8.5 build. :(
> > 
> > With _RATELIMIT removed, this should, in theory, report whatever goes
> > negative first...
> 
> I applied the other patch you posted, and built with gcc-6.3.1 to
> remove the gcc-4.8.5 aspect.  Look below the resulting splat.

Grr, that one has a in6_dev_getx() line missing for the first
increment, where things go pear shaped.

With that added, looking at counter both before, and after incl, with a
trace_printk() in the exception handler showing it doing its saturate
thing, irqs disabled across the whole damn refcount_inc(), and even
booting box nr_cpus=1 for extra credit...

HTH can that first refcount_inc() get there?

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
         systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
         systemd-1     [000] d..1     1.937296: in6_dev_getx: PRE refs.counter:-1073741824
         systemd-1     [000] d..1     1.937297: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937297: in6_dev_getx: POST refs.counter:-1073741824
         systemd-1     [000] d..1     1.937297: in6_dev_getx: PRE refs.counter:-1073741824
         systemd-1     [000] d..1     1.937298: ex_handler_refcount: *(int *)regs->cx = -1073741824
         systemd-1     [000] d..1     1.937299: in6_dev_getx: POST refs.counter:-1073741824

---
 arch/x86/include/asm/refcount.h |   14 ++++++++++++++
 arch/x86/mm/extable.c           |    1 +
 include/net/addrconf.h          |   12 ++++++++++++
 net/ipv6/route.c                |    6 +++---
 4 files changed, 30 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -55,6 +55,20 @@ static __always_inline void refcount_inc
 		: : "cc", "cx");
 }
 
+static __always_inline void refcount_inc_x(refcount_t *r)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	trace_printk("PRE refs.counter:%d\n", r->refs.counter);
+	asm volatile(LOCK_PREFIX "incl %0\n\t"
+		REFCOUNT_CHECK_LT_ZERO
+		: [counter] "+m" (r->refs.counter)
+		: : "cc", "cx");
+	trace_printk("POST refs.counter:%d\n", r->refs.counter);
+	local_irq_restore(flags);
+}
+
 static __always_inline void refcount_dec(refcount_t *r)
 {
 	asm volatile(LOCK_PREFIX "decl %0\n\t"
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
 {
 	/* First unconditionally saturate the refcount. */
 	*(int *)regs->cx = INT_MIN / 2;
+	trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);
 
 	/*
 	 * Strictly speaking, this reports the fixup destination, not
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
 	return idev;
 }
 
+static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
+{
+	struct inet6_dev *idev;
+
+	rcu_read_lock();
+	idev = rcu_dereference(dev->ip6_ptr);
+	if (idev)
+		refcount_inc_x(&idev->refcnt);
+	rcu_read_unlock();
+	return idev;
+}
+
 static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
 {
 	struct inet6_dev *idev = __in6_dev_get(dev);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4041,12 +4041,12 @@ void __init ip6_route_init_special_entri
 	 * the loopback reference in rt6_info will not be taken, do it
 	 * manually for init_net */
 	init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
 	init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
-	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
+	init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
   #endif
 }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 13:09                                           ` Mike Galbraith
@ 2017-09-01 17:12                                             ` Kees Cook
  2017-09-01 17:52                                               ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-09-01 17:12 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> > >>
>> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>> > >>
>> > >> 4.8.5: WARN, eventual kernel hang
>> > >> 6.3.1, 7.0.1: WARN, but continues working
>> > >
>> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>> >
>> > I still can't hit this with a 4.8.5 build. :(
>> >
>> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> > negative first...
>>
>> I applied the other patch you posted, and built with gcc-6.3.1 to
>> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>
> Grr, that one has a in6_dev_getx() line missing for the first
> increment, where things go pear shaped.
>
> With that added, looking at counter both before, and after incl, with a
> trace_printk() in the exception handler showing it doing its saturate
> thing, irqs disabled across the whole damn refcount_inc(), and even
> booting box nr_cpus=1 for extra credit...
>
> HTH can that first refcount_inc() get there?
>
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824

O_o

Can you paste the disassembly of in6_dev_getx? I can't understand how
we're landing in the exception handler.

>          systemd-1     [000] d..1     1.937296: in6_dev_getx: PRE refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937297: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937297: in6_dev_getx: POST refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937297: in6_dev_getx: PRE refs.counter:-1073741824
>          systemd-1     [000] d..1     1.937298: ex_handler_refcount: *(int *)regs->cx = -1073741824
>          systemd-1     [000] d..1     1.937299: in6_dev_getx: POST refs.counter:-1073741824
>
> ---
>  arch/x86/include/asm/refcount.h |   14 ++++++++++++++
>  arch/x86/mm/extable.c           |    1 +
>  include/net/addrconf.h          |   12 ++++++++++++
>  net/ipv6/route.c                |    6 +++---
>  4 files changed, 30 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
>                 : : "cc", "cx");
>  }
>
> +static __always_inline void refcount_inc_x(refcount_t *r)
> +{
> +       unsigned long flags;
> +
> +       local_irq_save(flags);
> +       trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> +       asm volatile(LOCK_PREFIX "incl %0\n\t"
> +               REFCOUNT_CHECK_LT_ZERO
> +               : [counter] "+m" (r->refs.counter)
> +               : : "cc", "cx");

Does this need an explicit "memory" added to the clobbers line here?
This isn't present in the atomic_inc() implementation, but maybe
something confuses gcc in this case into ignoring the "+m" marking?

> +       trace_printk("POST refs.counter:%d\n", r->refs.counter);
> +       local_irq_restore(flags);
> +}
> +
>  static __always_inline void refcount_dec(refcount_t *r)
>  {
>         asm volatile(LOCK_PREFIX "decl %0\n\t"
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
>  {
>         /* First unconditionally saturate the refcount. */
>         *(int *)regs->cx = INT_MIN / 2;
> +       trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);

Just for fun, can you print out *(int *)regs->cx before the assignment too?

>
>         /*
>          * Strictly speaking, this reports the fixup destination, not
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -321,6 +321,18 @@ static inline struct inet6_dev *in6_dev_
>         return idev;
>  }
>
> +static inline struct inet6_dev *in6_dev_getx(const struct net_device *dev)
> +{
> +       struct inet6_dev *idev;
> +
> +       rcu_read_lock();
> +       idev = rcu_dereference(dev->ip6_ptr);
> +       if (idev)
> +               refcount_inc_x(&idev->refcnt);
> +       rcu_read_unlock();
> +       return idev;
> +}
> +
>  static inline struct neigh_parms *__in6_dev_nd_parms_get_rcu(const struct net_device *dev)
>  {
>         struct inet6_dev *idev = __in6_dev_get(dev);
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4041,12 +4041,12 @@ void __init ip6_route_init_special_entri
>          * the loopback reference in rt6_info will not be taken, do it
>          * manually for init_net */
>         init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>    #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>         init_net.ipv6.ip6_prohibit_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_prohibit_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>         init_net.ipv6.ip6_blk_hole_entry->dst.dev = init_net.loopback_dev;
> -       init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
> +       init_net.ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_getx(init_net.loopback_dev);
>    #endif
>  }
>

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 17:12                                             ` Kees Cook
@ 2017-09-01 17:52                                               ` Mike Galbraith
  2017-09-01 18:58                                                 ` Kees Cook
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2017-09-01 17:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >> > >>
> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> >> > >>
> >> > >> 4.8.5: WARN, eventual kernel hang
> >> > >> 6.3.1, 7.0.1: WARN, but continues working
> >> > >
> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> >> >
> >> > I still can't hit this with a 4.8.5 build. :(
> >> >
> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
> >> > negative first...
> >>
> >> I applied the other patch you posted, and built with gcc-6.3.1 to
> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
> >
> > Grr, that one has a in6_dev_getx() line missing for the first
> > increment, where things go pear shaped.
> >
> > With that added, looking at counter both before, and after incl, with a
> > trace_printk() in the exception handler showing it doing its saturate
> > thing, irqs disabled across the whole damn refcount_inc(), and even
> > booting box nr_cpus=1 for extra credit...
> >
> > HTH can that first refcount_inc() get there?
> >
> > # tracer: nop
> > #
> > #                              _-----=> irqs-off
> > #                             / _----=> need-resched
> > #                            | / _---=> hardirq/softirq
> > #                            || / _--=> preempt-depth
> > #                            ||| /     delay
> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> > #              | |       |   ||||       |         |
> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
> 
> O_o
> 
> Can you paste the disassembly of in6_dev_getx? I can't understand how
> we're landing in the exception handler.

I was hoping you'd say that.

   0xffffffff816b2f72 <+0>:     push   %rbp
   0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
   0xffffffff816b2f76 <+4>:     push   %r12
   0xffffffff816b2f78 <+6>:     push   %rbx
   0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
   0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
   0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
   0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
   0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
   0xffffffff816b2f93 <+33>:    mov    %rax,%r12
   0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
   0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
   0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
   0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
   0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
   0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
   0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
   0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
   0xffffffff816b2fc6 <+84>:    (bad)  
   0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
   0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
   0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
   0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
   0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
   0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
   0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
   0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
   0xffffffff816b2ff5 <+131>:   pop    %rbx
   0xffffffff816b2ff6 <+132>:   pop    %r12
   0xffffffff816b2ff8 <+134>:   pop    %rbp
   0xffffffff816b2ff9 <+135>:   retq

I don't get the section business at all, +75 looks to me like we're
gonna trap no matter what.. as we appear to be doing.

> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
> >                 : : "cc", "cx");
> >  }
> >
> > +static __always_inline void refcount_inc_x(refcount_t *r)
> > +{
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +       trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> > +       asm volatile(LOCK_PREFIX "incl %0\n\t"
> > +               REFCOUNT_CHECK_LT_ZERO
> > +               : [counter] "+m" (r->refs.counter)
> > +               : : "cc", "cx");
> 
> Does this need an explicit "memory" added to the clobbers line here?
> This isn't present in the atomic_inc() implementation, but maybe
> something confuses gcc in this case into ignoring the "+m" marking?

I thought about adding that (hail mary), but resisted.

> > +       trace_printk("POST refs.counter:%d\n", r->refs.counter);
> > +       local_irq_restore(flags);
> > +}
> > +
> >  static __always_inline void refcount_dec(refcount_t *r)
> >  {
> >         asm volatile(LOCK_PREFIX "decl %0\n\t"
> > --- a/arch/x86/mm/extable.c
> > +++ b/arch/x86/mm/extable.c
> > @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
> >  {
> >         /* First unconditionally saturate the refcount. */
> >         *(int *)regs->cx = INT_MIN / 2;
> > +       trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);
> 
> Just for fun, can you print out *(int *)regs->cx before the assignment too?

Sure, tomorrow.

	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 17:52                                               ` Mike Galbraith
@ 2017-09-01 18:58                                                 ` Kees Cook
  2017-09-01 19:24                                                   ` Mike Galbraith
  2017-09-01 19:40                                                   ` Kees Cook
  0 siblings, 2 replies; 15+ messages in thread
From: Kees Cook @ 2017-09-01 18:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>> >> > >>
>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>> >> > >>
>> >> > >> 4.8.5: WARN, eventual kernel hang
>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>> >> > >
>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>> >> >
>> >> > I still can't hit this with a 4.8.5 build. :(
>> >> >
>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>> >> > negative first...
>> >>
>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>> >
>> > Grr, that one has a in6_dev_getx() line missing for the first
>> > increment, where things go pear shaped.
>> >
>> > With that added, looking at counter both before, and after incl, with a
>> > trace_printk() in the exception handler showing it doing its saturate
>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>> > booting box nr_cpus=1 for extra credit...
>> >
>> > HTH can that first refcount_inc() get there?
>> >
>> > # tracer: nop
>> > #
>> > #                              _-----=> irqs-off
>> > #                             / _----=> need-resched
>> > #                            | / _---=> hardirq/softirq
>> > #                            || / _--=> preempt-depth
>> > #                            ||| /     delay
>> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> > #              | |       |   ||||       |         |
>> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
>>
>> O_o
>>
>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>> we're landing in the exception handler.
>
> I was hoping you'd say that.
>
>    0xffffffff816b2f72 <+0>:     push   %rbp
>    0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
>    0xffffffff816b2f76 <+4>:     push   %r12
>    0xffffffff816b2f78 <+6>:     push   %rbx
>    0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
>    0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
>    0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
>    0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
>    0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
>    0xffffffff816b2f93 <+33>:    mov    %rax,%r12
>    0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
>    0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
>    0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
>    0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
>    0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
>    0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
>    0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
>    0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
>    0xffffffff816b2fc6 <+84>:    (bad)
>    0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
>    0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
>    0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
>    0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
>    0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
>    0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
>    0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
>    0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
>    0xffffffff816b2ff5 <+131>:   pop    %rbx
>    0xffffffff816b2ff6 <+132>:   pop    %r12
>    0xffffffff816b2ff8 <+134>:   pop    %rbp
>    0xffffffff816b2ff9 <+135>:   retq
>
> I don't get the section business at all, +75 looks to me like we're
> gonna trap no matter what.. as we appear to be doing.

The section stuff is supposed to be a trick to push the error case off
into the .text.unlikely area to avoid needing a jmp over the handler
and with possibly some redundancy removal done by the compiler (though
this appears to be rather limited) if it notices a bunch of error
paths are the same. However, in your disassembly, it's inline (!!) in
the code, as if "pushsection" and "popsection" were entirely ignored.

And when I make my own in6_dev_getx(), I see the same disassembly:

   0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
   0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
   0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
   0xffffffff818a758b <+197>:   (bad)

Which is VERY different from how it looks in other places!

e.g. from lkdtm_REFCOUNT_INC_SATURATED:

   0xffffffff815657df <+47>:    lock incl -0xc(%rbp)
   0xffffffff815657e3 <+51>:    js     0xffffffff81565cac
...
   0xffffffff81565cac:  lea    -0xc(%rbp),%rcx
   0xffffffff81565cb0:  (bad)

So, at least I can reproduce this in the build now. I must not be
exercising these paths. FWIW, this is with Ubuntu's 6.3.0 gcc.

I'll try to figure out what's going on here...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 18:58                                                 ` Kees Cook
@ 2017-09-01 19:24                                                   ` Mike Galbraith
  2017-09-01 19:40                                                   ` Kees Cook
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2017-09-01 19:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, 2017-09-01 at 11:58 -0700, Kees Cook wrote:
> 
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notices a bunch of error
> paths are the same. However, in your disassembly, it's inline (!!) in
> the code, as if "pushsection" and "popsection" were entirely ignored.
> 
> And when I make my own in6_dev_getx(), I see the same disassembly:
> 
>    0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
>    0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
>    0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
>    0xffffffff818a758b <+197>:   (bad)
> 
> Which is VERY different from how it looks in other places!
> 
> e.g. from lkdtm_REFCOUNT_INC_SATURATED:
> 
>    0xffffffff815657df <+47>:    lock incl -0xc(%rbp)
>    0xffffffff815657e3 <+51>:    js     0xffffffff81565cac
> ...
>    0xffffffff81565cac:  lea    -0xc(%rbp),%rcx
>    0xffffffff81565cb0:  (bad)
> 
> So, at least I can reproduce this in the build now. I must not be
> exercising these paths. FWIW, this is with Ubuntu's 6.3.0 gcc.
> 
> I'll try to figure out what's going on here...

Heh, make in6_dev_getx() __always_inline.

       swapper/0-1     [000] d..1     1.438587: ip6_route_init_special_entries: PRE refs.counter:3
       swapper/0-1     [000] d..1     1.438590: ip6_route_init_special_entries: POST refs.counter:4
       swapper/0-1     [000] d..1     1.438591: ip6_route_init_special_entries: PRE refs.counter:4
       swapper/0-1     [000] d..1     1.438592: ip6_route_init_special_entries: POST refs.counter:5
       swapper/0-1     [000] d..1     1.438592: ip6_route_init_special_entries: PRE refs.counter:5
       swapper/0-1     [000] d..1     1.438593: ip6_route_init_special_entries: POST refs.counter:6

	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection
  2017-09-01 18:58                                                 ` Kees Cook
  2017-09-01 19:24                                                   ` Mike Galbraith
@ 2017-09-01 19:40                                                   ` Kees Cook
  1 sibling, 0 replies; 15+ messages in thread
From: Kees Cook @ 2017-09-01 19:40 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: David S. Miller, Peter Zijlstra, LKML, Ingo Molnar, Reshetova,
	Elena, Network Development

On Fri, Sep 1, 2017 at 11:58 AM, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Sep 1, 2017 at 10:52 AM, Mike Galbraith <efault@gmx.de> wrote:
>> On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
>>> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
>>> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
>>> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
>>> >> > >>
>>> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
>>> >> > >>
>>> >> > >> 4.8.5: WARN, eventual kernel hang
>>> >> > >> 6.3.1, 7.0.1: WARN, but continues working
>>> >> > >
>>> >> > > Yeah, that's correct.  I find that troubling, simply because this gcc
>>> >> > > version has been through one hell of a lot of kernels with me.  Yeah, I
>>> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
>>> >> >
>>> >> > I still can't hit this with a 4.8.5 build. :(
>>> >> >
>>> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
>>> >> > negative first...
>>> >>
>>> >> I applied the other patch you posted, and built with gcc-6.3.1 to
>>> >> remove the gcc-4.8.5 aspect.  Look below the resulting splat.
>>> >
>>> > Grr, that one has a in6_dev_getx() line missing for the first
>>> > increment, where things go pear shaped.
>>> >
>>> > With that added, looking at counter both before, and after incl, with a
>>> > trace_printk() in the exception handler showing it doing its saturate
>>> > thing, irqs disabled across the whole damn refcount_inc(), and even
>>> > booting box nr_cpus=1 for extra credit...
>>> >
>>> > HTH can that first refcount_inc() get there?
>>> >
>>> > # tracer: nop
>>> > #
>>> > #                              _-----=> irqs-off
>>> > #                             / _----=> need-resched
>>> > #                            | / _---=> hardirq/softirq
>>> > #                            || / _--=> preempt-depth
>>> > #                            ||| /     delay
>>> > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>> > #              | |       |   ||||       |         |
>>> >          systemd-1     [000] d..1     1.937284: in6_dev_getx: PRE refs.counter:3
>>> >          systemd-1     [000] d..1     1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
>>> >          systemd-1     [000] d..1     1.937296: in6_dev_getx: POST refs.counter:-1073741824
>>>
>>> O_o
>>>
>>> Can you paste the disassembly of in6_dev_getx? I can't understand how
>>> we're landing in the exception handler.
>>
>> I was hoping you'd say that.
>>
>>    0xffffffff816b2f72 <+0>:     push   %rbp
>>    0xffffffff816b2f73 <+1>:     mov    %rsp,%rbp
>>    0xffffffff816b2f76 <+4>:     push   %r12
>>    0xffffffff816b2f78 <+6>:     push   %rbx
>>    0xffffffff816b2f79 <+7>:     incl   %gs:0x7e95a2d0(%rip)        # 0xd250 <__preempt_count>
>>    0xffffffff816b2f80 <+14>:    mov    0x308(%rdi),%rbx
>>    0xffffffff816b2f87 <+21>:    test   %rbx,%rbx
>>    0xffffffff816b2f8a <+24>:    je     0xffffffff816b2feb <in6_dev_getx+121>
>>    0xffffffff816b2f8c <+26>:    callq  *0xffffffff81c35a00
>>    0xffffffff816b2f93 <+33>:    mov    %rax,%r12
>>    0xffffffff816b2f96 <+36>:    callq  *0xffffffff81c35a10
>>    0xffffffff816b2f9d <+43>:    mov    0x769ad4(%rip),%rsi        # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
>>    0xffffffff816b2fa4 <+50>:    mov    0xf0(%rbx),%edx
>>    0xffffffff816b2faa <+56>:    mov    $0xffffffff816b2f8c,%rdi
>>    0xffffffff816b2fb1 <+63>:    callq  0xffffffff81171fc0 <__trace_bprintk>
>>    0xffffffff816b2fb6 <+68>:    lock incl 0xf0(%rbx)
>>    0xffffffff816b2fbd <+75>:    js     0xffffffff816b2fbf <in6_dev_getx+77>
>>    0xffffffff816b2fbf <+77>:    lea    0xf0(%rbx),%rcx
>>    0xffffffff816b2fc6 <+84>:    (bad)
>>    0xffffffff816b2fc8 <+86>:    mov    0x769a99(%rip),%rsi        # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
>>    0xffffffff816b2fcf <+93>:    mov    0xf0(%rbx),%edx
>>    0xffffffff816b2fd5 <+99>:    mov    $0xffffffff816b2f8c,%rdi
>>    0xffffffff816b2fdc <+106>:   callq  0xffffffff81171fc0 <__trace_bprintk>
>>    0xffffffff816b2fe1 <+111>:   mov    %r12,%rdi
>>    0xffffffff816b2fe4 <+114>:   callq  *0xffffffff81c35a08
>>    0xffffffff816b2feb <+121>:   decl   %gs:0x7e95a25e(%rip)        # 0xd250 <__preempt_count>
>>    0xffffffff816b2ff2 <+128>:   mov    %rbx,%rax
>>    0xffffffff816b2ff5 <+131>:   pop    %rbx
>>    0xffffffff816b2ff6 <+132>:   pop    %r12
>>    0xffffffff816b2ff8 <+134>:   pop    %rbp
>>    0xffffffff816b2ff9 <+135>:   retq
>>
>> I don't get the section business at all, +75 looks to me like we're
>> gonna trap no matter what.. as we appear to be doing.
>
> The section stuff is supposed to be a trick to push the error case off
> into the .text.unlikely area to avoid needing a jmp over the handler
> and with possibly some redundancy removal done by the compiler (though
> this appears to be rather limited) if it notices a bunch of error
> paths are the same. However, in your disassembly, it's inline (!!) in
> the code, as if "pushsection" and "popsection" were entirely ignored.
>
> And when I make my own in6_dev_getx(), I see the same disassembly:
>
>    0xffffffff818a757b <+181>:   lock incl 0x1e0(%rbx)
>    0xffffffff818a7582 <+188>:   js     0xffffffff818a7584 <in6_dev_getx+190>
>    0xffffffff818a7584 <+190>:   lea    0x1e0(%rbx),%rcx
>    0xffffffff818a758b <+197>:   (bad)
>
> Which is VERY different from how it looks in other places!

Found it.

If the compiler already pushed the entire function into
.text.unlikely, x86-refcount's .pushsection doesn't do any good
(obviously). Durrr.

        .section        .text.unlikely,"ax",@progbits
        .type   in6_dev_getx, @function
in6_dev_getx:
.LFB4673:
        .loc 2 4128 0
        .cfi_startproc
...
        lock; incl 480(%rbx)
        js 111f
        .pushsection .text.unlikely
111:    lea 480(%rbx), %rcx
112:    .byte 0x0f, 0xff
.popsection
113:

I will get this fixed. Thank you again for helping track this down!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-01 19:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1503996623.8323.20.camel@gmx.de>
     [not found] ` <CAGXu5jLYHy6Damtg_Pc4=G76OUiMV3fcY9+Ru1-Lmiq6VWEgsw@mail.gmail.com>
     [not found]   ` <1504025721.6024.25.camel@gmx.de>
     [not found]     ` <1504030207.6560.0.camel@gmx.de>
     [not found]       ` <CAGXu5jKSWiZ8NkgUqO6G8fs5R06bMtdFBmZz9MMX-HAk8nXXNw@mail.gmail.com>
     [not found]         ` <1504069332.8352.3.camel@gmx.de>
     [not found]           ` <CAGXu5jJim=r8b+y1qQXtuzm9fPyv1tp9ANebxbCc9ic2c83_Mg@mail.gmail.com>
     [not found]             ` <1504113212.5852.6.camel@gmx.de>
     [not found]               ` <CAGXu5jJd80-0PJvB51x9ZDETPSoAtRVQ7dQ6HWwd0DYpKBgPFA@mail.gmail.com>
     [not found]                 ` <1504115735.5852.11.camel@gmx.de>
     [not found]                   ` <CAGXu5j+BLj6_LPfKs-vts-mbyJHu8D--WrmWbiT8Zbt-iD65KA@mail.gmail.com>
     [not found]                     ` <CAGXu5j+neSYAqb8=r42=jQ0LsJTGjCSdpgc9Hv+ipyAFURbCdw@mail.gmail.com>
     [not found]                       ` <1504145389.23109.4.camel@gmx.de>
     [not found]                         ` <CAGXu5jLKXEDdSpxxVM5oHHUWKbSTozx4FZRn70bQm4tQAhMqUQ@mail.gmail.com>
     [not found]                           ` <1504149176.23109.9.camel@gmx.de>
2017-08-31  4:01                             ` tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection Kees Cook
2017-08-31  4:10                               ` Kees Cook
2017-08-31  4:38                                 ` Mike Galbraith
2017-08-31 13:58                                 ` Mike Galbraith
2017-08-31 17:00                                   ` Kees Cook
2017-08-31 17:19                                     ` Mike Galbraith
2017-08-31 18:45                                       ` Kees Cook
2017-09-01  6:57                                         ` Mike Galbraith
2017-09-01 13:09                                           ` Mike Galbraith
2017-09-01 17:12                                             ` Kees Cook
2017-09-01 17:52                                               ` Mike Galbraith
2017-09-01 18:58                                                 ` Kees Cook
2017-09-01 19:24                                                   ` Mike Galbraith
2017-09-01 19:40                                                   ` Kees Cook
2017-08-31 19:28                                   ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).