* [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
@ 2017-11-09  0:43 Patrick McLean
  2017-11-09  2:40 ` Linus Torvalds
  2017-11-11  2:47 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Alan Cox
  0 siblings, 2 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-09  0:43 UTC (permalink / raw)
  To: linux-kernel, linux-nfs; +Cc: stable, regressions, torvalds

As of 4.13.11 (and also with 4.14-rc) we have an issue where when
serving nfs4 sometimes we get the following BUG. When this bug happens,
it usually also causes the motherboard to no longer POST until we
externally re-flash the BIOS (using the BMC web interface). If a
motherboard does not have an external way to flash the BIOS, this would
brick the hardware.

The issue was introduced somewhere between 4.13.8 and 4.13.11 in the
stable series 4.13 kernels. It seems to be much easier to trigger on
4.14 kernels than 4.13 kernels.

We are working on bisecting it, but it is slow going since it often
takes several reboots to trigger the issue.

The taint is caused by "gkuart", an out-of-kernel driver that is a
fork of the cp210x driver with GPIO lines added; we can provide the
source for it if needed.

When the BIOS gets broken, we see these messages in the shutdown logs:
> [ 2206.698884] kvm: exiting hardware virtualization
> [ 2206.700160] e1000e: EEE TX LPI TIMER: 00t
> [ 2206.743126] ACPI MEMORY or I/O RESET_REG.

Here is the BUG we are getting:
> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
> [   58.963918] IP: vfs_statfs+0x73/0xb0
> [   58.964597] PGD 0 P4D 0 
> [   58.965208] Oops: 0000 [#1] SMP
> [   58.965847] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ipmi_ssif tpm_tis tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [   58.969163] CPU: 0 PID: 3970 Comm: nfsd Tainted: G           O    4.14.0-rc8-git-kratos-1-00012-gd6a2cf07f0c9 #1
> [   58.970693] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   58.971685] task: ffff88040b286200 task.stack: ffffc90002c94000
> [   58.972576] RIP: 0010:vfs_statfs+0x73/0xb0
> [   58.973329] RSP: 0018:ffffc90002c97b30 EFLAGS: 00010202
> [   58.974188] RAX: 0000000000000000 RBX: ffffc90002c97bf8 RCX: 0000000000001c00
> [   58.975253] RDX: 0000000000000c00 RSI: 0000000000000020 RDI: 0000000000000000
> [   58.976213] RBP: ffffc90002c97bc8 R08: 0000000000000000 R09: 00000000000000ff
> [   58.977161] R10: 000000000038be3a R11: ffff88040ec440c8 R12: ffff88040c5ba000
> [   58.978107] R13: ffff88040a86e000 R14: ffff88040c5c1000 R15: ffffc90002c97bf8
> [   58.979051] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> [   58.980448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   58.981419] CR2: 0000000000000230 CR3: 0000000001e0a002 CR4: 00000000001606f0
> [   58.982483] Call Trace:
> [   58.983108]  nfsd4_encode_fattr+0x1f3/0x2070
> [   58.983873]  ? find_inode_fast+0x52/0x90
> [   58.984587]  ? get_acl+0x17/0xf0
> [   58.985258]  ? generic_permission+0x122/0x1a0
> [   58.986019]  nfsd4_encode_getattr+0x25/0x30
> [   58.986746]  nfsd4_encode_operation+0x98/0x1a0
> [   58.987485]  nfsd4_proc_compound+0x3eb/0x5c0
> [   58.988206]  nfsd_dispatch+0xa8/0x230
> [   58.988891]  svc_process_common+0x347/0x640
> [   58.989619]  svc_process+0x100/0x1b0
> [   58.990334]  nfsd+0xe3/0x150
> [   58.990988]  kthread+0xfc/0x130
> [   58.991651]  ? nfsd_destroy+0x60/0x60
> [   58.992364]  ? kthread_create_on_node+0x40/0x40
> [   58.993153]  ret_from_fork+0x25/0x30
> [   58.993858] Code: d1 83 c9 08 40 f6 c6 04 0f 45 d1 89 d1 80 cd 04 40 f6 c6 08 0f 45 d1 89 d1 80 cd 08 40 f6 c6 10 0f 45 d1 89 d1 80 cd 10 83 e6 20 <48> 8b b7 30 02 00 00 0f 45 d1 83 ca 20 89 f1 83 e1 10 89 cf 83
> [   58.996592] RIP: vfs_statfs+0x73/0xb0 RSP: ffffc90002c97b30
> [   58.997474] CR2: 0000000000000230
> [   58.998147] ---[ end trace c3a6e976d53aaa00 ]---
> [  107.669217] random: crng init done
> [  210.170059] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
> [  210.176363] IP: vfs_statfs+0x73/0xb0
> [  210.177032] PGD 0 P4D 0
> [  210.177633] Oops: 0000 [#2] SMP
> [  210.178286] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ipmi_ssif tpm_tis tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [  210.192120] CPU: 0 PID: 3969 Comm: nfsd Tainted: G      D    O    4.14.0-rc8-git-kratos-1-00012-gd6a2cf07f0c9 #1
> [  210.203168] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [  210.204140] task: ffff880409a7aa00 task.stack: ffffc90002c8c000
> [  210.205168] RIP: 0010:vfs_statfs+0x73/0xb0
> [  210.205893] RSP: 0018:ffffc90002c8fb30 EFLAGS: 00010202
> [  210.206708] RAX: 0000000000000000 RBX: ffffc90002c8fbf8 RCX: 0000000000001c00
> [  210.218314] RDX: 0000000000000c00 RSI: 0000000000000020 RDI: 0000000000000000
> [  210.219364] RBP: ffffc90002c8fbc8 R08: 0000000000000000 R09: 00000000000000ff
> [  210.220426] R10: 000000000038be3a R11: ffff88040ec440c8 R12: ffff88040c5b8000
> [  210.221455] R13: ffff88040a86e000 R14: ffff88040c5c4000 R15: ffffc90002c8fbf8
> [  210.222484] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> [  210.223894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  210.224938] CR2: 0000000000000230 CR3: 0000000001e0a003 CR4: 00000000001606f0
> [  210.226020] Call Trace:
> [  210.226615]  nfsd4_encode_fattr+0x1f3/0x2070
> [  210.227348]  ? find_inode_fast+0x52/0x90
> [  210.238225]  ? get_acl+0x17/0xf0
> [  210.238890]  ? generic_permission+0x122/0x1a0
> [  210.239637]  nfsd4_encode_getattr+0x25/0x30
> [  210.240365]  nfsd4_encode_operation+0x98/0x1a0
> [  210.241127]  nfsd4_proc_compound+0x3eb/0x5c0
> [  210.241868]  nfsd_dispatch+0xa8/0x230
> [  210.242564]  svc_process_common+0x347/0x640
> [  210.243294]  svc_process+0x100/0x1b0
> [  210.243969]  nfsd+0xe3/0x150
> [  210.244582]  kthread+0xfc/0x130
> [  210.255467]  ? nfsd_destroy+0x60/0x60
> [  210.256153]  ? kthread_create_on_node+0x40/0x40
> [  210.256892]  ret_from_fork+0x25/0x30
> [  210.257570] Code: d1 83 c9 08 40 f6 c6 04 0f 45 d1 89 d1 80 cd 04 40 f6 c6 08 0f 45 d1 89 d1 80 cd 08 40 f6 c6 10 0f 45 d1 89 d1 80 cd 10 83 e6 20 <48> 8b b7 30 02 00 00 0f 45 d1 83 ca 20 89 f1 83 e1 10 89 cf 83
> [  210.260340] RIP: vfs_statfs+0x73/0xb0 RSP: ffffc90002c8fb30
> [  210.261157] CR2: 0000000000000230
> [  210.261810] ---[ end trace c3a6e976d53aaa01 ]---

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  0:43 [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Patrick McLean
@ 2017-11-09  2:40 ` Linus Torvalds
  2017-11-09  3:45   ` Al Viro
                     ` (4 more replies)
  2017-11-11  2:47 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Alan Cox
  1 sibling, 5 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-09  2:40 UTC (permalink / raw)
  To: Patrick McLean, Al Viro, Bruce Fields, Darrick J. Wong
  Cc: Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Wed, Nov 8, 2017 at 4:43 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> As of 4.13.11 (and also with 4.14-rc) we have an issue where when
> serving nfs4 sometimes we get the following BUG. When this bug happens,
> it usually also causes the motherboard to no longer POST until we
> externally re-flash the BIOS (using the BMC web interface). If a
> motherboard does not have an external way to flash the BIOS, this would
> brick the hardware.

That sounds like your BIOS is just broken.

The kernel oops is probably just a trigger for that - possibly because
you reboot with a particular state that breaks the BIOS.

Also, are you sure you really need to reflash the BIOS? It's actually
fairly hard to overwrite the BIOS itself, but crashing with bad
hardware state (where "bad" can just mean "unexpected by the BIOS")
can cause the BIOS to not properly re-initialize things, and hang at
boot.

So not booting cleanly from a warm reset is a reasonably common BIOS failure.

And yes, reflashing tends to force a full initialization and thus
"fixes" things, but it may be a big hammer when a cold boot or just a
"reset BIOS to safe defaults" might be sufficient.

In pretty much all cases this is a sign of a nasty BIOS problem,
though, and you may want to look into a firmware update from the
vendor for that.

But on to the kernel side:

> Here is the BUG we are getting:
>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
>> [   58.963918] IP: vfs_statfs+0x73/0xb0

The code disassembles to

   0: 83 c9 08              or     $0x8,%ecx
   3: 40 f6 c6 04          test   $0x4,%sil
   7: 0f 45 d1              cmovne %ecx,%edx
   a: 89 d1                mov    %edx,%ecx
   c: 80 cd 04              or     $0x4,%ch
   f: 40 f6 c6 08          test   $0x8,%sil
  13: 0f 45 d1              cmovne %ecx,%edx
  16: 89 d1                mov    %edx,%ecx
  18: 80 cd 08              or     $0x8,%ch
  1b: 40 f6 c6 10          test   $0x10,%sil
  1f: 0f 45 d1              cmovne %ecx,%edx
  22: 89 d1                mov    %edx,%ecx
  24: 80 cd 10              or     $0x10,%ch
  27: 83 e6 20              and    $0x20,%esi
  2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
  31: 0f 45 d1              cmovne %ecx,%edx
  34: 83 ca 20              or     $0x20,%edx
  37: 89 f1                mov    %esi,%ecx
  39: 83 e1 10              and    $0x10,%ecx
  3c: 89 cf                mov    %ecx,%edi

and all those odd cmovne and bit-ops are just the bit selection code
in flags_by_mnt(), which is inlined through calculate_f_flags (which
is _also_ inlined) into vfs_statfs().

Sadly, gcc makes a mess of it and actually generates code that looks
like the original C. I would have hoped that gcc could have turned

   if (x & BIT)
        y |= OTHER_BIT;

into

    y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;

but that doesn't happen. We actually do it by hand in some other more
critical places, but it's painful to do by hand (because the shift
direction/amount is not trivial to do in C).
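As a rough userspace illustration of the two forms described above (not kernel code), here is a minimal sketch. The flag names and values `IN_A`, `OUT_A`, `IN_B`, `OUT_B` are hypothetical. It moves single bits by multiplying or dividing by the ratio of two power-of-two constants, which compilers can fold into a constant shift; note that the direction (multiply vs. divide) still has to be chosen by hand for each flag, which is exactly the awkward part. Whether a given gcc version actually emits shifts for it is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-bit flags, chosen only for illustration. */
#define IN_A   0x04u   /* input bit 2   */
#define OUT_A  0x400u  /* output bit 10 */
#define IN_B   0x20u   /* input bit 5   */
#define OUT_B  0x01u   /* output bit 0  */

/* The obvious C: test each input bit and conditionally OR in the output
 * bit. This is the pattern that tends to become test + cmovne chains. */
static uint32_t translate_branchy(uint32_t x)
{
	uint32_t y = 0;

	if (x & IN_A)
		y |= OUT_A;
	if (x & IN_B)
		y |= OUT_B;
	return y;
}

/*
 * The hand-written form: isolate each bit and move it into place.
 * Multiplying or dividing by the ratio of two power-of-two constants is a
 * constant shift, but whether to multiply (move up) or divide (move down)
 * must still be picked per flag.
 */
static uint32_t translate_shifted(uint32_t x)
{
	return (x & IN_A) * (OUT_A / IN_A)   /* up:   same as << 8 */
	     | (x & IN_B) / (IN_B / OUT_B);  /* down: same as >> 5 */
}
```

For every input, `translate_shifted()` should agree with `translate_branchy()`; for example, an input of 0x24 (bits 2 and 5 set) maps to 0x401.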

Anyway, that cmovne noise makes it a bit hard to see the actual part
that matters (and that traps) but I'm almost certain that it's the
"mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
when it then does

     flags_by_sb(mnt->mnt_sb->s_flags);

and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
NULL, because we wouldn't have gotten this far if it was.

Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
proper path. And the vfs_statfs() code itself hasn't changed in a
while.

Which does seem to implicate nfsd as having passed in a bad path to
vfs_statfs(). But I'm not seeing any changes in nfsd either.

In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
range. There is a bunch of xfs changes, though. What's the underlying
filesystem that you are exporting?

But bringing in Al Viro and Bruce Fields explicitly in case they see
something. And Darrick, just in case it might be xfs.

         Linus


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  2:40 ` Linus Torvalds
@ 2017-11-09  3:45   ` Al Viro
  2017-11-09 19:34   ` Patrick McLean
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 82+ messages in thread
From: Al Viro @ 2017-11-09  3:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:

> > Here is the BUG we are getting:
> >> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
> >> [   58.963918] IP: vfs_statfs+0x73/0xb0
> 
> The code disassembles to
> 
>    0: 83 c9 08              or     $0x8,%ecx
>    3: 40 f6 c6 04          test   $0x4,%sil
>    7: 0f 45 d1              cmovne %ecx,%edx
>    a: 89 d1                mov    %edx,%ecx
>    c: 80 cd 04              or     $0x4,%ch
>    f: 40 f6 c6 08          test   $0x8,%sil
>   13: 0f 45 d1              cmovne %ecx,%edx
>   16: 89 d1                mov    %edx,%ecx
>   18: 80 cd 08              or     $0x8,%ch
>   1b: 40 f6 c6 10          test   $0x10,%sil
>   1f: 0f 45 d1              cmovne %ecx,%edx
>   22: 89 d1                mov    %edx,%ecx
>   24: 80 cd 10              or     $0x10,%ch
>   27: 83 e6 20              and    $0x20,%esi
>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>   31: 0f 45 d1              cmovne %ecx,%edx
>   34: 83 ca 20              or     $0x20,%edx
>   37: 89 f1                mov    %esi,%ecx
>   39: 83 e1 10              and    $0x10,%ecx
>   3c: 89 cf                mov    %ecx,%edi
> 
> and all those odd cmovne and bit-ops are just the bit selection code
> in flags_by_mnt(), which is inlined through calculate_f_flags (which
> is _also_ inlined) into vfs_statfs().
> 
> Sadly, gcc makes a mess of it and actually generates code that looks
> like the original C. I would have hoped that gcc could have turned
> 
>    if (x & BIT)
>         y |= OTHER_BIT;
> 
> into
> 
>     y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;
> 
> but that doesn't happen. We actually do it by hand in some other more
> critical places, but it's painful to do by hand (because the shift
> direction/amount is not trivial to do in C).
> 
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>      flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is

Interesting...

struct super_block {
        struct list_head        s_list;         /* Keep this first */
        dev_t                   s_dev;          /* search index; _not_ kdev_t */
        unsigned char           s_blocksize_bits;
        unsigned long           s_blocksize;
        loff_t                  s_maxbytes;     /* Max file size */
        struct file_system_type *s_type;
        const struct super_operations   *s_op;
        const struct dquot_operations   *dq_op;
        const struct quotactl_ops       *s_qcop;
        const struct export_operations *s_export_op;
        unsigned long           s_flags;
	...

s_flags is preceded by list_head, u32, unsigned char, 2 u64 and 5 pointers.
IOW, 10 64-bit words.  And sure enough, amd64 builds here have
	mov    0x50(%rdi),%rsi
in the corresponding place.  What config and toolchain had produced that?
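The offset arithmetic above can be checked with `offsetof`. The sketch below is a hypothetical userspace mirror of only the leading fields quoted above, with kernel types approximated (`dev_t` as a 32-bit integer, `loff_t` as `long long`, pointer targets left opaque). On an LP64 target it puts `s_flags` at 0x50. A load of `sb->s_flags` through a NULL `sb` touches address 0 plus that offset, so the oops's `CR2: 0000000000000230` implies a layout where `s_flags` sits at 0x230 (560 bytes, 70 words), which does not match the declared order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading fields of struct super_block. */
struct list_head_mirror {
	void *next, *prev;
};

struct super_block_head {
	struct list_head_mirror s_list;   /* 2 words                        */
	uint32_t      s_dev;              /* kernel dev_t is 32 bits         */
	unsigned char s_blocksize_bits;   /* shares s_dev's word, then pads  */
	unsigned long s_blocksize;
	long long     s_maxbytes;         /* stands in for loff_t            */
	const void   *s_type;
	const void   *s_op;
	const void   *dq_op;
	const void   *s_qcop;
	const void   *s_export_op;
	unsigned long s_flags;
};

/* Declared order on LP64: 2 + 1 + 1 + 1 + 5 = 10 words = 0x50 bytes. */
static_assert(!(sizeof(void *) == 8 && sizeof(unsigned long) == 8) ||
	      offsetof(struct super_block_head, s_flags) == 0x50,
	      "s_flags should follow ten 64-bit words");

/* Address touched by "sb->s_flags" when sb is NULL: just the field offset. */
static uintptr_t null_deref_address(void)
{
	return (uintptr_t)offsetof(struct super_block_head, s_flags);
}
```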

I would definitely start with turning the randomize crap off, just to
exclude the compiler weirdness.  Incidentally, randomizing anything that
contains a hash chain and key...  super_block is not the worst here -
struct dentry is the clear "winner".  Anything in
struct dentry {
        /* RCU lookup touched fields */
        unsigned int d_flags;           /* protected by d_lock */
        seqcount_t d_seq;               /* per dentry seqlock */
        struct hlist_bl_node d_hash;    /* lookup hash list */
        struct dentry *d_parent;        /* parent directory */
        struct qstr d_name;
        struct inode *d_inode;          /* Where the name belongs to - NULL is
                                         * negative */
moving into a separate cache line and we've just doubled the cache footprint of
hash chain traversal.

How much reordering does that gcc misfeature do and why do we enable
that in the first place?


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  2:40 ` Linus Torvalds
  2017-11-09  3:45   ` Al Viro
@ 2017-11-09 19:34   ` Patrick McLean
  2017-11-09 19:38     ` Al Viro
  2017-11-09 19:37   ` Al Viro
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 82+ messages in thread
From: Patrick McLean @ 2017-11-09 19:34 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro, Bruce Fields, Darrick J. Wong
  Cc: Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis



On 2017-11-08 06:40 PM, Linus Torvalds wrote:
> On Wed, Nov 8, 2017 at 4:43 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> As of 4.13.11 (and also with 4.14-rc) we have an issue where when
>> serving nfs4 sometimes we get the following BUG. When this bug happens,
>> it usually also causes the motherboard to no longer POST until we
>> externally re-flash the BIOS (using the BMC web interface). If a
>> motherboard does not have an external way to flash the BIOS, this would
>> brick the hardware.
> 
> That sounds like your BIOS is just broken.

All the dead boards were from the same vendor. We are going to try some
boards from another vendor today.

> 
> The kernel oops is probably just a trigger for that - possibly because
> you reboot with a particular state that breaks the BIOS.
> 
> Also, are you sure you really need to reflash the BIOS? It's actually
> fairly hard to overwrite the BIOS itself, but crashing with bad
> hardware state (where "bad" can just mean "unexpected by the BIOS")
> can cause the BIOS to not properly re-initialize things, and hang at
> boot.
> 
> So not booting cleanly from a warm reset is a reasonably common BIOS failure.
> 
> And yes, reflashing tends to force a full initialization and thus
> "fixes" things, but it may be a big hammer when a cold boot or just a
> "reset BIOS to safe defaults" might be sufficient.
> 
> In pretty much all cases this is a sign of a nasty BIOS problem,
> though, and you may want to look into a firmware update from the
> vendor for that.

We tried a cold power-off (physically unplugging the machine from power)
and a CMOS reset, and neither helped. The only thing that actually
restored one of the dead boards was a reflash, and I reflashed it with
the latest firmware.

> 
> But on to the kernel side:
> 
>> Here is the BUG we are getting:
>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
> 
> The code disassembles to
> 
>    0: 83 c9 08              or     $0x8,%ecx
>    3: 40 f6 c6 04          test   $0x4,%sil
>    7: 0f 45 d1              cmovne %ecx,%edx
>    a: 89 d1                mov    %edx,%ecx
>    c: 80 cd 04              or     $0x4,%ch
>    f: 40 f6 c6 08          test   $0x8,%sil
>   13: 0f 45 d1              cmovne %ecx,%edx
>   16: 89 d1                mov    %edx,%ecx
>   18: 80 cd 08              or     $0x8,%ch
>   1b: 40 f6 c6 10          test   $0x10,%sil
>   1f: 0f 45 d1              cmovne %ecx,%edx
>   22: 89 d1                mov    %edx,%ecx
>   24: 80 cd 10              or     $0x10,%ch
>   27: 83 e6 20              and    $0x20,%esi
>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>   31: 0f 45 d1              cmovne %ecx,%edx
>   34: 83 ca 20              or     $0x20,%edx
>   37: 89 f1                mov    %esi,%ecx
>   39: 83 e1 10              and    $0x10,%ecx
>   3c: 89 cf                mov    %ecx,%edi
> 
> and all those odd cmovne and bit-ops are just the bit selection code
> in flags_by_mnt(), which is inlined through calculate_f_flags (which
> is _also_ inlined) into vfs_statfs().
> 
> Sadly, gcc makes a mess of it and actually generates code that looks
> like the original C. I would have hoped that gcc could have turned
> 
>    if (x & BIT)
>         y |= OTHER_BIT;
> 
> into
> 
>     y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;
> 
> but that doesn't happen. We actually do it by hand in some other more
> critical places, but it's painful to do by hand (because the shift
> direction/amount is not trivial to do in C).
> 
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>      flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
> 
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
> 
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.
> 
> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> range. There is a bunch of xfs changes, though. What's the underlying
> filesystem that you are exporting?

It's an ext4 filesystem.

> 
> But bringing in Al Viro and Bruce Fields explicitly in case they see
> something. And Darrick, just in case it might be xfs.
> 

Thanks


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  2:40 ` Linus Torvalds
  2017-11-09  3:45   ` Al Viro
  2017-11-09 19:34   ` Patrick McLean
@ 2017-11-09 19:37   ` Al Viro
  2017-11-09 19:51     ` Patrick McLean
  2017-11-09 20:47   ` J. Bruce Fields
  2017-11-13 22:59   ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
  4 siblings, 1 reply; 82+ messages in thread
From: Al Viro @ 2017-11-09 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:

> > Here is the BUG we are getting:
> >> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
> >> [   58.963918] IP: vfs_statfs+0x73/0xb0
> 
> The code disassembles to

>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction

> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>      flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
> 
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
> 
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.

It definitely is NULL mnt->mnt_sb and that should never happen.  All
struct mount instances are allocated by alloc_vfsmnt().  Its callers
are
	* vfs_kern_mount().  Assigns ->mnt_sb to root->d_sb before
anyone else sees the address of that object.
	* clone_mnt().  Assigns ->mnt_sb to that of preexisting instance
before anyone else sees the address of that object.

No other callers exist and no other places ever modify the value of that
field.
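The invariant described here (every field is set before anyone else can see the object's address) can be modeled in userspace with C11 release/acquire atomics. This is only a sketch of the ordering guarantee: the types `sb_stub`/`mnt_stub` and the functions `publish_mnt()`/`lookup_mnt()` are hypothetical, and the kernel publishes mounts through its own locking rather than through code like this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct super_block / vfsmount. */
struct sb_stub  { unsigned long s_flags; };
struct mnt_stub { struct sb_stub *mnt_sb; };

/* Static atomics are zero-initialised, so this starts out NULL. */
static _Atomic(struct mnt_stub *) published_mnt;

/* Writer: fill in the field first, then make the object visible. The
 * release store orders the mnt_sb assignment before the publication. */
static int publish_mnt(struct sb_stub *sb)
{
	struct mnt_stub *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->mnt_sb = sb;
	atomic_store_explicit(&published_mnt, m, memory_order_release);
	return 0;
}

/* Reader: the acquire load pairs with the release store, so any reader
 * that sees the pointer also sees the mnt_sb value stored before it. */
static struct mnt_stub *lookup_mnt(void)
{
	return atomic_load_explicit(&published_mnt, memory_order_acquire);
}
```

Under this model, a reader can only observe a NULL `mnt_sb` if the writer stored NULL or if the memory was later freed and reused, which mirrors the "memory corruption or freed instance" conclusion below.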

All instances of struct dentry are created by __d_alloc()[*], which assigns
->d_sb (never to be modified afterwards) *and* dereferences the pointer
it has stored in ->d_sb before the created struct dentry becomes visible
to anyone else.  No struct dentry should ever be observed with NULL ->d_sb;
the only way to get that is memory corruption or looking at a freed instance
after its memory has been reused for something else and zeroed.

In other words, we should never observe a struct mount with NULL ->mnt.mnt_sb -
not without memory corruption or looking at a freed instance.

The pointer in that case should've come from exp->ex_path.mnt, exp being
the argument of nfsd4_encode_fattr().  Sure, it might have been a dangling
reference.  However, it looks a lot more like a memory corruptor *OR*
miscompiled kernel.

What kind of load do the reproducer boxen have and how fast does that
bug trigger?  Would it be possible to slap something like
	if (unlikely(!exp->ex_path.mnt->mnt_sb)) {
		struct mount *m = real_mount(exp->ex_path.mnt);
		printk(KERN_ERR "mnt: %p\n", exp->ex_path.mnt);
		printk(KERN_ERR "name: [%s]\n", m->mnt_devname);
		printk(KERN_ERR "ns: [%p]\n", m->mnt_ns);
		printk(KERN_ERR "parent: [%p]\n", m->mnt_parent);
		WARN_ON(1);
		err = -EINVAL;
		goto out_nfserr;
	}
at the beginning of nfsd4_encode_fattr() (with an include of ../mount.h added
in fs/nfsd/nfs4xdr.c) and see what it catches?

Both with and without randomized structs, if possible - I might be barking
up the wrong tree, but IMO the very first step in localizing that crap is
to find out whether it's toolchain-related or not.

[*] strictly speaking, there is one exception - lib/test_printf.c has
four static struct dentry instances.  No chance of those being returned
by any ->mount() instance, though.


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 19:34   ` Patrick McLean
@ 2017-11-09 19:38     ` Al Viro
  2017-11-09 19:42       ` Patrick McLean
  0 siblings, 1 reply; 82+ messages in thread
From: Al Viro @ 2017-11-09 19:38 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Thu, Nov 09, 2017 at 11:34:19AM -0800, Patrick McLean wrote:

> > In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> > range. There is a bunch of xfs changes, though. What's the underlying
> > filesystem that you are exporting?
> 
> It's an ext4 filesystem.

Had there been toolchain changes around the same period?


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 19:38     ` Al Viro
@ 2017-11-09 19:42       ` Patrick McLean
  0 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-09 19:42 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis



On 2017-11-09 11:38 AM, Al Viro wrote:
> On Thu, Nov 09, 2017 at 11:34:19AM -0800, Patrick McLean wrote:
> 
>>> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
>>> range. There is a bunch of xfs changes, though. What's the underlying
>>> filesystem that you are exporting?
>>
>> It's an ext4 filesystem.
> 
> Had there been toolchain changes around the same period?
> 
No changes, both the working and broken kernels were built with
distro-provided gcc 5.4.0 and binutils 2.28.1.


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 19:37   ` Al Viro
@ 2017-11-09 19:51     ` Patrick McLean
  2017-11-09 20:04       ` Linus Torvalds
  2017-11-10  1:47       ` Patrick McLean
  0 siblings, 2 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-09 19:51 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis

On 2017-11-09 11:37 AM, Al Viro wrote:
> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
> 
>>> Here is the BUG we are getting:
>>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
>>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
>>
>> The code disassembles to
> 
>>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
> 
>> that matters (and that traps) but I'm almost certain that it's the
>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>> when it then does
>>
>>      flags_by_sb(mnt->mnt_sb->s_flags);
>>
>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>> NULL, because we wouldn't have gotten this far if it was.
>>
> 
> All instances of struct dentry are created by __d_alloc()[*], which assigns
> ->d_sb (never to be modified afterwards) *and* dereferences the pointer
> it has stored in ->d_sb before the created struct dentry becomes visible
> to anyone else.  No struct dentry should ever be observed with NULL ->d_sb;
> the only way to get that is memory corruption or looking at freed instance
> after its memory has been reused for something else and zeroed.
> 
> In other words, we should never observe a struct mount with NULL ->mnt.mnt_sb -
> not without memory corruption or looking at freed instance.
> 
> The pointer in that case should've come from exp->ex_path.mnt, exp being
> the argument of nfsd4_encode_fattr().  Sure, it might have been a dangling
> reference.  However, it looks a lot more like a memory corruptor *OR*
> miscompiled kernel.
> 
> What kind of load do the reproducer boxen have and how fast does that
> bug trigger?  Would it be possible to slap something like
> 	if (unlikely(!exp->exp_path.mnt->mnt_sb)) {
> 		struct mount *m = real_mount(exp->exp_path.mnt);
> 		printk(KERN_ERR "mnt: %p\n", exp->exp_path.mnt);
> 		printk(KERN_ERR "name: [%s]\n", m->mnt_devname);
> 		printk(KERN_ERR "ns: [%p]\n", m->mnt_ns);
> 		printk(KERN_ERR "parent: [%p]\n", m->mnt_parent);
> 		WARN_ON(1);
> 		err = -EINVAL;
> 		goto out_nfserr;
> 	}
> in the beginning of nfsd4_encode_fattr() (with include of ../mount.h added
> in fs/nfsd/nfs4xdr.c) and see what will it catch?
> 
> Both with and without randomized structs, if possible - I might be barking
> at the wrong tree, but IMO the very first step in localizing that crap is
> to find out whether it's toolchain-related or not.

The reproducer boxen are not under particularly heavy load, they are
serving NFS to 1 or 2 clients (which are essentially embedded devices).
When the bug triggers, it usually triggers pretty fast and reliably, but
it seems to only trigger on some subset of bootups. Once it fails to
trigger, we seem to have to reboot to get it to trigger.

I should be able to have some results with that added in a few hours.
It's weirdly unreliable to reproduce this.

We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.


* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 19:51     ` Patrick McLean
@ 2017-11-09 20:04       ` Linus Torvalds
  2017-11-09 21:16         ` Al Viro
  2017-11-10  1:58         ` Patrick McLean
  2017-11-10  1:47       ` Patrick McLean
  1 sibling, 2 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-09 20:04 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
>
> We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
> CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.

It might be worth just verifying without RANDSTRUCT in particular.

That case has probably not gotten a huge amount of testing.  As Al
points out, it can cause absolutely horrendous cache access pattern
changes, but it might also be triggering some corruption in case
there's a problem with the plugin, or with some piece of kernel code
that gets confused by it.

And most obviously: if there is some module or part of the kernel that
got compiled with a different seed for the randstruct hashing, that
will break in nasty nasty ways. Your out-of-kernel module is the
obvious suspect for something like that, but honestly, it could be
some missing build dependency, or simply a missing special case in the
plugin itself, a missing __no_randomize_layout, or any number of things.
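To make the seed-mismatch failure mode concrete, here is a minimal userspace sketch (not kernel code). It assumes two builds that shuffled the same two-member structure differently. The struct and field names are hypothetical. Code built with one layout stores a pointer, and code built with the other layout reads the same bytes. The "pointer" the second build sees is whatever the first build stored in that slot. Here that is a zero flags word, which reads back as NULL.

```c
#include <string.h>

/*
 * Hypothetical: one logical structure as laid out by two builds that used
 * different randstruct seeds. Names are illustrative only.
 */
struct obj_seed_a { void *sb; unsigned long flags; };   /* build A order */
struct obj_seed_b { unsigned long flags; void *sb; };   /* build B order */

/* Same members, same size; only the order differs. */
_Static_assert(sizeof(struct obj_seed_a) == sizeof(struct obj_seed_b),
               "layouts differ only in member order");

/*
 * Simulate code compiled with seed B (say, a module built against a stale
 * seed) reading an object that code compiled with seed A filled in.
 * memcpy keeps the byte-level reinterpretation well-defined C.
 */
static struct obj_seed_b read_with_other_seed(const struct obj_seed_a *a)
{
    struct obj_seed_b b;

    memcpy(&b, a, sizeof b);
    return b;
}
```

Suppose a.sb points at a live object and a.flags is 0. The seed-B view then reports sb == NULL and a nonzero flags value. That is the same kind of "impossible" NULL the oops shows.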

We've hit gcc bugs many times before - and the plugins are just new
opportunities to hit cases that have gotten a lot less testing than
the "normal" code flow has.

The structleak plugin is much less likely to be a problem (simply
because it's a much simpler plugin), but hey, something being NULL
when it shouldn't possibly be might be a stray "leak initialization".

So since you seem to be able to reproduce this _reasonably_ easily,
it's definitely worth checking that it still reproduces even without
the gcc plugins.

Just to narrow it down a bit.

              Linus

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  2:40 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2017-11-09 19:37   ` Al Viro
@ 2017-11-09 20:47   ` J. Bruce Fields
  2017-11-09 23:07     ` Patrick McLean
  2017-11-13 22:59   ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
  4 siblings, 1 reply; 82+ messages in thread
From: J. Bruce Fields @ 2017-11-09 20:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Al Viro, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>      flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
> 
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
> 
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.
> 
> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> range. There is a bunch of xfs changes, though. What's the underlying
> filesystem that you are exporting?
> 
> But bringing in Al Viro and Bruce Fields explicitly in case they see
> something. And Darrick, just in case it might be xfs.

Looking at https://lkml.org/lkml/2017/11/8/1086 for the actual oops...

It doesn't remind me of any known issue.

And I don't see how we can call vfs_statfs() with a bad path:
nfsd4_encode_getattr would have to have been called with nfserr 0 and
ga_fhp->fh_export bad.

Looking at nfsd4_proc_compound, I can't see how we could get there in
the op->status == 0 case without the fh_verify() in nfsd4_getattr having
succeeded and assigned the result to ga_fhp.

So either I'm overlooking something or the bug's elsewhere.

It sounds like you're varying *only* the server version, so there's not
much chance that this could be triggered by changes in client behavior?

--b.

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 20:04       ` Linus Torvalds
@ 2017-11-09 21:16         ` Al Viro
  2017-11-10  1:58         ` Patrick McLean
  1 sibling, 0 replies; 82+ messages in thread
From: Al Viro @ 2017-11-09 21:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Thu, Nov 09, 2017 at 12:04:19PM -0800, Linus Torvalds wrote:

> That case has probably not gotten a huge amount of testing.  As Al
> points out, it can cause absolutely horrendous cache access pattern
> changes, but it might also be triggering some corruption in case
> there's a problem with the plugin, or with some piece of kernel code
> that gets confused by it.

I suspect that it might be an effect of randomize shite done both to
struct mount *AND* to struct vfsmount embedded into it.  With
pointers to embedded struct vfsmount kept around a lot, and container_of()
used to get from them to corresponding struct mount.

That smells like a combination of idiocy that might have never occurred
to the authors of said gcc plugin.

On the other hand, triggered gcc bugs certainly do add randomness, so good
luck explaining to the security community that it's not a good idea...
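Here is a minimal userspace sketch of the pattern Al describes. It uses simplified stand-ins for struct mount and struct vfsmount. The *_sketch names and their members are hypothetical, not the kernel definitions. The offset that container_of() subtracts is a compile-time constant baked into each object file. It therefore only recovers the right outer structure if every unit that does the conversion agrees on the layout.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() (type checks omitted). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vfsmount_sketch {
    void *mnt_sb;
    int mnt_flags;
};

struct mount_sketch {
    struct mount_sketch *mnt_parent;
    struct vfsmount_sketch mnt;      /* embedded, as in struct mount */
    const char *mnt_devname;
};

/*
 * Analogue of real_mount(): step back from the embedded member to the
 * enclosing structure. offsetof() is evaluated at compile time, so a unit
 * built with a different member order would subtract a different offset.
 */
static struct mount_sketch *real_mount_sketch(struct vfsmount_sketch *m)
{
    return container_of(m, struct mount_sketch, mnt);
}
```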

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 20:47   ` J. Bruce Fields
@ 2017-11-09 23:07     ` Patrick McLean
  0 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-09 23:07 UTC (permalink / raw)
  To: J. Bruce Fields, Linus Torvalds
  Cc: Al Viro, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis



On 2017-11-09 12:47 PM, J. Bruce Fields wrote:
> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
>> Anyway, that cmovne noise makes it a bit hard to see the actual part
>> that matters (and that traps) but I'm almost certain that it's the
>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>> when it then does
>>
>>      flags_by_sb(mnt->mnt_sb->s_flags);
>>
>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>> NULL, because we wouldn't have gotten this far if it was.
>>
>> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
>> proper path. And the vfs_statfs() code itself hasn't changed in a
>> while.
>>
>> Which does seem to implicate nfsd as having passed in a bad path to
>> vfs_statfs(). But I'm not seeing any changes in nfsd either.
>>
>> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
>> range. There is a bunch of xfs changes, though. What's the underlying
>> filesystem that you are exporting?
>>
>> But bringing in Al Viro and Bruce Fields explicitly in case they see
>> something. And Darrick, just in case it might be xfs.
> 
> Looking at https://lkml.org/lkml/2017/11/8/1086 for the actual oops...
> 
> It doesn't remind me of any known issue.
> 
> So either I'm overlooking something or the bug's elsewhere.
> 
> It sounds like you're varying *only* the server version, so there's not
> much chance that this could be triggered by changes in client behavior?
> 

We are definitely only varying the kernel on the server, nothing on the
client side is changing. The client in this case is essentially an
embedded device that we do not have a whole lot of control over.

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 19:51     ` Patrick McLean
  2017-11-09 20:04       ` Linus Torvalds
@ 2017-11-10  1:47       ` Patrick McLean
  1 sibling, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-10  1:47 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis



On 2017-11-09 11:51 AM, Patrick McLean wrote:
> On 2017-11-09 11:37 AM, Al Viro wrote:
>> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
>>
>>>> Here is the BUG we are getting:
>>>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
>>>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
>>>
>>> The code disassembles to
>>
>>>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>>
>>> that matters (and that traps) but I'm almost certain that it's the
>>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>>> when it then does
>>>
>>>      flags_by_sb(mnt->mnt_sb->s_flags);
>>>
>>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>>> NULL, because we wouldn't have gotten this far if it was.
>>>
>>
>> All instances of struct dentry are created by __d_alloc()[*], which assigns
>> ->d_sb (never to be modified afterwards) *and* dereferences the pointer
>> it has stored in ->d_sb before the created struct dentry becomes visible
>> to anyone else.  No struct dentry should ever be observed with NULL ->d_sb;
>> the only way to get that is memory corruption or looking at freed instance
>> after its memory has been reused for something else and zeroed.
>>
>> In other words, we should never observe a struct mount with NULL ->mnt.mnt_sb -
>> not without memory corruption or looking at freed instance.
>>
>> The pointer in that case should've come from exp->ex_path.mnt, exp being
>> the argument of nfsd4_encode_fattr().  Sure, it might have been a dangling
>> reference.  However, it looks a lot more like a memory corruptor *OR*
>> miscompiled kernel.
>>
>> What kind of load do the reproducer boxen have and how fast does that
>> bug trigger?  Would it be possible to slap something like
>> 	if (unlikely(!exp->ex_path.mnt->mnt_sb)) {
>> 		struct mount *m = real_mount(exp->ex_path.mnt);
>> 		printk(KERN_ERR "mnt: %p\n", exp->ex_path.mnt);
>> 		printk(KERN_ERR "name: [%s]\n", m->mnt_devname);
>> 		printk(KERN_ERR "ns: [%p]\n", m->mnt_ns);
>> 		printk(KERN_ERR "parent: [%p]\n", m->mnt_parent);
>> 		WARN_ON(1);
>> 		err = -EINVAL;
>> 		goto out_nfserr;
>> 	}
>> in the beginning of nfsd4_encode_fattr() (with include of ../mount.h added
>> in fs/nfsd/nfs4xdr.c) and see what it catches?
>>
>> Both with and without randomized structs, if possible - I might be barking
>> at the wrong tree, but IMO the very first step in localizing that crap is
>> to find out whether it's toolchain-related or not.
> 

That condition did not seem to trigger, and I am getting a slightly
different crash message (GPF rather than null pointer dereference). Here
is the dump from the latest crash (with CONFIG_GCC_PLUGIN_STRUCTLEAK,
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL and CONFIG_GCC_PLUGIN_RANDSTRUCT
all enabled).

> [   36.834232] general protection fault: 0000 [#1] SMP
> [   36.835168] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ie31200_edac tpm_tis ipmi_ssif tpm_tis_core ext4 mbcache jbd2 e1000e crc32c_intel
> [   36.839120] CPU: 1 PID: 3969 Comm: nfsd Tainted: G           O    4.14.0-rc8-git-kratos-1-00053-gd93d4ce103fd-dirty #1
> [   36.840883] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   36.841892] task: ffff88040a0b1c80 task.stack: ffffc900027bc000
> [   36.842887] RIP: 0010:vfs_statfs+0x73/0xb0
> [   36.843728] RSP: 0018:ffffc900027bfb30 EFLAGS: 00010202
> [   36.844687] RAX: 0000000000000000 RBX: ffffc900027bfbf8 RCX: 000000000000180d
> [   36.845891] RDX: 000000000000080d RSI: 0000000000000020 RDI: e2006d6574737973
> [   36.847075] RBP: ffffc900027bfbc8 R08: 0000000000000000 R09: 00000000000000ff
> [   36.848175] R10: 000000000038be3a R11: ffff88040b687578 R12: 0000000000000000
> [   36.849260] R13: ffff88040d7dc400 R14: ffff88040d38b000 R15: ffffc900027bfbf8
> [   36.850347] FS:  0000000000000000(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
> [   36.851891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   36.852873] CR2: 00007f049228edc0 CR3: 0000000001e0a004 CR4: 00000000001606e0
> [   36.853942] Call Trace:
> [   36.854667]  nfsd4_encode_fattr+0x34e/0x23b0
> [   36.855578]  ? ext4_get_acl+0x1b2/0x260 [ext4]
> [   36.856485]  ? get_acl+0x7a/0xf0
> [   36.857266]  ? generic_permission+0x125/0x1a0
> [   36.858150]  nfsd4_encode_getattr+0x25/0x30
> [   36.859002]  nfsd4_encode_operation+0x98/0x1a0
> [   36.859889]  nfsd4_proc_compound+0x3eb/0x5c0
> [   36.860736]  nfsd_dispatch+0xa8/0x230
> [   36.861538]  svc_process_common+0x347/0x640
> [   36.862383]  svc_process+0x100/0x1b0
> [   36.863204]  nfsd+0xe0/0x150
> [   36.863984]  kthread+0xfc/0x130
> [   36.864781]  ? nfsd_destroy+0x50/0x50
> [   36.865624]  ? kthread_create_on_node+0x40/0x40
> [   36.866529]  ? do_group_exit+0x3a/0xb0
> [   36.867362]  ret_from_fork+0x25/0x30
> [   36.868188] Code: d1 83 c9 08 40 f6 c6 04 0f 45 d1 89 d1 80 cd 04 40 f6 c6 08 0f 45 d1 89 d1 80 cd 08 40 f6 c6 10 0f 45 d1 89 d1 80 cd 10 83 e6 20 <48> 8b b7 b0 05 00 00 0f 45 d1 83 ca 20 89 f1 83 e1 10 89 cf 83
> [   36.871101] RIP: vfs_statfs+0x73/0xb0 RSP: ffffc900027bfb30
> [   36.872059] ---[ end trace 603ac898c4e2d616 ]---

I haven't been able to reproduce it with CONFIG_GCC_PLUGIN_RANDSTRUCT
disabled, so it seems like it must be a bug there. It's odd that it only
surfaced recently, though; we have been using that since it was added.
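One more data point can be read straight out of the two dumps. In both, the trapping instruction is a mov that loads from an offset off %rdi. Its 32-bit displacement differs, though: the bytes are 30 02 00 00 in the first oops and b0 05 00 00 in the GPF. In the GPF, RDI (the base register of the faulting load) also holds what decodes as ASCII text rather than a kernel address. The helper names below are made up for this sketch. It only decodes bytes already shown above.

```c
#include <stdint.h>

/*
 * Extract the disp32 from "mov disp32(%rdi),%rsi", encoded as
 * 48 (REX.W) 8b (MOV r64, r/m64) b7 (ModRM: mod=10 disp32, reg=rsi, rm=rdi)
 * followed by the displacement in little-endian byte order.
 */
static uint32_t mov_disp32(const uint8_t insn[7])
{
    return (uint32_t)insn[3]       | (uint32_t)insn[4] << 8 |
           (uint32_t)insn[5] << 16 | (uint32_t)insn[6] << 24;
}

/*
 * Show a 64-bit register as the 8 bytes it occupies in memory on x86-64
 * (little-endian): printable ASCII as-is, everything else as '.'.
 */
static void reg_as_bytes(uint64_t v, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)(v >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[8] = '\0';
}
```

mov_disp32() yields 0x230 for the first oops and 0x5b0 for the GPF. reg_as_bytes(0xe2006d6574737973) yields "system..". A different field offset per build is what one would expect if the structure layout changed between seeds. A string sitting where a pointer belongs fits the corruption / wrong-layout theory. Neither observation proves it, though.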

> The reproducer boxen are not under particularly heavy load, they are
> serving NFS to 1 or 2 clients (which are essentially embedded devices).
> When the bug triggers, it usually triggers pretty fast and reliably, but
> it seems to only trigger on some subset of bootups. Once it fails to
> trigger, we seem to have to reboot to get it to trigger.
> 
> I should be able to have some results with that added in a few hours.
> It's weirdly unreliable to reproduce this.
> 
> We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
> CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.
> 

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09 20:04       ` Linus Torvalds
  2017-11-09 21:16         ` Al Viro
@ 2017-11-10  1:58         ` Patrick McLean
  2017-11-10 13:53           ` Arnd Bergmann
  2017-11-10 18:42           ` Linus Torvalds
  1 sibling, 2 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-10  1:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On 2017-11-09 12:04 PM, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>
>> We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
>> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
>> CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.
> 
> It might be worth just verifying without RANDSTRUCT in particular.
> 
> And most obviously: if there is some module or part of the kernel that
> got compiled with a different seed for the randstruct hashing, that
> will break in nasty nasty ways. Your out-of-kernel module is the
> obvious suspect for something like that, but honestly, it could be
> some missing build dependency, or simply a missing special case in the
> plugin itself a missing __no_randomize_layout or any number of things.
> 

We will check our fork against the in-kernel cp210x driver to make sure
we didn't miss anything, but it seems odd we would be hitting the issue
so consistently in the NFS code path, rather than somewhere in USB,
serial, or GPIO paths.

> So since you seem to be able to reproduce this _reasonably_ easily,
> it's definitely worth checking that it still reproduces even without
> the gcc plugins.

I haven't been able to reproduce it with RANDSTRUCT disabled (and
structleak enabled). I will keep trying for a little while more, but
evidence seems to be pointing to that.

Something must have changed since 4.13.8 to trigger this though. This
did not crop up at all until we tried 4.13.11, where we saw it pretty
quickly. We have a pretty large number of machines running 4.13.6 with
RANDSTRUCT enabled and running the same workload with many more
clients, and have not seen this bug at all.

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-10  1:58         ` Patrick McLean
@ 2017-11-10 13:53           ` Arnd Bergmann
  2017-11-10 18:42           ` Linus Torvalds
  1 sibling, 0 replies; 82+ messages in thread
From: Arnd Bergmann @ 2017-11-10 13:53 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Fri, Nov 10, 2017 at 2:58 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-09 12:04 PM, Linus Torvalds wrote:
>> On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutzpah@gentoo.org> wrote:

>
> We will check our fork against the in-kernel cp210x driver to make sure
> we didn't miss anything, but it seems odd we would be hitting the issue
> so consistently in the NFS code path, rather than somewhere in USB,
> serial, or GPIO paths.
>
>> So since you seem to be able to reproduce this _reasonably_ easily,
>> it's definitely worth checking that it still reproduces even without
>> the gcc plugins.
>
> I haven't been able to reproduce it with RANDSTRUCT disabled (and
> structleak enabled). I will keep trying for a little while more, but
> evidence seems to be pointing to that.
>
> Something must have changed since 4.13.8 to trigger this though. This
> did not crop up at all until we tried 4.13.11, where we saw it pretty
> quickly. We have a pretty large number of machines running 4.13.6 with
> RANDSTRUCT enabled and running the same workload with many more
> clients, and have not seen this bug at all.

I couldn't find anything overly suspicious between 4.13.8 and 4.13.11,
see the full list of commits since 4.13.6 at https://pastebin.com/AcxBZR7H

The ones I couldn't immediately rule out (but no smoking gun either) would be:

9970679f497a x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
ca6711747c5a assoc_array: Fix a buggy node-splitting case
2fbb8bf749b5 xfs: move two more RT specific functions into CONFIG_XFS_RT
1e1427356d8d xfs: trim writepage mapping to within eof
9df9b634f637 xfs: cancel dirty pages on invalidation
cd3f0bee1b94 xfs: handle error if xfs_btree_get_bufs fails
58cfca25f540 xfs: reinit btree pointer on attr tree inactivation walk
659a9989b68b xfs: don't change inode mode if ACL update fails
88ccd3b6884a xfs: move more RT specific code under CONFIG_XFS_RT
5733ebee586c xfs: Don't log uninitialised fields in inode structures
199a7448c097 xfs: handle racy AIO in xfs_reflink_end_cow
ee5d69c908a1 xfs: always swap the cow forks when swapping extents
2888145444f1 xfs: Capture state of the right inode in xfs_iflush_done
d0fa252b207f xfs: perag initialization should only touch m_ag_max_usable for AG 0
8da6f7fbe43c xfs: update i_size after unwritten conversion in dio completion
a9eac76e958b xfs: report zeroed or not correctly in xfs_zero_range()
67d51bdcc9f4 fs/xfs: Use %pS printk format for direct addresses
2bf3122f2130 xfs: evict CoW fork extents when performing finsert/fcollapse
a58a0826656d xfs: don't unconditionally clear the reflink flag on zero-block files
c61e905e0ee2 iomap_dio_rw: Allocate AIO completion queue before submitting dio
7610595830bb pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
24a33a0c96f3 KEYS: don't let add_key() update an uninstantiated key
ad4aa448c9b2 FS-Cache: fix dereference of NULL user_key_payload
f45b8fe12221 KEYS: Fix race between updating and finding a negative key
e56be12012c2 ecryptfs: fix dereference of NULL user_key_payload
363ce0b01fe0 fscrypt: fix dereference of NULL user_key_payload
cc757d55c903 lib/digsig: fix dereference of NULL user_key_payload
f5e97214207f x86/microcode/intel: Disable late loading on model 79
7b5e405b7878 Revert "tools/power turbostat: stop migrating, unless '-m'"
8b1e10789c84 KEYS: encrypted: fix dereference of NULL user_key_payload
a258a35a9930 mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lock
e47a56cbf519 usb: xhci: Handle error condition in xhci_stop_device()
d53911e63388 usb: xhci: Reset halted endpoint if trb is noop
d1120fe38b3f xhci: Cleanup current_cmd in xhci_cleanup_command_queue()
301d332138d2 xhci: Identify USB 3.1 capable hosts by their port protocol capability
015e94ead900 usb: hub: Allow reset retry for USB2 devices on connect bounce
1916547b28bd usb: quirks: add quirk for WORLDE MINI MIDI keyboard
e3a038930502 usb: cdc_acm: Add quirk for Elatec TWN3
c2110c8dea7a USB: serial: metro-usb: add MS7820 device id
775462fd5c53 USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
a9fdf6354267 USB: devio: Revert "USB: devio: Don't corrupt user memory"

However, you mentioned cp210x, and I noticed related changes in 4.13.8:

e21045a22395 USB: serial: console: fix use-after-free after failed setup
6c7cb458405e USB: serial: console: fix use-after-free on disconnect
4b3e3c7282d6 USB: serial: qcserial: add Dell DW5818, DW5819
c796da1d110f USB: serial: option: add support for TP-Link LTE module
e7e0b4b39663 USB: serial: cp210x: add support for ELV TFD500
1ae2c690f967 USB: serial: cp210x: fix partnum regression
78a02c93648e USB: serial: ftdi_sio: add id for Cypress WICED dev board

You could try reverting those seven; if that makes a difference, it could
point to your forked driver.

       Arnd

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-10  1:58         ` Patrick McLean
  2017-11-10 13:53           ` Arnd Bergmann
@ 2017-11-10 18:42           ` Linus Torvalds
  2017-11-10 23:26             ` Patrick McLean
  1 sibling, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2017-11-10 18:42 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>
> Something must have changed since 4.13.8 to trigger this though.

Well, yes and no.

Obviously something changed, but it doesn't necessarily have to be
anything particular.

Almost every time we've seen compiler bugs, it's been an innocuous
change that just happened to trigger a latent issue. Pretty much by
definition compiler bugs tend to be about rare situations, so it's
some odd special case that triggers.

Since it's apparently fairly repeatable for you, a bisection between
4.13.8 and 4.13.11 would be very interesting, and shouldn't take all
that long. There are only 142 commits in that range, so even just a
partial bisection of, say, four or five rounds should narrow it down to
just a couple of commits. And even a full bisection should only take
something like 8 build/test cycles.
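Here is the arithmetic behind those numbers, as a small sketch. It assumes an idealized binary search over a linear list of commits; git bisect on real history with merges can take a step more or less. The function names are made up.

```c
#include <assert.h>

/*
 * Worst-case build/test rounds for an idealized binary search over n
 * candidate commits: the smallest k with 2^k >= n, i.e. ceil(log2(n)).
 */
static unsigned bisect_rounds(unsigned long n)
{
    unsigned k = 0;

    assert(n > 0);
    while ((1UL << k) < n)
        k++;
    return k;
}

/* Worst-case number of suspects left after `rounds` halvings. */
static unsigned long remaining_after(unsigned long n, unsigned rounds)
{
    while (rounds > 0 && n > 1) {
        n = (n + 1) / 2;   /* the larger half survives in the worst case */
        rounds--;
    }
    return n;
}
```

For n = 142, bisect_rounds() returns 8. remaining_after() leaves 9 candidates after four rounds and 5 after five.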

Arnd pointed to some commits that might be relevant for the cp210x
module, but those are all already in 4.13.8, so if 4.13.8 really is
rock solid for you, I don't think that's it.

I really don't see anything that looks even half-way suspicious in
that 4.13.8..11 range. But as mentioned, compiler interactions can be
_really_ subtle.

And hey, it can be a real kernel bug too, that just happens to be
exposed by RANDSTRUCT, so a bisect really would be very nice.

Because in the end, compiler bugs are very rare. They are particularly
annoying when they do happen, though, so they loom big in the mind of
people who have had to chase them down.

                 Linus

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-10 18:42           ` Linus Torvalds
@ 2017-11-10 23:26             ` Patrick McLean
  2017-11-11  0:27               ` Patrick McLean
  2017-11-11  1:13               ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
  0 siblings, 2 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-10 23:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis



On 2017-11-10 10:42 AM, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>
>> Something must have changed since 4.13.8 to trigger this though.
> 
> Arnd pointed to some commits that might be relevant for the cp210x
> module, but those are all already in 4.13.8, so if 4.13.8 really is
> rock solid for you, I don't think that's it.
> 
> I really don't see anything that looks even half-way suspicious in
> that 4.13.8..11 range. But as mentioned, compiler interactions can be
> _really_ subtle.
> 
> And hey, it can be a real kernel bug too, that just happens to be
> exposed by RANDSTRUCT, so a bisect really would be very nice.

I am working on bisecting the issue now, but I think I have some more
evidence pointing to a compiler issue related to RANDSTRUCT. There are
actually 3 issues that we have seen. Sometimes we get the null pointer
deref in the initial message, sometimes we get the GPF, and sometimes we
see an issue where the NFS clients see all files as root-owned
directories. Any given kernel will always see the same issue, but after
a "make mrproper" and recompile (with the same .config), the issue will
often change. I suspect that all 3 of these problems are actually the
same issue manifesting itself in different ways depending on what seed
the RANDSTRUCT gcc plugin is using.

> 
> Because in the end, compiler bugs are very rare. They are particularly
> annoying when they do happen, though, so they loom big in the mind of
> people who have had to chase them down.
> 

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-10 23:26             ` Patrick McLean
@ 2017-11-11  0:27               ` Patrick McLean
  2017-11-11  2:36                   ` Linus Torvalds
  2017-11-11  1:13               ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
  1 sibling, 1 reply; 82+ messages in thread
From: Patrick McLean @ 2017-11-11  0:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On 2017-11-10 03:26 PM, Patrick McLean wrote:
> 
> 
> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>> On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>
>>> Something must have changed since 4.13.8 to trigger this though.
>>
>> Arnd pointed to some commits that might be relevant for the cp210x
>> module, but those are all already in 4.13.8, so if 4.13.8 really is
>> rock solid for you, I don't think that's it.
>>
>> I really don't see anything that looks even half-way suspicious in
>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>> _really_ subtle.
>>
>> And hey, it can be a real kernel bug too, that just happens to be
>> exposed by RANDSTRUCT, so a bisect really would be very nice.
> 
> I am working on bisecting the issue now, but I think I have some more
> evidence pointing to a compiler issue related to RANDSTRUCT. There are
> actually 3 issues that we have seen. Sometimes we get the null pointer
> deref in the initial message, sometimes we get the GPF, and sometimes we
> see an issue where the NFS clients see all files as root-owned
> directories. Any given kernel will always see the same issue, but after
> a "make mrproper" and recompile (with the same .config), the issue will
> often change. I suspect that all 3 of these problems are actually the
> same issue manifesting itself in different ways depending on what seed
> the RANDSTRUCT gcc plugin is using.
> 

Further update on this: using the same seed for RANDSTRUCT, I have
reproduced this issue on v4.13.0, so it does not seem to be recently
introduced. The older kernel apparently only worked for us because we
were lucky. Generally we always compile new kernels from a fresh tree,
so they are never using the same seed.
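As a toy model of why the seed matters, the sketch below derives a field order from a 64-bit seed using a xorshift generator and a Fisher-Yates shuffle. This is an assumption for illustration only, not the algorithm the randstruct plugin actually uses. It shows that a fixed seed reproduces the same layout every time. A fresh tree with a newly generated seed will generally produce a different one.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit xorshift PRNG; NOT the generator the randstruct plugin uses. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/*
 * Fill order[0..n-1] with a seed-determined permutation of field indices
 * using a Fisher-Yates shuffle. The same seed always yields the same order.
 */
static void shuffle_fields(size_t order[], size_t n, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* xorshift must not start at 0 */

    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n; i > 1; i--) {
        /* 0 <= j < i; the slight modulo bias is irrelevant for a toy */
        size_t j = (size_t)(xorshift64(&state) % i);
        size_t tmp = order[i - 1];

        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```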

In case someone wants to play with this, here are some interesting seeds
(in include/generated/randomize_layout_hash.h):

Produces a NULL pointer dereference (though I am not sure what the client
does to produce this).
    5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc

All files for nfsd4 clients appear as directories owned by root, no
matter the real owner (this happens for all clients we have tested):
    3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e

This is the seed that was breaking motherboards (make sure you have a
way to flash the BIOS with this one):
    3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd

Finally, here is a seed that produces a kernel that does not exhibit any
problems we are aware of:
    e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b

>>
>> Because in the end, compiler bugs are very rare. They are particularly
>> annoying when they do happen, though, so they loom big in the mind of
>> people who have had to chase them down.
>>

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-10 23:26             ` Patrick McLean
  2017-11-11  0:27               ` Patrick McLean
@ 2017-11-11  1:13               ` J. Bruce Fields
  2017-11-11  2:32                 ` Al Viro
  1 sibling, 1 reply; 82+ messages in thread
From: J. Bruce Fields @ 2017-11-11  1:13 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Fri, Nov 10, 2017 at 03:26:27PM -0800, Patrick McLean wrote:
> 
> 
> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
> > On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> >>
> >> Something must have changed since 4.13.8 to trigger this though.
> > 
> > Arnd pointed to some commits that might be relevant for the cp210x
> > module, but those are all already in 4.13.8, so if 4.13.8 really is
> > rock solid for you, I don't think that's it.
> > 
> > I really don't see anything that looks even half-way suspicious in
> > that 4.13.8..11 range. But as mentioned, compiler interactions can be
> > _really_ subtle.
> > 
> > And hey, it can be a real kernel bug too, that just happens to be
> > exposed by RANDSTRUCT, so a bisect really would be very nice.
> 
> I am working on bisecting the issue now, but I think I have some more
> evidence pointing to a compiler issue related to RANDSTRUCT. There are
> actually 3 issues that we have seen. Sometimes we get the null pointer
> deref in the initial message, sometimes we get the GPF, and sometimes we
> see an issue where the NFS clients see all files as root-owned
> directories.

That suggests that stat.uid is 0 and stat.mode & S_IFMT is 0040000 in
the stat structure that nfsd passed to vfs_getattr().
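Bruce's inference can be expressed as a check. This is a minimal sketch using the POSIX S_ISDIR() macro, which tests (mode & S_IFMT) == S_IFDIR. The function name is hypothetical. Suppose an attribute block comes back with uid 0 and file-type bits 0040000. The client would then see every file as a root-owned directory.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * True when attributes describe "a directory owned by root": uid 0 and
 * file-type bits (mode & S_IFMT) equal to S_IFDIR, octal 0040000.
 * S_ISDIR() performs exactly that masked comparison.
 */
static bool looks_like_root_dir(uid_t uid, mode_t mode)
{
    return uid == 0 && S_ISDIR(mode);
}
```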

No idea what sort of information is useful when tracking down this kind
of bug, but you could also run wireshark and take a look at the server's
GETATTR replies to see if there's some other corruption.

--b.

> Any given kernel will always see the same issue, but after
> a "make mrproper" and recompile (with the same .config), the issue will
> often change. I suspect that all 3 of these problems are actually the
> same issue manifesting itself in different ways depending on what seed
> the RANDSTRUCT gcc plugin is using.
> 
> > 
> > Because in the end, compiler bugs are very rare. They are particularly
> > annoying when they do happen, though, so they loom big in the mind of
> > people who have had to chase them down.
> > 

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-11  1:13               ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
@ 2017-11-11  2:32                 ` Al Viro
  0 siblings, 0 replies; 82+ messages in thread
From: Al Viro @ 2017-11-11  2:32 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Patrick McLean, Linus Torvalds, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Fri, Nov 10, 2017 at 08:13:06PM -0500, J. Bruce Fields wrote:
> On Fri, Nov 10, 2017 at 03:26:27PM -0800, Patrick McLean wrote:
> > 
> > 
> > On 2017-11-10 10:42 AM, Linus Torvalds wrote:
> > > On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> > >>
> > >> Something must have changed since 4.13.8 to trigger this though.
> > > 
> > > Arnd pointed to some commits that might be relevant for the cp210x
> > > module, but those are all already in 4.13.8, so if 4.13.8 really is
> > > rock solid for you, I don't think that's it.
> > > 
> > > I really don't see anything that looks even half-way suspicious in
> > > that 4.13.8..11 range. But as mentioned, compiler interactions can be
> > > _really_ subtle.
> > > 
> > > And hey, it can be a real kernel bug too, that just happens to be
> > > exposed by RANDSTRUCT, so a bisect really would be very nice.
> > 
> > I am working on bisecting the issue now, but I think I have some more
> > evidence pointing to a compiler issue related to RANDSTRUCT. There are
> > actually 3 issues that we have seen. Sometimes we get the null pointer
> > deref in the initial message, sometimes we get the GPF, and sometimes we
> > see an issue where the NFS clients see all files as root-owned
> > directories.
> 
> That suggests that stat.uid is 0 and stat.mode & S_IFMT is 0040000 in
> the stat structure that nfsd passed to vfs_getattr().
> 
> No idea what sort of information is useful when tracking down this kind
> of bug, but you could also run wireshark and take a look at the server's
> GETATTR replies to see if there's some other corruption.

FWIW, having looked at some of the __bugger_layout users...  Compiler bugs
aside,
	* use in struct {dentry,inode,mount,block_device} has to go - cache
use patterns at hash lookups are _not_ something to play with like that.
	* struct file_lock and struct super_block - ditto, only it's not
hash lookups that hurt here.  struct vm_area_struct, while we are at it.
	* struct group_info - Cthulhu's pus-leaking warts, what's the point
randomizing _that_?  No, really - here's the damn thing in all its glory:
struct group_info {
        atomic_t        usage;
        int             ngroups;
        kgid_t          gid[0];
} __randomize_layout;
I really hope that plugin does *not* try to move the ->gid[] anywhere...
Which leaves us a choice between putting ->usage first or second.  Sure,
every bit helps, but... even for security theatre that looks a bit too
pathetic.
	* struct vfsmount.  Wow.  All of log2(3!) bits.  Congratulations.
At least that's better than struct path.  Oh, wait - they'd done struct path
as well...

What the hell had they been doing?  Muscarine old-fashioned way?  Looks like
a mix of pointless and truly dangerous.  And then there are compiler bugs and
the charming effect on reproducibility...

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-11  0:27               ` Patrick McLean
  2017-11-11  2:36                   ` Linus Torvalds
@ 2017-11-11  2:36                   ` Linus Torvalds
  0 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-11  2:36 UTC (permalink / raw)
  To: Patrick McLean, Kees Cook, Emese Revfy
  Cc: Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis, kernel-hardening

[ Bringing in the gcc plugin people and the kernel hardening list,
since it now is no longer even remotely looking like a nfsd, vfs or
filesystem issue any more ]

Kees, Emese,
 the whole thread is on lkml, but there's clearly something horribly
wrong with RANDSTRUCT, and it's not new even though it looked that way
for a while.

Patrick seems to trigger it with nfsd, so it might be specific to that.

Alternatively, it might just be that very few people run
RANDSTRUCT-built kernels, or just have been lucky with the seeding.

Sorry for top-posting, but there's not really anything in the email
itself to reply to, other than saying thanks to Patrick for narrowing
it down like this.

It would have been very interesting if it had actually bisected to
something, but it seems that the real issue is just the choice of
seeding for RANDSTRUCT.

                 Linus

On Fri, Nov 10, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-10 03:26 PM, Patrick McLean wrote:
>> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>>>
>>> I really don't see anything that looks even half-way suspicious in
>>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>>> _really_ subtle.
>>>
>>> And hey, it can be a real kernel bug too, that just happens to be
>>> exposed by RANDSTRUCT, so a bisect really would be very nice.
>>
>> I am working on bisecting the issue now, but I think I have some more
>> evidence pointing to a compiler issue related to RANDSTRUCT. There are
>> actually 3 issues that we have seen. Sometimes we get the null pointer
>> deref in the initial message, sometimes we get the GPF, and sometimes we
>> see an issue where the NFS clients see all files as root-owned
>> directories. Any given kernel will always see the same issue, but after
>> a "make mrproper" and recompile (with the same .config), the issue will
>> often change. I suspect that all 3 of these problems are actually the
>> same issue manifesting itself in different ways depending on what seed
>> the RANDSTRUCT gcc plugin is using.
>
> Further update on this, using the same seed for RANDSTRUCT, I have
> reproduced this issue on v4.13.0, so it does not seem to be recently
> introduced. The older kernel apparently only worked for us because we
> were lucky. Generally we always compile new kernels from a fresh tree,
> so they are never using the same seed.
>
> In case someone wants to play with this, here are some interesting seeds
> (in include/generated/randomize_layout_hash.h):
>
> Produce a NULL pointer dereference (though I am not sure what the client
> does to produce this).
>     5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc
>
> All files for nfsd4 clients appear as directories owned as root, no
> matter the real owner (this happens for all clients we have tested):
>     3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e
>
> This is the seed that was breaking motherboards (make sure you have a
> way to flash the BIOS with this one):
>     3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd
>
> Finally, here is a seed that produces a kernel that does not exhibit any
> problems we are aware of:
>     e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
>
>>>
>>> Because in the end, compiler bugs are very rare. They are particularly
>>> annoying when they do happen, though, so they loom big in the mind of
>>> people who have had to chase them down.
>>>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-09  0:43 [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Patrick McLean
  2017-11-09  2:40 ` Linus Torvalds
@ 2017-11-11  2:47 ` Alan Cox
  1 sibling, 0 replies; 82+ messages in thread
From: Alan Cox @ 2017-11-11  2:47 UTC (permalink / raw)
  To: Patrick McLean; +Cc: linux-kernel, linux-nfs, stable, regressions, torvalds

On Wed, 8 Nov 2017 16:43:17 -0800
Patrick McLean <chutzpah@gentoo.org> wrote:

> As of 4.13.11 (and also with 4.14-rc) we have an issue where when
> serving nfs4 sometimes we get the following BUG. When this bug happens,
> it usually also causes the motherboard to no longer POST until we
> externally re-flash the BIOS (using the BMC web interface). If a
> motherboard does not have an external way to flash the BIOS, this would
> brick the hardware.

If that is a production x86 system then you need to raise a large red
flag with the vendor because it should not even be possible to splat the
BIOS firmware on a modern PC by running even malicious OS code.

Not only that but if it has a flaw, and you bisect down to create a
reproducer then it's not going to take the bad guys very long to turn it
into an interesting toy to run if they ever exploit a box with that board.

Alan

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-11  2:36                   ` Linus Torvalds
@ 2017-11-11 16:13                     ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2017-11-11 16:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Fri, Nov 10, 2017 at 6:36 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Bringing in the gcc plugin people and the kernel hardening list,
> since it now is no longer even remotely looking like a nfsd, vfs or
> filesystem issue any more ]
>
> Kees, Emese,
>  the whole thread is on lkml, but there's clearly something horribly
> wrong with RANDSTRUCT, and it's not new even though it looked that way
> for a while.

It wouldn't be the first issue we've seen; it's (obviously) a pretty
aggressive change to the resulting build.

> Patrick seems to trigger it with nfsd, so it might be specific to that.
>
> Alternatively, it might just be that very few people run
> RANDSTRUCT-built kernels, or just have been lucky with the seeding.

Given its potential cache-line abuse, I'm not surprised that its usage
is more limited than other features.

> Sorry for top-posting, but there's not really anything in the email
> itself to reply to, other than saying thanks to Patrick for narrowing
> it down like this.

Agreed; thanks Patrick! :) Given that the issue is non-deterministic,
I wonder if the bug is related to some kind of missing RCU or barrier
that goes unnoticed in normal struct layouts.

> It would have been very interesting if it had actually bisected to
> something, but it seems that the real issue is just the choice of
> seeding for RANDSTRUCT.

That's where we've seen bugs in the past: some pathological ordering
of a struct uncovers a corner case. In the past it's been much more
deterministic: doesn't build, or immediately crashes on boot, etc.

I'll take a closer look at this and see if I can provide something to
narrow it down.

-Kees

>
>                  Linus
>
> On Fri, Nov 10, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-10 03:26 PM, Patrick McLean wrote:
>>> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>>>>
>>>> I really don't see anything that looks even half-way suspicious in
>>>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>>>> _really_ subtle.
>>>>
>>>> And hey, it can be a real kernel bug too, that just happens to be
>>>> exposed by RANDSTRUCT, so a bisect really would be very nice.
>>>
>>> I am working on bisecting the issue now, but I think I have some more
>>> evidence pointing to a compiler issue related to RANDSTRUCT. There are
>>> actually 3 issues that we have seen. Sometimes we get the null pointer
>>> deref in the initial message, sometimes we get the GPF, and sometimes we
>>> see an issue where the NFS clients see all files as root-owned
>>> directories. Any given kernel will always see the same issue, but after
>>> a "make mrproper" and recompile (with the same .config), the issue will
>>> often change. I suspect that all 3 of these problems are actually the
>>> same issue manifesting itself in different ways depending on what seed
>>> the RANDSTRUCT gcc plugin is using.
>>
>> Further update on this, using the same seed for RANDSTRUCT, I have
>> reproduced this issue on v4.13.0, so it does not seem to be recently
>> introduced. The older kernel apparently only worked for us because we
>> were lucky. Generally we always compile new kernels from a fresh tree,
>> so they are never using the same seed.
>>
>> In case someone wants to play with this, here are some interesting seeds
>> (in include/generated/randomize_layout_hash.h):
>>
>> Produce a NULL pointer dereference (though I am not sure what the client
>> does to produce this).
>>     5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc
>>
>> All files for nfsd4 clients appear as directories owned as root, no
>> matter the real owner (this happens for all clients we have tested):
>>     3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e
>>
>> This is the seed that was breaking motherboards (make sure you have a
>> way to flash the BIOS with this one):
>>     3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd
>>
>> Finally, here is a seed that produces a kernel that does not exhibit any
>> problems we are aware of:
>>     e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
>>
>>>>
>>>> Because in the end, compiler bugs are very rare. They are particularly
>>>> annoying when they do happen, though, so they loom big in the mind of
>>>> people who have had to chase them down.
>>>>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-11 16:13                     ` Kees Cook
@ 2017-11-11 17:31                       ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-11 17:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Patrick McLean, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

Boris Lukashev points out that Patrick should probably check a newer
version of gcc.

I looked around, and in one of the emails, Patrick said:

  "No changes, both the working and broken kernels were built with
   distro-provided gcc 5.4.0 and binutils 2.28.1"

and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
it's a bug-fix release to a pretty old branch that is not exactly new.

It would probably be good to check if the problems persist with gcc
6.x or 7.x. I have no idea which gcc version the randstruct people
tend to use themselves.

               Linus


On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>
> I'll take a closer look at this and see if I can provide something to
> narrow it down.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-11 17:31                       ` Linus Torvalds
  (?)
@ 2017-11-13 22:48                         ` Patrick McLean
  -1 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-13 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Kees Cook
  Cc: Emese Revfy, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis, kernel-hardening

On 2017-11-11 09:31 AM, Linus Torvalds wrote:
> Boris Lukashev points out that Patrick should probably check a newer
> version of gcc.
> 
> I looked around, and in one of the emails, Patrick said:
> 
>   "No changes, both the working and broken kernels were built with
>    distro-provided gcc 5.4.0 and binutils 2.28.1"
> 
> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
> it's a bug-fix release to a pretty old branch that is not exactly new.
> 
> It would probably be good to check if the problems persist with gcc
> 6.x or 7.x. I have no idea which gcc version the randstruct people
> tend to use themselves.

I just tested it with gcc 7.2 and was able to reproduce the NULL
pointer dereference; the backtrace looks slightly different this time.

I will also test with binutils 2.29, though I doubt that will make any
difference.

> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
> [   56.166563] IP: vfs_statfs+0x7c/0xc0
> [   56.167249] PGD 0 P4D 0
> [   56.167860] Oops: 0000 [#1] SMP
> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
> [   56.193898] Call Trace:
> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
> [   56.195267]  ? generic_permission+0x12c/0x1a0
> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
> [   56.198268]  nfsd_dispatch+0xe8/0x220
> [   56.198968]  svc_process_common+0x475/0x640
> [   56.199696]  ? nfsd_destroy+0x60/0x60
> [   56.200404]  svc_process+0xf2/0x1a0
> [   56.201079]  nfsd+0xe3/0x150
> [   56.201706]  kthread+0x117/0x130
> [   56.202354]  ? kthread_create_on_node+0x40/0x40
> [   56.203100]  ret_from_fork+0x25/0x30
> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
> [   56.207110] CR2: 0000000000000560
> [   56.207763] ---[ end trace d452986a80f64aaa ]---

> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I'll take a closer look at this and see if I can provide something to
>> narrow it down.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-09  2:40 ` Linus Torvalds
                     ` (3 preceding siblings ...)
  2017-11-09 20:47   ` J. Bruce Fields
@ 2017-11-13 22:59   ` Rasmus Villemoes
  2017-11-13 23:30     ` Linus Torvalds
  4 siblings, 1 reply; 82+ messages in thread
From: Rasmus Villemoes @ 2017-11-13 22:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Thu, Nov 09 2017, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The code disassembles to
>
>    0: 83 c9 08              or     $0x8,%ecx
>    3: 40 f6 c6 04          test   $0x4,%sil
>    7: 0f 45 d1              cmovne %ecx,%edx
>    a: 89 d1                mov    %edx,%ecx
>    c: 80 cd 04              or     $0x4,%ch
>    f: 40 f6 c6 08          test   $0x8,%sil
>   13: 0f 45 d1              cmovne %ecx,%edx
>   16: 89 d1                mov    %edx,%ecx
>   18: 80 cd 08              or     $0x8,%ch
>   1b: 40 f6 c6 10          test   $0x10,%sil
>   1f: 0f 45 d1              cmovne %ecx,%edx
>   22: 89 d1                mov    %edx,%ecx
>   24: 80 cd 10              or     $0x10,%ch
>   27: 83 e6 20              and    $0x20,%esi
>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>   31: 0f 45 d1              cmovne %ecx,%edx
>   34: 83 ca 20              or     $0x20,%edx
>   37: 89 f1                mov    %esi,%ecx
>   39: 83 e1 10              and    $0x10,%ecx
>   3c: 89 cf                mov    %ecx,%edi
>
> and all those odd cmovne and bit-ops are just the bit selection code
> in flags_by_mnt(), which is inlined through calculate_f_flags (which
> is _also_ inlined) into vfs_statfs().
>
> Sadly, gcc makes a mess of it and actually generates code that looks
> like the original C. I would have hoped that gcc could have turned
>
>    if (x & BIT)
>         y |= OTHER_BIT;
>
> into
>
>     y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;
>
> but that doesn't happen.

Actually, new enough gcc (7.1, I think) does contain a pattern that does
this, but unfortunately only if one spells it

  y |= (x & BIT) ? OTHER_BIT : 0;

which is half-way to doing it by hand, I suppose. Doing the

-       if (mnt_flags & MNT_READONLY)
-               flags |= ST_RDONLY;
+       flags |= (mnt_flags & MNT_READONLY) ? ST_RDONLY : 0;

and pasting into godbolt.org, one can apparently get gcc to compile it
to

flags_by_mnt(int):
  leal (%rdi,%rdi), %edx
  movl %edi, %eax
  sarl $6, %eax
  movl %edx, %ecx
  andl $1, %eax
  andl $12, %edx
  andl $2, %ecx
  orl %ecx, %eax
  orl %eax, %edx
  movl %edi, %eax
  sall $7, %eax
  andl $7168, %eax
  orl %edx, %eax
  ret
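
For anyone who wants to repeat the comparison outside the kernel tree, here is a minimal standalone sketch of flags_by_mnt() in both spellings. The MNT_ and ST_ values are assumed from the kernel headers of that era; they agree with the disassembly quoted above, but check them against your own tree. The names flags_by_mnt_if, flags_by_mnt_ternary and check_flags_by_mnt are hypothetical. Feeding the two functions to different gcc versions (for example on godbolt.org) may show whether the shift-mask-or pattern triggers for the ternary form.

```c
#include <assert.h>

/* Assumed values, mirroring include/linux/mount.h and include/linux/statfs.h. */
#define MNT_NOSUID     0x01
#define MNT_NODEV      0x02
#define MNT_NOEXEC     0x04
#define MNT_NOATIME    0x08
#define MNT_NODIRATIME 0x10
#define MNT_RELATIME   0x20
#define MNT_READONLY   0x40

#define ST_RDONLY      0x0001
#define ST_NOSUID      0x0002
#define ST_NODEV       0x0004
#define ST_NOEXEC      0x0008
#define ST_NOATIME     0x0400
#define ST_NODIRATIME  0x0800
#define ST_RELATIME    0x1000

/* Original spelling: one if() per flag. */
static int flags_by_mnt_if(int mnt_flags)
{
	int flags = 0;

	if (mnt_flags & MNT_READONLY)
		flags |= ST_RDONLY;
	if (mnt_flags & MNT_NOSUID)
		flags |= ST_NOSUID;
	if (mnt_flags & MNT_NODEV)
		flags |= ST_NODEV;
	if (mnt_flags & MNT_NOEXEC)
		flags |= ST_NOEXEC;
	if (mnt_flags & MNT_NOATIME)
		flags |= ST_NOATIME;
	if (mnt_flags & MNT_NODIRATIME)
		flags |= ST_NODIRATIME;
	if (mnt_flags & MNT_RELATIME)
		flags |= ST_RELATIME;
	return flags;
}

/* Ternary spelling that gcc >= 7.1 reportedly turns into shift/mask/or. */
static int flags_by_mnt_ternary(int mnt_flags)
{
	int flags = 0;

	flags |= (mnt_flags & MNT_READONLY)   ? ST_RDONLY     : 0;
	flags |= (mnt_flags & MNT_NOSUID)     ? ST_NOSUID     : 0;
	flags |= (mnt_flags & MNT_NODEV)      ? ST_NODEV      : 0;
	flags |= (mnt_flags & MNT_NOEXEC)     ? ST_NOEXEC     : 0;
	flags |= (mnt_flags & MNT_NOATIME)    ? ST_NOATIME    : 0;
	flags |= (mnt_flags & MNT_NODIRATIME) ? ST_NODIRATIME : 0;
	flags |= (mnt_flags & MNT_RELATIME)   ? ST_RELATIME   : 0;
	return flags;
}

/* Both spellings must agree on every combination of the seven MNT_ bits. */
static void check_flags_by_mnt(void)
{
	for (int m = 0; m < 0x80; m++)
		assert(flags_by_mnt_if(m) == flags_by_mnt_ternary(m));
}
```

The rewrite is purely a spelling change, so the exhaustive check over all 128 inputs is the cheap way to confirm nothing was lost before comparing the generated code.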

Rasmus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-13 22:59   ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
@ 2017-11-13 23:30     ` Linus Torvalds
  2017-11-13 23:54       ` Linus Torvalds
  0 siblings, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2017-11-13 23:30 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Patrick McLean, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Mon, Nov 13, 2017 at 2:59 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>> Sadly, gcc makes a mess of it and actually generates code that looks
>> like the original C.[...]
>
> Actually, new enough gcc (7.1, I think) does contain a pattern that does
> this, but unfortunately only if one spells it
>
>   y |= (x & BIT) ? OTHER_BIT : 0;

Ahh, I should have recognized that; I think that's what we ended up
doing with the VM_READ -> PROT_READ translation in a few places,
exactly because gcc would then recognize it and do the much better
code generation.

> which is half-way to doing it by hand, I suppose.

Yeah, but it is at least acceptable, and the code is still legible C.
The alternatives of doing it _entirely_ by hand tend to be much worse
(i.e. you end up using a macro from hell that checks which of the two
bits is bigger and shifts in the right direction by using
multiplication or division).
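
A sketch of that by-hand alternative, assuming both masks are nonzero single-bit compile-time constants. BIT_XLATE is a hypothetical name, not an existing kernel macro. Because both masks are powers of two, their ratio is a power of two as well, so a compiler would be expected to fold the multiply or divide into a plain shift.

```c
#include <assert.h>

/*
 * Hypothetical helper: move the single bit `from` in `x` to the position
 * of the single bit `to`. Both masks must be nonzero powers of two;
 * nothing here checks that. Multiplying or dividing by the power-of-two
 * ratio is the "shift in the right direction" step.
 */
#define BIT_XLATE(x, from, to)                                  \
	((from) <= (to)                                         \
	 ? ((unsigned int)(x) & (from)) * ((to) / (from))       \
	 : ((unsigned int)(x) & (from)) / ((from) / (to)))

/* Usage: statfs-style translations, checked at compile time. */
static_assert(BIT_XLATE(0x40u, 0x40u, 0x0001u) == 0x0001u, "MNT_READONLY -> ST_RDONLY");
static_assert(BIT_XLATE(0x08u, 0x08u, 0x0400u) == 0x0400u, "MNT_NOATIME -> ST_NOATIME");
static_assert(BIT_XLATE(0x37u, 0x08u, 0x0400u) == 0u, "unset bit stays clear");
```

It works, but it illustrates the point above: compared with `y |= (x & BIT) ? OTHER_BIT : 0;` it is much harder to read.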

So let's just rewrite that mnt_flags conversion that way, just to get
gcc to generate the obvious code.

It's a bit sad how gcc didn't pick up on the original code, especially
since it had already done the much more complicated translation of
doing the if-conversion.

Thanks for pointing out the gcc pattern.

          Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-13 23:30     ` Linus Torvalds
@ 2017-11-13 23:54       ` Linus Torvalds
  2017-11-14 22:24         ` Rasmus Villemoes
  0 siblings, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2017-11-13 23:54 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Patrick McLean, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis

On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So let's just rewrite that mnt_flags conversion that way, just to get
> gcc to generate the obvious code.

Oh wow. I tried to do the same thing in fs/namespace.c where it does
the reverse bit translation, and gcc makes a _horrible_ mess of it and
actually makes the code much worse because for some reason the pattern
doesn't trigger.

So this gcc optimization is apparently pretty damn fragile in general.
It triggers for the trivial cases, but then other code around it can
confuse it badly.

So I don't think I'll touch this; it doesn't seem to reliably make
gcc generate what should be the obvious code.

               Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-13 23:54       ` Linus Torvalds
@ 2017-11-14 22:24         ` Rasmus Villemoes
  2017-11-14 22:43           ` Linus Torvalds
  0 siblings, 1 reply; 82+ messages in thread
From: Rasmus Villemoes @ 2017-11-14 22:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On 14 November 2017 at 00:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 13, 2017 at 3:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So let's just rewrite that mnt_flags conversion that way, just to get
>> gcc to generate the obvious code.
>
> Oh wow. I tried to do the same thing in fs/namespace.c where it does
> the reverse bit translation, and gcc makes a _horrible_ mess of it and
> actually makes the code much worse because for some reason the pattern
> doesn't trigger.

[trimming cc list]

Can you be more specific? That's not what I see with gcc 7.1. I've
found two blocks where I replaced the if's with the ternary
expressions, one in clone_mnt, one in do_mount. In clone_mnt, gcc-7.1
seems to do (first instruction is the unconditional |= MNT_LOCK_ATIME)

    10d7:       0d 00 00 04 00          or     $0x40000,%eax
    10dc:       89 43 30                mov    %eax,0x30(%rbx)
    10df:       a8 02                   test   $0x2,%al
    10e1:       74 08                   je     10eb <clone_mnt+0x9b>
    10e3:       0d 00 00 20 00          or     $0x200000,%eax
    10e8:       89 43 30                mov    %eax,0x30(%rbx)
    10eb:       a8 01                   test   $0x1,%al
    10ed:       74 08                   je     10f7 <clone_mnt+0xa7>
    10ef:       0d 00 00 10 00          or     $0x100000,%eax
    10f4:       89 43 30                mov    %eax,0x30(%rbx)

and after patching

    10cc:       0d 00 00 04 00          or     $0x40000,%eax
    10d1:       89 c2                   mov    %eax,%edx
    10d3:       c1 e2 10                shl    $0x10,%edx
    10d6:       81 e2 00 00 40 00       and    $0x400000,%edx
    10dc:       09 d0                   or     %edx,%eax
    10de:       89 c2                   mov    %eax,%edx
    10e0:       c1 e2 14                shl    $0x14,%edx
    10e3:       81 e2 00 00 20 00       and    $0x200000,%edx
    10e9:       09 c2                   or     %eax,%edx

(with a final store of %eax to 0x30(%rbx)). Either way it's four
instructions per flag, but I assume the one without the branches is
preferable.

Similarly, in do_mount, before we have

    3429:       44 89 f8                mov    %r15d,%eax
    342c:       83 c8 01                or     $0x1,%eax
    342f:       40 f6 c5 02             test   $0x2,%bpl
    3433:       44 0f 45 f8             cmovne %eax,%r15d
    3437:       44 89 f8                mov    %r15d,%eax
    343a:       83 c8 02                or     $0x2,%eax
    343d:       40 f6 c5 04             test   $0x4,%bpl
    3441:       44 0f 45 f8             cmovne %eax,%r15d

but after patching it does something like

    3425:       4d 89 fe                mov    %r15,%r14
    3428:       48 c1 ea 07             shr    $0x7,%rdx
    342c:       49 d1 ee                shr    %r14
    342f:       89 d0                   mov    %edx,%eax
    3431:       c1 e1 05                shl    $0x5,%ecx
    3434:       83 e0 08                and    $0x8,%eax
    3437:       41 83 e6 07             and    $0x7,%r14d
    343b:       41 09 c6                or     %eax,%r14d
    343e:       89 d0                   mov    %edx,%eax

which actually makes use of MS_{NOSUID,NODEV,NOEXEC} all being 1 bit
off from their MNT_ counterparts (witness the shr %r14 and the and
with 0x7), so there we also cut 37 bytes according to bloat-o-meter.
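
The one-bit offset can be checked with a small standalone sketch. The MS_ and MNT_ values are assumed from the headers of that era (they match the before/after disassembly above), and mnt_from_ms_if, mnt_from_ms_shift and check_mnt_from_ms are hypothetical names. The shift form is what the shr/and-0x7 sequence computes.

```c
#include <assert.h>

/* Assumed values: MS_* from the uapi mount flags, MNT_* from linux/mount.h. */
#define MS_NOSUID   2
#define MS_NODEV    4
#define MS_NOEXEC   8
#define MNT_NOSUID  0x01
#define MNT_NODEV   0x02
#define MNT_NOEXEC  0x04

/* One test per flag, as the do_mount() code spells it. */
static unsigned int mnt_from_ms_if(unsigned int ms)
{
	unsigned int mnt = 0;

	if (ms & MS_NOSUID)
		mnt |= MNT_NOSUID;
	if (ms & MS_NODEV)
		mnt |= MNT_NODEV;
	if (ms & MS_NOEXEC)
		mnt |= MNT_NOEXEC;
	return mnt;
}

/* Each MS_ bit sits exactly one position above its MNT_ counterpart,
 * so one shift plus one mask handles all three flags at once. */
static unsigned int mnt_from_ms_shift(unsigned int ms)
{
	return (ms >> 1) & (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC);
}

/* Exhaustive check over the low byte, which covers MS_RDONLY too. */
static void check_mnt_from_ms(void)
{
	for (unsigned int ms = 0; ms < 0x100; ms++)
		assert(mnt_from_ms_if(ms) == mnt_from_ms_shift(ms));
}
```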

Now, it does seem that older (and not that old in absolute terms)
compilers may generate worse code after the transformation - the
'replace with shift-mask-or' pattern doesn't exist until 7.1. E.g.
with 5.4 we have

$ scripts/bloat-o-meter fs/namespace.o.{0,1}-5.4
add/remove: 0/0 grow/shrink: 2/0 up/down: 50/0 (50)
function                                     old     new   delta
do_mount                                    3153    3195     +42
clone_mnt                                    768     776      +8

5.4 compiles the "if() ..." construction roughly as 7.1, i.e. with
cmovs, but the ternary expressions are transformed into this
abomination

    336f:       48 89 da                mov    %rbx,%rdx
    3372:       83 e2 04                and    $0x4,%edx
    3375:       48 83 fa 01             cmp    $0x1,%rdx
    3379:       19 d2                   sbb    %edx,%edx
    337b:       f7 d2                   not    %edx
    337d:       83 e2 02                and    $0x2,%edx
    3380:       09 d0                   or     %edx,%eax
    3382:       48 89 da                mov    %rbx,%rdx
    3385:       83 e2 08                and    $0x8,%edx
    3388:       48 83 fa 01             cmp    $0x1,%rdx
    338c:       19 d2                   sbb    %edx,%edx
    338e:       f7 d2                   not    %edx
    3390:       83 e2 04                and    $0x4,%edx
    3393:       09 d0                   or     %edx,%eax
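
How to read that sequence: `cmp $0x1` sets the carry flag exactly when the masked value is zero. `sbb` of a register with itself turns that carry into 0 or all-ones. `not` inverts it, and the final `and` keeps only the target bit. Each group is therefore a branchless `(x & from) ? to : 0`, just longer than a shift. The following C model of one group is for illustration only; select_bit_sbb and check_select_bit_sbb are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Step-by-step model of: and $from; cmp $0x1; sbb; not; and $to */
static uint32_t select_bit_sbb(uint32_t x, uint32_t from_bit, uint32_t to_bit)
{
	uint32_t v  = x & from_bit;   /* and  $from,%edx                     */
	uint32_t cf = (v < 1u);       /* cmp  $0x1,%rdx: carry iff v == 0    */
	uint32_t m  = 0u - cf;        /* sbb  %edx,%edx: 0 or 0xffffffff     */
	m = ~m;                       /* not  %edx: all-ones iff bit was set */
	return m & to_bit;            /* and  $to,%edx                       */
}

/* The model must match the plain ternary for every byte value. */
static void check_select_bit_sbb(void)
{
	for (uint32_t x = 0; x < 0x100; x++)
		assert(select_bit_sbb(x, 0x4, 0x2) == ((x & 0x4) ? 0x2u : 0u));
}
```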

Was it something like this you saw?

Rasmus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-14 22:24         ` Rasmus Villemoes
@ 2017-11-14 22:43           ` Linus Torvalds
  2017-11-14 23:53             ` Rasmus Villemoes
  0 siblings, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2017-11-14 22:43 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Linux Kernel Mailing List

On Tue, Nov 14, 2017 at 2:24 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> Can you be more specific? That's not what I see with gcc 7.1.

I have gcc-7.2.1, and it made a horrible mess of the do_mount() code.

Look for the comment "/* Separate the per-mountpoint flags */" and do
the obvious conversion of the simple single-bit stuff.

My gcc actually turned those into jumps _after_ I converted it
into the ternary operation, and then the ternary conversion actually
did much worse, because it actually had two sides (one with a zero
value, and one with the bit value to be set).

I didn't have time to look into _why_ that code generated
branch-overs, when the otherwise similar reverse case in fs/statfs.c
did not.

           Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-14 22:43           ` Linus Torvalds
@ 2017-11-14 23:53             ` Rasmus Villemoes
  2017-11-15  0:02               ` Linus Torvalds
  0 siblings, 1 reply; 82+ messages in thread
From: Rasmus Villemoes @ 2017-11-14 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On 14 November 2017 at 23:43, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 14, 2017 at 2:24 PM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>
>> Can you be more specific? That's not what I see with gcc 7.1.
>
> I have gcc-7.2.1, and it made a horrible mess of the do_mount() code.

Odd. 7.2 and 7.1 (both of which I've just compiled from source, no
special configure flags or anything) generate exactly the same (good)
code for fs/namespace.o after patching. I also tried with
CONFIG_CC_OPTIMIZE_FOR_SIZE=y just in case, but that still produces
reasonable code. Oh well.

Rasmus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11]
  2017-11-14 23:53             ` Rasmus Villemoes
@ 2017-11-15  0:02               ` Linus Torvalds
  0 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-15  0:02 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Linux Kernel Mailing List

On Tue, Nov 14, 2017 at 3:53 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> Odd. 7.2 and 7.1 (both of which I've just compiled from source, no
> special configure flags or anything) generate exactly the same (good)
> code for fs/namespace.o after patching. I also tried with
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y just in case, but that still produces
> reasonable code. Oh well.

It might be configuration-dependent. I usually end up having a couple
of different configurations depending on what I'm doing, one being a
fairly minimal one, and one being the "allmodconfig" I use mainly for
build testing.

           Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-13 22:48                         ` Patrick McLean
  (?)
@ 2017-11-17  0:54                           ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2017-11-17  0:54 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>> Boris Lukashev points out that Patrick should probably check a newer
>> version of gcc.
>>
>> I looked around, and in one of the emails, Patrick said:
>>
>>   "No changes, both the working and broken kernels were built with
>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>
>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>
>> It would probably be good to check if the problems persist with gcc
>> 6.x or 7.x. I have no idea which gcc version the randstruct people
>> tend to use themselves.
>
> I just tested it with gcc 7.2 and was able to reproduce the NULL
> pointer dereference; the backtrace looks slightly different this time.
>
> I will also test with binutils 2.29, though I doubt that will make any
> difference.
>
>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>> [   56.167249] PGD 0 P4D 0
>> [   56.167860] Oops: 0000 [#1] SMP
>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
>> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
>> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
>> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
>> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
>> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
>> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
>> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
>> [   56.193898] Call Trace:
>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>> [   56.198968]  svc_process_common+0x475/0x640
>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>> [   56.200404]  svc_process+0xf2/0x1a0
>> [   56.201079]  nfsd+0xe3/0x150
>> [   56.201706]  kthread+0x117/0x130
>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>> [   56.203100]  ret_from_fork+0x25/0x30
>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
>> [   56.207110] CR2: 0000000000000560
>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>
>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>>
>>> I'll take a closer look at this and see if I can provide something to
>>> narrow it down.

How reliable is this crash? The best idea I have to isolate it would
be to bisect the additions of the __randomize_layout markings on
various structures. I would start with the ones Al is most upset to
see randomized. ;)
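As background for bisecting the markings: a structure tagged __randomize_layout has its fields reordered by the randstruct gcc plugin, so field offsets depend on the build's seed. The C sketch below is illustrative only. The structure, its field names and the "shuffled" order are made up by hand, not produced by the plugin. It shows that the same field can land at a different offset under a different order. It also shows that a load through a NULL base pointer faults at an address equal to the field offset, which is one way to read the "NULL pointer dereference at 0000000000000560" lines in the traces above.

```c
#include <assert.h>  /* assert() for caller-side checks */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, fields in declared order. */
struct demo_declared {
	unsigned long flags;
	void *ops;
	char name[32];
	unsigned int count;
};

/* The same fields in one possible reordering, picked by hand for
 * illustration (not actual randstruct plugin output). */
struct demo_shuffled {
	char name[32];
	unsigned int count;
	void *ops;
	unsigned long flags;
};

/* Where 'ops' lives under each layout. Code that assumes one layout
 * but is handed an object laid out the other way reads the wrong
 * bytes when it loads 'ops'. */
size_t ops_offset_declared(void)
{
	return offsetof(struct demo_declared, ops);
}

size_t ops_offset_shuffled(void)
{
	return offsetof(struct demo_shuffled, ops);
}

/* Address a field load would touch if the base pointer were NULL:
 * simply the field's offset. Computed as an integer, never
 * dereferenced. */
uintptr_t fault_addr_if_null(size_t field_offset)
{
	uintptr_t base = (uintptr_t)0;  /* stands in for a NULL struct pointer */
	return base + field_offset;
}
```

For example, `assert(ops_offset_declared() != ops_offset_shuffled())` holds on common 32- and 64-bit ABIs, and `fault_addr_if_null(0x560)` returns 0x560, the same value the traces show in CR2.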

All that said, I'd like to understand the BIOS side of this a little
better. In the first email in this thread, you showed two BUGs
separated by a little time, which implies to me that the NULL deref
and the BIOS no longer POSTing are separate (though seemingly related)
issues. Have you had machines survive the BUG without blowing up the
BIOS?

I'm still trying to wrap my head around how the BIOS could be blowing
up. I assume there's some magic memory address that is getting poked
as a result of some struct randomization bug, so tracking that down
should be possible assuming you can stand reflashing your BIOS across
the bisects.

For the first step, I'd try a revert of
9225331b310821760f39ba55b00b8973602adbb5, which enables a large
portion of struct randomization. If that doesn't change things, I can
provide a series that reverts 3859a271a003aba01e45b85c9d8b355eb7bf25f9
and then re-applies __randomize_layout one structure per patch, and
you could bisect that?
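The one-structure-per-patch bisection amounts to a binary search for the first patch after which the crash reproduces. Below is a minimal C sketch. It assumes crashes are monotonic across the series: once the culprit structure is randomized, every later kernel in the series also crashes. It uses a hypothetical oracle callback in place of the real build, boot and NFS test cycle. `first_bad_patch`, `crash_oracle` and `simulated_crash` are illustrative names, not kernel or git APIs. The sketch also counts test kernels, since each step may cost a BIOS reflash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical oracle: true if a kernel built with patches
 * 0..applied (inclusive) of the series reproduces the crash. In real
 * life each call is a build, a boot and an NFS workload run. */
typedef bool (*crash_oracle)(size_t applied, void *ctx);

/* Binary search for the first patch whose addition makes the crash
 * reproduce, assuming the good kernels form a prefix of the series.
 * Returns n if no prefix crashes; *builds counts test kernels used. */
size_t first_bad_patch(size_t n, crash_oracle is_bad, void *ctx,
		       unsigned *builds)
{
	size_t lo = 0, hi = n;  /* the answer lies in [lo, hi] */

	assert(is_bad != NULL && builds != NULL);
	*builds = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		++*builds;
		if (is_bad(mid, ctx))
			hi = mid;      /* culprit is mid or earlier */
		else
			lo = mid + 1;  /* culprit comes after mid */
	}
	return lo;
}

/* Simulated oracle for trying the search without hardware: pretend
 * the patch at index *(size_t *)ctx is the culprit. */
bool simulated_crash(size_t applied, void *ctx)
{
	return applied >= *(const size_t *)ctx;
}
```

With a 100-patch series this needs at most 7 test kernels (about log2 of the series length), roughly what `git bisect` does over a linear series of commits.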

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-17  0:54                           ` Kees Cook
  (?)
@ 2017-11-17 19:03                             ` Patrick McLean
  -1 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-17 19:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On 2017-11-16 04:54 PM, Kees Cook wrote:
> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>> Boris Lukashev points out that Patrick should probably check a newer
>>> version of gcc.
>>>
>>> I looked around, and in one of the emails, Patrick said:
>>>
>>>   "No changes, both the working and broken kernels were built with
>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>
>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>
>>> It would probably be good to check if the problems persist with gcc
>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>> tend to use themselves.
>>
>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>> pointer dereference, the backtrace looks slightly different this time.
>>
>> I will also test with binutils 2.29, though I doubt that will make any
>> difference.
>>
>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>> [   56.167249] PGD 0 P4D 0
>>> [   56.167860] Oops: 0000 [#1] SMP
>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
>>> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
>>> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
>>> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
>>> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
>>> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
>>> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
>>> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
>>> [   56.193898] Call Trace:
>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>> [   56.198968]  svc_process_common+0x475/0x640
>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>> [   56.200404]  svc_process+0xf2/0x1a0
>>> [   56.201079]  nfsd+0xe3/0x150
>>> [   56.201706]  kthread+0x117/0x130
>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>> [   56.203100]  ret_from_fork+0x25/0x30
>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
>>> [   56.207110] CR2: 0000000000000560
>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>
>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>
>>>> I'll take a closer look at this and see if I can provide something to
>>>> narrow it down.
> 
> How reliable is this crash? The best idea I have to isolate it would
> be to bisect the additions of the __randomize_layout markings on
> various structures. I would start with the ones Al is most upset to
> see randomized. ;)

It's pretty reliable: once I get a bad seed, I can reproduce the crash
pretty quickly.
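The "bad seed" point follows from the layout being a deterministic function of a seed fixed at build time. Rebuilding with the same seed gives the same field order and, presumably, the same crash. Below is a minimal C sketch of that property. The xorshift64 generator and Fisher-Yates shuffle are generic stand-ins, not the randstruct plugin's actual algorithm, and `shuffle_fields` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift64: tiny deterministic PRNG; state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Fill order[0..n-1] with a permutation of field indices derived only
 * from 'seed' (Fisher-Yates shuffle). Same seed in, same layout out. */
void shuffle_fields(uint64_t seed, size_t *order, size_t n)
{
	uint64_t state = seed ? seed : 1;  /* avoid the all-zero state */

	assert(order != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		order[i] = i;
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)(xorshift64(&state) % i);  /* 0 <= j < i */
		size_t tmp = order[i - 1];

		order[i - 1] = order[j];
		order[j] = tmp;
	}
}
```

Calling `shuffle_fields(42, order, 8)` twice fills `order` with the same permutation both times; changing the seed generally yields a different one.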

> 
> All that said, I'd like to understand the BIOS side of this a little
> better. In the first email in this thread, you showed two BUGs
> separated by a little time, which implies to me that the NULL deref
> and the BIOS no longer POSTing are separate (though seemingly related)
> issues. Have you had machines survive the BUG without blowing up the
> BIOS?

We had 3 machines die due to the BIOS issue (all of them pretty quickly
with the bad-seed kernel). All the dead machines had the same
motherboard model. I have not managed to reproduce the issue again on
the machine I restored via the IPMI interface; I suspect it may be a
bug in the BIOS that was fixed in a more recent version.

> 
> I'm still trying to wrap my head around how the BIOS could be blowing
> up. I assume there's some magic memory address that is getting poked
> as a result of some struct randomization bug, so tracking that down
> should be possible assuming you can stand reflashing your BIOS across
> the bisects.

That is our theory: a write to some magic memory address caused an
overwrite of the flash where the BIOS code is stored. We are working
under the assumption that this was fixed in a more recent BIOS update,
since I have not managed to reproduce the issue on the resurrected
machine.

> For the first step, I'd try a revert of
> 9225331b310821760f39ba55b00b8973602adbb5, which enables a large
> portion of struct randomization. If that doesn't change things, I can
> provide a series that reverts 3859a271a003aba01e45b85c9d8b355eb7bf25f9
> and then re-applies __randomize_layout one structure per patch, and
> you could bisect that?

Sure, I can bisect that.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-17 19:03                             ` Patrick McLean
  (?)
@ 2017-11-17 21:26                               ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2017-11-17 21:26 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Fri, Nov 17, 2017 at 11:03 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-16 04:54 PM, Kees Cook wrote:
>> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>>> Boris Lukashev points out that Patrick should probably check a newer
>>>> version of gcc.
>>>>
>>>> I looked around, and in one of the emails, Patrick said:
>>>>
>>>>   "No changes, both the working and broken kernels were built with
>>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>>
>>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>>
>>>> It would probably be good to check if the problems persist with gcc
>>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>>> tend to use themselves.
>>>
>>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>>> pointer dereference, the backtrace looks slightly different this time.
>>>
>>> I will also test with binutils 2.29, though I doubt that will make any
>>> difference.
>>>
>>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
>>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>>> [   56.167249] PGD 0 P4D 0
>>>> [   56.167860] Oops: 0000 [#1] SMP
>>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
>>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
>>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>>> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
>>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>>> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
>>>> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
>>>> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
>>>> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
>>>> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
>>>> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
>>>> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
>>>> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
>>>> [   56.193898] Call Trace:
>>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>>> [   56.198968]  svc_process_common+0x475/0x640
>>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>>> [   56.200404]  svc_process+0xf2/0x1a0
>>>> [   56.201079]  nfsd+0xe3/0x150
>>>> [   56.201706]  kthread+0x117/0x130
>>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>>> [   56.203100]  ret_from_fork+0x25/0x30
>>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
>>>> [   56.207110] CR2: 0000000000000560
>>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>>
>>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>
>>>>> I'll take a closer look at this and see if I can provide something to
>>>>> narrow it down.
>>
>> How reliable is this crash? The best idea I have to isolate it would
>> be to bisect the additions of the __randomize_layout markings on
>> various structures. I would start with the ones Al is most upset to
>> see randomized. ;)
>
> It's pretty reliable, once I get a bad seed I can reproduce the crash
> pretty quickly.
>
>>
>> All that said, I'd like to better understand the BIOS side of this a
>> little better. In the first email in this thread, you showed two BUGs
>> separated by a little time, which implies to me that the NULL deref
>> and the BIOS no longer POSTing are separate (though seemingly related)
>> issues. Have you had machines survive the BUG without blowing up the
>> BIOS?
>
> We had 3 machines die due to the BIOS issue (all of them pretty quickly
> with the bad-seed kernel). All the dead machines had the same
> motherboard model. I have not managed to reproduce the issue again on
> the machine I restored via the IPMI interface, I suspect that it may be
> a bug in the BIOS that was fixed in a more recent version.
>
>>
>> I'm still trying to wrap my head around how the BIOS could be blowing
>> up. I assume there's some magic memory address that is getting poked
>> as a result of some struct randomization bug, so tracking that down
>> should be possible assuming you can stand reflashing your BIOS across
>> the bisects.
>
> That is our theory, some magic memory address that caused an overwrite
> of the flash where the BIOS code is stored. We are working under the
> assumption that it was fixed in a more recent BIOS update, since I have
> not managed to reproduce the issue on the resurrected machine.

Okay, well that's certainly better than having to reflash at every
bisection step! :)

>> For the first step, I'd try a revert of
>> 9225331b310821760f39ba55b00b8973602adbb5, which enables a large
>> portion of struct randomization. If that doesn't change things, I can
>> provide a series that reverts 3859a271a003aba01e45b85c9d8b355eb7bf25f9
>> and then re-applies __randomize_layout one structure per patch, and
>> you could bisect that?
>
> Sure, I can bisect that.

Okay, that should at least let us know if this is a specific struct
that is not expecting to get randomized, or if there is some deeper
flaw. Here's the tree, based on 4.14:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/randstruct/bisection

With commit d9e12200852d, all randomization selections are reverted. I
would expect this to be a "good" kernel for the bisect.

At the very end of the series (commit d893c17b3146), everything is back
to being randomized. I would expect this to be a "bad" kernel.

Each step between those two commits adds randomization to a single
struct (with the filesystem stuff near the front).

Here's hoping it'll be something obvious. :) Thanks for taking the
time to debug this!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-17 21:26                               ` Kees Cook
  (?)
@ 2017-11-18  0:27                                 ` Patrick McLean
  -1 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-18  0:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On 2017-11-17 01:26 PM, Kees Cook wrote:
> On Fri, Nov 17, 2017 at 11:03 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-16 04:54 PM, Kees Cook wrote:
>>> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>>>> Boris Lukashev points out that Patrick should probably check a newer
>>>>> version of gcc.
>>>>>
>>>>> I looked around, and in one of the emails, Patrick said:
>>>>>
>>>>>   "No changes, both the working and broken kernels were built with
>>>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>>>
>>>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>>>
>>>>> It would probably be good to check if the problems persist with gcc
>>>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>>>> tend to use themselves.
>>>>
>>>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>>>> pointer dereference, the backtrace looks slightly different this time.
>>>>
>>>> I will also test with binutils 2.29, though I doubt that will make any
>>>> difference.
>>>>
>>>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
>>>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>>>> [   56.167249] PGD 0 P4D 0
>>>>> [   56.167860] Oops: 0000 [#1] SMP
>>>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
>>>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
>>>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>>>> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
>>>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>>>> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
>>>>> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
>>>>> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
>>>>> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
>>>>> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
>>>>> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
>>>>> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
>>>>> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
>>>>> [   56.193898] Call Trace:
>>>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>>>> [   56.198968]  svc_process_common+0x475/0x640
>>>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>>>> [   56.200404]  svc_process+0xf2/0x1a0
>>>>> [   56.201079]  nfsd+0xe3/0x150
>>>>> [   56.201706]  kthread+0x117/0x130
>>>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>>>> [   56.203100]  ret_from_fork+0x25/0x30
>>>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
>>>>> [   56.207110] CR2: 0000000000000560
>>>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>>>
>>>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>>
>>>>>> I'll take a closer look at this and see if I can provide something to
>>>>>> narrow it down.
>>>
>>> How reliable is this crash? The best idea I have to isolate it would
>>> be to bisect the additions of the __randomize_layout markings on
>>> various structures. I would start with the ones Al is most upset to
>>> see randomized. ;)
>>
>> It's pretty reliable, once I get a bad seed I can reproduce the crash
>> pretty quickly.
>>
>>> For the first step, I'd try a revert of
>>> 9225331b310821760f39ba55b00b8973602adbb5, which enables a large
>>> portion of struct randomization. If that doesn't change things, I can
>>> provide a series that reverts 3859a271a003aba01e45b85c9d8b355eb7bf25f9
>>> and then re-applies __randomize_layout one structure per patch, and
>>> you could bisect that?
>>
>> Sure, I can bisect that.
> 
> Okay, that should at least let us know if this is a specific struct
> that is not expecting to get randomized, or if there is some deeper
> flaw. Here's the tree, based on 4.14:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/randstruct/bisection
> 
> With commit d9e12200852d, all randomization selections are reverted. I
> would expect this to be a "good" kernel for the bisect.

I am still getting the crash at d9e12200852d. I figured I would
double-check the "good" and "bad" kernels before starting a full bisect.

I guess it must be something somewhere else? I am happy to test or
bisect more patches.

Here is the BUG message for reference:
> [   56.495987] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
> [   56.497404] IP: vfs_statfs+0x7c/0xc0
> [   56.498092] PGD 0 P4D 0 
> [   56.498716] Oops: 0000 [#1] SMP
> [   56.499366] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal tpm_tis ipmi_ssif tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [   56.502653] CPU: 0 PID: 3975 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1-00061-gd893c17b3146 #3
> [   56.504071] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   56.504957] task: ffff88040cba7000 task.stack: ffffc90002c08000
> [   56.505843] RIP: 0010:vfs_statfs+0x7c/0xc0
> [   56.506571] RSP: 0018:ffffc90002c0bb28 EFLAGS: 00010202
> [   56.507383] RAX: 0000000000000000 RBX: ffffc90002c0bbf0 RCX: 0000000000000020
> [   56.508354] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
> [   56.509545] RBP: ffffc90002c0bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
> [   56.510622] R10: 000000000038be3a R11: ffff8804087563e8 R12: 0000000000000000
> [   56.511693] R13: ffff88040c68d000 R14: ffff88040c4df000 R15: ffffc90002c0bbf0
> [   56.512764] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> [   56.514216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.515199] CR2: 0000000000000560 CR3: 0000000001e0a005 CR4: 00000000001606f0
> [   56.516268] Call Trace:
> [   56.516903]  nfsd4_encode_fattr+0x201/0x1f90
> [   56.517686]  ? generic_permission+0x12c/0x1a0
> [   56.518467]  nfsd4_encode_getattr+0x25/0x30
> [   56.519220]  nfsd4_encode_operation+0x98/0x1b0
> [   56.519991]  nfsd4_proc_compound+0x2a0/0x5e0
> [   56.520758]  nfsd_dispatch+0xe8/0x220
> [   56.521476]  svc_process_common+0x475/0x640
> [   56.522221]  ? nfsd_destroy+0x60/0x60
> [   56.522923]  svc_process+0xf2/0x1a0
> [   56.523611]  nfsd+0xe3/0x150
> [   56.524241]  kthread+0x117/0x130
> [   56.524896]  ? kthread_create_on_node+0x40/0x40
> [   56.525630]  ret_from_fork+0x25/0x30
> [   56.526306] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce 00 10 00 00 83 e1 20 0f 45 d6 <48> 8b b7 60 05 00 00 bf 10 00 00 00 83 ca 20 89 f1 83 e1 10 0f
> [   56.528885] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c0bb28
> [   56.529772] CR2: 0000000000000560
> [   56.530464] ---[ end trace e6cf48f1f8c0ee4e ]---


> 
> The very end of the series (commit d893c17b3146), everything is back
> to being randomized. I would expect this to be a "bad" kernel.
> 
> Each step between those two commits adds randomization to a single
> struct (with the filesystem stuff near the front).
> 
> Here's hoping it'll be something obvious. :) Thanks for taking the
> time to debug this!

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
@ 2017-11-18  0:27                                 ` Patrick McLean
  0 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-18  0:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On 2017-11-17 01:26 PM, Kees Cook wrote:
> On Fri, Nov 17, 2017 at 11:03 AM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-16 04:54 PM, Kees Cook wrote:
>>> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>>>> Boris Lukashev points out that Patrick should probably check a newer
>>>>> version of gcc.
>>>>>
>>>>> I looked around, and in one of the emails, Patrick said:
>>>>>
>>>>>   "No changes, both the working and broken kernels were built with
>>>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>>>
>>>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>>>
>>>>> It would probably be good to check if the problems persist with gcc
>>>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>>>> tend to use themselves.
>>>>
>>>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>>>> pointer dereference, the backtrace looks slightly different this time.
>>>>
>>>> I will also test with binutils 2.29, though I doubt that will make any
>>>> difference.
>>>>
>>>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
>>>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>>>> [   56.167249] PGD 0 P4D 0
>>>>> [   56.167860] Oops: 0000 [#1] SMP
>>>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable>
>>>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1 #1
>>>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>>>> [   56.182729] task: ffff88040c412a00 task.stack: ffffc90002c18000
>>>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>>>> [   56.184341] RSP: 0018:ffffc90002c1bb28 EFLAGS: 00010202
>>>>> [   56.185143] RAX: 0000000000000000 RBX: ffffc90002c1bbf0 RCX: 0000000000000020
>>>>> [   56.186085] RDX: 0000000000001801 RSI: 0000000000001801 RDI: 0000000000000000
>>>>> [   56.187066] RBP: ffffc90002c1bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
>>>>> [   56.188268] R10: 000000000038be3a R11: ffff880408b18258 R12: 0000000000000000
>>>>> [   56.189336] R13: ffff88040c23ad00 R14: ffff88040b874000 R15: ffffc90002c1bbf0
>>>>> [   56.190444] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
>>>>> [   56.191876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [   56.192843] CR2: 0000000000000560 CR3: 0000000001e0a002 CR4: 00000000001606f0
>>>>> [   56.193898] Call Trace:
>>>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>>>> [   56.198968]  svc_process_common+0x475/0x640
>>>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>>>> [   56.200404]  svc_process+0xf2/0x1a0
>>>>> [   56.201079]  nfsd+0xe3/0x150
>>>>> [   56.201706]  kthread+0x117/0x130
>>>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>>>> [   56.203100]  ret_from_fork+0x25/0x30
>>>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c1bb28
>>>>> [   56.207110] CR2: 0000000000000560
>>>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>>>
>>>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>>>
>>>>>> I'll take a closer look at this and see if I can provide something to
>>>>>> narrow it down.
>>>
>>> How reliable is this crash? The best idea I have to isolate it would
>>> be to bisect the additions of the __randomize_layout markings on
>>> various structures. I would start with the ones Al is most upset to
>>> see randomized. ;)
>>
>> It's pretty reliable, once I get a bad seed I can reproduce the crash
>> pretty quickly.
>>
>>> For the first step, I'd try a revert of
>>> 9225331b310821760f39ba55b00b8973602adbb5, which enables a large
>>> portion of struct randomization. If that doesn't change things, I can
>>> provide a series that reverts 3859a271a003aba01e45b85c9d8b355eb7bf25f9
>>> and then re-applies __randomize_layout one structure per patch, and
>>> you could bisect that?
>>
>> Sure, I can bisect that.
> 
> Okay, that should at least let us know if this is a specific struct
> that is not expecting to get randomized, or if there is some deeper
> flaw. Here's the tree, based on 4.14:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/randstruct/bisection
> 
> With commit d9e12200852d, all randomization selections are reverted. I
> would expect this to be a "good" kernel for the bisect.

I am still getting the crash at d9e12200852d, I figured I would
double-check the "good" and "bad" kernels before starting a full bisect.

I guess it must be something somewhere else? I am happy to test or
bisect more patches.

Here is the BUG message for reference:
> [   56.495987] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
> [   56.497404] IP: vfs_statfs+0x7c/0xc0
> [   56.498092] PGD 0 P4D 0 
> [   56.498716] Oops: 0000 [#1] SMP
> [   56.499366] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal tpm_tis ipmi_ssif tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [   56.502653] CPU: 0 PID: 3975 Comm: nfsd Tainted: G           O    4.14.0-git-kratos-1-00061-gd893c17b3146 #3
> [   56.504071] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   56.504957] task: ffff88040cba7000 task.stack: ffffc90002c08000
> [   56.505843] RIP: 0010:vfs_statfs+0x7c/0xc0
> [   56.506571] RSP: 0018:ffffc90002c0bb28 EFLAGS: 00010202
> [   56.507383] RAX: 0000000000000000 RBX: ffffc90002c0bbf0 RCX: 0000000000000020
> [   56.508354] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
> [   56.509545] RBP: ffffc90002c0bbc0 R08: ffffffffffffff00 R09: 00000000000000ff
> [   56.510622] R10: 000000000038be3a R11: ffff8804087563e8 R12: 0000000000000000
> [   56.511693] R13: ffff88040c68d000 R14: ffff88040c4df000 R15: ffffc90002c0bbf0
> [   56.512764] FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> [   56.514216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.515199] CR2: 0000000000000560 CR3: 0000000001e0a005 CR4: 00000000001606f0
> [   56.516268] Call Trace:
> [   56.516903]  nfsd4_encode_fattr+0x201/0x1f90
> [   56.517686]  ? generic_permission+0x12c/0x1a0
> [   56.518467]  nfsd4_encode_getattr+0x25/0x30
> [   56.519220]  nfsd4_encode_operation+0x98/0x1b0
> [   56.519991]  nfsd4_proc_compound+0x2a0/0x5e0
> [   56.520758]  nfsd_dispatch+0xe8/0x220
> [   56.521476]  svc_process_common+0x475/0x640
> [   56.522221]  ? nfsd_destroy+0x60/0x60
> [   56.522923]  svc_process+0xf2/0x1a0
> [   56.523611]  nfsd+0xe3/0x150
> [   56.524241]  kthread+0x117/0x130
> [   56.524896]  ? kthread_create_on_node+0x40/0x40
> [   56.525630]  ret_from_fork+0x25/0x30
> [   56.526306] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce 00 10 00 00 83 e1 20 0f 45 d6 <48> 8b b7 60 05 00 00 bf 10 00 00 00 83 ca 20 89 f1 83 e1 10 0f
> [   56.528885] RIP: vfs_statfs+0x7c/0xc0 RSP: ffffc90002c0bb28
> [   56.529772] CR2: 0000000000000560
> [   56.530464] ---[ end trace e6cf48f1f8c0ee4e ]---


> 
> The very end of the series (commit d893c17b3146), everything is back
> to being randomized. I would expect this to be a "bad" kernel.
> 
> Each step between those two commits adds randomization to a single
> struct (with the filesystem stuff near the front).
> 
> Here's hoping it'll be something obvious. :) Thanks for taking the
> time to debug this!

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-18  0:27                                 ` Patrick McLean
  (?)
@ 2017-11-18  0:55                                   ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-18  0:55 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Kees Cook, Emese Revfy, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis, kernel-hardening

On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>
> I am still getting the crash at d9e12200852d, I figured I would
> double-check the "good" and "bad" kernels before starting a full bisect.

.. but without GCC_PLUGIN_RANDSTRUCT it's solid?

Kees removed even the baseline "randomize pure function pointer
structures", so at that commit, nothing should be randomized.

But maybe the plugin code itself ends up confusing gcc somehow?

Even when it doesn't actually do that "relayout_struct()" on the
structure, it always does those TYPE_ATTRIBUTES() games.

               Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-18  0:55                                   ` Linus Torvalds
  (?)
@ 2017-11-18  1:54                                     ` Patrick McLean
  -1 siblings, 0 replies; 82+ messages in thread
From: Patrick McLean @ 2017-11-18  1:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Emese Revfy, Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List, stable,
	Thorsten Leemhuis, kernel-hardening



On 2017-11-17 04:55 PM, Linus Torvalds wrote:
> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>
>> I am still getting the crash at d9e12200852d, I figured I would
>> double-check the "good" and "bad" kernels before starting a full bisect.
> 
> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?

Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.

> Kees removed even the baseline "randomize pure function pointer
> structures", so at that commit, nothing should be randomized.
> 
> But maybe the plugin code itself ends up confusing gcc somehow?
> 
> Even when it doesn't actually do that "relayout_struct()" on the
> structure, it always does those TYPE_ATTRIBUTES() games.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-18  1:54                                     ` Patrick McLean
  (?)
@ 2017-11-18  5:14                                       ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2017-11-18  5:14 UTC (permalink / raw)
  To: Patrick McLean
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Fri, Nov 17, 2017 at 5:54 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-17 04:55 PM, Linus Torvalds wrote:
>> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>
>>> I am still getting the crash at d9e12200852d, I figured I would
>>> double-check the "good" and "bad" kernels before starting a full bisect.
>>
>> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?
>
> Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.

That's strange. With d9e12200852d the shuffle_seed variables won't
ever actually get used. (i.e. I wouldn't expect the seed to change any
behavior.)

Can you confirm with something like this:


diff --git a/scripts/gcc-plugins/randomize_layout_plugin.c b/scripts/gcc-plugins/randomize_layout_plugin.c
index cdaac8c66734..aac570a57d7d 100644
--- a/scripts/gcc-plugins/randomize_layout_plugin.c
+++ b/scripts/gcc-plugins/randomize_layout_plugin.c
@@ -267,12 +267,10 @@ static void shuffle(const_tree type, tree *newtree, unsigned long length)

        structname = ORIG_TYPE_NAME(type);

-#ifdef __DEBUG_PLUGIN
        fprintf(stderr, "Shuffling struct %s %p\n", (const char *)structname, type);
 #ifdef __DEBUG_VERBOSE
        debug_tree((tree)type);
 #endif
-#endif

        for (i = 0; i < 4; i++) {
                seed[i] = shuffle_seed[i];


You should see no reports of "Shuffling struct ..."

And if it reports nothing, and you're on d9e12200852d, can you confirm
that switching to a "good" seed fixes it? (If it _does_, then I
suspect a build artifact being left behind or something odd like
that.)

>> Kees removed even the baseline "randomize pure function pointer
>> structures", so at that commit, nothing should be randomized.
>>
>> But maybe the plugin code itself ends up confusing gcc somehow?
>>
>> Even when it doesn't actually do that "relayout_struct()" on the
>> structure, it always does those TYPE_ATTRIBUTES() games.

FWIW, myself doing a build at d9e12200852d with and without
GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
where I did spot-checks.

Also, do you have any other plugins enabled? (Can you send your .config?)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-18  5:14                                       ` Kees Cook
  (?)
@ 2017-11-18  5:29                                         ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2017-11-18  5:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Patrick McLean, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Fri, Nov 17, 2017 at 9:14 PM, Kees Cook <keescook@chromium.org> wrote:
>
> FWIW, myself doing a build at d9e12200852d with and without
> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
> where I did spot-checks.

That would probably be a good thing to check anyway - check the
difference between GCC_PLUGIN_RANDSTRUCT on and off at that commit.

Just do

   objdump --disassemble vmlinux > file

and compare the two files for where the differences start occurring.

                 Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
  2017-11-18  5:29                                         ` Linus Torvalds
  (?)
@ 2017-11-18  8:20                                           ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2017-11-18  8:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Patrick McLean, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, stable, Thorsten Leemhuis,
	kernel-hardening

On Fri, Nov 17, 2017 at 9:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 17, 2017 at 9:14 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> FWIW, myself doing a build at d9e12200852d with and without
>> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
>> where I did spot-checks.
>
> That would probably be a good thing to check anyway - check the
> difference between GCC_PLUGIN_RANDSTRUCT on and off at that commit.
>
> Just do
>
>    objdump --disassemble vmlinux > file
>
> and compare the two files for where the differences start occurring.

Yeah, I was just doing that now. Looks like there _is_ something
getting changed just from having the plugin enabled, but it appears
localized. For me, the first non-offset change happens in
lookup_user_key and persists for a while.

-ffffffff813893a7:      0f 85 55 03 00 00       jne    ffffffff81389702 <lookup_user_key+0x3f2>
-ffffffff813893ad:      f0 41 ff 06             lock incl (%r14)
-ffffffff813893b1:      83 fb 07                cmp    $0x7,%ebx
-ffffffff813893b4:      4c 89 b5 70 ff ff ff    mov    %r14,-0x90(%rbp)
...
+ffffffff813893a7:      0f 85 35 03 00 00       jne    ffffffff813896e2 <lookup_user_key+0x3d2>
+ffffffff813893ad:      4d 89 f0                mov    %r14,%r8
+ffffffff813893b0:      f0 41 ff 06             lock incl (%r14)
+ffffffff813893b4:      83 fb 07                cmp    $0x7,%ebx
+ffffffff813893b7:      4c 89 b5 70 ff ff ff    mov    %r14,-0x90(%rbp)

And removing the TYPE_ATTRIBUTES() poking makes the register storage
differences go away, but there's still a 0x40 byte offset delta.

I'll continue looking at this tomorrow.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2017-11-18  5:14                                       ` Kees Cook
                                                         ` (2 preceding siblings ...)
  (?)
@ 2018-02-21 22:19                                       ` Maciej S. Szmigiero
  2018-02-21 22:47                                           ` Linus Torvalds
  2018-02-21 22:52                                         ` Kees Cook
  -1 siblings, 2 replies; 82+ messages in thread
From: Maciej S. Szmigiero @ 2018-02-21 22:19 UTC (permalink / raw)
  To: Kees Cook, Patrick McLean
  Cc: Linus Torvalds, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On 18.11.2017 06:14, Kees Cook wrote:
> On Fri, Nov 17, 2017 at 5:54 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-17 04:55 PM, Linus Torvalds wrote:
>>> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>>
>>>> I am still getting the crash at d9e12200852d, I figured I would
>>>> double-check the "good" and "bad" kernels before starting a full bisect.
>>>
>>> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?
>>
>> Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.
> 
> That's strange. With d9e12200852d the shuffle_seed variables won't
> ever actually get used. (i.e. I wouldn't expect the seed to change any
> behavior.)
> 
> Can you confirm with something like this:
> 
> 
>         for (i = 0; i < 4; i++) {
>                 seed[i] = shuffle_seed[i];
> 
> 
> You should see no reports of "Shuffling struct ..."
> 
> And if it reports nothing, and you're on d9e12200852d, can you confirm
> that switching to a "good" seed fixes it? (If it _does_, then I
> suspect a build artifact being left behind or something odd like
> that.)
> 
>>> Kees removed even the baseline "randomize pure function pointer
>>> structures", so at that commit, nothing should be randomized.
>>>
>>> But maybe the plugin code itself ends up confusing gcc somehow?
>>>
>>> Even when it doesn't actually do that "relayout_struct()" on the
>>> structure, it always does those TYPE_ATTRIBUTES() games.
> 
> FWIW, myself doing a build at d9e12200852d with and without
> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
> where I did spot-checks.
> 

I have also hit a GPF in nfsd4_encode_fattr() with RANDSTRUCT plugin
enabled.
This function is located in a fs/nfsd/nfs4xdr.c file.

The fault happened at "xdr_encode_hyper(p, exp->ex_path.mnt->mnt_sb->s_maxbytes)"
line, namely when accessing s_maxbytes.

exp->ex_path is of type struct path that has been annotated with
__randomize_layout.
It seems to me that this annotation isn't really taken into consideration
when compiling nfs4xdr.c.
This most likely results in dereferencing a value of exp->ex_path.dentry
instead of exp->ex_path.mnt. Then some member of struct dentry is
dereferenced as struct super_block to access its s_maxbytes member which
results in an oops if it happens to be an invalid pointer (which it was
in my case).
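To make this failure mode concrete, here is a minimal C sketch. It assumes, purely for illustration, a shuffled layout in which `dentry` comes first. The two struct names and `misread_mnt()` are hypothetical, and `void *` members stand in for `struct vfsmount *` and `struct dentry *`. The sketch shows that code using the declaration-order layout, reading `->mnt` from an object laid out the other way, actually gets the `dentry` pointer.

```c
#include <assert.h>
#include <string.h>

/* Layout used where __randomize_layout took effect
 * (hypothetical shuffle: dentry first). */
struct path_shuffled {
	void *dentry;	/* stand-in for struct dentry * */
	void *mnt;	/* stand-in for struct vfsmount * */
};

/* Layout used where the annotation was silently dropped:
 * plain declaration order. */
struct path_declared {
	void *mnt;
	void *dentry;
};

_Static_assert(sizeof(struct path_shuffled) == sizeof(struct path_declared),
	       "both views must cover the same bytes");

/*
 * Reinterpret an object built with the shuffled layout through the
 * declaration-order view, as the miscompiled caller effectively did,
 * and return what that caller believes is ->mnt.
 */
void *misread_mnt(const struct path_shuffled *p)
{
	struct path_declared view;

	memcpy(&view, p, sizeof(view));
	return view.mnt;
}
```

Given `int m, d; struct path_shuffled p = { .dentry = &d, .mnt = &m };`, the check `assert(misread_mnt(&p) == &d);` holds. In the real oops, that wrong pointer was then treated as a `struct vfsmount *`, so `->mnt_sb` picked up an unrelated `struct dentry` member and `->s_maxbytes` dereferenced it.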

How to reproduce the problem statically (tested on Linus's current tree
and on 4.15.4, with gcc 7.3.0):
1) Enable RANDSTRUCT plugin,

2) Use a RANDSTRUCT seed that results in shuffling of struct path,
Example: "55e5fea7ff662b333a190209ab31f35b6f0f2470f7d0e3c64430936169571106".

3) make fs/nfsd/nfs4xdr.s and save the result,

4) Insert "#include <linux/compiler_types.h>" at the top of
fs/nfsd/nfs4xdr.c as the very first include directive.

5) make fs/nfsd/nfs4xdr.s and compare the result with the one from step 3.

One can see that offsets used to access various members of struct path are
different, and also that the original file from step 3 contains an object
named "__randomize_layout".

This is caused by the fact that the current version of nfs4xdr.c includes
linux/fs_struct.h as its very first header, which in turn includes
linux/path.h as its very first header; linux/path.h then defines
struct path without including any files of its own.

This results in __randomize_layout tag at the end of struct path
definition being treated as a variable name (since linux/compiler-gcc.h
that defines it as a type attribute has not been included yet).
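The parsing trap itself fits in a few lines of C. This sketch assumes a plain C11 compiler and simply never defines the `__randomize_layout` macro, mimicking linux/path.h being parsed before linux/compiler_types.h. `struct path` here is a simplified stand-in, and `accidental_object_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* No #define for __randomize_layout is in scope at this point. */
struct path {
	void *mnt;	/* stand-in for struct vfsmount * */
	void *dentry;	/* stand-in for struct dentry * */
} __randomize_layout;
/*
 * The trailing identifier is parsed as a declarator, so the definition
 * above also defines a file-scope object named __randomize_layout of
 * type struct path. No attribute is applied, the layout stays in
 * declaration order, and compilers typically accept this without any
 * diagnostic.
 */

/* Size of the accidental object; it exists, so this compiles. */
size_t accidental_object_size(void)
{
	return sizeof(__randomize_layout);
}
```

Here `assert(accidental_object_size() == sizeof(struct path));` holds, and `offsetof(struct path, mnt)` is 0, i.e. the unshuffled view that nfs4xdr.c ended up with.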

It looks to me like every header file that defines a randomized
struct also has to include linux/compiler_types.h, or some other file
that ultimately pulls it in, in order to make the RANDSTRUCT plugin
work correctly.

Maciej

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 22:19                                       ` RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11) Maciej S. Szmigiero
@ 2018-02-21 22:47                                           ` Linus Torvalds
  2018-02-21 22:52                                         ` Kees Cook
  1 sibling, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2018-02-21 22:47 UTC (permalink / raw)
  To: Maciej S. Szmigiero
  Cc: Kees Cook, Patrick McLean, Emese Revfy, Al Viro, Bruce Fields,
	Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 2:19 PM, Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
>
> One can see that offsets used to access various members of struct path are
> different, and also that the original file from step 3 contains an object
> named "__randomize_layout".

Whee.

Thanks for root-causing this issue, and this syntax of ours is clearly
*much* too fragile.

We actually have similar issues with some of our other attributes,
where our nice "helpful" attribute shorthand can end up being just
silently interpreted as a variable name if they aren't defined in
time.

For most of our other attributes, it just doesn't matter all that much
if some user doesn't happen to see the attribute. For
__randomize_layout, it's obviously very fatal, and silently just
generates crazy code.

I'm not entirely sure what the right solution is, because it's
obviously much too easy to miss some #include by mistake. It's easy to
say "you should always include the proper header", but if a failure to
do so doesn't end up with any warnings or errors, but just silent bad
code generation, it's much too fragile.

I wonder if we could change the syntax of that "__randomize_layout"
thing. Some of our related helper macros (ie
randomized_struct_fields_start/end) don't have the same problem,
because if you don't have the define for them, the compiler will
complain about bad syntax.

And other attribute specifiers we encourage people to put in other
parts of the type, like __user etc, so they don't have that same
parsing issue.

I guess one _extreme_ fix for this would be to put

    extern struct nostruct __randomize_layout;

in our include/linux/kconfig.h, which I think we end up always
including first thanks to having it on the command line.

Because if you do that, you actually get an error:

    CC [M]  fs/nfsd/nfs4xdr.o
  In file included from ./include/linux/fs_struct.h:5:0,
                   from fs/nfsd/nfs4xdr.c:36:
  ./include/linux/path.h:11:3: error: conflicting types for ‘__randomize_layout’
   } __randomize_layout;
     ^~~~~~~~~~~~~~~~~~
  In file included from <command-line>:0:0:
  ././include/linux/kconfig.h:8:28: note: previous declaration of ‘__randomize_layout’ was here
       extern struct nostruct __randomize_layout;
                              ^~~~~~~~~~~~~~~~~~
  make[1]: *** [scripts/Makefile.build:317: fs/nfsd/nfs4xdr.o] Error 1

and we would have figured this out immediately.
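A minimal sketch of why such a guard works, assuming a plain C11 compiler. The three steps stand in for include/linux/kconfig.h, compiler_types.h and linux/path.h respectively. The empty `#define` assumes RANDSTRUCT is disabled, and `path_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Step 1 (stand-in for the proposed kconfig.h hunk, seen first): claim
 * the name as an object of an incomplete struct type. Declaring, but
 * never defining or using, such an object is valid C.
 */
extern struct nostruct __randomize_layout;

/*
 * Step 2 (stand-in for compiler_types.h): from here on the name is a
 * macro. This sketch assumes RANDSTRUCT is off, so it expands to nothing.
 */
#define __randomize_layout

/*
 * Step 3 (stand-in for linux/path.h): the macro is now defined, so the
 * trailing annotation vanishes and this is an ordinary struct definition.
 */
struct path {
	void *mnt;	/* stand-in for struct vfsmount * */
	void *dentry;	/* stand-in for struct dentry * */
} __randomize_layout;

size_t path_size(void)
{
	return sizeof(struct path);
}
```

Here `assert(path_size() == 2 * sizeof(void *));` holds on common ABIs. Swapping steps 2 and 3 turns the struct definition back into a declaration of an object named `__randomize_layout`, this time of type `struct path`. That clashes with the earlier `struct nostruct` declaration and produces the "conflicting types" error shown above.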

Broken example patch appended, in case somebody wants to play with
something like this or comes up with a better model entirely..

               Linus

---

diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index fec5076eda91..537dacb83380 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -4,6 +4,10 @@

 #include <generated/autoconf.h>

+#ifndef __ASSEMBLY__
+ extern struct nostruct __randomize_layout;
+#endif
+
 #define __ARG_PLACEHOLDER_1 0,
 #define __take_second_arg(__ignored, val, ...) val

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 22:19                                       ` RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11) Maciej S. Szmigiero
  2018-02-21 22:47                                           ` Linus Torvalds
@ 2018-02-21 22:52                                         ` Kees Cook
  2018-02-21 23:24                                           ` Linus Torvalds
  1 sibling, 1 reply; 82+ messages in thread
From: Kees Cook @ 2018-02-21 22:52 UTC (permalink / raw)
  To: Maciej S. Szmigiero
  Cc: Patrick McLean, Linus Torvalds, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 2:19 PM, Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
> One can see that offsets used to access various members of struct path are
> different, and also that the original file from step 3 contains an object
> named "__randomize_layout".
>
> This is caused by a fact that the current version of nfs4xdr.c includes
> linux/fs_struct.h as the very first included header which then includes
> linux/path.h as the very first included header, which then defines
> struct path, but without including any files on its own.
>
> This results in __randomize_layout tag at the end of struct path
> definition being treated as a variable name (since linux/compiler-gcc.h
> that defines it as a type attribute has not been included yet).

Oh, well done! That would explain the code offset I was seeing when
the plugin was on but a no-op, since the variable would still exist.

I'll play with Linus's suggestion and see what we get.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 22:52                                         ` Kees Cook
@ 2018-02-21 23:24                                           ` Linus Torvalds
  2018-02-22  0:12                                             ` Kees Cook
  0 siblings, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2018-02-21 23:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 2:52 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I'll play with Linus's suggestion and see what we get.

It may be just as well to just include <linux/compiler_types.h> from
<linux/kconfig.h> and be done with it.

If you look at that hacky script I documented in commit 23c35f48f5fb
("pinctrl: remove include file from <linux/device.h>") and run it in a
fully built kernel tree, you'll see that that header is included from
pretty much every single file anyway. At least for me, for an
allmodconfig build, the top headers are

  23322 arch/x86/include/uapi/asm/types.h
  23322 include/asm-generic/int-ll64.h
  23322 include/linux/types.h
  23322 include/uapi/asm-generic/int-ll64.h
  23322 include/uapi/asm-generic/types.h
  23322 include/uapi/linux/types.h
  23323 arch/x86/include/uapi/asm/bitsperlong.h
  23323 include/asm-generic/bitsperlong.h
  23323 include/uapi/asm-generic/bitsperlong.h
  23326 include/linux/stringify.h
  23390 include/linux/compiler_types.h

and considering that I have 25949 object files in that tree, it really
means that just about every compile ended up including that
<linux/compiler_types.h> file anyway (yeah, the "orc_types.h" header
ends up being mentioned twice for most files, so it looks even more
hot, but that's not real data).

I do hate including unnecessary stuff because it makes builds slower,
but kernel header files probably don't get much more core than
<linux/compiler_types.h>.

              Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 22:47                                           ` Linus Torvalds
@ 2018-02-21 23:34                                             ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2018-02-21 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 2:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> And other attribute specifiers we encourage people to put in other
> parts of the type, like __user etc, so they don't have that same
> parsing issue.

Looking at other attributes we use on structs, we may have similar
risks for these:

__packed
____cacheline_aligned
____cacheline_aligned_in_smp
____cacheline_internodealigned_in_smp

But they just haven't been used in places that we could trip over it
as badly, AFAICT.

> I guess one _extreme_ fix for this would be to put
>
>     extern struct nostruct __randomize_layout;
>
> in our include/linux/kconfig.h, which I think we end up always
> including first thanks to having it on the command line.

We could do that for all the above, but I wonder if the real problem
is our convention of using "regular" names for these kinds of
attributes instead of parameterized names. If we always used something
like:

#define __struct(x)   __attribute__((x))

We'd avoid it, but we'd uglify our struct attributes:

struct thing { ... } __struct(randomize_layout);

though trying this now creates other problems. Hmmm.
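Here is a sketch of the parameterized convention, assuming GCC or Clang. GCC's attribute syntax needs double parentheses, so the helper is written as `__attribute__((x))`. The example uses `packed` instead of `randomize_layout` because `packed` works without the plugin. `thing_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Parameterized struct-attribute helper (GCC/Clang attribute syntax). */
#define __struct(x)	__attribute__((x))

/* Without packing, int alignment would normally add padding after c. */
struct thing {
	char c;
	int i;
} __struct(packed);

size_t thing_size(void)
{
	return sizeof(struct thing);
}
```

Here `assert(thing_size() == sizeof(char) + sizeof(int));` holds because the attribute was applied. The sketch only shows the case where the macro is defined. It does not by itself guarantee a diagnostic when the definition is missing.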

(Regardless, let me send the nfs fix separately...)

-Kees

>
> Because if you do that, you actually get an error:
>
>     CC [M]  fs/nfsd/nfs4xdr.o
>   In file included from ./include/linux/fs_struct.h:5:0,
>                    from fs/nfsd/nfs4xdr.c:36:
>   ./include/linux/path.h:11:3: error: conflicting types for ‘__randomize_layout’
>    } __randomize_layout;
>      ^~~~~~~~~~~~~~~~~~
>   In file included from <command-line>:0:0:
>   ././include/linux/kconfig.h:8:28: note: previous declaration of ‘__randomize_layout’ was here
>        extern struct nostruct __randomize_layout;
>                               ^~~~~~~~~~~~~~~~~~
>   make[1]: *** [scripts/Makefile.build:317: fs/nfsd/nfs4xdr.o] Error 1
>
> and we would have figured this out immediately.
>
> Broken example patch appended, in case somebody wants to play with
> something like this or comes up with a better model entirely..
>
>                Linus
>
> ---
>
> diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
> index fec5076eda91..537dacb83380 100644
> --- a/include/linux/kconfig.h
> +++ b/include/linux/kconfig.h
> @@ -4,6 +4,10 @@
>
>  #include <generated/autoconf.h>
>
> +#ifndef __ASSEMBLY__
> + extern struct nostruct __randomize_layout;
> +#endif
> +
>  #define __ARG_PLACEHOLDER_1 0,
>  #define __take_second_arg(__ignored, val, ...) val



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 23:24                                           ` Linus Torvalds
@ 2018-02-22  0:12                                             ` Kees Cook
  2018-02-22  0:22                                               ` Linus Torvalds
  0 siblings, 1 reply; 82+ messages in thread
From: Kees Cook @ 2018-02-22  0:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 3:24 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Feb 21, 2018 at 2:52 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I'll play with Linus's suggestion and see what we get.
>
> It may be just as well to just include <linux/compiler_types.h> from
> <linux/kconfig.h> and be done with it.

Hah, yeah, that would certainly solve it too. :)

> I do hate including unnecessary stuff because it makes builds slower,
> but kernel header files probably don't get much more core than
> <linux/compiler_types.h>.

It also has the benefit of not letting it "go wrong" in the first
place. (And the separate fix for nfs isn't needed...)

Do you want me to send the patch for this, or do you already have it
prepared? The body-fields I had prepared for the nfs were:

Reported-by: Patrick McLean <chutzpah@gentoo.org>
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-22  0:12                                             ` Kees Cook
@ 2018-02-22  0:22                                               ` Linus Torvalds
  2018-02-22  0:23                                                 ` Kees Cook
  0 siblings, 1 reply; 82+ messages in thread
From: Linus Torvalds @ 2018-02-22  0:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>
> Do you want me to send the patch for this, or do you already have it
> prepared?

I'd rather get something explicitly tested. I tried my earlier patch
with "make allmodconfig" (and a fix to nfsd to make it compile), but
now I'm back to testing hjl's gas updates so it would be better to get
a tested commit with a good commit message.

> The body-fields I had prepared for the nfs were:
>
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>

Oh, I think Maciej needs to get more than a "Reported-by:". This was a
really subtle thing that we didn't figure out in the original thread,
so give him a gold star in the form of "Root-caused-by:" or something.

*Fixing* this ends up being a one-liner or so. Finding the cause was
the painful part.

               Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-22  0:22                                               ` Linus Torvalds
@ 2018-02-22  0:23                                                 ` Kees Cook
  2018-02-22  0:27                                                   ` Kees Cook
  0 siblings, 1 reply; 82+ messages in thread
From: Kees Cook @ 2018-02-22  0:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 4:22 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> Do you want me to send the patch for this, or do you already have it
>> prepared?
>
> I'd rather get something explicitly tested. I tried my earlier patch
> with "make allmodconfig" (and a fix to nfsd to make it compile), but
> now I'm back to testing hjl's gas updates so it would be better to get
> a tested commit with a good commit message.
>
>> The body-fields I had prepared for the nfs were:
>>
>> Reported-by: Patrick McLean <chutzpah@gentoo.org>
>> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>
> Oh, I think Maciej needs to get more than a "Reported-by:". This was a
> really subtle thing that we didn't figure out in the original thread,
> so give him a gold star in the form of "Root-caused-by:" or something.

Oops, I just sent this out. I will adjust a re-send. I couldn't find a
documented field name for this...

> *Fixing* this ends up being a one-liner or so. Finding the cause was
> the painful part.

Yes indeed!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-22  0:23                                                 ` Kees Cook
@ 2018-02-22  0:27                                                   ` Kees Cook
  0 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2018-02-22  0:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej S. Szmigiero, Patrick McLean, Emese Revfy, Al Viro,
	Bruce Fields, Darrick J. Wong, Linux Kernel Mailing List,
	Linux NFS Mailing List, Thorsten Leemhuis, kernel-hardening

On Wed, Feb 21, 2018 at 4:23 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Feb 21, 2018 at 4:22 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>>>
>>> Do you want me to send the patch for this, or do you already have it
>>> prepared?
>>
>> I'd rather get something explicitly tested. I tried my earlier patch
>> with "make allmodconfig" (and a fix to nfsd to make it compile), but
>> now I'm back to testing hjl's gas updates so it would be better to get
>> a tested commit with a good commit message.
>>
>>> The body-fields I had prepared for the nfs were:
>>>
>>> Reported-by: Patrick McLean <chutzpah@gentoo.org>
>>> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>
>> Oh, I think Maciej needs to get more than a "Reported-by:". This was a
>> really subtle thing that we didn't figure out in the original thread,
>> so give him a gold star in the form of "Root-caused-by:" or something.
>
> Oops, I just sent this out. I will adjust a re-send. I couldn't find a
> documented field name for this...

With the "root-cause" hint, I see we have used:

2    Root-cause-analysis-by:
2    Root-caused-by:
1    Root-cause-found-by:

I'll go with your "Root-caused-by" to tip the scale. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-02-21 22:47                                           ` Linus Torvalds
@ 2018-03-05  9:27                                             ` Masahiro Yamada
  -1 siblings, 0 replies; 82+ messages in thread
From: Masahiro Yamada @ 2018-03-05  9:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej S. Szmigiero, Kees Cook, Patrick McLean, Emese Revfy,
	Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List,
	Thorsten Leemhuis, kernel-hardening

Hi Linus,

2018-02-22 7:47 GMT+09:00 Linus Torvalds <torvalds@linux-foundation.org>:
> On Wed, Feb 21, 2018 at 2:19 PM, Maciej S. Szmigiero
> <mail@maciej.szmigiero.name> wrote:
>>
>> One can see that offsets used to access various members of struct path are
>> different, and also that the original file from step 3 contains an object
>> named "__randomize_layout".
>
> Whee.
>
> Thanks for root-causing this issue, and this syntax of ours is clearly
> *much* too fragile.
>
> We actually have similar issues with some of our other attributes,
> where our nice "helpful" attribute shorthand can end up being just
> silently interpreted as a variable name if they aren't defined in
> time.
>
> For most of our other attributes, it just doesn't matter all that much
> if some user doesn't happen to see the attribute. For
> __randomize_layout, it's obviously very fatal, and silently just
> generates crazy code.
>
> I'm not entirely sure what the right solution is, because it's
> obviously much too easy to miss some #include by mistake. It's easy to
> say "you should always include the proper header", but if a failure to
> do so doesn't end up with any warnings or errors, but just silent bad
> code generation, it's much too fragile.
>
> I wonder if we could change the syntax of that "__randomize_layout"
> thing. Some of our related helper macros (ie
> randomized_struct_fields_start/end) don't have the same problem,
> because if you don't have the define for them, the compiler will
> complain about bad syntax.
>
> And other attribute specifiers we encourage people to put in other
> parts of the type, like __user etc, so they don't have that same
> parsing issue.
>
> I guess one _extreme_ fix for this would be to put
>
>     extern struct nostruct __randomize_layout;
>
> in our include/linux/kconfig.h, which I think we end up always
> including first thanks to having it on the command line.
>
> Because if you do that, you actually get an error:
>
>     CC [M]  fs/nfsd/nfs4xdr.o
>   In file included from ./include/linux/fs_struct.h:5:0,
>                    from fs/nfsd/nfs4xdr.c:36:
>   ./include/linux/path.h:11:3: error: conflicting types for ‘__randomize_layout’
>    } __randomize_layout;
>      ^~~~~~~~~~~~~~~~~~
>   In file included from <command-line>:0:0:
>   ././include/linux/kconfig.h:8:28: note: previous declaration of ‘__randomize_layout’ was here
>        extern struct nostruct __randomize_layout;
>                               ^~~~~~~~~~~~~~~~~~
>   make[1]: *** [scripts/Makefile.build:317: fs/nfsd/nfs4xdr.o] Error 1
>
> and we would have figured this out immediately.
>
> Broken example patch appended, in case somebody wants to play with
> something like this or comes up with a better model entirely..
>
>                Linus
>
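A minimal userspace sketch of the silent failure described above, assuming a file that never saw any definition of __randomize_layout (members simplified to void *). It typically compiles without any diagnostic, because the trailing token is parsed as an object name rather than an attribute:

```c
#include <stddef.h>

/* Simplified include/linux/path.h as seen by a file that never included
 * the header defining __randomize_layout. This is a valid declaration of
 * a global object named __randomize_layout with type struct path. */
struct path {
	void *mnt;
	void *dentry;
} __randomize_layout;

/* The "attribute" really is an object: its size is that of struct path. */
size_t marker_object_size(void)
{
	return sizeof(__randomize_layout);
}
```

This matches Maciej's observation that the object file contained something named __randomize_layout, while files that did see the definition used different member offsets.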


Sorry for chiming in late.

I noticed this thread today,
honestly, the commit made me upset.


Can I suggest another way to make it less fragile?
__attribute((...)) can be placed after 'struct'.


So, we can write:


struct __randomize_layout path {
        struct vfsmount *mnt;
        struct dentry *dentry;
};


  instead of


struct path {
        struct vfsmount *mnt;
        struct dentry *dentry;
} __randomize_layout;



If we force the former notation,
the undefined __randomize_layout results in a build error
instead of silent broken code generation.
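A sketch of the placement proposed here. Since the randstruct plugin is assumed unavailable, it uses a hypothetical marker, __demo_marker, mapped to GCC's aligned attribute so that the effect of the placement can be observed:

```c
#include <stddef.h>

/* Hypothetical stand-in marker; the kernel's __randomize_layout is a
 * different attribute. aligned(16) is used only because it is visible. */
#define __demo_marker __attribute__((aligned(16)))

/* Marker between "struct" and the tag. If __demo_marker were undefined,
 * "struct __demo_marker path {" would be a syntax error instead of a
 * silently declared object. */
struct __demo_marker path {
	void *mnt;
	void *dentry;
};

size_t path_alignment(void)
{
	return _Alignof(struct path);
}
```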


It is true somebody can still place
__randomize_layout after the closing brace,
but can we check this by coccicheck or checkpatch.pl?
(we can describe it in coding style documentation, of course)
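A rough sketch of the kind of check being asked about. This is a hypothetical C helper, has_trailing_marker(), not checkpatch.pl or a Coccinelle rule, and it only inspects single lines, so a marker split from its closing brace across lines would be missed:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical lint helper: returns 1 if `line` contains a '}' followed,
 * after optional whitespace, by `marker` as a whole identifier. */
int has_trailing_marker(const char *line, const char *marker)
{
	size_t n = strlen(marker);

	for (const char *p = strchr(line, '}'); p; p = strchr(p + 1, '}')) {
		const char *q = p + 1;

		while (isspace((unsigned char)*q))
			q++;
		/* Require an exact identifier match, not a longer name. */
		if (strncmp(q, marker, n) == 0 &&
		    !(isalnum((unsigned char)q[n]) || q[n] == '_'))
			return 1;
	}
	return 0;
}
```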


IMHO, we should not (ab)use include/linux/kconfig.h
to bring in misc things.


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-03-05  9:27                                             ` Masahiro Yamada
@ 2018-03-05 19:15                                             ` Kees Cook
  -1 siblings, 0 replies; 82+ messages in thread
From: Kees Cook @ 2018-03-05 19:15 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linus Torvalds, Maciej S. Szmigiero, Patrick McLean, Emese Revfy,
	Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List,
	Thorsten Leemhuis, kernel-hardening

On Mon, Mar 5, 2018 at 1:27 AM, Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> Sorry for chiming in late.
>
> I noticed this thread today,
> honestly, the commit made me upset.
>
>
> Can I suggest another way to make it less fragile?
> __attribute((...)) can be placed after 'struct'.
>
>
> So, we can write:
>
>
> struct __randomize_layout path {
>         struct vfsmount *mnt;
>         struct dentry *dentry;
> };
>
>
>   instead of
>
>
> struct path {
>         struct vfsmount *mnt;
>         struct dentry *dentry;
> } __randomize_layout;

Ugh. I had tried this after the struct _name_, not after "struct"
itself. This does fix it, though it remains fragile, as you mention.

> If we force the former notation,
> the undefined __randomize_layout results in a build error
> instead of silent broken code generation.
>
>
> It is true somebody can still place
> __randomize_layout after the closing brace,
> but can we check this by coccicheck or checkpatch.pl?
> (we can describe it in coding style documentation, of course)
>
>
> IMHO, we should not (ab)use include/linux/kconfig.h
> to bring in misc things.

I'm happy to send a patch that reverts the other changes and relocates
all the markings...

Linus, how would you like this to go?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11)
  2018-03-05  9:27                                             ` Masahiro Yamada
@ 2018-03-05 19:18                                             ` Linus Torvalds
  -1 siblings, 0 replies; 82+ messages in thread
From: Linus Torvalds @ 2018-03-05 19:18 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Maciej S. Szmigiero, Kees Cook, Patrick McLean, Emese Revfy,
	Al Viro, Bruce Fields, Darrick J. Wong,
	Linux Kernel Mailing List, Linux NFS Mailing List,
	Thorsten Leemhuis, kernel-hardening

On Mon, Mar 5, 2018 at 1:27 AM, Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> Can I suggest another way to make it less fragile?
> __attribute((...)) can be placed after 'struct'.

That avoids the actual bug, but it wouldn't have helped _find_ the
problem in the first place.

If somebody ever does the same thing, they'd hit the same issue. And
it's not just __randomize_layout, it's any of our other type markers.

We can say "don't do that", but if there is no automated checking,
it's still ripe to cause problems just because somebody didn't notice.

So I'd rather have something that causes a build failure when
something goes wrong, rather than silently accepting syntax that
wasn't intended.
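A sketch of how the `extern struct nostruct` guard quoted earlier in the thread turns the mistake into a build failure, under the same simplifying assumptions (empty fallback, void * members). With the fallback defined, the guard is harmless:

```c
#include <stddef.h>

/* Guard declaration: claims the marker name for an object whose type is
 * never defined. It must appear before any header uses the marker. */
extern struct nostruct __randomize_layout;

/* Correctly included fallback (simplified; assumed empty here). From here
 * on the token is a macro, so the guard declaration is never touched. */
#ifndef __randomize_layout
#define __randomize_layout
#endif

struct path {
	void *mnt;
	void *dentry;
} __randomize_layout;

/* Without the #define above, the struct would instead declare
 * "struct path __randomize_layout", which conflicts with the guard and
 * stops the build with "conflicting types for '__randomize_layout'". */

size_t path_size(void)
{
	return sizeof(struct path);
}
```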

                   Linus

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2018-03-05 19:18 UTC | newest]

Thread overview: 82+ messages
2017-11-09  0:43 [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Patrick McLean
2017-11-09  2:40 ` Linus Torvalds
2017-11-09  3:45   ` Al Viro
2017-11-09 19:34   ` Patrick McLean
2017-11-09 19:38     ` Al Viro
2017-11-09 19:42       ` Patrick McLean
2017-11-09 19:37   ` Al Viro
2017-11-09 19:51     ` Patrick McLean
2017-11-09 20:04       ` Linus Torvalds
2017-11-09 21:16         ` Al Viro
2017-11-10  1:58         ` Patrick McLean
2017-11-10 13:53           ` Arnd Bergmann
2017-11-10 18:42           ` Linus Torvalds
2017-11-10 23:26             ` Patrick McLean
2017-11-11  0:27               ` Patrick McLean
2017-11-11  2:36                 ` Linus Torvalds
2017-11-11 16:13                   ` Kees Cook
2017-11-11 17:31                     ` Linus Torvalds
2017-11-13 22:48                       ` Patrick McLean
2017-11-17  0:54                         ` Kees Cook
2017-11-17 19:03                           ` Patrick McLean
2017-11-17 21:26                             ` Kees Cook
2017-11-18  0:27                               ` Patrick McLean
2017-11-18  0:55                                 ` Linus Torvalds
2017-11-18  1:54                                   ` Patrick McLean
2017-11-18  5:14                                     ` Kees Cook
2017-11-18  5:29                                       ` Linus Torvalds
2017-11-18  8:20                                         ` Kees Cook
2018-02-21 22:19                                       ` RANDSTRUCT structs need linux/compiler_types.h (Was: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11) Maciej S. Szmigiero
2018-02-21 22:47                                         ` Linus Torvalds
2018-02-21 23:34                                           ` Kees Cook
2018-03-05  9:27                                           ` Masahiro Yamada
2018-03-05 19:15                                             ` Kees Cook
2018-03-05 19:18                                             ` Linus Torvalds
2018-02-21 22:52                                         ` Kees Cook
2018-02-21 23:24                                           ` Linus Torvalds
2018-02-22  0:12                                             ` Kees Cook
2018-02-22  0:22                                               ` Linus Torvalds
2018-02-22  0:23                                                 ` Kees Cook
2018-02-22  0:27                                                   ` Kees Cook
2017-11-11  1:13               ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 J. Bruce Fields
2017-11-11  2:32                 ` Al Viro
2017-11-10  1:47       ` Patrick McLean
2017-11-09 20:47   ` J. Bruce Fields
2017-11-09 23:07     ` Patrick McLean
2017-11-13 22:59   ` bit tweaks [was: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11] Rasmus Villemoes
2017-11-13 23:30     ` Linus Torvalds
2017-11-13 23:54       ` Linus Torvalds
2017-11-14 22:24         ` Rasmus Villemoes
2017-11-14 22:43           ` Linus Torvalds
2017-11-14 23:53             ` Rasmus Villemoes
2017-11-15  0:02               ` Linus Torvalds
2017-11-11  2:47 ` [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11 Alan Cox
