linux-kernel.vger.kernel.org archive mirror
* Re: BUG: unable to handle kernel NULL pointer dereference in rb_insert_color
       [not found] <94eb2c1170ce36bd770560ad6d3a@google.com>
@ 2017-12-19 21:59 ` Eric Biggers
  2017-12-20  7:50   ` Dmitry Vyukov
  2019-12-09 13:29   ` ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color) Rafael David Tinoco
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Biggers @ 2017-12-19 21:59 UTC (permalink / raw)
  To: syzbot; +Cc: adilger.kernel, linux-ext4, linux-kernel, syzkaller-bugs, tytso

On Tue, Dec 19, 2017 at 12:41:01AM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> sctp: [Deprecated]: syz-executor6 (pid 4202) Use of int in max_burst
> socket option.
> Use struct sctp_assoc_value instead
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst
> socket option.
> Use struct sctp_assoc_value instead
> sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst
> socket option.
> Use struct sctp_assoc_value instead
> IP: __rb_insert lib/rbtree.c:126 [inline]
> IP: rb_insert_color+0x17/0x190 lib/rbtree.c:452
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4244 Comm: modprobe Not tainted 4.15.0-rc3-next-20171214+ #67
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> RIP: 0010:__rb_insert lib/rbtree.c:126 [inline]
> RIP: 0010:rb_insert_color+0x17/0x190 lib/rbtree.c:452
> RSP: 0018:ffffc900010a7c08 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814ddcb9
> RDX: ffff8801ebedf988 RSI: ffff8801ebfd6400 RDI: ffff88021413a408
> RBP: ffffc900010a7c08 R08: 000000000002bcf8 R09: ffff88021413a400
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88021413a400
> R13: ffff8801ebedf990 R14: 00000000a34fc52a R15: ffff8801ebedf988
> FS:  00007f85a5155700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000001eaccd006 CR4: 00000000001606f0
> DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>  ext4_htree_store_dirent+0x122/0x160 fs/ext4/dir.c:488
>  htree_dirblock_to_tree+0x112/0x300 fs/ext4/namei.c:1019
>  ext4_htree_fill_tree+0xdf/0x410 fs/ext4/namei.c:1096
>  ext4_dx_readdir fs/ext4/dir.c:575 [inline]
>  ext4_readdir+0x8cf/0xd70 fs/ext4/dir.c:122
>  iterate_dir+0xb8/0x200 fs/readdir.c:51
>  SYSC_getdents fs/readdir.c:231 [inline]
>  SyS_getdents+0xcc/0x1b0 fs/readdir.c:212
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x7f85a4a45575
> RSP: 002b:00007ffc9b5be120 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
> RAX: ffffffffffffffda RBX: 00007f85a4d23e98 RCX: 00007f85a4a45575
> RDX: 0000000000008000 RSI: 00005633094701e0 RDI: 0000000000000000
> RBP: 00007f85a4d23e40 R08: 00005633094701e0 R09: 00007f85a4d23e90
> R10: 0000000000000000 R11: 0000000000000246 R12: 00005633094701b0
> R13: 0000000000018e21 R14: 0000000000000000 R15: 0000000000000004
> Code: 48 85 d2 75 eb 5d c3 31 c0 5d c3 66 0f 1f 84 00 00 00 00 00 55
> 48 8b 17 48 89 e5 48 85 d2 0f 84 4c 01 00 00 48 8b 02 a8 01 75 5e
> <48> 8b 48 08 49 89 c0 48 39 d1 74 54 48 85 c9 74 09 f6 01 01 0f
> RIP: __rb_insert lib/rbtree.c:126 [inline] RSP: ffffc900010a7c08
> RIP: rb_insert_color+0x17/0x190 lib/rbtree.c:452 RSP: ffffc900010a7c08
> CR2: 0000000000000008
> BUG: unable to handle kernel paging request at 0000000100000001
> ---[ end trace c403bd3ebad2ccb0 ]---

The line number in lib/rbtree.c seems to be slightly off.  Looking at the
disassembly:

	ffffffff825b5ea0 <rb_insert_color>:
	ffffffff825b5ea0:       55                      push   %rbp
	ffffffff825b5ea1:       48 8b 17                mov    (%rdi),%rdx
	ffffffff825b5ea4:       48 89 e5                mov    %rsp,%rbp
	ffffffff825b5ea7:       48 85 d2                test   %rdx,%rdx
	ffffffff825b5eaa:       0f 84 4c 01 00 00       je     ffffffff825b5ffc <rb_insert_color+0x15c>
	ffffffff825b5eb0:       48 8b 02                mov    (%rdx),%rax
	ffffffff825b5eb3:       a8 01                   test   $0x1,%al
	ffffffff825b5eb5:       75 5e                   jne    ffffffff825b5f15 <rb_insert_color+0x75>
	ffffffff825b5eb7:       48 8b 48 08             mov    0x8(%rax),%rcx

It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
'tmp = gparent->rb_right;' at lib/rbtree.c:131.  So 'parent' was the root node,
but its color was red, while it is supposed to be black.

No idea how that happened, but it's almost certainly not an ext4 bug.  In fact
there is another report of this same crash that has a different call trace:

	Call Trace:
	 key_alloc_serial security/keys/key.c:170 [inline]
	 key_alloc+0x54c/0x5b0 security/keys/key.c:319
	 keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
	 install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
	 install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
	 install_process_keyring security/keys/process_keys.c:217 [inline]
	 lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
	 SYSC_add_key security/keys/keyctl.c:114 [inline]
	 SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
	 entry_SYSCALL_64_fastpath+0x1f/0x96


* Re: BUG: unable to handle kernel NULL pointer dereference in rb_insert_color
  2017-12-19 21:59 ` BUG: unable to handle kernel NULL pointer dereference in rb_insert_color Eric Biggers
@ 2017-12-20  7:50   ` Dmitry Vyukov
  2017-12-20  7:59     ` Eric Biggers
  2019-12-09 13:29   ` ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color) Rafael David Tinoco
  1 sibling, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2017-12-20  7:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, Andreas Dilger, linux-ext4, LKML, syzkaller-bugs,
	Theodore Ts'o

On Tue, Dec 19, 2017 at 10:59 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> On Tue, Dec 19, 2017 at 12:41:01AM -0800, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 6084b576dca2e898f5c101baef151f7bfdbb606d
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
>>
>> sctp: [Deprecated]: syz-executor6 (pid 4202) Use of int in max_burst
>> socket option.
>> Use struct sctp_assoc_value instead
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst
>> socket option.
>> Use struct sctp_assoc_value instead
>> sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst
>> socket option.
>> Use struct sctp_assoc_value instead
>> IP: __rb_insert lib/rbtree.c:126 [inline]
>> IP: rb_insert_color+0x17/0x190 lib/rbtree.c:452
>> PGD 0 P4D 0
>> Oops: 0000 [#1] SMP
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 0 PID: 4244 Comm: modprobe Not tainted 4.15.0-rc3-next-20171214+ #67
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> RIP: 0010:__rb_insert lib/rbtree.c:126 [inline]
>> RIP: 0010:rb_insert_color+0x17/0x190 lib/rbtree.c:452
>> RSP: 0018:ffffc900010a7c08 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814ddcb9
>> RDX: ffff8801ebedf988 RSI: ffff8801ebfd6400 RDI: ffff88021413a408
>> RBP: ffffc900010a7c08 R08: 000000000002bcf8 R09: ffff88021413a400
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88021413a400
>> R13: ffff8801ebedf990 R14: 00000000a34fc52a R15: ffff8801ebedf988
>> FS:  00007f85a5155700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000008 CR3: 00000001eaccd006 CR4: 00000000001606f0
>> DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Call Trace:
>>  ext4_htree_store_dirent+0x122/0x160 fs/ext4/dir.c:488
>>  htree_dirblock_to_tree+0x112/0x300 fs/ext4/namei.c:1019
>>  ext4_htree_fill_tree+0xdf/0x410 fs/ext4/namei.c:1096
>>  ext4_dx_readdir fs/ext4/dir.c:575 [inline]
>>  ext4_readdir+0x8cf/0xd70 fs/ext4/dir.c:122
>>  iterate_dir+0xb8/0x200 fs/readdir.c:51
>>  SYSC_getdents fs/readdir.c:231 [inline]
>>  SyS_getdents+0xcc/0x1b0 fs/readdir.c:212
>>  entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x7f85a4a45575
>> RSP: 002b:00007ffc9b5be120 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
>> RAX: ffffffffffffffda RBX: 00007f85a4d23e98 RCX: 00007f85a4a45575
>> RDX: 0000000000008000 RSI: 00005633094701e0 RDI: 0000000000000000
>> RBP: 00007f85a4d23e40 R08: 00005633094701e0 R09: 00007f85a4d23e90
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00005633094701b0
>> R13: 0000000000018e21 R14: 0000000000000000 R15: 0000000000000004
>> Code: 48 85 d2 75 eb 5d c3 31 c0 5d c3 66 0f 1f 84 00 00 00 00 00 55
>> 48 8b 17 48 89 e5 48 85 d2 0f 84 4c 01 00 00 48 8b 02 a8 01 75 5e
>> <48> 8b 48 08 49 89 c0 48 39 d1 74 54 48 85 c9 74 09 f6 01 01 0f
>> RIP: __rb_insert lib/rbtree.c:126 [inline] RSP: ffffc900010a7c08
>> RIP: rb_insert_color+0x17/0x190 lib/rbtree.c:452 RSP: ffffc900010a7c08
>> CR2: 0000000000000008
>> BUG: unable to handle kernel paging request at 0000000100000001
>> ---[ end trace c403bd3ebad2ccb0 ]---
>
> The line number in lib/rbtree.c seems to be slightly off.  Looking at the
> disassembly:
>
>         ffffffff825b5ea0 <rb_insert_color>:
>         ffffffff825b5ea0:       55                      push   %rbp
>         ffffffff825b5ea1:       48 8b 17                mov    (%rdi),%rdx
>         ffffffff825b5ea4:       48 89 e5                mov    %rsp,%rbp
>         ffffffff825b5ea7:       48 85 d2                test   %rdx,%rdx
>         ffffffff825b5eaa:       0f 84 4c 01 00 00       je     ffffffff825b5ffc <rb_insert_color+0x15c>
>         ffffffff825b5eb0:       48 8b 02                mov    (%rdx),%rax
>         ffffffff825b5eb3:       a8 01                   test   $0x1,%al
>         ffffffff825b5eb5:       75 5e                   jne    ffffffff825b5f15 <rb_insert_color+0x75>
>         ffffffff825b5eb7:       48 8b 48 08             mov    0x8(%rax),%rcx
>
> It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
> 'tmp = gparent->rb_right;' at lib/rbtree.c:131.  So 'parent' was the root node,
> but its color was red, while it is supposed to be black.
>
> No idea how that happened, but it's almost certainly not an ext4 bug.  In fact
> there is another report of this same crash that has a different call trace:
>
>         Call Trace:
>          key_alloc_serial security/keys/key.c:170 [inline]
>          key_alloc+0x54c/0x5b0 security/keys/key.c:319
>          keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
>          install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
>          install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
>          install_process_keyring security/keys/process_keys.c:217 [inline]
>          lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
>          SYSC_add_key security/keys/keyctl.c:114 [inline]
>          SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
>          entry_SYSCALL_64_fastpath+0x1f/0x96


My first hypothesis for a non-explainable, non-reproducible
corruption would be a data race. Is all the locking in place?


* Re: BUG: unable to handle kernel NULL pointer dereference in rb_insert_color
  2017-12-20  7:50   ` Dmitry Vyukov
@ 2017-12-20  7:59     ` Eric Biggers
  2017-12-20  8:05       ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2017-12-20  7:59 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Andreas Dilger, linux-ext4, LKML, syzkaller-bugs,
	Theodore Ts'o

On Wed, Dec 20, 2017 at 08:50:40AM +0100, Dmitry Vyukov wrote:
> >
> > The line number in lib/rbtree.c seems to be slightly off.  Looking at the
> > disassembly:
> >
> >         ffffffff825b5ea0 <rb_insert_color>:
> >         ffffffff825b5ea0:       55                      push   %rbp
> >         ffffffff825b5ea1:       48 8b 17                mov    (%rdi),%rdx
> >         ffffffff825b5ea4:       48 89 e5                mov    %rsp,%rbp
> >         ffffffff825b5ea7:       48 85 d2                test   %rdx,%rdx
> >         ffffffff825b5eaa:       0f 84 4c 01 00 00       je     ffffffff825b5ffc <rb_insert_color+0x15c>
> >         ffffffff825b5eb0:       48 8b 02                mov    (%rdx),%rax
> >         ffffffff825b5eb3:       a8 01                   test   $0x1,%al
> >         ffffffff825b5eb5:       75 5e                   jne    ffffffff825b5f15 <rb_insert_color+0x75>
> >         ffffffff825b5eb7:       48 8b 48 08             mov    0x8(%rax),%rcx
> >
> > It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
> > 'tmp = gparent->rb_right;' at lib/rbtree.c:131.  So 'parent' was the root node,
> > but its color was red, while it is supposed to be black.
> >
> > No idea how that happened, but it's almost certainly not an ext4 bug.  In fact
> > there is another report of this same crash that has a different call trace:
> >
> >         Call Trace:
> >          key_alloc_serial security/keys/key.c:170 [inline]
> >          key_alloc+0x54c/0x5b0 security/keys/key.c:319
> >          keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
> >          install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
> >          install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
> >          install_process_keyring security/keys/process_keys.c:217 [inline]
> >          lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
> >          SYSC_add_key security/keys/keyctl.c:114 [inline]
> >          SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
> >          entry_SYSCALL_64_fastpath+0x1f/0x96
> 
> 
> My first hypothesis for a non-explainable, non-reproducible
> corruption would be a data race. Is all the locking in place?

It doesn't seem to be a locking problem.  In the ext4 case the rbtree is
associated with a struct file's dir_private_info, which is protected by
->f_pos_lock (taken early in sys_getdents()).  And in the keyrings case, the
rbtree is protected by key_serial_lock.

Eric


* Re: BUG: unable to handle kernel NULL pointer dereference in rb_insert_color
  2017-12-20  7:59     ` Eric Biggers
@ 2017-12-20  8:05       ` Dmitry Vyukov
  2018-01-30 21:43         ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2017-12-20  8:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: syzbot, Andreas Dilger, linux-ext4, LKML, syzkaller-bugs,
	Theodore Ts'o

On Wed, Dec 20, 2017 at 8:59 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 08:50:40AM +0100, Dmitry Vyukov wrote:
>> >
>> > The line number in lib/rbtree.c seems to be slightly off.  Looking at the
>> > disassembly:
>> >
>> >         ffffffff825b5ea0 <rb_insert_color>:
>> >         ffffffff825b5ea0:       55                      push   %rbp
>> >         ffffffff825b5ea1:       48 8b 17                mov    (%rdi),%rdx
>> >         ffffffff825b5ea4:       48 89 e5                mov    %rsp,%rbp
>> >         ffffffff825b5ea7:       48 85 d2                test   %rdx,%rdx
>> >         ffffffff825b5eaa:       0f 84 4c 01 00 00       je     ffffffff825b5ffc <rb_insert_color+0x15c>
>> >         ffffffff825b5eb0:       48 8b 02                mov    (%rdx),%rax
>> >         ffffffff825b5eb3:       a8 01                   test   $0x1,%al
>> >         ffffffff825b5eb5:       75 5e                   jne    ffffffff825b5f15 <rb_insert_color+0x75>
>> >         ffffffff825b5eb7:       48 8b 48 08             mov    0x8(%rax),%rcx
>> >
>> > It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
>> > 'tmp = gparent->rb_right;' at lib/rbtree.c:131.  So 'parent' was the root node,
>> > but its color was red, while it is supposed to be black.
>> >
>> > No idea how that happened, but it's almost certainly not an ext4 bug.  In fact
>> > there is another report of this same crash that has a different call trace:
>> >
>> >         Call Trace:
>> >          key_alloc_serial security/keys/key.c:170 [inline]
>> >          key_alloc+0x54c/0x5b0 security/keys/key.c:319
>> >          keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
>> >          install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
>> >          install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
>> >          install_process_keyring security/keys/process_keys.c:217 [inline]
>> >          lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
>> >          SYSC_add_key security/keys/keyctl.c:114 [inline]
>> >          SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
>> >          entry_SYSCALL_64_fastpath+0x1f/0x96
>>
>>
>> My first hypothesis for a non-explainable, non-reproducible
>> corruption would be a data race. Is all the locking in place?
>
> It doesn't seem to be a locking problem.  In the ext4 case the rbtree is
> associated with a struct file's dir_private_info, which is protected by
> ->f_pos_lock (taken early in sys_getdents()).

But this won't prevent somebody else from messing with the struct
without taking the lock.

> And in the keyrings case, the
> rbtree is protected by key_serial_lock.


* Re: BUG: unable to handle kernel NULL pointer dereference in rb_insert_color
  2017-12-20  8:05       ` Dmitry Vyukov
@ 2018-01-30 21:43         ` Eric Biggers
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Biggers @ 2018-01-30 21:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Andreas Dilger, linux-ext4, LKML, syzkaller-bugs,
	Theodore Ts'o

On Wed, Dec 20, 2017 at 09:05:39AM +0100, Dmitry Vyukov wrote:
> On Wed, Dec 20, 2017 at 8:59 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> > On Wed, Dec 20, 2017 at 08:50:40AM +0100, Dmitry Vyukov wrote:
> >> >
> >> > The line number in lib/rbtree.c seems to be slightly off.  Looking at the
> >> > disassembly:
> >> >
> >> >         ffffffff825b5ea0 <rb_insert_color>:
> >> >         ffffffff825b5ea0:       55                      push   %rbp
> >> >         ffffffff825b5ea1:       48 8b 17                mov    (%rdi),%rdx
> >> >         ffffffff825b5ea4:       48 89 e5                mov    %rsp,%rbp
> >> >         ffffffff825b5ea7:       48 85 d2                test   %rdx,%rdx
> >> >         ffffffff825b5eaa:       0f 84 4c 01 00 00       je     ffffffff825b5ffc <rb_insert_color+0x15c>
> >> >         ffffffff825b5eb0:       48 8b 02                mov    (%rdx),%rax
> >> >         ffffffff825b5eb3:       a8 01                   test   $0x1,%al
> >> >         ffffffff825b5eb5:       75 5e                   jne    ffffffff825b5f15 <rb_insert_color+0x75>
> >> >         ffffffff825b5eb7:       48 8b 48 08             mov    0x8(%rax),%rcx
> >> >
> >> > It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
> >> > 'tmp = gparent->rb_right;' at lib/rbtree.c:131.  So 'parent' was the root node,
> >> > but its color was red, while it is supposed to be black.
> >> >
> >> > No idea how that happened, but it's almost certainly not an ext4 bug.  In fact
> >> > there is another report of this same crash that has a different call trace:
> >> >
> >> >         Call Trace:
> >> >          key_alloc_serial security/keys/key.c:170 [inline]
> >> >          key_alloc+0x54c/0x5b0 security/keys/key.c:319
> >> >          keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
> >> >          install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
> >> >          install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
> >> >          install_process_keyring security/keys/process_keys.c:217 [inline]
> >> >          lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
> >> >          SYSC_add_key security/keys/keyctl.c:114 [inline]
> >> >          SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
> >> >          entry_SYSCALL_64_fastpath+0x1f/0x96
> >>
> >>
> >> My first hypothesis for a non-explainable, non-reproducible
> >> corruption would be a data race. Is all the locking in place?
> >
> > It doesn't seem to be a locking problem.  In the ext4 case the rbtree is
> > associated with a struct file's dir_private_info, which is protected by
> > ->f_pos_lock (taken early in sys_getdents()).
> 
> But this won't prevent somebody else from messing with the struct
> without taking the lock.
> 
> > And in the keyrings case, the
> > rbtree is protected by key_serial_lock.

Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot config due to a change to the
kconfig menus in linux-next (so this crash was probably caused by slab
corruption elsewhere).

#syz invalid


* ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color)
  2017-12-19 21:59 ` BUG: unable to handle kernel NULL pointer dereference in rb_insert_color Eric Biggers
  2017-12-20  7:50   ` Dmitry Vyukov
@ 2019-12-09 13:29   ` Rafael David Tinoco
  2019-12-09 13:46     ` Dmitry Vyukov
  2019-12-10  2:01     ` Theodore Y. Ts'o
  1 sibling, 2 replies; 9+ messages in thread
From: Rafael David Tinoco @ 2019-12-09 13:29 UTC (permalink / raw)
  To: ebiggers3
  Cc: adilger.kernel, bot+eb13811afcefe99cfe45081054e7883f569f949d,
	linux-ext4, linux-kernel, syzkaller-bugs, tytso

This looks like the same stack trace that was reported in this thread. It has
been reported on ppc64el, and we have a reproducer (ocfs2-tools autopkgtests).

[ 85.605850] Faulting instruction address: 0xc000000000e81168
[ 85.605901] Oops: Kernel access of bad area, sig: 11 [#1]
[ 85.605970] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 85.606029] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk net_failover failover
[ 85.606291] CPU: 0 PID: 1 Comm: systemd Not tainted 5.3.0-18-generic #19-Ubuntu
[ 85.606350] NIP: c000000000e81168 LR: c00000000054f240 CTR: 0000000000000000
[ 85.606410] REGS: c00000005a3e3700 TRAP: 0300 Not tainted (5.3.0-18-generic)
[ 85.606469] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28024448 XER: 00000000
[ 85.606531] CFAR: 0000701f9806f638 DAR: 0000000001744098 DSISR: 40000000 IRQMASK: 0
[ 85.606531] GPR00: 0000000000007374 c00000005a3e3990 c0000000019c9100 c00000004fe462a8
[ 85.606531] GPR04: c00000005856d840 000000000000000e 0000000074656772 c00000004fe4a568
[ 85.606531] GPR08: 0000000000000000 c000000058568004 0000000001744090 0000000000000000
[ 85.606531] GPR12: 00000000e8086002 c000000001d60000 00007fffddd522d0 0000000000000000
[ 85.606531] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000755e07c
[ 85.606531] GPR20: c0000000598caca8 c00000005a3e3a58 0000000000000000 c000000058292f00
[ 85.606531] GPR24: c000000000eea710 0000000000000000 c00000005856d840 c00000000755e074
[ 85.606531] GPR28: 000000006518907d c00000005a3e3a68 c00000004fe4b160 00000000027c47b6
[ 85.607079] NIP [c000000000e81168] rb_insert_color+0x18/0x1c0
[ 85.607137] LR [c00000000054f240] ext4_htree_store_dirent+0x140/0x1c0
[ 85.607186] Call Trace:
[ 85.607208] [c00000005a3e3990] [c00000000054f158] ext4_htree_store_dirent+0x58/0x1c0 (unreliable)
[ 85.607279] [c00000005a3e39e0] [c000000000594cd8] htree_dirblock_to_tree+0x1b8/0x380
[ 85.607340] [c00000005a3e3b00] [c0000000005962c0] ext4_htree_fill_tree+0xc0/0x3f0
[ 85.607401] [c00000005a3e3c00] [c00000000054ebe4] ext4_readdir+0x814/0xce0
[ 85.607459] [c00000005a3e3d40] [c000000000472d6c] iterate_dir+0x1fc/0x280
[ 85.607511] [c00000005a3e3d90] [c0000000004746f0] ksys_getdents64+0xa0/0x1f0
[ 85.607572] [c00000005a3e3e00] [c000000000474868] sys_getdents64+0x28/0x130
[ 85.607622] [c00000005a3e3e20] [c00000000000b388] system_call+0x5c/0x70
[ 85.607672] Instruction dump:
[ 85.607703] 4082ffe8 4e800020 38600000 4e800020 60000000 60000000 e9230000 2c290000
[ 85.607764] 4182018c e9490000 71480001 4c820020 <e90a0008> 7c284840 2fa80000 4182006c
[ 85.607827] ---[ end trace cfc53af0f8d62cef ]---
[ 85.610600]
[ 86.611522] BUG: Unable to handle kernel data access at 0xc000030058567eff
[ 86.611604] Faulting instruction address: 0xc000000000403aa8
[ 86.611656] Oops: Kernel access of bad area, sig: 11 [#2]
[ 86.611697] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 86.611748] Modules linked in: ocfs2 quota_tr

The thread is from the beginning of 2018, so I guess this issue is pretty
intermittent but might still exist and, perhaps, it's related to specific
arches/machines?


* Re: ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color)
  2019-12-09 13:29   ` ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color) Rafael David Tinoco
@ 2019-12-09 13:46     ` Dmitry Vyukov
  2019-12-10  2:01     ` Theodore Y. Ts'o
  1 sibling, 0 replies; 9+ messages in thread
From: Dmitry Vyukov @ 2019-12-09 13:46 UTC (permalink / raw)
  To: Rafael David Tinoco
  Cc: Eric Biggers, Andreas Dilger, syzbot, linux-ext4, LKML,
	syzkaller-bugs, Theodore Ts'o

On Mon, Dec 9, 2019 at 2:29 PM Rafael David Tinoco
<rafaeldtinoco@ubuntu.com> wrote:
>
> This looks like the same stack trace that was reported in this thread. It has
> been reported on ppc64el, and we have a reproducer (ocfs2-tools autopkgtests).
>
> [ 85.605850] Faulting instruction address: 0xc000000000e81168
> [ 85.605901] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 85.605970] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 85.606029] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk net_failover failover
> [ 85.606291] CPU: 0 PID: 1 Comm: systemd Not tainted 5.3.0-18-generic #19-Ubuntu
> [ 85.606350] NIP: c000000000e81168 LR: c00000000054f240 CTR: 0000000000000000
> [ 85.606410] REGS: c00000005a3e3700 TRAP: 0300 Not tainted (5.3.0-18-generic)
> [ 85.606469] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28024448 XER: 00000000
> [ 85.606531] CFAR: 0000701f9806f638 DAR: 0000000001744098 DSISR: 40000000 IRQMASK: 0
> [ 85.606531] GPR00: 0000000000007374 c00000005a3e3990 c0000000019c9100 c00000004fe462a8
> [ 85.606531] GPR04: c00000005856d840 000000000000000e 0000000074656772 c00000004fe4a568
> [ 85.606531] GPR08: 0000000000000000 c000000058568004 0000000001744090 0000000000000000
> [ 85.606531] GPR12: 00000000e8086002 c000000001d60000 00007fffddd522d0 0000000000000000
> [ 85.606531] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000755e07c
> [ 85.606531] GPR20: c0000000598caca8 c00000005a3e3a58 0000000000000000 c000000058292f00
> [ 85.606531] GPR24: c000000000eea710 0000000000000000 c00000005856d840 c00000000755e074
> [ 85.606531] GPR28: 000000006518907d c00000005a3e3a68 c00000004fe4b160 00000000027c47b6
> [ 85.607079] NIP [c000000000e81168] rb_insert_color+0x18/0x1c0
> [ 85.607137] LR [c00000000054f240] ext4_htree_store_dirent+0x140/0x1c0
> [ 85.607186] Call Trace:
> [ 85.607208] [c00000005a3e3990] [c00000000054f158] ext4_htree_store_dirent+0x58/0x1c0 (unreliable)
> [ 85.607279] [c00000005a3e39e0] [c000000000594cd8] htree_dirblock_to_tree+0x1b8/0x380
> [ 85.607340] [c00000005a3e3b00] [c0000000005962c0] ext4_htree_fill_tree+0xc0/0x3f0
> [ 85.607401] [c00000005a3e3c00] [c00000000054ebe4] ext4_readdir+0x814/0xce0
> [ 85.607459] [c00000005a3e3d40] [c000000000472d6c] iterate_dir+0x1fc/0x280
> [ 85.607511] [c00000005a3e3d90] [c0000000004746f0] ksys_getdents64+0xa0/0x1f0
> [ 85.607572] [c00000005a3e3e00] [c000000000474868] sys_getdents64+0x28/0x130
> [ 85.607622] [c00000005a3e3e20] [c00000000000b388] system_call+0x5c/0x70
> [ 85.607672] Instruction dump:
> [ 85.607703] 4082ffe8 4e800020 38600000 4e800020 60000000 60000000 e9230000 2c290000
> [ 85.607764] 4182018c e9490000 71480001 4c820020 <e90a0008> 7c284840 2fa80000 4182006c
> [ 85.607827] ---[ end trace cfc53af0f8d62cef ]---
> [ 85.610600]
> [ 86.611522] BUG: Unable to handle kernel data access at 0xc000030058567eff
> [ 86.611604] Faulting instruction address: 0xc000000000403aa8
> [ 86.611656] Oops: Kernel access of bad area, sig: 11 [#2]
> [ 86.611697] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 86.611748] Modules linked in: ocfs2 quota_tr
>
> The thread is from the beginning of 2018, so I guess this issue is pretty
> intermittent but might still exist and, perhaps, it's related to specific
> arches/machines?

FTR, here is the original thread/bug (at least my email client did not
thread them together):
https://groups.google.com/g/syzkaller-bugs/c/YBhhSkrImIM/m/3HMv_dFUCwAJ
https://syzkaller.appspot.com/bug?extid=eb13811afcefe99cfe45081054e7883f569f949d


* Re: ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color)
  2019-12-09 13:29   ` ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color) Rafael David Tinoco
  2019-12-09 13:46     ` Dmitry Vyukov
@ 2019-12-10  2:01     ` Theodore Y. Ts'o
  2019-12-12 12:25       ` Rafael David Tinoco
  1 sibling, 1 reply; 9+ messages in thread
From: Theodore Y. Ts'o @ 2019-12-10  2:01 UTC (permalink / raw)
  To: Rafael David Tinoco
  Cc: ebiggers3, adilger.kernel,
	bot+eb13811afcefe99cfe45081054e7883f569f949d, linux-ext4,
	linux-kernel, syzkaller-bugs

On Mon, Dec 09, 2019 at 10:29:14AM -0300, Rafael David Tinoco wrote:
> This looks like the same stack trace that was reported in this thread. It has
> been reported on ppc64el, and we have a reproducer (ocfs2-tools autopkgtests).

Can you share your reproducer?  Is it a super-simple reproducer that
doesn't require a complex setup and which can be triggered in some
kind of virtual machine (under KVM, etc.)?

> The thread is from the beginning of 2018, so I guess this issue is pretty
> intermittent but might still exist and, perhaps, it's related to specific
> arches/machines?

What syzbot reported (a) had no reproducer, (b) only reproduced twice
on linux-next in 2017, and never since.  So if you're seeing something
in 2019 in ppc64el, it may not be the same issue.

   	   	       	       	      - Ted


* Re: ppc64el kernel access of bad area (ext4_htree_store_dirent->rb_insert_color)
  2019-12-10  2:01     ` Theodore Y. Ts'o
@ 2019-12-12 12:25       ` Rafael David Tinoco
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael David Tinoco @ 2019-12-12 12:25 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: ebiggers3, adilger.kernel,
	bot+eb13811afcefe99cfe45081054e7883f569f949d, linux-ext4,
	linux-kernel, syzkaller-bugs



>> This looks like the same stack trace that was reported in this thread. It has
>> been reported on ppc64el, and we have a reproducer (ocfs2-tools autopkgtests).
> Can you share your reproducer?  Is it a super-simple reproducer that
> doesn't require a complex setup and which can be triggered in some
> kind of virtual machine (under KVM, etc.)?

Yep, it's the autopkgtests (debian/tests/*) from ocfs2-tools on bare-metal
ppc64el: a bunch of mkfs.ocfs2, fsck.ocfs2, debugfs.ocfs2 and mount.ocfs2
commands that exercise the package. I have access to the same HW that
generated the trace, so I'll generate a kdump and share more data soon.


