All of lore.kernel.org
 help / color / mirror / Atom feed
* general protection fault in kernfs_kill_sb
@ 2018-04-01 17:01 syzbot
  2018-04-02 10:40 ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: syzbot @ 2018-04-01 17:01 UTC (permalink / raw)
  To: gregkh, kstewart, linux-kernel, linux-mm, pombredanne,
	syzkaller-bugs, tglx

Hello,

syzbot hit the following crash on bpf-next commit
7828f20e3779e4e85e55371e0e43f5006a15fb41 (Sat Mar 31 00:17:57 2018 +0000)
Merge branch 'bpf-cgroup-bind-connect'
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=151de3f2be6b40ac8026

So far this crash happened 3 times on bpf-next.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4857382450495488
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4644052230209536
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5798498637185024
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=5909223872832634926
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+151de3f2be6b40ac8026@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

kasan: GPF could be caused by NULL-ptr deref or user memory access
  should_failslab+0xec/0x120 mm/failslab.c:32
  slab_pre_alloc_hook mm/slab.h:422 [inline]
  slab_alloc mm/slab.c:3365 [inline]
  __do_kmalloc mm/slab.c:3703 [inline]
  __kmalloc+0x63/0x760 mm/slab.c:3714
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
  kmalloc include/linux/slab.h:517 [inline]
  kzalloc include/linux/slab.h:701 [inline]
  register_shrinker+0x10e/0x2d0 mm/vmscan.c:268
Modules linked in:
CPU: 1 PID: 4471 Comm: syzkaller129261 Not tainted 4.16.0-rc6+ #43
  sget_userns+0xbbf/0xe40 fs/super.c:520
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
RSP: 0018:ffff8801ae017658 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8801d97a6e98 RDI: ffff8801d97a6ea0
RBP: ffff8801ae017670 R08: ffffffff81d39d22 R09: 0000000000000004
R10: ffff8801ae017670 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8801d91dec00 R14: ffff8801ae017700 R15: ffff8801d97a6e98
FS:  0000000001569880(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
  kernfs_mount_ns+0x13d/0x8b0 fs/kernfs/mount.c:320
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006d0188 CR3: 00000001da40c005 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  sysfs_mount+0xc2/0x1c0 fs/sysfs/mount.c:36
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  __list_del_entry include/linux/list.h:117 [inline]
  list_del include/linux/list.h:125 [inline]
  kernfs_kill_sb+0x9e/0x330 fs/kernfs/mount.c:361
  mount_fs+0x66/0x2d0 fs/super.c:1222
  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
  sysfs_kill_sb+0x22/0x40 fs/sysfs/mount.c:50
  vfs_kern_mount fs/namespace.c:2509 [inline]
  do_new_mount fs/namespace.c:2512 [inline]
  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
  deactivate_locked_super+0x88/0xd0 fs/super.c:312
  sget_userns+0xbda/0xe40 fs/super.c:522
  SYSC_mount fs/namespace.c:3058 [inline]
  SyS_mount+0xab/0x120 fs/namespace.c:3035
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  kernfs_mount_ns+0x13d/0x8b0 fs/kernfs/mount.c:320
  sysfs_mount+0xc2/0x1c0 fs/sysfs/mount.c:36
  mount_fs+0x66/0x2d0 fs/super.c:1222
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
RIP: 0033:0x442609
RSP: 002b:00007fff40a278e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000442609
RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000020000000
RBP: 00007fff40a28190 R08: 00000000200002c0 R09: 0000000300000000
  vfs_kern_mount fs/namespace.c:2509 [inline]
  do_new_mount fs/namespace.c:2512 [inline]
  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000003 R14: 0000000000001380 R15: 00007fff40a27a28
  SYSC_mount fs/namespace.c:3058 [inline]
  SyS_mount+0xab/0x120 fs/namespace.c:3035
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x442609
RSP: 002b:00007fff40a278e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000442609
RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000020000000
RBP: 00007fff40a28190 R08: 00000000200002c0 R09: 0000000300000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000003 R14: 0000000000001380 R15: 00007fff40a27a28
Code: 00 00 00 00 ad de 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 89  
da 48 39 c3 74 65 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df <80> 3c 02 00  
75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8
RIP: __list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 RSP:  
ffff8801ae017658
---[ end trace b14d521943ecadbd ]---


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-01 17:01 general protection fault in kernfs_kill_sb syzbot
@ 2018-04-02 10:40 ` Tetsuo Handa
  2018-04-02 14:34   ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2018-04-02 10:40 UTC (permalink / raw)
  To: Alexander Viro
  Cc: syzbot, gregkh, kstewart, linux-kernel, linux-mm, pombredanne,
	syzkaller-bugs, tglx, linux-fsdevel

On 2018/04/02 2:01, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on bpf-next commit
> 7828f20e3779e4e85e55371e0e43f5006a15fb41 (Sat Mar 31 00:17:57 2018 +0000)
> Merge branch 'bpf-cgroup-bind-connect'
> syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=151de3f2be6b40ac8026
> 
> So far this crash happened 3 times on bpf-next.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4857382450495488
> syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=4644052230209536
> Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5798498637185024
> Kernel config: https://syzkaller.appspot.com/x/.config?id=5909223872832634926
> compiler: gcc (GCC) 7.1.1 20170620

Al, I think this is another example of crash triggered by
commit 9ee332d99e4d5a97 ("sget(): handle failures of register_shrinker()").

----------------------------------------
[   23.407545] FAULT_INJECTION: forcing a failure.
[   23.407545] name failslab, interval 1, probability 0, space 0, times 1
[   23.414735] CPU: 1 PID: 4471 Comm: syzkaller129261 Not tainted 4.16.0-rc6+ #43
[   23.433147] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   23.442491] Call Trace:
[   23.445074]  dump_stack+0x194/0x24d
[   23.448689]  ? arch_local_irq_restore+0x53/0x53
[   23.453347]  ? find_held_lock+0x35/0x1d0
[   23.457401]  should_fail+0x8c0/0xa40
[   23.461100]  ? __list_lru_init+0x352/0x750
[   23.465331]  ? fault_create_debugfs_attr+0x1f0/0x1f0
[   23.470453]  ? find_held_lock+0x35/0x1d0
[   23.474503]  ? __lock_is_held+0xb6/0x140
[   23.478556]  ? check_same_owner+0x320/0x320
[   23.482870]  ? rcu_note_context_switch+0x710/0x710
[   23.487785]  ? find_held_lock+0x35/0x1d0
[   23.491931]  should_failslab+0xec/0x120
[   23.495895]  __kmalloc+0x63/0x760
[   23.499332]  ? lock_downgrade+0x980/0x980
[   23.503469]  ? _raw_spin_unlock+0x22/0x30
[   23.507605]  ? register_shrinker+0x10e/0x2d0
[   23.511999]  ? trace_event_raw_event_module_request+0x320/0x320
[   23.518044]  register_shrinker+0x10e/0x2d0
[   23.522265]  ? __bpf_trace_mm_vmscan_wakeup_kswapd+0x40/0x40
[   23.528051]  ? memcpy+0x45/0x50
[   23.531588]  sget_userns+0xbbf/0xe40
[   23.535296]  ? kernfs_sop_show_path+0x190/0x190
[   23.539959]  ? kernfs_sop_show_options+0x180/0x180
[   23.544876]  ? destroy_unused_super.part.6+0xd0/0xd0
[   23.549972]  ? check_same_owner+0x320/0x320
[   23.554281]  ? rcu_pm_notify+0xc0/0xc0
[   23.558161]  ? rcu_read_lock_sched_held+0x108/0x120
[   23.563168]  ? kmem_cache_alloc_trace+0x459/0x740
[   23.567997]  ? lock_downgrade+0x980/0x980
[   23.572142]  kernfs_mount_ns+0x13d/0x8b0
[   23.576192]  ? kernfs_super_ns+0x70/0x70
[   23.580244]  sysfs_mount+0xc2/0x1c0
----------------------------------------

That commit assumes that calling kill_sb() from deactivate_locked_super(s)
without corresponding fill_super() is safe. We have so far crashed with
rpc_mount() and kernfs_mount_ns(). Is that really safe?

Also, I think

----------------------------------------
struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
                               struct kernfs_root *root, unsigned long magic,
                               bool *new_sb_created, const void *ns)
{
(...snipped...)
	if (!sb->s_root) {
		struct kernfs_super_info *info = kernfs_info(sb);

		error = kernfs_fill_super(sb, magic);
		if (error) {
			deactivate_locked_super(sb); // <= this call
			return ERR_PTR(error);
		}
		sb->s_flags |= SB_ACTIVE;

		mutex_lock(&kernfs_mutex);
		list_add(&info->node, &root->supers);
		mutex_unlock(&kernfs_mutex);
	}
(...snipped...)
}
----------------------------------------

is not safe, for list_del() is called via kill_sb() without
corresponding list_add().

----------------------------------------
void kernfs_kill_sb(struct super_block *sb)
{
	struct kernfs_super_info *info = kernfs_info(sb);

	mutex_lock(&kernfs_mutex);
	list_del(&info->node); // <= NULL pointer dereference
	mutex_unlock(&kernfs_mutex);

	/*
	 * Remove the superblock from fs_supers/s_instances
	 * so we can't find it, before freeing kernfs_super_info.
	 */
	kill_anon_super(sb);
	kfree(info);
}
----------------------------------------

> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+151de3f2be6b40ac8026@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> kasan: GPF could be caused by NULL-ptr deref or user memory access
>  should_failslab+0xec/0x120 mm/failslab.c:32
>  slab_pre_alloc_hook mm/slab.h:422 [inline]
>  slab_alloc mm/slab.c:3365 [inline]
>  __do_kmalloc mm/slab.c:3703 [inline]
>  __kmalloc+0x63/0x760 mm/slab.c:3714
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
>  kmalloc include/linux/slab.h:517 [inline]
>  kzalloc include/linux/slab.h:701 [inline]
>  register_shrinker+0x10e/0x2d0 mm/vmscan.c:268
> Modules linked in:
> CPU: 1 PID: 4471 Comm: syzkaller129261 Not tainted 4.16.0-rc6+ #43
>  sget_userns+0xbbf/0xe40 fs/super.c:520
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> RSP: 0018:ffff8801ae017658 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff8801d97a6e98 RDI: ffff8801d97a6ea0
> RBP: ffff8801ae017670 R08: ffffffff81d39d22 R09: 0000000000000004
> R10: ffff8801ae017670 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8801d91dec00 R14: ffff8801ae017700 R15: ffff8801d97a6e98
> FS:  0000000001569880(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
>  kernfs_mount_ns+0x13d/0x8b0 fs/kernfs/mount.c:320
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006d0188 CR3: 00000001da40c005 CR4: 00000000001606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  sysfs_mount+0xc2/0x1c0 fs/sysfs/mount.c:36
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __list_del_entry include/linux/list.h:117 [inline]
>  list_del include/linux/list.h:125 [inline]
>  kernfs_kill_sb+0x9e/0x330 fs/kernfs/mount.c:361
>  mount_fs+0x66/0x2d0 fs/super.c:1222
>  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
>  sysfs_kill_sb+0x22/0x40 fs/sysfs/mount.c:50
>  vfs_kern_mount fs/namespace.c:2509 [inline]
>  do_new_mount fs/namespace.c:2512 [inline]
>  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
>  deactivate_locked_super+0x88/0xd0 fs/super.c:312
>  sget_userns+0xbda/0xe40 fs/super.c:522
>  SYSC_mount fs/namespace.c:3058 [inline]
>  SyS_mount+0xab/0x120 fs/namespace.c:3035
>  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>  kernfs_mount_ns+0x13d/0x8b0 fs/kernfs/mount.c:320
>  sysfs_mount+0xc2/0x1c0 fs/sysfs/mount.c:36
>  mount_fs+0x66/0x2d0 fs/super.c:1222
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
> RIP: 0033:0x442609
> RSP: 002b:00007fff40a278e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000442609
> RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000020000000
> RBP: 00007fff40a28190 R08: 00000000200002c0 R09: 0000000300000000
>  vfs_kern_mount fs/namespace.c:2509 [inline]
>  do_new_mount fs/namespace.c:2512 [inline]
>  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
> R13: 0000000000000003 R14: 0000000000001380 R15: 00007fff40a27a28
>  SYSC_mount fs/namespace.c:3058 [inline]
>  SyS_mount+0xab/0x120 fs/namespace.c:3035
>  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x442609
> RSP: 002b:00007fff40a278e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000442609
> RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000020000000
> RBP: 00007fff40a28190 R08: 00000000200002c0 R09: 0000000300000000
> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
> R13: 0000000000000003 R14: 0000000000001380 R15: 00007fff40a27a28
> Code: 00 00 00 00 ad de 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 89 da 48 39 c3 74 65 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df <80> 3c 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8
> RIP: __list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 RSP: ffff8801ae017658
> ---[ end trace b14d521943ecadbd ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-02 10:40 ` Tetsuo Handa
@ 2018-04-02 14:34   ` Al Viro
  2018-04-20  2:44     ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2018-04-02 14:34 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot, gregkh, kstewart, linux-kernel, linux-mm, pombredanne,
	syzkaller-bugs, tglx, linux-fsdevel

On Mon, Apr 02, 2018 at 07:40:22PM +0900, Tetsuo Handa wrote:

> That commit assumes that calling kill_sb() from deactivate_locked_super(s)
> without corresponding fill_super() is safe. We have so far crashed with
> rpc_mount() and kernfs_mount_ns(). Is that really safe?

	Consider the case when fill_super() returns an error immediately.
It is exactly the same situation.  And ->kill_sb() *is* called in cases
when fill_super() has failed.  Always had been - it's much less boilerplate
that way.

	deactivate_locked_super() on that failure exit is the least painful
variant, unfortunately.

	Filesystems with ->kill_sb() instances that rely upon something
done between sget() and the first failure exit after it need to be fixed.
And yes, that should've been spotted back then.  Sorry.

Fortunately, we don't have many of those - kill_{block,litter,anon}_super()
are safe and those are the majority.  Looking through the rest uncovers
some bugs; so far all I've seen were already there.  Note that normally
we have something like
static void affs_kill_sb(struct super_block *sb)
{
        struct affs_sb_info *sbi = AFFS_SB(sb);
        kill_block_super(sb);
        if (sbi) {
                affs_free_bitmap(sb);
                affs_brelse(sbi->s_root_bh);
                kfree(sbi->s_prefix);
                mutex_destroy(&sbi->s_bmlock);
                kfree(sbi);
        }
}
which basically does one of the safe ones augmented with something that
takes care *not* to assume that e.g. ->s_fs_info has been allocated.
Not everyone does, though:

jffs2_fill_super():
        c = kzalloc(sizeof(*c), GFP_KERNEL);
        if (!c)
                return -ENOMEM;
in the very beginning.  So we can return from it with NULL ->s_fs_info.
Now, consider
        struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
        if (!(sb->s_flags & MS_RDONLY))
                jffs2_stop_garbage_collect_thread(c);
in jffs2_kill_sb() and
void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c)
{
        int wait = 0;
        spin_lock(&c->erase_completion_lock);
        if (c->gc_task) {

IOW, fail that kzalloc() (or, indeed, an allocation in register_shrinker())
and eat an oops.  Always had been there, always hard to hit without
fault injectors and fortunately trivial to fix.

Similar in nfs_kill_super() calling nfs_free_server().
Similar in v9fs_kill_super() with v9fs_session_cancel()/v9fs_session_close() calls.
Similar in hypfs_kill_super(), afs_kill_super(), btrfs_kill_super(), cifs_kill_sb()
(all trivial to fix)

Aha... nfsd_umount() is a new regression.

orangefs: old, trivial to fix.

cgroup_kill_sb(): old, hopefully easy to fix.  Note that kernfs_root_from_sb()
can bloody well return NULL, making cgroup_root_from_kf() oops.  Always had been
there.

AFAICS, after discarding the instances that do the right thing we are left with:
hypfs_kill_super, rdt_kill_sb, v9fs_kill_super, afs_kill_super, btrfs_kill_super,
cifs_kill_sb, jffs2_kill_sb, nfs_kill_super, nfsd_umount, orangefs_kill_sb,
proc_kill_sb, sysfs_kill_sb, cgroup_kill_sb, rpc_kill_sb.

Out of those, nfsd_umount(), proc_kill_sb() and rpc_kill_sb() are regressions.
So are rdt_kill_sb() and sysfs_kill_sb() (victims of the issue you've spotted
in kernfs_kill_sb()).  The rest are old (and I wonder if syzbot had been
catching those - they are also dependent upon a specific allocation failing
at the right time).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-02 14:34   ` Al Viro
@ 2018-04-20  2:44     ` Eric Biggers
  2018-04-20  3:34       ` Eric Biggers
  2018-05-02 10:37       ` Tetsuo Handa
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Biggers @ 2018-04-20  2:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Tetsuo Handa, syzbot, gregkh, kstewart, linux-kernel, linux-mm,
	pombredanne, syzkaller-bugs, tglx, linux-fsdevel

On Mon, Apr 02, 2018 at 03:34:15PM +0100, Al Viro wrote:
> On Mon, Apr 02, 2018 at 07:40:22PM +0900, Tetsuo Handa wrote:
> 
> > That commit assumes that calling kill_sb() from deactivate_locked_super(s)
> > without corresponding fill_super() is safe. We have so far crashed with
> > rpc_mount() and kernfs_mount_ns(). Is that really safe?
> 
> 	Consider the case when fill_super() returns an error immediately.
> It is exactly the same situation.  And ->kill_sb() *is* called in cases
> when fill_super() has failed.  Always had been - it's much less boilerplate
> that way.
> 
> 	deactivate_locked_super() on that failure exit is the least painful
> variant, unfortunately.
> 
> 	Filesystems with ->kill_sb() instances that rely upon something
> done between sget() and the first failure exit after it need to be fixed.
> And yes, that should've been spotted back then.  Sorry.
> 
> Fortunately, we don't have many of those - kill_{block,litter,anon}_super()
> are safe and those are the majority.  Looking through the rest uncovers
> some bugs; so far all I've seen were already there.  Note that normally
> we have something like
> static void affs_kill_sb(struct super_block *sb)
> {
>         struct affs_sb_info *sbi = AFFS_SB(sb);
>         kill_block_super(sb);
>         if (sbi) {
>                 affs_free_bitmap(sb);
>                 affs_brelse(sbi->s_root_bh);
>                 kfree(sbi->s_prefix);
>                 mutex_destroy(&sbi->s_bmlock);
>                 kfree(sbi);
>         }
> }
> which basically does one of the safe ones augmented with something that
> takes care *not* to assume that e.g. ->s_fs_info has been allocated.
> Not everyone does, though:
> 
> jffs2_fill_super():
>         c = kzalloc(sizeof(*c), GFP_KERNEL);
>         if (!c)
>                 return -ENOMEM;
> in the very beginning.  So we can return from it with NULL ->s_fs_info.
> Now, consider
>         struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
>         if (!(sb->s_flags & MS_RDONLY))
>                 jffs2_stop_garbage_collect_thread(c);
> in jffs2_kill_sb() and
> void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c)
> {
>         int wait = 0;
>         spin_lock(&c->erase_completion_lock);
>         if (c->gc_task) {
> 
> IOW, fail that kzalloc() (or, indeed, an allocation in register_shrinker())
> and eat an oops.  Always had been there, always hard to hit without
> fault injectors and fortunately trivial to fix.
> 
> Similar in nfs_kill_super() calling nfs_free_server().
> Similar in v9fs_kill_super() with v9fs_session_cancel()/v9fs_session_close() calls.
> Similar in hypfs_kill_super(), afs_kill_super(), btrfs_kill_super(), cifs_kill_sb()
> (all trivial to fix)
> 
> Aha... nfsd_umount() is a new regression.
> 
> orangefs: old, trivial to fix.
> 
> cgroup_kill_sb(): old, hopefully easy to fix.  Note that kernfs_root_from_sb()
> can bloody well return NULL, making cgroup_root_from_kf() oops.  Always had been
> there.
> 
> AFAICS, after discarding the instances that do the right thing we are left with:
> hypfs_kill_super, rdt_kill_sb, v9fs_kill_super, afs_kill_super, btrfs_kill_super,
> cifs_kill_sb, jffs2_kill_sb, nfs_kill_super, nfsd_umount, orangefs_kill_sb,
> proc_kill_sb, sysfs_kill_sb, cgroup_kill_sb, rpc_kill_sb.
> 
> Out of those, nfsd_umount(), proc_kill_sb() and rpc_kill_sb() are regressions.
> So are rdt_kill_sb() and sysfs_kill_sb() (victims of the issue you've spotted
> in kernfs_kill_sb()).  The rest are old (and I wonder if syzbot had been
> catching those - they are also dependent upon a specific allocation failing
> at the right time).
> 

Fix for the kernfs bug is now queued in vfs/for-linus:

#syz fix: kernfs: deal with early sget() failures

syzkaller just recently (3 weeks ago) gained the ability to mount filesystem
images, so that's the main reason for the increase in filesystem bug reports.
Each time syzkaller is updated to cover more code, bugs are found.

- Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-20  2:44     ` Eric Biggers
@ 2018-04-20  3:34       ` Eric Biggers
  2018-04-20  5:29         ` Tetsuo Handa
  2018-05-02 10:37       ` Tetsuo Handa
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2018-04-20  3:34 UTC (permalink / raw)
  To: Al Viro
  Cc: Tetsuo Handa, syzbot, gregkh, kstewart, linux-kernel, linux-mm,
	pombredanne, syzkaller-bugs, tglx, linux-fsdevel

On Thu, Apr 19, 2018 at 07:44:40PM -0700, Eric Biggers wrote:
> On Mon, Apr 02, 2018 at 03:34:15PM +0100, Al Viro wrote:
> > On Mon, Apr 02, 2018 at 07:40:22PM +0900, Tetsuo Handa wrote:
> > 
> > > That commit assumes that calling kill_sb() from deactivate_locked_super(s)
> > > without corresponding fill_super() is safe. We have so far crashed with
> > > rpc_mount() and kernfs_mount_ns(). Is that really safe?
> > 
> > 	Consider the case when fill_super() returns an error immediately.
> > It is exactly the same situation.  And ->kill_sb() *is* called in cases
> > when fill_super() has failed.  Always had been - it's much less boilerplate
> > that way.
> > 
> > 	deactivate_locked_super() on that failure exit is the least painful
> > variant, unfortunately.
> > 
> > 	Filesystems with ->kill_sb() instances that rely upon something
> > done between sget() and the first failure exit after it need to be fixed.
> > And yes, that should've been spotted back then.  Sorry.
> > 
> > Fortunately, we don't have many of those - kill_{block,litter,anon}_super()
> > are safe and those are the majority.  Looking through the rest uncovers
> > some bugs; so far all I've seen were already there.  Note that normally
> > we have something like
> > static void affs_kill_sb(struct super_block *sb)
> > {
> >         struct affs_sb_info *sbi = AFFS_SB(sb);
> >         kill_block_super(sb);
> >         if (sbi) {
> >                 affs_free_bitmap(sb);
> >                 affs_brelse(sbi->s_root_bh);
> >                 kfree(sbi->s_prefix);
> >                 mutex_destroy(&sbi->s_bmlock);
> >                 kfree(sbi);
> >         }
> > }
> > which basically does one of the safe ones augmented with something that
> > takes care *not* to assume that e.g. ->s_fs_info has been allocated.
> > Not everyone does, though:
> > 
> > jffs2_fill_super():
> >         c = kzalloc(sizeof(*c), GFP_KERNEL);
> >         if (!c)
> >                 return -ENOMEM;
> > in the very beginning.  So we can return from it with NULL ->s_fs_info.
> > Now, consider
> >         struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
> >         if (!(sb->s_flags & MS_RDONLY))
> >                 jffs2_stop_garbage_collect_thread(c);
> > in jffs2_kill_sb() and
> > void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c)
> > {
> >         int wait = 0;
> >         spin_lock(&c->erase_completion_lock);
> >         if (c->gc_task) {
> > 
> > IOW, fail that kzalloc() (or, indeed, an allocation in register_shrinker())
> > and eat an oops.  Always had been there, always hard to hit without
> > fault injectors and fortunately trivial to fix.
> > 
> > Similar in nfs_kill_super() calling nfs_free_server().
> > Similar in v9fs_kill_super() with v9fs_session_cancel()/v9fs_session_close() calls.
> > Similar in hypfs_kill_super(), afs_kill_super(), btrfs_kill_super(), cifs_kill_sb()
> > (all trivial to fix)
> > 
> > Aha... nfsd_umount() is a new regression.
> > 
> > orangefs: old, trivial to fix.
> > 
> > cgroup_kill_sb(): old, hopefully easy to fix.  Note that kernfs_root_from_sb()
> > can bloody well return NULL, making cgroup_root_from_kf() oops.  Always had been
> > there.
> > 
> > AFAICS, after discarding the instances that do the right thing we are left with:
> > hypfs_kill_super, rdt_kill_sb, v9fs_kill_super, afs_kill_super, btrfs_kill_super,
> > cifs_kill_sb, jffs2_kill_sb, nfs_kill_super, nfsd_umount, orangefs_kill_sb,
> > proc_kill_sb, sysfs_kill_sb, cgroup_kill_sb, rpc_kill_sb.
> > 
> > Out of those, nfsd_umount(), proc_kill_sb() and rpc_kill_sb() are regressions.
> > So are rdt_kill_sb() and sysfs_kill_sb() (victims of the issue you've spotted
> > in kernfs_kill_sb()).  The rest are old (and I wonder if syzbot had been
> > catching those - they are also dependent upon a specific allocation failing
> > at the right time).
> > 
> 
> Fix for the kernfs bug is now queued in vfs/for-linus:
> 
> #syz fix: kernfs: deal with early sget() failures
> 

But, there is still a related bug: when mounting sysfs, if register_shrinker()
fails in sget_userns(), then kernfs_kill_sb() gets called, which frees the
'struct kernfs_super_info'.  But, the 'struct kernfs_super_info' is also freed
in kernfs_mount_ns() by:

        sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
                         &init_user_ns, info);
        if (IS_ERR(sb) || sb->s_fs_info != info)
                kfree(info);
        if (IS_ERR(sb))
                return ERR_CAST(sb);

I guess the problem is that sget_userns() shouldn't take ownership of the 'info'
if it returns an error -- but, it actually does if register_shrinker() fails,
resulting in a double free.

Here is a reproducer and the KASAN splat.  This is on Linus' tree (87ef12027b9b)
with vfs/for-linus merged in.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
        int fd, i;
        char buf[16];

        unshare(CLONE_NEWNET);
        system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
        system("echo 0 | tee /sys/kernel/debug/fail*/verbose");
        fd = open("/proc/thread-self/fail-nth", O_WRONLY);
        for (i = 0; ; i++) {
                write(fd, buf, sprintf(buf, "%d", i));
                mount("sysfs", "mnt", "sysfs", 0, NULL);
                umount("mnt");
        }
}

==================================================================
BUG: KASAN: double-free or invalid-free in kernfs_mount_ns+0x5eb/0x7d0 fs/kernfs/mount.c:324

CPU: 12 PID: 268 Comm: syz_kernfs Not tainted 4.17.0-rc1-00038-gb25f027a33d42 #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x7c/0xbb lib/dump_stack.c:113
 print_address_description+0x73/0x290 mm/kasan/report.c:256
 kasan_report_invalid_free+0x60/0xa0 mm/kasan/report.c:336
 __kasan_slab_free+0x1d5/0x220 mm/kasan/kasan.c:501
 __cache_free mm/slab.c:3498 [inline]
 kfree+0x8a/0xd0 mm/slab.c:3813
 kernfs_mount_ns+0x5eb/0x7d0 fs/kernfs/mount.c:324
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7febbf5877ea
RSP: 002b:00007ffee25976a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000560d99d26af4 RCX: 00007febbf5877ea
RDX: 0000560d99d26aee RSI: 0000560d99d26af4 RDI: 0000560d99d26aee
RBP: 0000560d99d26aee R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffee25976b0
R13: 0000560d99d26aeb R14: 0000000000000003 R15: 000000000000000c

Allocated by task 268:
 kmem_cache_alloc_trace include/linux/slab.h:415 [inline]
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 kernfs_mount_ns+0x71/0x7d0 fs/kernfs/mount.c:313
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 268:
 __cache_free mm/slab.c:3498 [inline]
 kfree+0x8a/0xd0 mm/slab.c:3813
 sysfs_kill_sb+0x15/0x30 fs/sysfs/mount.c:50
 deactivate_locked_super+0x80/0xc0 fs/super.c:313
 sget_userns+0x90c/0xb50 fs/super.c:523
 kernfs_mount_ns+0x134/0x7d0 fs/kernfs/mount.c:321
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88006a6e6480
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff88006a6e6480, ffff88006a6e64c0)
The buggy address belongs to the page:
page:ffffea0001748250 count:1 mapcount:0 mapping:ffff88006a6e6000 index:0xffff88006a6e6f80
flags: 0x100000000000100(slab)
raw: 0100000000000100 ffff88006a6e6000 ffff88006a6e6f80 0000000100000010
raw: ffff88006d401338 ffff88006d401338 ffff88006d400200
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88006a6e6380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88006a6e6400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88006a6e6480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff88006a6e6500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88006a6e6580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-20  3:34       ` Eric Biggers
@ 2018-04-20  5:29         ` Tetsuo Handa
  2018-04-20  7:31           ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2018-04-20  5:29 UTC (permalink / raw)
  To: Eric Biggers, Michal Hocko
  Cc: Al Viro, Tetsuo Handa, syzbot, gregkh, kstewart, linux-kernel,
	linux-mm, pombredanne, syzkaller-bugs, tglx, linux-fsdevel

Eric Biggers wrote:
> But, there is still a related bug: when mounting sysfs, if register_shrinker()
> fails in sget_userns(), then kernfs_kill_sb() gets called, which frees the
> 'struct kernfs_super_info'.  But, the 'struct kernfs_super_info' is also freed
> in kernfs_mount_ns() by:
> 
>         sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
>                          &init_user_ns, info);
>         if (IS_ERR(sb) || sb->s_fs_info != info)
>                 kfree(info);
>         if (IS_ERR(sb))
>                 return ERR_CAST(sb);
> 
> I guess the problem is that sget_userns() shouldn't take ownership of the 'info'
> if it returns an error -- but, it actually does if register_shrinker() fails,
> resulting in a double free.
> 
> Here is a reproducer and the KASAN splat.  This is on Linus' tree (87ef12027b9b)
> with vfs/for-linus merged in.

I'm waiting for response from Michal Hocko regarding
http://lkml.kernel.org/r/201804111909.EGC64586.QSFLFJFOVHOOtM@I-love.SAKURA.ne.jp .

> 
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mount.h>
> #include <sys/stat.h>
> #include <unistd.h>
> 
> int main()
> {
>         int fd, i;
>         char buf[16];
> 
>         unshare(CLONE_NEWNET);
>         system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
>         system("echo 0 | tee /sys/kernel/debug/fail*/verbose");
>         fd = open("/proc/thread-self/fail-nth", O_WRONLY);
>         for (i = 0; ; i++) {
>                 write(fd, buf, sprintf(buf, "%d", i));
>                 mount("sysfs", "mnt", "sysfs", 0, NULL);
>                 umount("mnt");
>         }
> }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-20  5:29         ` Tetsuo Handa
@ 2018-04-20  7:31           ` Michal Hocko
  2018-04-20 17:55             ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2018-04-20  7:31 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Biggers, Al Viro, syzbot, gregkh, kstewart, linux-kernel,
	linux-mm, pombredanne, syzkaller-bugs, tglx, linux-fsdevel

On Fri 20-04-18 14:29:39, Tetsuo Handa wrote:
> Eric Biggers wrote:
> > But, there is still a related bug: when mounting sysfs, if register_shrinker()
> > fails in sget_userns(), then kernfs_kill_sb() gets called, which frees the
> > 'struct kernfs_super_info'.  But, the 'struct kernfs_super_info' is also freed
> > in kernfs_mount_ns() by:
> > 
> >         sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
> >                          &init_user_ns, info);
> >         if (IS_ERR(sb) || sb->s_fs_info != info)
> >                 kfree(info);
> >         if (IS_ERR(sb))
> >                 return ERR_CAST(sb);
> > 
> > I guess the problem is that sget_userns() shouldn't take ownership of the 'info'
> > if it returns an error -- but, it actually does if register_shrinker() fails,
> > resulting in a double free.
> > 
> > Here is a reproducer and the KASAN splat.  This is on Linus' tree (87ef12027b9b)
> > with vfs/for-linus merged in.
> 
> I'm waiting for response from Michal Hocko regarding
> http://lkml.kernel.org/r/201804111909.EGC64586.QSFLFJFOVHOOtM@I-love.SAKURA.ne.jp .

I didn't plan to respond util all the Al's concerns with the existing
scheme are resolved. This is not an urgent thing to fix so better fix it
properly. Your API change is kinda ugly so it would be preferable to do
it properly as suggested by Al. Maybe that will be more work but my
understanding is that the resulting code would be better. If that is not
the case then I do not really have any fundamental objection to your
patch except it is ugly.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-20  7:31           ` Michal Hocko
@ 2018-04-20 17:55             ` Eric Biggers
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Biggers @ 2018-04-20 17:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Al Viro, syzbot, gregkh, kstewart, linux-kernel,
	linux-mm, pombredanne, syzkaller-bugs, tglx, linux-fsdevel

On Fri, Apr 20, 2018 at 09:31:58AM +0200, Michal Hocko wrote:
> On Fri 20-04-18 14:29:39, Tetsuo Handa wrote:
> > Eric Biggers wrote:
> > > But, there is still a related bug: when mounting sysfs, if register_shrinker()
> > > fails in sget_userns(), then kernfs_kill_sb() gets called, which frees the
> > > 'struct kernfs_super_info'.  But, the 'struct kernfs_super_info' is also freed
> > > in kernfs_mount_ns() by:
> > > 
> > >         sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
> > >                          &init_user_ns, info);
> > >         if (IS_ERR(sb) || sb->s_fs_info != info)
> > >                 kfree(info);
> > >         if (IS_ERR(sb))
> > >                 return ERR_CAST(sb);
> > > 
> > > I guess the problem is that sget_userns() shouldn't take ownership of the 'info'
> > > if it returns an error -- but, it actually does if register_shrinker() fails,
> > > resulting in a double free.
> > > 
> > > Here is a reproducer and the KASAN splat.  This is on Linus' tree (87ef12027b9b)
> > > with vfs/for-linus merged in.
> > 
> > I'm waiting for response from Michal Hocko regarding
> > http://lkml.kernel.org/r/201804111909.EGC64586.QSFLFJFOVHOOtM@I-love.SAKURA.ne.jp .
> 
> I didn't plan to respond util all the Al's concerns with the existing
> scheme are resolved. This is not an urgent thing to fix so better fix it
> properly. Your API change is kinda ugly so it would be preferable to do
> it properly as suggested by Al. Maybe that will be more work but my
> understanding is that the resulting code would be better. If that is not
> the case then I do not really have any fundamental objection to your
> patch except it is ugly.

Okay, the fix was merged already as commit 8e04944f0ea8b8 ("mm,vmscan: Allow
preallocating memory for register_shrinker().").  Thanks Tetsuo!

- Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: general protection fault in kernfs_kill_sb
  2018-04-20  2:44     ` Eric Biggers
  2018-04-20  3:34       ` Eric Biggers
@ 2018-05-02 10:37       ` Tetsuo Handa
  1 sibling, 0 replies; 9+ messages in thread
From: Tetsuo Handa @ 2018-05-02 10:37 UTC (permalink / raw)
  To: Eric Biggers, Al Viro
  Cc: syzbot, gregkh, kstewart, linux-kernel, linux-mm, pombredanne,
	syzkaller-bugs, tglx, linux-fsdevel

On 2018/04/20 11:44, Eric Biggers wrote:
> Fix for the kernfs bug is now queued in vfs/for-linus:
> 
> #syz fix: kernfs: deal with early sget() failures

Well, the following patches

  rpc_pipefs: deal with early sget() failures
  kernfs: deal with early sget() failures
  procfs: deal with early sget() failures
  nfsd_umount(): deal with early sget() failures
  nfs: avoid double-free on early sget() failures

are dropped from vfs.git#for-linus while this report is marked
as "#syz fix: kernfs: deal with early sget() failures". The patch which
actually went to linux.git is 8e04944f0ea8b838.

#syz fix: mm,vmscan: Allow preallocating memory for register_shrinker().



By the way, we still have NULL pointer dereference (as of f2125992e7cb25ec
on linux.git) shown below due to calling deactivate_locked_super() without
successful fill_super().

----------
[  162.865231] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  162.873678] PGD 130487067 P4D 130487067 PUD 138750067 PMD 0 
[  162.879845] Oops: 0000 [#1] SMP
[  162.883295] Modules linked in:
[  162.886648] CPU: 2 PID: 15505 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #522
[  162.894891] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  162.899415] RIP: 0010:__list_del_entry_valid+0x29/0x90
[  162.901609] RSP: 0018:ffffc90001e07ce0 EFLAGS: 00010207
[  162.903834] RAX: 0000000000000000 RBX: ffff880132359580 RCX: dead000000000200
[  162.906825] RDX: 0000000000000000 RSI: 00000000e85efffd RDI: ffff880132359598
[  162.909863] RBP: ffffc90001e07ce0 R08: ffffffff815269c5 R09: 0000000000000004
[  162.912923] R10: ffffc90001e07ce0 R11: ffffffff840f2060 R12: ffff880134c6e000
[  162.915929] R13: ffff88013a014f00 R14: ffff880134c6e000 R15: ffff880132359580
[  162.918927] FS:  00007f6525b13740(0000) GS:ffff88013a680000(0000) knlGS:0000000000000000
[  162.922325] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  162.924751] CR2: 0000000000000000 CR3: 0000000134b2e006 CR4: 00000000000606e0
[  162.927236] Call Trace:
[  162.928153]  kernfs_kill_sb+0x2e/0x90
[  162.929377]  sysfs_kill_sb+0x22/0x40
[  162.930571]  deactivate_locked_super+0x50/0x90
[  162.931958]  kernfs_mount_ns+0x283/0x290
[  162.933126]  sysfs_mount+0x74/0xf0
[  162.934146]  mount_fs+0x46/0x1a0
[  162.935137]  vfs_kern_mount.part.28+0x67/0x190
[  162.936449]  do_mount+0x7b0/0x11f0
[  162.937473]  ? memdup_user+0x5e/0x90
[  162.938541]  ? copy_mount_options+0x1a4/0x2d0
[  162.939828]  ksys_mount+0xab/0x120
[  162.940954]  __x64_sys_mount+0x26/0x30
[  162.942153]  do_syscall_64+0x7b/0x260
[  162.943274]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  162.944903] RIP: 0033:0x7f6525640aaa
[  162.946012] RSP: 002b:00007ffc4f4e6f78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  162.948291] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007f6525640aaa
[  162.950396] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[  162.952480] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[  162.954545] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[  162.957760] R13: 00007ffc4f4e7080 R14: 0000000000000000 R15: 0000000000000000
[  162.963839] Code: 00 00 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5 48 39 c8 74 27 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2c <48> 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 
[  162.974795] RIP: __list_del_entry_valid+0x29/0x90 RSP: ffffc90001e07ce0
[  162.976752] CR2: 0000000000000000
----------

Below patch can avoid NULL pointer dereference at kernfs_kill_sb().

----------
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 26dd9a5..498c044 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -314,6 +314,7 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 
+	INIT_LIST_HEAD(&info->node);
 	info->root = root;
 	info->ns = ns;
 
----------

But there remains a refcount bug because deactivate_locked_super() from
kernfs_mount_ns() triggers kobj_ns_drop() from sysfs_kill_sb() via
sb->kill_sb() when kobj_ns_drop() is always called by sysfs_mount()
if kernfs_mount_ns() returned an error.

----------
 static void *net_grab_current_ns(void)
 {
        struct net *ns = current->nsproxy->net_ns;
 #ifdef CONFIG_NET_NS
        if (ns)
                refcount_inc(&ns->passive);
        if (ns && !strcmp(current->comm, "a.out"))
                printk("net_grab_current_ns: %px %d %d\n", ns,
                       refcount_read(&ns->passive), refcount_read(&ns->count));
 #endif
        return ns;
 }

 void net_drop_ns(void *p)
 {
        struct net *ns = p;
        if (ns && !strcmp(current->comm, "a.out")) {
                printk("net_drop_ns: %px %d %d\n", ns,
                       refcount_read(&ns->passive), refcount_read(&ns->count));
                dump_stack();
        }
        if (ns && refcount_dec_and_test(&ns->passive))
                net_free(ns);
 }
----------

----------
Normal case
[   79.283244] net_grab_current_ns: ffff88012e570080 2 1
[   79.299881] net_drop_ns: ffff88012e570080 2 1
[   79.303463] CPU: 0 PID: 15294 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #527
[   79.310509] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   79.316055] Call Trace:
[   79.317367]  dump_stack+0xe9/0x148
[   79.319154]  net_drop_ns+0xa1/0xb0
[   79.320903]  ? get_net_ns_by_id+0x170/0x170
[   79.323053]  kobj_ns_drop+0x61/0x70
[   79.324856]  sysfs_kill_sb+0x2f/0x40
[   79.326750]  deactivate_locked_super+0x50/0x90
[   79.329053]  deactivate_super+0x61/0x90
[   79.331032]  cleanup_mnt+0x49/0x90
[   79.332794]  __cleanup_mnt+0x16/0x20
[   79.334641]  task_work_run+0xb3/0xf0
[   79.336485]  exit_to_usermode_loop+0x152/0x160
[   79.338785]  do_syscall_64+0x237/0x260
[   79.340710]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[   79.357961] net_grab_current_ns: ffff88012e570080 2 1
[   79.360275] net_drop_ns: ffff88012e570080 2 1
[   79.362469] CPU: 0 PID: 15294 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #527
[   79.366436] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   79.369596] Call Trace:
[   79.370363]  dump_stack+0xe9/0x148
[   79.371444]  net_drop_ns+0xa1/0xb0
[   79.372504]  ? get_net_ns_by_id+0x170/0x170
[   79.373836]  kobj_ns_drop+0x61/0x70
[   79.374912]  sysfs_mount+0xd2/0xf0
[   79.375976]  ? lockdep_init_map+0x9/0x10
[   79.377343]  mount_fs+0x46/0x1a0
[   79.378365]  vfs_kern_mount.part.28+0x67/0x190
[   79.379850]  do_mount+0x7b0/0x11f0
[   79.381001]  ? memdup_user+0x5e/0x90
[   79.382213]  ? copy_mount_options+0x1a4/0x2d0
[   79.383514]  ksys_mount+0xab/0x120
[   79.384544]  __x64_sys_mount+0x26/0x30
[   79.385761]  do_syscall_64+0x7b/0x260
[   79.386942]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------

----------
Error case
[   79.664326] net_grab_current_ns: ffff88012e570080 2 1
[   79.666073] net_drop_ns: ffff88012e570080 2 1
[   79.667504] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #527
[   79.670197] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   79.673282] Call Trace:
[   79.674041]  dump_stack+0xe9/0x148
[   79.675064]  net_drop_ns+0xa1/0xb0
[   79.676085]  ? get_net_ns_by_id+0x170/0x170
[   79.677326]  kobj_ns_drop+0x61/0x70
[   79.678902]  sysfs_kill_sb+0x2f/0x40
[   79.680144]  deactivate_locked_super+0x50/0x90
[   79.681613]  kernfs_mount_ns+0x28f/0x2a0
[   79.682884]  sysfs_mount+0x74/0xf0
[   79.683906]  mount_fs+0x46/0x1a0
[   79.684879]  vfs_kern_mount.part.28+0x67/0x190
[   79.686193]  do_mount+0x7b0/0x11f0
[   79.687234]  ? memdup_user+0x5e/0x90
[   79.688305]  ? copy_mount_options+0x1a4/0x2d0
[   79.689592]  ksys_mount+0xab/0x120
[   79.690617]  __x64_sys_mount+0x26/0x30
[   79.691735]  do_syscall_64+0x7b/0x260
[   79.692833]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   79.694391] RIP: 0033:0x7fefaf3c4aaa
[   79.695744] RSP: 002b:00007ffe74af7fd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   79.698209] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fefaf3c4aaa
[   79.700386] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[   79.702472] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[   79.704552] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[   79.706624] R13: 00007ffe74af80e0 R14: 0000000000000000 R15: 0000000000000000
[   79.708802] net_drop_ns: ffff88012e570080 1 1
[   79.710317] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #527
[   79.713255] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   79.716704] Call Trace:
[   79.717463]  dump_stack+0xe9/0x148
[   79.718487]  net_drop_ns+0xa1/0xb0
[   79.719510]  ? get_net_ns_by_id+0x170/0x170
[   79.720771]  kobj_ns_drop+0x61/0x70
[   79.721818]  sysfs_mount+0xd2/0xf0
[   79.722840]  mount_fs+0x46/0x1a0
[   79.723815]  vfs_kern_mount.part.28+0x67/0x190
[   79.725178]  do_mount+0x7b0/0x11f0
[   79.726272]  ? memdup_user+0x5e/0x90
[   79.727361]  ? copy_mount_options+0x1a4/0x2d0
[   79.728811]  ksys_mount+0xab/0x120
[   79.730002]  __x64_sys_mount+0x26/0x30
[   79.731253]  do_syscall_64+0x7b/0x260
[   79.732477]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   79.734039] RIP: 0033:0x7fefaf3c4aaa
[   79.735136] RSP: 002b:00007ffe74af7fd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   79.737340] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fefaf3c4aaa
[   79.739427] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[   79.741576] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[   79.743771] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[   79.746102] R13: 00007ffe74af80e0 R14: 0000000000000000 R15: 0000000000000000
[   79.748604] ------------[ cut here ]------------
[   79.750409] ODEBUG: free active (active state 0) object type: timer_list hint: can_stat_update+0x0/0x3b0
[   79.753308] WARNING: CPU: 1 PID: 15294 at lib/debugobjects.c:329 debug_print_object+0x6a/0x90
[   79.755786] Modules linked in:
[   79.756812] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G                T 4.17.0-rc3+ #527
[   79.759725] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   79.763120] RIP: 0010:debug_print_object+0x6a/0x90
[   79.764684] RSP: 0018:ffffc900070a3ca0 EFLAGS: 00010086
[   79.766384] RAX: 0000000000000000 RBX: ffff88012fabea00 RCX: ffffffff8126085e
[   79.768482] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88013a6565b0
[   79.770537] RBP: ffffc900070a3cb8 R08: 0000000000000000 R09: 0000000000000001
[   79.772692] R10: ffffc900070a3c18 R11: ffff88012f350140 R12: ffffffff840c6e40
[   79.774883] R13: ffffffff83ca7711 R14: 0000000000000002 R15: ffff88012e5722c0
[   79.777012] FS:  00007fefaf897740(0000) GS:ffff88013a640000(0000) knlGS:0000000000000000
[   79.779762] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   79.779765] CR2: 00007f9542e78090 CR3: 000000012f336002 CR4: 00000000000606e0
[   79.779814] Call Trace:
[   79.779825]  debug_check_no_obj_freed+0x184/0x1ff
[   79.779831]  kmem_cache_free+0x228/0x280
[   79.779838]  net_drop_ns+0x74/0xb0
[   79.779845]  ? get_net_ns_by_id+0x170/0x170
[   79.790008]  kobj_ns_drop+0x61/0x70
[   79.791177]  sysfs_mount+0xd2/0xf0
[   79.792311]  mount_fs+0x46/0x1a0
[   79.793343]  vfs_kern_mount.part.28+0x67/0x190
[   79.794653]  do_mount+0x7b0/0x11f0
[   79.795796]  ? memdup_user+0x5e/0x90
[   79.796961]  ? copy_mount_options+0x1a4/0x2d0
[   79.798365]  ksys_mount+0xab/0x120
[   79.799502]  __x64_sys_mount+0x26/0x30
[   79.800780]  do_syscall_64+0x7b/0x260
[   79.801886]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------

Since sysfs_mount() is the only user who calls kernfs_mount_ns() with ns != NULL,
is it OK to do sysfs specific hack shown below? Or, should we avoid calling
deactivate_locked_super() when kernfs_fill_super() failed?

----------
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 26dd9a5..498c044 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -332,6 +333,9 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 
 		error = kernfs_fill_super(sb, magic);
 		if (error) {
+			/* Avoid double kobj_ns_drop(KOBJ_NS_TYPE_NET, ns) */
+			if (ns)
+				kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
 			deactivate_locked_super(sb);
 			return ERR_PTR(error);
 		}
----------

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-02 10:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-01 17:01 general protection fault in kernfs_kill_sb syzbot
2018-04-02 10:40 ` Tetsuo Handa
2018-04-02 14:34   ` Al Viro
2018-04-20  2:44     ` Eric Biggers
2018-04-20  3:34       ` Eric Biggers
2018-04-20  5:29         ` Tetsuo Handa
2018-04-20  7:31           ` Michal Hocko
2018-04-20 17:55             ` Eric Biggers
2018-05-02 10:37       ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.