From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ebiggers3@gmail.com>
ARC-Seal: i=1; a=rsa-sha256; t=1524195222; cv=none;
        d=google.com; s=arc-20160816;
        b=Y9wQou4OO7g2H4/FsbXDvepCFy0lqRmdP4PaBSOOTHAiIrFq5v0WSBq05zY+z5c26A
         5PXmvHy0n+jnfyg5w/Uye7Aie1x2HRJl2GUmY+UF6WRfjMs2FxwzzSkFpIHztri6TXqr
         6nVimHDHY7mKpEBN0wF86m6aTcmZZLFVefQsWrk3xdMbHzsBbEpqDzcyD8QjOvPxituM
         hp1R91Q0+z5NGNh6BGHvpC1Q7jlR6sJCRACAGHXJM3TfWJR+UITYkYOCruPdLRRXMcA7
         LOrXZJrJZ2/vKSNP6TATRmMqzDTwTzOOrFLCBv3KB7pEpjexqzhQ1ABhwagj97FCgyJn
         SrDw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=user-agent:in-reply-to:content-disposition:mime-version:references
         :message-id:subject:cc:to:from:date:dkim-signature
         :arc-authentication-results;
        bh=uxgOtFNOuDRi8bxNm2z3w9+HIqOaLmthob4asuOo6ww=;
        b=AoBbBD/h06RCEc0hSqUqY2GlLSZNMCeSqI3e/OXR/W/my3zgU3pMP5CGtuO15OE9G+
         j3Yz8mP4rZ2/gAJpxqyCnOaJD/U7bfQGb9xT3d9Iz/myV3X2ofh/UYu0OIZzxt7YNRYk
         sge83eFHvrCrjpiJo3giEvvIi3sD7WDFAuBXBUn8a/QVRlvSkEzxqC1nl/4m1ya8xJTm
         slMBkeVFKoxmcD8FNrusBdCteHeoQ0X/3qMTI1aN+SyeAvmuyEJ9PPmm2dLjDDyyAem1
         +IfaIf+LNMyWGgaYzFeUUFNj2jOXKlEyelhYXDQnUb/fj+H/8S2g5zMHwQUrwNYfNX5k
         dshQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20161025 header.b=ZohXk2gV;
       spf=pass (google.com: domain of ebiggers3@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=ebiggers3@gmail.com;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20161025 header.b=ZohXk2gV;
       spf=pass (google.com: domain of ebiggers3@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=ebiggers3@gmail.com;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
X-Google-Smtp-Source: AIpwx49+H33BRUR7LBDGq9a13FjU1KFEEVYISOqJ2CCiist/es1RS5JlzTRARV5JWNCTFjkBsnoKMw==
Date: Thu, 19 Apr 2018 20:34:50 -0700
From: Eric Biggers <ebiggers3@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	syzbot <syzbot+151de3f2be6b40ac8026@syzkaller.appspotmail.com>,
	gregkh@linuxfoundation.org, kstewart@linuxfoundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	pombredanne@nexb.com, syzkaller-bugs@googlegroups.com,
	tglx@linutronix.de, linux-fsdevel@vger.kernel.org
Subject: Re: general protection fault in kernfs_kill_sb
Message-ID: <20180420033450.GC686@sol.localdomain>
References: <94eb2c0546040ebb4d0568cc6bdb@google.com>
 <821c80d2-0b55-287a-09aa-d004f4ac4215@I-love.SAKURA.ne.jp>
 <20180402143415.GC30522@ZenIV.linux.org.uk>
 <20180420024440.GB686@sol.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180420024440.GB686@sol.localdomain>
User-Agent: Mutt/1.9.4 (2018-02-28)
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: =?utf-8?q?1596563980561316248?=
X-GMAIL-MSGID: =?utf-8?q?1598234529216763708?=
X-Mailing-List: linux-kernel@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>

On Thu, Apr 19, 2018 at 07:44:40PM -0700, Eric Biggers wrote:
> On Mon, Apr 02, 2018 at 03:34:15PM +0100, Al Viro wrote:
> > On Mon, Apr 02, 2018 at 07:40:22PM +0900, Tetsuo Handa wrote:
> > 
> > > That commit assumes that calling kill_sb() from deactivate_locked_super(s)
> > > without corresponding fill_super() is safe. We have so far crashed with
> > > rpc_mount() and kernfs_mount_ns(). Is that really safe?
> > 
> > 	Consider the case when fill_super() returns an error immediately.
> > It is exactly the same situation.  And ->kill_sb() *is* called in cases
> > when fill_super() has failed.  Always had been - it's much less boilerplate
> > that way.
> > 
> > 	deactivate_locked_super() on that failure exit is the least painful
> > variant, unfortunately.
> > 
> > 	Filesystems with ->kill_sb() instances that rely upon something
> > done between sget() and the first failure exit after it need to be fixed.
> > And yes, that should've been spotted back then.  Sorry.
> > 
> > Fortunately, we don't have many of those - kill_{block,litter,anon}_super()
> > are safe and those are the majority.  Looking through the rest uncovers
> > some bugs; so far all I've seen were already there.  Note that normally
> > we have something like
> > static void affs_kill_sb(struct super_block *sb)
> > {
> >         struct affs_sb_info *sbi = AFFS_SB(sb);
> >         kill_block_super(sb);
> >         if (sbi) {
> >                 affs_free_bitmap(sb);
> >                 affs_brelse(sbi->s_root_bh);
> >                 kfree(sbi->s_prefix);
> >                 mutex_destroy(&sbi->s_bmlock);
> >                 kfree(sbi);
> >         }
> > }
> > which basically does one of the safe ones augmented with something that
> > takes care *not* to assume that e.g. ->s_fs_info has been allocated.
> > Not everyone does, though:
> > 
> > jffs2_fill_super():
> >         c = kzalloc(sizeof(*c), GFP_KERNEL);
> >         if (!c)
> >                 return -ENOMEM;
> > in the very beginning.  So we can return from it with NULL ->s_fs_info.
> > Now, consider
> >         struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
> >         if (!(sb->s_flags & MS_RDONLY))
> >                 jffs2_stop_garbage_collect_thread(c);
> > in jffs2_kill_sb() and
> > void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c)
> > {
> >         int wait = 0;
> >         spin_lock(&c->erase_completion_lock);
> >         if (c->gc_task) {
> > 
> > IOW, fail that kzalloc() (or, indeed, an allocation in register_shrinker())
> > and eat an oops.  Always had been there, always hard to hit without
> > fault injectors and fortunately trivial to fix.
> > 
> > Similar in nfs_kill_super() calling nfs_free_server().
> > Similar in v9fs_kill_super() with v9fs_session_cancel()/v9fs_session_close() calls.
> > Similar in hypfs_kill_super(), afs_kill_super(), btrfs_kill_super(), cifs_kill_sb()
> > (all trivial to fix)
> > 
> > Aha... nfsd_umount() is a new regression.
> > 
> > orangefs: old, trivial to fix.
> > 
> > cgroup_kill_sb(): old, hopefully easy to fix.  Note that kernfs_root_from_sb()
> > can bloody well return NULL, making cgroup_root_from_kf() oops.  Always had been
> > there.
> > 
> > AFAICS, after discarding the instances that do the right thing we are left with:
> > hypfs_kill_super, rdt_kill_sb, v9fs_kill_super, afs_kill_super, btrfs_kill_super,
> > cifs_kill_sb, jffs2_kill_sb, nfs_kill_super, nfsd_umount, orangefs_kill_sb,
> > proc_kill_sb, sysfs_kill_sb, cgroup_kill_sb, rpc_kill_sb.
> > 
> > Out of those, nfsd_umount(), proc_kill_sb() and rpc_kill_sb() are regressions.
> > So are rdt_kill_sb() and sysfs_kill_sb() (victims of the issue you've spotted
> > in kernfs_kill_sb()).  The rest are old (and I wonder if syzbot had been
> > catching those - they are also dependent upon a specific allocation failing
> > at the right time).
> > 
> 
> Fix for the kernfs bug is now queued in vfs/for-linus:
> 
> #syz fix: kernfs: deal with early sget() failures
> 

But, there is still a related bug: when mounting sysfs, if register_shrinker()
fails in sget_userns(), then kernfs_kill_sb() gets called, which frees the
'struct kernfs_super_info'.  But, the 'struct kernfs_super_info' is also freed
in kernfs_mount_ns() by:

        sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
                         &init_user_ns, info);
        if (IS_ERR(sb) || sb->s_fs_info != info)
                kfree(info);
        if (IS_ERR(sb))
                return ERR_CAST(sb);

I guess the problem is that sget_userns() shouldn't take ownership of the 'info'
if it returns an error -- but, it actually does if register_shrinker() fails,
resulting in a double free.

Here is a reproducer and the KASAN splat.  This is on Linus' tree (87ef12027b9b)
with vfs/for-linus merged in.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
        int fd, i;
        char buf[16];

        unshare(CLONE_NEWNET);
        system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
        system("echo 0 | tee /sys/kernel/debug/fail*/verbose");
        fd = open("/proc/thread-self/fail-nth", O_WRONLY);
        for (i = 0; ; i++) {
                write(fd, buf, sprintf(buf, "%d", i));
                mount("sysfs", "mnt", "sysfs", 0, NULL);
                umount("mnt");
        }
}

==================================================================
BUG: KASAN: double-free or invalid-free in kernfs_mount_ns+0x5eb/0x7d0 fs/kernfs/mount.c:324

CPU: 12 PID: 268 Comm: syz_kernfs Not tainted 4.17.0-rc1-00038-gb25f027a33d42 #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x7c/0xbb lib/dump_stack.c:113
 print_address_description+0x73/0x290 mm/kasan/report.c:256
 kasan_report_invalid_free+0x60/0xa0 mm/kasan/report.c:336
 __kasan_slab_free+0x1d5/0x220 mm/kasan/kasan.c:501
 __cache_free mm/slab.c:3498 [inline]
 kfree+0x8a/0xd0 mm/slab.c:3813
 kernfs_mount_ns+0x5eb/0x7d0 fs/kernfs/mount.c:324
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7febbf5877ea
RSP: 002b:00007ffee25976a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000560d99d26af4 RCX: 00007febbf5877ea
RDX: 0000560d99d26aee RSI: 0000560d99d26af4 RDI: 0000560d99d26aee
RBP: 0000560d99d26aee R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffee25976b0
R13: 0000560d99d26aeb R14: 0000000000000003 R15: 000000000000000c

Allocated by task 268:
 kmem_cache_alloc_trace include/linux/slab.h:415 [inline]
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 kernfs_mount_ns+0x71/0x7d0 fs/kernfs/mount.c:313
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 268:
 __cache_free mm/slab.c:3498 [inline]
 kfree+0x8a/0xd0 mm/slab.c:3813
 sysfs_kill_sb+0x15/0x30 fs/sysfs/mount.c:50
 deactivate_locked_super+0x80/0xc0 fs/super.c:313
 sget_userns+0x90c/0xb50 fs/super.c:523
 kernfs_mount_ns+0x134/0x7d0 fs/kernfs/mount.c:321
 sysfs_mount+0xa2/0x170 fs/sysfs/mount.c:36
 mount_fs+0x38/0x1b0 fs/super.c:1268
 vfs_kern_mount.part.10+0x53/0x3a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x35c/0x2560 fs/namespace.c:2847
 ksys_mount+0x7b/0xd0 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xb5/0x150 fs/namespace.c:3074
 do_syscall_64+0x90/0x3f0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88006a6e6480
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff88006a6e6480, ffff88006a6e64c0)
The buggy address belongs to the page:
page:ffffea0001748250 count:1 mapcount:0 mapping:ffff88006a6e6000 index:0xffff88006a6e6f80
flags: 0x100000000000100(slab)
raw: 0100000000000100 ffff88006a6e6000 ffff88006a6e6f80 0000000100000010
raw: ffff88006d401338 ffff88006d401338 ffff88006d400200
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88006a6e6380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88006a6e6400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88006a6e6480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff88006a6e6500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88006a6e6580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================