All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING in kill_block_super
@ 2018-03-29 23:01 syzbot
  2018-04-04 10:53   ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2018-03-29 23:01 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot hit the following crash on upstream commit
0b412605ef5f5c64b31f19e2910b1d5eba9929c3 (Thu Mar 29 01:07:23 2018 +0000)
Merge tag 'drm-fixes-for-v4.16-rc8' of  
git://people.freedesktop.org/~airlied/linux
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=5a170e19c963a2e0df79

So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6168466046320640
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4851040260194304
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5936700618440704
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

RAX: ffffffffffffffda RBX: 00007ffed0bc62a0 RCX: 0000000000442e79
RDX: 00000000004a4d6d RSI: 0000000020001740 RDI: 0000000020001780
RBP: 0000000000000000 R08: 00007ffed0bc62a0 R09: ffffffffffffffff
R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
R13: 0000000000000008 R14: 0000000000001380 R15: 00007ffed0bc5b38
WARNING: CPU: 1 PID: 4445 at fs/super.c:1148 kill_block_super+0xc9/0xf0  
fs/super.c:1148
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4445 Comm: syzkaller195321 Not tainted 4.16.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  panic+0x1e4/0x41c kernel/panic.c:183
  __warn+0x1dc/0x200 kernel/panic.c:547
  report_bug+0x1f4/0x2b0 lib/bug.c:186
  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
  fixup_bug arch/x86/kernel/traps.c:247 [inline]
  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:kill_block_super+0xc9/0xf0 fs/super.c:1148
RSP: 0018:ffff8801aecc77a0 EFLAGS: 00010293
RAX: ffff8801af180040 RBX: ffff8801aede22c0 RCX: ffffffff81b31649
RDX: 0000000000000000 RSI: ffff8801b9a76478 RDI: ffff8801d5527ba0
RBP: ffff8801aecc77b8 R08: 0000000000000000 R09: 1ffff10035d98ea5
R10: ffff8801aecc7568 R11: 0000000000000001 R12: ffff8801d5527800
R13: 0000000000000000 R14: ffff8801aecc7a10 R15: ffff8801aede22c0
  fuse_kill_sb_blk+0x90/0xb0 fs/fuse/inode.c:1232
  deactivate_locked_super+0x88/0xd0 fs/super.c:312
  sget_userns+0xbda/0xe40 fs/super.c:522
  sget+0xd2/0x120 fs/super.c:557
  mount_bdev+0x109/0x370 fs/super.c:1092
  fuse_mount_blk+0x34/0x40 fs/fuse/inode.c:1219
  mount_fs+0x66/0x2d0 fs/super.c:1222
  vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
  vfs_kern_mount fs/namespace.c:2509 [inline]
  do_new_mount fs/namespace.c:2512 [inline]
  do_mount+0xea4/0x2bb0 fs/namespace.c:2842
  SYSC_mount fs/namespace.c:3058 [inline]
  SyS_mount+0xab/0x120 fs/namespace.c:3035
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x442e79
RSP: 002b:00007ffed0bc59e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007ffed0bc62a0 RCX: 0000000000442e79
RDX: 00000000004a4d6d RSI: 0000000020001740 RDI: 0000000020001780
RBP: 0000000000000000 R08: 00007ffed0bc62a0 R09: ffffffffffffffff
R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
R13: 0000000000000008 R14: 0000000000001380 R15: 00007ffed0bc5b38
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
  2018-03-29 23:01 WARNING in kill_block_super syzbot
@ 2018-04-04 10:53   ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-04 10:53 UTC (permalink / raw)
  To: viro, Michal Hocko
  Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, linux-mm,
	Dmitry Vyukov

Al and Michal, are you OK with this patch?


>From bbc0d00935ebcb7e287403bae545fae9340830d9 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 4 Apr 2018 12:19:42 +0900
Subject: [PATCH] mm,vmscan: Allow preallocating memory for register_shrinker().

syzbot is catching so many bugs triggered by commit 9ee332d99e4d5a97
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe, but the reality was different; some callers assign
attributes which are needed for kill_sb() after sget() succeeds.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler. Let's avoid
calling deactivate_locked_super() from sget_userns() by preallocating
memory for the shrinker and making register_shrinker() in sget_userns()
never fail.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
---
 fs/super.c               |  9 ++++-----
 include/linux/shrinker.h |  6 ++++--
 mm/vmscan.c              | 15 ++++++++++++++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 672538c..db00f67 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
 	security_sb_free(s);
 	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
+	kfree(s->s_shrink.nr_deferred);
 	/* no delays needed */
 	destroy_super_work(&s->destroy_work);
 }
@@ -251,6 +252,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
+	if (prepare_shrinker(&s->s_shrink))
+		goto fail;
 	return s;
 
 fail:
@@ -517,11 +520,7 @@ struct super_block *sget_userns(struct file_system_type *type,
 	hlist_add_head(&s->s_instances, &type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(type);
-	err = register_shrinker(&s->s_shrink);
-	if (err) {
-		deactivate_locked_super(s);
-		s = ERR_PTR(err);
-	}
+	register_shrinker_prepared(&s->s_shrink);
 	return s;
 }
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 388ff29..2728918 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -75,6 +75,8 @@ struct shrinker {
 #define SHRINKER_NUMA_AWARE	(1 << 0)
 #define SHRINKER_MEMCG_AWARE	(1 << 1)
 
-extern int register_shrinker(struct shrinker *);
-extern void unregister_shrinker(struct shrinker *);
+extern int prepare_shrinker(struct shrinker *shrinker);
+extern void register_shrinker_prepared(struct shrinker *shrinker);
+extern int register_shrinker(struct shrinker *shrinker);
+extern void unregister_shrinker(struct shrinker *shrinker);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cd5dc3f..a10fe8e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -258,7 +258,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
 /*
  * Add a shrinker callback to be called from the vm.
  */
-int register_shrinker(struct shrinker *shrinker)
+int prepare_shrinker(struct shrinker *shrinker)
 {
 	size_t size = sizeof(*shrinker->nr_deferred);
 
@@ -268,10 +268,23 @@ int register_shrinker(struct shrinker *shrinker)
 	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
 	if (!shrinker->nr_deferred)
 		return -ENOMEM;
+	return 0;
+}
 
+void register_shrinker_prepared(struct shrinker *shrinker)
+{
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	up_write(&shrinker_rwsem);
+}
+
+int register_shrinker(struct shrinker *shrinker)
+{
+	int err = prepare_shrinker(shrinker);
+
+	if (err)
+		return err;
+	register_shrinker_prepared(shrinker);
 	return 0;
 }
 EXPORT_SYMBOL(register_shrinker);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
@ 2018-04-04 10:53   ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-04 10:53 UTC (permalink / raw)
  To: viro, Michal Hocko
  Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, linux-mm,
	Dmitry Vyukov

Al and Michal, are you OK with this patch?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
  2018-04-04 10:53   ` Tetsuo Handa
  (?)
@ 2018-04-06  8:09   ` Michal Hocko
  2018-04-07  5:55     ` Tetsuo Handa
  -1 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2018-04-06  8:09 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: viro, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs,
	linux-mm, Dmitry Vyukov

On Wed 04-04-18 19:53:07, Tetsuo Handa wrote:
> Al and Michal, are you OK with this patch?

Maybe I've misunderstood, but hasn't Al explained [1] that the
appropriate fix is in the fs code?

[1] http://lkml.kernel.org/r/20180402143415.GC30522@ZenIV.linux.org.uk
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
  2018-04-06  8:09   ` Michal Hocko
@ 2018-04-07  5:55     ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-07  5:55 UTC (permalink / raw)
  To: mhocko
  Cc: viro, syzbot+5a170e19c963a2e0df79, linux-fsdevel, linux-kernel,
	syzkaller-bugs, linux-mm, dvyukov

Michal Hocko wrote:
> On Wed 04-04-18 19:53:07, Tetsuo Handa wrote:
> > Al and Michal, are you OK with this patch?
> 
> Maybe I've misunderstood, but hasn't Al explained [1] that the
> appropriate fix is in the fs code?
> 
> [1] http://lkml.kernel.org/r/20180402143415.GC30522@ZenIV.linux.org.uk

Yes. But I wonder whether it worth complicating sget() only for handling
kmalloc() failure.

----------------------------------------
static struct file_system_type fuseblk_fs_type = {
  .owner          = THIS_MODULE,
  .name           = "fuseblk",
  .mount          = fuse_mount_blk,
  .kill_sb        = fuse_kill_sb_blk,
  .fs_flags       = FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
};

static struct dentry *fuse_mount_blk(struct file_system_type *fs_type, int flags, const char *dev_name, void *raw_data) {
  return mount_bdev(fs_type, flags, dev_name, raw_data, fuse_fill_super) {
    fmode_t mode = FMODE_READ | FMODE_EXCL;
    if (!(flags & MS_RDONLY)) mode |= FMODE_WRITE;
    s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC, bdev) {
      return sget_userns(type, test, set, flags, user_ns, data) {
        s = alloc_super(type, (flags & ~MS_SUBMOUNT), user_ns);
        err = register_shrinker(&s->s_shrink);
        if (err) {
          deactivate_locked_super(s) {
            fs->kill_sb(s) = fuse_kill_sb_blk(s) {
              kill_block_super(sb) {
                struct block_device *bdev = sb->s_bdev;
                fmode_t mode = sb->s_mode;
                WARN_ON_ONCE(!(mode & FMODE_EXCL)); // <= Unsafe because FMODE_EXCL is not yet set which will be set at
                blkdev_put(bdev, mode | FMODE_EXCL);
              }
            }
          }
          s = ERR_PTR(err);
        }
      }
    }
    /* If sget() succeeds then ... */
    s->s_mode = mode;                               // <= this location.
    error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
    if (error) {
      deactivate_locked_super(s) {
        fs->kill_sb(s) = fuse_kill_sb_blk(s) {
          kill_block_super(sb) {
            struct block_device *bdev = sb->s_bdev;
            fmode_t mode = sb->s_mode;
            WARN_ON_ONCE(!(mode & FMODE_EXCL));     // <= Safe because FMODE_EXCL already set.
            blkdev_put(bdev, mode | FMODE_EXCL);
          }
        }
      }
      goto error;
    }
    /* If sget() fails then ... */
    error = PTR_ERR(s);
    blkdev_put(bdev, mode);                         // <= Calls blkdev_put() after deactivate_locked_super() already called blkdev_put().
  }
}
----------------------------------------

mount_bdev() is not ready to call blkdev_put() from sget().
Do we want to pass "s->s_mode" to sget() which allocates "s" ?

I feel it is preposterous that a function which allocates memory for an object
requires some of fields being already initialized in order to call a destroy
function.

By splitting register_shrinker() into prepare_shrinker() which might fail and
register_shrinker_prepared() which will not fail, we can allow shrinker users
to allocate memory at object creation time. I wrote a patch which adds
__must_check to register_shrinker() and we keep that patch in linux-next.git,
but what we got is a fake change which do not implement proper error handling
(e.g.

  Commit 6c4ca1e36cdc1a0a ("bcache: check return value of register_shrinker")

        if (register_shrinker(&c->shrink))
                pr_warn("bcache: %s: could not register shrinker",
                                __func__);

). It is not trivial to undo an error at register_shrinker().
Allocating memory for the shrinker at the time memory for an object which
contains the shrinker is allocated is much easier to undo.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
  2018-04-04 10:53   ` Tetsuo Handa
  (?)
  (?)
@ 2018-04-11  0:59   ` Al Viro
  2018-04-11  1:28     ` Tetsuo Handa
  -1 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2018-04-11  0:59 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, syzbot, linux-fsdevel, linux-kernel,
	syzkaller-bugs, linux-mm, Dmitry Vyukov

On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> Al and Michal, are you OK with this patch?

First of all, it does *NOT* fix the problems with careless ->kill_sb().
The fuse-blk case is the only real rationale so far.  Said that,

> @@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
>  	security_sb_free(s);
>  	put_user_ns(s->s_user_ns);
>  	kfree(s->s_subtype);
> +	kfree(s->s_shrink.nr_deferred);

is probably better done with an inlined helper (fs/super.c has no business knowing
about ->nr_deferred name, and there probably will be other users of that
preallocation of yours).  And the same helper would be better off zeroing the
pointer, same as unregister_shrinker() does.


> -int register_shrinker(struct shrinker *shrinker)
> +int prepare_shrinker(struct shrinker *shrinker)

preallocate_shrinker(), perhaps?

> +int register_shrinker(struct shrinker *shrinker)
> +{
> +	int err = prepare_shrinker(shrinker);
> +
> +	if (err)
> +		return err;
> +	register_shrinker_prepared(shrinker);

	if (!err)
		register_....;
	return err;

would be better, IMO.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: WARNING in kill_block_super
  2018-04-11  0:59   ` Al Viro
@ 2018-04-11  1:28     ` Tetsuo Handa
  2018-04-11  1:38       ` Al Viro
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-11  1:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Michal Hocko, linux-fsdevel, linux-kernel, syzkaller-bugs,
	linux-mm, Dmitry Vyukov, syzbot

Al Viro wrote:
> On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > Al and Michal, are you OK with this patch?
> 
> First of all, it does *NOT* fix the problems with careless ->kill_sb().
> The fuse-blk case is the only real rationale so far.  Said that,
> 

Please notice below one as well. Fixing all careless ->kill_sb() will be too
difficult to backport. For now, avoid calling deactivate_locked_super() is
safer.


[upstream] WARNING: refcount bug in put_pid_ns
https://syzkaller.appspot.com/bug?id=17e202b4794da213570ba33ac2f70277ef1ce015

static __latent_entropy struct task_struct *copy_process(unsigned long clone_flags, unsigned long stack_start, unsigned long stack_size, int __user *child_tidptr, struct pid *pid, int trace, unsigned long tls, int node)
{
(...snipped...)
  if (pid != &init_struct_pid) {
    pid = alloc_pid(p->nsproxy->pid_ns_for_children) {
      pid_ns_prepare_proc(ns) {
        mnt = kern_mount_data(&proc_fs_type, ns) {
          mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, data) {
            root = mount_fs(type, flags, name, data) {
              root = type->mount(type, flags, name, data) {
                return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super) {
                  sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags, user_ns, ns) {
                    err = register_shrinker(&s->s_shrink); // <= failed by fault injection.
                    if (err) {
                      deactivate_locked_super(s) {
                        fs->kill_sb(s) {
                          put_pid_ns(ns) {
                            kref_put(&ns->kref, free_pid_ns) // <= ns->kref is decremented here.
                          }
                        }
                      }
                      s = ERR_PTR(err);
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    if (IS_ERR(pid)) {
      retval = PTR_ERR(pid);
      goto bad_fork_cleanup_thread;
    }
  }
(...snipped...)
bad_fork_cleanup_thread:
  exit_thread(p);
bad_fork_cleanup_io:
  if (p->io_context) exit_io_context(p);
bad_fork_cleanup_namespaces:
  exit_task_namespaces(p) {
     switch_task_namespaces(p, NULL) {
       if (ns && atomic_dec_and_test(&ns->count)) { // <= ns->count becomes 0
         free_nsproxy(ns) {
           if (ns->pid_ns_for_children) {
             put_pid_ns(ns->pid_ns_for_children) {
               kref_put(&ns->kref, free_pid_ns) // <= ns->kref is decremented again and underflows.
             }
           }
         }
       }
     }
  }
(...snipped...)
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: WARNING in kill_block_super
  2018-04-11  1:28     ` Tetsuo Handa
@ 2018-04-11  1:38       ` Al Viro
  2018-04-11 10:09           ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Al Viro @ 2018-04-11  1:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, linux-fsdevel, linux-kernel, syzkaller-bugs,
	linux-mm, Dmitry Vyukov, syzbot

On Wed, Apr 11, 2018 at 10:28:06AM +0900, Tetsuo Handa wrote:
> Al Viro wrote:
> > On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > > Al and Michal, are you OK with this patch?
> > 
> > First of all, it does *NOT* fix the problems with careless ->kill_sb().
> > The fuse-blk case is the only real rationale so far.  Said that,
> > 
> 
> Please notice below one as well. Fixing all careless ->kill_sb() will be too
> difficult to backport. For now, avoid calling deactivate_locked_super() is
> safer.

How will that fix e.g. jffs2?

> [upstream] WARNING: refcount bug in put_pid_ns
> https://syzkaller.appspot.com/bug?id=17e202b4794da213570ba33ac2f70277ef1ce015

Should be fixed by 8e666cb33597 in that series, AFAICS.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
  2018-04-11  1:38       ` Al Viro
@ 2018-04-11 10:09           ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-11 10:09 UTC (permalink / raw)
  To: viro, mhocko
  Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, linux-mm, dvyukov,
	syzbot+5a170e19c963a2e0df79

Al Viro wrote:
> On Wed, Apr 11, 2018 at 10:28:06AM +0900, Tetsuo Handa wrote:
> > Al Viro wrote:
> > > On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > > > Al and Michal, are you OK with this patch?
> > > 
> > > First of all, it does *NOT* fix the problems with careless ->kill_sb().
> > > The fuse-blk case is the only real rationale so far.  Said that,
> > > 
> > 
> > Please notice below one as well. Fixing all careless ->kill_sb() will be too
> > difficult to backport. For now, avoid calling deactivate_locked_super() is
> > safer.
> 
> How will that fix e.g. jffs2?

You can send patches which my patch does not fix.

> 
> > [upstream] WARNING: refcount bug in put_pid_ns
> > https://syzkaller.appspot.com/bug?id=17e202b4794da213570ba33ac2f70277ef1ce015
> 
> Should be fixed by 8e666cb33597 in that series, AFAICS.

OK.



Al Viro wrote:
> On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > Al and Michal, are you OK with this patch?
> 
> First of all, it does *NOT* fix the problems with careless ->kill_sb().
> The fuse-blk case is the only real rationale so far.  Said that,
> 
> > @@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
> >  	security_sb_free(s);
> >  	put_user_ns(s->s_user_ns);
> >  	kfree(s->s_subtype);
> > +	kfree(s->s_shrink.nr_deferred);
> 
> is probably better done with an inlined helper (fs/super.c has no business knowing
> about ->nr_deferred name, and there probably will be other users of that
> preallocation of yours).  And the same helper would be better off zeroing the
> pointer, same as unregister_shrinker() does.
> 
> 
> > -int register_shrinker(struct shrinker *shrinker)
> > +int prepare_shrinker(struct shrinker *shrinker)
> 
> preallocate_shrinker(), perhaps?
> 
> > +int register_shrinker(struct shrinker *shrinker)
> > +{
> > +	int err = prepare_shrinker(shrinker);
> > +
> > +	if (err)
> > +		return err;
> > +	register_shrinker_prepared(shrinker);
> 
> 	if (!err)
> 		register_....;
> 	return err;
> 
> would be better, IMO.
> 

OK. Here is version 2. What do you think?



>From 9d035f5bee3861cb73c4d323c03121f431edf760 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 11 Apr 2018 11:44:46 +0900
Subject: [PATCH v2] mm,vmscan: Allow preallocating memory for
 register_shrinker().

syzbot is catching so many bugs triggered by commit 9ee332d99e4d5a97
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe. But it turned out that there are many bugs which
exist before that commit, and also that that commit caused regressions
in several cases.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler.

Although this patch hides some of bugs revealed by that commit, this patch
allows preallocating memory for the shrinker. By this change, we can avoid
calling deactivate_locked_super() from sget_userns(). Manual auditing and
syzbot tests will eventually reveal remaining bugs.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+84371b6062cb639d797e@syzkaller.appspotmail.com> # WARNING: refcount bug in should_fail
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com> # WARNING in kill_block_super
Reported-by: syzbot <syzbot+66a731f39da94bb14930@syzkaller.appspotmail.com> # WARNING: refcount bug in put_pid_ns
Reported-by: syzbot <syzbot+7a1cff37dbbef9e7ba4c@syzkaller.appspotmail.com> # KASAN: use-after-free Read in alloc_pid
Reported-by: syzbot <syzbot+151de3f2be6b40ac8026@syzkaller.appspotmail.com> # general protection fault in kernfs_kill_sb
Cc: stable <stable@vger.kernel.org> # 4.15+
Cc: Al Viro <vilo@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
---
 fs/super.c               |  9 ++++-----
 include/linux/shrinker.h | 21 +++++++++++++++++++--
 mm/vmscan.c              | 20 +++++++++++++++-----
 3 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 672538c..5a839cd8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
 	security_sb_free(s);
 	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
+	unallocate_shrinker(&s->s_shrink);
 	/* no delays needed */
 	destroy_super_work(&s->destroy_work);
 }
@@ -251,6 +252,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
+	if (preallocate_shrinker(&s->s_shrink))
+		goto fail;
 	return s;
 
 fail:
@@ -517,11 +520,7 @@ struct super_block *sget_userns(struct file_system_type *type,
 	hlist_add_head(&s->s_instances, &type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(type);
-	err = register_shrinker(&s->s_shrink);
-	if (err) {
-		deactivate_locked_super(s);
-		s = ERR_PTR(err);
-	}
+	register_preallocated_shrinker(&s->s_shrink);
 	return s;
 }
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 388ff29..ae5f557 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -75,6 +75,23 @@ struct shrinker {
 #define SHRINKER_NUMA_AWARE	(1 << 0)
 #define SHRINKER_MEMCG_AWARE	(1 << 1)
 
-extern int register_shrinker(struct shrinker *);
-extern void unregister_shrinker(struct shrinker *);
+extern int preallocate_shrinker(struct shrinker *shrinker);
+extern void register_preallocated_shrinker(struct shrinker *shrinker);
+extern void unallocate_shrinker(struct shrinker *shrinker);
+extern void unregister_shrinker(struct shrinker *shrinker);
+
+/*
+ * Try to replace register_shrinker() with preallocate_shrinker() and
+ * register_preallocated_shrinker() if that makes error handling easier.
+ * Call unallocate_shrinker() if a shrinker is discarded between after
+ * preallocate_shrinker() and before register_preallocated_shrinker().
+ */
+static inline int register_shrinker(struct shrinker *shrinker)
+{
+	int err = preallocate_shrinker(shrinker);
+
+	if (!err)
+		register_preallocated_shrinker(shrinker);
+	return err;
+}
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4390a8d..17b3073 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -258,7 +258,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
 /*
  * Add a shrinker callback to be called from the vm.
  */
-int register_shrinker(struct shrinker *shrinker)
+int preallocate_shrinker(struct shrinker *shrinker)
 {
 	size_t size = sizeof(*shrinker->nr_deferred);
 
@@ -268,17 +268,28 @@ int register_shrinker(struct shrinker *shrinker)
 	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
 	if (!shrinker->nr_deferred)
 		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL(preallocate_shrinker);
 
+void register_preallocated_shrinker(struct shrinker *shrinker)
+{
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	up_write(&shrinker_rwsem);
-	return 0;
 }
-EXPORT_SYMBOL(register_shrinker);
+EXPORT_SYMBOL(register_preallocated_shrinker);
 
 /*
  * Remove one
  */
+void unallocate_shrinker(struct shrinker *shrinker)
+{
+	kfree(shrinker->nr_deferred);
+	shrinker->nr_deferred = NULL;
+}
+EXPORT_SYMBOL(unallocate_shrinker);
+
 void unregister_shrinker(struct shrinker *shrinker)
 {
 	if (!shrinker->nr_deferred)
@@ -286,8 +297,7 @@ void unregister_shrinker(struct shrinker *shrinker)
 	down_write(&shrinker_rwsem);
 	list_del(&shrinker->list);
 	up_write(&shrinker_rwsem);
-	kfree(shrinker->nr_deferred);
-	shrinker->nr_deferred = NULL;
+	unallocate_shrinker(shrinker);
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING in kill_block_super
@ 2018-04-11 10:09           ` Tetsuo Handa
  0 siblings, 0 replies; 10+ messages in thread
From: Tetsuo Handa @ 2018-04-11 10:09 UTC (permalink / raw)
  To: viro, mhocko
  Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, linux-mm, dvyukov,
	syzbot+5a170e19c963a2e0df79

Al Viro wrote:
> On Wed, Apr 11, 2018 at 10:28:06AM +0900, Tetsuo Handa wrote:
> > Al Viro wrote:
> > > On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > > > Al and Michal, are you OK with this patch?
> > > 
> > > First of all, it does *NOT* fix the problems with careless ->kill_sb().
> > > The fuse-blk case is the only real rationale so far.  Said that,
> > > 
> > 
> > Please notice below one as well. Fixing all careless ->kill_sb() will be too
> > difficult to backport. For now, avoid calling deactivate_locked_super() is
> > safer.
> 
> How will that fix e.g. jffs2?

You can send patches which my patch does not fix.

> 
> > [upstream] WARNING: refcount bug in put_pid_ns
> > https://syzkaller.appspot.com/bug?id=17e202b4794da213570ba33ac2f70277ef1ce015
> 
> Should be fixed by 8e666cb33597 in that series, AFAICS.

OK.



Al Viro wrote:
> On Wed, Apr 04, 2018 at 07:53:07PM +0900, Tetsuo Handa wrote:
> > Al and Michal, are you OK with this patch?
> 
> First of all, it does *NOT* fix the problems with careless ->kill_sb().
> The fuse-blk case is the only real rationale so far.  Said that,
> 
> > @@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
> >  	security_sb_free(s);
> >  	put_user_ns(s->s_user_ns);
> >  	kfree(s->s_subtype);
> > +	kfree(s->s_shrink.nr_deferred);
> 
> is probably better done with an inlined helper (fs/super.c has no business knowing
> about ->nr_deferred name, and there probably will be other users of that
> preallocation of yours).  And the same helper would be better off zeroing the
> pointer, same as unregister_shrinker() does.
> 
> 
> > -int register_shrinker(struct shrinker *shrinker)
> > +int prepare_shrinker(struct shrinker *shrinker)
> 
> preallocate_shrinker(), perhaps?
> 
> > +int register_shrinker(struct shrinker *shrinker)
> > +{
> > +	int err = prepare_shrinker(shrinker);
> > +
> > +	if (err)
> > +		return err;
> > +	register_shrinker_prepared(shrinker);
> 
> 	if (!err)
> 		register_....;
> 	return err;
> 
> would be better, IMO.
> 

OK. Here is version 2. What do you think?



>>From 9d035f5bee3861cb73c4d323c03121f431edf760 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 11 Apr 2018 11:44:46 +0900
Subject: [PATCH v2] mm,vmscan: Allow preallocating memory for
 register_shrinker().

syzbot is catching so many bugs triggered by commit 9ee332d99e4d5a97
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe. But it turned out that there are many bugs which
exist before that commit, and also that that commit caused regressions
in several cases.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler.

Although this patch hides some of bugs revealed by that commit, this patch
allows preallocating memory for the shrinker. By this change, we can avoid
calling deactivate_locked_super() from sget_userns(). Manual auditing and
syzbot tests will eventually reveal remaining bugs.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+84371b6062cb639d797e@syzkaller.appspotmail.com> # WARNING: refcount bug in should_fail
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com> # WARNING in kill_block_super
Reported-by: syzbot <syzbot+66a731f39da94bb14930@syzkaller.appspotmail.com> # WARNING: refcount bug in put_pid_ns
Reported-by: syzbot <syzbot+7a1cff37dbbef9e7ba4c@syzkaller.appspotmail.com> # KASAN: use-after-free Read in alloc_pid
Reported-by: syzbot <syzbot+151de3f2be6b40ac8026@syzkaller.appspotmail.com> # general protection fault in kernfs_kill_sb
Cc: stable <stable@vger.kernel.org> # 4.15+
Cc: Al Viro <vilo@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
---
 fs/super.c               |  9 ++++-----
 include/linux/shrinker.h | 21 +++++++++++++++++++--
 mm/vmscan.c              | 20 +++++++++++++++-----
 3 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 672538c..5a839cd8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -166,6 +166,7 @@ static void destroy_unused_super(struct super_block *s)
 	security_sb_free(s);
 	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
+	unallocate_shrinker(&s->s_shrink);
 	/* no delays needed */
 	destroy_super_work(&s->destroy_work);
 }
@@ -251,6 +252,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
 	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
+	if (preallocate_shrinker(&s->s_shrink))
+		goto fail;
 	return s;
 
 fail:
@@ -517,11 +520,7 @@ struct super_block *sget_userns(struct file_system_type *type,
 	hlist_add_head(&s->s_instances, &type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(type);
-	err = register_shrinker(&s->s_shrink);
-	if (err) {
-		deactivate_locked_super(s);
-		s = ERR_PTR(err);
-	}
+	register_preallocated_shrinker(&s->s_shrink);
 	return s;
 }
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 388ff29..ae5f557 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -75,6 +75,23 @@ struct shrinker {
 #define SHRINKER_NUMA_AWARE	(1 << 0)
 #define SHRINKER_MEMCG_AWARE	(1 << 1)
 
-extern int register_shrinker(struct shrinker *);
-extern void unregister_shrinker(struct shrinker *);
+extern int preallocate_shrinker(struct shrinker *shrinker);
+extern void register_preallocated_shrinker(struct shrinker *shrinker);
+extern void unallocate_shrinker(struct shrinker *shrinker);
+extern void unregister_shrinker(struct shrinker *shrinker);
+
+/*
+ * Try to replace register_shrinker() with preallocate_shrinker() and
+ * register_preallocated_shrinker() if that makes error handling easier.
+ * Call unallocate_shrinker() if a shrinker is discarded between after
+ * preallocate_shrinker() and before register_preallocated_shrinker().
+ */
+static inline int register_shrinker(struct shrinker *shrinker)
+{
+	int err = preallocate_shrinker(shrinker);
+
+	if (!err)
+		register_preallocated_shrinker(shrinker);
+	return err;
+}
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4390a8d..17b3073 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -258,7 +258,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
 /*
  * Add a shrinker callback to be called from the vm.
  */
-int register_shrinker(struct shrinker *shrinker)
+int preallocate_shrinker(struct shrinker *shrinker)
 {
 	size_t size = sizeof(*shrinker->nr_deferred);
 
@@ -268,17 +268,28 @@ int register_shrinker(struct shrinker *shrinker)
 	shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
 	if (!shrinker->nr_deferred)
 		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL(preallocate_shrinker);
 
+void register_preallocated_shrinker(struct shrinker *shrinker)
+{
 	down_write(&shrinker_rwsem);
 	list_add_tail(&shrinker->list, &shrinker_list);
 	up_write(&shrinker_rwsem);
-	return 0;
 }
-EXPORT_SYMBOL(register_shrinker);
+EXPORT_SYMBOL(register_preallocated_shrinker);
 
 /*
  * Remove one
  */
+void unallocate_shrinker(struct shrinker *shrinker)
+{
+	kfree(shrinker->nr_deferred);
+	shrinker->nr_deferred = NULL;
+}
+EXPORT_SYMBOL(unallocate_shrinker);
+
 void unregister_shrinker(struct shrinker *shrinker)
 {
 	if (!shrinker->nr_deferred)
@@ -286,8 +297,7 @@ void unregister_shrinker(struct shrinker *shrinker)
 	down_write(&shrinker_rwsem);
 	list_del(&shrinker->list);
 	up_write(&shrinker_rwsem);
-	kfree(shrinker->nr_deferred);
-	shrinker->nr_deferred = NULL;
+	unallocate_shrinker(shrinker);
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-04-11 10:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-29 23:01 WARNING in kill_block_super syzbot
2018-04-04 10:53 ` Tetsuo Handa
2018-04-04 10:53   ` Tetsuo Handa
2018-04-06  8:09   ` Michal Hocko
2018-04-07  5:55     ` Tetsuo Handa
2018-04-11  0:59   ` Al Viro
2018-04-11  1:28     ` Tetsuo Handa
2018-04-11  1:38       ` Al Viro
2018-04-11 10:09         ` Tetsuo Handa
2018-04-11 10:09           ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.