From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7C22C43217 for ; Tue, 5 Oct 2021 20:50:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9147D615A3 for ; Tue, 5 Oct 2021 20:50:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236480AbhJEUw1 (ORCPT ); Tue, 5 Oct 2021 16:52:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236478AbhJEUwZ (ORCPT ); Tue, 5 Oct 2021 16:52:25 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9C31C061762 for ; Tue, 5 Oct 2021 13:50:33 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id kk10so557540pjb.1 for ; Tue, 05 Oct 2021 13:50:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=mWI5gmt1f2sE4WHrG7lhmtp1KhKxIzADNM6YfrEY8FI=; b=F3bw7oYnRNAlLTG/U8ZVUqqH8tA2z7msfRnTxbabkfxLdl8FflOtJyUtlFLVuLkLAz 9/4XgUogAVDFqnvjDMA+N2pKl5D35ro/qGIAz5Y0v+lvE4CQFnbZ2kDEqmjWFnBK/D1g jd0ACG9JAl175QBrVe2zuSHE20G8rhopsMNyM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=mWI5gmt1f2sE4WHrG7lhmtp1KhKxIzADNM6YfrEY8FI=; b=Ev3Pzm6tmFY2rwyVPGlSO/g/xQVDztokudfstPJW9uSBxslxCXARU4P1hfPLKF3Hyt yt/SziD8Yee2EnJjJ1xGxW7wbD+O1nR8p2oG7zopujaSCJrToCrBuVQonbSX/FLb+6z5 ArHyXZDPeKi0rb4/+5pOpjIlJ1xxkiv6n4Uh1Lm1VujT+Cmoew3QJ73U7AkCCkU6vu12 sNMzE5bWWxSDuENiV8NTrjIriHyd1XJOKB4KHFe5odfQB2exx5LBvOuzWjyXhyZdzd6Q y9tr9/Q6iaHKdiTIZQVKbTv1bQLUL6J8wQYnzwwvRk4haoWjLEkSHvwdc2sZcj6RNcSI tQ3g== X-Gm-Message-State: AOAM532Z+0x64VthCr2IUJc6SWumEYs5CGe+3I6VZKMzKehcsb3nlJF3 T5q+tRzvoPIebjFySJV8bX9LeQ== X-Google-Smtp-Source: ABdhPJxLndePGVphN7UN+zC1lVhp4XYLuJlfIX/twoOlSBWgImAvgQltIbNaNY6E3tdvZSXfFBVtrQ== X-Received: by 2002:a17:90b:374f:: with SMTP id ne15mr6234983pjb.37.1633467032808; Tue, 05 Oct 2021 13:50:32 -0700 (PDT) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id e1sm19119194pgi.43.2021.10.05.13.50.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Oct 2021 13:50:32 -0700 (PDT) Date: Tue, 5 Oct 2021 13:50:31 -0700 From: Kees Cook To: Luis Chamberlain Cc: tj@kernel.org, gregkh@linuxfoundation.org, akpm@linux-foundation.org, minchan@kernel.org, jeyu@kernel.org, shuah@kernel.org, bvanassche@acm.org, dan.j.williams@intel.com, joe@perches.com, tglx@linutronix.de, rostedt@goodmis.org, linux-spdx@vger.kernel.org, linux-doc@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v8 09/12] sysfs: fix deadlock race with module removal Message-ID: <202110051315.B2165F455@keescook> References: <20210927163805.808907-1-mcgrof@kernel.org> <20210927163805.808907-10-mcgrof@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210927163805.808907-10-mcgrof@kernel.org> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Sep 27, 2021 at 09:38:02AM -0700, Luis Chamberlain wrote: > When driver sysfs attributes use a lock also used on module removal we > can race to deadlock. This happens when for instance a sysfs file on > a driver is used, then at the same time we have module removal call > trigger. The module removal call code holds a lock, and then the > driver's sysfs file entry waits for the same lock. While holding the > lock the module removal tries to remove the sysfs entries, but these > cannot be removed yet as one is waiting for a lock. This won't complete > as the lock is already held. Likewise module removal cannot complete, > and so we deadlock. > > This can now be easily reproducible with our sysfs selftest as follows: > > ./tools/testing/selftests/sysfs/sysfs.sh -t 0027 > > This uses a local driver lock. Test 0028 can also be used, that uses > the rtnl_lock(): > > ./tools/testing/selftests/sysfs/sysfs.sh -t 0028 > > To fix this we extend the struct kernfs_node with a module reference > and use the try_module_get() after kernfs_get_active() is called. As I would agree: kernfs must know about the module containing the ops structure it has been given. (Without this, there are, at the very least, removal races for looking at kernfs_ops structures.) In other places in the kernel, function callback dependencies are more explicit in that if code is holding such things, it has already taken a module reference, etc. But kernfs is special in the sense that just because a kernfs entry exists, we don't want to pin the module use count too. But simple locking isn't workable to solve this because kernfs_remove() must be able to be called from a module_exit routine without deadlocking. (i.e. we would create exactly the situation that caused this condition to get noticed in the first place.) > documented in the prior patch, we now know that once kernfs_get_active() > is called the module is implicitly guarded to exist and cannot be removed. > This is because the module is the one in charge of removing the same > sysfs file it created, and removal of sysfs files on module exit will wait > until they don't have any active references. By using a try_module_get() > after kernfs_get_active() we yield to let module removal trump calls to > process a sysfs operation, while also preventing module removal if a sysfs > operation is in already progress. This prevents the deadlock. > > This deadlock was first reported with the zram driver, however the live > patching folks have acknowledged they have observed this as well with > live patching, when a live patch is removed. I was then able to > reproduce easily by creating a dedicated selftest for it. > > A sketch of how this can happen follows, consider foo a local mutex > part of a driver, and used on the driver's module exit routine and > on one of its sysfs ops: > > foo.c: > static DEFINE_MUTEX(foo); > static ssize_t foo_store(struct device *dev, > struct device_attribute *attr, > const char *buf, size_t count) > { > ... > mutex_lock(&foo); > ... > mutex_lock(&foo); > ... > } > static DEVICE_ATTR_RW(foo); > ... > void foo_exit(void) > { > mutex_lock(&foo); > ... > mutex_unlock(&foo); > } > module_exit(foo_exit); > > And this can lead to this condition: > > CPU A CPU B > foo_store() > foo_exit() > mutex_lock(&foo) > mutex_lock(&foo) > del_gendisk(some_struct->disk); > device_del() > device_remove_groups() Please expand this further, where does device_remove_groups() end up waiting for that never happens? > > In this situation foo_store() is waiting for the mutex foo to > become unlocked, but that won't happen until module removal is complete. > But module removal won't complete until the sysfs file being poked at > completes which is waiting for a lock already held. > > Signed-off-by: Luis Chamberlain > --- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +- > fs/kernfs/dir.c | 44 ++++++++++++++++++---- > fs/kernfs/file.c | 6 ++- > fs/kernfs/kernfs-internal.h | 3 +- > fs/kernfs/symlink.c | 3 +- > fs/sysfs/dir.c | 2 +- > fs/sysfs/file.c | 6 ++- > fs/sysfs/group.c | 3 +- > include/linux/kernfs.h | 14 ++++--- > include/linux/sysfs.h | 52 ++++++++++++++++++++------ > kernel/cgroup/cgroup.c | 2 +- > 11 files changed, 105 insertions(+), 34 deletions(-) > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index b57b3db9a6a7..4edf3b37fd2c 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -209,7 +209,7 @@ static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft) > > kn = __kernfs_create_file(parent_kn, rft->name, rft->mode, > GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, > - 0, rft->kf_ops, rft, NULL, NULL); > + 0, rft->kf_ops, rft, NULL, NULL, NULL); > if (IS_ERR(kn)) > return PTR_ERR(kn); > > @@ -2482,7 +2482,7 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name, > > kn = __kernfs_create_file(parent_kn, name, 0444, > GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0, > - &kf_mondata_ops, priv, NULL, NULL); > + &kf_mondata_ops, priv, NULL, NULL, NULL); > if (IS_ERR(kn)) > return PTR_ERR(kn); > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > index ba581429bf7b..e841201fd11b 100644 > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > > #include "kernfs-internal.h" > > @@ -414,12 +415,29 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn) > */ > struct kernfs_node *kernfs_get_active(struct kernfs_node *kn) > { > + int v; > + > if (unlikely(!kn)) > return NULL; > > if (!atomic_inc_unless_negative(&kn->active)) > return NULL; > > + /* > + * If a module created the kernfs_node, the module cannot possibly be > + * removed if the above atomic_inc_unless_negative() succeeded. So the > + * try_module_get() below is not to protect the lifetime of the module > + * as that is already guaranteed. The try_module_get() below is used > + * to ensure that we don't deadlock in case a kernfs operation and > + * module removal used a shared lock. > + */ > + if (!try_module_get(kn->owner)) { > + v = atomic_dec_return(&kn->active); > + if (unlikely(v == KN_DEACTIVATED_BIAS)) > + wake_up_all(&kernfs_root(kn)->deactivate_waitq); > + return NULL; > + } The special casing in here makes me think this isn't happening the right place. (i.e this looks like an open-coded version of kernfs_put_active()) > + > if (kernfs_lockdep(kn)) > rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_); > return kn; > @@ -442,6 +460,13 @@ void kernfs_put_active(struct kernfs_node *kn) > if (kernfs_lockdep(kn)) > rwsem_release(&kn->dep_map, _RET_IP_); > v = atomic_dec_return(&kn->active); > + > + /* > + * We prevent module exit *until* we know for sure all possible > + * kernfs ops are done. > + */ > + module_put(kn->owner); > + > if (likely(v != KN_DEACTIVATED_BIAS)) > return; What I don't understand, however, is what kernfs_get/put_active() is intending to do -- it looks like it's trying to provide an interruption point for open kernfs file operations? This all seems extremely complex for what seems like it should just be a global "am I being removed?" bool? Regardless, while I do see the logic of associating the module get/put with get/put of kernfs "active", why is it not better tied to strictly kernfs open/close? That would seem to be much simpler and not require any special handling? For example, why does this not work? diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 60e2a86c535e..e44502ac244d 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -525,6 +525,9 @@ static int kernfs_get_open_node(struct kernfs_node *kn, { struct kernfs_open_node *on, *new_on = NULL; + if (!try_module_get(kn->owner)) + return -ENODEV; + retry: mutex_lock(&kernfs_open_file_mutex); spin_lock_irq(&kernfs_open_node_lock); @@ -592,6 +595,7 @@ static void kernfs_put_open_node(struct kernfs_node *kn, mutex_unlock(&kernfs_open_file_mutex); kfree(on); + module_put(kn->owner); } static int kernfs_fop_open(struct inode *inode, struct file *file) @@ -719,6 +723,7 @@ static int kernfs_fop_open(struct inode *inode, struct file *file) kfree(of); err_out: kernfs_put_active(kn); + module_put(kn->owner); return error; } > > @@ -572,7 +597,8 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root, > struct kernfs_node *parent, > const char *name, umode_t mode, > kuid_t uid, kgid_t gid, > - unsigned flags) > + unsigned flags, > + struct module *owner) > { > struct kernfs_node *kn; > u32 id_highbits; > @@ -607,6 +633,7 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root, > kn->name = name; > kn->mode = mode; > kn->flags = flags; > + kn->owner = owner; > > if (!uid_eq(uid, GLOBAL_ROOT_UID) || !gid_eq(gid, GLOBAL_ROOT_GID)) { > struct iattr iattr = { > @@ -640,12 +667,13 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root, > struct kernfs_node *kernfs_new_node(struct kernfs_node *parent, > const char *name, umode_t mode, > kuid_t uid, kgid_t gid, > - unsigned flags) > + unsigned flags, > + struct module *owner) > { > struct kernfs_node *kn; > > kn = __kernfs_new_node(kernfs_root(parent), parent, > - name, mode, uid, gid, flags); > + name, mode, uid, gid, flags, owner); > if (kn) { > kernfs_get(parent); > kn->parent = parent; > @@ -927,7 +955,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops, > > kn = __kernfs_new_node(root, NULL, "", S_IFDIR | S_IRUGO | S_IXUGO, > GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, > - KERNFS_DIR); > + KERNFS_DIR, NULL); > if (!kn) { > idr_destroy(&root->ino_idr); > kfree(root); > @@ -969,20 +997,22 @@ void kernfs_destroy_root(struct kernfs_root *root) > * @gid: gid of the new directory > * @priv: opaque data associated with the new directory > * @ns: optional namespace tag of the directory > + * @owner: if set, the module owner of this directory > * > * Returns the created node on success, ERR_PTR() value on failure. > */ > struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent, > const char *name, umode_t mode, > kuid_t uid, kgid_t gid, > - void *priv, const void *ns) > + void *priv, const void *ns, > + struct module *owner) > { > struct kernfs_node *kn; > int rc; > > /* allocate */ > kn = kernfs_new_node(parent, name, mode | S_IFDIR, > - uid, gid, KERNFS_DIR); > + uid, gid, KERNFS_DIR, owner); > if (!kn) > return ERR_PTR(-ENOMEM); > > @@ -1014,7 +1044,7 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent, > > /* allocate */ > kn = kernfs_new_node(parent, name, S_IRUGO|S_IXUGO|S_IFDIR, > - GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, KERNFS_DIR); > + GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, KERNFS_DIR, NULL); > if (!kn) > return ERR_PTR(-ENOMEM); > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c > index 4479c6580333..0e125287e050 100644 > --- a/fs/kernfs/file.c > +++ b/fs/kernfs/file.c > @@ -978,6 +978,7 @@ const struct file_operations kernfs_file_fops = { > * @priv: private data for the file > * @ns: optional namespace tag of the file > * @key: lockdep key for the file's active_ref, %NULL to disable lockdep > + * @owner: if set, the module owner of the file > * > * Returns the created node on success, ERR_PTR() value on error. > */ > @@ -987,7 +988,8 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, > loff_t size, > const struct kernfs_ops *ops, > void *priv, const void *ns, > - struct lock_class_key *key) > + struct lock_class_key *key, > + struct module *owner) > { > struct kernfs_node *kn; > unsigned flags; > @@ -996,7 +998,7 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, > flags = KERNFS_FILE; > > kn = kernfs_new_node(parent, name, (mode & S_IALLUGO) | S_IFREG, > - uid, gid, flags); > + uid, gid, flags, owner); > if (!kn) > return ERR_PTR(-ENOMEM); > > diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h > index 9e3abf597e2d..6d275d661987 100644 > --- a/fs/kernfs/kernfs-internal.h > +++ b/fs/kernfs/kernfs-internal.h > @@ -134,7 +134,8 @@ int kernfs_add_one(struct kernfs_node *kn); > struct kernfs_node *kernfs_new_node(struct kernfs_node *parent, > const char *name, umode_t mode, > kuid_t uid, kgid_t gid, > - unsigned flags); > + unsigned flags, > + struct module *owner); > > /* > * file.c > diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c > index 19a6c71c6ff5..5a053eebee52 100644 > --- a/fs/kernfs/symlink.c > +++ b/fs/kernfs/symlink.c > @@ -36,7 +36,8 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent, > gid = target->iattr->ia_gid; > } > > - kn = kernfs_new_node(parent, name, S_IFLNK|0777, uid, gid, KERNFS_LINK); > + kn = kernfs_new_node(parent, name, S_IFLNK|0777, uid, gid, KERNFS_LINK, > + target->owner); > if (!kn) > return ERR_PTR(-ENOMEM); > > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c > index b6b6796e1616..9763c2fde3c7 100644 > --- a/fs/sysfs/dir.c > +++ b/fs/sysfs/dir.c > @@ -57,7 +57,7 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns) > kobject_get_ownership(kobj, &uid, &gid); > > kn = kernfs_create_dir_ns(parent, kobject_name(kobj), 0755, uid, gid, > - kobj, ns); > + kobj, ns, NULL); > if (IS_ERR(kn)) { > if (PTR_ERR(kn) == -EEXIST) > sysfs_warn_dup(parent, kobject_name(kobj)); > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c > index 42dcf96881b6..af9e91fd1a92 100644 > --- a/fs/sysfs/file.c > +++ b/fs/sysfs/file.c > @@ -292,7 +292,8 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent, > #endif > > kn = __kernfs_create_file(parent, attr->name, mode & 0777, uid, gid, > - PAGE_SIZE, ops, (void *)attr, ns, key); > + PAGE_SIZE, ops, (void *)attr, ns, key, > + attr->owner); > if (IS_ERR(kn)) { > if (PTR_ERR(kn) == -EEXIST) > sysfs_warn_dup(parent, attr->name); > @@ -327,7 +328,8 @@ int sysfs_add_bin_file_mode_ns(struct kernfs_node *parent, > #endif > > kn = __kernfs_create_file(parent, attr->name, mode & 0777, uid, gid, > - battr->size, ops, (void *)attr, ns, key); > + battr->size, ops, (void *)attr, ns, key, > + attr->owner); > if (IS_ERR(kn)) { > if (PTR_ERR(kn) == -EEXIST) > sysfs_warn_dup(parent, attr->name); > diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c > index eeb0e3099421..372864d1cb54 100644 > --- a/fs/sysfs/group.c > +++ b/fs/sysfs/group.c > @@ -135,7 +135,8 @@ static int internal_create_group(struct kobject *kobj, int update, > } else { > kn = kernfs_create_dir_ns(kobj->sd, grp->name, > S_IRWXU | S_IRUGO | S_IXUGO, > - uid, gid, kobj, NULL); > + uid, gid, kobj, NULL, > + grp->owner); > if (IS_ERR(kn)) { > if (PTR_ERR(kn) == -EEXIST) > sysfs_warn_dup(kobj->sd, grp->name); > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h > index cd968ee2b503..818b00ebd107 100644 > --- a/include/linux/kernfs.h > +++ b/include/linux/kernfs.h > @@ -161,6 +161,7 @@ struct kernfs_node { > unsigned short flags; > umode_t mode; > struct kernfs_iattrs *iattr; > + struct module *owner; > }; > > /* > @@ -370,7 +371,8 @@ void kernfs_destroy_root(struct kernfs_root *root); > struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent, > const char *name, umode_t mode, > kuid_t uid, kgid_t gid, > - void *priv, const void *ns); > + void *priv, const void *ns, > + struct module *owner); > struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent, > const char *name); > struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, > @@ -379,7 +381,8 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, > loff_t size, > const struct kernfs_ops *ops, > void *priv, const void *ns, > - struct lock_class_key *key); > + struct lock_class_key *key, > + struct module *owner); > struct kernfs_node *kernfs_create_link(struct kernfs_node *parent, > const char *name, > struct kernfs_node *target); > @@ -472,14 +475,15 @@ static inline void kernfs_destroy_root(struct kernfs_root *root) { } > static inline struct kernfs_node * > kernfs_create_dir_ns(struct kernfs_node *parent, const char *name, > umode_t mode, kuid_t uid, kgid_t gid, > - void *priv, const void *ns) > + void *priv, const void *ns, struct module *owner) > { return ERR_PTR(-ENOSYS); } > > static inline struct kernfs_node * > __kernfs_create_file(struct kernfs_node *parent, const char *name, > umode_t mode, kuid_t uid, kgid_t gid, > loff_t size, const struct kernfs_ops *ops, > - void *priv, const void *ns, struct lock_class_key *key) > + void *priv, const void *ns, struct lock_class_key *key, > + struct module *owner) > { return ERR_PTR(-ENOSYS); } > > static inline struct kernfs_node * > @@ -566,7 +570,7 @@ kernfs_create_dir(struct kernfs_node *parent, const char *name, umode_t mode, > { > return kernfs_create_dir_ns(parent, name, mode, > GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, > - priv, NULL); > + priv, NULL, parent->owner); > } > > static inline int kernfs_remove_by_name(struct kernfs_node *parent, > diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h > index e3f1e8ac1f85..babbabb460dc 100644 > --- a/include/linux/sysfs.h > +++ b/include/linux/sysfs.h > @@ -30,6 +30,7 @@ enum kobj_ns_type; > struct attribute { > const char *name; > umode_t mode; > + struct module *owner; > #ifdef CONFIG_DEBUG_LOCK_ALLOC > bool ignore_lockdep:1; > struct lock_class_key *key; > @@ -80,6 +81,7 @@ do { \ > * @attrs: Pointer to NULL terminated list of attributes. > * @bin_attrs: Pointer to NULL terminated list of binary attributes. > * Either attrs or bin_attrs or both must be provided. > + * @module: If set, module responsible for this attribute group > */ > struct attribute_group { > const char *name; > @@ -89,6 +91,7 @@ struct attribute_group { > struct bin_attribute *, int); > struct attribute **attrs; > struct bin_attribute **bin_attrs; > + struct module *owner; > }; > > /* > @@ -100,38 +103,52 @@ struct attribute_group { > > #define __ATTR(_name, _mode, _show, _store) { \ > .attr = {.name = __stringify(_name), \ > - .mode = VERIFY_OCTAL_PERMISSIONS(_mode) }, \ > + .mode = VERIFY_OCTAL_PERMISSIONS(_mode), \ > + .owner = THIS_MODULE, \ > + }, \ > .show = _show, \ > .store = _store, \ > } > > #define __ATTR_PREALLOC(_name, _mode, _show, _store) { \ > .attr = {.name = __stringify(_name), \ > - .mode = SYSFS_PREALLOC | VERIFY_OCTAL_PERMISSIONS(_mode) },\ > + .mode = SYSFS_PREALLOC | VERIFY_OCTAL_PERMISSIONS(_mode),\ > + .owner = THIS_MODULE, \ > + }, \ > .show = _show, \ > .store = _store, \ > } > > #define __ATTR_RO(_name) { \ > - .attr = { .name = __stringify(_name), .mode = 0444 }, \ > + .attr = { .name = __stringify(_name), \ > + .mode = 0444, \ > + .owner = THIS_MODULE, \ > + }, \ > .show = _name##_show, \ > } > > #define __ATTR_RO_MODE(_name, _mode) { \ > .attr = { .name = __stringify(_name), \ > - .mode = VERIFY_OCTAL_PERMISSIONS(_mode) }, \ > + .mode = VERIFY_OCTAL_PERMISSIONS(_mode), \ > + .owner = THIS_MODULE, \ > + }, \ > .show = _name##_show, \ > } > > #define __ATTR_RW_MODE(_name, _mode) { \ > .attr = { .name = __stringify(_name), \ > - .mode = VERIFY_OCTAL_PERMISSIONS(_mode) }, \ > + .mode = VERIFY_OCTAL_PERMISSIONS(_mode), \ > + .owner = THIS_MODULE, \ > + }, \ > .show = _name##_show, \ > .store = _name##_store, \ > } > > #define __ATTR_WO(_name) { \ > - .attr = { .name = __stringify(_name), .mode = 0200 }, \ > + .attr = { .name = __stringify(_name), \ > + .mode = 0200, \ > + .owner = THIS_MODULE, \ > + }, \ > .store = _name##_store, \ > } > > @@ -141,8 +158,11 @@ struct attribute_group { > > #ifdef CONFIG_DEBUG_LOCK_ALLOC > #define __ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) { \ > - .attr = {.name = __stringify(_name), .mode = _mode, \ > - .ignore_lockdep = true }, \ > + .attr = {.name = __stringify(_name), \ > + .mode = _mode, \ > + .ignore_lockdep = true, \ > + .owner = THIS_MODULE, \ > + }, \ > .show = _show, \ > .store = _store, \ > } > @@ -159,6 +179,7 @@ static const struct attribute_group *_name##_groups[] = { \ > #define ATTRIBUTE_GROUPS(_name) \ > static const struct attribute_group _name##_group = { \ > .attrs = _name##_attrs, \ > + .owner = THIS_MODULE, \ > }; \ > __ATTRIBUTE_GROUPS(_name) > > @@ -199,20 +220,29 @@ struct bin_attribute { > > /* macros to create static binary attributes easier */ > #define __BIN_ATTR(_name, _mode, _read, _write, _size) { \ > - .attr = { .name = __stringify(_name), .mode = _mode }, \ > + .attr = { .name = __stringify(_name), \ > + .mode = _mode, \ > + .owner = THIS_MODULE, \ > + }, \ > .read = _read, \ > .write = _write, \ > .size = _size, \ > } > > #define __BIN_ATTR_RO(_name, _size) { \ > - .attr = { .name = __stringify(_name), .mode = 0444 }, \ > + .attr = { .name = __stringify(_name), \ > + .mode = 0444, \ > + .owner = THIS_MODULE, \ > + }, \ > .read = _name##_read, \ > .size = _size, \ > } > > #define __BIN_ATTR_WO(_name, _size) { \ > - .attr = { .name = __stringify(_name), .mode = 0200 }, \ > + .attr = { .name = __stringify(_name), \ > + .mode = 0200, \ > + .owner = THIS_MODULE, \ > + }, \ > .write = _name##_write, \ > .size = _size, \ > } > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index 9e0390000025..c6b0a28f599c 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -3975,7 +3975,7 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp, > cgroup_file_mode(cft), > GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, > 0, cft->kf_ops, cft, > - NULL, key); > + NULL, key, NULL); > if (IS_ERR(kn)) > return PTR_ERR(kn); > > -- > 2.30.2 > -- Kees Cook