linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] selftests: add a new test driver for sysfs
@ 2021-07-03  0:46 Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 1/4] selftests: add tests_sysfs module Luis Chamberlain
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:46 UTC (permalink / raw)
  To: gregkh, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kselftest, linux-kernel

This v2 rebases onto the latest linux-next tag, next-20210701. A few
changes were needed, namely:

  1) changes kernfs_init_failure_injection() to return int instead
     of void. On the latest linux-next we have a new static build
     check for this, so this mistake was captured when building.

  2) I made kernfs_init_failure_injection static

  3) lib/test_sysfs.c moved to the new blk_alloc_disk() added by
     Christoph as direct queue allocation is no longer supported,
     ie, blk_alloc_queue() is no longer exported. This work was
     done by Christoph in preparation to help make add_disk*()
     callers eventually return an error code and make the error
     handling much saner. Because of this same change
     blk_cleanup_queue() is no longer needed so we embrace
     the shiny new blk_cleanup_disk().

I've put this up on my linux-next git tree [0] under the branch
named 20210701-sysfs-fix-races-v2.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210701-sysfs-fix-races-v2

Luis Chamberlain (4):
  selftests: add tests_sysfs module
  kernfs: add initial failure injection support
  test_sysfs: add support to use kernfs failure injection
  test_sysfs: demonstrate deadlock fix

 .../fault-injection/fault-injection.rst       |   22 +
 MAINTAINERS                                   |    9 +-
 fs/kernfs/Makefile                            |    1 +
 fs/kernfs/failure-injection.c                 |   83 +
 fs/kernfs/file.c                              |   13 +
 fs/kernfs/kernfs-internal.h                   |   72 +
 include/linux/kernfs.h                        |    5 +
 lib/Kconfig.debug                             |   23 +
 lib/Makefile                                  |    1 +
 lib/test_sysfs.c                              | 1027 ++++++++++++
 tools/testing/selftests/sysfs/Makefile        |   12 +
 tools/testing/selftests/sysfs/config          |    5 +
 tools/testing/selftests/sysfs/sysfs.sh        | 1376 +++++++++++++++++
 13 files changed, 2648 insertions(+), 1 deletion(-)
 create mode 100644 fs/kernfs/failure-injection.c
 create mode 100644 lib/test_sysfs.c
 create mode 100644 tools/testing/selftests/sysfs/Makefile
 create mode 100644 tools/testing/selftests/sysfs/config
 create mode 100755 tools/testing/selftests/sysfs/sysfs.sh

-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/4] selftests: add tests_sysfs module
  2021-07-03  0:46 [PATCH v2 0/4] selftests: add a new test driver for sysfs Luis Chamberlain
@ 2021-07-03  0:46 ` Luis Chamberlain
  2021-07-21 11:32   ` Greg KH
  2021-07-03  0:46 ` [PATCH v2 2/4] kernfs: add initial failure injection support Luis Chamberlain
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:46 UTC (permalink / raw)
  To: gregkh, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kselftest, linux-kernel

This adds a new selftest module which can be used to test sysfs, which
would otherwise require using an existing driver. This lets us muck
with a template driver to test breaking things without affecting
system behaviour or requiring the dependencies of a real device
driver.

A series of 28 tests are added. Support for using two device types are
supported:

  * misc
  * block

Contrary to sysctls, sysfs requires a full write to happen at once, and
so we reduce the digit tests to single writes. Two main sysfs knobs are
provided for testing reading/storing, one which doesn't inclur any
delays and another which can incur programmed delays. What locks are
held, if any, are configurable, at module load time, or through dynamic
configuration at run time.

Since sysfs is a technically filesystem, but a pseudo one, which
requires a kernel user, our test_sysfs module and respective test script
embraces fstests format for tests in the kernel ring bufffer. Likewise,
a scraper for kernel crashes is provided which matches what fstests does
as well.

Two tests are kept disabled as they currently cause a deadlock, and so
this provides a mechanism to easily show proof and demo how the deadlock
can happen:

Demos the deadlock with a device specific lock
./tools/testing/selftests/sysfs/sysfs.sh -t 0027

Demos the deadlock with rtnl_lock()
./tools/testing/selftests/sysfs/sysfs.sh -t 0028

Two separate solutions to the deadlock issue have been proposed,
and so now its a matter of either documenting this limitation or
eventually adopting a generic fix.

This selftests will shortly be expanded upon with more tests which
require further kernel changes in order to provide better test
coverage.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 MAINTAINERS                            |    7 +
 lib/Kconfig.debug                      |   10 +
 lib/Makefile                           |    1 +
 lib/test_sysfs.c                       |  943 +++++++++++++++++++
 tools/testing/selftests/sysfs/Makefile |   12 +
 tools/testing/selftests/sysfs/config   |    2 +
 tools/testing/selftests/sysfs/sysfs.sh | 1202 ++++++++++++++++++++++++
 7 files changed, 2177 insertions(+)
 create mode 100644 lib/test_sysfs.c
 create mode 100644 tools/testing/selftests/sysfs/Makefile
 create mode 100644 tools/testing/selftests/sysfs/config
 create mode 100755 tools/testing/selftests/sysfs/sysfs.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 66d047dc6880..fd369ed50040 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17958,6 +17958,13 @@ L:	linux-mmc@vger.kernel.org
 S:	Maintained
 F:	drivers/mmc/host/sdhci-pci-dwc-mshc.c
 
+SYSFS TEST DRIVER
+M:	Luis Chamberlain <mcgrof@kernel.org>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	lib/test_sysfs.c
+F:	tools/testing/selftests/sysfs/
+
 SYSTEM CONFIGURATION (SYSCON)
 M:	Lee Jones <lee.jones@linaro.org>
 M:	Arnd Bergmann <arnd@arndb.de>
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fb370c7c4756..568838ac1189 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2360,6 +2360,16 @@ config TEST_SYSCTL
 
 	  If unsure, say N.
 
+config TEST_SYSFS
+	tristate "sysfs test driver"
+	depends on SYSFS
+	help
+	  This builds the "test_sysfs" module. This driver enables to test the
+	  sysfs file system safely without affecting production knobs which
+	  might alter system functionality.
+
+	  If unsure, say N.
+
 config BITFIELD_KUNIT
 	tristate "KUnit test bitfield functions at runtime"
 	depends on KUNIT
diff --git a/lib/Makefile b/lib/Makefile
index 5efd1b435a37..effd1ef806f0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
 obj-$(CONFIG_TEST_BITOPS) += test_bitops.o
 CFLAGS_test_bitops.o += -Werror
 obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
+obj-$(CONFIG_TEST_SYSFS) += test_sysfs.o
 obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_IDA) += test_ida.o
 obj-$(CONFIG_KASAN_KUNIT_TEST) += test_kasan.o
diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
new file mode 100644
index 000000000000..bf43016d40b5
--- /dev/null
+++ b/lib/test_sysfs.c
@@ -0,0 +1,943 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * sysfs test driver
+ *
+ * Copyright (C) 2021 Luis Chamberlain <mcgrof@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or at your option any
+ * later version; or, when distributed separately from the Linux kernel or
+ * when incorporated into other software packages, subject to the following
+ * license:
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of copyleft-next (version 0.3.1 or later) as published
+ * at http://copyleft-next.org/.
+ */
+
+/*
+ * This module allows us to add race conditions which we can test for
+ * against the sysfs filesystem.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/async.h>
+#include <linux/delay.h>
+#include <linux/vmalloc.h>
+#include <linux/debugfs.h>
+#include <linux/rtnetlink.h>
+#include <linux/genhd.h>
+#include <linux/blkdev.h>
+
+static bool enable_lock;
+module_param(enable_lock, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_lock,
+		 "enable locking on reads / stores from the start");
+
+static bool enable_lock_on_rmmod;
+module_param(enable_lock_on_rmmod, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_lock_on_rmmod,
+		 "enable locking on rmmod");
+
+static bool use_rtnl_lock;
+module_param(use_rtnl_lock, bool_enable_only, 0644);
+MODULE_PARM_DESC(use_rtnl_lock,
+		 "use an rtnl_lock instead of the device mutex_lock");
+
+static unsigned int write_delay_msec_y = 500;
+module_param_named(write_delay_msec_y, write_delay_msec_y, uint, 0644);
+MODULE_PARM_DESC(write_delay_msec_y, "msec write delay for writes to y");
+
+static unsigned int test_devtype;
+module_param_named(devtype, test_devtype, uint, 0644);
+MODULE_PARM_DESC(devtype, "device type to register");
+
+static bool enable_busy_alloc;
+module_param(enable_busy_alloc, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_busy_alloc, "do a fake allocation during writes");
+
+static bool enable_debugfs;
+module_param(enable_debugfs, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_debugfs, "enable a few debugfs files");
+
+static bool enable_verbose_writes;
+module_param(enable_verbose_writes, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_debugfs, "enable stores to print verbose information");
+
+static unsigned int delay_rmmod_ms;
+module_param_named(delay_rmmod_ms, delay_rmmod_ms, uint, 0644);
+MODULE_PARM_DESC(delay_rmmod_ms, "if set how many ms to delay rmmod before device deletion");
+
+static bool enable_verbose_rmmod;
+module_param(enable_verbose_rmmod, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_verbose_rmmod, "enable verbose print messages on rmmod");
+
+static int sysfs_test_major;
+
+/**
+ * test_config - used for configuring how the sysfs test device will behave
+ *
+ * @enable_lock: if enabled a lock will be used when reading/storing variables
+ * @enable_lock_on_rmmod: if enabled a lock will be used when reading/storing
+ *	sysfs attributes, but it will also be used to lock on rmmod. This is
+ *	useful to test for a deadlock.
+ * @use_rtnl_lock: if enabled instead of configuration specific mutex, we'll
+ *	use the rtnl_lock. If your test case is modifying this on the fly
+ *	while doing other stores / reads, things will break as a lock can be
+ *	left contending. Best is that tests use this knob serially, without
+ *	allowing userspace to modify other knobs while this one changes.
+ * @write_delay_msec_y: the amount of delay to use when writing to y
+ * @enable_busy_alloc: if enabled we'll do a large allocation between
+ *	writes. We immediately free right away. We also schedule to give the
+ *	kernel some time to re-use any memory we don't need. This is intened
+ *	to mimic typical driver behaviour.
+ */
+struct test_config {
+	bool enable_lock;
+	bool enable_lock_on_rmmod;
+	bool use_rtnl_lock;
+	unsigned int write_delay_msec_y;
+	bool enable_busy_alloc;
+};
+
+/**
+ * enum sysfs_test_devtype - sysfs device type
+ * @TESTDEV_TYPE_MISC: misc device type
+ * @TESTDEV_TYPE_BLOCK: use a block device for the sysfs test device.
+ */
+enum sysfs_test_devtype {
+	TESTDEV_TYPE_MISC = 0,
+	TESTDEV_TYPE_BLOCK,
+};
+
+/**
+ * sysfs_test_device - test device to help test sysfs
+ *
+ * @devtype: the type of device to use
+ * @config: configuration for the test
+ * @config_mutex: protects configuration of test
+ * @misc_dev: we use a misc device under the hood
+ * @disk: represents a disk when used as a block device
+ * @dev: pointer to misc_dev's own struct device
+ * @dev_idx: unique ID for test device
+ * @x: variable we can use to test read / store
+ * @y: slow variable we can use to test read / store
+ */
+struct sysfs_test_device {
+	enum sysfs_test_devtype devtype;
+	struct test_config config;
+	struct mutex config_mutex;
+	struct miscdevice misc_dev;
+	struct gendisk *disk;
+	struct device *dev;
+	int dev_idx;
+	int x;
+	int y;
+};
+
+struct sysfs_test_device *first_test_dev;
+
+static struct miscdevice *dev_to_misc_dev(struct device *dev)
+{
+	return dev_get_drvdata(dev);
+}
+
+static struct sysfs_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev)
+{
+	return container_of(misc_dev, struct sysfs_test_device, misc_dev);
+}
+
+static struct sysfs_test_device *devblock_to_test_dev(struct device *dev)
+{
+	return (struct sysfs_test_device *)dev_to_disk(dev)->private_data;
+}
+
+static struct sysfs_test_device *devmisc_to_testdev(struct device *dev)
+{
+	struct miscdevice *misc_dev;
+
+	misc_dev = dev_to_misc_dev(dev);
+	return misc_dev_to_test_dev(misc_dev);
+}
+
+static struct sysfs_test_device *dev_to_test_dev(struct device *dev)
+{
+	if (test_devtype == TESTDEV_TYPE_MISC)
+		return devmisc_to_testdev(dev);
+	else if (test_devtype == TESTDEV_TYPE_BLOCK)
+		return devblock_to_test_dev(dev);
+	return NULL;
+}
+
+static void test_dev_config_lock(struct sysfs_test_device *test_dev)
+{
+	struct test_config *config;
+
+	config = &test_dev->config;
+	if (config->enable_lock) {
+		if (config->use_rtnl_lock)
+			rtnl_lock();
+		else
+			mutex_lock(&test_dev->config_mutex);
+	}
+}
+
+static void test_dev_config_unlock(struct sysfs_test_device *test_dev)
+{
+	struct test_config *config;
+
+	config = &test_dev->config;
+	if (config->enable_lock) {
+		if (config->use_rtnl_lock)
+			rtnl_unlock();
+		else
+			mutex_unlock(&test_dev->config_mutex);
+	}
+}
+
+static void test_dev_config_lock_rmmod(struct sysfs_test_device *test_dev)
+{
+	struct test_config *config;
+
+	config = &test_dev->config;
+	if (config->enable_lock_on_rmmod)
+		test_dev_config_lock(test_dev);
+}
+
+static void test_dev_config_unlock_rmmod(struct sysfs_test_device *test_dev)
+{
+	struct test_config *config;
+
+	config = &test_dev->config;
+	if (config->enable_lock_on_rmmod)
+		test_dev_config_unlock(test_dev);
+}
+
+static void free_test_dev_sysfs(struct sysfs_test_device *test_dev)
+{
+	if (test_dev) {
+		kfree_const(test_dev->misc_dev.name);
+		test_dev->misc_dev.name = NULL;
+		kfree(test_dev);
+		test_dev = NULL;
+	}
+}
+
+static void test_sysfs_reset_vals(struct sysfs_test_device *test_dev)
+{
+	test_dev->x = 3;
+	test_dev->y = 4;
+}
+
+static ssize_t config_show(struct device *dev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config = &test_dev->config;
+	int len = 0;
+
+	test_dev_config_lock(test_dev);
+
+	len += snprintf(buf, PAGE_SIZE,
+			"Configuration for: %s\n",
+			dev_name(dev));
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"x:\t%d\n",
+			test_dev->x);
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"y:\t%d\n",
+			test_dev->y);
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_lock:\t%s\n",
+			config->enable_lock ? "true" : "false");
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_lock_on_rmmmod:\t%s\n",
+			config->enable_lock_on_rmmod ? "true" : "false");
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"use_rtnl_lock:\t%s\n",
+			config->use_rtnl_lock ? "true" : "false");
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"write_delay_msec_y:\t%d\n",
+			config->write_delay_msec_y);
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_busy_alloc:\t%s\n",
+			config->enable_busy_alloc ? "true" : "false");
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_debugfs:\t%s\n",
+			enable_debugfs ? "true" : "false");
+
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_verbose_writes:\t%s\n",
+			enable_verbose_writes ? "true" : "false");
+
+	test_dev_config_unlock(test_dev);
+
+	return len;
+}
+static DEVICE_ATTR_RO(config);
+
+static ssize_t reset_store(struct device *dev,
+			   struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+
+	config = &test_dev->config;
+
+	/*
+	 * We compromise and simplify this condition and do not use a lock
+	 * here as the lock type can change.
+	 */
+	config->enable_lock = false;
+	config->enable_lock_on_rmmod = false;
+	config->use_rtnl_lock = false;
+	config->enable_busy_alloc = false;
+	test_sysfs_reset_vals(test_dev);
+
+	dev_info(dev, "reset\n");
+
+	return count;
+}
+static DEVICE_ATTR_WO(reset);
+
+static void test_dev_busy_alloc(struct sysfs_test_device *test_dev)
+{
+	struct test_config *config;
+	char *ignore;
+
+	config = &test_dev->config;
+	if (!config->enable_busy_alloc)
+		return;
+
+	ignore = kzalloc(sizeof(struct sysfs_test_device) * 10, GFP_KERNEL);
+	kfree(ignore);
+
+	schedule();
+}
+
+static ssize_t test_dev_x_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	int ret;
+
+	test_dev_busy_alloc(test_dev);
+	test_dev_config_lock(test_dev);
+
+	ret = kstrtoint(buf, 10, &test_dev->x);
+	if (ret)
+		count = ret;
+
+	if (enable_verbose_writes)
+		dev_info(test_dev->dev, "wrote x = %d\n", test_dev->x);
+
+	test_dev_config_unlock(test_dev);
+
+	return count;
+}
+
+static ssize_t test_dev_x_show(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	int ret;
+
+	test_dev_config_lock(test_dev);
+	ret = snprintf(buf, PAGE_SIZE, "%d\n", test_dev->x);
+	test_dev_config_unlock(test_dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(test_dev_x);
+
+static ssize_t test_dev_y_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int y;
+	int ret;
+
+	test_dev_busy_alloc(test_dev);
+	test_dev_config_lock(test_dev);
+
+	config = &test_dev->config;
+
+	ret = kstrtoint(buf, 10, &y);
+	if (ret)
+		count = ret;
+
+	msleep(config->write_delay_msec_y);
+	test_dev->y = test_dev->x + y + 7;
+
+	if (enable_verbose_writes)
+		dev_info(test_dev->dev, "wrote y = %d\n", test_dev->y);
+
+	test_dev_config_unlock(test_dev);
+
+	return count;
+}
+
+static ssize_t test_dev_y_show(struct device *dev,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	int ret;
+
+	test_dev_config_lock(test_dev);
+	ret = snprintf(buf, PAGE_SIZE, "%d\n", test_dev->y);
+	test_dev_config_unlock(test_dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(test_dev_y);
+
+static ssize_t config_enable_lock_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int ret;
+	int val;
+
+	config = &test_dev->config;
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	/*
+	 * We compromise for simplicty and do not lock when changing
+	 * locking configuration, with the assumption userspace tests
+	 * will know this.
+	 */
+	if (val)
+		config->enable_lock = true;
+	else
+		config->enable_lock = false;
+
+	return count;
+}
+
+static ssize_t config_enable_lock_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	ssize_t ret;
+
+	config = &test_dev->config;
+
+	test_dev_config_lock(test_dev);
+	ret = snprintf(buf, PAGE_SIZE, "%d\n", config->enable_lock);
+	test_dev_config_unlock(test_dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(config_enable_lock);
+
+static ssize_t config_enable_lock_on_rmmod_store(struct device *dev,
+						 struct device_attribute *attr,
+						 const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int ret;
+	int val;
+
+	config = &test_dev->config;
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	test_dev_config_lock(test_dev);
+	if (val)
+		config->enable_lock_on_rmmod = true;
+	else
+		config->enable_lock_on_rmmod = false;
+	test_dev_config_unlock(test_dev);
+
+	return count;
+}
+
+static ssize_t config_enable_lock_on_rmmod_show(struct device *dev,
+						struct device_attribute *attr,
+						char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	ssize_t ret;
+
+	config = &test_dev->config;
+
+	test_dev_config_lock(test_dev);
+	ret = snprintf(buf, PAGE_SIZE, "%d\n", config->enable_lock_on_rmmod);
+	test_dev_config_unlock(test_dev);
+
+	return ret;
+}
+static DEVICE_ATTR_RW(config_enable_lock_on_rmmod);
+
+static ssize_t config_use_rtnl_lock_store(struct device *dev,
+					  struct device_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int ret;
+	int val;
+
+	config = &test_dev->config;
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	/*
+	 * We compromise and simplify this condition and do not use a lock
+	 * here as the lock type can change.
+	 */
+	if (val)
+		config->use_rtnl_lock = true;
+	else
+		config->use_rtnl_lock = false;
+
+	return count;
+}
+
+static ssize_t config_use_rtnl_lock_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+
+	config = &test_dev->config;
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", config->use_rtnl_lock);
+}
+static DEVICE_ATTR_RW(config_use_rtnl_lock);
+
+static ssize_t config_write_delay_msec_y_store(struct device *dev,
+					       struct device_attribute *attr,
+					       const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int ret;
+	int val;
+
+	config = &test_dev->config;
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	test_dev_config_lock(test_dev);
+	config->write_delay_msec_y = val;
+	test_dev_config_unlock(test_dev);
+
+	return count;
+}
+
+static ssize_t config_write_delay_msec_y_show(struct device *dev,
+					      struct device_attribute *attr,
+					      char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+
+	config = &test_dev->config;
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", config->write_delay_msec_y);
+}
+static DEVICE_ATTR_RW(config_write_delay_msec_y);
+
+static ssize_t config_enable_busy_alloc_store(struct device *dev,
+					      struct device_attribute *attr,
+					      const char *buf, size_t count)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+	int ret;
+	int val;
+
+	config = &test_dev->config;
+	ret = kstrtoint(buf, 10, &val);
+	if (ret)
+		return ret;
+
+	test_dev_config_lock(test_dev);
+	config->enable_busy_alloc = val;
+	test_dev_config_unlock(test_dev);
+
+	return count;
+}
+
+static ssize_t config_enable_busy_alloc_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *buf)
+{
+	struct sysfs_test_device *test_dev = dev_to_test_dev(dev);
+	struct test_config *config;
+
+	config = &test_dev->config;
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", config->enable_busy_alloc);
+}
+static DEVICE_ATTR_RW(config_enable_busy_alloc);
+
+#define TEST_SYSFS_DEV_ATTR(name)		(&dev_attr_##name.attr)
+
+static struct attribute *test_dev_attrs[] = {
+	/* Generic driver knobs go here */
+	TEST_SYSFS_DEV_ATTR(config),
+	TEST_SYSFS_DEV_ATTR(reset),
+
+	/* These are used to test sysfs */
+	TEST_SYSFS_DEV_ATTR(test_dev_x),
+	TEST_SYSFS_DEV_ATTR(test_dev_y),
+
+	/*
+	 * These are configuration knobs to modify how we test sysfs when
+	 * doing reads / stores.
+	 */
+	TEST_SYSFS_DEV_ATTR(config_enable_lock),
+	TEST_SYSFS_DEV_ATTR(config_enable_lock_on_rmmod),
+	TEST_SYSFS_DEV_ATTR(config_use_rtnl_lock),
+	TEST_SYSFS_DEV_ATTR(config_write_delay_msec_y),
+	TEST_SYSFS_DEV_ATTR(config_enable_busy_alloc),
+
+	NULL,
+};
+
+ATTRIBUTE_GROUPS(test_dev);
+
+static int sysfs_test_dev_alloc_miscdev(struct sysfs_test_device *test_dev)
+{
+	struct miscdevice *misc_dev;
+
+	misc_dev = &test_dev->misc_dev;
+	misc_dev->minor = MISC_DYNAMIC_MINOR;
+	misc_dev->name = kasprintf(GFP_KERNEL, "test_sysfs%d", test_dev->dev_idx);
+	if (!misc_dev->name) {
+		pr_err("Cannot alloc misc_dev->name\n");
+		return -ENOMEM;
+	}
+	misc_dev->groups = test_dev_groups;
+
+	return 0;
+}
+
+static int testdev_open(struct block_device *bdev, fmode_t mode)
+{
+	return -EINVAL;
+}
+
+static blk_qc_t testdev_submit_bio(struct bio *bio)
+{
+	return BLK_QC_T_NONE;
+}
+
+static void testdev_slot_free_notify(struct block_device *bdev,
+				     unsigned long index)
+{
+}
+
+static int testdev_rw_page(struct block_device *bdev, sector_t sector,
+			   struct page *page, unsigned int op)
+{
+	return -EOPNOTSUPP;
+}
+
+static const struct block_device_operations sysfs_testdev_ops = {
+	.open = testdev_open,
+	.submit_bio = testdev_submit_bio,
+	.swap_slot_free_notify = testdev_slot_free_notify,
+	.rw_page = testdev_rw_page,
+	.owner = THIS_MODULE
+};
+
+static int sysfs_test_dev_alloc_blockdev(struct sysfs_test_device *test_dev)
+{
+	int ret = -ENOMEM;
+
+	test_dev->disk = blk_alloc_disk(NUMA_NO_NODE);
+	if (!test_dev->disk) {
+		pr_err("Error allocating disk structure for device %d\n",
+		       test_dev->dev_idx);
+		goto out;
+	}
+
+	test_dev->disk->major = sysfs_test_major;
+	test_dev->disk->first_minor = test_dev->dev_idx + 1;
+	test_dev->disk->fops = &sysfs_testdev_ops;
+	test_dev->disk->private_data = test_dev;
+	snprintf(test_dev->disk->disk_name, 16, "test_sysfs%d",
+		 test_dev->dev_idx);
+	set_capacity(test_dev->disk, 0);
+	blk_queue_flag_set(QUEUE_FLAG_NONROT, test_dev->disk->queue);
+	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, test_dev->disk->queue);
+	blk_queue_physical_block_size(test_dev->disk->queue, PAGE_SIZE);
+	blk_queue_max_discard_sectors(test_dev->disk->queue, UINT_MAX);
+	blk_queue_flag_set(QUEUE_FLAG_DISCARD, test_dev->disk->queue);
+
+	return 0;
+out:
+	return ret;
+}
+
+static struct sysfs_test_device *alloc_test_dev_sysfs(int idx)
+{
+	struct sysfs_test_device *test_dev;
+	int ret;
+
+	switch (test_devtype) {
+	case TESTDEV_TYPE_MISC:
+	       fallthrough;
+	case TESTDEV_TYPE_BLOCK:
+		break;
+	default:
+		return NULL;
+	}
+
+	test_dev = kzalloc(sizeof(struct sysfs_test_device), GFP_KERNEL);
+	if (!test_dev)
+		goto err_out;
+
+	mutex_init(&test_dev->config_mutex);
+	test_dev->dev_idx = idx;
+	test_dev->devtype = test_devtype;
+
+	if (test_dev->devtype == TESTDEV_TYPE_MISC) {
+		ret = sysfs_test_dev_alloc_miscdev(test_dev);
+		if (ret)
+			goto err_out_free;
+	} else if (test_dev->devtype == TESTDEV_TYPE_BLOCK) {
+		ret = sysfs_test_dev_alloc_blockdev(test_dev);
+		if (ret)
+			goto err_out_free;
+	}
+	return test_dev;
+
+err_out_free:
+	kfree(test_dev);
+	test_dev = NULL;
+err_out:
+	return NULL;
+}
+
+static int register_test_dev_sysfs_misc(struct sysfs_test_device *test_dev)
+{
+	int ret;
+
+	ret = misc_register(&test_dev->misc_dev);
+	if (ret)
+		return ret;
+
+	test_dev->dev = test_dev->misc_dev.this_device;
+
+	return 0;
+}
+
+static int register_test_dev_sysfs_block(struct sysfs_test_device *test_dev)
+{
+	device_add_disk(NULL, test_dev->disk, test_dev_groups);
+	test_dev->dev = disk_to_dev(test_dev->disk);
+
+	return 0;
+}
+
+static struct sysfs_test_device *register_test_dev_sysfs(void)
+{
+	struct sysfs_test_device *test_dev = NULL;
+	int ret;
+
+	test_dev = alloc_test_dev_sysfs(0);
+	if (!test_dev)
+		goto out;
+
+	if (test_dev->devtype == TESTDEV_TYPE_MISC) {
+		ret = register_test_dev_sysfs_misc(test_dev);
+		if (ret) {
+			pr_err("could not register misc device: %d\n", ret);
+			goto out_free_dev;
+		}
+	} else if (test_dev->devtype == TESTDEV_TYPE_BLOCK) {
+		ret = register_test_dev_sysfs_block(test_dev);
+		if (ret) {
+			pr_err("could not register block device: %d\n", ret);
+			goto out_free_dev;
+		}
+	}
+
+	dev_info(test_dev->dev, "interface ready\n");
+
+out:
+	return test_dev;
+out_free_dev:
+	free_test_dev_sysfs(test_dev);
+	return NULL;
+}
+
+static struct sysfs_test_device *register_test_dev_set_config(void)
+{
+	struct sysfs_test_device *test_dev;
+	struct test_config *config;
+
+	test_dev = register_test_dev_sysfs();
+	if (!test_dev)
+		return NULL;
+
+	config = &test_dev->config;
+
+	if (enable_lock)
+		config->enable_lock = true;
+	if (enable_lock_on_rmmod)
+		config->enable_lock_on_rmmod = true;
+	if (use_rtnl_lock)
+		config->use_rtnl_lock = true;
+	if (enable_busy_alloc)
+		config->enable_busy_alloc = true;
+
+	config->write_delay_msec_y = write_delay_msec_y;
+	test_sysfs_reset_vals(test_dev);
+
+	return test_dev;
+}
+
+static void unregister_test_dev_sysfs_misc(struct sysfs_test_device *test_dev)
+{
+	misc_deregister(&test_dev->misc_dev);
+}
+
+static void unregister_test_dev_sysfs_block(struct sysfs_test_device *test_dev)
+{
+	del_gendisk(test_dev->disk);
+	blk_cleanup_disk(test_dev->disk);
+}
+
+static void unregister_test_dev_sysfs(struct sysfs_test_device *test_dev)
+{
+	test_dev_config_lock_rmmod(test_dev);
+
+	dev_info(test_dev->dev, "removing interface\n");
+
+	if (test_dev->devtype == TESTDEV_TYPE_MISC)
+		unregister_test_dev_sysfs_misc(test_dev);
+	else if (test_dev->devtype == TESTDEV_TYPE_BLOCK)
+		unregister_test_dev_sysfs_block(test_dev);
+
+	test_dev_config_unlock_rmmod(test_dev);
+
+	free_test_dev_sysfs(test_dev);
+}
+
+static struct dentry *debugfs_dir;
+
+/* When read represents how many times we have reset the first_test_dev */
+static u8 reset_first_test_dev;
+
+static ssize_t read_reset_first_test_dev(struct file *file,
+					 char __user *user_buf,
+					 size_t count, loff_t *ppos)
+{
+	ssize_t len;
+	char buf[32];
+
+	reset_first_test_dev++;
+	len = sprintf(buf, "%d\n", reset_first_test_dev);
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t write_reset_first_test_dev(struct file *file,
+					  const char __user *user_buf,
+					  size_t count, loff_t *ppos)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	if (!first_test_dev) {
+		module_put(THIS_MODULE);
+		return -ENODEV;
+	}
+
+	dev_info(first_test_dev->dev, "going to reset first interface ...\n");
+
+	unregister_test_dev_sysfs(first_test_dev);
+	first_test_dev = register_test_dev_set_config();
+
+	dev_info(first_test_dev->dev, "first interface reset complete\n");
+
+	module_put(THIS_MODULE);
+
+	return count;
+}
+
+static const struct file_operations fops_reset_first_test_dev = {
+	.read = read_reset_first_test_dev,
+	.write = write_reset_first_test_dev,
+	.open = simple_open,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
+static int __init test_sysfs_init(void)
+{
+	first_test_dev = register_test_dev_set_config();
+	if (!first_test_dev)
+		return -ENOMEM;
+
+	if (!enable_debugfs)
+		return 0;
+
+	debugfs_dir = debugfs_create_dir("test_sysfs", NULL);
+	if (!debugfs_dir) {
+		unregister_test_dev_sysfs(first_test_dev);
+		return -ENOMEM;
+	}
+
+	debugfs_create_file("reset_first_test_dev", 0600, debugfs_dir,
+			    NULL, &fops_reset_first_test_dev);
+	return 0;
+}
+module_init(test_sysfs_init);
+
+static void __exit test_sysfs_exit(void)
+{
+	if (enable_debugfs)
+		debugfs_remove(debugfs_dir);
+	if (delay_rmmod_ms)
+		msleep(delay_rmmod_ms);
+	unregister_test_dev_sysfs(first_test_dev);
+	if (enable_verbose_rmmod)
+		pr_info("unregister_test_dev_sysfs() completed\n");
+	first_test_dev = NULL;
+}
+module_exit(test_sysfs_exit);
+
+MODULE_AUTHOR("Luis Chamberlain <mcgrof@kernel.org>");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/sysfs/Makefile b/tools/testing/selftests/sysfs/Makefile
new file mode 100644
index 000000000000..fde99caa2338
--- /dev/null
+++ b/tools/testing/selftests/sysfs/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# Makefile for sysfs selftests.
+
+# No binaries, but make sure arg-less "make" doesn't trigger "run_tests".
+all:
+
+TEST_PROGS := sysfs.sh
+
+include ../lib.mk
+
+# Nothing to clean up.
+clean:
diff --git a/tools/testing/selftests/sysfs/config b/tools/testing/selftests/sysfs/config
new file mode 100644
index 000000000000..9196f452ecd5
--- /dev/null
+++ b/tools/testing/selftests/sysfs/config
@@ -0,0 +1,2 @@
+CONFIG_SYSFS=m
+CONFIG_TEST_SYSFS=m
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh
new file mode 100755
index 000000000000..681b27579f6f
--- /dev/null
+++ b/tools/testing/selftests/sysfs/sysfs.sh
@@ -0,0 +1,1202 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (C) 2021 Luis Chamberlain <mcgrof@kernel.org>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 2 of the License, or at your option any
+# later version; or, when distributed separately from the Linux kernel or
+# when incorporated into other software packages, subject to the following
+# license:
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of copyleft-next (version 0.3.1 or later) as published
+# at http://copyleft-next.org/.
+
+# This performs a series tests against the sysfs filesystem.
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+TEST_NAME="sysfs"
+TEST_DRIVER="test_${TEST_NAME}"
+TEST_DIR=$(dirname $0)
+TEST_FILE=$(mktemp)
+
+# This represents
+#
+# TEST_ID:TEST_COUNT:ENABLED:TARGET
+#
+# TEST_ID: is the test id number
+# TEST_COUNT: number of times we should run the test
+# ENABLED: 1 if enabled, 0 otherwise
+# TARGET: test target file required on the test_sysfs module
+#
+# Once these are enabled please leave them as-is. Write your own test,
+# we have tons of space.
+ALL_TESTS="0001:3:1:test_dev_x:misc"
+ALL_TESTS="$ALL_TESTS 0002:3:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0003:3:1:test_dev_x:misc"
+ALL_TESTS="$ALL_TESTS 0004:3:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0005:1:1:test_dev_x:misc"
+ALL_TESTS="$ALL_TESTS 0006:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0007:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0008:1:1:test_dev_x:misc"
+ALL_TESTS="$ALL_TESTS 0009:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0010:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0011:1:1:test_dev_x:misc"
+ALL_TESTS="$ALL_TESTS 0012:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0013:1:1:test_dev_y:misc"
+ALL_TESTS="$ALL_TESTS 0014:3:1:test_dev_x:block" # block equivalent set
+ALL_TESTS="$ALL_TESTS 0015:3:1:test_dev_x:block"
+ALL_TESTS="$ALL_TESTS 0016:3:1:test_dev_x:block"
+ALL_TESTS="$ALL_TESTS 0017:3:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0018:1:1:test_dev_x:block"
+ALL_TESTS="$ALL_TESTS 0019:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0020:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0021:1:1:test_dev_x:block"
+ALL_TESTS="$ALL_TESTS 0022:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0023:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0024:1:1:test_dev_x:block"
+ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block"
+ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test
+ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock
+
+allow_user_defaults()
+{
+	if [ -z $DIR ]; then
+		case $TEST_DEV_TYPE in
+		misc)
+			DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0"
+			;;
+		block)
+			DIR="/sys/devices/virtual/block/${TEST_DRIVER}0"
+			;;
+		*)
+			DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0"
+			;;
+		esac
+	fi
+	case $TEST_DEV_TYPE in
+		misc)
+			MODPROBE_TESTDEV_TYPE=""
+			;;
+		block)
+			MODPROBE_TESTDEV_TYPE="devtype=1"
+			;;
+		*)
+			MODPROBE_TESTDEV_TYPE=""
+			;;
+	esac
+	if [ -z $SYSFS_DEBUGFS_DIR ]; then
+		SYSFS_DEBUGFS_DIR="/sys/kernel/debug/test_sysfs"
+	fi
+	if [ -z $PAGE_SIZE ]; then
+		PAGE_SIZE=$(getconf PAGESIZE)
+	fi
+	if [ -z $MAX_DIGITS ]; then
+		MAX_DIGITS=$(($PAGE_SIZE/8))
+	fi
+	if [ -z $INT_MAX ]; then
+		INT_MAX=$(getconf INT_MAX)
+	fi
+	if [ -z $UINT_MAX ]; then
+		UINT_MAX=$(getconf UINT_MAX)
+	fi
+}
+
+test_reqs()
+{
+	uid=$(id -u)
+	if [ $uid -ne 0 ]; then
+		echo $msg must be run as root >&2
+		exit $ksft_skip
+	fi
+
+	if ! which modprobe 2> /dev/null > /dev/null; then
+		echo "$0: You need modprobe installed" >&2
+		exit $ksft_skip
+	fi
+	if ! which getconf 2> /dev/null > /dev/null; then
+		echo "$0: You need getconf installed"
+		exit $ksft_skip
+	fi
+	if ! which diff 2> /dev/null > /dev/null; then
+		echo "$0: You need diff installed"
+		exit $ksft_skip
+	fi
+	if ! which perl 2> /dev/null > /dev/null; then
+		echo "$0: You need perl installed"
+		exit $ksft_skip
+	fi
+}
+
+call_modprobe()
+{
+	modprobe $TEST_DRIVER $MODPROBE_TESTDEV_TYPE $FIRST_MODPROBE_ARGS $MODPROBE_ARGS
+	return $?
+}
+
+modprobe_reset()
+{
+	modprobe -q -r $TEST_DRIVER
+	call_modprobe
+	return $?
+}
+
+modprobe_reset_enable_debugfs()
+{
+	FIRST_MODPROBE_ARGS="enable_debugfs=1"
+	modprobe_reset
+	unset FIRST_MODPROBE_ARGS
+}
+
+modprobe_reset_enable_lock_on_rmmod()
+{
+	FIRST_MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 enable_verbose_writes=1"
+	modprobe_reset
+	unset FIRST_MODPROBE_ARGS
+}
+
+modprobe_reset_enable_rtnl_lock_on_rmmod()
+{
+	FIRST_MODPROBE_ARGS="enable_lock=1 use_rtnl_lock=1 enable_lock_on_rmmod=1"
+	FIRST_MODPROBE_ARGS="$FIRST_MODPROBE_ARGS enable_verbose_writes=1"
+	modprobe_reset
+	unset FIRST_MODPROBE_ARGS
+}
+
+load_req_mod()
+{
+	modprobe_reset
+	if [ ! -d $DIR ]; then
+		if ! modprobe -q -n $TEST_DRIVER; then
+			echo "$0: module $TEST_DRIVER not found [SKIP]"
+			echo "You must set CONFIG_TEST_SYSFS=m in your kernel" >&2
+			exit $ksft_skip
+		fi
+		call_modprobe
+		if [ $? -ne 0 ]; then
+			echo "$0: modprobe $TEST_DRIVER failed."
+			exit
+		fi
+	fi
+}
+
+config_reset()
+{
+	if ! echo -n "1" >"$DIR"/reset; then
+		echo "$0: reset should have worked" >&2
+		exit 1
+	fi
+}
+
+debugfs_reset_first_test_dev_ignore_errors()
+{
+	echo -n "1" >"$SYSFS_DEBUGFS_DIR"/reset_first_test_dev
+}
+
+set_orig()
+{
+	if [[ ! -z $TARGET ]] && [[ ! -z $ORIG ]]; then
+		if [ -f ${TARGET} ]; then
+			echo "${ORIG}" > "${TARGET}"
+		fi
+	fi
+}
+
+set_test()
+{
+	echo "${TEST_STR}" > "${TARGET}"
+}
+
+set_test_ignore_errors()
+{
+	echo "${TEST_STR}" > "${TARGET}" 2> /dev/null
+}
+
+verify()
+{
+	local seen
+	seen=$(cat "$1")
+	target_short=$(basename $TARGET)
+	case $target_short in
+	test_dev_x)
+		if [ "${seen}" != "${TEST_STR}" ]; then
+			return 1
+		fi
+		;;
+	test_dev_y)
+		DIRNAME=$(dirname $1)
+		EXPECTED_RESULT=""
+		# If our target was the test file then what we write to it
+		# is the same as what that we expect when we read from it.
+		# When we write to test_dev_y directly though we expect
+		# a computed value which is driver specific.
+		if [[ "$DIRNAME" == "/tmp" ]]; then
+			let EXPECTED_RESULT="${TEST_STR}"
+		else
+			x=$(cat ${DIR}/test_dev_x)
+			let EXPECTED_RESULT="$x+${TEST_STR}+7"
+		fi
+
+		if [[ "${seen}" != "${EXPECTED_RESULT}" ]]; then
+			return 1
+		fi
+		;;
+	*)
+		echo "Unsupported target type update test script: $target_short"
+		exit 1
+	esac
+	return 0
+}
+
+verify_diff_w()
+{
+	echo "$TEST_STR" | diff -q -w -u - $1 > /dev/null
+	return $?
+}
+
+test_rc()
+{
+	if [[ $rc != 0 ]]; then
+		echo "Failed test, return value: $rc" >&2
+		exit $rc
+	fi
+}
+
+test_finish()
+{
+	set_orig
+	rm -f "${TEST_FILE}"
+
+	if [ ! -z ${old_strict} ]; then
+		echo ${old_strict} > ${WRITES_STRICT}
+	fi
+	exit $rc
+}
+
+# kernfs requires us to write everything we want in one shot because
+# There is no easy way for us to know if userspace is only doing a partial
+# write, so we don't support them. We expect the entire buffer to come on
+# the first write.  If you're writing a value, first read the file,
+# modify only the value you're changing, then write entire buffer back.
+# Since we are only testing digits we just full single writes and old stuff.
+# For more details, refer to kernfs_fop_write_iter().
+run_numerictests_single_write()
+{
+	echo "== Testing sysfs behavior against ${TARGET} =="
+
+	rc=0
+
+	echo -n "Writing test file ... "
+	echo "${TEST_STR}" > "${TEST_FILE}"
+	if ! verify "${TEST_FILE}"; then
+		echo "FAIL" >&2
+		exit 1
+	else
+		echo "ok"
+	fi
+
+	echo -n "Checking the sysfs file is not set to test value ... "
+	if verify "${TARGET}"; then
+		echo "FAIL" >&2
+		exit 1
+	else
+		echo "ok"
+	fi
+
+	echo -n "Writing to sysfs file from shell ... "
+	set_test
+	if ! verify "${TARGET}"; then
+		echo "FAIL" >&2
+		exit 1
+	else
+		echo "ok"
+	fi
+
+	echo -n "Resetting sysfs file to original value ... "
+	set_orig
+	if verify "${TARGET}"; then
+		echo "FAIL" >&2
+		exit 1
+	else
+		echo "ok"
+	fi
+
+	# Now that we've validated the sanity of "set_test" and "set_orig",
+	# we can use those functions to set starting states before running
+	# specific behavioral tests.
+
+	echo -n "Writing to the entire sysfs file in a single write ... "
+	set_orig
+	dd if="${TEST_FILE}" of="${TARGET}" bs=4096 2>/dev/null
+	if ! verify "${TARGET}"; then
+		echo "FAIL" >&2
+		rc=1
+	else
+		echo "ok"
+	fi
+
+	echo -n "Writing to the sysfs file with multiple long writes ... "
+	set_orig
+	(perl -e 'print "A" x 50;'; echo "${TEST_STR}") | \
+		dd of="${TARGET}" bs=50 2>/dev/null
+	if verify "${TARGET}"; then
+		echo "FAIL" >&2
+		rc=1
+	else
+		echo "ok"
+	fi
+	test_rc
+}
+
+reset_vals()
+{
+	echo -n 3 > $DIR/test_dev_x
+	echo -n 4 > $DIR/test_dev_x
+}
+
+check_failure()
+{
+	echo -n "Testing that $1 fails as expected..."
+	reset_vals
+	TEST_STR="$1"
+	orig="$(cat $TARGET)"
+	echo -n "$TEST_STR" > $TARGET 2> /dev/null
+
+	# write should fail and $TARGET should retain its original value
+	if [ $? = 0 ] || [ "$(cat $TARGET)" != "$orig" ]; then
+		echo "FAIL" >&2
+		rc=1
+	else
+		echo "ok"
+	fi
+	test_rc
+}
+
+load_modreqs()
+{
+	export TEST_DEV_TYPE=$(get_test_type $1)
+	unset DIR
+	allow_user_defaults
+	load_req_mod
+}
+
+target_exists()
+{
+	TARGET="${DIR}/$1"
+	TEST_ID="$2"
+
+	if [ ! -f ${TARGET} ] ; then
+		echo "Target for test $TEST_ID: $TARGET does not exist, skipping test ..."
+		return 0
+	fi
+	return 1
+}
+
+config_enable_lock()
+{
+	if ! echo -n 1 > $DIR/config_enable_lock; then
+		echo "$0: Unable to enable locks" >&2
+		exit 1
+	fi
+}
+
+config_write_delay_msec_y()
+{
+	if ! echo -n $1 > $DIR/config_write_delay_msec_y ; then
+		echo "$0: Unable to set write_delay_msec_y to $1" >&2
+		exit 1
+	fi
+}
+
+# Default filter for dmesg scanning.
+# Ignore lockdep complaining about its own bugginess when scanning dmesg
+# output, because we shouldn't be failing filesystem tests on account of
+# lockdep.
+_check_dmesg_filter()
+{
+	egrep -v -e "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" \
+		-e "BUG: MAX_STACK_TRACE_ENTRIES too low"
+}
+
+check_dmesg()
+{
+	# filter out intentional WARNINGs or Oopses
+	local filter=${1:-_check_dmesg_filter}
+
+	_dmesg_since_test_start | $filter >$seqres.dmesg
+	egrep -q -e "kernel BUG at" \
+	     -e "WARNING:" \
+	     -e "\bBUG:" \
+	     -e "Oops:" \
+	     -e "possible recursive locking detected" \
+	     -e "Internal error" \
+	     -e "(INFO|ERR): suspicious RCU usage" \
+	     -e "INFO: possible circular locking dependency detected" \
+	     -e "general protection fault:" \
+	     -e "BUG .* remaining" \
+	     -e "UBSAN:" \
+	     $seqres.dmesg
+	if [ $? -eq 0 ]; then
+		echo "something found in dmesg (see $seqres.dmesg)"
+		return 1
+	else
+		if [ "$KEEP_DMESG" != "yes" ]; then
+			rm -f $seqres.dmesg
+		fi
+		return 0
+	fi
+}
+
+log_kernel_fstest_dmesg()
+{
+	export FSTYP="$1"
+	export seqnum="$FSTYP/$2"
+	export date_time=$(date +"%F %T")
+	echo "run fstests $seqnum at $date_time" > /dev/kmsg
+}
+
+modprobe_loop()
+{
+	while true; do
+		call_modprobe > /dev/null 2>&1
+		modprobe -r $TEST_DRIVER > /dev/null 2>&1
+	done > /dev/null 2>&1
+}
+
+write_loop()
+{
+	while true; do
+		set_test_ignore_errors > /dev/null 2>&1
+		TEST_STR=$(( $TEST_STR + 1 ))
+	done > /dev/null 2>&1
+}
+
+write_loop_reset()
+{
+	while true; do
+		set_test_ignore_errors > /dev/null 2>&1
+		debugfs_reset_first_test_dev_ignore_errors > /dev/null 2>&1
+	done > /dev/null 2>&1
+}
+
+write_loop_bg()
+{
+	BG_WRITES=1000 > /dev/null 2>&1
+	while true; do
+		for i in $(seq 1 $BG_WRITES); do
+			set_test_ignore_errors > /dev/null 2>&1 &
+			TEST_STR=$(( $TEST_STR + 1 ))
+		done > /dev/null 2>&1
+		wait
+	done > /dev/null 2>&1
+	wait
+}
+
+reset_loop()
+{
+	while true; do
+		debugfs_reset_first_test_dev_ignore_errors > /dev/null 2>&1
+	done > /dev/null 2>&1
+}
+
+kill_trigger_loop()
+{
+
+	local my_first_loop_pid=$1
+	local my_second_loop_pid=$2
+	local my_sleep_max=$3
+	local my_loop=0
+
+	while true; do
+		sleep 1
+		if [[ $my_loop -ge $my_sleep_max ]]; then
+			break
+		fi
+		let my_loop=$my_loop+1
+	done
+
+	kill -s TERM $my_first_loop_pid 2>&1 > /dev/null
+	kill -s TERM $my_second_loop_pid 2>&1 > /dev/null
+}
+
+_dmesg_since_test_start()
+{
+	# search the dmesg log of last run of $seqnum for possible failures
+	# use sed \cregexpc address type, since $seqnum contains "/"
+	dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac
+}
+
+sysfs_test_0001()
+{
+	TARGET="${DIR}/$(get_test_target 0001)"
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+
+	run_numerictests_single_write
+}
+
+sysfs_test_0002()
+{
+	TARGET="${DIR}/$(get_test_target 0002)"
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+
+	run_numerictests_single_write
+}
+
+sysfs_test_0003()
+{
+	TARGET="${DIR}/$(get_test_target 0003)"
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+
+	config_enable_lock
+
+	run_numerictests_single_write
+}
+
+sysfs_test_0004()
+{
+	TARGET="${DIR}/$(get_test_target 0004)"
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+
+	config_enable_lock
+
+	run_numerictests_single_write
+}
+
+sysfs_test_0005()
+{
+	TARGET="${DIR}/$(get_test_target 0005)"
+	modprobe_reset
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing x while loading/unloading the module... "
+
+	modprobe_loop &
+	modprobe_pid=$!
+
+	write_loop &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0006()
+{
+	TARGET="${DIR}/$(get_test_target 0006)"
+	modprobe_reset
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing y while loading/unloading the module... "
+	modprobe_loop &
+	modprobe_pid=$!
+
+	write_loop &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0007()
+{
+	TARGET="${DIR}/$(get_test_target 0007)"
+	modprobe_reset
+	config_reset
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing y with a larger delay while loading/unloading the module... "
+
+	MODPROBE_ARGS="write_delay_msec_y=1500"
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+	unset MODPROBE_ARGS
+
+	write_loop &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0008()
+{
+	TARGET="${DIR}/$(get_test_target 0008)"
+	modprobe_reset
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop busy writing x while loading/unloading the module... "
+
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+
+	write_loop_bg > /dev/null 2>&1 &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0009()
+{
+	TARGET="${DIR}/$(get_test_target 0009)"
+	modprobe_reset
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop busy writing y while loading/unloading the module... "
+
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+
+	write_loop_bg > /dev/null 2>&1 &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0010()
+{
+	TARGET="${DIR}/$(get_test_target 0010)"
+	modprobe_reset
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop busy writing y with a larger delay while loading/unloading the module... "
+	modprobe -q -r $TEST_DRIVER > /dev/null 2>&1
+
+	MODPROBE_ARGS="write_delay_msec_y=1500"
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+	unset MODPROBE_ARGS
+
+	write_loop_bg > /dev/null 2>&1 &
+	write_pid=$!
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0011()
+{
+	TARGET="${DIR}/$(get_test_target 0011)"
+	modprobe_reset_enable_debugfs
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing x and resetting ... "
+
+	write_loop > /dev/null 2>&1 &
+	write_pid=$!
+
+	reset_loop > /dev/null 2>&1 &
+	reset_pid=$!
+
+	kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0012()
+{
+	TARGET="${DIR}/$(get_test_target 0012)"
+	modprobe_reset_enable_debugfs
+	config_reset
+	reset_vals
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing y and resetting ... "
+
+	write_loop > /dev/null 2>&1 &
+	write_pid=$!
+
+	reset_loop > /dev/null 2>&1 &
+	reset_pid=$!
+
+	kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0013()
+{
+	TARGET="${DIR}/$(get_test_target 0013)"
+	modprobe_reset_enable_debugfs
+	config_reset
+	reset_vals
+	config_write_delay_msec_y 1500
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Loop writing y with a larger delay and resetting ... "
+
+	write_loop > /dev/null 2>&1 &
+	write_pid=$!
+
+	reset_loop > /dev/null 2>&1 &
+	reset_pid=$!
+
+	kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0014()
+{
+	sysfs_test_0001
+}
+
+sysfs_test_0015()
+{
+	sysfs_test_0002
+}
+
+sysfs_test_0016()
+{
+	sysfs_test_0003
+}
+
+sysfs_test_0017()
+{
+	sysfs_test_0004
+}
+
+sysfs_test_0018()
+{
+	sysfs_test_0005
+}
+
+sysfs_test_0019()
+{
+	sysfs_test_0006
+}
+
+sysfs_test_0020()
+{
+	sysfs_test_0007
+}
+
+sysfs_test_0021()
+{
+	sysfs_test_0008
+}
+
+sysfs_test_0022()
+{
+	sysfs_test_0009
+}
+
+sysfs_test_0023()
+{
+	sysfs_test_0010
+}
+
+sysfs_test_0024()
+{
+	sysfs_test_0011
+}
+
+sysfs_test_0025()
+{
+	sysfs_test_0012
+}
+
+sysfs_test_0026()
+{
+	sysfs_test_0013
+}
+
+sysfs_test_0027()
+{
+	TARGET="${DIR}/$(get_test_target 0027)"
+	modprobe_reset_enable_lock_on_rmmod
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Test for possible rmmod deadlock while writing x ... "
+
+	write_loop > /dev/null 2>&1 &
+	write_pid=$!
+
+	MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 enable_verbose_writes=1"
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+	unset MODPROBE_ARGS
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0028()
+{
+	TARGET="${DIR}/$(get_test_target 0028)"
+	modprobe_reset_enable_lock_on_rmmod
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+	WAIT_TIME=2
+
+	echo -n "Test for possible rmmod deadlock using rtnl_lock while writing x ... "
+
+	write_loop > /dev/null 2>&1 &
+	write_pid=$!
+
+	MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 use_rtnl_lock=1 enable_verbose_writes=1"
+	modprobe_loop > /dev/null 2>&1 &
+	modprobe_pid=$!
+	unset MODPROBE_ARGS
+
+	kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 &
+	kill_pid=$!
+
+	wait $kill_pid > /dev/null 2>&1
+
+	if [[ $? -eq 0 ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+test_gen_desc()
+{
+	echo -n "$1 x $(get_test_count $1)"
+}
+
+list_tests()
+{
+	echo "Test ID list:"
+	echo
+	echo "TEST_ID x NUM_TEST"
+	echo "TEST_ID:   Test ID"
+	echo "NUM_TESTS: Number of recommended times to run the test"
+	echo
+	echo "$(test_gen_desc 0001) - misc test writing x in different ways"
+	echo "$(test_gen_desc 0002) - misc test writing y in different ways"
+	echo "$(test_gen_desc 0003) - misc test writing x in different ways using a mutex lock"
+	echo "$(test_gen_desc 0004) - misc test writing y in different ways using a mutex lock"
+	echo "$(test_gen_desc 0005) - misc test writing x load and remove the test_sysfs module"
+	echo "$(test_gen_desc 0006) - misc writing y load and remove the test_sysfs module"
+	echo "$(test_gen_desc 0007) - misc test writing y larger delay, load, remove test_sysfs"
+	echo "$(test_gen_desc 0008) - misc test busy writing x remove test_sysfs module"
+	echo "$(test_gen_desc 0009) - misc test busy writing y remove the test_sysfs module"
+	echo "$(test_gen_desc 0010) - misc test busy writing y larger delay, remove test_sysfs"
+	echo "$(test_gen_desc 0011) - misc test writing x and resetting device"
+	echo "$(test_gen_desc 0012) - misc test writing y and resetting device"
+	echo "$(test_gen_desc 0013) - misc test writing y with a larger delay and resetting device"
+	echo "$(test_gen_desc 0014) - block test writing x in different ways"
+	echo "$(test_gen_desc 0015) - block test writing y in different ways"
+	echo "$(test_gen_desc 0016) - block test writing x in different ways using a mutex lock"
+	echo "$(test_gen_desc 0017) - block test writing y in different ways using a mutex lock"
+	echo "$(test_gen_desc 0018) - block test writing x load and remove the test_sysfs module"
+	echo "$(test_gen_desc 0019) - block test writing y load and remove the test_sysfs module"
+	echo "$(test_gen_desc 0020) - block test writing y larger delay, load, remove test_sysfs"
+	echo "$(test_gen_desc 0021) - block test busy writing x remove the test_sysfs module"
+	echo "$(test_gen_desc 0022) - block test busy writing y remove the test_sysfs module"
+	echo "$(test_gen_desc 0023) - block test busy writing y larger delay, remove test_sysfs"
+	echo "$(test_gen_desc 0024) - block test writing x and resetting device"
+	echo "$(test_gen_desc 0025) - block test writing y and resetting device"
+	echo "$(test_gen_desc 0026) - block test writing y larger delay and resetting device"
+	echo "$(test_gen_desc 0027) - test rmmod deadlock while writing x ... "
+	echo "$(test_gen_desc 0028) - test rmmod deadlock using rtnl_lock while writing x ..."
+}
+
+usage()
+{
+	NUM_TESTS=$(grep -o ' ' <<<"$ALL_TESTS" | grep -c .)
+	let NUM_TESTS=$NUM_TESTS+1
+	MAX_TEST=$(printf "%04d\n" $NUM_TESTS)
+	echo "Usage: $0 [ -t <4-number-digit> ] | [ -w <4-number-digit> ] |"
+	echo "		 [ -s <4-number-digit> ] | [ -c <4-number-digit> <test- count>"
+	echo "           [ all ] [ -h | --help ] [ -l ]"
+	echo ""
+	echo "Valid tests: 0001-$MAX_TEST"
+	echo ""
+	echo "    all     Runs all tests (default)"
+	echo "    -t      Run test ID the number amount of times is recommended"
+	echo "    -w      Watch test ID run until it runs into an error"
+	echo "    -c      Run test ID once"
+	echo "    -s      Run test ID x test-count number of times"
+	echo "    -l      List all test ID list"
+	echo " -h|--help  Help"
+	echo
+	echo "If an error every occurs execution will immediately terminate."
+	echo "If you are adding a new test try using -w <test-ID> first to"
+	echo "make sure the test passes a series of tests."
+	echo
+	echo Example uses:
+	echo
+	echo "$TEST_NAME.sh            -- executes all tests"
+	echo "$TEST_NAME.sh -t 0002    -- Executes test ID 0002 number of times is recomended"
+	echo "$TEST_NAME.sh -w 0002    -- Watch test ID 0002 run until an error occurs"
+	echo "$TEST_NAME.sh -s 0002    -- Run test ID 0002 once"
+	echo "$TEST_NAME.sh -c 0002 3  -- Run test ID 0002 three times"
+	echo
+	list_tests
+	exit 1
+}
+
+test_num()
+{
+	re='^[0-9]+$'
+	if ! [[ $1 =~ $re ]]; then
+		usage
+	fi
+}
+
+get_test_count()
+{
+	test_num $1
+	TEST_NUM=$(echo $1 | sed 's/^0*//')
+	TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}')
+	echo ${TEST_DATA} | awk -F":" '{print $2}'
+}
+
+get_test_enabled()
+{
+	test_num $1
+	TEST_NUM=$(echo $1 | sed 's/^0*//')
+	TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}')
+	echo ${TEST_DATA} | awk -F":" '{print $3}'
+}
+
+get_test_target()
+{
+	test_num $1
+	TEST_NUM=$(echo $1 | sed 's/^0*//')
+	TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}')
+	echo ${TEST_DATA} | awk -F":" '{print $4}'
+}
+
+get_test_type()
+{
+	test_num $1
+	TEST_NUM=$(echo $1 | sed 's/^0*//')
+	TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}')
+	echo ${TEST_DATA} | awk -F":" '{print $5}'
+}
+
+run_all_tests()
+{
+	for i in $ALL_TESTS ; do
+		TEST_ID=$(echo $i | awk -F":" '{print $1}')
+		ENABLED=$(get_test_enabled $TEST_ID)
+		TEST_COUNT=$(get_test_count $TEST_ID)
+		TEST_TARGET=$(get_test_target $TEST_ID)
+		if [[ $ENABLED -eq "1" ]]; then
+			test_case $TEST_ID $TEST_COUNT $TEST_TARGET
+		else
+			echo -n "Skipping test $TEST_ID as its disabled, likely "
+			echo "could crash your system ..."
+		fi
+	done
+}
+
+watch_log()
+{
+	if [ $# -ne 3 ]; then
+		clear
+	fi
+	echo "Running test: $2 - run #$1"
+}
+
+watch_case()
+{
+	i=0
+	while [ 1 ]; do
+
+		if [ $# -eq 1 ]; then
+			test_num $1
+			watch_log $i ${TEST_NAME}_test_$1
+			${TEST_NAME}_test_$1
+			check_dmesg
+			if [[ $? -eq 0 ]]; then
+				exit 1
+			fi
+		else
+			watch_log $i all
+			run_all_tests
+		fi
+		let i=$i+1
+	done
+}
+
+test_case()
+{
+	NUM_TESTS=$2
+
+	i=0
+
+	load_modreqs $1
+	if target_exists $3 $1; then
+		return
+	fi
+
+	while [[ $i -lt $NUM_TESTS ]]; do
+		test_num $1
+		watch_log $i ${TEST_NAME}_test_$1 noclear
+		log_kernel_fstest_dmesg sysfs $1
+		RUN_TEST=${TEST_NAME}_test_$1
+		$RUN_TEST
+		let i=$i+1
+	done
+	check_dmesg
+	if [[ $? -ne 0 ]]; then
+		exit 1
+	fi
+}
+
+parse_args()
+{
+	if [ $# -eq 0 ]; then
+		run_all_tests
+	else
+		if [[ "$1" = "all" ]]; then
+			run_all_tests
+		elif [[ "$1" = "-w" ]]; then
+			shift
+			watch_case $@
+		elif [[ "$1" = "-t" ]]; then
+			shift
+			test_num $1
+			test_case $1 $(get_test_count $1) $(get_test_target $1)
+		elif [[ "$1" = "-c" ]]; then
+			shift
+			test_num $1
+			test_num $2
+			test_case $1 $2 $(get_test_target $1)
+		elif [[ "$1" = "-s" ]]; then
+			shift
+			test_case $1 1 $(get_test_target $1)
+		elif [[ "$1" = "-l" ]]; then
+			list_tests
+		elif [[ "$1" = "-h" || "$1" = "--help" ]]; then
+			usage
+		else
+			usage
+		fi
+	fi
+}
+
+test_reqs
+allow_user_defaults
+
+trap "test_finish" EXIT
+
+parse_args $@
+
+exit 0
-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 2/4] kernfs: add initial failure injection support
  2021-07-03  0:46 [PATCH v2 0/4] selftests: add a new test driver for sysfs Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 1/4] selftests: add tests_sysfs module Luis Chamberlain
@ 2021-07-03  0:46 ` Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 3/4] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:46 UTC (permalink / raw)
  To: gregkh, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kselftest, linux-kernel

This adds initial failure injection support to kernfs. We start
off with debug knobs which when enabled allow test drivers, such as
test_sysfs, to then make use of these to try to force certain
difficult races to take place with a high degree of certainty.

This only adds runtime code *iff* the new bool CONFIG_FAIL_KERNFS_KNOBS is
enabled in your kernel. If you don't have this enabled this provides
no new functional. When CONFIG_FAIL_KERNFS_KNOBS is disabled the new
routine kernfs_debug_should_wait() ends up being transformed to if
(false), and so the compiler should optimize these out as dead code
producing no new effective binary changes.

We start off with enabling failure injections in kernfs by allowing us to
alter the way kernfs_fop_write_iter() behaves. We allow for the routine
kernfs_fop_write_iter() to wait for a certain condition in the kernel to
occur, after which it will sleep a predefined amount of time. This lets
kernfs users to time exactly when it want kernfs_fop_write_iter() to
complete, allowing for developing race conditions and test for correctness
in kernfs.

You'd boot with this enabled on your kernel command line:

fail_kernfs_fop_write_iter=1,100,0,1

The values are <interval,probability,size,times>, we don't care for
size, so for now we ignore it. The above ensures a failure will trigger
only once.

*How* we allow for this routine to change behaviour is left to knobs we
expose under debugfs:

 # ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/
wait_after_active
wait_after_mutex
wait_at_start
wait_before_mutex

A debugfs entry also exists to allow us to sleep a configurabler amount
of time after the completion:

/sys/kernel/debug/kernfs/sleep_after_wait_ms

These two sets of knobs allow us to construct races and demonstrate
how the kernfs active reference should suffice to project against
races.

Enabling CONFIG_FAULT_INJECTION_DEBUG_FS enables us to configure the
differnt fault injection parametres for the new fail_kernfs_fop_write_iter
fault injection at run time:

ls -1 /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/
interval
probability
space
task-filter
times
verbose
verbose_ratelimit_burst
verbose_ratelimit_interval_ms

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

kernfs: fix for latest linux-next

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 .../fault-injection/fault-injection.rst       | 22 +++++
 MAINTAINERS                                   |  2 +-
 fs/kernfs/Makefile                            |  1 +
 fs/kernfs/failure-injection.c                 | 83 +++++++++++++++++++
 fs/kernfs/file.c                              | 13 +++
 fs/kernfs/kernfs-internal.h                   | 72 ++++++++++++++++
 include/linux/kernfs.h                        |  5 ++
 lib/Kconfig.debug                             | 10 +++
 8 files changed, 207 insertions(+), 1 deletion(-)
 create mode 100644 fs/kernfs/failure-injection.c

diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst
index f47d05ed0d94..d27620196ee8 100644
--- a/Documentation/fault-injection/fault-injection.rst
+++ b/Documentation/fault-injection/fault-injection.rst
@@ -24,6 +24,28 @@ Available fault injection capabilities
 
   injects futex deadlock and uaddr fault errors.
 
+- fail_kernfs_fop_write_iter
+
+  Allows for failures to be enabled inside kernfs_fop_write_iter(). Enabling
+  this does not immediately enable any errors to occur. You must configure
+  how you want this routine to fail or change behaviour by using the debugfs
+  knobs for it:
+
+  # ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/
+  wait_after_active
+  wait_after_mutex
+  wait_at_start
+  wait_before_mutex
+
+  You can also configure how long to sleep after a wait under
+
+  /sys/kernel/debug/kernfs/sleep_after_wait_ms
+
+  If you enable CONFIG_FAULT_INJECTION_DEBUG_FS the fail_add_disk failure
+  injection parameters are placed under:
+
+  /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/
+
 - fail_make_request
 
   injects disk IO errors on devices permitted by setting
diff --git a/MAINTAINERS b/MAINTAINERS
index fd369ed50040..419405b2378a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10246,7 +10246,7 @@ M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 M:	Tejun Heo <tj@kernel.org>
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
-F:	fs/kernfs/
+F:	fs/kernfs/*
 F:	include/linux/kernfs.h
 
 KEXEC
diff --git a/fs/kernfs/Makefile b/fs/kernfs/Makefile
index 4ca54ff54c98..bc5b32ca39f9 100644
--- a/fs/kernfs/Makefile
+++ b/fs/kernfs/Makefile
@@ -4,3 +4,4 @@
 #
 
 obj-y		:= mount.o inode.o dir.o file.o symlink.o
+obj-$(CONFIG_FAIL_KERNFS_KNOBS)    += failure-injection.o
diff --git a/fs/kernfs/failure-injection.c b/fs/kernfs/failure-injection.c
new file mode 100644
index 000000000000..7019385e8145
--- /dev/null
+++ b/fs/kernfs/failure-injection.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/fault-inject.h>
+#include <linux/delay.h>
+
+#include "kernfs-internal.h"
+
+static DECLARE_FAULT_ATTR(fail_kernfs_fop_write_iter);
+struct kernfs_config_fail kernfs_config_fail;
+
+#define kernfs_config_fail(when) \
+	kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when
+
+#define kernfs_config_fail(when) \
+	kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when
+
+static int __init setup_fail_kernfs_fop_write_iter(char *str)
+{
+	return setup_fault_attr(&fail_kernfs_fop_write_iter, str);
+}
+
+__setup("fail_kernfs_fop_write_iter=", setup_fail_kernfs_fop_write_iter);
+
+struct dentry *kernfs_debugfs_root;
+struct dentry *config_fail_kernfs_fop_write_iter;
+
+static int __init kernfs_init_failure_injection(void)
+{
+	kernfs_config_fail.sleep_after_wait_ms = 100;
+	kernfs_debugfs_root = debugfs_create_dir("kernfs", NULL);
+
+	fault_create_debugfs_attr("fail_kernfs_fop_write_iter",
+				  kernfs_debugfs_root, &fail_kernfs_fop_write_iter);
+
+	config_fail_kernfs_fop_write_iter =
+		debugfs_create_dir("config_fail_kernfs_fop_write_iter",
+				   kernfs_debugfs_root);
+
+	debugfs_create_u32("sleep_after_wait_ms", 0600,
+			   kernfs_debugfs_root,
+			   &kernfs_config_fail.sleep_after_wait_ms);
+
+	debugfs_create_bool("wait_at_start", 0600,
+			    config_fail_kernfs_fop_write_iter,
+			    &kernfs_config_fail(at_start));
+	debugfs_create_bool("wait_before_mutex", 0600,
+			    config_fail_kernfs_fop_write_iter,
+			    &kernfs_config_fail(before_mutex));
+	debugfs_create_bool("wait_after_mutex", 0600,
+			    config_fail_kernfs_fop_write_iter,
+			    &kernfs_config_fail(after_mutex));
+	debugfs_create_bool("wait_after_active", 0600,
+			    config_fail_kernfs_fop_write_iter,
+			    &kernfs_config_fail(after_active));
+	return 0;
+}
+late_initcall(kernfs_init_failure_injection);
+
+int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate)
+{
+	if (!evaluate)
+		return 0;
+
+	return should_fail(&fail_kernfs_fop_write_iter, 0);
+}
+
+DECLARE_COMPLETION(kernfs_debug_wait_completion);
+EXPORT_SYMBOL_NS_GPL(kernfs_debug_wait_completion, KERNFS_DEBUG_PRIVATE);
+
+void kernfs_debug_wait(void)
+{
+	wait_for_completion(&kernfs_debug_wait_completion);
+	pr_info("%s received completion\n", __func__);
+
+	/**
+	 * The goal is wait for an event, and *then* once we have
+	 * reached it, the other side will try to do something which
+	 * it thinks will break. So we must give it some time to do
+	 * that. The amount of time is configurable.
+	 */
+	msleep(kernfs_config_fail.sleep_after_wait_ms);
+	pr_info("%s ended\n", __func__);
+}
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index c75719312147..d45954600f11 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -259,6 +259,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	const struct kernfs_ops *ops;
 	char *buf;
 
+	if (kernfs_debug_should_wait(kernfs_fop_write_iter, at_start))
+		kernfs_debug_wait();
+
 	if (of->atomic_write_len) {
 		if (len > of->atomic_write_len)
 			return -E2BIG;
@@ -280,17 +283,27 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	}
 	buf[len] = '\0';	/* guarantee string termination */
 
+	if (kernfs_debug_should_wait(kernfs_fop_write_iter, before_mutex))
+		kernfs_debug_wait();
+
 	/*
 	 * @of->mutex nests outside active ref and is used both to ensure that
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
+
+	if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_mutex))
+		kernfs_debug_wait();
+
 	if (!kernfs_get_active(of->kn)) {
 		mutex_unlock(&of->mutex);
 		len = -ENODEV;
 		goto out_free;
 	}
 
+	if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_active))
+		kernfs_debug_wait();
+
 	ops = kernfs_ops(of->kn);
 	if (ops->write)
 		len = ops->write(of, buf, len, iocb->ki_pos);
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index ccc3b44f6306..221f341991d9 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -17,6 +17,7 @@
 
 #include <linux/kernfs.h>
 #include <linux/fs_context.h>
+#include <linux/stringify.h>
 
 struct kernfs_iattrs {
 	kuid_t			ia_uid;
@@ -127,4 +128,75 @@ void kernfs_drain_open_files(struct kernfs_node *kn);
  */
 extern const struct inode_operations kernfs_symlink_iops;
 
+/*
+ * failure-injection.c
+ */
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+
+/**
+ * struct kernfs_fop_write_iter_fail - how kernfs_fop_write_iter_fail fails
+ *
+ * This lets you configure what part of kernfs_fop_write_iter() should behave
+ * in a specific way to allow userspace to capture possible failures in
+ * kernfs. The wait knobs are allowed to let you design capture possible
+ * race conditions which would otherwise be difficult to reproduce. A
+ * secondary driver would tell kernfs's wait completion when it is done.
+ *
+ * The point to the wait completion failure injection tests are to confirm
+ * that the kernfs active refcount suffice to ensure other objects in other
+ * layers are also gauranteed to exist, even they are opaque to kernfs. This
+ * includes kobjects, devices, and other objects built on top of this, like
+ * the block layer when using sysfs block device attributes.
+ *
+ * @wait_at_start: waits for completion from a third party at the start of
+ *	the routine.
+ * @wait_before_mutex: waits for completion from a third party before we
+ *	are allowed to continue before the of->mutex is held.
+ * @wait_after_mutex: waits for completion from a third party after we
+ *	have held the of->mutex.
+ * @wait_after_active: waits for completion from a thid party after we
+ *	have refcounted the struct kernfs_node.
+ */
+struct kernfs_fop_write_iter_fail {
+	bool wait_at_start;
+	bool wait_before_mutex;
+	bool wait_after_mutex;
+	bool wait_after_active;
+};
+
+/**
+ * struct kernfs_config_fail - kernfs configuration for failure injection
+ *
+ * You can kernfs failure injection on boot, and in particular we currently
+ * only support failures for kernfs_fop_write_iter(). However, we don't
+ * want to always enable errors on this call when failure injection is enabled
+ * as this routine is used by many parts of the kernel for proper functionality.
+ * The compromise we make is we let userspace start enabling which parts it
+ * wants to fail after boot, if and only if failure injection has been enabled.
+ *
+ * @kernfs_fop_write_iter_fail: configuration for how we want to allow
+ *	for failure injection on kernfs_fop_write_iter()
+ * @sleep_after_wait_ms: how many ms to wait after completion is received.
+ */
+struct kernfs_config_fail {
+	struct kernfs_fop_write_iter_fail kernfs_fop_write_iter_fail;
+	u32 sleep_after_wait_ms;
+};
+
+extern struct kernfs_config_fail kernfs_config_fail;
+
+#define __kernfs_config_wait_var(func, when) \
+	(kernfs_config_fail.  func  ## _fail.wait_  ## when)
+#define __kernfs_debug_should_wait_func_name(func) __kernfs_debug_should_wait_## func
+
+#define kernfs_debug_should_wait(func, when) \
+	__kernfs_debug_should_wait_func_name(func)(__kernfs_config_wait_var(func, when))
+int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate);
+void kernfs_debug_wait(void);
+#else
+static inline void kernfs_init_failure_injection(void) {}
+#define kernfs_debug_should_wait(when) (false)
+static inline void kernfs_debug_wait(void) {}
+#endif
+
 #endif	/* __KERNFS_INTERNAL_H */
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 9e8ca8743c26..0da36b8e18e1 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -410,6 +410,11 @@ void kernfs_init(void);
 
 struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
 						   u64 id);
+
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+extern struct completion kernfs_debug_wait_completion;
+#endif
+
 #else	/* CONFIG_KERNFS */
 
 static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 568838ac1189..7dbfbfbd3cfc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1930,6 +1930,16 @@ config FAULT_INJECTION_USERCOPY
 	  Provides fault-injection capability to inject failures
 	  in usercopy functions (copy_from_user(), get_user(), ...).
 
+config FAIL_KERNFS_KNOBS
+	bool "Fault-injection support in kernfs"
+	depends on FAULT_INJECTION
+	help
+	  Provide fault-injection capability for kernfs. This only enables
+	  the error injection functionality. To use it you must configure which
+	  which path you want to trigger on error on using debugfs under
+	  /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/. By
+	  default all of these are disabled.
+
 config FAIL_MAKE_REQUEST
 	bool "Fault-injection capability for disk IO"
 	depends on FAULT_INJECTION && BLOCK
-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 3/4] test_sysfs: add support to use kernfs failure injection
  2021-07-03  0:46 [PATCH v2 0/4] selftests: add a new test driver for sysfs Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 1/4] selftests: add tests_sysfs module Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 2/4] kernfs: add initial failure injection support Luis Chamberlain
@ 2021-07-03  0:46 ` Luis Chamberlain
  2021-07-03  0:46 ` [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix Luis Chamberlain
  3 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:46 UTC (permalink / raw)
  To: gregkh, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kselftest, linux-kernel

This extends test_sysfs with support for using the failure injection
wait completion and knobs to force a few race conditions which
demonstrates that kernfs active reference protection is sufficient
for kobject / device protection at higher layers.

This adds 4 new tests which tries to remove the device attribute
store operation in 4 different situations:

  1) at the start of kernfs_kernfs_fop_write_iter()
  2) before the of->mutex is held in kernfs_kernfs_fop_write_iter()
  3) after the of->mutex is held in kernfs_kernfs_fop_write_iter()
  4) after the kernfs node active reference is taken

A write fails in call cases.

No null dereferences are reproduced, even though this has been observed
in some complex testing cases [0]. If this issue really exists we should
have enough tools on the sysfs_test toolbox now to try to reproduce
this easily without having to poke around other drivers.

[0] https://lkml.kernel.org/r/20210623215007.862787-1-mcgrof@kernel.org

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 lib/Kconfig.debug                      |   3 +
 lib/test_sysfs.c                       |  31 +++++
 tools/testing/selftests/sysfs/config   |   3 +
 tools/testing/selftests/sysfs/sysfs.sh | 174 +++++++++++++++++++++++++
 4 files changed, 211 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7dbfbfbd3cfc..717bc367a9db 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2373,6 +2373,9 @@ config TEST_SYSCTL
 config TEST_SYSFS
 	tristate "sysfs test driver"
 	depends on SYSFS
+	select FAULT_INJECTION
+	select FAULT_INJECTION_DEBUG_FS
+	select FAIL_KERNFS_KNOBS
 	help
 	  This builds the "test_sysfs" module. This driver enables to test the
 	  sysfs file system safely without affecting production knobs which
diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
index bf43016d40b5..f27ec0eab747 100644
--- a/lib/test_sysfs.c
+++ b/lib/test_sysfs.c
@@ -38,6 +38,11 @@
 #include <linux/rtnetlink.h>
 #include <linux/genhd.h>
 #include <linux/blkdev.h>
+#include <linux/kernfs.h>
+
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+MODULE_IMPORT_NS(KERNFS_DEBUG_PRIVATE);
+#endif
 
 static bool enable_lock;
 module_param(enable_lock, bool_enable_only, 0644);
@@ -82,6 +87,13 @@ static bool enable_verbose_rmmod;
 module_param(enable_verbose_rmmod, bool_enable_only, 0644);
 MODULE_PARM_DESC(enable_verbose_rmmod, "enable verbose print messages on rmmod");
 
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+static bool enable_completion_on_rmmod;
+module_param(enable_completion_on_rmmod, bool_enable_only, 0644);
+MODULE_PARM_DESC(enable_completion_on_rmmod,
+		 "enable sending a kernfs completion on rmmod");
+#endif
+
 static int sysfs_test_major;
 
 /**
@@ -289,6 +301,12 @@ static ssize_t config_show(struct device *dev,
 			"enable_verbose_writes:\t%s\n",
 			enable_verbose_writes ? "true" : "false");
 
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+	len += snprintf(buf+len, PAGE_SIZE - len,
+			"enable_completion_on_rmmod:\t%s\n",
+			enable_completion_on_rmmod ? "true" : "false");
+#endif
+
 	test_dev_config_unlock(test_dev);
 
 	return len;
@@ -926,10 +944,23 @@ static int __init test_sysfs_init(void)
 }
 module_init(test_sysfs_init);
 
+#ifdef CONFIG_FAIL_KERNFS_KNOBS
+/* The goal is to race our device removal with a pending kernfs -> store call */
+static void test_sysfs_kernfs_send_completion_rmmod(void)
+{
+	if (!enable_completion_on_rmmod)
+		return;
+	complete(&kernfs_debug_wait_completion);
+}
+#else
+static inline void test_sysfs_kernfs_send_completion_rmmod(void) {}
+#endif
+
 static void __exit test_sysfs_exit(void)
 {
 	if (enable_debugfs)
 		debugfs_remove(debugfs_dir);
+	test_sysfs_kernfs_send_completion_rmmod();
 	if (delay_rmmod_ms)
 		msleep(delay_rmmod_ms);
 	unregister_test_dev_sysfs(first_test_dev);
diff --git a/tools/testing/selftests/sysfs/config b/tools/testing/selftests/sysfs/config
index 9196f452ecd5..2876a229f95b 100644
--- a/tools/testing/selftests/sysfs/config
+++ b/tools/testing/selftests/sysfs/config
@@ -1,2 +1,5 @@
 CONFIG_SYSFS=m
 CONFIG_TEST_SYSFS=m
+CONFIG_FAULT_INJECTION=y
+CONFIG_FAULT_INJECTION_DEBUG_FS=y
+CONFIG_FAIL_KERNFS_KNOBS=y
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh
index 681b27579f6f..f27ea61e0e95 100755
--- a/tools/testing/selftests/sysfs/sysfs.sh
+++ b/tools/testing/selftests/sysfs/sysfs.sh
@@ -62,6 +62,10 @@ ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block"
 ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block"
 ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test
 ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock
+ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store
+ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex
+ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex
+ALL_TESTS="$ALL_TESTS 0032:1:1:test_dev_x:block" # kernfs race removal after active
 
 allow_user_defaults()
 {
@@ -92,6 +96,9 @@ allow_user_defaults()
 	if [ -z $SYSFS_DEBUGFS_DIR ]; then
 		SYSFS_DEBUGFS_DIR="/sys/kernel/debug/test_sysfs"
 	fi
+	if [ -z $KERNFS_DEBUGFS_DIR ]; then
+		KERNFS_DEBUGFS_DIR="/sys/kernel/debug/kernfs"
+	fi
 	if [ -z $PAGE_SIZE ]; then
 		PAGE_SIZE=$(getconf PAGESIZE)
 	fi
@@ -167,6 +174,14 @@ modprobe_reset_enable_rtnl_lock_on_rmmod()
 	unset FIRST_MODPROBE_ARGS
 }
 
+modprobe_reset_enable_completion()
+{
+	FIRST_MODPROBE_ARGS="enable_completion_on_rmmod=1 enable_verbose_writes=1"
+	FIRST_MODPROBE_ARGS="$FIRST_MODPROBE_ARGS enable_verbose_rmmod=1 delay_rmmod_ms=0"
+	modprobe_reset
+	unset FIRST_MODPROBE_ARGS
+}
+
 load_req_mod()
 {
 	modprobe_reset
@@ -197,6 +212,63 @@ debugfs_reset_first_test_dev_ignore_errors()
 	echo -n "1" >"$SYSFS_DEBUGFS_DIR"/reset_first_test_dev
 }
 
+debugfs_kernfs_kernfs_fop_write_iter_exists()
+{
+	KNOB_DIR="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter"
+	if [[ ! -d $KNOB_DIR ]]; then
+		echo "kernfs debugfs does not exist $KNOB_DIR"
+		return 0;
+	fi
+	KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+	if [[ ! -d $KNOB_DEBUGFS ]]; then
+		echo -n "kernfs debugfs for coniguring fail_kernfs_fop_write_iter "
+		echo "does not exist $KNOB_DIR"
+		return 0;
+	fi
+	return 1
+}
+
+debugfs_kernfs_kernfs_fop_write_iter_set_fail_once()
+{
+	KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+	echo 1 > $KNOB_DEBUGFS/interval
+	echo 100 > $KNOB_DEBUGFS/probability
+	echo 0 > $KNOB_DEBUGFS/space
+	# Disable verbose messages on the kernel ring buffer which may
+	# confuse developers with a kernel panic.
+	echo 0 > $KNOB_DEBUGFS/verbose
+
+	# Fail only once
+	echo 1 > $KNOB_DEBUGFS/times
+}
+
+debugfs_kernfs_kernfs_fop_write_iter_set_fail_never()
+{
+	KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter"
+	echo 0 > $KNOB_DEBUGFS/times
+}
+
+debugfs_kernfs_set_wait_ms()
+{
+	SLEEP_AFTER_WAIT_MS="${KERNFS_DEBUGFS_DIR}/sleep_after_wait_ms"
+	echo $1 > $SLEEP_AFTER_WAIT_MS
+}
+
+debugfs_kernfs_disable_wait_kernfs_fop_write_iter()
+{
+	ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_"
+	for KNOB in ${ENABLE_WAIT_KNOB}*; do
+		echo 0 > $KNOB
+	done
+}
+
+debugfs_kernfs_enable_wait_kernfs_fop_write_iter()
+{
+	ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_$1"
+	echo -n "1" > $ENABLE_WAIT_KNOB
+	return $?
+}
+
 set_orig()
 {
 	if [[ ! -z $TARGET ]] && [[ ! -z $ORIG ]]; then
@@ -972,6 +1044,104 @@ sysfs_test_0028()
 	fi
 }
 
+sysfs_race_kernfs_kernfs_fop_write_iter()
+{
+	TARGET="${DIR}/$(get_test_target $1)"
+	WAIT_AT=$2
+	EXPECT_WRITE_RETURNS=$3
+	MSDELAY=$4
+
+	modprobe_reset_enable_completion
+	ORIG=$(cat "${TARGET}")
+	TEST_STR=$(( $ORIG + 1 ))
+
+	echo -n "Test racing removal of sysfs store op with kernfs $WAIT_AT ... "
+
+	if debugfs_kernfs_kernfs_fop_write_iter_exists; then
+		echo -n "skipping test as CONFIG_FAIL_KERNFS_KNOBS "
+		echo " or CONFIG_FAULT_INJECTION_DEBUG_FS is disabled"
+		return $ksft_skip
+	fi
+
+	# Allow for failing the kernfs_kernfs_fop_write_iter call once,
+	# we'll provide exact context shortly afterwards.
+	debugfs_kernfs_kernfs_fop_write_iter_set_fail_once
+
+	# First disable all waits
+	debugfs_kernfs_disable_wait_kernfs_fop_write_iter
+
+	# Enable a wait_for_completion(&kernfs_debug_wait_completion) at the
+	# specified location inside the kernfs_fop_write_iter() routine
+	debugfs_kernfs_enable_wait_kernfs_fop_write_iter $WAIT_AT
+
+	# Configure kernfs so that after its wait_for_completion() it
+	# will msleep() this amount of time and schedule(). We figure this
+	# will be sufficient time to allow for our module removal to complete.
+	debugfs_kernfs_set_wait_ms $MSDELAY
+
+	# Now we trigger a kernfs write op, which will run kernfs_fop_write_iter,
+	# but will wait until our driver sends a respective completion
+	set_test_ignore_errors &
+	write_pid=$!
+
+	# At this point kernfs_fop_write_iter() hasn't run our op, its
+	# waiting for our completion at the specified time $WAIT_AT.
+	# We now remove our module which will send a
+	# complete(&kernfs_debug_wait_completion) right before we deregister
+	# our device and the sysfs device attributes are removed.
+	#
+	# After the completion is sent, the test_sysfs driver races with
+	# kernfs to do the device deregistration with the kernfs msleep
+	# and schedule(). This should mean we've forced trying to remove the
+	# module prior to allowing kernfs to run our store operation. If the
+	# race did happen we'll panic with a null dereference on the store op.
+	#
+	# If no race happens we should see no write operation triggered.
+	modprobe -r $TEST_DRIVER > /dev/null 2>&1
+
+	debugfs_kernfs_kernfs_fop_write_iter_set_fail_never
+
+	wait $write_pid
+	if [[ $? -eq $EXPECT_WRITE_RETURNS ]]; then
+		echo "ok"
+	else
+		echo "FAIL" >&2
+	fi
+}
+
+sysfs_test_0029()
+{
+	for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+		echo "Using delay-after-completion: $delay"
+		sysfs_race_kernfs_kernfs_fop_write_iter 0029 at_start 1 $delay
+	done
+}
+
+sysfs_test_0030()
+{
+	for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+		echo "Using delay-after-completion: $delay"
+		sysfs_race_kernfs_kernfs_fop_write_iter 0030 before_mutex 1 $delay
+	done
+}
+
+sysfs_test_0031()
+{
+	for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+		echo "Using delay-after-completion: $delay"
+		sysfs_race_kernfs_kernfs_fop_write_iter 0031 after_mutex 1 $delay
+	done
+}
+
+# Even if we get the active reference the write fails
+sysfs_test_0032()
+{
+	for delay in 0 2 4 8 16 32 64 128 246 512 1024; do
+		echo "Using delay-after-completion: $delay"
+		sysfs_race_kernfs_kernfs_fop_write_iter 0032 after_active 1 $delay
+	done
+}
+
 test_gen_desc()
 {
 	echo -n "$1 x $(get_test_count $1)"
@@ -1013,6 +1183,10 @@ list_tests()
 	echo "$(test_gen_desc 0026) - block test writing y larger delay and resetting device"
 	echo "$(test_gen_desc 0027) - test rmmod deadlock while writing x ... "
 	echo "$(test_gen_desc 0028) - test rmmod deadlock using rtnl_lock while writing x ..."
+	echo "$(test_gen_desc 0029) - racing removal of store op with kernfs at start"
+	echo "$(test_gen_desc 0030) - racing removal of store op with kernfs before mutex"
+	echo "$(test_gen_desc 0031) - racing removal of store op with kernfs after mutex"
+	echo "$(test_gen_desc 0032) - racing removal of store op with kernfs after active"
 }
 
 usage()
-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix
  2021-07-03  0:46 [PATCH v2 0/4] selftests: add a new test driver for sysfs Luis Chamberlain
                   ` (2 preceding siblings ...)
  2021-07-03  0:46 ` [PATCH v2 3/4] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
@ 2021-07-03  0:46 ` Luis Chamberlain
  2021-07-03  4:49   ` Greg KH
  3 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:46 UTC (permalink / raw)
  To: gregkh, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kselftest, linux-kernel

Two mechanisms have been proposed to fix the sysfs deadlock issue.
The first approach proposed is by optionally allowing drivers to specify
a module and augmenting attributes with module information [0]. A secondary
approach is to use macros on drivers which needs this, in the meantime. This
embraces the secondary approach, in lieu of agreement of a generic solution.
This should be enough to allow for room for experimentation and demonstration
of the issue.

This then also enables the two test cases which we have disabled as
otherwise they would deadlock your system.

./tools/testing/selftests/sysfs/sysfs.sh -t 0027
Running test: sysfs_test_0027 - run #0
Test for possible rmmod deadlock while writing x ... ok

./tools/testing/selftests/sysfs/sysfs.sh -t 0028
Running test: sysfs_test_0028 - run #0
Test for possible rmmod deadlock using rtnl_lock while writing x ... ok

[0] https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 lib/test_sysfs.c                       | 71 ++++++++++++++++++++++----
 tools/testing/selftests/sysfs/sysfs.sh |  4 +-
 2 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
index f27ec0eab747..af14e992e1b8 100644
--- a/lib/test_sysfs.c
+++ b/lib/test_sysfs.c
@@ -94,6 +94,59 @@ MODULE_PARM_DESC(enable_completion_on_rmmod,
 		 "enable sending a kernfs completion on rmmod");
 #endif
 
+#undef __ATTR_RO
+#undef __ATTR_RW
+#undef __ATTR_WO
+
+#define __ATTR_RO(_name) {						\
+	.attr	= { .name = __stringify(_name), .mode = 0444 },		\
+	.show	= module_##_name##_show,						\
+}
+#define __ATTR_RW(_name) __ATTR(_name, 0644, module_##_name##_show, module_##_name##_store)
+#define __ATTR_WO(_name) {						\
+	.attr	= { .name = __stringify(_name), .mode = 0200 },		\
+	.store	= module_##_name##_store,				\
+}
+
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
+static ssize_t module_ ## _name ## _store(struct device *dev, \
+				   struct device_attribute *attr, \
+				   const char *buf, size_t len) \
+{ \
+	ssize_t __ret; \
+	if (!try_module_get(THIS_MODULE)) \
+		return -ENODEV; \
+	__ret = _name ## _store(dev, attr, buf, len); \
+	module_put(THIS_MODULE); \
+	return __ret; \
+}
+
+#define MODULE_DEVICE_ATTR_FUNC_SHOW(_name) \
+static ssize_t module_ ## _name ## _show(struct device *dev, \
+					 struct device_attribute *attr, \
+					 char *buf) \
+{ \
+	ssize_t __ret; \
+	if (!try_module_get(THIS_MODULE)) \
+		return -ENODEV; \
+	__ret = _name ## _show(dev, attr, buf); \
+	module_put(THIS_MODULE); \
+	return __ret; \
+}
+
+#define MODULE_DEVICE_ATTR_WO(_name) \
+MODULE_DEVICE_ATTR_FUNC_STORE(_name); \
+static DEVICE_ATTR_WO(_name)
+
+#define MODULE_DEVICE_ATTR_RW(_name) \
+MODULE_DEVICE_ATTR_FUNC_STORE(_name); \
+MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \
+static DEVICE_ATTR_RW(_name)
+
+#define MODULE_DEVICE_ATTR_RO(_name) \
+MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \
+static DEVICE_ATTR_RO(_name)
+
 static int sysfs_test_major;
 
 /**
@@ -311,7 +364,7 @@ static ssize_t config_show(struct device *dev,
 
 	return len;
 }
-static DEVICE_ATTR_RO(config);
+MODULE_DEVICE_ATTR_RO(config);
 
 static ssize_t reset_store(struct device *dev,
 			   struct device_attribute *attr,
@@ -336,7 +389,7 @@ static ssize_t reset_store(struct device *dev,
 
 	return count;
 }
-static DEVICE_ATTR_WO(reset);
+MODULE_DEVICE_ATTR_WO(reset);
 
 static void test_dev_busy_alloc(struct sysfs_test_device *test_dev)
 {
@@ -388,7 +441,7 @@ static ssize_t test_dev_x_show(struct device *dev,
 
 	return ret;
 }
-static DEVICE_ATTR_RW(test_dev_x);
+MODULE_DEVICE_ATTR_RW(test_dev_x);
 
 static ssize_t test_dev_y_store(struct device *dev,
 				struct device_attribute *attr,
@@ -432,7 +485,7 @@ static ssize_t test_dev_y_show(struct device *dev,
 
 	return ret;
 }
-static DEVICE_ATTR_RW(test_dev_y);
+MODULE_DEVICE_ATTR_RW(test_dev_y);
 
 static ssize_t config_enable_lock_store(struct device *dev,
 					struct device_attribute *attr,
@@ -477,7 +530,7 @@ static ssize_t config_enable_lock_show(struct device *dev,
 
 	return ret;
 }
-static DEVICE_ATTR_RW(config_enable_lock);
+MODULE_DEVICE_ATTR_RW(config_enable_lock);
 
 static ssize_t config_enable_lock_on_rmmod_store(struct device *dev,
 						 struct device_attribute *attr,
@@ -519,7 +572,7 @@ static ssize_t config_enable_lock_on_rmmod_show(struct device *dev,
 
 	return ret;
 }
-static DEVICE_ATTR_RW(config_enable_lock_on_rmmod);
+MODULE_DEVICE_ATTR_RW(config_enable_lock_on_rmmod);
 
 static ssize_t config_use_rtnl_lock_store(struct device *dev,
 					  struct device_attribute *attr,
@@ -558,7 +611,7 @@ static ssize_t config_use_rtnl_lock_show(struct device *dev,
 
 	return snprintf(buf, PAGE_SIZE, "%d\n", config->use_rtnl_lock);
 }
-static DEVICE_ATTR_RW(config_use_rtnl_lock);
+MODULE_DEVICE_ATTR_RW(config_use_rtnl_lock);
 
 static ssize_t config_write_delay_msec_y_store(struct device *dev,
 					       struct device_attribute *attr,
@@ -592,7 +645,7 @@ static ssize_t config_write_delay_msec_y_show(struct device *dev,
 
 	return snprintf(buf, PAGE_SIZE, "%d\n", config->write_delay_msec_y);
 }
-static DEVICE_ATTR_RW(config_write_delay_msec_y);
+MODULE_DEVICE_ATTR_RW(config_write_delay_msec_y);
 
 static ssize_t config_enable_busy_alloc_store(struct device *dev,
 					      struct device_attribute *attr,
@@ -626,7 +679,7 @@ static ssize_t config_enable_busy_alloc_show(struct device *dev,
 
 	return snprintf(buf, PAGE_SIZE, "%d\n", config->enable_busy_alloc);
 }
-static DEVICE_ATTR_RW(config_enable_busy_alloc);
+MODULE_DEVICE_ATTR_RW(config_enable_busy_alloc);
 
 #define TEST_SYSFS_DEV_ATTR(name)		(&dev_attr_##name.attr)
 
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh
index f27ea61e0e95..2de9f37cb00b 100755
--- a/tools/testing/selftests/sysfs/sysfs.sh
+++ b/tools/testing/selftests/sysfs/sysfs.sh
@@ -60,8 +60,8 @@ ALL_TESTS="$ALL_TESTS 0023:1:1:test_dev_y:block"
 ALL_TESTS="$ALL_TESTS 0024:1:1:test_dev_x:block"
 ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block"
 ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block"
-ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test
-ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock
+ALL_TESTS="$ALL_TESTS 0027:1:1:test_dev_x:block" # deadlock test
+ALL_TESTS="$ALL_TESTS 0028:1:1:test_dev_x:block" # deadlock test with rntl_lock
 ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store
 ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex
 ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex
-- 
2.27.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix
  2021-07-03  0:46 ` [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix Luis Chamberlain
@ 2021-07-03  4:49   ` Greg KH
  2021-07-03 17:28     ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2021-07-03  4:49 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
> +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> +static ssize_t module_ ## _name ## _store(struct device *dev, \
> +				   struct device_attribute *attr, \
> +				   const char *buf, size_t len) \
> +{ \
> +	ssize_t __ret; \
> +	if (!try_module_get(THIS_MODULE)) \
> +		return -ENODEV; \
> +	__ret = _name ## _store(dev, attr, buf, len); \
> +	module_put(THIS_MODULE); \
> +	return __ret; \
> +}

As I have pointed out before, doing try_module_get(THIS_MODULE) is racy
and should not be added back to the kernel tree.  We got rid of many
instances of this "bad pattern" over the years, please do not encourage
it to be added back as others will somehow think that it correct code.

I'll go over the rest of this after 5.14-rc1 is out, am busy until then.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix
  2021-07-03  4:49   ` Greg KH
@ 2021-07-03 17:28     ` Luis Chamberlain
  2021-07-21 11:33       ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-03 17:28 UTC (permalink / raw)
  To: Greg KH
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
> On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
> > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > +				   struct device_attribute *attr, \
> > +				   const char *buf, size_t len) \
> > +{ \
> > +	ssize_t __ret; \
> > +	if (!try_module_get(THIS_MODULE)) \
> > +		return -ENODEV; \
> > +	__ret = _name ## _store(dev, attr, buf, len); \
> > +	module_put(THIS_MODULE); \
> > +	return __ret; \
> > +}
> 
> As I have pointed out before, doing try_module_get(THIS_MODULE) is racy
> and should not be added back to the kernel tree.  We got rid of many
> instances of this "bad pattern" over the years, please do not encourage
> it to be added back as others will somehow think that it correct code.

It is noted this is used in lieu of any agreed upon solution to
*demonstrate* how this at least does fix it. In this case (and in the
generic solution I also had suggested for kernfs a while ago), if the
try fails, we give up. If it succeeds, we now know we can rely on the
device pointer. If the refcount succeeds, can the module still not
be present? Is try_module_get() racy in that way? In what way is it
racy and where is this documented? Do we have a selftest to prove the
race?

  Luis


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/4] selftests: add tests_sysfs module
  2021-07-03  0:46 ` [PATCH v2 1/4] selftests: add tests_sysfs module Luis Chamberlain
@ 2021-07-21 11:32   ` Greg KH
  2021-07-22 22:34     ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2021-07-21 11:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Fri, Jul 02, 2021 at 05:46:29PM -0700, Luis Chamberlain wrote:
> This adds a new selftest module which can be used to test sysfs, which
> would otherwise require using an existing driver. This lets us muck
> with a template driver to test breaking things without affecting
> system behaviour or requiring the dependencies of a real device
> driver.
> 
> A series of 28 tests are added. Support for using two device types are
> supported:
> 
>   * misc
>   * block
> 
> Contrary to sysctls, sysfs requires a full write to happen at once, and
> so we reduce the digit tests to single writes. Two main sysfs knobs are
> provided for testing reading/storing, one which doesn't inclur any
> delays and another which can incur programmed delays. What locks are
> held, if any, are configurable, at module load time, or through dynamic
> configuration at run time.
> 
> Since sysfs is a technically filesystem, but a pseudo one, which
> requires a kernel user, our test_sysfs module and respective test script
> embraces fstests format for tests in the kernel ring bufffer. Likewise,
> a scraper for kernel crashes is provided which matches what fstests does
> as well.
> 
> Two tests are kept disabled as they currently cause a deadlock, and so
> this provides a mechanism to easily show proof and demo how the deadlock
> can happen:
> 
> Demos the deadlock with a device specific lock
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
> 
> Demos the deadlock with rtnl_lock()
> ./tools/testing/selftests/sysfs/sysfs.sh -t 0028
> 
> Two separate solutions to the deadlock issue have been proposed,
> and so now its a matter of either documenting this limitation or
> eventually adopting a generic fix.
> 
> This selftests will shortly be expanded upon with more tests which
> require further kernel changes in order to provide better test
> coverage.

Why is this not using kunit?  We should not be adding new in-kernel
tests that are not using that api anymore.

> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  MAINTAINERS                            |    7 +
>  lib/Kconfig.debug                      |   10 +
>  lib/Makefile                           |    1 +
>  lib/test_sysfs.c                       |  943 +++++++++++++++++++
>  tools/testing/selftests/sysfs/Makefile |   12 +
>  tools/testing/selftests/sysfs/config   |    2 +
>  tools/testing/selftests/sysfs/sysfs.sh | 1202 ++++++++++++++++++++++++
>  7 files changed, 2177 insertions(+)
>  create mode 100644 lib/test_sysfs.c
>  create mode 100644 tools/testing/selftests/sysfs/Makefile
>  create mode 100644 tools/testing/selftests/sysfs/config
>  create mode 100755 tools/testing/selftests/sysfs/sysfs.sh
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 66d047dc6880..fd369ed50040 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17958,6 +17958,13 @@ L:	linux-mmc@vger.kernel.org
>  S:	Maintained
>  F:	drivers/mmc/host/sdhci-pci-dwc-mshc.c
>  
> +SYSFS TEST DRIVER
> +M:	Luis Chamberlain <mcgrof@kernel.org>
> +L:	linux-kernel@vger.kernel.org
> +S:	Maintained
> +F:	lib/test_sysfs.c
> +F:	tools/testing/selftests/sysfs/
> +
>  SYSTEM CONFIGURATION (SYSCON)
>  M:	Lee Jones <lee.jones@linaro.org>
>  M:	Arnd Bergmann <arnd@arndb.de>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index fb370c7c4756..568838ac1189 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2360,6 +2360,16 @@ config TEST_SYSCTL
>  
>  	  If unsure, say N.
>  
> +config TEST_SYSFS
> +	tristate "sysfs test driver"
> +	depends on SYSFS
> +	help
> +	  This builds the "test_sysfs" module. This driver enables to test the
> +	  sysfs file system safely without affecting production knobs which
> +	  might alter system functionality.
> +
> +	  If unsure, say N.
> +
>  config BITFIELD_KUNIT
>  	tristate "KUnit test bitfield functions at runtime"
>  	depends on KUNIT
> diff --git a/lib/Makefile b/lib/Makefile
> index 5efd1b435a37..effd1ef806f0 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -61,6 +61,7 @@ obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
>  obj-$(CONFIG_TEST_BITOPS) += test_bitops.o
>  CFLAGS_test_bitops.o += -Werror
>  obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
> +obj-$(CONFIG_TEST_SYSFS) += test_sysfs.o
>  obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
>  obj-$(CONFIG_TEST_IDA) += test_ida.o
>  obj-$(CONFIG_KASAN_KUNIT_TEST) += test_kasan.o
> diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
> new file mode 100644
> index 000000000000..bf43016d40b5
> --- /dev/null
> +++ b/lib/test_sysfs.c
> @@ -0,0 +1,943 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later

This does not match your "boiler-plate" text below, sorry.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix
  2021-07-03 17:28     ` Luis Chamberlain
@ 2021-07-21 11:33       ` Greg KH
  2021-07-22 22:36         ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2021-07-21 11:33 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Sat, Jul 03, 2021 at 10:28:28AM -0700, Luis Chamberlain wrote:
> On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
> > On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
> > > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > > +				   struct device_attribute *attr, \
> > > +				   const char *buf, size_t len) \
> > > +{ \
> > > +	ssize_t __ret; \
> > > +	if (!try_module_get(THIS_MODULE)) \
> > > +		return -ENODEV; \
> > > +	__ret = _name ## _store(dev, attr, buf, len); \
> > > +	module_put(THIS_MODULE); \
> > > +	return __ret; \
> > > +}
> > 
> > As I have pointed out before, doing try_module_get(THIS_MODULE) is racy
> > and should not be added back to the kernel tree.  We got rid of many
> > instances of this "bad pattern" over the years, please do not encourage
> > it to be added back as others will somehow think that it correct code.
> 
> It is noted this is used in lieu of any agreed upon solution to
> *demonstrate* how this at least does fix it. In this case (and in the
> generic solution I also had suggested for kernfs a while ago), if the
> try fails, we give up. If it succeeds, we now know we can rely on the
> device pointer. If the refcount succeeds, can the module still not
> be present? Is try_module_get() racy in that way? In what way is it
> racy and where is this documented? Do we have a selftest to prove the
> race?

As I say in the other email where you tried to add this, think about
what happens if the module is removed _right before_ you make this call.

Or a few instructions before that.  The race is still there, this fixes
nothing except make the window smaller.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/4] selftests: add tests_sysfs module
  2021-07-21 11:32   ` Greg KH
@ 2021-07-22 22:34     ` Luis Chamberlain
  2021-07-23 10:38       ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-22 22:34 UTC (permalink / raw)
  To: Greg KH, Shuah Khan
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Wed, Jul 21, 2021 at 01:32:41PM +0200, Greg KH wrote:
> On Fri, Jul 02, 2021 at 05:46:29PM -0700, Luis Chamberlain wrote:
> > This selftests will shortly be expanded upon with more tests which
> > require further kernel changes in order to provide better test
> > coverage.
> 
> Why is this not using kunit?  We should not be adding new in-kernel
> tests that are not using that api anymore.

No way. That cannot possibly be true. When was this decided? Did
Shuah Khan, the maintainer of selftests, all of a sudden decide we
are going to deprecate selftests in favor for trying to only use
kunit? Did we have a conference where this was talked about and decided?

If so all these are huge news to me and I missed the memo!

If I would have been at such meeting I would have definitely yelled
bloody murder!

kunit relies on UML and UML is a simple one core architecture, to start
with. This means I cannot run tests for multicore with it, which is
where many races do happen! Yes, you can run kunit on other
architectures, but all that is new.

Second, I did help review kunit getting upstream, and suggested a few
example tests, part of which were for sysctl to compare and contrast
what is possible and what we cannot do.

Not everything we want to test should be written as a kunit test.
No way.

In this case kunit is not ideal given I want to mimic something in
userspace interaction, and expose races through error injection and
if we can use as many cores to busy races out.

Trust me, I'm an advocate of kunit, and I'm even trying to see ideally
what tests from fstests / blktests could be kunit'ified. But in this
case, no. Using a selftests is much better target framework.

> > diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c
> > new file mode 100644
> > index 000000000000..bf43016d40b5
> > --- /dev/null
> > +++ b/lib/test_sysfs.c
> > @@ -0,0 +1,943 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> 
> This does not match your "boiler-plate" text below, sorry.

Indeed, I'll fix it.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix
  2021-07-21 11:33       ` Greg KH
@ 2021-07-22 22:36         ` Luis Chamberlain
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-22 22:36 UTC (permalink / raw)
  To: Greg KH
  Cc: tj, shuah, akpm, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Wed, Jul 21, 2021 at 01:33:54PM +0200, Greg KH wrote:
> On Sat, Jul 03, 2021 at 10:28:28AM -0700, Luis Chamberlain wrote:
> > On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
> > > On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
> > > > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > > > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > > > +				   struct device_attribute *attr, \
> > > > +				   const char *buf, size_t len) \
> > > > +{ \
> > > > +	ssize_t __ret; \
> > > > +	if (!try_module_get(THIS_MODULE)) \
> > > > +		return -ENODEV; \
> > > > +	__ret = _name ## _store(dev, attr, buf, len); \
> > > > +	module_put(THIS_MODULE); \
> > > > +	return __ret; \
> > > > +}
> > > 
> > > As I have pointed out before, doing try_module_get(THIS_MODULE) is racy
> > > and should not be added back to the kernel tree.  We got rid of many
> > > instances of this "bad pattern" over the years, please do not encourage
> > > it to be added back as others will somehow think that it correct code.
> > 
> > It is noted this is used in lieu of any agreed upon solution to
> > *demonstrate* how this at least does fix it. In this case (and in the
> > generic solution I also had suggested for kernfs a while ago), if the
> > try fails, we give up. If it succeeds, we now know we can rely on the
> > device pointer. If the refcount succeeds, can the module still not
> > be present? Is try_module_get() racy in that way? In what way is it
> > racy and where is this documented? Do we have a selftest to prove the
> > race?
> 
> As I say in the other email where you tried to add this, think about
> what happens if the module is removed _right before_ you make this call.
> 
> Or a few instructions before that.  The race is still there, this fixes
> nothing except make the window smaller.

The kernfs active reference ensures that if the file is open the module
must still exist. As such, the use within sysfs files should be safe
as the module is the one in charge of removing the files.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/4] selftests: add tests_sysfs module
  2021-07-22 22:34     ` Luis Chamberlain
@ 2021-07-23 10:38       ` Greg KH
  2021-07-23 17:32         ` Luis Chamberlain
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2021-07-23 10:38 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Shuah Khan, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Thu, Jul 22, 2021 at 03:34:49PM -0700, Luis Chamberlain wrote:
> kunit relies on UML and UML is a simple one core architecture, to start
> with.

I thought the UML requirement was long gone, are you sure it is still
present?

> This means I cannot run tests for multicore with it, which is
> where many races do happen! Yes, you can run kunit on other
> architectures, but all that is new.

What do you mean by "new"?  It should work today, in today's kernel
tree, right?

> In this case kunit is not ideal given I want to mimic something in
> userspace interaction, and expose races through error injection and
> if we can use as many cores to busy races out.

Can you not do that with kunit?  If not, why not?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/4] selftests: add tests_sysfs module
  2021-07-23 10:38       ` Greg KH
@ 2021-07-23 17:32         ` Luis Chamberlain
  0 siblings, 0 replies; 13+ messages in thread
From: Luis Chamberlain @ 2021-07-23 17:32 UTC (permalink / raw)
  To: Greg KH
  Cc: Shuah Khan, tj, shuah, akpm, rafael, davem, kuba, ast, andriin,
	daniel, atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kselftest, linux-kernel

On Fri, Jul 23, 2021 at 12:38:27PM +0200, Greg KH wrote:
> On Thu, Jul 22, 2021 at 03:34:49PM -0700, Luis Chamberlain wrote:
> > kunit relies on UML and UML is a simple one core architecture, to start
> > with.
> 
> I thought the UML requirement was long gone, are you sure it is still
> present?

It's *the* way to run kunit.

> > This means I cannot run tests for multicore with it, which is
> > where many races do happen! Yes, you can run kunit on other
> > architectures, but all that is new.
> 
> What do you mean by "new"?  It should work today, in today's kernel
> tree, right?

That was experimental. And I know no one using it.

> > In this case kunit is not ideal given I want to mimic something in
> > userspace interaction, and expose races through error injection and
> > if we can use as many cores to busy races out.
> 
> Can you not do that with kunit?  If not, why not?

kunit requires codifying everything in C, what we need is a lot of
flexibility to do all sorts of races in userspace, and actually use
userspace tools.

I am *not* going to even try to rewrite my selftest as a kunit test.
It is just not going to happen.

  Luis

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-07-23 17:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-03  0:46 [PATCH v2 0/4] selftests: add a new test driver for sysfs Luis Chamberlain
2021-07-03  0:46 ` [PATCH v2 1/4] selftests: add tests_sysfs module Luis Chamberlain
2021-07-21 11:32   ` Greg KH
2021-07-22 22:34     ` Luis Chamberlain
2021-07-23 10:38       ` Greg KH
2021-07-23 17:32         ` Luis Chamberlain
2021-07-03  0:46 ` [PATCH v2 2/4] kernfs: add initial failure injection support Luis Chamberlain
2021-07-03  0:46 ` [PATCH v2 3/4] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
2021-07-03  0:46 ` [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix Luis Chamberlain
2021-07-03  4:49   ` Greg KH
2021-07-03 17:28     ` Luis Chamberlain
2021-07-21 11:33       ` Greg KH
2021-07-22 22:36         ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).